[non-stop] 08/10 linux native support

* [non-stop] 08/10 linux native support
@ 2008-06-15 21:10 Pedro Alves
  2008-06-25 21:17 ` Daniel Jacobowitz
  0 siblings, 1 reply; 20+ messages in thread
From: Pedro Alves @ 2008-06-15 21:10 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 499 bytes --]

This adds non-stop mode support to the Linux native target:

- Don't stop all threads when one thread stops.

- Make sure we're not reading registers and memory from
  running threads.

- Add threads to the thread table as soon as we detect
  them.

- Avoid using ptrace on running threads.

- Implement target_stop_ptid to interrupt only
  one thread.

- Getting the last pending event of a thread is
  different in non-stop mode, because stop_signal
  is per-thread there.
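
To illustrate the "avoid using ptrace on running threads" item: a
liveness check can be done with signal 0 instead of a ptrace request.
The following is a minimal standalone sketch, not the code in the
patch.  It assumes a POSIX system and uses plain kill () on a whole
process rather than the per-LWP kill_lwp from linux-nat.c; the names
process_alive and spawn_and_reap are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return 1 if PID names an existing process, 0 otherwise.  Signal 0
   performs the existence and permission checks without delivering
   anything, so it is safe to use on a running process.  */
static int
process_alive (pid_t pid)
{
  if (kill (pid, 0) == 0)
    return 1;

  /* EPERM means the process exists but we may not signal it.  */
  return errno == EPERM;
}

/* Hypothetical helper for demonstration: fork a child that exits
   immediately, reap it, and return its (now stale) pid.  Returns -1
   if fork fails.  */
static pid_t
spawn_and_reap (void)
{
  pid_t child = fork ();

  if (child == 0)
    _exit (0);
  if (child > 0)
    waitpid (child, NULL, 0);
  return child;
}
```

Calling process_alive (getpid ()) reports the caller as alive, while
the pid returned by spawn_and_reap () is reported as gone.  Treating
EPERM as "alive" is a choice made for this sketch; the patch itself
treats any error as "not alive".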

-- 
Pedro Alves

[-- Attachment #2: 008-non_stop_linux.diff --]
[-- Type: text/x-diff, Size: 18240 bytes --]

2008-06-15  Pedro Alves  <pedro@codesourcery.com>

	* linux-fork.c (linux_fork_killall): Use SIGKILL instead of
	PTRACE_KILL.

	* linux-nat.c (find_lwp_pid): Make public.
	(get_pending_status): Implement non-stop mode.
	(sigint_clear_callback): New.
	(linux_nat_resume): In non-stop mode, always resume only a single
	PTID.  Clear the sigint flag.
	(linux_handle_extended_wait): On a clone event, add new lwp to
	GDB's thread table, and mark as running, executing and stopped
	appropriately.
	(linux_nat_filter_event): Don't assume there are other running
	threads when a thread exits.
	(linux_nat_wait): Mark the main thread as running and executing.
	In non-stop mode, don't stop all lwps.
	(kill_callback): If lwp is not stopped, use SIGKILL.
	(linux_nat_thread_alive): Use signal 0 to detect if a thread is
	alive.
	(send_sigint_callback): New.
	(linux_nat_stop_ptid): New.
	(linux_nat_add_target): Set to_stop_ptid to linux_nat_stop_ptid.

	* linux-nat.h (struct lwp_info): Add sigint field.
	(find_lwp_pid): Declare.

	* linux-thread-db.c (thread_from_lwp, enable_thread_event)
	(check_event): Set proc_handle.pid to the stopped lwp.
	(thread_db_find_new_threads): If current lwp is executing, don't
	try to read from it.

---
 gdb/linux-fork.c      |    4 
 gdb/linux-nat.c       |  256 ++++++++++++++++++++++++++++++++++++++++----------
 gdb/linux-nat.h       |    6 +
 gdb/linux-thread-db.c |   15 ++
 4 files changed, 233 insertions(+), 48 deletions(-)

Index: src/gdb/linux-fork.c
===================================================================
--- src.orig/gdb/linux-fork.c	2008-06-15 20:25:42.000000000 +0100
+++ src/gdb/linux-fork.c	2008-06-15 20:56:56.000000000 +0100
@@ -337,7 +337,9 @@ linux_fork_killall (void)
     {
       pid = PIDGET (fp->ptid);
       do {
-	ptrace (PT_KILL, pid, 0, 0);
+	/* Use SIGKILL instead of PTRACE_KILL because the former works even
+	   if the thread is running, while the latter doesn't.  */
+	kill (pid, SIGKILL);
 	ret = waitpid (pid, &status, 0);
 	/* We might get a SIGCHLD instead of an exit status.  This is
 	 aggravated by the first kill above - a child has just
Index: src/gdb/linux-nat.c
===================================================================
--- src.orig/gdb/linux-nat.c	2008-06-15 20:25:42.000000000 +0100
+++ src/gdb/linux-nat.c	2008-06-15 20:57:27.000000000 +0100
@@ -212,6 +212,10 @@ static void linux_nat_async (void (*call
 static int linux_nat_async_mask (int mask);
 static int kill_lwp (int lwpid, int signo);
 
+static int send_sigint_callback (struct lwp_info *lp, void *data);
+
+static int stop_callback (struct lwp_info *lp, void *data);
+
 /* Captures the result of a successful waitpid call, along with the
    options used in that call.  */
 struct waitpid_result
@@ -415,6 +419,8 @@ linux_test_for_tracefork (int original_p
   int child_pid, ret, status;
   long second_pid;
 
+  int events_enabled = linux_nat_async_events (0);
+
   linux_supports_tracefork_flag = 0;
   linux_supports_tracevforkdone_flag = 0;
 
@@ -454,6 +460,7 @@ linux_test_for_tracefork (int original_p
 	warning (_("linux_test_for_tracefork: unexpected wait status 0x%x from "
 		 "killed child"), status);
 
+      linux_nat_async_events (events_enabled);
       return;
     }
 
@@ -493,6 +500,8 @@ linux_test_for_tracefork (int original_p
   if (ret != 0)
     warning (_("linux_test_for_tracefork: failed to kill child"));
   my_waitpid (child_pid, &status, 0);
+
+  linux_nat_async_events (events_enabled);
 }
 
 /* Return non-zero iff we have tracefork functionality available.
@@ -920,7 +929,7 @@ delete_lwp (ptid_t ptid)
 /* Return a pointer to the structure describing the LWP corresponding
    to PID.  If no corresponding LWP could be found, return NULL.  */
 
-static struct lwp_info *
+struct lwp_info *
 find_lwp_pid (ptid_t ptid)
 {
   struct lwp_info *lp;
@@ -1306,16 +1315,76 @@ get_pending_status (struct lwp_info *lp,
      events are always cached in waitpid_queue.  */
 
   *status = 0;
-  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+
+  if (non_stop)
     {
-      if (stop_signal != TARGET_SIGNAL_0
-	  && signal_pass_state (stop_signal))
-	*status = W_STOPCODE (target_signal_to_host (stop_signal));
+      enum target_signal signo = TARGET_SIGNAL_0;
+
+      if (is_executing (lp->ptid))
+	{
+	  /* If the core thought this lwp was executing, we can only
+	     have pending events in the local queue.  */
+	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
+	    {
+	      if (WIFSTOPPED (*status))
+		signo = target_signal_from_host (WSTOPSIG (*status));
+
+	      /* If not stopped, then the lwp is gone, no use in
+		 resending a signal.  */
+	    }
+	}
+      else
+	{
+	  /* If the core knows the thread is not executing, then we
+	     have the last signal recorded in
+	     thread_info->stop_signal, unless this is inferior_ptid,
+	     in which case, it's in the global stop_signal, due to
+	     context switching.  */
+
+	  if (ptid_equal (lp->ptid, inferior_ptid))
+	    signo = stop_signal;
+	  else
+	    {
+	      struct thread_info *tp = find_thread_pid (lp->ptid);
+	      gdb_assert (tp);
+	      signo = tp->stop_signal;
+	    }
+	}
+
+      if (signo != TARGET_SIGNAL_0
+	  && !signal_pass_state (signo))
+	{
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog, "\
+GPT: lwp %s had signal %s, but it is in no pass state\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
+      else
+	{
+	  if (signo != TARGET_SIGNAL_0)
+	    *status = W_STOPCODE (target_signal_to_host (signo));
+
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"GPT: lwp %s has pending signal %s\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
     }
-  else if (target_can_async_p ())
-    queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
   else
-    *status = lp->status;
+    {
+      if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+	{
+	  if (stop_signal != TARGET_SIGNAL_0
+	      && signal_pass_state (stop_signal))
+	    *status = W_STOPCODE (target_signal_to_host (stop_signal));
+	}
+      else if (target_can_async_p ())
+	queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
+      else
+	*status = lp->status;
+    }
 
   return 0;
 }
@@ -1379,6 +1448,13 @@ linux_nat_detach (char *args, int from_t
   if (target_can_async_p ())
     linux_nat_async (NULL, 0);
 
+  /* Stop all threads before detaching.  ptrace requires that the
+     thread is stopped to successfully detach.  */
+  iterate_over_lwps (stop_callback, NULL);
+  /* ... and wait until all of them have reported back that
+     they're no longer running.  */
+  iterate_over_lwps (stop_wait_callback, NULL);
+
   iterate_over_lwps (detach_callback, NULL);
 
   /* Only the initial process should be left right now.  */
@@ -1445,6 +1521,13 @@ resume_set_callback (struct lwp_info *lp
   return 0;
 }
 
+static int
+sigint_clear_callback (struct lwp_info *lp, void *data)
+{
+  lp->sigint = 0;
+  return 0;
+}
+
 static void
 linux_nat_resume (ptid_t ptid, int step, enum target_signal signo)
 {
@@ -1468,10 +1551,17 @@ linux_nat_resume (ptid_t ptid, int step,
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);
 
-  if (resume_all)
-    iterate_over_lwps (resume_set_callback, NULL);
-  else
-    iterate_over_lwps (resume_clear_callback, NULL);
+  if (non_stop && resume_all)
+    internal_error (__FILE__, __LINE__,
+		    "can't resume all in non-stop mode");
+
+  if (!non_stop)
+    {
+      if (resume_all)
+	iterate_over_lwps (resume_set_callback, NULL);
+      else
+	iterate_over_lwps (resume_clear_callback, NULL);
+    }
 
   /* If PID is -1, it's the current inferior that should be
      handled specially.  */
@@ -1481,6 +1571,7 @@ linux_nat_resume (ptid_t ptid, int step,
   lp = find_lwp_pid (ptid);
   gdb_assert (lp != NULL);
 
+  /* Convert to something the lower layer understands.  */
   ptid = pid_to_ptid (GET_LWP (lp->ptid));
 
   /* Remember if we're stepping.  */
@@ -1489,6 +1580,9 @@ linux_nat_resume (ptid_t ptid, int step,
   /* Mark this LWP as resumed.  */
   lp->resumed = 1;
 
+  /* Remove the SIGINT mark.  Used in non-stop mode.  */
+  lp->sigint = 0;
+
   /* If we have a pending wait status for this thread, there is no
      point in resuming the process.  But first make sure that
      linux_nat_wait won't preemptively handle the event - we
@@ -1631,6 +1725,8 @@ linux_handle_extended_wait (struct lwp_i
 	ourstatus->kind = TARGET_WAITKIND_VFORKED;
       else
 	{
+	  struct cleanup *old_chain;
+
 	  ourstatus->kind = TARGET_WAITKIND_IGNORE;
 	  new_lp = add_lwp (BUILD_LWP (new_pid, GET_PID (inferior_ptid)));
 	  new_lp->cloned = 1;
@@ -1650,20 +1746,43 @@ linux_handle_extended_wait (struct lwp_i
 	  else
 	    status = 0;
 
+	  /* Make thread_db aware of this thread.  We do this
+	     early, so in non-stop mode, threads show up as they're
+	     created, instead of on next stop.  thread_db needs a
+	     stopped inferior_ptid --- since we know LP is stopped,
+	     use it this time.  */
+	  old_chain = save_inferior_ptid ();
+	  inferior_ptid = lp->ptid;
+	  lp->stopped = 1;
+	  target_find_new_threads ();
+	  do_cleanups (old_chain);
+	  if (!in_thread_list (new_lp->ptid))
+	    {
+	      /* We're not using thread_db.  Attach and add it to
+		 GDB's list.  */
+	      lin_lwp_attach_lwp (new_lp->ptid);
+	      target_post_attach (GET_LWP (new_lp->ptid));
+	      add_thread (new_lp->ptid);
+	    }
+
 	  if (stopping)
 	    new_lp->stopped = 1;
 	  else
 	    {
+ 	      new_lp->stopped = 0;
 	      new_lp->resumed = 1;
 	      ptrace (PTRACE_CONT,
 		      PIDGET (lp->waitstatus.value.related_pid), 0,
 		      status ? WSTOPSIG (status) : 0);
+	      set_running (new_lp->ptid, 1);
+	      set_executing (new_lp->ptid, 1);
 	    }
 
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog,
 				"LHEW: Got clone event from LWP %ld, resuming\n",
 				GET_LWP (lp->ptid));
+	  lp->stopped = 0;
 	  ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
 
 	  return 1;
@@ -2383,13 +2502,7 @@ linux_nat_filter_event (int lwpid, int s
 	 not the end of the debugged application and should be
 	 ignored.  */
       if (num_lwps > 0)
-	{
-	  /* Make sure there is at least one thread running.  */
-	  gdb_assert (iterate_over_lwps (running_callback, NULL));
-
-	  /* Discard the event.  */
-	  return NULL;
-	}
+	return NULL;
     }
 
   /* Check if the current LWP has previously exited.  In the nptl
@@ -2519,6 +2632,8 @@ linux_nat_wait (ptid_t ptid, struct targ
       lp->resumed = 1;
       /* Add the main thread to GDB's thread list.  */
       add_thread_silent (lp->ptid);
+      set_running (lp->ptid, 1);
+      set_executing (lp->ptid, 1);
     }
 
   sigemptyset (&flush_mask);
@@ -2747,19 +2862,23 @@ retry:
     fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
 			status_to_str (status), target_pid_to_str (lp->ptid));
 
-  /* Now stop all other LWP's ...  */
-  iterate_over_lwps (stop_callback, NULL);
+  if (!non_stop)
+    {
+      /* Now stop all other LWP's ...  */
+      iterate_over_lwps (stop_callback, NULL);
 
-  /* ... and wait until all of them have reported back that they're no
-     longer running.  */
-  iterate_over_lwps (stop_wait_callback, &flush_mask);
-  iterate_over_lwps (flush_callback, &flush_mask);
-
-  /* If we're not waiting for a specific LWP, choose an event LWP from
-     among those that have had events.  Giving equal priority to all
-     LWPs that have had events helps prevent starvation.  */
-  if (pid == -1)
-    select_event_lwp (&lp, &status);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, &flush_mask);
+      iterate_over_lwps (flush_callback, &flush_mask);
+
+      /* If we're not waiting for a specific LWP, choose an event LWP
+	 from among those that have had events.  Giving equal priority
+	 to all LWPs that have had events helps prevent
+	 starvation.  */
+      if (pid == -1)
+	select_event_lwp (&lp, &status);
+    }
 
   /* Now that we've selected our final event LWP, cancel any
      breakpoints in other LWPs that have hit a GDB breakpoint.  See
@@ -2796,13 +2915,26 @@ static int
 kill_callback (struct lwp_info *lp, void *data)
 {
   errno = 0;
-  ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
-  if (debug_linux_nat)
-    fprintf_unfiltered (gdb_stdlog,
-			"KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
-			target_pid_to_str (lp->ptid),
-			errno ? safe_strerror (errno) : "OK");
 
+  /* PTRACE_KILL doesn't work when the thread is running.  */
+  if (!lp->stopped)
+    {
+      kill_lwp (GET_LWP (lp->ptid), SIGKILL);
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "KC:  kill_lwp (SIGKILL) %s (%s)\n",
+			    target_pid_to_str (lp->ptid),
+			    errno ? safe_strerror (errno) : "OK");
+    }
+  else
+    {
+      ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
+			    target_pid_to_str (lp->ptid),
+			    errno ? safe_strerror (errno) : "OK");
+    }
   return 0;
 }
 
@@ -2943,22 +3075,22 @@ linux_nat_xfer_partial (struct target_op
 static int
 linux_nat_thread_alive (ptid_t ptid)
 {
+  int err;
+
   gdb_assert (is_lwp (ptid));
 
-  errno = 0;
-  ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0);
+  /* Send signal 0 instead of anything ptrace, because ptracing a
+     running thread errors out claiming that the thread doesn't
+     exist.  */
+  err = kill_lwp (GET_LWP (ptid), 0);
+
   if (debug_linux_nat)
     fprintf_unfiltered (gdb_stdlog,
-			"LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n",
+			"LLTA: KILL(SIG0) %s (%s)\n",
 			target_pid_to_str (ptid),
-			errno ? safe_strerror (errno) : "OK");
+			err ? safe_strerror (errno) : "OK");
 
-  /* Not every Linux kernel implements PTRACE_PEEKUSER.  But we can
-     handle that case gracefully since ptrace will first do a lookup
-     for the process based upon the passed-in pid.  If that fails we
-     will get either -ESRCH or -EPERM, otherwise the child exists and
-     is alive.  */
-  if (errno == ESRCH || errno == EPERM)
+  if (err != 0)
     return 0;
 
   return 1;
@@ -4140,6 +4272,33 @@ linux_nat_set_async_mode (int on)
   linux_nat_async_enabled = on;
 }
 
+static int
+send_sigint_callback (struct lwp_info *lp, void *data)
+{
+  /* Use is_stopped instead of lp->stopped, because the lwp may be
+     stopped due to an internal event, and we want to interrupt it in
+     that case too.  What we want is to check if the thread is stopped
+     from the point of view of the user.  */
+  if (!is_stopped (lp->ptid) && !lp->sigint)
+    {
+      kill_lwp (GET_LWP (lp->ptid), SIGINT);
+      lp->sigint = 1;
+    }
+  return 0;
+}
+
+static void
+linux_nat_stop_ptid (ptid_t ptid)
+{
+  if (ptid_equal (ptid, minus_one_ptid))
+    iterate_over_lwps (send_sigint_callback, &ptid);
+  else
+    {
+      struct lwp_info *lp = find_lwp_pid (ptid);
+      send_sigint_callback (lp, NULL);
+    }
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4170,6 +4329,9 @@ linux_nat_add_target (struct target_ops 
   t->to_terminal_inferior = linux_nat_terminal_inferior;
   t->to_terminal_ours = linux_nat_terminal_ours;
 
+  /* Methods for non-stop support.  */
+  t->to_stop_ptid = linux_nat_stop_ptid;
+
   /* We don't change the stratum; this target will sit at
      process_stratum and thread_db will set at thread_stratum.  This
      is a little strange, since this is a multi-threaded-capable
Index: src/gdb/linux-nat.h
===================================================================
--- src.orig/gdb/linux-nat.h	2008-06-15 20:25:42.000000000 +0100
+++ src/gdb/linux-nat.h	2008-06-15 20:27:15.000000000 +0100
@@ -37,6 +37,10 @@ struct lwp_info
      SIGCHLD.  */
   int cloned;
 
+  /* Non-zero if we sent this LWP a SIGINT (but the LWP didn't report
+     it back yet).  */
+  int sigint;
+
   /* Non-zero if we sent this LWP a SIGSTOP (but the LWP didn't report
      it back yet).  */
   int signalled;
@@ -88,6 +92,8 @@ extern struct lwp_info *lwp_list;
 #define is_lwp(ptid)		(GET_LWP (ptid) != 0)
 #define BUILD_LWP(lwp, pid)	ptid_build (pid, lwp, 0)
 
+struct lwp_info *find_lwp_pid (ptid_t ptid);
+
 /* Attempt to initialize libthread_db.  */
 void check_for_thread_db (void);
 
Index: src/gdb/linux-thread-db.c
===================================================================
--- src.orig/gdb/linux-thread-db.c	2008-06-15 20:25:41.000000000 +0100
+++ src/gdb/linux-thread-db.c	2008-06-15 20:27:15.000000000 +0100
@@ -308,6 +308,8 @@ thread_from_lwp (ptid_t ptid)
      LWP.  */
   gdb_assert (GET_LWP (ptid) != 0);
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
   if (err != TD_OK)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
@@ -418,6 +420,9 @@ enable_thread_event (td_thragent_t *thre
   td_notify_t notify;
   td_err_e err;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
+
   /* Get the breakpoint address for thread EVENT.  */
   err = td_ta_event_addr_p (thread_agent, event, &notify);
   if (err != TD_OK)
@@ -761,6 +766,9 @@ check_event (ptid_t ptid)
   if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr)
     return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
   /* If we are at a create breakpoint, we do not know what new lwp
      was created and cannot specifically locate the event message for it.
      We have to call td_ta_event_getmsg() to get
@@ -955,7 +963,14 @@ static void
 thread_db_find_new_threads (void)
 {
   td_err_e err;
+  struct lwp_info *lp = find_lwp_pid (inferior_ptid);
+
+  if (!lp || !lp->stopped)
+    /* In linux, we can only read memory through a stopped lwp.  */
+    return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
   /* Iterate over all user-space threads to discover new threads.  */
   err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL,
 			  TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY,

* Re: [non-stop] 08/10 linux native support
  2008-06-15 21:10 [non-stop] 08/10 linux native support Pedro Alves
@ 2008-06-25 21:17 ` Daniel Jacobowitz
  2008-06-25 22:03   ` Pedro Alves
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-06-25 21:17 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Sun, Jun 15, 2008 at 10:05:49PM +0100, Pedro Alves wrote:
> @@ -920,7 +929,7 @@ delete_lwp (ptid_t ptid)
>  /* Return a pointer to the structure describing the LWP corresponding
>     to PID.  If no corresponding LWP could be found, return NULL.  */
>  
> -static struct lwp_info *
> +struct lwp_info *
>  find_lwp_pid (ptid_t ptid)
>  {
>    struct lwp_info *lp;

If you need this function global, please rename it first.

> @@ -1306,16 +1315,76 @@ get_pending_status (struct lwp_info *lp,
>       events are always cached in waitpid_queue.  */
>  
>    *status = 0;
> -  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
> +
> +  if (non_stop)
>      {
> -      if (stop_signal != TARGET_SIGNAL_0
> -	  && signal_pass_state (stop_signal))
> -	*status = W_STOPCODE (target_signal_to_host (stop_signal));
> +      enum target_signal signo = TARGET_SIGNAL_0;
> +
> +      if (is_executing (lp->ptid))
> +	{
> +	  /* If the core thought this lwp was executing, we can only
> +	     have pending events in the local queue.  */
> +	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
> +	    {
> +	      if (WIFSTOPPED (*status))
> +		signo = target_signal_from_host (WSTOPSIG (*status));
> +
> +	      /* If not stopped, then the lwp is gone, no use in
> +		 resending a signal.  */
> +	    }

How do we get here if the core thinks the thread is executing?  Is it
when linux-nat.c resumes the thread without telling the core it
stopped?  A little more detail here would be helpful.

> +      else
> +	{
> +	  /* If the core knows the thread is not executing, then we
> +	     have the last signal recorded in
> +	     thread_info->stop_signal, unless this is inferior_ptid,
> +	     in which case, it's in the global stop_signal, due to
> +	     context switching.  */

I wish we could keep this stuff in the thread struct all the time...

> @@ -1489,6 +1580,9 @@ linux_nat_resume (ptid_t ptid, int step,
>    /* Mark this LWP as resumed.  */
>    lp->resumed = 1;
>  
> +  /* Remove the SIGINT mark.  Used in non-stop mode.  */
> +  lp->sigint = 0;
> +

Confused.  Why does resuming the thread affect whether we have sent it
a SIGINT, but not received it back yet?

> @@ -1650,20 +1746,43 @@ linux_handle_extended_wait (struct lwp_i
>  	  else
>  	    status = 0;
>  
> +	  /* Make thread_db aware of this thread.  We do this
> +	     early, so in non-stop mode, threads show up as they're
> +	     created, instead of on next stop.  thread_db needs a
> +	     stopped inferior_ptid --- since we know LP is stopped,
> +	     use it this time.  */
> +	  old_chain = save_inferior_ptid ();
> +	  inferior_ptid = lp->ptid;
> +	  lp->stopped = 1;
> +	  target_find_new_threads ();
> +	  do_cleanups (old_chain);
> +	  if (!in_thread_list (new_lp->ptid))
> +	    {
> +	      /* We're not using thread_db.  Attach and add it to
> +		 GDB's list.  */
> +	      lin_lwp_attach_lwp (new_lp->ptid);
> +	      target_post_attach (GET_LWP (new_lp->ptid));
> +	      add_thread (new_lp->ptid);
> +	    }
> +

This may be trouble.  Sometimes the thread state is not
atomically updated, so peeking at it right after creation but before
an event can fail.

Why is it necessary?  We already know the ptid since we made them
independent of thread_db TID some time ago.  attach_thread should cope
if the thread is already in GDB's thread list when the event
eventually arrives.  So we should be able to just add the new
thread directly.

> @@ -2796,13 +2915,26 @@ static int
>  kill_callback (struct lwp_info *lp, void *data)
>  {
>    errno = 0;
> -  ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> -  if (debug_linux_nat)
> -    fprintf_unfiltered (gdb_stdlog,
> -			"KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
> -			target_pid_to_str (lp->ptid),
> -			errno ? safe_strerror (errno) : "OK");
>  
> +  /* PTRACE_KILL doesn't work when the thread is running.  */
> +  if (!lp->stopped)
> +    {
> +      kill_lwp (GET_LWP (lp->ptid), SIGKILL);
> +      if (debug_linux_nat)
> +	fprintf_unfiltered (gdb_stdlog,
> +			    "KC:  kill_lwp (SIGKILL) %s (%s)\n",
> +			    target_pid_to_str (lp->ptid),
> +			    errno ? safe_strerror (errno) : "OK");
> +    }
> +  else
> +    {
> +      ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> +      if (debug_linux_nat)
> +	fprintf_unfiltered (gdb_stdlog,
> +			    "KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
> +			    target_pid_to_str (lp->ptid),
> +			    errno ? safe_strerror (errno) : "OK");
> +    }
>    return 0;
>  }
>  

SIGKILL should work even if the thread is stopped.
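
For an ordinary job-control stop, this is easy to check outside of
GDB.  Below is a minimal sketch, assuming Linux or another POSIX
system; kill_stopped_child is a hypothetical name.  Note that it
exercises a SIGSTOP stop, not a ptrace-stop, so it does not by itself
settle how a traced, stopped LWP behaves.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that stops itself with SIGSTOP, wait until the stop is
   reported, then send SIGKILL without any SIGCONT.  Return the signal
   that terminated the child, or -1 on any failure.  */
static int
kill_stopped_child (void)
{
  int status;
  pid_t child = fork ();

  if (child < 0)
    return -1;

  if (child == 0)
    {
      raise (SIGSTOP);
      /* Only reached if the child is ever continued.  */
      for (;;)
	pause ();
    }

  /* WUNTRACED makes waitpid report the job-control stop.  */
  if (waitpid (child, &status, WUNTRACED) != child
      || !WIFSTOPPED (status))
    {
      /* Unexpected state: clean up and report failure.  */
      kill (child, SIGKILL);
      waitpid (child, &status, 0);
      return -1;
    }

  /* No SIGCONT here: SIGKILL alone must terminate the stopped child.  */
  if (kill (child, SIGKILL) != 0)
    return -1;
  if (waitpid (child, &status, 0) != child || !WIFSIGNALED (status))
    return -1;

  return WTERMSIG (status);
}
```

The function returns SIGKILL when the stopped child was terminated by
the kill alone.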

-- 
Daniel Jacobowitz
CodeSourcery


* Re: [non-stop] 08/10 linux native support
  2008-06-25 21:17 ` Daniel Jacobowitz
@ 2008-06-25 22:03   ` Pedro Alves
  2008-06-25 22:12     ` Pedro Alves
  2008-06-25 23:08     ` Daniel Jacobowitz
  0 siblings, 2 replies; 20+ messages in thread
From: Pedro Alves @ 2008-06-25 22:03 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

A Wednesday 25 June 2008 21:19:46, Daniel Jacobowitz wrote:
> On Sun, Jun 15, 2008 at 10:05:49PM +0100, Pedro Alves wrote:
> > @@ -920,7 +929,7 @@ delete_lwp (ptid_t ptid)
> >  /* Return a pointer to the structure describing the LWP corresponding
> >     to PID.  If no corresponding LWP could be found, return NULL.  */
> >
> > -static struct lwp_info *
> > +struct lwp_info *
> >  find_lwp_pid (ptid_t ptid)
> >  {
> >    struct lwp_info *lp;
>
> If you need this function global, please rename it first.
>

Ack, will do.

> > @@ -1306,16 +1315,76 @@ get_pending_status (struct lwp_info *lp,
> >       events are always cached in waitpid_queue.  */
> >
> >    *status = 0;
> > -  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
> > +
> > +  if (non_stop)
> >      {
> > -      if (stop_signal != TARGET_SIGNAL_0
> > -	  && signal_pass_state (stop_signal))
> > -	*status = W_STOPCODE (target_signal_to_host (stop_signal));
> > +      enum target_signal signo = TARGET_SIGNAL_0;
> > +
> > +      if (is_executing (lp->ptid))
> > +	{
> > +	  /* If the core thought this lwp was executing, we can only
> > +	     have pending events in the local queue.  */
> > +	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
> > +	    {
> > +	      if (WIFSTOPPED (*status))
> > +		signo = target_signal_from_host (WSTOPSIG (*status));
> > +
> > +	      /* If not stopped, then the lwp is gone, no use in
> > +		 resending a signal.  */
> > +	    }
>
> How do we get here if the core thinks the thread is executing?  Is it
> when linux-nat.c resumes the thread without telling the core it
> stopped?  A little more detail here would be helpful.

Nope, this function is used while detaching, as you know.  Because
PTRACE_DETACH only works on stopped threads, there's a
stop_callback/stop_wait_callback sequence over all threads just
before detaching.  In non-stop mode, for threads that *were* running
when linux_nat_detach is reached, the core will still believe they
are executing, as the executing state is managed in
handle_inferior_event.  I'll add more comments there, unless you
think I should do things differently.
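
The decision described above can be sketched as a small standalone
model.  This is only an illustration, not the patch's code: struct
lwp_model, make_stop_status, pass_all_but_sigint and
pending_status_for_detach are hypothetical names, the model returns 0
whenever the signal is not passed (a simplification), and the stop
status encoding assumes the usual Linux wait status layout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <sys/wait.h>

/* Hypothetical, simplified model of one LWP's state at detach time.  */
struct lwp_model
{
  int core_thinks_executing;	/* Core still marks it as executing.  */
  int have_queued_status;	/* A wait status sits in the local queue.  */
  int queued_status;		/* That status, as waitpid returned it.  */
  int recorded_stop_signal;	/* Last stop signal the core saw, or 0.  */
};

/* Build a wait status meaning "stopped by SIGNO", using the encoding
   Linux uses for stop statuses.  */
static int
make_stop_status (int signo)
{
  return (signo << 8) | 0x7f;
}

/* Hypothetical pass policy: pass everything except SIGINT.  */
static int
pass_all_but_sigint (int signo)
{
  return signo != SIGINT;
}

/* Return the wait status to forward to the thread on detach, or 0 if
   there is nothing to forward.  PASS_SIGNAL says whether a signal is
   configured to be passed to the inferior.  */
static int
pending_status_for_detach (const struct lwp_model *lp,
			   int (*pass_signal) (int signo))
{
  int signo = 0;

  if (lp->core_thinks_executing)
    {
      /* The thread was only just stopped for the detach, so any
	 pending event can only be in the local queue.  */
      if (lp->have_queued_status && WIFSTOPPED (lp->queued_status))
	signo = WSTOPSIG (lp->queued_status);
    }
  else
    /* The core already saw the stop and recorded its signal.  */
    signo = lp->recorded_stop_signal;

  if (signo == 0 || !pass_signal (signo))
    return 0;
  return make_stop_status (signo);
}
```

A thread the core believes is executing, with a queued SIGUSR1 stop,
yields a stop status carrying SIGUSR1; a thread the core saw stop with
SIGINT yields 0 under this policy.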

>
> > +      else
> > +	{
> > +	  /* If the core knows the thread is not executing, then we
> > +	     have the last signal recorded in
> > +	     thread_info->stop_signal, unless this is inferior_ptid,
> > +	     in which case, it's in the global stop_signal, due to
> > +	     context switching.  */
>
> I wish we could keep this stuff in the thread struct all the time...
>


Working on it... That pesky context switching...


> > @@ -1489,6 +1580,9 @@ linux_nat_resume (ptid_t ptid, int step,
> >    /* Mark this LWP as resumed.  */
> >    lp->resumed = 1;
> >
> > +  /* Remove the SIGINT mark.  Used in non-stop mode.  */
> > +  lp->sigint = 0;
> > +
>
> Confused.  Why does resuming the thread affect whether we have sent it
> a SIGINT, but not received it back yet?
>

Hmm, I was under the impression that it was possible to push more
than one SIGINT into a thread's signal queue, but I just tried it, and
it doesn't seem like it is.  This check was meant to prevent that
happening.

> > @@ -1650,20 +1746,43 @@ linux_handle_extended_wait (struct lwp_i
> >  	  else
> >  	    status = 0;
> >
> > +	  /* Make thread_db aware of this thread.  We do this
> > +	     early, so in non-stop mode, threads show up as they're
> > +	     created, instead of on next stop.  thread_db needs a
> > +	     stopped inferior_ptid --- since we know LP is stopped,
> > +	     use it this time.  */
> > +	  old_chain = save_inferior_ptid ();
> > +	  inferior_ptid = lp->ptid;
> > +	  lp->stopped = 1;
> > +	  target_find_new_threads ();
> > +	  do_cleanups (old_chain);
> > +	  if (!in_thread_list (new_lp->ptid))
> > +	    {
> > +	      /* We're not using thread_db.  Attach and add it to
> > +		 GDB's list.  */
> > +	      lin_lwp_attach_lwp (new_lp->ptid);
> > +	      target_post_attach (GET_LWP (new_lp->ptid));
> > +	      add_thread (new_lp->ptid);
> > +	    }
> > +
>
> This may be trouble.  Sometimes the thread state is not
> atomically updated, so peeking at it right after creation but before
> an event can fail.
>

Oh, that's not nice.  Is this something that's worth fixing, and/or
possible to fix, in libthread_db?

> Why is it necessary?  We already know the ptid since we made them
> independent of thread_db TID some time ago.  attach_thread should cope
> if the thread is already in GDB's thread list when the event
> eventually arrives.  So we should be able to just add the new
> thread directly.

That's right, the only thing we'll miss if we do that, is the
thread_db id of the thread in output like:

[New Thread 0xf7e11b90 (LWP 26100)]
             ^^^^^^^^
And info threads:

  2 Thread 0xf7e11b90 (LWP 26100)  (running)
             ^^^^^^^^

Those will only show up on the next stop event (of any thread).
It may take a while if all threads are running (unless we do the
momentarily-stop-threads trick).

Having a TARGET_WAITKIND_NEW_THREAD, so we could pass the event
to the thread-db layer (and do the immediate resume either there
or in handle_inferior_event), would also get rid of the
target_find_new_threads call, but it then has the
same race issue...

>
> > @@ -2796,13 +2915,26 @@ static int
> >  kill_callback (struct lwp_info *lp, void *data)
> >  {
> >    errno = 0;
> > -  ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> > -  if (debug_linux_nat)
> > -    fprintf_unfiltered (gdb_stdlog,
> > -			"KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
> > -			target_pid_to_str (lp->ptid),
> > -			errno ? safe_strerror (errno) : "OK");
> >
> > +  /* PTRACE_KILL doesn't work when the thread is running.  */
> > +  if (!lp->stopped)
> > +    {
> > +      kill_lwp (GET_LWP (lp->ptid), SIGKILL);
> > +      if (debug_linux_nat)
> > +	fprintf_unfiltered (gdb_stdlog,
> > +			    "KC:  kill_lwp (SIGKILL) %s (%s)\n",
> > +			    target_pid_to_str (lp->ptid),
> > +			    errno ? safe_strerror (errno) : "OK");
> > +    }
> > +  else
> > +    {
> > +      ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0);
> > +      if (debug_linux_nat)
> > +	fprintf_unfiltered (gdb_stdlog,
> > +			    "KC:  PTRACE_KILL %s, 0, 0 (%s)\n",
> > +			    target_pid_to_str (lp->ptid),
> > +			    errno ? safe_strerror (errno) : "OK");
> > +    }
> >    return 0;
> >  }
>

> SIGKILL should work even if the thread is stopped.

I think I'll need a SIGCONT as well in that case.  For some
reason, I wasn't getting that to work all the time.  I'll
experiment some more.

As always, thanks much for the review.

-- 
Pedro Alves


* Re: [non-stop] 08/10 linux native support
  2008-06-25 22:03   ` Pedro Alves
@ 2008-06-25 22:12     ` Pedro Alves
  2008-06-25 22:52       ` Daniel Jacobowitz
  2008-06-25 23:08     ` Daniel Jacobowitz
  1 sibling, 1 reply; 20+ messages in thread
From: Pedro Alves @ 2008-06-25 22:12 UTC (permalink / raw)
  To: gdb-patches; +Cc: Daniel Jacobowitz

A Wednesday 25 June 2008 22:17:25, Pedro Alves wrote:
> A Wednesday 25 June 2008 21:19:46, Daniel Jacobowitz wrote:

> > > @@ -1489,6 +1580,9 @@ linux_nat_resume (ptid_t ptid, int step,
> > >    /* Mark this LWP as resumed.  */
> > >    lp->resumed = 1;
> > >
> > > +  /* Remove the SIGINT mark.  Used in non-stop mode.  */
> > > +  lp->sigint = 0;
> > > +
> >
> > Confused.  Why does resuming the thread affect whether we have sent it
> > a SIGINT, but not received it back yet?
>
> Hmm, I was under the impression that it was possible to push more
> than one SIGINT into a thread's signal queue, but I just tried it, and
> it doesn't seem like it is.  This check was meant to prevent that
> happening.

I'm confused.  It does seem I can put more than one SIGINT in the
queue sometimes after all.  (I just changed the code to do two kills
in a row instead of one).  If so, the check is needed to prevent the
race where the thread hasn't reported the stop due to the SIGINT
yet, so is_stopped is still false, and the user is doing "interrupt"
on it (/me imagines user clicking a bunch of times on the IDE button).

The clearing on resume was just a safe place to always clear it.

-- 
Pedro Alves



* Re: [non-stop] 08/10 linux native support
  2008-06-25 22:12     ` Pedro Alves
@ 2008-06-25 22:52       ` Daniel Jacobowitz
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-06-25 22:52 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, Jun 25, 2008 at 10:23:18PM +0100, Pedro Alves wrote:
> On Wednesday 25 June 2008 22:17:25, Pedro Alves wrote:
> > Hmm, I was under the impression that it was possible to push more
> > than one SIGINT into a thread's signal queue, but I just tried it, and
> > it doesn't seem like it is.  This check was meant to prevent that
> > happening.
> 
> I'm confused.  It does seem I can put more than one SIGINT in the
> queue sometimes after all.  (I just changed the code to do two kills
> in a row instead of one).  If so, the check is needed to prevent the
> race where the thread hasn't reported the stop due to the SIGINT
> yet, so is_stopped is still false, and the user is doing "interrupt"
> on it (/me imagines user clicking a bunch of times on the IDE button).
> 
> The clearing on resume was just a safe place to always clear it.

Realtime signals can stack in the queue.  Non-realtime signals, like
SIGINT, cannot.  Your first SIGINT is probably already delivered -
even though GDB hasn't waited for the program yet, this still counts
as dequeuing the signal.  Try three, and they won't stack.

The SIGINT is not necessarily the next signal to be received.  We
might get a SIGTRAP first, or anything else.  So the resume has
nothing to do with whether there's an "in-flight" SIGINT.  The same
complications as for SIGSTOP apply.

-- 
Daniel Jacobowitz
CodeSourcery



* Re: [non-stop] 08/10 linux native support
  2008-06-25 22:03   ` Pedro Alves
  2008-06-25 22:12     ` Pedro Alves
@ 2008-06-25 23:08     ` Daniel Jacobowitz
  2008-07-02  3:35       ` Pedro Alves
  1 sibling, 1 reply; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-06-25 23:08 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, Jun 25, 2008 at 10:17:25PM +0100, Pedro Alves wrote:
> > This may be trouble.  Sometimes the thread state is not
> > atomically updated, so peeking at it right after creation but before
> > an event can fail.
> >
> 
> Oh, that's not nice.  Is this something that's worth and/or possible
> to fix in libthread_db?

I don't remember whether it's fixed in current libthread_db, or else
impossible to fix due to the kernel interfaces involved.  There's
tension between having the thread on the list early enough and having
its entry be correct.  I know I wrote a related kernel patch, which
was never merged.  libthread_db is better about this than it used to
be though.

> > Why is it necessary?  We already know the ptid since we made them
> > independent of thread_db TID some time ago.  attach_thread should cope
> > if the thread is already in GDB's thread list when the event
> > eventually arrives.  So we should be able to just add the new
> > thread directly.
> 
> That's right, the only thing we'll miss if we do that, is the
> thread_db id of the thread in output like:
> 
> [New Thread 0xf7e11b90 (LWP 26100)]
>              ^^^^^^^^
> And info threads:
> 
>   2 Thread 0xf7e11b90 (LWP 26100)  (running)
>              ^^^^^^^^
> 
> Those will only show up on the next stop event (of any thread).
> It may take a while, if all threads are running (unless we do
> the momentarily-stop-threads trick).

Oh, dear.  Options:

  - delay the notification until thread_db discovers the thread,
    if libthread_db is already active

  - display the notification without the thread ID; we'll have the
    LWP ID and we could add the GDB thread number

  - go with your code and fix broken situations as they arise

I'm undecided.  Note that your code is unnecessarily quadratic, by the
way.  It'll walk the entire thread list; we could just load the new
thread since we know its LWP ID.  libthread_db may still do a walk in
that case though...

> > SIGKILL should work even if the thread is stopped.
> 
> I think I'll need a SIGCONT as well in that case.  For some
> reason, I wasn't getting that to work all the time.  I'll
> experiment some more.

Kernels may vary in this regard.  Your code seems reasonable.
PTRACE_KILL is supposed to be just SIGKILL + PTRACE_CONT, and SIGKILL
is supposed to work even on stopped processes, but the details come
and go... as you know, signal handling is a very touchy area and hard
to write tests for.
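
For the plain (untraced) case, the claim that SIGKILL works on a
stopped process can be checked with a small program.  This is a
sketch under assumptions, not a statement about ptrace-stopped LWPs:
kill_stopped_child is a hypothetical helper, and the child is stopped
with SIGSTOP rather than by a tracer, so it says nothing about the
kernels that need a PTRACE_CONT.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, stop it with SIGSTOP, then SIGKILL it while it is
   stopped.  Return the signal that terminated the child, or -1 on
   failure.  */
static int
kill_stopped_child (void)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Child: idle until a signal terminates us.  */
      for (;;)
        pause ();
    }

  /* Stop the child and wait until the stop is reported.  */
  kill (pid, SIGSTOP);
  if (waitpid (pid, &status, WUNTRACED) != pid)
    return -1;
  assert (WIFSTOPPED (status));

  /* No SIGCONT is sent: SIGKILL alone should terminate the child.  */
  kill (pid, SIGKILL);
  if (waitpid (pid, &status, 0) != pid)
    return -1;

  return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

If the second waitpid reports termination by SIGKILL, no SIGCONT was
needed in the untraced case.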

-- 
Daniel Jacobowitz
CodeSourcery


* Re: [non-stop] 08/10 linux native support
  2008-06-25 23:08     ` Daniel Jacobowitz
@ 2008-07-02  3:35       ` Pedro Alves
  2008-07-07 18:20         ` Daniel Jacobowitz
  0 siblings, 1 reply; 20+ messages in thread
From: Pedro Alves @ 2008-07-02  3:35 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2663 bytes --]

On Wednesday 25 June 2008 23:12:20, Daniel Jacobowitz wrote:

> > That's right, the only thing we'll miss if we do that, is the
> > thread_db id of the thread in output like:
> >
> > [New Thread 0xf7e11b90 (LWP 26100)]
> >              ^^^^^^^^
> > And info threads:
> >
> >   2 Thread 0xf7e11b90 (LWP 26100)  (running)
> >              ^^^^^^^^
> >
> > Those will only show up on the next stop event (of any thread).
> > It may take a while, if all threads are running (unless we do
> > the momentarily-stop-threads trick).
>
> Oh, dear.  Options:
>
>   - delay the notification until thread_db discovers the thread,
>     if libthread_db is already active
>
>   - display the notification without the thread ID; we'll have the
>     LWP ID and we could add the GDB thread number
>
>   - go with your code and fix broken situations as they arise
>
> I'm undecided.  Note that your code is unnecessarily quadratic, by the
> way.  It'll walk the entire thread list; we could just load the new
> thread since we know its LWP ID.  libthread_db may still do a walk in
> that case though...

I'd prefer the third, if it's still on the table.
I've tried not calling into target_find_new_threads, but wasn't
convinced that the saving justifies the possible ugliness we
introduce.  The walk is most surely still performed to map an
LWP to a user thread.

If it is justifiable, what is the preferred interface to call
into thread_db in this case?  A new target method?  The double
call I'm making into thread_db is because this may be the
earliest that we can get thread_db info on the main thread (the
parent of the first clone).  I couldn't find another place to
put it that didn't end up introducing more walks.

>
> > > SIGKILL should work even if the thread is stopped.
> >
> > I think I'll need a SIGCONT as well in that case.  For some
> > reason, I wasn't getting that to work all the time.  I'll
> > experiment some more.
>
> Kernels may vary in this regard.  Your code seems reasonable.
> PTRACE_KILL is supposed to be just SIGKILL + PTRACE_CONT, and SIGKILL
> is supposed to work even on stopped processes, but the details come
> and go... as you know, signal handling is a very touchy area and hard
> to write tests for.

I had second thoughts on the code.  It's racy.  If a running thread
happens to stop (and GDB doesn't know about it yet) just before
sending SIGKILL, I'd miss sending PTRACE_CONT in kernels that need
it.  I think it's better to just stop all threads before killing
them, and then kill them like we used to.

Obviously, the patch still needs the new thread management part
to be resolved, but otherwise, it should be done.

-- 
Pedro Alves

[-- Attachment #2: 008-non_stop_linux.diff --]
[-- Type: text/x-diff, Size: 20672 bytes --]

2008-07-02  Pedro Alves  <pedro@codesourcery.com>

	Non-stop linux native.

	* linux-fork.c (linux_fork_killall): Use SIGKILL instead of
	PTRACE_KILL.

	* linux-nat.c (linux_test_for_tracefork): Block events while we're
	here.
	(find_lwp_pid): Rename to...
	(linux_nat_find_lwp_pid): ... this.  Make public.  Update all
	callers.
	(get_pending_status): Implement non-stop mode.
	(linux_nat_detach): Stop threads before detaching.
	(linux_nat_resume): In non-stop mode, always resume only a single
	PTID.
	(linux_handle_extended_wait): On a clone event, add new lwp to
	GDB's thread table, and mark as running, executing and stopped
	appropriately.
	(linux_nat_filter_event): Don't assume there are other running
	threads when a thread exits.
	(linux_nat_wait): Mark the main thread as running and executing.
	In non-stop mode, don't stop all lwps.
	(linux_nat_kill): Stop lwps before killing them.
	(linux_nat_thread_alive): Use signal 0 to detect if a thread is
	alive.
	(send_sigint_callback): New.
	(linux_nat_stop): New.
	(linux_nat_add_target): Set to_stop to linux_nat_stop.

	* linux-nat.h (thread_db_attach_lwp): Declare.
	(linux_nat_find_lwp_pid): Declare.

	* linux-thread-db.c (thread_from_lwp, enable_thread_event)
	(check_event): Set proc_handle.pid to the stopped lwp.
	(thread_db_attach_lwp): New.
	(thread_db_find_new_threads): If current lwp is executing, don't
	try to read from it.

---
 gdb/linux-fork.c      |    4 
 gdb/linux-nat.c       |  258 ++++++++++++++++++++++++++++++++++++++++----------
 gdb/linux-nat.h       |    4 
 gdb/linux-thread-db.c |   50 +++++++++
 4 files changed, 268 insertions(+), 48 deletions(-)

Index: src/gdb/linux-fork.c
===================================================================
--- src.orig/gdb/linux-fork.c	2008-07-01 16:31:23.000000000 +0100
+++ src/gdb/linux-fork.c	2008-07-01 16:31:31.000000000 +0100
@@ -337,7 +337,9 @@ linux_fork_killall (void)
     {
       pid = PIDGET (fp->ptid);
       do {
-	ptrace (PT_KILL, pid, 0, 0);
+	/* Use SIGKILL instead of PTRACE_KILL because the former works even
+	   if the thread is running, while the latter doesn't.  */
+	kill (pid, SIGKILL);
 	ret = waitpid (pid, &status, 0);
 	/* We might get a SIGCHLD instead of an exit status.  This is
 	 aggravated by the first kill above - a child has just
Index: src/gdb/linux-nat.c
===================================================================
--- src.orig/gdb/linux-nat.c	2008-07-01 16:31:29.000000000 +0100
+++ src/gdb/linux-nat.c	2008-07-01 20:18:03.000000000 +0100
@@ -285,6 +285,9 @@ static void linux_nat_async (void (*call
 static int linux_nat_async_mask (int mask);
 static int kill_lwp (int lwpid, int signo);
 
+static int send_sigint_callback (struct lwp_info *lp, void *data);
+static int stop_callback (struct lwp_info *lp, void *data);
+
 /* Captures the result of a successful waitpid call, along with the
    options used in that call.  */
 struct waitpid_result
@@ -487,6 +490,9 @@ linux_test_for_tracefork (int original_p
 {
   int child_pid, ret, status;
   long second_pid;
+  enum sigchld_state async_events_original_state;
+
+  async_events_original_state = linux_nat_async_events (sigchld_sync);
 
   linux_supports_tracefork_flag = 0;
   linux_supports_tracevforkdone_flag = 0;
@@ -517,6 +523,7 @@ linux_test_for_tracefork (int original_p
       if (ret != 0)
 	{
 	  warning (_("linux_test_for_tracefork: failed to kill child"));
+	  linux_nat_async_events (async_events_original_state);
 	  return;
 	}
 
@@ -527,6 +534,7 @@ linux_test_for_tracefork (int original_p
 	warning (_("linux_test_for_tracefork: unexpected wait status 0x%x from "
 		 "killed child"), status);
 
+      linux_nat_async_events (async_events_original_state);
       return;
     }
 
@@ -566,6 +574,8 @@ linux_test_for_tracefork (int original_p
   if (ret != 0)
     warning (_("linux_test_for_tracefork: failed to kill child"));
   my_waitpid (child_pid, &status, 0);
+
+  linux_nat_async_events (async_events_original_state);
 }
 
 /* Return non-zero iff we have tracefork functionality available.
@@ -985,8 +995,8 @@ delete_lwp (ptid_t ptid)
 /* Return a pointer to the structure describing the LWP corresponding
    to PID.  If no corresponding LWP could be found, return NULL.  */
 
-static struct lwp_info *
-find_lwp_pid (ptid_t ptid)
+struct lwp_info *
+linux_nat_find_lwp_pid (ptid_t ptid)
 {
   struct lwp_info *lp;
   int lwp;
@@ -1207,7 +1217,7 @@ lin_lwp_attach_lwp (ptid_t ptid)
 
   async_events_original_state = linux_nat_async_events (sigchld_sync);
 
-  lp = find_lwp_pid (ptid);
+  lp = linux_nat_find_lwp_pid (ptid);
 
   /* We assume that we're already attached to any LWP that has an id
      equal to the overall process id, and to any LWP that is already
@@ -1376,16 +1386,80 @@ get_pending_status (struct lwp_info *lp,
      events are always cached in waitpid_queue.  */
 
   *status = 0;
-  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+
+  if (non_stop)
     {
-      if (stop_signal != TARGET_SIGNAL_0
-	  && signal_pass_state (stop_signal))
-	*status = W_STOPCODE (target_signal_to_host (stop_signal));
+      enum target_signal signo = TARGET_SIGNAL_0;
+
+      if (is_executing (lp->ptid))
+	{
+	  /* If the core thought this lwp was executing --- e.g., the
+	     executing property hasn't been updated yet, but the
+	     thread has been stopped with a stop_callback /
+	     stop_wait_callback sequence (see linux_nat_detach for
+	     example) --- we can only have pending events in the local
+	     queue.  */
+	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
+	    {
+	      if (WIFSTOPPED (*status))
+		signo = target_signal_from_host (WSTOPSIG (*status));
+
+	      /* If not stopped, then the lwp is gone, no use in
+		 resending a signal.  */
+	    }
+	}
+      else
+	{
+	  /* If the core knows the thread is not executing, then we
+	     have the last signal recorded in
+	     thread_info->stop_signal, unless this is inferior_ptid,
+	     in which case, it's in the global stop_signal, due to
+	     context switching.  */
+
+	  if (ptid_equal (lp->ptid, inferior_ptid))
+	    signo = stop_signal;
+	  else
+	    {
+	      struct thread_info *tp = find_thread_pid (lp->ptid);
+	      gdb_assert (tp);
+	      signo = tp->stop_signal;
+	    }
+	}
+
+      if (signo != TARGET_SIGNAL_0
+	  && !signal_pass_state (signo))
+	{
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog, "\
+GPT: lwp %s had signal %s, but it is in no pass state\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
+      else
+	{
+	  if (signo != TARGET_SIGNAL_0)
+	    *status = W_STOPCODE (target_signal_to_host (signo));
+
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"GPT: lwp %s has pending signal %s\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
     }
-  else if (target_can_async_p ())
-    queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
   else
-    *status = lp->status;
+    {
+      if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+	{
+	  if (stop_signal != TARGET_SIGNAL_0
+	      && signal_pass_state (stop_signal))
+	    *status = W_STOPCODE (target_signal_to_host (stop_signal));
+	}
+      else if (target_can_async_p ())
+	queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
+      else
+	*status = lp->status;
+    }
 
   return 0;
 }
@@ -1449,6 +1523,13 @@ linux_nat_detach (char *args, int from_t
   if (target_can_async_p ())
     linux_nat_async (NULL, 0);
 
+  /* Stop all threads before detaching.  ptrace requires that the
+     thread is stopped to successfully detach.  */
+  iterate_over_lwps (stop_callback, NULL);
+  /* ... and wait until all of them have reported back that
+     they're no longer running.  */
+  iterate_over_lwps (stop_wait_callback, NULL);
+
   iterate_over_lwps (detach_callback, NULL);
 
   /* Only the initial process should be left right now.  */
@@ -1538,19 +1619,27 @@ linux_nat_resume (ptid_t ptid, int step,
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);
 
-  if (resume_all)
-    iterate_over_lwps (resume_set_callback, NULL);
-  else
-    iterate_over_lwps (resume_clear_callback, NULL);
+  if (non_stop && resume_all)
+    internal_error (__FILE__, __LINE__,
+		    "can't resume all in non-stop mode");
+
+  if (!non_stop)
+    {
+      if (resume_all)
+	iterate_over_lwps (resume_set_callback, NULL);
+      else
+	iterate_over_lwps (resume_clear_callback, NULL);
+    }
 
   /* If PID is -1, it's the current inferior that should be
      handled specially.  */
   if (PIDGET (ptid) == -1)
     ptid = inferior_ptid;
 
-  lp = find_lwp_pid (ptid);
+  lp = linux_nat_find_lwp_pid (ptid);
   gdb_assert (lp != NULL);
 
+  /* Convert to something the lower layer understands.  */
   ptid = pid_to_ptid (GET_LWP (lp->ptid));
 
   /* Remember if we're stepping.  */
@@ -1701,6 +1790,8 @@ linux_handle_extended_wait (struct lwp_i
 	ourstatus->kind = TARGET_WAITKIND_VFORKED;
       else
 	{
+	  struct cleanup *old_chain;
+
 	  ourstatus->kind = TARGET_WAITKIND_IGNORE;
 	  new_lp = add_lwp (BUILD_LWP (new_pid, GET_PID (inferior_ptid)));
 	  new_lp->cloned = 1;
@@ -1720,20 +1811,54 @@ linux_handle_extended_wait (struct lwp_i
 	  else
 	    status = 0;
 
+#if 0
+	  /* Make thread_db aware of this thread.  We do this
+	     early, so in non-stop mode, threads show up as they're
+	     created, instead of on next stop, and so that they have
+	     the correct running state.  thread_db_find_new_threads
+	     needs a stopped inferior_ptid --- since we know LP is
+	     stopped, use it this time.  */
+	  old_chain = save_inferior_ptid ();
+	  inferior_ptid = lp->ptid;
+	  lp->stopped = 1;
+	  target_find_new_threads ();
+	  do_cleanups (old_chain);
+	  if (!in_thread_list (new_lp->ptid))
+#else
+	  /* "Attach"ing to the parent forces the thread_db target to
+	     build its private data structures for the parent, which
+	     may not have had them set up yet.  */
+	  thread_db_attach_lwp (lp->ptid);
+	  /* Do the same to the child, which, if thread_db is active,
+	     adds the child to GDB's thread list.  */
+	  if (!thread_db_attach_lwp (new_lp->ptid))
+#endif
+	    {
+	      /* We're not using thread_db.  Attach and add it to
+		 GDB's list.  */
+	      lin_lwp_attach_lwp (new_lp->ptid);
+	      target_post_attach (GET_LWP (new_lp->ptid));
+	      add_thread (new_lp->ptid);
+	    }
+
 	  if (stopping)
 	    new_lp->stopped = 1;
 	  else
 	    {
+	      new_lp->stopped = 0;
 	      new_lp->resumed = 1;
 	      ptrace (PTRACE_CONT,
 		      PIDGET (lp->waitstatus.value.related_pid), 0,
 		      status ? WSTOPSIG (status) : 0);
+	      set_running (new_lp->ptid, 1);
+	      set_executing (new_lp->ptid, 1);
 	    }
 
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog,
 				"LHEW: Got clone event from LWP %ld, resuming\n",
 				GET_LWP (lp->ptid));
+	  lp->stopped = 0;
 	  ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
 
 	  return 1;
@@ -2358,7 +2483,7 @@ linux_nat_filter_event (int lwpid, int s
 {
   struct lwp_info *lp;
 
-  lp = find_lwp_pid (pid_to_ptid (lwpid));
+  lp = linux_nat_find_lwp_pid (pid_to_ptid (lwpid));
 
   /* Check for stop events reported by a process we didn't already
      know about - anything not already in our LWP list.
@@ -2453,13 +2578,7 @@ linux_nat_filter_event (int lwpid, int s
 	 not the end of the debugged application and should be
 	 ignored.  */
       if (num_lwps > 0)
-	{
-	  /* Make sure there is at least one thread running.  */
-	  gdb_assert (iterate_over_lwps (running_callback, NULL));
-
-	  /* Discard the event.  */
-	  return NULL;
-	}
+	return NULL;
     }
 
   /* Check if the current LWP has previously exited.  In the nptl
@@ -2589,6 +2708,8 @@ linux_nat_wait (ptid_t ptid, struct targ
       lp->resumed = 1;
       /* Add the main thread to GDB's thread list.  */
       add_thread_silent (lp->ptid);
+      set_running (lp->ptid, 1);
+      set_executing (lp->ptid, 1);
     }
 
   sigemptyset (&flush_mask);
@@ -2635,7 +2756,7 @@ retry:
 			    target_pid_to_str (ptid));
 
       /* We have a specific LWP to check.  */
-      lp = find_lwp_pid (ptid);
+      lp = linux_nat_find_lwp_pid (ptid);
       gdb_assert (lp);
       status = lp->status;
       lp->status = 0;
@@ -2816,19 +2937,23 @@ retry:
     fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
 			status_to_str (status), target_pid_to_str (lp->ptid));
 
-  /* Now stop all other LWP's ...  */
-  iterate_over_lwps (stop_callback, NULL);
+  if (!non_stop)
+    {
+      /* Now stop all other LWP's ...  */
+      iterate_over_lwps (stop_callback, NULL);
 
-  /* ... and wait until all of them have reported back that they're no
-     longer running.  */
-  iterate_over_lwps (stop_wait_callback, &flush_mask);
-  iterate_over_lwps (flush_callback, &flush_mask);
-
-  /* If we're not waiting for a specific LWP, choose an event LWP from
-     among those that have had events.  Giving equal priority to all
-     LWPs that have had events helps prevent starvation.  */
-  if (pid == -1)
-    select_event_lwp (&lp, &status);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, &flush_mask);
+      iterate_over_lwps (flush_callback, &flush_mask);
+
+      /* If we're not waiting for a specific LWP, choose an event LWP
+	 from among those that have had events.  Giving equal priority
+	 to all LWPs that have had events helps prevent
+	 starvation.  */
+      if (pid == -1)
+	select_event_lwp (&lp, &status);
+    }
 
   /* Now that we've selected our final event LWP, cancel any
      breakpoints in other LWPs that have hit a GDB breakpoint.  See
@@ -2960,6 +3085,13 @@ linux_nat_kill (void)
     }
   else
     {
+      /* Stop all threads before killing them, since ptrace requires
+	 that the thread is stopped to successfully PTRACE_KILL.  */
+      iterate_over_lwps (stop_callback, NULL);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, NULL);
+
       /* Kill all LWP's ...  */
       iterate_over_lwps (kill_callback, NULL);
 
@@ -3012,22 +3144,22 @@ linux_nat_xfer_partial (struct target_op
 static int
 linux_nat_thread_alive (ptid_t ptid)
 {
+  int err;
+
   gdb_assert (is_lwp (ptid));
 
-  errno = 0;
-  ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0);
+  /* Send signal 0 instead of anything ptrace, because ptracing a
+     running thread errors out claiming that the thread doesn't
+     exist.  */
+  err = kill_lwp (GET_LWP (ptid), 0);
+
   if (debug_linux_nat)
     fprintf_unfiltered (gdb_stdlog,
-			"LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n",
+			"LLTA: KILL(SIG0) %s (%s)\n",
 			target_pid_to_str (ptid),
-			errno ? safe_strerror (errno) : "OK");
+			err ? safe_strerror (errno) : "OK");
 
-  /* Not every Linux kernel implements PTRACE_PEEKUSER.  But we can
-     handle that case gracefully since ptrace will first do a lookup
-     for the process based upon the passed-in pid.  If that fails we
-     will get either -ESRCH or -EPERM, otherwise the child exists and
-     is alive.  */
-  if (errno == ESRCH || errno == EPERM)
+  if (err != 0)
     return 0;
 
   return 1;
@@ -4229,6 +4361,35 @@ linux_nat_set_async_mode (int on)
   linux_nat_async_enabled = on;
 }
 
+static int
+send_sigint_callback (struct lwp_info *lp, void *data)
+{
+  /* Use is_running instead of !lp->stopped, because the lwp may be
+     stopped due to an internal event, and we want to interrupt it in
+     that case too.  What we want is to check if the thread is stopped
+     from the point of view of the user.  */
+  if (is_running (lp->ptid))
+    kill_lwp (GET_LWP (lp->ptid), SIGINT);
+  return 0;
+}
+
+static void
+linux_nat_stop (ptid_t ptid)
+{
+  if (non_stop)
+    {
+      if (ptid_equal (ptid, minus_one_ptid))
+	iterate_over_lwps (send_sigint_callback, &ptid);
+      else
+	{
+	  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
+	  send_sigint_callback (lp, NULL);
+	}
+    }
+  else
+    linux_ops->to_stop (ptid);
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4259,6 +4420,9 @@ linux_nat_add_target (struct target_ops 
   t->to_terminal_inferior = linux_nat_terminal_inferior;
   t->to_terminal_ours = linux_nat_terminal_ours;
 
+  /* Methods for non-stop support.  */
+  t->to_stop = linux_nat_stop;
+
   /* We don't change the stratum; this target will sit at
      process_stratum and thread_db will set at thread_stratum.  This
      is a little strange, since this is a multi-threaded-capable
@@ -4286,7 +4450,7 @@ linux_nat_set_new_thread (struct target_
 struct siginfo *
 linux_nat_get_siginfo (ptid_t ptid)
 {
-  struct lwp_info *lp = find_lwp_pid (ptid);
+  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
 
   gdb_assert (lp != NULL);
 
Index: src/gdb/linux-nat.h
===================================================================
--- src.orig/gdb/linux-nat.h	2008-07-01 16:31:23.000000000 +0100
+++ src/gdb/linux-nat.h	2008-07-01 16:31:31.000000000 +0100
@@ -94,6 +94,8 @@ void check_for_thread_db (void);
 /* Tell the thread_db layer what native target operations to use.  */
 void thread_db_init (struct target_ops *);
 
+int thread_db_attach_lwp (ptid_t ptid);
+
 /* Find process PID's pending signal set from /proc/pid/status.  */
 void linux_proc_pending_signals (int pid, sigset_t *pending, sigset_t *blocked, sigset_t *ignored);
 
@@ -107,6 +109,8 @@ struct lwp_info *iterate_over_lwps (int 
 						     void *), 
 				    void *data);
 
+struct lwp_info *linux_nat_find_lwp_pid (ptid_t ptid);
+
 /* Create a prototype generic GNU/Linux target.  The client can
    override it with local methods.  */
 struct target_ops * linux_target (void);
Index: src/gdb/linux-thread-db.c
===================================================================
--- src.orig/gdb/linux-thread-db.c	2008-07-01 16:31:23.000000000 +0100
+++ src/gdb/linux-thread-db.c	2008-07-01 20:17:54.000000000 +0100
@@ -308,6 +308,8 @@ thread_from_lwp (ptid_t ptid)
      LWP.  */
   gdb_assert (GET_LWP (ptid) != 0);
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
   if (err != TD_OK)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
@@ -332,6 +334,41 @@ thread_from_lwp (ptid_t ptid)
 }
 \f
 
+/* Attach to lwp PTID, doing whatever else is required to have this
+   LWP under the debugger's control --- e.g., enabling event
+   reporting.  Returns true on success.  */
+int
+thread_db_attach_lwp (ptid_t ptid)
+{
+  td_thrhandle_t th;
+  td_thrinfo_t ti;
+  td_err_e err;
+
+  if (!using_thread_db)
+    return 0;
+
+  /* This ptid comes from linux-nat.c, which should always fill in the
+     LWP.  */
+  gdb_assert (GET_LWP (ptid) != 0);
+
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+  err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
+  if (err != TD_OK)
+    /* Cannot find user-level thread.  */
+    return 0;
+
+  err = td_thr_get_info_p (&th, &ti);
+  if (err != TD_OK)
+    {
+      warning (_("Cannot get thread info: %s"), thread_db_err_str (err));
+      return 0;
+    }
+
+  attach_thread (ptid, &th, &ti);
+  return 1;
+}
+
 void
 thread_db_init (struct target_ops *target)
 {
@@ -418,6 +455,9 @@ enable_thread_event (td_thragent_t *thre
   td_notify_t notify;
   td_err_e err;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
+
   /* Get the breakpoint address for thread EVENT.  */
   err = td_ta_event_addr_p (thread_agent, event, &notify);
   if (err != TD_OK)
@@ -761,6 +801,9 @@ check_event (ptid_t ptid)
   if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr)
     return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
   /* If we are at a create breakpoint, we do not know what new lwp
      was created and cannot specifically locate the event message for it.
      We have to call td_ta_event_getmsg() to get
@@ -955,7 +998,14 @@ static void
 thread_db_find_new_threads (void)
 {
   td_err_e err;
+  struct lwp_info *lp = linux_nat_find_lwp_pid (inferior_ptid);
+
+  if (!lp || !lp->stopped)
+    /* In linux, we can only read memory through a stopped lwp.  */
+    return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
   /* Iterate over all user-space threads to discover new threads.  */
   err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL,
 			  TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY,


* Re: [non-stop] 08/10 linux native support
  2008-07-02  3:35       ` Pedro Alves
@ 2008-07-07 18:20         ` Daniel Jacobowitz
  2008-07-09  3:25           ` Michael Snyder
  2008-07-10 15:28           ` Pedro Alves
  0 siblings, 2 replies; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-07-07 18:20 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, Jul 02, 2008 at 04:34:50AM +0100, Pedro Alves wrote:
> @@ -337,7 +337,9 @@ linux_fork_killall (void)
>      {
>        pid = PIDGET (fp->ptid);
>        do {
> -	ptrace (PT_KILL, pid, 0, 0);
> +	/* Use SIGKILL instead of PTRACE_KILL because the former works even
> +	   if the thread is running, while the latter doesn't.  */
> +	kill (pid, SIGKILL);
>  	ret = waitpid (pid, &status, 0);
>  	/* We might get a SIGCHLD instead of an exit status.  This is
>  	 aggravated by the first kill above - a child has just

This is OK but if anyone wants to make fork support handle
multi-threaded programs someday we may need to expose kill_lwp.

(We could make fork support work; it's checkpoint support that's
terminally stuck, because of lack of Solaris's rfork.)

> @@ -1720,20 +1811,54 @@ linux_handle_extended_wait (struct lwp_i
>  	  else
>  	    status = 0;
>  
> +#if 0
> +	  /* Make thread_db aware of this thread.  We do this
> +	     early, so in non-stop mode, threads show up as they're
> +	     created, instead of on next stop, and so that they have
> +	     the correct running state.  thread_db_find_new_threads
> +	     needs a stopped inferior_ptid --- since we know LP is
> +	     stopped, use it this time.  */
> +	  old_chain = save_inferior_ptid ();
> +	  inferior_ptid = lp->ptid;
> +	  lp->stopped = 1;
> +	  target_find_new_threads ();
> +	  do_cleanups (old_chain);
> +	  if (!in_thread_list (new_lp->ptid))
> +#else
> +	  /* "Attach"ing to the parent forces the thread_db target to
> +	     build its private data structures for the parent, which
> +	     may not have had them set up yet.  */
> +	  thread_db_attach_lwp (lp->ptid);
> +	  /* Do the same to the child, which, if thread_db is active,
> +	     adds the child to GDB's thread list.  */
> +	  if (!thread_db_attach_lwp (new_lp->ptid))
> +#endif

This (the thread_db_attach_lwp version) looks reasonable to me.  Ugly,
but reasonable.  Why do we need the parent's data?

> +	    {
> +	      /* We're not using thread_db.  Attach and add it to
> +		 GDB's list.  */
> +	      lin_lwp_attach_lwp (new_lp->ptid);
> +	      target_post_attach (GET_LWP (new_lp->ptid));
> +	      add_thread (new_lp->ptid);
> +	    }
> +

Why do we need to call lin_lwp_attach_lwp?  Won't that try to
PTRACE_ATTACH?  And we've already called add_lwp above.


-- 
Daniel Jacobowitz
CodeSourcery



* Re: [non-stop] 08/10 linux native support
  2008-07-07 18:20         ` Daniel Jacobowitz
@ 2008-07-09  3:25           ` Michael Snyder
  2008-07-09  3:47             ` Daniel Jacobowitz
  2008-07-09  7:56             ` Mark Kettenis
  2008-07-10 15:28           ` Pedro Alves
  1 sibling, 2 replies; 20+ messages in thread
From: Michael Snyder @ 2008-07-09  3:25 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Pedro Alves, gdb-patches

On Mon, 2008-07-07 at 14:20 -0400, Daniel Jacobowitz wrote:
> On Wed, Jul 02, 2008 at 04:34:50AM +0100, Pedro Alves wrote:
> > @@ -337,7 +337,9 @@ linux_fork_killall (void)
> >      {
> >        pid = PIDGET (fp->ptid);
> >        do {
> > -	ptrace (PT_KILL, pid, 0, 0);
> > +	/* Use SIGKILL instead of PTRACE_KILL because the former works even
> > +	   if the thread is running, while the later doesn't.  */
> > +	kill (pid, SIGKILL);
> >  	ret = waitpid (pid, &status, 0);
> >  	/* We might get a SIGCHLD instead of an exit status.  This is
> >  	 aggravated by the first kill above - a child has just
> 
> This is OK but if anyone wants to make fork support handle
> multi-threaded programs someday we may need to expose kill_lwp.

Fork is undefined in a multi-threaded program.




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [non-stop] 08/10 linux native support
  2008-07-09  3:25           ` Michael Snyder
@ 2008-07-09  3:47             ` Daniel Jacobowitz
  2008-07-09  3:55               ` Michael Snyder
  2008-07-09  7:56             ` Mark Kettenis
  1 sibling, 1 reply; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-07-09  3:47 UTC (permalink / raw)
  To: Michael Snyder; +Cc: Pedro Alves, gdb-patches

On Tue, Jul 08, 2008 at 08:25:36PM -0700, Michael Snyder wrote:
> Fork is undefined in a multi-threaded program.

According to what?  Not POSIX - otherwise there wouldn't be
pthread_atfork.

I think even vfork is valid in multithreaded programs; in any case, it
behaves sensibly with NPTL.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [non-stop] 08/10 linux native support
  2008-07-09  3:47             ` Daniel Jacobowitz
@ 2008-07-09  3:55               ` Michael Snyder
  2008-07-09  7:55                 ` Mark Kettenis
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Snyder @ 2008-07-09  3:55 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Pedro Alves, gdb-patches

On Tue, 2008-07-08 at 23:47 -0400, Daniel Jacobowitz wrote:
> On Tue, Jul 08, 2008 at 08:25:36PM -0700, Michael Snyder wrote:
> > Fork is undefined in a multi-threaded program.
> 
> According to what?  Not POSIX - otherwise there wouldn't be
> pthread_atfork.
> 
> I think even vfork is valid in multithreaded programs; in any case, it
> behaves sensibly with NPTL.

Well, I should perhaps say that fork is *not well* defined in a
multi-threaded program -- at least not defined consistently.  I had to
research this at my last job but one, and it was very difficult to find
anything that definitively says that it is UN-defined... but try to find
POSIX explicitly saying what *should* happen if a multi-threaded program
forks.

Here's IEEE std 1003.1:

A process shall be created with a single thread. If a multi-threaded
process calls fork(), the new process shall contain a replica of the
calling thread and its entire address space, possibly including the
states of mutexes and other resources. Consequently, to avoid errors,
the child process may only execute async-signal-safe operations until
such time as one of the exec functions is called. [THR] [Option Start]
 Fork handlers may be established by means of the pthread_atfork()
function in order to maintain application invariants across fork()
calls. [Option End]

The issue is that only the thread that actually calls fork()
will be duplicated in the child, but the mutexes (which may have
been held by another thread) will be duplicated, and therefore
the child may deadlock.
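
To make the deadlock scenario concrete, here is a minimal sketch of the
pthread_atfork() pattern the standard text alludes to.  The names
(fork_with_atfork_handlers, busy_worker, shared_lock) are made up for
illustration.  Note that locking a mutex in the child is strictly
outside the async-signal-safe set POSIX guarantees there; this relies
on the usual handler idiom and is only expected to behave on
implementations such as glibc/NPTL.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock shared between the forking thread and a worker.  */
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop_worker;

/* prepare handler: take the lock so no other thread holds it at fork.  */
static void before_fork(void) { pthread_mutex_lock(&shared_lock); }
/* parent and child handlers: both copies of the lock are released.  */
static void after_fork(void) { pthread_mutex_unlock(&shared_lock); }

static void register_handlers(void)
{
  int rc = pthread_atfork(before_fork, after_fork, after_fork);
  assert(rc == 0);
  (void) rc;
}

/* Worker that keeps taking and releasing the lock, so that without the
   handlers it could own the lock at the moment of the fork.  */
static void *busy_worker(void *arg)
{
  (void) arg;
  while (!atomic_load(&stop_worker))
    {
      pthread_mutex_lock(&shared_lock);
      pthread_mutex_unlock(&shared_lock);
      sched_yield();
    }
  return NULL;
}

/* Fork while the worker is running.  Returns the child's exit status
   (0 if the child could take the lock), or -1 on error.  */
int fork_with_atfork_handlers(void)
{
  static pthread_once_t once = PTHREAD_ONCE_INIT;
  pthread_t tid;
  int status = -1;
  pid_t pid;

  /* Handlers stay registered forever, so register them only once.  */
  pthread_once(&once, register_handlers);
  atomic_store(&stop_worker, 0);
  if (pthread_create(&tid, NULL, busy_worker, NULL) != 0)
    return -1;

  pid = fork();
  if (pid == 0)
    {
      /* Child: only the calling thread exists here.  The child handler
         released the lock, so this cannot block on a vanished owner.  */
      pthread_mutex_lock(&shared_lock);
      pthread_mutex_unlock(&shared_lock);
      _exit(0);
    }
  if (pid > 0)
    waitpid(pid, &status, 0);

  atomic_store(&stop_worker, 1);
  pthread_join(tid, NULL);
  if (pid < 0)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the handlers, the child could inherit shared_lock in the locked
state with no thread left to unlock it, which is the deadlock described
above.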

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [non-stop] 08/10 linux native support
  2008-07-09  3:55               ` Michael Snyder
@ 2008-07-09  7:55                 ` Mark Kettenis
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Kettenis @ 2008-07-09  7:55 UTC (permalink / raw)
  To: msnyder; +Cc: drow, pedro, gdb-patches

> From: Michael Snyder <msnyder@specifix.com>
> Date: Tue, 08 Jul 2008 20:55:06 -0700
> 
> Here's IEEE std 1003.1:
> 
> A process shall be created with a single thread. If a multi-threaded
> process calls fork(), the new process shall contain a replica of the
> calling thread and its entire address space, possibly including the
> states of mutexes and other resources. Consequently, to avoid errors,
> the child process may only execute async-signal-safe operations until
> such time as one of the exec functions is called. [THR] [Option Start]
>  Fork handlers may be established by means of the pthread_atfork()
> function in order to maintain application invariants across fork()
> calls. [Option End]
> 
> 
> The issue is that only the thread that actually calls fork()
> will be duplicated in the child, but the mutexes (which may have
> been held by another thread) will be duplicated, and therefore
> the child may deadlock.

And I'd argue that this is a direct logical consequence of the fact
that fork() behaviour is pretty well defined by POSIX.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [non-stop] 08/10 linux native support
  2008-07-09  3:25           ` Michael Snyder
  2008-07-09  3:47             ` Daniel Jacobowitz
@ 2008-07-09  7:56             ` Mark Kettenis
  1 sibling, 0 replies; 20+ messages in thread
From: Mark Kettenis @ 2008-07-09  7:56 UTC (permalink / raw)
  To: msnyder; +Cc: drow, pedro, gdb-patches

> From: Michael Snyder <msnyder@specifix.com>
> Date: Tue, 08 Jul 2008 20:25:36 -0700
> 
> On Mon, 2008-07-07 at 14:20 -0400, Daniel Jacobowitz wrote:
> > On Wed, Jul 02, 2008 at 04:34:50AM +0100, Pedro Alves wrote:
> > > @@ -337,7 +337,9 @@ linux_fork_killall (void)
> > >      {
> > >        pid = PIDGET (fp->ptid);
> > >        do {
> > > -	ptrace (PT_KILL, pid, 0, 0);
> > > +	/* Use SIGKILL instead of PTRACE_KILL because the former works even
> > > +	   if the thread is running, while the later doesn't.  */
> > > +	kill (pid, SIGKILL);
> > >  	ret = waitpid (pid, &status, 0);
> > >  	/* We might get a SIGCHLD instead of an exit status.  This is
> > >  	 aggravated by the first kill above - a child has just
> > 
> > This is OK but if anyone wants to make fork support handle
> > multi-threaded programs someday we may need to expose kill_lwp.
> 
> Fork is undefined in a multi-threaded program.

No it's not.  It's supposed to fork only the running thread, that is,
you get a copy of the vm space with a single thread in it whose
initial state is a copy of the state of the thread executing fork.

Some OSes offer an alternative fork that forks all running threads
but it is non-standard.
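
A small Linux-only sketch that checks this behaviour: it starts a few
idle threads, forks, and has the child count its own threads by listing
/proc/self/task.  The function names are hypothetical, and the use of
/proc (plus opendir in the child, which is not async-signal-safe) is an
assumption that holds on a typical glibc system rather than something
POSIX promises.

```c
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_EXTRA_THREADS 16

static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

/* Idle thread: blocks on GATE until the main thread releases it.  */
static void *idle_worker(void *arg)
{
  (void) arg;
  pthread_mutex_lock(&gate);
  pthread_mutex_unlock(&gate);
  return NULL;
}

/* Count the calling process's threads via /proc/self/task.
   Returns 0 if the directory cannot be read.  */
static int count_own_threads(void)
{
  DIR *dir = opendir("/proc/self/task");
  struct dirent *ent;
  int n = 0;

  if (dir == NULL)
    return 0;
  while ((ent = readdir(dir)) != NULL)
    if (ent->d_name[0] != '.')
      n++;
  closedir(dir);
  return n;
}

/* Fork from a process with EXTRA_THREADS additional threads and return
   the number of threads the child sees in itself, or -1 on error.  */
int threads_seen_by_forked_child(int extra_threads)
{
  pthread_t tids[MAX_EXTRA_THREADS];
  int i, created = 0, status = 0;
  pid_t pid;

  assert(extra_threads >= 0 && extra_threads <= MAX_EXTRA_THREADS);

  /* Hold the gate so the workers stay alive across the fork.  */
  pthread_mutex_lock(&gate);
  for (i = 0; i < extra_threads; i++)
    if (pthread_create(&tids[created], NULL, idle_worker, NULL) == 0)
      created++;

  pid = fork();
  if (pid == 0)
    _exit(count_own_threads());
  if (pid > 0)
    waitpid(pid, &status, 0);

  /* Let the workers finish and collect them.  */
  pthread_mutex_unlock(&gate);
  for (i = 0; i < created; i++)
    pthread_join(tids[i], NULL);

  if (pid < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

Calling threads_seen_by_forked_child (3) from an otherwise
single-threaded program should report one thread in the child even
though the parent had four at the time of the fork.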


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [non-stop] 08/10 linux native support
  2008-07-07 18:20         ` Daniel Jacobowitz
  2008-07-09  3:25           ` Michael Snyder
@ 2008-07-10 15:28           ` Pedro Alves
  2008-07-10 17:15             ` Daniel Jacobowitz
  1 sibling, 1 reply; 20+ messages in thread
From: Pedro Alves @ 2008-07-10 15:28 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 4664 bytes --]

On Monday 07 July 2008 19:20:09, Daniel Jacobowitz wrote:
> On Wed, Jul 02, 2008 at 04:34:50AM +0100, Pedro Alves wrote:

> > @@ -1720,20 +1811,54 @@ linux_handle_extended_wait (struct lwp_i
> >  	  else
> >  	    status = 0;
> >
> > +#if 0
> > +	  /* Make thread_db aware of this thread.  We do this this
> > +	     early, so in non-stop mode, threads show up as they're
> > +	     created, instead of on next stop, and so that they have
> > +	     the correct running state.  thread_db_find_new_threads
> > +	     needs a stopped inferior_ptid --- since we know LP is
> > +	     stopped, use it this time.  */
> > +	  old_chain = save_inferior_ptid ();
> > +	  inferior_ptid = lp->ptid;
> > +	  lp->stopped = 1;
> > +	  target_find_new_threads ();
> > +	  do_cleanups (old_chain);
> > +	  if (!in_thread_list (new_lp->ptid))
> > +#else
> > +	  /* "Attach"ing to the parent forces the thread_db target to
> > +	     build its private data structures for the parent, which
> > +	     may have not had them setup yet.  */
> > +	  thread_db_attach_lwp (lp->ptid);
> > +	  /* Do the same to the child, which, if thread_db is active,
> > +	     adds the child to GDB's thread list.  */
> > +	  if (!thread_db_attach_lwp (new_lp->ptid))
> > +#endif
>
> This (the thread_db_attach_lwp version) looks reasonable to me.  Ugly,
> but reasonable.  Why do we need the parent's data?

Due to this:

 (gdb) r&
 Starting program: /home/pedro/gdb/tests/threads32
 (gdb) [Thread debugging using libthread_db enabled]
 [New Thread 0xf7df0b90 (LWP 24154)]
 [New Thread 0xf75efb90 (LWP 24155)]
 info threads
   3 Thread 0xf75efb90 (LWP 24155)  (running)
   2 Thread 0xf7df0b90 (LWP 24154)  (running)
 * 1 LWP 24151  (running)
 (gdb) interrupt
 (gdb)
 Program received signal SIGINT, Interrupt.
 0xffffe410 in __kernel_vsyscall ()
 info threads
 During symbol reading, incomplete CFI data; unspecified registers (e.g., eax) at 0xffffe411.
   3 Thread 0xf75efb90 (LWP 24155)  (running)
   2 Thread 0xf7df0b90 (LWP 24154)  (running)
 * 1 Thread 0xf7df16b0 (LWP 24151)  0xffffe410 in __kernel_vsyscall ()

This is where I had gotten myself stuck.  The main thread id
isn't discovered until late when we stop.  The thing is that the main
thread is already in GDB's thread list since very early, before
thread_db was detected.  Since we can only query thread_db when we
have a stopped thread around, I couldn't find any other place to
learn about the main thread's thread_db id, until it has stopped.
Even doing it on PTRACE_EVENT_CLONE doesn't solve the
single-threaded + thread_db case.

Also, this happened:

(gdb) b 81
Breakpoint 1 at 0x80485fd: file threads.c, line 81.
(gdb) r
Starting program: /home/pedro/gdb/tests/threads32
[Thread debugging using libthread_db enabled]
[New Thread 0xf7dd8b90 (LWP 28225)]
[New Thread 0xf75d7b90 (LWP 28226)]

Breakpoint 1, thread_function1 (arg=0x1) at threads.c:81
81              usleep (1);  /* Loop increment.  */
(gdb) info threads
  3 Thread 0xf75d7b90 (LWP 28226)  thread_function1 (arg=0x1) at threads.c:81
  2 Thread 0xf7dd8b90 (LWP 28225)  (running)
* 1 LWP 28222  (running)

The issue is that have_threads returns true at this point:

linux-thread-db.c:thread_db_wait
...
  /* If we do not know about the main thread yet, this would be a good time to
     find it.  */
  if (ourstatus->kind == TARGET_WAITKIND_STOPPED && !have_threads ())
    thread_db_find_new_threads ();

... because there are already threads that thread_db learned about,
so we'd not look for info regarding the main thread.

The attached patch fixes these issues, by changing this bit above
to do:

  if (ourstatus->kind == TARGET_WAITKIND_STOPPED)
    /* If we do not know about the main thread yet, this would be a
       good time to find it.  */
    iterate_over_threads (thread_db_claim_lwp_callback, &ptid);

... and adds the necessary glue for that.
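
The difference between the two checks can be shown with a toy model.
This is only a sketch: toy_thread, toy_iterate and claim_callback are
invented stand-ins for GDB's thread list, iterate_over_threads and
thread_db_claim_lwp_callback, and "claiming" is reduced to filling in a
NULL private pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a thread list entry.  private_data stays NULL until
   the thread layer has claimed the thread.  */
struct toy_thread
{
  int lwp;
  void *private_data;
  struct toy_thread *next;
};

typedef int (*toy_thread_cb) (struct toy_thread *tp, void *data);

/* Call CB on every thread; stop early and return the thread for which
   CB returns non-zero, or NULL if CB never does.  */
struct toy_thread *toy_iterate(struct toy_thread *head, toy_thread_cb cb,
                               void *data)
{
  struct toy_thread *tp;

  assert(cb != NULL);
  for (tp = head; tp != NULL; tp = tp->next)
    if (cb(tp, data))
      return tp;
  return NULL;
}

static int claimed_marker;

/* Claim any thread the thread layer does not know about yet.  Always
   returns 0 so that the iteration visits every thread.  */
static int claim_callback(struct toy_thread *tp, void *data)
{
  int *claimed = data;

  if (tp->private_data == NULL)
    {
      tp->private_data = &claimed_marker;
      (*claimed)++;
    }
  return 0;
}

/* Returns how many threads were newly claimed.  */
int claim_unclaimed_threads(struct toy_thread *head)
{
  int claimed = 0;

  toy_iterate(head, claim_callback, &claimed);
  return claimed;
}
```

A single "do we have any threads at all?" test skips the whole list as
soon as one thread is known, whereas the per-thread check still picks
up an unclaimed main thread sitting next to already-claimed ones.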


>
> > +	    {
> > +	      /* We're not using thread_db.  Attach and add it to
> > +		 GDB's list.  */
> > +	      lin_lwp_attach_lwp (new_lp->ptid);
> > +	      target_post_attach (GET_LWP (new_lp->ptid));
> > +	      add_thread (new_lp->ptid);
> > +	    }
> > +
>
> Why do we need to call lin_lwp_attach_lwp?  Won't that try to
> PTRACE_ATTACH?  And we've already called add_lwp above.

It won't do PTRACE_ATTACH, exactly because we've already
called add_lwp above.  Ah, I see.  lin_lwp_attach_lwp will
only do lp->stopped = 1 in this case; the "attach to every
thread" logic is meant for the target_attach'ing case.  I've
removed that call.

Apart from what's mentioned above, nothing else changed.
Does the patch look closer to ok now?

I've regtested it on x86_64-unknown-linux-gnu.
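
Two changes in the attached patch swap ptrace requests for plain
signals, because ptrace requests fail on a thread that is still
running: linux_nat_thread_alive probes with signal 0, and
linux_fork_killall sends SIGKILL.  A standalone sketch of both ideas
follows.  send_to_lwp is a hypothetical stand-in for kill_lwp, under
the assumption that it boils down to the tkill system call; the other
names are invented too.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for kill_lwp: signal a single LWP by id.  */
static int send_to_lwp(pid_t lwp, int signo)
{
  assert(lwp > 0);
  return (int) syscall(SYS_tkill, lwp, signo);
}

/* Signal 0 delivers nothing; it only checks that the LWP exists and
   that we may signal it.  This works whether or not the LWP is
   stopped.  */
int lwp_is_alive(pid_t lwp)
{
  return send_to_lwp(lwp, 0) == 0;
}

/* Fork a child that keeps running, kill it with SIGKILL while it is
   not stopped, and reap it.  Returns 0 if every step behaved as
   expected, a positive step number otherwise, or -1 if fork failed.  */
int kill_running_child(void)
{
  int status = 0;
  int alive_before, alive_after;
  pid_t pid = fork();

  if (pid < 0)
    return -1;
  if (pid == 0)
    for (;;)
      pause();

  alive_before = lwp_is_alive(pid);

  /* SIGKILL needs no ptrace stop, unlike PTRACE_KILL.  */
  if (kill(pid, SIGKILL) != 0)
    return 2;
  if (waitpid(pid, &status, 0) != pid)
    return 3;

  /* Once reaped, the id no longer names a live LWP.  */
  alive_after = lwp_is_alive(pid);

  if (!alive_before)
    return 1;
  if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
    return 4;
  if (alive_after)
    return 5;
  return 0;
}
```

Between the kill and the waitpid the child is a zombie and still
answers the signal 0 probe, so the "gone" check is only meaningful
after the child has been reaped.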

-- 
Pedro Alves

[-- Attachment #2: 008-non_stop_linux.diff --]
[-- Type: text/x-diff, Size: 22222 bytes --]

2008-07-10  Pedro Alves  <pedro@codesourcery.com>

	Non-stop linux native.

	* linux-nat.c (linux_test_for_tracefork): Block events while we're
	here.
	(find_lwp_pid): Rename to...
	(linux_nat_find_lwp_pid): ... this.  Make public.  Update all
	callers.
	(get_pending_status): Implement non-stop mode.
	(linux_nat_detach): Stop threads before detaching.
	(linux_nat_resume): In non-stop mode, always resume only a single
	PTID.
	(linux_handle_extended_wait): In non-stop mode, on a clone event,
	add new lwp to GDB's thread table, and mark as running, executing
	and stopped appropriately.
	(linux_nat_filter_event): Don't assume there are other running
	threads when a thread exits.
	(linux_nat_wait): Mark the main thread as running and executing.
	In non-stop mode, don't stop all lwps.
	(linux_nat_kill): Stop lwps before killing them.
	(linux_nat_thread_alive): Use signal 0 to detect if a thread is
	alive.
	(send_sigint_callback): New.
	(linux_nat_stop): New.
	(linux_nat_add_target): Set to_stop to linux_nat_stop.

	* linux-nat.h (thread_db_attach_lwp): Declare.
	(linux_nat_find_lwp_pid): Declare.

	* linux-thread-db.c (thread_from_lwp, enable_thread_event)
	(check_event): Set proc_handle.pid to the stopped lwp.
	(thread_db_attach_lwp, thread_db_attach_lwp_1): New.
	(attach_thread): Don't set the private field if thread returns 0
	as ti_tid.
	(thread_db_claim_lwp_callback): New.
	(thread_db_wait): Claim any thread that the thread_db target
	didn't know about yet, but GDB's core did.
	(thread_db_find_new_threads): If current lwp is executing, don't
	try to read from it.

	* linux-fork.c (linux_fork_killall): Use SIGKILL instead of
	PTRACE_KILL.

---
 gdb/linux-fork.c      |    4 
 gdb/linux-nat.c       |  250 ++++++++++++++++++++++++++++++++++++++++----------
 gdb/linux-nat.h       |    4 
 gdb/linux-thread-db.c |   95 +++++++++++++++++--
 4 files changed, 295 insertions(+), 58 deletions(-)

Index: src/gdb/linux-nat.c
===================================================================
--- src.orig/gdb/linux-nat.c	2008-07-10 16:03:44.000000000 +0100
+++ src/gdb/linux-nat.c	2008-07-10 16:03:46.000000000 +0100
@@ -285,6 +285,9 @@ static void linux_nat_async (void (*call
 static int linux_nat_async_mask (int mask);
 static int kill_lwp (int lwpid, int signo);
 
+static int send_sigint_callback (struct lwp_info *lp, void *data);
+static int stop_callback (struct lwp_info *lp, void *data);
+
 /* Captures the result of a successful waitpid call, along with the
    options used in that call.  */
 struct waitpid_result
@@ -487,6 +490,9 @@ linux_test_for_tracefork (int original_p
 {
   int child_pid, ret, status;
   long second_pid;
+  enum sigchld_state async_events_original_state;
+
+  async_events_original_state = linux_nat_async_events (sigchld_sync);
 
   linux_supports_tracefork_flag = 0;
   linux_supports_tracevforkdone_flag = 0;
@@ -517,6 +523,7 @@ linux_test_for_tracefork (int original_p
       if (ret != 0)
 	{
 	  warning (_("linux_test_for_tracefork: failed to kill child"));
+	  linux_nat_async_events (async_events_original_state);
 	  return;
 	}
 
@@ -527,6 +534,7 @@ linux_test_for_tracefork (int original_p
 	warning (_("linux_test_for_tracefork: unexpected wait status 0x%x from "
 		 "killed child"), status);
 
+      linux_nat_async_events (async_events_original_state);
       return;
     }
 
@@ -566,6 +574,8 @@ linux_test_for_tracefork (int original_p
   if (ret != 0)
     warning (_("linux_test_for_tracefork: failed to kill child"));
   my_waitpid (child_pid, &status, 0);
+
+  linux_nat_async_events (async_events_original_state);
 }
 
 /* Return non-zero iff we have tracefork functionality available.
@@ -985,8 +995,8 @@ delete_lwp (ptid_t ptid)
 /* Return a pointer to the structure describing the LWP corresponding
    to PID.  If no corresponding LWP could be found, return NULL.  */
 
-static struct lwp_info *
-find_lwp_pid (ptid_t ptid)
+struct lwp_info *
+linux_nat_find_lwp_pid (ptid_t ptid)
 {
   struct lwp_info *lp;
   int lwp;
@@ -1207,7 +1217,7 @@ lin_lwp_attach_lwp (ptid_t ptid)
 
   async_events_original_state = linux_nat_async_events (sigchld_sync);
 
-  lp = find_lwp_pid (ptid);
+  lp = linux_nat_find_lwp_pid (ptid);
 
   /* We assume that we're already attached to any LWP that has an id
      equal to the overall process id, and to any LWP that is already
@@ -1376,16 +1386,80 @@ get_pending_status (struct lwp_info *lp,
      events are always cached in waitpid_queue.  */
 
   *status = 0;
-  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+
+  if (non_stop)
     {
-      if (stop_signal != TARGET_SIGNAL_0
-	  && signal_pass_state (stop_signal))
-	*status = W_STOPCODE (target_signal_to_host (stop_signal));
+      enum target_signal signo = TARGET_SIGNAL_0;
+
+      if (is_executing (lp->ptid))
+	{
+	  /* If the core thought this lwp was executing --- e.g., the
+	     executing property hasn't been updated yet, but the
+	     thread has been stopped with a stop_callback /
+	     stop_wait_callback sequence (see linux_nat_detach for
+	     example) --- we can only have pending events in the local
+	     queue.  */
+	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
+	    {
+	      if (WIFSTOPPED (status))
+		signo = target_signal_from_host (WSTOPSIG (status));
+
+	      /* If not stopped, then the lwp is gone, no use in
+		 resending a signal.  */
+	    }
+	}
+      else
+	{
+	  /* If the core knows the thread is not executing, then we
+	     have the last signal recorded in
+	     thread_info->stop_signal, unless this is inferior_ptid,
+	     in which case, it's in the global stop_signal, due to
+	     context switching.  */
+
+	  if (ptid_equal (lp->ptid, inferior_ptid))
+	    signo = stop_signal;
+	  else
+	    {
+	      struct thread_info *tp = find_thread_pid (lp->ptid);
+	      gdb_assert (tp);
+	      signo = tp->stop_signal;
+	    }
+	}
+
+      if (signo != TARGET_SIGNAL_0
+	  && !signal_pass_state (signo))
+	{
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog, "\
+GPT: lwp %s had signal %s, but it is in no pass state\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
+      else
+	{
+	  if (signo != TARGET_SIGNAL_0)
+	    *status = W_STOPCODE (target_signal_to_host (signo));
+
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"GPT: lwp %s as pending signal %s\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
     }
-  else if (target_can_async_p ())
-    queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
   else
-    *status = lp->status;
+    {
+      if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+	{
+	  if (stop_signal != TARGET_SIGNAL_0
+	      && signal_pass_state (stop_signal))
+	    *status = W_STOPCODE (target_signal_to_host (stop_signal));
+	}
+      else if (target_can_async_p ())
+	queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
+      else
+	*status = lp->status;
+    }
 
   return 0;
 }
@@ -1449,6 +1523,13 @@ linux_nat_detach (char *args, int from_t
   if (target_can_async_p ())
     linux_nat_async (NULL, 0);
 
+  /* Stop all threads before detaching.  ptrace requires that the
+     thread is stopped to successfully detach.  */
+  iterate_over_lwps (stop_callback, NULL);
+  /* ... and wait until all of them have reported back that
+     they're no longer running.  */
+  iterate_over_lwps (stop_wait_callback, NULL);
+
   iterate_over_lwps (detach_callback, NULL);
 
   /* Only the initial process should be left right now.  */
@@ -1538,19 +1619,27 @@ linux_nat_resume (ptid_t ptid, int step,
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);
 
-  if (resume_all)
-    iterate_over_lwps (resume_set_callback, NULL);
-  else
-    iterate_over_lwps (resume_clear_callback, NULL);
+  if (non_stop && resume_all)
+    internal_error (__FILE__, __LINE__,
+		    "can't resume all in non-stop mode");
+
+  if (!non_stop)
+    {
+      if (resume_all)
+	iterate_over_lwps (resume_set_callback, NULL);
+      else
+	iterate_over_lwps (resume_clear_callback, NULL);
+    }
 
   /* If PID is -1, it's the current inferior that should be
      handled specially.  */
   if (PIDGET (ptid) == -1)
     ptid = inferior_ptid;
 
-  lp = find_lwp_pid (ptid);
+  lp = linux_nat_find_lwp_pid (ptid);
   gdb_assert (lp != NULL);
 
+  /* Convert to something the lower layer understands.  */
   ptid = pid_to_ptid (GET_LWP (lp->ptid));
 
   /* Remember if we're stepping.  */
@@ -1720,10 +1809,38 @@ linux_handle_extended_wait (struct lwp_i
 	  else
 	    status = 0;
 
+	  if (non_stop)
+	    {
+	      /* Add the new thread to GDB's lists as soon as possible
+		 so that:
+
+		 1) the frontend doesn't have to wait for a stop to
+		 display them, and,
+
+		 2) we can tag it with the correct running state.  */
+
+	      /* If the thread_db layer is active, let it know about
+		 this new thread.  */
+	      if (!thread_db_attach_lwp (new_lp->ptid))
+		{
+		  /* We're not using thread_db.  Add the thread to
+		     GDB's list anyway.  */
+		  target_post_attach (GET_LWP (new_lp->ptid));
+		  add_thread (new_lp->ptid);
+		}
+
+	      if (!stopping)
+		{
+		  set_executing (new_lp->ptid, 1);
+		  set_running (new_lp->ptid, 1);
+		}
+	    }
+
 	  if (stopping)
 	    new_lp->stopped = 1;
 	  else
 	    {
+	      new_lp->stopped = 0;
 	      new_lp->resumed = 1;
 	      ptrace (PTRACE_CONT,
 		      PIDGET (lp->waitstatus.value.related_pid), 0,
@@ -2368,7 +2485,7 @@ linux_nat_filter_event (int lwpid, int s
 {
   struct lwp_info *lp;
 
-  lp = find_lwp_pid (pid_to_ptid (lwpid));
+  lp = linux_nat_find_lwp_pid (pid_to_ptid (lwpid));
 
   /* Check for stop events reported by a process we didn't already
      know about - anything not already in our LWP list.
@@ -2463,13 +2580,7 @@ linux_nat_filter_event (int lwpid, int s
 	 not the end of the debugged application and should be
 	 ignored.  */
       if (num_lwps > 0)
-	{
-	  /* Make sure there is at least one thread running.  */
-	  gdb_assert (iterate_over_lwps (running_callback, NULL));
-
-	  /* Discard the event.  */
-	  return NULL;
-	}
+	return NULL;
     }
 
   /* Check if the current LWP has previously exited.  In the nptl
@@ -2599,6 +2710,8 @@ linux_nat_wait (ptid_t ptid, struct targ
       lp->resumed = 1;
       /* Add the main thread to GDB's thread list.  */
       add_thread_silent (lp->ptid);
+      set_running (lp->ptid, 1);
+      set_executing (lp->ptid, 1);
     }
 
   sigemptyset (&flush_mask);
@@ -2645,7 +2758,7 @@ retry:
 			    target_pid_to_str (ptid));
 
       /* We have a specific LWP to check.  */
-      lp = find_lwp_pid (ptid);
+      lp = linux_nat_find_lwp_pid (ptid);
       gdb_assert (lp);
       status = lp->status;
       lp->status = 0;
@@ -2826,19 +2939,23 @@ retry:
     fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
 			status_to_str (status), target_pid_to_str (lp->ptid));
 
-  /* Now stop all other LWP's ...  */
-  iterate_over_lwps (stop_callback, NULL);
+  if (!non_stop)
+    {
+      /* Now stop all other LWP's ...  */
+      iterate_over_lwps (stop_callback, NULL);
 
-  /* ... and wait until all of them have reported back that they're no
-     longer running.  */
-  iterate_over_lwps (stop_wait_callback, &flush_mask);
-  iterate_over_lwps (flush_callback, &flush_mask);
-
-  /* If we're not waiting for a specific LWP, choose an event LWP from
-     among those that have had events.  Giving equal priority to all
-     LWPs that have had events helps prevent starvation.  */
-  if (pid == -1)
-    select_event_lwp (&lp, &status);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, &flush_mask);
+      iterate_over_lwps (flush_callback, &flush_mask);
+
+      /* If we're not waiting for a specific LWP, choose an event LWP
+	 from among those that have had events.  Giving equal priority
+	 to all LWPs that have had events helps prevent
+	 starvation.  */
+      if (pid == -1)
+	select_event_lwp (&lp, &status);
+    }
 
   /* Now that we've selected our final event LWP, cancel any
      breakpoints in other LWPs that have hit a GDB breakpoint.  See
@@ -2970,6 +3087,13 @@ linux_nat_kill (void)
     }
   else
     {
+      /* Stop all threads before killing them, since ptrace requires
+	 that the thread is stopped to successfully PTRACE_KILL.  */
+      iterate_over_lwps (stop_callback, NULL);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, NULL);
+
       /* Kill all LWP's ...  */
       iterate_over_lwps (kill_callback, NULL);
 
@@ -3022,22 +3146,22 @@ linux_nat_xfer_partial (struct target_op
 static int
 linux_nat_thread_alive (ptid_t ptid)
 {
+  int err;
+
   gdb_assert (is_lwp (ptid));
 
-  errno = 0;
-  ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0);
+  /* Send signal 0 instead of anything ptrace, because ptracing a
+     running thread errors out claiming that the thread doesn't
+     exist.  */
+  err = kill_lwp (GET_LWP (ptid), 0);
+
   if (debug_linux_nat)
     fprintf_unfiltered (gdb_stdlog,
-			"LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n",
+			"LLTA: KILL(SIG0) %s (%s)\n",
 			target_pid_to_str (ptid),
-			errno ? safe_strerror (errno) : "OK");
+			err ? safe_strerror (err) : "OK");
 
-  /* Not every Linux kernel implements PTRACE_PEEKUSER.  But we can
-     handle that case gracefully since ptrace will first do a lookup
-     for the process based upon the passed-in pid.  If that fails we
-     will get either -ESRCH or -EPERM, otherwise the child exists and
-     is alive.  */
-  if (errno == ESRCH || errno == EPERM)
+  if (err != 0)
     return 0;
 
   return 1;
@@ -4239,6 +4363,35 @@ linux_nat_set_async_mode (int on)
   linux_nat_async_enabled = on;
 }
 
+static int
+send_sigint_callback (struct lwp_info *lp, void *data)
+{
+  /* Use is_running instead of !lp->stopped, because the lwp may be
+     stopped due to an internal event, and we want to interrupt it in
+     that case too.  What we want is to check if the thread is stopped
+     from the point of view of the user.  */
+  if (is_running (lp->ptid))
+    kill_lwp (GET_LWP (lp->ptid), SIGINT);
+  return 0;
+}
+
+static void
+linux_nat_stop (ptid_t ptid)
+{
+  if (non_stop)
+    {
+      if (ptid_equal (ptid, minus_one_ptid))
+	iterate_over_lwps (send_sigint_callback, &ptid);
+      else
+	{
+	  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
+	  send_sigint_callback (lp, NULL);
+	}
+    }
+  else
+    linux_ops->to_stop (ptid);
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4269,6 +4422,9 @@ linux_nat_add_target (struct target_ops 
   t->to_terminal_inferior = linux_nat_terminal_inferior;
   t->to_terminal_ours = linux_nat_terminal_ours;
 
+  /* Methods for non-stop support.  */
+  t->to_stop = linux_nat_stop;
+
   /* We don't change the stratum; this target will sit at
      process_stratum and thread_db will set at thread_stratum.  This
      is a little strange, since this is a multi-threaded-capable
@@ -4296,7 +4452,7 @@ linux_nat_set_new_thread (struct target_
 struct siginfo *
 linux_nat_get_siginfo (ptid_t ptid)
 {
-  struct lwp_info *lp = find_lwp_pid (ptid);
+  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
 
   gdb_assert (lp != NULL);
 
Index: src/gdb/linux-nat.h
===================================================================
--- src.orig/gdb/linux-nat.h	2008-07-10 16:03:44.000000000 +0100
+++ src/gdb/linux-nat.h	2008-07-10 16:03:46.000000000 +0100
@@ -94,6 +94,8 @@ void check_for_thread_db (void);
 /* Tell the thread_db layer what native target operations to use.  */
 void thread_db_init (struct target_ops *);
 
+int thread_db_attach_lwp (ptid_t ptid);
+
 /* Find process PID's pending signal set from /proc/pid/status.  */
 void linux_proc_pending_signals (int pid, sigset_t *pending, sigset_t *blocked, sigset_t *ignored);
 
@@ -107,6 +109,8 @@ struct lwp_info *iterate_over_lwps (int 
 						     void *), 
 				    void *data);
 
+struct lwp_info *linux_nat_find_lwp_pid (ptid_t ptid);
+
 /* Create a prototype generic GNU/Linux target.  The client can
    override it with local methods.  */
 struct target_ops * linux_target (void);
Index: src/gdb/linux-thread-db.c
===================================================================
--- src.orig/gdb/linux-thread-db.c	2008-07-10 16:03:44.000000000 +0100
+++ src/gdb/linux-thread-db.c	2008-07-10 16:14:30.000000000 +0100
@@ -308,6 +308,8 @@ thread_from_lwp (ptid_t ptid)
      LWP.  */
   gdb_assert (GET_LWP (ptid) != 0);
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
   if (err != TD_OK)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
@@ -332,6 +334,49 @@ thread_from_lwp (ptid_t ptid)
 }
 \f
 
+/* Attach to lwp PTID, doing whatever else is required to have this
+   LWP under the debugger's control --- e.g., enabling event
+   reporting.  Access thread_db through STOPPED_PTID.  Returns true on
+   success.  */
+static int
+thread_db_attach_lwp_1 (ptid_t stopped_ptid, ptid_t ptid)
+{
+  td_thrhandle_t th;
+  td_thrinfo_t ti;
+  td_err_e err;
+
+  if (!using_thread_db)
+    return 0;
+
+  /* This ptid comes from linux-nat.c, which should always fill in the
+     LWP.  */
+  gdb_assert (GET_LWP (ptid) != 0);
+
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (stopped_ptid);
+  err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
+  if (err != TD_OK)
+    /* Cannot find user-level thread.  */
+    return 0;
+
+  err = td_thr_get_info_p (&th, &ti);
+  if (err != TD_OK)
+    {
+      warning (_("Cannot get thread info: %s"), thread_db_err_str (err));
+      return 0;
+    }
+
+  attach_thread (ptid, &th, &ti);
+  return 1;
+}
+
+/* Same as thread_db_attach_lwp_1, but assume PTID is stopped.  */
+int
+thread_db_attach_lwp (ptid_t ptid)
+{
+  return thread_db_attach_lwp_1 (ptid, ptid);
+}
+
 void
 thread_db_init (struct target_ops *target)
 {
@@ -418,6 +463,9 @@ enable_thread_event (td_thragent_t *thre
   td_notify_t notify;
   td_err_e err;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
+
   /* Get the breakpoint address for thread EVENT.  */
   err = td_ta_event_addr_p (thread_agent, event, &notify);
   if (err != TD_OK)
@@ -685,16 +733,22 @@ attach_thread (ptid_t ptid, const td_thr
       && lin_lwp_attach_lwp (BUILD_LWP (ti_p->ti_lid, GET_PID (ptid))) < 0)
     return;
 
+  if (ti_p->ti_tid == 0)
+    {
+      /* A thread ID of zero may mean the thread library has not
+	 initialized yet.  Leave it with private == NULL until the
+	 thread_db target claims it.  */
+
+      /* We should only get here if GDB already knew about this
+	 thread.  */
+      gdb_assert (tp != NULL);
+      return;
+    }
+
   /* Construct the thread's private data.  */
   private = xmalloc (sizeof (struct private_thread_info));
   memset (private, 0, sizeof (struct private_thread_info));
 
-  /* A thread ID of zero may mean the thread library has not initialized
-     yet.  But we shouldn't even get here if that's the case.  FIXME:
-     if we change GDB to always have at least one thread in the thread
-     list this will have to go somewhere else; maybe private == NULL
-     until the thread_db target claims it.  */
-  gdb_assert (ti_p->ti_tid != 0);
   private->th = *th_p;
   private->tid = ti_p->ti_tid;
 
@@ -761,6 +815,9 @@ check_event (ptid_t ptid)
   if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr)
     return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
   /* If we are at a create breakpoint, we do not know what new lwp
      was created and cannot specifically locate the event message for it.
      We have to call td_ta_event_getmsg() to get
@@ -820,6 +877,17 @@ check_event (ptid_t ptid)
   while (loop);
 }
 
+/* Claim threads the lower layer added, but that we didn't know about
+   yet.  */
+static int
+thread_db_claim_lwp_callback (struct thread_info *tp, void *arg)
+{
+  ptid_t *stopped_ptid = arg;
+  if (tp && tp->private == NULL)
+    thread_db_attach_lwp_1 (*stopped_ptid, tp->ptid);
+  return 0;
+}
+
 static ptid_t
 thread_db_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
 {
@@ -841,10 +909,10 @@ thread_db_wait (ptid_t ptid, struct targ
       return ptid;
     }
 
-  /* If we do not know about the main thread yet, this would be a good time to
-     find it.  */
-  if (ourstatus->kind == TARGET_WAITKIND_STOPPED && !have_threads ())
-    thread_db_find_new_threads ();
+  if (ourstatus->kind == TARGET_WAITKIND_STOPPED)
+    /* If we do not know about the main thread yet, this would be a
+       good time to find it.  */
+    iterate_over_threads (thread_db_claim_lwp_callback, &ptid);
 
   if (ourstatus->kind == TARGET_WAITKIND_STOPPED
       && ourstatus->value.sig == TARGET_SIGNAL_TRAP)
@@ -955,7 +1023,14 @@ static void
 thread_db_find_new_threads (void)
 {
   td_err_e err;
+  struct lwp_info *lp = linux_nat_find_lwp_pid (inferior_ptid);
+
+  if (!lp || !lp->stopped)
+    /* In linux, we can only read memory through a stopped lwp.  */
+    return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
   /* Iterate over all user-space threads to discover new threads.  */
   err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL,
 			  TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY,
Index: src/gdb/linux-fork.c
===================================================================
--- src.orig/gdb/linux-fork.c	2008-07-10 16:03:44.000000000 +0100
+++ src/gdb/linux-fork.c	2008-07-10 16:03:46.000000000 +0100
@@ -337,7 +337,9 @@ linux_fork_killall (void)
     {
       pid = PIDGET (fp->ptid);
       do {
-	ptrace (PT_KILL, pid, 0, 0);
+	/* Use SIGKILL instead of PTRACE_KILL because the former works even
+	   if the thread is running, while the latter doesn't.  */
+	kill (pid, SIGKILL);
 	ret = waitpid (pid, &status, 0);
 	/* We might get a SIGCHLD instead of an exit status.  This is
 	 aggravated by the first kill above - a child has just


* Re: [non-stop] 08/10 linux native support
  2008-07-10 15:28           ` Pedro Alves
@ 2008-07-10 17:15             ` Daniel Jacobowitz
  2008-07-10 18:01               ` Pedro Alves
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-07-10 17:15 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Thu, Jul 10, 2008 at 04:27:49PM +0100, Pedro Alves wrote:
> > This (the thread_db_attach_lwp version) looks reasonable to me.  Ugly,
> > but reasonable.  Why do we need the parent's data?
> 
> Due to this:
> 
>  (gdb) r&
>  Starting program: /home/pedro/gdb/tests/threads32
>  (gdb) [Thread debugging using libthread_db enabled]
>  [New Thread 0xf7df0b90 (LWP 24154)]
>  [New Thread 0xf75efb90 (LWP 24155)]
>  info threads
>    3 Thread 0xf75efb90 (LWP 24155)  (running)
>    2 Thread 0xf7df0b90 (LWP 24154)  (running)
>  * 1 LWP 24151  (running)

Why didn't this thread get identified at the shared library event,
when libthread_db was loaded?  It already existed by then, being the
main thread.

> The issue here is that have_threads returns true here:
> 
> linux-thread-db.c:thread_db_wait
> ...
>   /* If we do not know about the main thread yet, this would be a good time to
>      find it.  */
>   if (ourstatus->kind == TARGET_WAITKIND_STOPPED && !have_threads ())
>     thread_db_find_new_threads ();
> 
> ... because there are already threads that thread_db learned about,
> so we'd not look for info regarding the main thread.

Which ought to fix this too; if we identify threads as soon as
libthread_db is activated then we won't reach this situation.  If
there are other places where we add a newly created thread without
walking all threads, then they can get a call similar to the above
(that's for the static application case where we won't get a handy
shared library event for libpthread.so).
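
To make the failure mode quoted above concrete, here is a toy model in plain C. It is not GDB code: `toy_thread`, `toy_have_threads` and the two `toy_wait_*` helpers are hypothetical names, and the model assumes that have_threads () means "at least one thread already carries thread_db private data". Under that assumption it shows why a `!have_threads ()` guard stops looking as soon as any child has been claimed, which leaves the main thread listed as a bare LWP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one entry in the thread table.  */
struct toy_thread
{
  long lwp;           /* kernel LWP id */
  unsigned long tid;  /* thread library id, 0 while still unknown */
  int claimed;        /* nonzero once the thread_db layer owns the entry */
};

/* Assumed meaning of have_threads (): some entry is already claimed.  */
static int
toy_have_threads (const struct toy_thread *t, size_t n)
{
  assert (t != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (t[i].claimed)
      return 1;
  return 0;
}

/* Claim every unclaimed entry; return how many were newly claimed.  */
static size_t
toy_claim_all (struct toy_thread *t, size_t n)
{
  size_t newly_claimed = 0;

  for (size_t i = 0; i < n; i++)
    if (!t[i].claimed)
      {
	t[i].claimed = 1;
	newly_claimed++;
      }
  return newly_claimed;
}

/* Guarded variant: only walks the table when nothing is claimed yet.  */
static size_t
toy_wait_guarded (struct toy_thread *t, size_t n)
{
  if (!toy_have_threads (t, n))
    return toy_claim_all (t, n);
  return 0;
}

/* Unguarded variant: always looks for entries nobody has claimed.  */
static size_t
toy_wait_unguarded (struct toy_thread *t, size_t n)
{
  return toy_claim_all (t, n);
}
```

Fed the table from the `info threads` output above (LWP 24151 unclaimed, 24154 and 24155 already claimed), the guarded variant claims nothing, while the unguarded walk picks up the main thread.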

-- 
Daniel Jacobowitz
CodeSourcery



* Re: [non-stop] 08/10 linux native support
  2008-07-10 17:15             ` Daniel Jacobowitz
@ 2008-07-10 18:01               ` Pedro Alves
  2008-07-10 19:59                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 20+ messages in thread
From: Pedro Alves @ 2008-07-10 18:01 UTC (permalink / raw)
  To: gdb-patches; +Cc: Daniel Jacobowitz

On Thursday 10 July 2008 18:15:18, Daniel Jacobowitz wrote:
> On Thu, Jul 10, 2008 at 04:27:49PM +0100, Pedro Alves wrote:
> > > This (the thread_db_attach_lwp version) looks reasonable to me.  Ugly,
> > > but reasonable.  Why do we need the parent's data?
> >
> > Due to this:
> >
> >  (gdb) r&
> >  Starting program: /home/pedro/gdb/tests/threads32
> >  (gdb) [Thread debugging using libthread_db enabled]
> >  [New Thread 0xf7df0b90 (LWP 24154)]
> >  [New Thread 0xf75efb90 (LWP 24155)]
> >  info threads
> >    3 Thread 0xf75efb90 (LWP 24155)  (running)
> >    2 Thread 0xf7df0b90 (LWP 24154)  (running)
> >  * 1 LWP 24151  (running)
>
> Why didn't this thread get identified at the shared library event,
> when libthread_db was loaded?  It already existed by then, being the
> main thread.

Because we hit this in find_new_threads_callback:

  if (ti.ti_tid == 0)
    {
      /* A thread ID of zero means that this is the main thread, but
	 glibc has not yet initialized thread-local storage and the
	 pthread library.  We do not know what the thread's TID will
	 be yet.  Just enable event reporting and otherwise ignore
	 it.  */

#0  find_new_threads_callback (th_p=0xffb316c4, data=0x0) at ../../src/gdb/linux-thread-db.c:1011
#1  0xf7dfbb59 in ?? () from /lib32/libthread_db.so.1
#2  0xf7dfbc11 in td_ta_thr_iter () from /lib32/libthread_db.so.1
#3  0x080a8dec in thread_db_find_new_threads () at ../../src/gdb/linux-thread-db.c:1044
#4  0x080a8371 in check_for_thread_db () at ../../src/gdb/linux-thread-db.c:665
#5  0x080a83af in thread_db_new_objfile (objfile=0x84a6cf8) at ../../src/gdb/linux-thread-db.c:679
...

That's from here:

649       /* Now attempt to open a connection to the thread library.  */
650       err = td_ta_new_p (&proc_handle, &thread_agent);
651       switch (err)
652         {
653         case TD_NOLIBTHREAD:
654           /* No thread library was detected.  */
655           break;
656
657         case TD_OK:
658           printf_unfiltered (_("[Thread debugging using libthread_db enabled]\n"));
659
660           /* The thread library was detected.  Activate the thread_db target.  */
661           push_target (&thread_db_ops);
662           using_thread_db = 1;
663
664           enable_thread_event_reporting ();
665           thread_db_find_new_threads ();
666           break;
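
As a rough illustration of why the walk at line 665 finds nothing at that point, the sketch below mimics only the `ti_tid == 0` test from the callback. `toy_thrinfo` and `toy_count_attachable` are hypothetical names, and the struct keeps just two of the fields a real td_thrinfo_t has; treat it as a model of the skip, not as the callback itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two td_thrinfo_t fields that matter here.  */
struct toy_thrinfo
{
  unsigned long ti_tid;  /* 0 until the pthread library has initialized */
  long ti_lid;           /* kernel LWP id */
};

/* Count the entries a callback with the ti_tid == 0 test would attach.
   Entries whose thread id is still zero are skipped, as in the comment
   quoted above.  */
static size_t
toy_count_attachable (const struct toy_thrinfo *infos, size_t n)
{
  size_t attachable = 0;

  assert (infos != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      if (infos[i].ti_tid == 0)
	continue;  /* Main thread before library init: ignore for now.  */
      attachable++;
    }
  return attachable;
}
```

At the shared library event the only entry is the main thread with a zero id, so the count is zero; once the id is filled in, the same walk would attach it.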

-- 
Pedro Alves



* Re: [non-stop] 08/10 linux native support
  2008-07-10 18:01               ` Pedro Alves
@ 2008-07-10 19:59                 ` Daniel Jacobowitz
  2008-07-10 21:51                   ` Pedro Alves
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-07-10 19:59 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Thu, Jul 10, 2008 at 07:01:23PM +0100, Pedro Alves wrote:
> Because we hit this in find_new_threads_callback:
> 
>   if (ti.ti_tid == 0)
>     {
>       /* A thread ID of zero means that this is the main thread, but
> 	 glibc has not yet initialized thread-local storage and the
> 	 pthread library.  We do not know what the thread's TID will
> 	 be yet.  Just enable event reporting and otherwise ignore
> 	 it.  */

Right.  Could you try this version?

Basically the same as your previous posting, except that I moved the
logic assuring we find the first thread when we find the first child
into the thread-db layer.

-- 
Daniel Jacobowitz
CodeSourcery

2008-07-02  Pedro Alves  <pedro@codesourcery.com>

	Non-stop linux native.

	* linux-fork.c (linux_fork_killall): Use SIGKILL instead of
	PTRACE_KILL.

	* linux-nat.c (linux_test_for_tracefork): Block events while we're
	here.
	(find_lwp_pid): Rename to...
	(linux_nat_find_lwp_pid): ... this.  Make public.  Update all
	callers.
	(get_pending_status): Implement non-stop mode.
	(linux_nat_detach): Stop threads before detaching.
	(linux_nat_resume): In non-stop mode, always resume only a single
	PTID.
	(linux_handle_extended_wait): On a clone event, add new lwp to
	GDB's thread table, and mark as running, executing and stopped
	appropriately.
	(linux_nat_filter_event): Don't assume there are other running
	threads when a thread exits.
	(linux_nat_wait): Mark the main thread as running and executing.
	In non-stop mode, don't stop all lwps.
	(linux_nat_kill): Stop lwps before killing them.
	(linux_nat_thread_alive): Use signal 0 to detect if a thread is
	alive.
	(send_sigint_callback): New.
	(linux_nat_stop): New.
	(linux_nat_add_target): Set to_stop to linux_nat_stop.

	* linux-nat.h (thread_db_attach_lwp): Declare.
	(linux_nat_find_lwp_pid): Declare.

	* linux-thread-db.c (check_event): Set proc_handle.pid to the stopped
	lwp.
	(thread_from_lwp, enable_thread_event): Likewise.  Check for new threads
	if we have none.
	(thread_db_attach_lwp): New.
	(thread_db_find_new_threads_1): New, from thread_db_find_new_threads.
	If current lwp is executing, don't try to read from it.
	(thread_db_find_new_threads, thread_db_wait)
	(thread_db_get_thread_local_address): Use thread_db_find_new_threads_1.
	(thread_get_info_callback): Check for new threads if we have none.

--- src/gdb/linux-fork.c	24 Apr 2008 10:21:44 -0000	1.18
+++ src/gdb/linux-fork.c	10 Jul 2008 19:53:11 -0000
@@ -337,7 +337,9 @@ linux_fork_killall (void)
     {
       pid = PIDGET (fp->ptid);
       do {
-	ptrace (PT_KILL, pid, 0, 0);
+	/* Use SIGKILL instead of PTRACE_KILL because the former works even
+	   if the thread is running, while the latter doesn't.  */
+	kill (pid, SIGKILL);
 	ret = waitpid (pid, &status, 0);
 	/* We might get a SIGCHLD instead of an exit status.  This is
 	 aggravated by the first kill above - a child has just
--- src/gdb/linux-nat.c	10 Jul 2008 09:30:59 -0000	1.92
+++ src/gdb/linux-nat.c	10 Jul 2008 19:53:11 -0000
@@ -319,6 +319,9 @@ static void linux_nat_async (void (*call
 static int linux_nat_async_mask (int mask);
 static int kill_lwp (int lwpid, int signo);
 
+static int send_sigint_callback (struct lwp_info *lp, void *data);
+static int stop_callback (struct lwp_info *lp, void *data);
+
 /* Captures the result of a successful waitpid call, along with the
    options used in that call.  */
 struct waitpid_result
@@ -521,6 +524,9 @@ linux_test_for_tracefork (int original_p
 {
   int child_pid, ret, status;
   long second_pid;
+  enum sigchld_state async_events_original_state;
+
+  async_events_original_state = linux_nat_async_events (sigchld_sync);
 
   linux_supports_tracefork_flag = 0;
   linux_supports_tracevforkdone_flag = 0;
@@ -551,6 +557,7 @@ linux_test_for_tracefork (int original_p
       if (ret != 0)
 	{
 	  warning (_("linux_test_for_tracefork: failed to kill child"));
+	  linux_nat_async_events (async_events_original_state);
 	  return;
 	}
 
@@ -561,6 +568,7 @@ linux_test_for_tracefork (int original_p
 	warning (_("linux_test_for_tracefork: unexpected wait status 0x%x from "
 		 "killed child"), status);
 
+      linux_nat_async_events (async_events_original_state);
       return;
     }
 
@@ -600,6 +608,8 @@ linux_test_for_tracefork (int original_p
   if (ret != 0)
     warning (_("linux_test_for_tracefork: failed to kill child"));
   my_waitpid (child_pid, &status, 0);
+
+  linux_nat_async_events (async_events_original_state);
 }
 
 /* Return non-zero iff we have tracefork functionality available.
@@ -1019,8 +1029,8 @@ delete_lwp (ptid_t ptid)
 /* Return a pointer to the structure describing the LWP corresponding
    to PID.  If no corresponding LWP could be found, return NULL.  */
 
-static struct lwp_info *
-find_lwp_pid (ptid_t ptid)
+struct lwp_info *
+linux_nat_find_lwp_pid (ptid_t ptid)
 {
   struct lwp_info *lp;
   int lwp;
@@ -1241,7 +1251,7 @@ lin_lwp_attach_lwp (ptid_t ptid)
 
   async_events_original_state = linux_nat_async_events (sigchld_sync);
 
-  lp = find_lwp_pid (ptid);
+  lp = linux_nat_find_lwp_pid (ptid);
 
   /* We assume that we're already attached to any LWP that has an id
      equal to the overall process id, and to any LWP that is already
@@ -1441,16 +1451,80 @@ get_pending_status (struct lwp_info *lp,
      events are always cached in waitpid_queue.  */
 
   *status = 0;
-  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+
+  if (non_stop)
     {
-      if (stop_signal != TARGET_SIGNAL_0
-	  && signal_pass_state (stop_signal))
-	*status = W_STOPCODE (target_signal_to_host (stop_signal));
+      enum target_signal signo = TARGET_SIGNAL_0;
+
+      if (is_executing (lp->ptid))
+	{
+	  /* If the core thought this lwp was executing --- e.g., the
+	     executing property hasn't been updated yet, but the
+	     thread has been stopped with a stop_callback /
+	     stop_wait_callback sequence (see linux_nat_detach for
+	     example) --- we can only have pending events in the local
+	     queue.  */
+	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
+	    {
+	      if (WIFSTOPPED (status))
+		signo = target_signal_from_host (WSTOPSIG (status));
+
+	      /* If not stopped, then the lwp is gone, no use in
+		 resending a signal.  */
+	    }
+	}
+      else
+	{
+	  /* If the core knows the thread is not executing, then we
+	     have the last signal recorded in
+	     thread_info->stop_signal, unless this is inferior_ptid,
+	     in which case, it's in the global stop_signal, due to
+	     context switching.  */
+
+	  if (ptid_equal (lp->ptid, inferior_ptid))
+	    signo = stop_signal;
+	  else
+	    {
+	      struct thread_info *tp = find_thread_pid (lp->ptid);
+	      gdb_assert (tp);
+	      signo = tp->stop_signal;
+	    }
+	}
+
+      if (signo != TARGET_SIGNAL_0
+	  && !signal_pass_state (signo))
+	{
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog, "\
+GPT: lwp %s had signal %s, but it is in no pass state\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
+      else
+	{
+	  if (signo != TARGET_SIGNAL_0)
+	    *status = W_STOPCODE (target_signal_to_host (signo));
+
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"GPT: lwp %s has pending signal %s\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
     }
-  else if (target_can_async_p ())
-    queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
   else
-    *status = lp->status;
+    {
+      if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+	{
+	  if (stop_signal != TARGET_SIGNAL_0
+	      && signal_pass_state (stop_signal))
+	    *status = W_STOPCODE (target_signal_to_host (stop_signal));
+	}
+      else if (target_can_async_p ())
+	queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
+      else
+	*status = lp->status;
+    }
 
   return 0;
 }
@@ -1514,6 +1588,13 @@ linux_nat_detach (char *args, int from_t
   if (target_can_async_p ())
     linux_nat_async (NULL, 0);
 
+  /* Stop all threads before detaching.  ptrace requires that the
+     thread is stopped to successfully detach.  */
+  iterate_over_lwps (stop_callback, NULL);
+  /* ... and wait until all of them have reported back that
+     they're no longer running.  */
+  iterate_over_lwps (stop_wait_callback, NULL);
+
   iterate_over_lwps (detach_callback, NULL);
 
   /* Only the initial process should be left right now.  */
@@ -1603,19 +1684,27 @@ linux_nat_resume (ptid_t ptid, int step,
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);
 
-  if (resume_all)
-    iterate_over_lwps (resume_set_callback, NULL);
-  else
-    iterate_over_lwps (resume_clear_callback, NULL);
+  if (non_stop && resume_all)
+    internal_error (__FILE__, __LINE__,
+		    "can't resume all in non-stop mode");
+
+  if (!non_stop)
+    {
+      if (resume_all)
+	iterate_over_lwps (resume_set_callback, NULL);
+      else
+	iterate_over_lwps (resume_clear_callback, NULL);
+    }
 
   /* If PID is -1, it's the current inferior that should be
      handled specially.  */
   if (PIDGET (ptid) == -1)
     ptid = inferior_ptid;
 
-  lp = find_lwp_pid (ptid);
+  lp = linux_nat_find_lwp_pid (ptid);
   gdb_assert (lp != NULL);
 
+  /* Convert to something the lower layer understands.  */
   ptid = pid_to_ptid (GET_LWP (lp->ptid));
 
   /* Remember if we're stepping.  */
@@ -1766,9 +1855,12 @@ linux_handle_extended_wait (struct lwp_i
 	ourstatus->kind = TARGET_WAITKIND_VFORKED;
       else
 	{
+	  struct cleanup *old_chain;
+
 	  ourstatus->kind = TARGET_WAITKIND_IGNORE;
 	  new_lp = add_lwp (BUILD_LWP (new_pid, GET_PID (inferior_ptid)));
 	  new_lp->cloned = 1;
+	  new_lp->stopped = 1;
 
 	  if (WSTOPSIG (status) != SIGSTOP)
 	    {
@@ -1785,20 +1877,33 @@ linux_handle_extended_wait (struct lwp_i
 	  else
 	    status = 0;
 
-	  if (stopping)
-	    new_lp->stopped = 1;
-	  else
+	  /* "Attach"ing to the child adds it to GDB's thread list, if
+	     thread_db is active.  */
+	  if (!thread_db_attach_lwp (new_lp->ptid))
+	    {
+	      /* We're not using thread_db.  Attach and add it to
+		 GDB's list.  */
+	      lin_lwp_attach_lwp (new_lp->ptid);
+	      target_post_attach (GET_LWP (new_lp->ptid));
+	      add_thread (new_lp->ptid);
+	    }
+
+	  if (!stopping)
 	    {
+	      new_lp->stopped = 0;
 	      new_lp->resumed = 1;
 	      ptrace (PTRACE_CONT,
 		      PIDGET (lp->waitstatus.value.related_pid), 0,
 		      status ? WSTOPSIG (status) : 0);
+	      set_running (new_lp->ptid, 1);
+	      set_executing (new_lp->ptid, 1);
 	    }
 
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog,
 				"LHEW: Got clone event from LWP %ld, resuming\n",
 				GET_LWP (lp->ptid));
+	  lp->stopped = 0;
 	  ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
 
 	  return 1;
@@ -2433,7 +2538,7 @@ linux_nat_filter_event (int lwpid, int s
 {
   struct lwp_info *lp;
 
-  lp = find_lwp_pid (pid_to_ptid (lwpid));
+  lp = linux_nat_find_lwp_pid (pid_to_ptid (lwpid));
 
   /* Check for stop events reported by a process we didn't already
      know about - anything not already in our LWP list.
@@ -2528,13 +2633,7 @@ linux_nat_filter_event (int lwpid, int s
 	 not the end of the debugged application and should be
 	 ignored.  */
       if (num_lwps > 0)
-	{
-	  /* Make sure there is at least one thread running.  */
-	  gdb_assert (iterate_over_lwps (running_callback, NULL));
-
-	  /* Discard the event.  */
-	  return NULL;
-	}
+	return NULL;
     }
 
   /* Check if the current LWP has previously exited.  In the nptl
@@ -2664,6 +2763,8 @@ linux_nat_wait (ptid_t ptid, struct targ
       lp->resumed = 1;
       /* Add the main thread to GDB's thread list.  */
       add_thread_silent (lp->ptid);
+      set_running (lp->ptid, 1);
+      set_executing (lp->ptid, 1);
     }
 
   sigemptyset (&flush_mask);
@@ -2710,7 +2811,7 @@ retry:
 			    target_pid_to_str (ptid));
 
       /* We have a specific LWP to check.  */
-      lp = find_lwp_pid (ptid);
+      lp = linux_nat_find_lwp_pid (ptid);
       gdb_assert (lp);
       status = lp->status;
       lp->status = 0;
@@ -2891,19 +2992,23 @@ retry:
     fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
 			status_to_str (status), target_pid_to_str (lp->ptid));
 
-  /* Now stop all other LWP's ...  */
-  iterate_over_lwps (stop_callback, NULL);
+  if (!non_stop)
+    {
+      /* Now stop all other LWP's ...  */
+      iterate_over_lwps (stop_callback, NULL);
 
-  /* ... and wait until all of them have reported back that they're no
-     longer running.  */
-  iterate_over_lwps (stop_wait_callback, &flush_mask);
-  iterate_over_lwps (flush_callback, &flush_mask);
-
-  /* If we're not waiting for a specific LWP, choose an event LWP from
-     among those that have had events.  Giving equal priority to all
-     LWPs that have had events helps prevent starvation.  */
-  if (pid == -1)
-    select_event_lwp (&lp, &status);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, &flush_mask);
+      iterate_over_lwps (flush_callback, &flush_mask);
+
+      /* If we're not waiting for a specific LWP, choose an event LWP
+	 from among those that have had events.  Giving equal priority
+	 to all LWPs that have had events helps prevent
+	 starvation.  */
+      if (pid == -1)
+	select_event_lwp (&lp, &status);
+    }
 
   /* Now that we've selected our final event LWP, cancel any
      breakpoints in other LWPs that have hit a GDB breakpoint.  See
@@ -3035,6 +3140,13 @@ linux_nat_kill (void)
     }
   else
     {
+      /* Stop all threads before killing them, since ptrace requires
+	 that the thread is stopped to successfully PTRACE_KILL.  */
+      iterate_over_lwps (stop_callback, NULL);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, NULL);
+
       /* Kill all LWP's ...  */
       iterate_over_lwps (kill_callback, NULL);
 
@@ -3087,22 +3199,22 @@ linux_nat_xfer_partial (struct target_op
 static int
 linux_nat_thread_alive (ptid_t ptid)
 {
+  int err;
+
   gdb_assert (is_lwp (ptid));
 
-  errno = 0;
-  ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0);
+  /* Send signal 0 instead of anything ptrace, because ptracing a
+     running thread errors out claiming that the thread doesn't
+     exist.  */
+  err = kill_lwp (GET_LWP (ptid), 0);
+
   if (debug_linux_nat)
     fprintf_unfiltered (gdb_stdlog,
-			"LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n",
+			"LLTA: KILL(SIG0) %s (%s)\n",
 			target_pid_to_str (ptid),
-			errno ? safe_strerror (errno) : "OK");
+			err ? safe_strerror (err) : "OK");
 
-  /* Not every Linux kernel implements PTRACE_PEEKUSER.  But we can
-     handle that case gracefully since ptrace will first do a lookup
-     for the process based upon the passed-in pid.  If that fails we
-     will get either -ESRCH or -EPERM, otherwise the child exists and
-     is alive.  */
-  if (errno == ESRCH || errno == EPERM)
+  if (err != 0)
     return 0;
 
   return 1;
@@ -4304,6 +4416,35 @@ linux_nat_set_async_mode (int on)
   linux_nat_async_enabled = on;
 }
 
+static int
+send_sigint_callback (struct lwp_info *lp, void *data)
+{
+  /* Use is_running instead of !lp->stopped, because the lwp may be
+     stopped due to an internal event, and we want to interrupt it in
+     that case too.  What we want is to check if the thread is stopped
+     from the point of view of the user.  */
+  if (is_running (lp->ptid))
+    kill_lwp (GET_LWP (lp->ptid), SIGINT);
+  return 0;
+}
+
+static void
+linux_nat_stop (ptid_t ptid)
+{
+  if (non_stop)
+    {
+      if (ptid_equal (ptid, minus_one_ptid))
+	iterate_over_lwps (send_sigint_callback, &ptid);
+      else
+	{
+	  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
+	  send_sigint_callback (lp, NULL);
+	}
+    }
+  else
+    linux_ops->to_stop (ptid);
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4334,6 +4475,9 @@ linux_nat_add_target (struct target_ops 
   t->to_terminal_inferior = linux_nat_terminal_inferior;
   t->to_terminal_ours = linux_nat_terminal_ours;
 
+  /* Methods for non-stop support.  */
+  t->to_stop = linux_nat_stop;
+
   /* We don't change the stratum; this target will sit at
      process_stratum and thread_db will set at thread_stratum.  This
      is a little strange, since this is a multi-threaded-capable
@@ -4361,7 +4505,7 @@ linux_nat_set_new_thread (struct target_
 struct siginfo *
 linux_nat_get_siginfo (ptid_t ptid)
 {
-  struct lwp_info *lp = find_lwp_pid (ptid);
+  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
 
   gdb_assert (lp != NULL);
 
--- src/gdb/linux-nat.h	21 Mar 2008 15:44:53 -0000	1.23
+++ src/gdb/linux-nat.h	10 Jul 2008 19:53:11 -0000
@@ -94,6 +94,8 @@ void check_for_thread_db (void);
 /* Tell the thread_db layer what native target operations to use.  */
 void thread_db_init (struct target_ops *);
 
+int thread_db_attach_lwp (ptid_t ptid);
+
 /* Find process PID's pending signal set from /proc/pid/status.  */
 void linux_proc_pending_signals (int pid, sigset_t *pending, sigset_t *blocked, sigset_t *ignored);
 
@@ -107,6 +109,8 @@ struct lwp_info *iterate_over_lwps (int 
 						     void *), 
 				    void *data);
 
+struct lwp_info *linux_nat_find_lwp_pid (ptid_t ptid);
+
 /* Create a prototype generic GNU/Linux target.  The client can
    override it with local methods.  */
 struct target_ops * linux_target (void);
--- src/gdb/linux-thread-db.c	5 Jun 2008 21:03:59 -0000	1.44
+++ src/gdb/linux-thread-db.c	10 Jul 2008 19:53:11 -0000
@@ -142,6 +142,7 @@ static CORE_ADDR td_create_bp_addr;
 static CORE_ADDR td_death_bp_addr;
 
 /* Prototypes for local functions.  */
+static void thread_db_find_new_threads_1 (ptid_t ptid);
 static void thread_db_find_new_threads (void);
 static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
 			   const td_thrinfo_t *ti_p);
@@ -283,7 +284,10 @@ thread_get_info_callback (const td_thrha
   if (thread_info == NULL)
     {
       /* New thread.  Attach to it now (why wait?).  */
-      attach_thread (thread_ptid, thp, &ti);
+      if (!have_threads ())
+	thread_db_find_new_threads_1 (thread_ptid);
+      else
+	attach_thread (thread_ptid, thp, &ti);
       thread_info = find_thread_pid (thread_ptid);
       gdb_assert (thread_info != NULL);
     }
@@ -308,6 +312,8 @@ thread_from_lwp (ptid_t ptid)
      LWP.  */
   gdb_assert (GET_LWP (ptid) != 0);
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
   if (err != TD_OK)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
@@ -332,6 +338,48 @@ thread_from_lwp (ptid_t ptid)
 }
 \f
 
+/* Attach to lwp PTID, doing whatever else is required to have this
+   LWP under the debugger's control --- e.g., enabling event
+   reporting.  Returns true on success.  */
+int
+thread_db_attach_lwp (ptid_t ptid)
+{
+  td_thrhandle_t th;
+  td_thrinfo_t ti;
+  td_err_e err;
+
+  if (!using_thread_db)
+    return 0;
+
+  /* This ptid comes from linux-nat.c, which should always fill in the
+     LWP.  */
+  gdb_assert (GET_LWP (ptid) != 0);
+
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
+  /* If we have only looked at the first thread before libpthread was
+     initialized, we may not know its thread ID yet.  Make sure we do
+     before we add another thread to the list.  */
+  if (!have_threads ())
+    thread_db_find_new_threads_1 (ptid);
+
+  err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
+  if (err != TD_OK)
+    /* Cannot find user-level thread.  */
+    return 0;
+
+  err = td_thr_get_info_p (&th, &ti);
+  if (err != TD_OK)
+    {
+      warning (_("Cannot get thread info: %s"), thread_db_err_str (err));
+      return 0;
+    }
+
+  attach_thread (ptid, &th, &ti);
+  return 1;
+}
+
 void
 thread_db_init (struct target_ops *target)
 {
@@ -418,6 +466,9 @@ enable_thread_event (td_thragent_t *thre
   td_notify_t notify;
   td_err_e err;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
+
   /* Get the breakpoint address for thread EVENT.  */
   err = td_ta_event_addr_p (thread_agent, event, &notify);
   if (err != TD_OK)
@@ -761,6 +812,15 @@ check_event (ptid_t ptid)
   if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr)
     return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
+  /* If we have only looked at the first thread before libpthread was
+     initialized, we may not know its thread ID yet.  Make sure we do
+     before we add another thread to the list.  */
+  if (!have_threads ())
+    thread_db_find_new_threads_1 (ptid);
+
   /* If we are at a create breakpoint, we do not know what new lwp
      was created and cannot specifically locate the event message for it.
      We have to call td_ta_event_getmsg() to get
@@ -844,7 +904,7 @@ thread_db_wait (ptid_t ptid, struct targ
   /* If we do not know about the main thread yet, this would be a good time to
      find it.  */
   if (ourstatus->kind == TARGET_WAITKIND_STOPPED && !have_threads ())
-    thread_db_find_new_threads ();
+    thread_db_find_new_threads_1 (ptid);
 
   if (ourstatus->kind == TARGET_WAITKIND_STOPPED
       && ourstatus->value.sig == TARGET_SIGNAL_TRAP)
@@ -951,11 +1011,21 @@ find_new_threads_callback (const td_thrh
   return 0;
 }
 
+/* Search for new threads, accessing memory through stopped thread
+   PTID.  */
+
 static void
-thread_db_find_new_threads (void)
+thread_db_find_new_threads_1 (ptid_t ptid)
 {
   td_err_e err;
+  struct lwp_info *lp = linux_nat_find_lwp_pid (ptid);
 
+  if (!lp || !lp->stopped)
+    /* In linux, we can only read memory through a stopped lwp.  */
+    return;
+
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   /* Iterate over all user-space threads to discover new threads.  */
   err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL,
 			  TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY,
@@ -964,6 +1034,12 @@ thread_db_find_new_threads (void)
     error (_("Cannot find new threads: %s"), thread_db_err_str (err));
 }
 
+static void
+thread_db_find_new_threads (void)
+{
+  thread_db_find_new_threads_1 (inferior_ptid);
+}
+
 static char *
 thread_db_pid_to_str (ptid_t ptid)
 {
@@ -1015,7 +1091,7 @@ thread_db_get_thread_local_address (ptid
 
   /* If we have not discovered any threads yet, check now.  */
   if (!have_threads ())
-    thread_db_find_new_threads ();
+    thread_db_find_new_threads_1 (ptid);
 
   /* Find the matching thread.  */
   thread_info = find_thread_pid (ptid);



* Re: [non-stop] 08/10 linux native support
  2008-07-10 19:59                 ` Daniel Jacobowitz
@ 2008-07-10 21:51                   ` Pedro Alves
  2008-07-10 22:15                     ` Daniel Jacobowitz
  0 siblings, 1 reply; 20+ messages in thread
From: Pedro Alves @ 2008-07-10 21:51 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

On Thursday 10 July 2008 20:58:55, Daniel Jacobowitz wrote:
> Right.  Could you try this version?

Thanks!

> Basically the same as your previous posting, except that I moved the
> logic assuring we find the first thread when we find the first child
> into the thread-db layer.

Then, this patch cleaned it up a bit further.  Basically, it gets
rid of the find_lwp_pid call in thread_db_find_new_threads.  Instead
I'm using ALL_LWPS, which is already exported.  This gets rid
of the find_lwp_pid -> linux_nat_find_lwp_pid rename throughout, and
removes the need for thread_db_find_new_threads_1.  I then
reimported a couple of comments and cleanups that were on
the last patch, since you had picked up the previous-to-last.
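
The "any stopped lwp" idea can be sketched on its own as below. This is a simplified model, not the code in the attachment: `toy_lwp` and `toy_pick_stopped_lwp` are hypothetical names, and a plain linked-list walk stands in for the ALL_LWPS iteration. It assumes 0 is never a valid LWP id, so 0 can mean "nothing is stopped, bail out".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down LWP list node.  */
struct toy_lwp
{
  long lwpid;            /* kernel LWP id, always positive */
  int stopped;           /* nonzero if the LWP is currently stopped */
  struct toy_lwp *next;
};

/* Return the id of the first stopped LWP in LIST, or 0 if every LWP is
   running (or the list is empty), in which case the caller should not
   try to read memory at all.  */
static long
toy_pick_stopped_lwp (const struct toy_lwp *list)
{
  const struct toy_lwp *lp;

  for (lp = list; lp != NULL; lp = lp->next)
    {
      assert (lp->lwpid > 0);  /* 0 is reserved for "none found".  */
      if (lp->stopped)
	return lp->lwpid;
    }
  return 0;
}
```

A caller would use the returned id to choose which LWP to read memory through, and return early on 0 instead of reading through a running thread.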

Otherwise, the logic is the same.

Regtested on x86_64-unknown-linux-gnu.

OK?

-- 
Pedro Alves

[-- Attachment #2: 008-non_stop_linux.diff --]
[-- Type: text/x-diff, Size: 19508 bytes --]

2008-07-10  Pedro Alves  <pedro@codesourcery.com>

	Non-stop linux native.

	* linux-nat.c (linux_test_for_tracefork): Block events while we're
	here.
	(find_lwp_pid): Rename to...
	(linux_nat_find_lwp_pid): ... this.  Make public.  Update all
	callers.
	(get_pending_status): Implement non-stop mode.
	(linux_nat_detach): Stop threads before detaching.
	(linux_nat_resume): In non-stop mode, always resume only a single
	PTID.
	(linux_handle_extended_wait): On a clone event, in non-stop mode,
	add new lwp to GDB's thread table, and mark as running, executing
	and stopped appropriately.
	(linux_nat_filter_event): Don't assume there are other running
	threads when a thread exits.
	(linux_nat_wait): Mark the main thread as running and executing.
	In non-stop mode, don't stop all lwps.
	(linux_nat_kill): Stop lwps before killing them.
	(linux_nat_thread_alive): Use signal 0 to detect if a thread is
	alive.
	(send_sigint_callback): New.
	(linux_nat_stop): New.
	(linux_nat_add_target): Set to_stop to linux_nat_stop.

	* linux-nat.h (thread_db_attach_lwp): Declare.

	* linux-thread-db.c (thread_get_info_callback): Check for new
	threads if we have none.
	(thread_from_lwp, enable_thread_event): Set proc_handle.pid to the
	stopped lwp.  Check for new threads if we have none.
	(thread_db_attach_lwp): New.
	(thread_db_init): Set proc_handle.pid to inferior_ptid.
	(check_event): Set proc_handle.pid to the stopped lwp.
	(thread_db_find_new_threads): Set proc_handle.pid to any stopped
	lwp available, bail out if there is none.

	* linux-fork.c (linux_fork_killall): Use SIGKILL instead of
	PTRACE_KILL.

---
 gdb/linux-fork.c      |    4 
 gdb/linux-nat.c       |  246 ++++++++++++++++++++++++++++++++++++++++----------
 gdb/linux-nat.h       |    2 
 gdb/linux-thread-db.c |   77 +++++++++++++++
 4 files changed, 282 insertions(+), 47 deletions(-)

Index: src/gdb/linux-nat.c
===================================================================
--- src.orig/gdb/linux-nat.c	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-nat.c	2008-07-10 22:19:40.000000000 +0100
@@ -285,6 +285,9 @@ static void linux_nat_async (void (*call
 static int linux_nat_async_mask (int mask);
 static int kill_lwp (int lwpid, int signo);
 
+static int send_sigint_callback (struct lwp_info *lp, void *data);
+static int stop_callback (struct lwp_info *lp, void *data);
+
 /* Captures the result of a successful waitpid call, along with the
    options used in that call.  */
 struct waitpid_result
@@ -487,6 +490,9 @@ linux_test_for_tracefork (int original_p
 {
   int child_pid, ret, status;
   long second_pid;
+  enum sigchld_state async_events_original_state;
+
+  async_events_original_state = linux_nat_async_events (sigchld_sync);
 
   linux_supports_tracefork_flag = 0;
   linux_supports_tracevforkdone_flag = 0;
@@ -517,6 +523,7 @@ linux_test_for_tracefork (int original_p
       if (ret != 0)
 	{
 	  warning (_("linux_test_for_tracefork: failed to kill child"));
+	  linux_nat_async_events (async_events_original_state);
 	  return;
 	}
 
@@ -527,6 +534,7 @@ linux_test_for_tracefork (int original_p
 	warning (_("linux_test_for_tracefork: unexpected wait status 0x%x from "
 		 "killed child"), status);
 
+      linux_nat_async_events (async_events_original_state);
       return;
     }
 
@@ -566,6 +574,8 @@ linux_test_for_tracefork (int original_p
   if (ret != 0)
     warning (_("linux_test_for_tracefork: failed to kill child"));
   my_waitpid (child_pid, &status, 0);
+
+  linux_nat_async_events (async_events_original_state);
 }
 
 /* Return non-zero iff we have tracefork functionality available.
@@ -1376,16 +1386,80 @@ get_pending_status (struct lwp_info *lp,
      events are always cached in waitpid_queue.  */
 
   *status = 0;
-  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+
+  if (non_stop)
     {
-      if (stop_signal != TARGET_SIGNAL_0
-	  && signal_pass_state (stop_signal))
-	*status = W_STOPCODE (target_signal_to_host (stop_signal));
+      enum target_signal signo = TARGET_SIGNAL_0;
+
+      if (is_executing (lp->ptid))
+	{
+	  /* If the core thought this lwp was executing --- e.g., the
+	     executing property hasn't been updated yet, but the
+	     thread has been stopped with a stop_callback /
+	     stop_wait_callback sequence (see linux_nat_detach for
+	     example) --- we can only have pending events in the local
+	     queue.  */
+	  if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
+	    {
+	      if (WIFSTOPPED (status))
+		signo = target_signal_from_host (WSTOPSIG (status));
+
+	      /* If not stopped, then the lwp is gone, no use in
+		 resending a signal.  */
+	    }
+	}
+      else
+	{
+	  /* If the core knows the thread is not executing, then we
+	     have the last signal recorded in
+	     thread_info->stop_signal, unless this is inferior_ptid,
+	     in which case, it's in the global stop_signal, due to
+	     context switching.  */
+
+	  if (ptid_equal (lp->ptid, inferior_ptid))
+	    signo = stop_signal;
+	  else
+	    {
+	      struct thread_info *tp = find_thread_pid (lp->ptid);
+	      gdb_assert (tp);
+	      signo = tp->stop_signal;
+	    }
+	}
+
+      if (signo != TARGET_SIGNAL_0
+	  && !signal_pass_state (signo))
+	{
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog, "\
+GPT: lwp %s had signal %s, but it is in no pass state\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
+      else
+	{
+	  if (signo != TARGET_SIGNAL_0)
+	    *status = W_STOPCODE (target_signal_to_host (signo));
+
+	  if (debug_linux_nat)
+	    fprintf_unfiltered (gdb_stdlog,
+				"GPT: lwp %s has pending signal %s\n",
+				target_pid_to_str (lp->ptid),
+				target_signal_to_string (signo));
+	}
     }
-  else if (target_can_async_p ())
-    queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
   else
-    *status = lp->status;
+    {
+      if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+	{
+	  if (stop_signal != TARGET_SIGNAL_0
+	      && signal_pass_state (stop_signal))
+	    *status = W_STOPCODE (target_signal_to_host (stop_signal));
+	}
+      else if (target_can_async_p ())
+	queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
+      else
+	*status = lp->status;
+    }
 
   return 0;
 }
@@ -1449,6 +1523,13 @@ linux_nat_detach (char *args, int from_t
   if (target_can_async_p ())
     linux_nat_async (NULL, 0);
 
+  /* Stop all threads before detaching.  ptrace requires that the
+     thread is stopped to successfully detach.  */
+  iterate_over_lwps (stop_callback, NULL);
+  /* ... and wait until all of them have reported back that
+     they're no longer running.  */
+  iterate_over_lwps (stop_wait_callback, NULL);
+
   iterate_over_lwps (detach_callback, NULL);
 
   /* Only the initial process should be left right now.  */
@@ -1538,10 +1619,17 @@ linux_nat_resume (ptid_t ptid, int step,
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);
 
-  if (resume_all)
-    iterate_over_lwps (resume_set_callback, NULL);
-  else
-    iterate_over_lwps (resume_clear_callback, NULL);
+  if (non_stop && resume_all)
+    internal_error (__FILE__, __LINE__,
+		    "can't resume all in non-stop mode");
+
+  if (!non_stop)
+    {
+      if (resume_all)
+	iterate_over_lwps (resume_set_callback, NULL);
+      else
+	iterate_over_lwps (resume_clear_callback, NULL);
+    }
 
   /* If PID is -1, it's the current inferior that should be
      handled specially.  */
@@ -1551,6 +1639,7 @@ linux_nat_resume (ptid_t ptid, int step,
   lp = find_lwp_pid (ptid);
   gdb_assert (lp != NULL);
 
+  /* Convert to something the lower layer understands.  */
   ptid = pid_to_ptid (GET_LWP (lp->ptid));
 
   /* Remember if we're stepping.  */
@@ -1701,9 +1790,12 @@ linux_handle_extended_wait (struct lwp_i
 	ourstatus->kind = TARGET_WAITKIND_VFORKED;
       else
 	{
+	  struct cleanup *old_chain;
+
 	  ourstatus->kind = TARGET_WAITKIND_IGNORE;
 	  new_lp = add_lwp (BUILD_LWP (new_pid, GET_PID (inferior_ptid)));
 	  new_lp->cloned = 1;
+	  new_lp->stopped = 1;
 
 	  if (WSTOPSIG (status) != SIGSTOP)
 	    {
@@ -1720,13 +1812,38 @@ linux_handle_extended_wait (struct lwp_i
 	  else
 	    status = 0;
 
-	  if (stopping)
-	    new_lp->stopped = 1;
-	  else
+	  if (non_stop)
+	    {
+	      /* Add the new thread to GDB's lists as soon as possible
+		 so that:
+
+		 1) the frontend doesn't have to wait for a stop to
+		 display them, and,
+
+		 2) we tag it with the correct running state.  */
+
+	      /* If the thread_db layer is active, let it know about
+		 this new thread, and add it to GDB's list.  */
+	      if (!thread_db_attach_lwp (new_lp->ptid))
+		{
+		  /* We're not using thread_db.  Add it to GDB's
+		     list.  */
+		  target_post_attach (GET_LWP (new_lp->ptid));
+		  add_thread (new_lp->ptid);
+		}
+
+	      if (!stopping)
+		{
+		  set_running (new_lp->ptid, 1);
+		  set_executing (new_lp->ptid, 1);
+		}
+	    }
+
+	  if (!stopping)
 	    {
+	      new_lp->stopped = 0;
 	      new_lp->resumed = 1;
-	      ptrace (PTRACE_CONT,
-		      PIDGET (lp->waitstatus.value.related_pid), 0,
+	      ptrace (PTRACE_CONT, new_pid, 0,
 		      status ? WSTOPSIG (status) : 0);
 	    }
 
@@ -2463,13 +2580,7 @@ linux_nat_filter_event (int lwpid, int s
 	 not the end of the debugged application and should be
 	 ignored.  */
       if (num_lwps > 0)
-	{
-	  /* Make sure there is at least one thread running.  */
-	  gdb_assert (iterate_over_lwps (running_callback, NULL));
-
-	  /* Discard the event.  */
-	  return NULL;
-	}
+	return NULL;
     }
 
   /* Check if the current LWP has previously exited.  In the nptl
@@ -2599,6 +2710,8 @@ linux_nat_wait (ptid_t ptid, struct targ
       lp->resumed = 1;
       /* Add the main thread to GDB's thread list.  */
       add_thread_silent (lp->ptid);
+      set_running (lp->ptid, 1);
+      set_executing (lp->ptid, 1);
     }
 
   sigemptyset (&flush_mask);
@@ -2826,19 +2939,23 @@ retry:
     fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
 			status_to_str (status), target_pid_to_str (lp->ptid));
 
-  /* Now stop all other LWP's ...  */
-  iterate_over_lwps (stop_callback, NULL);
+  if (!non_stop)
+    {
+      /* Now stop all other LWP's ...  */
+      iterate_over_lwps (stop_callback, NULL);
 
-  /* ... and wait until all of them have reported back that they're no
-     longer running.  */
-  iterate_over_lwps (stop_wait_callback, &flush_mask);
-  iterate_over_lwps (flush_callback, &flush_mask);
-
-  /* If we're not waiting for a specific LWP, choose an event LWP from
-     among those that have had events.  Giving equal priority to all
-     LWPs that have had events helps prevent starvation.  */
-  if (pid == -1)
-    select_event_lwp (&lp, &status);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, &flush_mask);
+      iterate_over_lwps (flush_callback, &flush_mask);
+
+      /* If we're not waiting for a specific LWP, choose an event LWP
+	 from among those that have had events.  Giving equal priority
+	 to all LWPs that have had events helps prevent
+	 starvation.  */
+      if (pid == -1)
+	select_event_lwp (&lp, &status);
+    }
 
   /* Now that we've selected our final event LWP, cancel any
      breakpoints in other LWPs that have hit a GDB breakpoint.  See
@@ -2970,6 +3087,13 @@ linux_nat_kill (void)
     }
   else
     {
+      /* Stop all threads before killing them, since ptrace requires
+	 that the thread is stopped to successfully PTRACE_KILL.  */
+      iterate_over_lwps (stop_callback, NULL);
+      /* ... and wait until all of them have reported back that
+	 they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, NULL);
+
       /* Kill all LWP's ...  */
       iterate_over_lwps (kill_callback, NULL);
 
@@ -3022,22 +3146,22 @@ linux_nat_xfer_partial (struct target_op
 static int
 linux_nat_thread_alive (ptid_t ptid)
 {
+  int err;
+
   gdb_assert (is_lwp (ptid));
 
-  errno = 0;
-  ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0);
+  /* Send signal 0 instead of anything ptrace, because ptracing a
+     running thread errors out claiming that the thread doesn't
+     exist.  */
+  err = kill_lwp (GET_LWP (ptid), 0);
+
   if (debug_linux_nat)
     fprintf_unfiltered (gdb_stdlog,
-			"LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n",
+			"LLTA: KILL(SIG0) %s (%s)\n",
 			target_pid_to_str (ptid),
-			errno ? safe_strerror (errno) : "OK");
+			err ? safe_strerror (err) : "OK");
 
-  /* Not every Linux kernel implements PTRACE_PEEKUSER.  But we can
-     handle that case gracefully since ptrace will first do a lookup
-     for the process based upon the passed-in pid.  If that fails we
-     will get either -ESRCH or -EPERM, otherwise the child exists and
-     is alive.  */
-  if (errno == ESRCH || errno == EPERM)
+  if (err != 0)
     return 0;
 
   return 1;
@@ -4239,6 +4363,35 @@ linux_nat_set_async_mode (int on)
   linux_nat_async_enabled = on;
 }
 
+static int
+send_sigint_callback (struct lwp_info *lp, void *data)
+{
+  /* Use is_running instead of !lp->stopped, because the lwp may be
+     stopped due to an internal event, and we want to interrupt it in
+     that case too.  What we want is to check if the thread is stopped
+     from the point of view of the user.  */
+  if (is_running (lp->ptid))
+    kill_lwp (GET_LWP (lp->ptid), SIGINT);
+  return 0;
+}
+
+static void
+linux_nat_stop (ptid_t ptid)
+{
+  if (non_stop)
+    {
+      if (ptid_equal (ptid, minus_one_ptid))
+	iterate_over_lwps (send_sigint_callback, &ptid);
+      else
+	{
+	  struct lwp_info *lp = find_lwp_pid (ptid);
+	  send_sigint_callback (lp, NULL);
+	}
+    }
+  else
+    linux_ops->to_stop (ptid);
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4269,6 +4422,9 @@ linux_nat_add_target (struct target_ops 
   t->to_terminal_inferior = linux_nat_terminal_inferior;
   t->to_terminal_ours = linux_nat_terminal_ours;
 
+  /* Methods for non-stop support.  */
+  t->to_stop = linux_nat_stop;
+
   /* We don't change the stratum; this target will sit at
      process_stratum and thread_db will set at thread_stratum.  This
      is a little strange, since this is a multi-threaded-capable
Index: src/gdb/linux-nat.h
===================================================================
--- src.orig/gdb/linux-nat.h	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-nat.h	2008-07-10 22:14:30.000000000 +0100
@@ -94,6 +94,8 @@ void check_for_thread_db (void);
 /* Tell the thread_db layer what native target operations to use.  */
 void thread_db_init (struct target_ops *);
 
+int thread_db_attach_lwp (ptid_t ptid);
+
 /* Find process PID's pending signal set from /proc/pid/status.  */
 void linux_proc_pending_signals (int pid, sigset_t *pending, sigset_t *blocked, sigset_t *ignored);
 
Index: src/gdb/linux-thread-db.c
===================================================================
--- src.orig/gdb/linux-thread-db.c	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-thread-db.c	2008-07-10 22:15:05.000000000 +0100
@@ -283,7 +283,10 @@ thread_get_info_callback (const td_thrha
   if (thread_info == NULL)
     {
       /* New thread.  Attach to it now (why wait?).  */
-      attach_thread (thread_ptid, thp, &ti);
+      if (!have_threads ())
+	thread_db_find_new_threads ();
+      else
+	attach_thread (thread_ptid, thp, &ti);
       thread_info = find_thread_pid (thread_ptid);
       gdb_assert (thread_info != NULL);
     }
@@ -308,6 +311,8 @@ thread_from_lwp (ptid_t ptid)
      LWP.  */
   gdb_assert (GET_LWP (ptid) != 0);
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
   if (err != TD_OK)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
@@ -332,6 +337,48 @@ thread_from_lwp (ptid_t ptid)
 }
 \f
 
+/* Attach to lwp PTID, doing whatever else is required to have this
+   LWP under the debugger's control --- e.g., enabling event
+   reporting.  Returns true on success.  */
+int
+thread_db_attach_lwp (ptid_t ptid)
+{
+  td_thrhandle_t th;
+  td_thrinfo_t ti;
+  td_err_e err;
+
+  if (!using_thread_db)
+    return 0;
+
+  /* This ptid comes from linux-nat.c, which should always fill in the
+     LWP.  */
+  gdb_assert (GET_LWP (ptid) != 0);
+
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
+  /* If we have only looked at the first thread before libpthread was
+     initialized, we may not know its thread ID yet.  Make sure we do
+     before we add another thread to the list.  */
+  if (!have_threads ())
+    thread_db_find_new_threads ();
+
+  err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
+  if (err != TD_OK)
+    /* Cannot find user-level thread.  */
+    return 0;
+
+  err = td_thr_get_info_p (&th, &ti);
+  if (err != TD_OK)
+    {
+      warning (_("Cannot get thread info: %s"), thread_db_err_str (err));
+      return 0;
+    }
+
+  attach_thread (ptid, &th, &ti);
+  return 1;
+}
+
 void
 thread_db_init (struct target_ops *target)
 {
@@ -418,6 +465,9 @@ enable_thread_event (td_thragent_t *thre
   td_notify_t notify;
   td_err_e err;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
+
   /* Get the breakpoint address for thread EVENT.  */
   err = td_ta_event_addr_p (thread_agent, event, &notify);
   if (err != TD_OK)
@@ -761,6 +811,15 @@ check_event (ptid_t ptid)
   if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr)
     return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
+  /* If we have only looked at the first thread before libpthread was
+     initialized, we may not know its thread ID yet.  Make sure we do
+     before we add another thread to the list.  */
+  if (!have_threads ())
+    thread_db_find_new_threads ();
+
   /* If we are at a create breakpoint, we do not know what new lwp
      was created and cannot specifically locate the event message for it.
      We have to call td_ta_event_getmsg() to get
@@ -951,11 +1010,27 @@ find_new_threads_callback (const td_thrh
   return 0;
 }
 
+/* Search for new threads, accessing memory through stopped thread
+   PTID.  */
+
 static void
 thread_db_find_new_threads (void)
 {
   td_err_e err;
+  struct lwp_info *lp;
+  ptid_t ptid;
+
+  /* In linux, we can only read memory through a stopped lwp.  */
+  ALL_LWPS (lp, ptid)
+    if (lp->stopped)
+      break;
+
+  if (!lp)
+    /* There is no stopped thread.  Bail out.  */
+    return;
 
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   /* Iterate over all user-space threads to discover new threads.  */
   err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL,
 			  TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY,
Index: src/gdb/linux-fork.c
===================================================================
--- src.orig/gdb/linux-fork.c	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-fork.c	2008-07-10 22:14:30.000000000 +0100
@@ -337,7 +337,9 @@ linux_fork_killall (void)
     {
       pid = PIDGET (fp->ptid);
       do {
-	ptrace (PT_KILL, pid, 0, 0);
+	/* Use SIGKILL instead of PTRACE_KILL because the former works even
+	   if the thread is running, while the latter doesn't.  */
+	kill (pid, SIGKILL);
 	ret = waitpid (pid, &status, 0);
 	/* We might get a SIGCHLD instead of an exit status.  This is
 	 aggravated by the first kill above - a child has just
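
The linux_nat_thread_alive change in this patch probes a thread with
signal 0 rather than with ptrace.  A minimal sketch of that idea
follows, assuming a whole process rather than a single lwp and using
plain kill(2).  The helper name demo_process_exists is hypothetical,
and treating EPERM as "exists" is a choice made for this sketch; the
patch itself goes through kill_lwp and treats any failure as "not
alive".

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper, not GDB code.  Return 1 if PID names an
   existing process, 0 otherwise.  */
int
demo_process_exists (pid_t pid)
{
  /* A pid of 0 or below addresses process groups in kill(2), so
     insist on a real process id.  */
  assert (pid > 0);

  /* Signal 0 runs the existence and permission checks without
     delivering any signal, and works on a running process.  */
  if (kill (pid, 0) == 0)
    return 1;

  /* EPERM means the process exists but we may not signal it;
     ESRCH means there is no such process.  */
  return errno == EPERM;
}
```

Unlike a ptrace request, this probe does not require the target to be
stopped, which is the property the patch depends on.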


* Re: [non-stop] 08/10 linux native support
  2008-07-10 21:51                   ` Pedro Alves
@ 2008-07-10 22:15                     ` Daniel Jacobowitz
  2008-07-10 23:01                       ` Pedro Alves
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Jacobowitz @ 2008-07-10 22:15 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Thu, Jul 10, 2008 at 10:51:27PM +0100, Pedro Alves wrote:
> Then, this patch cleaned it up a bit further.  Basically, it gets
> rid of the find_lwp_pid call in thread_db_find_new_threads.  Instead
> I'm using ALL_LWPS, which is already exported.  This gets rid
> of the find_lwp_pid -> linux_nat_find_lwp_pid rename throughout, and
> removes the need for thread_db_find_new_threads_1.  I then
> reimported a couple of comments and cleanups that were on
> the last patch, since you had picked up the previous-to-last.
> 
> Otherwise, the logic is the same.
> 
> Regtested on x86_64-unknown-linux-gnu.
> 
> OK?

Looks good to me.

-- 
Daniel Jacobowitz
CodeSourcery



* Re: [non-stop] 08/10 linux native support
  2008-07-10 22:15                     ` Daniel Jacobowitz
@ 2008-07-10 23:01                       ` Pedro Alves
  0 siblings, 0 replies; 20+ messages in thread
From: Pedro Alves @ 2008-07-10 23:01 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

On Thursday 10 July 2008 23:15:28, Daniel Jacobowitz wrote:
> >
> > OK?
>
> Looks good to me.

Checked in.  Thanks!

Non-stop on native x86 Linux is working in HEAD now, but if the
current thread exits, you'll be in trouble.  :-)
The rest of the series takes care of that, and adds the continue -a
and interrupt -a commands.

-- 
Pedro Alves


Thread overview: 20+ messages
2008-06-15 21:10 [non-stop] 08/10 linux native support Pedro Alves
2008-06-25 21:17 ` Daniel Jacobowitz
2008-06-25 22:03   ` Pedro Alves
2008-06-25 22:12     ` Pedro Alves
2008-06-25 22:52       ` Daniel Jacobowitz
2008-06-25 23:08     ` Daniel Jacobowitz
2008-07-02  3:35       ` Pedro Alves
2008-07-07 18:20         ` Daniel Jacobowitz
2008-07-09  3:25           ` Michael Snyder
2008-07-09  3:47             ` Daniel Jacobowitz
2008-07-09  3:55               ` Michael Snyder
2008-07-09  7:55                 ` Mark Kettenis
2008-07-09  7:56             ` Mark Kettenis
2008-07-10 15:28           ` Pedro Alves
2008-07-10 17:15             ` Daniel Jacobowitz
2008-07-10 18:01               ` Pedro Alves
2008-07-10 19:59                 ` Daniel Jacobowitz
2008-07-10 21:51                   ` Pedro Alves
2008-07-10 22:15                     ` Daniel Jacobowitz
2008-07-10 23:01                       ` Pedro Alves
