RFC: nptl threading patch for linux

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* RFC: nptl threading patch for linux
@ 2003-04-24 22:05 J. Johnston
  2003-05-09 22:00 ` Daniel Jacobowitz
  2003-06-02 18:49 ` Michael Snyder
  0 siblings, 2 replies; 12+ messages in thread
From: J. Johnston @ 2003-04-24 22:05 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 4643 bytes --]

The following is the last part of my revised nptl patch that has
been broken up per Daniel J.'s suggestion.  There are no generated
files included in the patch.

A bit of background is needed.  First of all, in the nptl model,
lwps and pids are distinct entities.  If you perform a ps on a
multithreaded application, you will not see its child threads show
up.  This differs from the linuxthreads model whereby an lwp was
actually a pid.  In the nptl model, you cannot issue a kill to
a child thread via its lwp.  If you want to do this, you need to use
the tkill syscall instead.  The action of sending a signal to a specific
lwp is used commonly in lin-lwp.c.  The first part of the nptl change is to
determine at configuration time if we have the tkill syscall and then at run-time,
if we can use it.  A new function kill_lwp has been added which either calls the
tkill syscall or the regular kill function as appropriate.

Another key difference in behavior between nptl and linuxthreads is what happens
when a thread exits.  In the linuxthreads model, each thread that exits causes
an exit event to occur.  The main thread can exit before all of its children.
In the nptl model, only the main thread generates the exit event and it only
does so after all child threads have exited.  So, to determine when an lwp
has exited, we have to constantly check its status when we are able.  When
we get an exit event we have to determine if we are exiting the program or
just one of the threads.  Some additional logic has been added to the exit
checking code such that if we have (lwp == pid), then we stop all active
threads and check if they have already exited.  If they have not exited,
we resume these threads, otherwise, we delete them.

Daniel J. brought up a good point regarding my previous attempt at this logic
whereby I stopped all threads and resumed all threads.  That old logic is wrong
when scheduler_locking is set on.  The new logic addresses this because
it only stops/resumes threads which are not already stopped.  Currently, if we lock
into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
because we will field the exit event and end up with no running threads.  Under nptl, we
never get notified of the thread exiting so for this scenario, we end up hung.
In this instance, there is nothing we can do without a kernel change.  Daniel J.
is looking into a kernel change.  For user threads, we could activate the death event
in thread-db.c and make a low-level call to notify the lwp layer, but this does not
handle lwps created directly by the end-user.  This issue needs to be discussed
further outside of this patch because I want to propose changing the assertion to
allow the user to resume the other threads.  I have tested the change with such a
scenario (locking a thread and running until exit).  For linuxthreads we trigger the
assertion as before.  For nptl, we hang as expected as we never get the exit event.
A bug can be opened for this if the patch is accepted.

The last piece of logic has to do with a new interface needed by the nptl libthread_db
library to get the thread area.  I have grouped this together with the lin-lwp changes
because the lin-lwp changes cannot work without this crucial change.

This patch has been tested for both linuxthreads and with an nptl kernel.

Ok to commit?

-- Jeff J.

2003-04-24  Jeff Johnston  <jjohnstn@redhat.com>

  	* acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
  	* config.in: Regenerated.
	* configure.in: Add test for syscall function and check for
  	__NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
  	* configure: Regenerated.
  	* thread-db.c (check_event): For create/death event breakpoints,
  	loop through all messages to ensure that we read the message
  	corresponding to the breakpoint we are at.
  	* lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
  	<sys/syscall.h>.
  	(kill_lwp): New function that uses tkill syscall or
  	uses kill, depending on whether threading model is nptl or not.
  	All callers of kill() changed to use kill_lwp().
  	(lin_lwp_wait): Make special check when WIFEXITED occurs to
  	see if all threads have already exited in the nptl model.
	(stop_and_resume_callback): New callback function used by the
	lin_lwp_wait thread exit handling code.
  	(stop_wait_callback): Check for threads already having exited and
  	delete such threads fromt the lwp list when discovered.
  	(stop_callback): Don't assert retcode of kill call.

	Roland McGrath  <roland@redhat.com>
  	* i386-linux-nat.c (ps_get_thread_area): New function needed by
  	nptl libthread_db.



[-- Attachment #2: nptl2a.patch --]
[-- Type: text/plain, Size: 10774 bytes --]

Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.43
diff -u -r1.43 lin-lwp.c
--- lin-lwp.c	28 Mar 2003 21:42:41 -0000	1.43
+++ lin-lwp.c	23 Apr 2003 22:52:44 -0000
@@ -24,6 +24,10 @@
 #include "gdb_string.h"
 #include <errno.h>
 #include <signal.h>
+#ifdef HAVE_TKILL_SYSCALL
+#include <unistd.h>
+#include <sys/syscall.h>
+#endif
 #include <sys/ptrace.h>
 #include "gdb_wait.h"
 
@@ -156,6 +160,7 @@
 
 /* Prototypes for local functions.  */
 static int stop_wait_callback (struct lwp_info *lp, void *data);
+static int lin_lwp_thread_alive (ptid_t ptid);
 \f
 /* Convert wait status STATUS to a string.  Used for printing debug
    messages only.  */
@@ -627,6 +632,32 @@
 }
 \f
 
+/* Issue kill to specified lwp.  */
+
+static int tkill_failed;
+
+static int
+kill_lwp (int lwpid, int signo)
+{
+  errno = 0;
+
+/* Use tkill, if possible, in case we are using nptl threads.  If tkill
+   fails, then we are not using nptl threads and we should be using kill.  */
+
+#ifdef HAVE_TKILL_SYSCALL
+  if (!tkill_failed)
+    {
+      int ret = syscall (__NR_tkill, lwpid, signo);
+      if (errno != ENOSYS)
+	return ret;
+      errno = 0;
+      tkill_failed = 1;
+    }
+#endif
+
+  return kill (lwpid, signo);
+}
+
 /* Send a SIGSTOP to LP.  */
 
 static int
@@ -642,8 +673,15 @@
 			      "SC:  kill %s **<SIGSTOP>**\n",
 			      target_pid_to_str (lp->ptid));
 	}
-      ret = kill (GET_LWP (lp->ptid), SIGSTOP);
-      gdb_assert (ret == 0);
+      errno = 0;
+      ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
+      if (debug_lin_lwp)
+	{
+	  fprintf_unfiltered (gdb_stdlog,
+			      "SC:  lwp kill %d %s\n",
+			      ret,
+			      errno ? safe_strerror (errno) : "ERRNO-OK");
+	}
 
       lp->signalled = 1;
       gdb_assert (lp->status == 0);
@@ -667,11 +705,23 @@
 
       gdb_assert (lp->status == 0);
 
-      pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
+      pid = waitpid (GET_LWP (lp->ptid), &status, 0);
       if (pid == -1 && errno == ECHILD)
-	/* OK, the proccess has disappeared.  We'll catch the actual
-	   exit event in lin_lwp_wait.  */
-	return 0;
+	{
+	  pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
+	  if (pid == -1 && errno == ECHILD)
+	    {
+	      /* The thread has previously exited.  We need to delete it now
+	         because in the case of nptl threads, there won't be an
+	         exit event unless it is the main thread.  */
+	      if (debug_lin_lwp)
+		fprintf_unfiltered (gdb_stdlog,
+				    "SWC: %s exited.\n",
+				    target_pid_to_str (lp->ptid));
+	      delete_lwp (lp->ptid);
+	      return 0;
+	    }
+	}
 
       gdb_assert (pid == GET_LWP (lp->ptid));
 
@@ -683,6 +733,7 @@
 			      status_to_str (status));
 	}
 
+      /* Check if the thread has exited.  */
       if (WIFEXITED (status) || WIFSIGNALED (status))
 	{
 	  gdb_assert (num_lwps > 1);
@@ -697,7 +748,31 @@
 				 target_pid_to_str (lp->ptid));
 	    }
 	  if (debug_lin_lwp)
-	    fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
+	    fprintf_unfiltered (gdb_stdlog,
+				"SWC: %s exited.\n",
+				target_pid_to_str (lp->ptid));
+
+	  delete_lwp (lp->ptid);
+	  return 0;
+	}
+
+      /* Check if the current LWP has previously exited.  For nptl threads,
+         there is no exit signal issued for LWPs that are not the
+         main thread so we should check whenever the thread is stopped.  */
+      if (!lin_lwp_thread_alive (lp->ptid))
+	{
+	  if (in_thread_list (lp->ptid))
+	    {
+	      /* Core GDB cannot deal with us deleting the current
+	         thread.  */
+	      if (!ptid_equal (lp->ptid, inferior_ptid))
+		delete_thread (lp->ptid);
+	      printf_unfiltered ("[%s exited]\n",
+				 target_pid_to_str (lp->ptid));
+	    }
+	  if (debug_lin_lwp)
+	    fprintf_unfiltered (gdb_stdlog,
+				"SWC: %s already exited.\n",
 				target_pid_to_str (lp->ptid));
 
 	  delete_lwp (lp->ptid);
@@ -756,7 +831,14 @@
 	      /* If there's another event, throw it back into the queue. */
 	      if (lp->status)
 		{
-		  kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
+		  if (debug_lin_lwp)
+		    {
+		      fprintf_unfiltered (gdb_stdlog,
+					  "SWC: kill %s, %s\n",
+					  target_pid_to_str (lp->ptid),
+					  status_to_str ((int) status));
+		    }
+		  kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
 		}
 	      /* Save the sigtrap event. */
 	      lp->status = status;
@@ -800,7 +882,7 @@
 					  target_pid_to_str (lp->ptid),
 					  status_to_str ((int) status));
 		    }
-		  kill (GET_LWP (lp->ptid), WSTOPSIG (status));
+		  kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
 		}
 	      return 0;
 	    }
@@ -1049,6 +1131,25 @@
 
 #endif
 
+/* Stop an active thread, verify it still exists, then resume it.  */
+
+static int
+stop_and_resume_callback (struct lwp_info *lp, void *data)
+{
+  struct lwp_info *ptr;
+
+  if (!lp->stopped && !lp->signalled)
+    {
+      stop_callback (lp, NULL);
+      stop_wait_callback (lp, NULL);
+      /* Resume if the lwp still exists.  */
+      for (ptr = lwp_list; ptr; ptr = ptr->next)
+	if (lp == ptr)
+	  resume_callback (lp, NULL);
+    }
+  return 0;
+}
+
 static ptid_t
 lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
 {
@@ -1206,10 +1307,61 @@
 		}
 	    }
 
-	  /* Make sure we don't report a TARGET_WAITKIND_EXITED or
-	     TARGET_WAITKIND_SIGNALLED event if there are still LWP's
-	     left in the process.  */
+	  /* Check if the thread has exited.  */
 	  if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
+	    {
+	      if (in_thread_list (lp->ptid))
+		{
+		  /* Core GDB cannot deal with us deleting the current
+		     thread.  */
+		  if (!ptid_equal (lp->ptid, inferior_ptid))
+		    delete_thread (lp->ptid);
+		  printf_unfiltered ("[%s exited]\n",
+				     target_pid_to_str (lp->ptid));
+		}
+
+	      /* If this is the main thread, we must stop all threads and
+	         verify if they are still alive.  This is because in the nptl
+	         thread model, there is no signal issued for exiting LWPs
+	         other than the main thread.  We only get the main thread
+	         exit signal once all child threads have already exited.
+	         If we stop all the threads and use the stop_wait_callback
+	         to check if they have exited we can determine whether this
+	         signal should be ignored or whether it means the end of the
+	         debugged application, regardless of which threading model
+	         is being used.  */
+	      if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
+		{
+		  lp->stopped = 1;
+		  iterate_over_lwps (stop_and_resume_callback, NULL);
+		}
+
+	      if (debug_lin_lwp)
+		fprintf_unfiltered (gdb_stdlog,
+				    "LLW: %s exited.\n",
+				    target_pid_to_str (lp->ptid));
+
+	      delete_lwp (lp->ptid);
+
+	      /* If there is at least one more LWP, then the exit signal
+	         was not the end of the debugged application and should be
+	         ignored.  */
+	      if (num_lwps > 0)
+		{
+		  /* Make sure there is at least one thread running.  */
+		  gdb_assert (iterate_over_lwps (running_callback, NULL));
+
+		  /* Discard the event.  */
+		  status = 0;
+		  continue;
+		}
+	    }
+
+	  /* Check if the current LWP has previously exited.  In the nptl
+	     thread model, LWPs other than the main thread do not issue
+	     signals when they exit so we must check whenever the thread
+	     has stopped.  A similar check is made in stop_wait_callback().  */
+	  if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
 	    {
 	      if (in_thread_list (lp->ptid))
 		{
Index: acconfig.h
===================================================================
RCS file: /cvs/src/src/gdb/acconfig.h,v
retrieving revision 1.24
diff -u -r1.24 acconfig.h
--- acconfig.h	4 Jan 2003 00:34:42 -0000	1.24
+++ acconfig.h	23 Apr 2003 22:52:44 -0000
@@ -95,6 +95,9 @@
 /* Define if using Solaris thread debugging.  */
 #undef HAVE_THREAD_DB_LIB
 
+/* Define if you support the tkill syscall.  */
+#undef HAVE_TKILL_SYSCALL
+
 /* Define on a GNU/Linux system to work around problems in sys/procfs.h.  */
 #undef START_INFERIOR_TRAPS_EXPECTED
 #undef sys_quotactl
Index: configure.in
===================================================================
RCS file: /cvs/src/src/gdb/configure.in,v
retrieving revision 1.126
diff -u -r1.126 configure.in
--- configure.in	26 Feb 2003 15:10:47 -0000	1.126
+++ configure.in	23 Apr 2003 22:52:44 -0000
@@ -384,6 +384,7 @@
 AC_CHECK_FUNCS(setpgid setpgrp)
 AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
 AC_CHECK_FUNCS(socketpair)
+AC_CHECK_FUNCS(syscall)
 
 dnl AC_FUNC_SETPGRP does not work when cross compiling
 dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
@@ -909,6 +910,24 @@
 if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
   AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
             [Define if <thread_db.h> has the TD_NOTALLOC error code.])
+fi
+
+dnl See if we have a sys/syscall header file that has __NR_tkill.
+if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
+   AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
+                  gdb_cv_sys_syscall_h_has_tkill,
+     AC_TRY_COMPILE(
+       [#include <sys/syscall.h>],
+       [int i = __NR_tkill;],
+       gdb_cv_sys_syscall_h_has_tkill=yes,
+       gdb_cv_sys_syscall_h_has_tkill=no
+     )
+   )
+fi
+dnl See if we can issue tkill syscall.
+if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
+  AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
+            [Define if we can use the tkill syscall.])
 fi
 
 dnl Handle optional features that can be enabled.
Index: i386-linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
retrieving revision 1.44
diff -u -r1.44 i386-linux-nat.c
--- i386-linux-nat.c	16 Apr 2003 15:22:02 -0000	1.44
+++ i386-linux-nat.c	23 Apr 2003 22:52:44 -0000
@@ -70,6 +70,9 @@
 /* Defines I386_LINUX_ORIG_EAX_REGNUM.  */
 #include "i386-linux-tdep.h"
 
+/* Defines ps_err_e, struct ps_prochandle.  */
+#include "gdb_proc_service.h"
+
 /* Prototypes for local functions.  */
 static void dummy_sse_values (void);
 
@@ -682,6 +685,21 @@
 	  offsetof (struct user, u_debugreg[regnum]), value);
   if (errno != 0)
     perror_with_name ("Couldn't write debug register");
+}
+
+extern ps_err_e
+ps_get_thread_area(const struct ps_prochandle *ph, 
+		   lwpid_t lwpid, int idx, void **base)
+{
+  unsigned long int desc[3];
+#define PTRACE_GET_THREAD_AREA 25
+
+  if  (ptrace (PTRACE_GET_THREAD_AREA, 
+	       lwpid, (void *) idx, (unsigned long) &desc) < 0)
+    return PS_ERR;
+
+  *(int *)base = desc[1];
+  return PS_OK;
 }
 
 void

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-04-24 22:05 RFC: nptl threading patch for linux J. Johnston
@ 2003-05-09 22:00 ` Daniel Jacobowitz
  2003-05-09 22:36   ` Andrew Cagney
                     ` (2 more replies)
  2003-06-02 18:49 ` Michael Snyder
  1 sibling, 3 replies; 12+ messages in thread
From: Daniel Jacobowitz @ 2003-05-09 22:00 UTC (permalink / raw)
  To: J. Johnston; +Cc: gdb-patches

On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> The following is the last part of my revised nptl patch that has
> been broken up per Daniel J.'s suggestion.  There are no generated
> files included in the patch.

Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
have any of the Red Hat kernels available here at the moment.  It looks
like GDB bellies up around the second thread creation.

A backtrace looks like:
#0  0xffffe402 in ?? ()
#1  0x080e1332 in stop_wait_callback (lp=0x0, data=0xbffff450)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:708
#2  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
#3  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
#4  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870

And that's not just the stack unwinder getting confused.  We really did
recurse until we ran out of stack.

The superficial reason is this:
SWC: Pending event Segmentation Fault (stopped) in LWP 4490

i.e. every time we resume it with no signal it SIGSEGV's again, and we
never get the SIGSTOP.

Here's some more of the log:
(gdb) c
Continuing.
LLR: PTRACE_SINGLESTEP process 4498, 0 (resume event thread)
LLW: waitpid 4498 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4498.
SEL: Select single-step LWP 4498
LLW: trap_ptid is LWP 4498.
RC:  PTRACE_CONT LWP 4497, 0, 0 (resume sibling)
LLR: PTRACE_CONT process 4498, 0 (resume event thread)
LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
SC:  kill LWP 4498 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
LLW: trap_ptid is LWP 4497.
[New Thread 1077276112 (LWP 4499)]
LLAL: PTRACE_ATTACH LWP 4499, 0, 0 (OK)
LLAL: waitpid LWP 4499 received Stopped (signal) (stopped)
LLR: PTRACE_SINGLESTEP process 4497, 0 (resume event thread)
LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
SEL: Select single-step LWP 4497
LLW: trap_ptid is LWP 4497.
RC:  PTRACE_CONT LWP 4499, 0, 0 (resume sibling)
RC:  PTRACE_CONT LWP 4498, 0, 0 (resume sibling)
LLR: PTRACE_CONT process 4497, 0 (resume event thread)
LLW: waitpid 4499 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 4499, 0, 0 (OK)
LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4499.
SC:  kill LWP 4498 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SC:  kill LWP 4497 **<SIGSTOP>**
SC:  lwp kill 0 ERRNO-OK
SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
PTRACE_CONT LWP 4497, 0, 0 (OK)
SWC: Candidate SIGTRAP event in LWP 4497
SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
PTRACE_CONT LWP 4497, 0, 0 (OK)
SWC: Candidate SIGTRAP event in LWP 4497
SWC: waitpid LWP 4497 received Segmentation fault (stopped)
LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
SWC: Pending event Segmentation fault (stopped) in LWP 4497
SWC: PTRACE_CONT LWP 4497, 0, 0 (OK)
SWC: waitpid LWP 4497 received Segmentation fault (stopped)
LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)

A little interpretation: 4497 hits the creation breakpoint.  We atach
to 4499.  4499 hits the common_routine breakpoint.  We stop 4497.  It
hits the breakpoint at thread creation again for the next thread.  We
PTRACE_CONT 4497 again trying to get the SIGSTOP, and get another
SIGTRAP - probably we were backed up from the breakpoint last time so
we hit it again.  We try _again_, and SIGSEGV because we're on the
second byte of a multi-byte instruction, the first byte having been
replaced by a breakpoint.

Life explodes.

So:
  - stop_wait_callback should be fixed to not be so dumb when this
    happens.
  - we need to figure out how we got into this mess.
  - and why the SIGSTOP never showed up.

I avoid this entire foul issue in gdbserver by not backtracking and
resuming the application; instead I just set a flag marking the next
SIGSTOP as "expected".  It's still not perfect but it's a great deal
better.  I can do even better when I have some time to play with
PTRACE_GETSIGINFO.

I'm waiting for GDB to tell me how we got here.  The backtrace is more
than 40K frames, since I forgot to shrink the stack limit.  50K...
170K... ooh!

#174697 0x080e1724 in stop_wait_callback (lp=0x0, data=0xbffff450)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:830
#174698 0x080e033d in iterate_over_lwps (callback=0x80e12d0 <stop_wait_callback>, data=0x1181)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:293
#174699 0x080e251e in lin_lwp_wait (ptid={pid = -1, lwp = 0, tid = 0}, ourstatus=0x72)
    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:1499
#174700 0x08128ca3 in thread_db_wait (ptid={pid = -1, lwp = 0, tid = 0}, ourstatus=0xffffffff)
    at /opt/src/gdb/src-gdblinks/gdb/thread-db.c:846
#174701 0x080bc19e in wait_for_inferior () at /opt/src/gdb/src-gdblinks/gdb/infrun.c:1003
#174702 0x080bbf13 in proceed (addr=3221222720, siggnal=144, step=0)
    at /opt/src/gdb/src-gdblinks/gdb/infrun.c:814
#174703 0x080b8fb0 in continue_command (proc_count_exp=0x0, from_tty=1)
    at /opt/src/gdb/src-gdblinks/gdb/infcmd.c:539

It wasn't worth the wait.  That didn't help much.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-09 22:00 ` Daniel Jacobowitz
@ 2003-05-09 22:36   ` Andrew Cagney
  2003-05-10  0:58     ` Daniel Jacobowitz
  2003-05-09 23:38   ` J. Johnston
  2003-05-10 21:18   ` Daniel Jacobowitz
  2 siblings, 1 reply; 12+ messages in thread
From: Andrew Cagney @ 2003-05-09 22:36 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: J. Johnston, gdb-patches

> On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> 
>> The following is the last part of my revised nptl patch that has
>> been broken up per Daniel J.'s suggestion.  There are no generated
>> files included in the patch.
> 
> 
> Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> have any of the Red Hat kernels available here at the moment.  It looks
> like GDB bellies up around the second thread creation.

Hmm, I've a horrible feeling that people are going to get bogged down 
trying to sort out which GLIBC and Kernel they should all use.

Is there a list of Kernel and GLIBC fixes that are needed?

Andrew





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-09 22:36   ` Andrew Cagney
@ 2003-05-10  0:58     ` Daniel Jacobowitz
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Jacobowitz @ 2003-05-10  0:58 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: J. Johnston, gdb-patches

On Fri, May 09, 2003 at 06:36:29PM -0400, Andrew Cagney wrote:
> >On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> >
> >>The following is the last part of my revised nptl patch that has
> >>been broken up per Daniel J.'s suggestion.  There are no generated
> >>files included in the patch.
> >
> >
> >Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> >have any of the Red Hat kernels available here at the moment.  It looks
> >like GDB bellies up around the second thread creation.
> 
> Hmm, I've a horrible feeling that people are going to get bogged down 
> trying to sort out which GLIBC and Kernel they should all use.
> 
> Is there a list of Kernel and GLIBC fixes that are needed?

I'm just using the most recent releases of NPTL and the Linux kernel,
and the glibc CVS snapshot associated with that release of NPTL.  Today
it's 2.5.69-bk and 0.37.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-09 22:00 ` Daniel Jacobowitz
  2003-05-09 22:36   ` Andrew Cagney
@ 2003-05-09 23:38   ` J. Johnston
  2003-05-10  0:58     ` Daniel Jacobowitz
  2003-05-10 21:18   ` Daniel Jacobowitz
  2 siblings, 1 reply; 12+ messages in thread
From: J. Johnston @ 2003-05-09 23:38 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> 
>>The following is the last part of my revised nptl patch that has
>>been broken up per Daniel J.'s suggestion.  There are no generated
>>files included in the patch.
> 
> 
> Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> have any of the Red Hat kernels available here at the moment.  It looks
> like GDB bellies up around the second thread creation.
>

Is this one of the gdb.threads testcases?  If not, do any of those run
for you and/or can you send me a testcase for the problem below so we can at least
have something common to compare?

-- Jeff J.

> A backtrace looks like:
> #0  0xffffe402 in ?? ()
> #1  0x080e1332 in stop_wait_callback (lp=0x0, data=0xbffff450)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:708
> #2  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> #3  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> #4  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> 
> And that's not just the stack unwinder getting confused.  We really did
> recurse until we ran out of stack.
> 
> The superficial reason is this:
> SWC: Pending event Segmentation Fault (stopped) in LWP 4490
> 
> i.e. every time we resume it with no signal it SIGSEGV's again, and we
> never get the SIGSTOP.
> 
> Here's some more of the log:
> (gdb) c
> Continuing.
> LLR: PTRACE_SINGLESTEP process 4498, 0 (resume event thread)
> LLW: waitpid 4498 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4498.
> SEL: Select single-step LWP 4498
> LLW: trap_ptid is LWP 4498.
> RC:  PTRACE_CONT LWP 4497, 0, 0 (resume sibling)
> LLR: PTRACE_CONT process 4498, 0 (resume event thread)
> LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
> SC:  kill LWP 4498 **<SIGSTOP>**
> SC:  lwp kill 0 ERRNO-OK
> SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> LLW: trap_ptid is LWP 4497.
> [New Thread 1077276112 (LWP 4499)]
> LLAL: PTRACE_ATTACH LWP 4499, 0, 0 (OK)
> LLAL: waitpid LWP 4499 received Stopped (signal) (stopped)
> LLR: PTRACE_SINGLESTEP process 4497, 0 (resume event thread)
> LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
> SEL: Select single-step LWP 4497
> LLW: trap_ptid is LWP 4497.
> RC:  PTRACE_CONT LWP 4499, 0, 0 (resume sibling)
> RC:  PTRACE_CONT LWP 4498, 0, 0 (resume sibling)
> LLR: PTRACE_CONT process 4497, 0 (resume event thread)
> LLW: waitpid 4499 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4499, 0, 0 (OK)
> LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4499.
> SC:  kill LWP 4498 **<SIGSTOP>**
> SC:  lwp kill 0 ERRNO-OK
> SC:  kill LWP 4497 **<SIGSTOP>**
> SC:  lwp kill 0 ERRNO-OK
> SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> PTRACE_CONT LWP 4497, 0, 0 (OK)
> SWC: Candidate SIGTRAP event in LWP 4497
> SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> PTRACE_CONT LWP 4497, 0, 0 (OK)
> SWC: Candidate SIGTRAP event in LWP 4497
> SWC: waitpid LWP 4497 received Segmentation fault (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> SWC: Pending event Segmentation fault (stopped) in LWP 4497
> SWC: PTRACE_CONT LWP 4497, 0, 0 (OK)
> SWC: waitpid LWP 4497 received Segmentation fault (stopped)
> LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> 
> 
> A little interpretation: 4497 hits the creation breakpoint.  We atach
> to 4499.  4499 hits the common_routine breakpoint.  We stop 4497.  It
> hits the breakpoint at thread creation again for the next thread.  We
> PTRACE_CONT 4497 again trying to get the SIGSTOP, and get another
> SIGTRAP - probably we were backed up from the breakpoint last time so
> we hit it again.  We try _again_, and SIGSEGV because we're on the
> second byte of a multi-byte instruction, the first byte having been
> replaced by a breakpoint.
> 
> Life explodes.
> 
> 
> So:
>   - stop_wait_callback should be fixed to not be so dumb when this
>     happens.
>   - we need to figure out how we got into this mess.
>   - and why the SIGSTOP never showed up.
> 
> I avoid this entire foul issue in gdbserver by not backtracking and
> resuming the application; instead I just set a flag marking the next
> SIGSTOP as "expected".  It's still not perfect but it's a great deal
> better.  I can do even better when I have some time to play with
> PTRACE_GETSIGINFO.
> 
> I'm waiting for GDB to tell me how we got here.  The backtrace is more
> than 40K frames, since I forgot to shrink the stack limit.  50K...
> 170K... ooh!
> 
> #174697 0x080e1724 in stop_wait_callback (lp=0x0, data=0xbffff450)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:830
> #174698 0x080e033d in iterate_over_lwps (callback=0x80e12d0 <stop_wait_callback>, data=0x1181)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:293
> #174699 0x080e251e in lin_lwp_wait (ptid={pid = -1, lwp = 0, tid = 0}, ourstatus=0x72)
>     at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:1499
> #174700 0x08128ca3 in thread_db_wait (ptid={pid = -1, lwp = 0, tid = 0}, ourstatus=0xffffffff)
>     at /opt/src/gdb/src-gdblinks/gdb/thread-db.c:846
> #174701 0x080bc19e in wait_for_inferior () at /opt/src/gdb/src-gdblinks/gdb/infrun.c:1003
> #174702 0x080bbf13 in proceed (addr=3221222720, siggnal=144, step=0)
>     at /opt/src/gdb/src-gdblinks/gdb/infrun.c:814
> #174703 0x080b8fb0 in continue_command (proc_count_exp=0x0, from_tty=1)
>     at /opt/src/gdb/src-gdblinks/gdb/infcmd.c:539
> 
> It wasn't worth the wait.  That didn't help much.
> 
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-09 23:38   ` J. Johnston
@ 2003-05-10  0:58     ` Daniel Jacobowitz
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Jacobowitz @ 2003-05-10  0:58 UTC (permalink / raw)
  To: J. Johnston; +Cc: gdb-patches

On Fri, May 09, 2003 at 07:38:34PM -0400, J. Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> >
> >>The following is the last part of my revised nptl patch that has
> >>been broken up per Daniel J.'s suggestion.  There are no generated
> >>files included in the patch.
> >
> >
> >Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> >have any of the Red Hat kernels available here at the moment.  It looks
> >like GDB bellies up around the second thread creation.
> >
> 
> Is this one of the gdb.threads testcases?  If not, do any of those run
> for you and/or can you send me a testcase for the problem below so we can 
> at least
> have something common to compare?

Sorry, I forgot to say.  This is just pthreads.exp, with a breakpoint
on common_routine.

> 
> -- Jeff J.
> 
> >A backtrace looks like:
> >#0  0xffffe402 in ?? ()
> >#1  0x080e1332 in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:708
> >#2  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> >#3  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> >#4  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> >
> >And that's not just the stack unwinder getting confused.  We really did
> >recurse until we ran out of stack.
> >
> >The superficial reason is this:
> >SWC: Pending event Segmentation Fault (stopped) in LWP 4490
> >
> >i.e. every time we resume it with no signal it SIGSEGV's again, and we
> >never get the SIGSTOP.
> >
> >Here's some more of the log:
> >(gdb) c
> >Continuing.
> >LLR: PTRACE_SINGLESTEP process 4498, 0 (resume event thread)
> >LLW: waitpid 4498 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4498.
> >SEL: Select single-step LWP 4498
> >LLW: trap_ptid is LWP 4498.
> >RC:  PTRACE_CONT LWP 4497, 0, 0 (resume sibling)
> >LLR: PTRACE_CONT process 4498, 0 (resume event thread)
> >LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
> >SC:  kill LWP 4498 **<SIGSTOP>**
> >SC:  lwp kill 0 ERRNO-OK
> >SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> >LLW: trap_ptid is LWP 4497.
> >[New Thread 1077276112 (LWP 4499)]
> >LLAL: PTRACE_ATTACH LWP 4499, 0, 0 (OK)
> >LLAL: waitpid LWP 4499 received Stopped (signal) (stopped)
> >LLR: PTRACE_SINGLESTEP process 4497, 0 (resume event thread)
> >LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
> >SEL: Select single-step LWP 4497
> >LLW: trap_ptid is LWP 4497.
> >RC:  PTRACE_CONT LWP 4499, 0, 0 (resume sibling)
> >RC:  PTRACE_CONT LWP 4498, 0, 0 (resume sibling)
> >LLR: PTRACE_CONT process 4497, 0 (resume event thread)
> >LLW: waitpid 4499 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4499, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4499.
> >SC:  kill LWP 4498 **<SIGSTOP>**
> >SC:  lwp kill 0 ERRNO-OK
> >SC:  kill LWP 4497 **<SIGSTOP>**
> >SC:  lwp kill 0 ERRNO-OK
> >SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> >SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >PTRACE_CONT LWP 4497, 0, 0 (OK)
> >SWC: Candidate SIGTRAP event in LWP 4497
> >SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >PTRACE_CONT LWP 4497, 0, 0 (OK)
> >SWC: Candidate SIGTRAP event in LWP 4497
> >SWC: waitpid LWP 4497 received Segmentation fault (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >SWC: Pending event Segmentation fault (stopped) in LWP 4497
> >SWC: PTRACE_CONT LWP 4497, 0, 0 (OK)
> >SWC: waitpid LWP 4497 received Segmentation fault (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >
> >
> >A little interpretation: 4497 hits the creation breakpoint.  We atach
> >to 4499.  4499 hits the common_routine breakpoint.  We stop 4497.  It
> >hits the breakpoint at thread creation again for the next thread.  We
> >PTRACE_CONT 4497 again trying to get the SIGSTOP, and get another
> >SIGTRAP - probably we were backed up from the breakpoint last time so
> >we hit it again.  We try _again_, and SIGSEGV because we're on the
> >second byte of a multi-byte instruction, the first byte having been
> >replaced by a breakpoint.
> >
> >Life explodes.
> >
> >
> >So:
> >  - stop_wait_callback should be fixed to not be so dumb when this
> >    happens.
> >  - we need to figure out how we got into this mess.
> >  - and why the SIGSTOP never showed up.
> >
> >I avoid this entire foul issue in gdbserver by not backtracking and
> >resuming the application; instead I just set a flag marking the next
> >SIGSTOP as "expected".  It's still not perfect but it's a great deal
> >better.  I can do even better when I have some time to play with
> >PTRACE_GETSIGINFO.
> >
> >I'm waiting for GDB to tell me how we got here.  The backtrace is more
> >than 40K frames, since I forgot to shrink the stack limit.  50K...
> >170K... ooh!
> >
> >#174697 0x080e1724 in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:830
> >#174698 0x080e033d in iterate_over_lwps (callback=0x80e12d0 
> ><stop_wait_callback>, data=0x1181)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:293
> >#174699 0x080e251e in lin_lwp_wait (ptid={pid = -1, lwp = 0, tid = 0}, 
> >ourstatus=0x72)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:1499
> >#174700 0x08128ca3 in thread_db_wait (ptid={pid = -1, lwp = 0, tid = 0}, 
> >ourstatus=0xffffffff)
> >    at /opt/src/gdb/src-gdblinks/gdb/thread-db.c:846
> >#174701 0x080bc19e in wait_for_inferior () at 
> >/opt/src/gdb/src-gdblinks/gdb/infrun.c:1003
> >#174702 0x080bbf13 in proceed (addr=3221222720, siggnal=144, step=0)
> >    at /opt/src/gdb/src-gdblinks/gdb/infrun.c:814
> >#174703 0x080b8fb0 in continue_command (proc_count_exp=0x0, from_tty=1)
> >    at /opt/src/gdb/src-gdblinks/gdb/infcmd.c:539
> >
> >It wasn't worth the wait.  That didn't help much.
> >
> >
> 
> 
> 

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-09 22:00 ` Daniel Jacobowitz
  2003-05-09 22:36   ` Andrew Cagney
  2003-05-09 23:38   ` J. Johnston
@ 2003-05-10 21:18   ` Daniel Jacobowitz
  2003-05-10 21:59     ` Daniel Jacobowitz
  2 siblings, 1 reply; 12+ messages in thread
From: Daniel Jacobowitz @ 2003-05-10 21:18 UTC (permalink / raw)
  To: J. Johnston, gdb-patches, roland

On Fri, May 09, 2003 at 06:00:11PM -0400, Daniel Jacobowitz wrote:
> On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> > The following is the last part of my revised nptl patch that has
> > been broken up per Daniel J.'s suggestion.  There are no generated
> > files included in the patch.
> 
> Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> have any of the Red Hat kernels available here at the moment.  It looks
> like GDB bellies up around the second thread creation.

Sorry, I found the problem.  Configure was not finding tkill.  Entirely
a local problem; but how would you feel about something which set the
default number for __NR_tkill if __i386__?  Or has someone already
discouraged that approach?

So:

>   - stop_wait_callback should be fixed to not be so dumb when this
>     happens.

This one is still true.  Patch for another day.

>   - we need to figure out how we got into this mess.
>   - and why the SIGSTOP never showed up.

But these are resolved.

I now get fairly good test results with your patch and NPTL; there are
some failures because of the vsyscall issue being currently discussed. 
Backtraces in linux-dp don't work because the threads are stopped in
the kernel page.

I can also report that the kernel change I am testing to report thread
exits does not appear to cause your patch any problems.  Yay!  More on
that in a little while.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-10 21:18   ` Daniel Jacobowitz
@ 2003-05-10 21:59     ` Daniel Jacobowitz
  2003-05-12 19:36       ` J. Johnston
  0 siblings, 1 reply; 12+ messages in thread
From: Daniel Jacobowitz @ 2003-05-10 21:59 UTC (permalink / raw)
  To: J. Johnston, gdb-patches, roland

On Sat, May 10, 2003 at 05:18:55PM -0400, Daniel Jacobowitz wrote:
> On Fri, May 09, 2003 at 06:00:11PM -0400, Daniel Jacobowitz wrote:
> > On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> > > The following is the last part of my revised nptl patch that has
> > > been broken up per Daniel J.'s suggestion.  There are no generated
> > > files included in the patch.
> > 
> > Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> > have any of the Red Hat kernels available here at the moment.  It looks
> > like GDB bellies up around the second thread creation.
> 
> Sorry, I found the problem.  Configure was not finding tkill.  Entirely
> a local problem; but how would you feel about something which set the
> default number for __NR_tkill if __i386__?  Or has someone already
> discouraged that approach?
> 
> So:
> 
> >   - stop_wait_callback should be fixed to not be so dumb when this
> >     happens.
> 
> This one is still true.  Patch for another day.
> 
> >   - we need to figure out how we got into this mess.
> >   - and why the SIGSTOP never showed up.
> 
> But these are resolved.
> 
> 
> I now get fairly good test results with your patch and NPTL; there are
> some failures because of the vsyscall issue being currently discussed. 
> Backtraces in linux-dp don't work because the threads are stopped in
> the kernel page.
> 
> I can also report that the kernel change I am testing to report thread
> exits does not appear to cause your patch any problems.  Yay!  More on
> that in a little while.

I could cause segfaults in both the inferior and GDB, and some missed
single-steps.  I don't know if my kernel patch is at fault or your
patch, but I figured I'd write them up anyway for posterity and later
review.

Start with gdb.threads/print-threads.  Put a breakpoint on
thread_function and one on the printf ("Done\n") main.  Run, disable
the first breakpoint when you hit it, and say next.  You'll hit the
breakpoint in main instead of staying within thread_function.

This is timing sensitive.  It didn't show up with set debug
lin-lwp/target both on.

In other interesting notes, it looks like there is a (related?) problem
with target_thread_alive.  The LWP I'm single-stepping in appears to be
marked as not alive about half the time.  No idea what's up with that. 
It appears to come from thread_db_thread_alive, not from
lin_lwp_thread_alive, which always succeeds.

I can't reproduce the SIGSEGV now for some reason.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-10 21:59     ` Daniel Jacobowitz
@ 2003-05-12 19:36       ` J. Johnston
  2003-08-17 20:49         ` Daniel Jacobowitz
  0 siblings, 1 reply; 12+ messages in thread
From: J. Johnston @ 2003-05-12 19:36 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches, roland

Daniel Jacobowitz wrote:
> On Sat, May 10, 2003 at 05:18:55PM -0400, Daniel Jacobowitz wrote:
> 
>>On Fri, May 09, 2003 at 06:00:11PM -0400, Daniel Jacobowitz wrote:
>>
>>>On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
>>>
>>>>The following is the last part of my revised nptl patch that has
>>>>been broken up per Daniel J.'s suggestion.  There are no generated
>>>>files included in the patch.
>>>
>>>Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
>>>have any of the Red Hat kernels available here at the moment.  It looks
>>>like GDB bellies up around the second thread creation.
>>
>>Sorry, I found the problem.  Configure was not finding tkill.  Entirely
>>a local problem; but how would you feel about something which set the
>>default number for __NR_tkill if __i386__?  Or has someone already
>>discouraged that approach?
>>

Glad to hear this.

>>So:
>>
>>
>>>  - stop_wait_callback should be fixed to not be so dumb when this
>>>    happens.
>>
>>This one is still true.  Patch for another day.
>>
>>
>>>  - we need to figure out how we got into this mess.
>>>  - and why the SIGSTOP never showed up.
>>
>>But these are resolved.
>>
>>
>>I now get fairly good test results with your patch and NPTL; there are
>>some failures because of the vsyscall issue being currently discussed. 
>>Backtraces in linux-dp don't work because the threads are stopped in
>>the kernel page.
>>
>>I can also report that the kernel change I am testing to report thread
>>exits does not appear to cause your patch any problems.  Yay!  More on
>>that in a little while.
> 

More good news.

> 
> I could cause segfaults in both the inferior and GDB, and some missed
> single-steps.  I don't know if my kernel patch is at fault or your
> patch, but I figured I'd write them up anyway for posterity and later
> review.
> 
> Start with gdb.threads/print-threads.  Put a breakpoint on
> thread_function and one on the printf ("Done\n") main.  Run, disable
> the first breakpoint when you hit it, and say next.  You'll hit the
> breakpoint in main instead of staying within thread_function.
> 

This does not fail on my test system.  I end up on line 42 after the next
is issued.

> This is timing sensitive.  It didn't show up with set debug
> lin-lwp/target both on.
> 
> In other interesting notes, it looks like there is a (related?) problem
> with target_thread_alive.  The LWP I'm single-stepping in appears to be
> marked as not alive about half the time.  No idea what's up with that. 
> It appears to come from thread_db_thread_alive, not from
> lin_lwp_thread_alive, which always succeeds.
> 
> I can't reproduce the SIGSEGV now for some reason.
> 

Have you managed to trace which test in thread_db_alive is returning false?

-- Jeff J.




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-05-12 19:36       ` J. Johnston
@ 2003-08-17 20:49         ` Daniel Jacobowitz
  0 siblings, 0 replies; 12+ messages in thread
From: Daniel Jacobowitz @ 2003-08-17 20:49 UTC (permalink / raw)
  To: J. Johnston; +Cc: gdb-patches, roland

On Mon, May 12, 2003 at 03:36:35PM -0400, J. Johnston wrote:
> >
> >I could cause segfaults in both the inferior and GDB, and some missed
> >single-steps.  I don't know if my kernel patch is at fault or your
> >patch, but I figured I'd write them up anyway for posterity and later
> >review.
> >
> >Start with gdb.threads/print-threads.  Put a breakpoint on
> >thread_function and one on the printf ("Done\n") main.  Run, disable
> >the first breakpoint when you hit it, and say next.  You'll hit the
> >breakpoint in main instead of staying within thread_function.
> >
> 
> This does not fail on my test system.  I end up on line 42 after the next
> is issued.

Using 2.5.72 on a single-processor machine, which is what I had lying
around today, I could still reproduce it.  I don't think it's new;
rather, I think it's annoying.

We single-step the thread; because we are not at a breakpoint, since
it's been disabled, all other threads are continued during the
single-step.  The second time we do this, we get a thread creation
event.  GDB proceeds to lose track of the fact that it was
single-stepping, and resumes.

> >In other interesting notes, it looks like there is a (related?) problem
> >with target_thread_alive.  The LWP I'm single-stepping in appears to be
> >marked as not alive about half the time.  No idea what's up with that. 
> >It appears to come from thread_db_thread_alive, not from
> >lin_lwp_thread_alive, which always succeeds.
> >
> >I can't reproduce the SIGSEGV now for some reason.
> >
> 
> Have you managed to trace which test in thread_db_alive is returning false?

It may have been my imagination.  Now all appears well.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-04-24 22:05 RFC: nptl threading patch for linux J. Johnston
  2003-05-09 22:00 ` Daniel Jacobowitz
@ 2003-06-02 18:49 ` Michael Snyder
  2003-06-04 20:52   ` J. Johnston
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Snyder @ 2003-06-02 18:49 UTC (permalink / raw)
  To: J. Johnston; +Cc: gdb-patches


After reviewing the on-list discussion, it sounds like this patch
should go in.  Jeff, would you do the honors?

Michael

"J. Johnston" wrote:
> 
> The following is the last part of my revised nptl patch that has
> been broken up per Daniel J.'s suggestion.  There are no generated
> files included in the patch.
> 
> A bit of background is needed.  First of all, in the nptl model,
> lwps and pids are distinct entities.  If you perform a ps on a
> multithreaded application, you will not see its child threads show
> up.  This differs from the linuxthreads model whereby an lwp was
> actually a pid.  In the nptl model, you cannot issue a kill to
> a child thread via its lwp.  If you want to do this, you need to use
> the tkill syscall instead.  The action of sending a signal to a specific
> lwp is used commonly in lin-lwp.c.  The first part of the nptl change is to
> determine at configuration time if we have the tkill syscall and then at run-time,
> if we can use it.  A new function kill_lwp has been added which either calls the
> tkill syscall or the regular kill function as appropriate.
> 
> Another key difference in behavior between nptl and linuxthreads is what happens
> when a thread exits.  In the linuxthreads model, each thread that exits causes
> an exit event to occur.  The main thread can exit before all of its children.
> In the nptl model, only the main thread generates the exit event and it only
> does so after all child threads have exited.  So, to determine when an lwp
> has exited, we have to constantly check its status when we are able.  When
> we get an exit event we have to determine if we are exiting the program or
> just one of the threads.  Some additional logic has been added to the exit
> checking code such that if we have (lwp == pid), then we stop all active
> threads and check if they have already exited.  If they have not exited,
> we resume these threads, otherwise, we delete them.
> 
> Daniel J. brought up a good point regarding my previous attempt at this logic
> whereby I stopped all threads and resumed all threads.  That old logic is wrong
> when scheduler_locking is set on.  The new logic addresses this because
> it only stops/resumes threads which are not already stopped.  Currently, if we lock
> into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
> because we will field the exit event and end up with no running threads.  Under nptl, we
> never get notified of the thread exiting so for this scenario, we end up hung.
> In this instance, there is nothing we can do without a kernel change.  Daniel J.
> is looking into a kernel change.  For user threads, we could activate the death event
> in thread-db.c and make a low-level call to notify the lwp layer, but this does not
> handle lwps created directly by the end-user.  This issue needs to be discussed
> further outside of this patch because I want to propose changing the assertion to
> allow the user to resume the other threads.  I have tested the change with such a
> scenario (locking a thread and running until exit).  For linuxthreads we trigger the
> assertion as before.  For nptl, we hang as expected as we never get the exit event.
> A bug can be opened for this if the patch is accepted.
> 
> The last piece of logic has to do with a new interface needed by the nptl libthread_db
> library to get the thread area.  I have grouped this together with the lin-lwp changes
> because the lin-lwp changes cannot work without this crucial change.
> 
> This patch has been tested for both linuxthreads and with an nptl kernel.
> 
> Ok to commit?
> 
> -- Jeff J.
> 
> 2003-04-24  Jeff Johnston  <jjohnstn@redhat.com>
> 
>         * acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
>         * config.in: Regenerated.
>         * configure.in: Add test for syscall function and check for
>         __NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
>         * configure: Regenerated.
>         * thread-db.c (check_event): For create/death event breakpoints,
>         loop through all messages to ensure that we read the message
>         corresponding to the breakpoint we are at.
>         * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
>         <sys/syscall.h>.
>         (kill_lwp): New function that uses tkill syscall or
>         uses kill, depending on whether threading model is nptl or not.
>         All callers of kill() changed to use kill_lwp().
>         (lin_lwp_wait): Make special check when WIFEXITED occurs to
>         see if all threads have already exited in the nptl model.
>         (stop_and_resume_callback): New callback function used by the
>         lin_lwp_wait thread exit handling code.
>         (stop_wait_callback): Check for threads already having exited and
>         delete such threads fromt the lwp list when discovered.
>         (stop_callback): Don't assert retcode of kill call.
> 
>         Roland McGrath  <roland@redhat.com>
>         * i386-linux-nat.c (ps_get_thread_area): New function needed by
>         nptl libthread_db.
> 
>   -------------------------------------------------------------------------------
> Index: lin-lwp.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/lin-lwp.c,v
> retrieving revision 1.43
> diff -u -r1.43 lin-lwp.c
> --- lin-lwp.c   28 Mar 2003 21:42:41 -0000      1.43
> +++ lin-lwp.c   23 Apr 2003 22:52:44 -0000
> @@ -24,6 +24,10 @@
>  #include "gdb_string.h"
>  #include <errno.h>
>  #include <signal.h>
> +#ifdef HAVE_TKILL_SYSCALL
> +#include <unistd.h>
> +#include <sys/syscall.h>
> +#endif
>  #include <sys/ptrace.h>
>  #include "gdb_wait.h"
> 
> @@ -156,6 +160,7 @@
> 
>  /* Prototypes for local functions.  */
>  static int stop_wait_callback (struct lwp_info *lp, void *data);
> +static int lin_lwp_thread_alive (ptid_t ptid);
> 
>  /* Convert wait status STATUS to a string.  Used for printing debug
>     messages only.  */
> @@ -627,6 +632,32 @@
>  }
> 
> 
> +/* Issue kill to specified lwp.  */
> +
> +static int tkill_failed;
> +
> +static int
> +kill_lwp (int lwpid, int signo)
> +{
> +  errno = 0;
> +
> +/* Use tkill, if possible, in case we are using nptl threads.  If tkill
> +   fails, then we are not using nptl threads and we should be using kill.  */
> +
> +#ifdef HAVE_TKILL_SYSCALL
> +  if (!tkill_failed)
> +    {
> +      int ret = syscall (__NR_tkill, lwpid, signo);
> +      if (errno != ENOSYS)
> +       return ret;
> +      errno = 0;
> +      tkill_failed = 1;
> +    }
> +#endif
> +
> +  return kill (lwpid, signo);
> +}
> +
>  /* Send a SIGSTOP to LP.  */
> 
>  static int
> @@ -642,8 +673,15 @@
>                               "SC:  kill %s **<SIGSTOP>**\n",
>                               target_pid_to_str (lp->ptid));
>         }
> -      ret = kill (GET_LWP (lp->ptid), SIGSTOP);
> -      gdb_assert (ret == 0);
> +      errno = 0;
> +      ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
> +      if (debug_lin_lwp)
> +       {
> +         fprintf_unfiltered (gdb_stdlog,
> +                             "SC:  lwp kill %d %s\n",
> +                             ret,
> +                             errno ? safe_strerror (errno) : "ERRNO-OK");
> +       }
> 
>        lp->signalled = 1;
>        gdb_assert (lp->status == 0);
> @@ -667,11 +705,23 @@
> 
>        gdb_assert (lp->status == 0);
> 
> -      pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
> +      pid = waitpid (GET_LWP (lp->ptid), &status, 0);
>        if (pid == -1 && errno == ECHILD)
> -       /* OK, the proccess has disappeared.  We'll catch the actual
> -          exit event in lin_lwp_wait.  */
> -       return 0;
> +       {
> +         pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
> +         if (pid == -1 && errno == ECHILD)
> +           {
> +             /* The thread has previously exited.  We need to delete it now
> +                because in the case of nptl threads, there won't be an
> +                exit event unless it is the main thread.  */
> +             if (debug_lin_lwp)
> +               fprintf_unfiltered (gdb_stdlog,
> +                                   "SWC: %s exited.\n",
> +                                   target_pid_to_str (lp->ptid));
> +             delete_lwp (lp->ptid);
> +             return 0;
> +           }
> +       }
> 
>        gdb_assert (pid == GET_LWP (lp->ptid));
> 
> @@ -683,6 +733,7 @@
>                               status_to_str (status));
>         }
> 
> +      /* Check if the thread has exited.  */
>        if (WIFEXITED (status) || WIFSIGNALED (status))
>         {
>           gdb_assert (num_lwps > 1);
> @@ -697,7 +748,31 @@
>                                  target_pid_to_str (lp->ptid));
>             }
>           if (debug_lin_lwp)
> -           fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
> +           fprintf_unfiltered (gdb_stdlog,
> +                               "SWC: %s exited.\n",
> +                               target_pid_to_str (lp->ptid));
> +
> +         delete_lwp (lp->ptid);
> +         return 0;
> +       }
> +
> +      /* Check if the current LWP has previously exited.  For nptl threads,
> +         there is no exit signal issued for LWPs that are not the
> +         main thread so we should check whenever the thread is stopped.  */
> +      if (!lin_lwp_thread_alive (lp->ptid))
> +       {
> +         if (in_thread_list (lp->ptid))
> +           {
> +             /* Core GDB cannot deal with us deleting the current
> +                thread.  */
> +             if (!ptid_equal (lp->ptid, inferior_ptid))
> +               delete_thread (lp->ptid);
> +             printf_unfiltered ("[%s exited]\n",
> +                                target_pid_to_str (lp->ptid));
> +           }
> +         if (debug_lin_lwp)
> +           fprintf_unfiltered (gdb_stdlog,
> +                               "SWC: %s already exited.\n",
>                                 target_pid_to_str (lp->ptid));
> 
>           delete_lwp (lp->ptid);
> @@ -756,7 +831,14 @@
>               /* If there's another event, throw it back into the queue. */
>               if (lp->status)
>                 {
> -                 kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
> +                 if (debug_lin_lwp)
> +                   {
> +                     fprintf_unfiltered (gdb_stdlog,
> +                                         "SWC: kill %s, %s\n",
> +                                         target_pid_to_str (lp->ptid),
> +                                         status_to_str ((int) status));
> +                   }
> +                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>                 }
>               /* Save the sigtrap event. */
>               lp->status = status;
> @@ -800,7 +882,7 @@
>                                           target_pid_to_str (lp->ptid),
>                                           status_to_str ((int) status));
>                     }
> -                 kill (GET_LWP (lp->ptid), WSTOPSIG (status));
> +                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
>                 }
>               return 0;
>             }
> @@ -1049,6 +1131,25 @@
> 
>  #endif
> 
> +/* Stop an active thread, verify it still exists, then resume it.  */
> +
> +static int
> +stop_and_resume_callback (struct lwp_info *lp, void *data)
> +{
> +  struct lwp_info *ptr;
> +
> +  if (!lp->stopped && !lp->signalled)
> +    {
> +      stop_callback (lp, NULL);
> +      stop_wait_callback (lp, NULL);
> +      /* Resume if the lwp still exists.  */
> +      for (ptr = lwp_list; ptr; ptr = ptr->next)
> +       if (lp == ptr)
> +         resume_callback (lp, NULL);
> +    }
> +  return 0;
> +}
> +
>  static ptid_t
>  lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
>  {
> @@ -1206,10 +1307,61 @@
>                 }
>             }
> 
> -         /* Make sure we don't report a TARGET_WAITKIND_EXITED or
> -            TARGET_WAITKIND_SIGNALLED event if there are still LWP's
> -            left in the process.  */
> +         /* Check if the thread has exited.  */
>           if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
> +           {
> +             if (in_thread_list (lp->ptid))
> +               {
> +                 /* Core GDB cannot deal with us deleting the current
> +                    thread.  */
> +                 if (!ptid_equal (lp->ptid, inferior_ptid))
> +                   delete_thread (lp->ptid);
> +                 printf_unfiltered ("[%s exited]\n",
> +                                    target_pid_to_str (lp->ptid));
> +               }
> +
> +             /* If this is the main thread, we must stop all threads and
> +                verify if they are still alive.  This is because in the nptl
> +                thread model, there is no signal issued for exiting LWPs
> +                other than the main thread.  We only get the main thread
> +                exit signal once all child threads have already exited.
> +                If we stop all the threads and use the stop_wait_callback
> +                to check if they have exited we can determine whether this
> +                signal should be ignored or whether it means the end of the
> +                debugged application, regardless of which threading model
> +                is being used.  */
> +             if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
> +               {
> +                 lp->stopped = 1;
> +                 iterate_over_lwps (stop_and_resume_callback, NULL);
> +               }
> +
> +             if (debug_lin_lwp)
> +               fprintf_unfiltered (gdb_stdlog,
> +                                   "LLW: %s exited.\n",
> +                                   target_pid_to_str (lp->ptid));
> +
> +             delete_lwp (lp->ptid);
> +
> +             /* If there is at least one more LWP, then the exit signal
> +                was not the end of the debugged application and should be
> +                ignored.  */
> +             if (num_lwps > 0)
> +               {
> +                 /* Make sure there is at least one thread running.  */
> +                 gdb_assert (iterate_over_lwps (running_callback, NULL));
> +
> +                 /* Discard the event.  */
> +                 status = 0;
> +                 continue;
> +               }
> +           }
> +
> +         /* Check if the current LWP has previously exited.  In the nptl
> +            thread model, LWPs other than the main thread do not issue
> +            signals when they exit so we must check whenever the thread
> +            has stopped.  A similar check is made in stop_wait_callback().  */
> +         if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
>             {
>               if (in_thread_list (lp->ptid))
>                 {
> Index: acconfig.h
> ===================================================================
> RCS file: /cvs/src/src/gdb/acconfig.h,v
> retrieving revision 1.24
> diff -u -r1.24 acconfig.h
> --- acconfig.h  4 Jan 2003 00:34:42 -0000       1.24
> +++ acconfig.h  23 Apr 2003 22:52:44 -0000
> @@ -95,6 +95,9 @@
>  /* Define if using Solaris thread debugging.  */
>  #undef HAVE_THREAD_DB_LIB
> 
> +/* Define if you support the tkill syscall.  */
> +#undef HAVE_TKILL_SYSCALL
> +
>  /* Define on a GNU/Linux system to work around problems in sys/procfs.h.  */
>  #undef START_INFERIOR_TRAPS_EXPECTED
>  #undef sys_quotactl
> Index: configure.in
> ===================================================================
> RCS file: /cvs/src/src/gdb/configure.in,v
> retrieving revision 1.126
> diff -u -r1.126 configure.in
> --- configure.in        26 Feb 2003 15:10:47 -0000      1.126
> +++ configure.in        23 Apr 2003 22:52:44 -0000
> @@ -384,6 +384,7 @@
>  AC_CHECK_FUNCS(setpgid setpgrp)
>  AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
>  AC_CHECK_FUNCS(socketpair)
> +AC_CHECK_FUNCS(syscall)
> 
>  dnl AC_FUNC_SETPGRP does not work when cross compiling
>  dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
> @@ -909,6 +910,24 @@
>  if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
>    AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
>              [Define if <thread_db.h> has the TD_NOTALLOC error code.])
> +fi
> +
> +dnl See if we have a sys/syscall header file that has __NR_tkill.
> +if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
> +   AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
> +                  gdb_cv_sys_syscall_h_has_tkill,
> +     AC_TRY_COMPILE(
> +       [#include <sys/syscall.h>],
> +       [int i = __NR_tkill;],
> +       gdb_cv_sys_syscall_h_has_tkill=yes,
> +       gdb_cv_sys_syscall_h_has_tkill=no
> +     )
> +   )
> +fi
> +dnl See if we can issue tkill syscall.
> +if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
> +  AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
> +            [Define if we can use the tkill syscall.])
>  fi
> 
>  dnl Handle optional features that can be enabled.
> Index: i386-linux-nat.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
> retrieving revision 1.44
> diff -u -r1.44 i386-linux-nat.c
> --- i386-linux-nat.c    16 Apr 2003 15:22:02 -0000      1.44
> +++ i386-linux-nat.c    23 Apr 2003 22:52:44 -0000
> @@ -70,6 +70,9 @@
>  /* Defines I386_LINUX_ORIG_EAX_REGNUM.  */
>  #include "i386-linux-tdep.h"
> 
> +/* Defines ps_err_e, struct ps_prochandle.  */
> +#include "gdb_proc_service.h"
> +
>  /* Prototypes for local functions.  */
>  static void dummy_sse_values (void);
> 
> @@ -682,6 +685,21 @@
>           offsetof (struct user, u_debugreg[regnum]), value);
>    if (errno != 0)
>      perror_with_name ("Couldn't write debug register");
> +}
> +
> +extern ps_err_e
> +ps_get_thread_area(const struct ps_prochandle *ph,
> +                  lwpid_t lwpid, int idx, void **base)
> +{
> +  unsigned long int desc[3];
> +#define PTRACE_GET_THREAD_AREA 25
> +
> +  if  (ptrace (PTRACE_GET_THREAD_AREA,
> +              lwpid, (void *) idx, (unsigned long) &desc) < 0)
> +    return PS_ERR;
> +
> +  *(int *)base = desc[1];
> +  return PS_OK;
>  }
> 
>  void


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: RFC: nptl threading patch for linux
  2003-06-02 18:49 ` Michael Snyder
@ 2003-06-04 20:52   ` J. Johnston
  0 siblings, 0 replies; 12+ messages in thread
From: J. Johnston @ 2003-06-04 20:52 UTC (permalink / raw)
  To: Michael Snyder; +Cc: gdb-patches

Thanks Michael.  Patch checked in.  I deleted the ChangeLog entry
for thread-db.c which was already made into a separate patch.

-- Jeff J.

Michael Snyder wrote:
> After reviewing the on-list discussion, it sounds like this patch
> should go in.  Jeff, would you do the honors?
> 
> Michael
> 
> "J. Johnston" wrote:
> 
>>The following is the last part of my revised nptl patch that has
>>been broken up per Daniel J.'s suggestion.  There are no generated
>>files included in the patch.
>>
>>A bit of background is needed.  First of all, in the nptl model,
>>lwps and pids are distinct entities.  If you perform a ps on a
>>multithreaded application, you will not see its child threads show
>>up.  This differs from the linuxthreads model whereby an lwp was
>>actually a pid.  In the nptl model, you cannot issue a kill to
>>a child thread via its lwp.  If you want to do this, you need to use
>>the tkill syscall instead.  The action of sending a signal to a specific
>>lwp is used commonly in lin-lwp.c.  The first part of the nptl change is to
>>determine at configuration time if we have the tkill syscall and then at run-time,
>>if we can use it.  A new function kill_lwp has been added which either calls the
>>tkill syscall or the regular kill function as appropriate.
>>
>>Another key difference in behavior between nptl and linuxthreads is what happens
>>when a thread exits.  In the linuxthreads model, each thread that exits causes
>>an exit event to occur.  The main thread can exit before all of its children.
>>In the nptl model, only the main thread generates the exit event and it only
>>does so after all child threads have exited.  So, to determine when an lwp
>>has exited, we have to constantly check its status when we are able.  When
>>we get an exit event we have to determine if we are exiting the program or
>>just one of the threads.  Some additional logic has been added to the exit
>>checking code such that if we have (lwp == pid), then we stop all active
>>threads and check if they have already exited.  If they have not exited,
>>we resume these threads, otherwise, we delete them.
>>
>>Daniel J. brought up a good point regarding my previous attempt at this logic
>>whereby I stopped all threads and resumed all threads.  That old logic is wrong
>>when scheduler_locking is set on.  The new logic addresses this because
>>it only stops/resumes threads which are not already stopped.  Currently, if we lock
>>into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
>>because we will field the exit event and end up with no running threads.  Under nptl, we
>>never get notified of the thread exiting so for this scenario, we end up hung.
>>In this instance, there is nothing we can do without a kernel change.  Daniel J.
>>is looking into a kernel change.  For user threads, we could activate the death event
>>in thread-db.c and make a low-level call to notify the lwp layer, but this does not
>>handle lwps created directly by the end-user.  This issue needs to be discussed
>>further outside of this patch because I want to propose changing the assertion to
>>allow the user to resume the other threads.  I have tested the change with such a
>>scenario (locking a thread and running until exit).  For linuxthreads we trigger the
>>assertion as before.  For nptl, we hang as expected as we never get the exit event.
>>A bug can be opened for this if the patch is accepted.
>>
>>The last piece of logic has to do with a new interface needed by the nptl libthread_db
>>library to get the thread area.  I have grouped this together with the lin-lwp changes
>>because the lin-lwp changes cannot work without this crucial change.
>>
>>This patch has been tested for both linuxthreads and with an nptl kernel.
>>
>>Ok to commit?
>>
>>-- Jeff J.
>>
>>2003-04-24  Jeff Johnston  <jjohnstn@redhat.com>
>>
>>        * acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
>>        * config.in: Regenerated.
>>        * configure.in: Add test for syscall function and check for
>>        __NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
>>        * configure: Regenerated.
>>        * thread-db.c (check_event): For create/death event breakpoints,
>>        loop through all messages to ensure that we read the message
>>        corresponding to the breakpoint we are at.
>>        * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
>>        <sys/syscall.h>.
>>        (kill_lwp): New function that uses tkill syscall or
>>        uses kill, depending on whether threading model is nptl or not.
>>        All callers of kill() changed to use kill_lwp().
>>        (lin_lwp_wait): Make special check when WIFEXITED occurs to
>>        see if all threads have already exited in the nptl model.
>>        (stop_and_resume_callback): New callback function used by the
>>        lin_lwp_wait thread exit handling code.
>>        (stop_wait_callback): Check for threads already having exited and
>>        delete such threads fromt the lwp list when discovered.
>>        (stop_callback): Don't assert retcode of kill call.
>>
>>        Roland McGrath  <roland@redhat.com>
>>        * i386-linux-nat.c (ps_get_thread_area): New function needed by
>>        nptl libthread_db.
>>
>>  -------------------------------------------------------------------------------
>>Index: lin-lwp.c
>>===================================================================
>>RCS file: /cvs/src/src/gdb/lin-lwp.c,v
>>retrieving revision 1.43
>>diff -u -r1.43 lin-lwp.c
>>--- lin-lwp.c   28 Mar 2003 21:42:41 -0000      1.43
>>+++ lin-lwp.c   23 Apr 2003 22:52:44 -0000
>>@@ -24,6 +24,10 @@
>> #include "gdb_string.h"
>> #include <errno.h>
>> #include <signal.h>
>>+#ifdef HAVE_TKILL_SYSCALL
>>+#include <unistd.h>
>>+#include <sys/syscall.h>
>>+#endif
>> #include <sys/ptrace.h>
>> #include "gdb_wait.h"
>>
>>@@ -156,6 +160,7 @@
>>
>> /* Prototypes for local functions.  */
>> static int stop_wait_callback (struct lwp_info *lp, void *data);
>>+static int lin_lwp_thread_alive (ptid_t ptid);
>>
>> /* Convert wait status STATUS to a string.  Used for printing debug
>>    messages only.  */
>>@@ -627,6 +632,32 @@
>> }
>>
>>
>>+/* Issue kill to specified lwp.  */
>>+
>>+static int tkill_failed;
>>+
>>+static int
>>+kill_lwp (int lwpid, int signo)
>>+{
>>+  errno = 0;
>>+
>>+/* Use tkill, if possible, in case we are using nptl threads.  If tkill
>>+   fails, then we are not using nptl threads and we should be using kill.  */
>>+
>>+#ifdef HAVE_TKILL_SYSCALL
>>+  if (!tkill_failed)
>>+    {
>>+      int ret = syscall (__NR_tkill, lwpid, signo);
>>+      if (errno != ENOSYS)
>>+       return ret;
>>+      errno = 0;
>>+      tkill_failed = 1;
>>+    }
>>+#endif
>>+
>>+  return kill (lwpid, signo);
>>+}
>>+
>> /* Send a SIGSTOP to LP.  */
>>
>> static int
>>@@ -642,8 +673,15 @@
>>                              "SC:  kill %s **<SIGSTOP>**\n",
>>                              target_pid_to_str (lp->ptid));
>>        }
>>-      ret = kill (GET_LWP (lp->ptid), SIGSTOP);
>>-      gdb_assert (ret == 0);
>>+      errno = 0;
>>+      ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
>>+      if (debug_lin_lwp)
>>+       {
>>+         fprintf_unfiltered (gdb_stdlog,
>>+                             "SC:  lwp kill %d %s\n",
>>+                             ret,
>>+                             errno ? safe_strerror (errno) : "ERRNO-OK");
>>+       }
>>
>>       lp->signalled = 1;
>>       gdb_assert (lp->status == 0);
>>@@ -667,11 +705,23 @@
>>
>>       gdb_assert (lp->status == 0);
>>
>>-      pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
>>+      pid = waitpid (GET_LWP (lp->ptid), &status, 0);
>>       if (pid == -1 && errno == ECHILD)
>>-       /* OK, the proccess has disappeared.  We'll catch the actual
>>-          exit event in lin_lwp_wait.  */
>>-       return 0;
>>+       {
>>+         pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
>>+         if (pid == -1 && errno == ECHILD)
>>+           {
>>+             /* The thread has previously exited.  We need to delete it now
>>+                because in the case of nptl threads, there won't be an
>>+                exit event unless it is the main thread.  */
>>+             if (debug_lin_lwp)
>>+               fprintf_unfiltered (gdb_stdlog,
>>+                                   "SWC: %s exited.\n",
>>+                                   target_pid_to_str (lp->ptid));
>>+             delete_lwp (lp->ptid);
>>+             return 0;
>>+           }
>>+       }
>>
>>       gdb_assert (pid == GET_LWP (lp->ptid));
>>
>>@@ -683,6 +733,7 @@
>>                              status_to_str (status));
>>        }
>>
>>+      /* Check if the thread has exited.  */
>>       if (WIFEXITED (status) || WIFSIGNALED (status))
>>        {
>>          gdb_assert (num_lwps > 1);
>>@@ -697,7 +748,31 @@
>>                                 target_pid_to_str (lp->ptid));
>>            }
>>          if (debug_lin_lwp)
>>-           fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
>>+           fprintf_unfiltered (gdb_stdlog,
>>+                               "SWC: %s exited.\n",
>>+                               target_pid_to_str (lp->ptid));
>>+
>>+         delete_lwp (lp->ptid);
>>+         return 0;
>>+       }
>>+
>>+      /* Check if the current LWP has previously exited.  For nptl threads,
>>+         there is no exit signal issued for LWPs that are not the
>>+         main thread so we should check whenever the thread is stopped.  */
>>+      if (!lin_lwp_thread_alive (lp->ptid))
>>+       {
>>+         if (in_thread_list (lp->ptid))
>>+           {
>>+             /* Core GDB cannot deal with us deleting the current
>>+                thread.  */
>>+             if (!ptid_equal (lp->ptid, inferior_ptid))
>>+               delete_thread (lp->ptid);
>>+             printf_unfiltered ("[%s exited]\n",
>>+                                target_pid_to_str (lp->ptid));
>>+           }
>>+         if (debug_lin_lwp)
>>+           fprintf_unfiltered (gdb_stdlog,
>>+                               "SWC: %s already exited.\n",
>>                                target_pid_to_str (lp->ptid));
>>
>>          delete_lwp (lp->ptid);
>>@@ -756,7 +831,14 @@
>>              /* If there's another event, throw it back into the queue. */
>>              if (lp->status)
>>                {
>>-                 kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>>+                 if (debug_lin_lwp)
>>+                   {
>>+                     fprintf_unfiltered (gdb_stdlog,
>>+                                         "SWC: kill %s, %s\n",
>>+                                         target_pid_to_str (lp->ptid),
>>+                                         status_to_str ((int) status));
>>+                   }
>>+                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>>                }
>>              /* Save the sigtrap event. */
>>              lp->status = status;
>>@@ -800,7 +882,7 @@
>>                                          target_pid_to_str (lp->ptid),
>>                                          status_to_str ((int) status));
>>                    }
>>-                 kill (GET_LWP (lp->ptid), WSTOPSIG (status));
>>+                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
>>                }
>>              return 0;
>>            }
>>@@ -1049,6 +1131,25 @@
>>
>> #endif
>>
>>+/* Stop an active thread, verify it still exists, then resume it.  */
>>+
>>+static int
>>+stop_and_resume_callback (struct lwp_info *lp, void *data)
>>+{
>>+  struct lwp_info *ptr;
>>+
>>+  if (!lp->stopped && !lp->signalled)
>>+    {
>>+      stop_callback (lp, NULL);
>>+      stop_wait_callback (lp, NULL);
>>+      /* Resume if the lwp still exists.  */
>>+      for (ptr = lwp_list; ptr; ptr = ptr->next)
>>+       if (lp == ptr)
>>+         resume_callback (lp, NULL);
>>+    }
>>+  return 0;
>>+}
>>+
>> static ptid_t
>> lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
>> {
>>@@ -1206,10 +1307,61 @@
>>                }
>>            }
>>
>>-         /* Make sure we don't report a TARGET_WAITKIND_EXITED or
>>-            TARGET_WAITKIND_SIGNALLED event if there are still LWP's
>>-            left in the process.  */
>>+         /* Check if the thread has exited.  */
>>          if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
>>+           {
>>+             if (in_thread_list (lp->ptid))
>>+               {
>>+                 /* Core GDB cannot deal with us deleting the current
>>+                    thread.  */
>>+                 if (!ptid_equal (lp->ptid, inferior_ptid))
>>+                   delete_thread (lp->ptid);
>>+                 printf_unfiltered ("[%s exited]\n",
>>+                                    target_pid_to_str (lp->ptid));
>>+               }
>>+
>>+             /* If this is the main thread, we must stop all threads and
>>+                verify if they are still alive.  This is because in the nptl
>>+                thread model, there is no signal issued for exiting LWPs
>>+                other than the main thread.  We only get the main thread
>>+                exit signal once all child threads have already exited.
>>+                If we stop all the threads and use the stop_wait_callback
>>+                to check if they have exited we can determine whether this
>>+                signal should be ignored or whether it means the end of the
>>+                debugged application, regardless of which threading model
>>+                is being used.  */
>>+             if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
>>+               {
>>+                 lp->stopped = 1;
>>+                 iterate_over_lwps (stop_and_resume_callback, NULL);
>>+               }
>>+
>>+             if (debug_lin_lwp)
>>+               fprintf_unfiltered (gdb_stdlog,
>>+                                   "LLW: %s exited.\n",
>>+                                   target_pid_to_str (lp->ptid));
>>+
>>+             delete_lwp (lp->ptid);
>>+
>>+             /* If there is at least one more LWP, then the exit signal
>>+                was not the end of the debugged application and should be
>>+                ignored.  */
>>+             if (num_lwps > 0)
>>+               {
>>+                 /* Make sure there is at least one thread running.  */
>>+                 gdb_assert (iterate_over_lwps (running_callback, NULL));
>>+
>>+                 /* Discard the event.  */
>>+                 status = 0;
>>+                 continue;
>>+               }
>>+           }
>>+
>>+         /* Check if the current LWP has previously exited.  In the nptl
>>+            thread model, LWPs other than the main thread do not issue
>>+            signals when they exit so we must check whenever the thread
>>+            has stopped.  A similar check is made in stop_wait_callback().  */
>>+         if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
>>            {
>>              if (in_thread_list (lp->ptid))
>>                {
>>Index: acconfig.h
>>===================================================================
>>RCS file: /cvs/src/src/gdb/acconfig.h,v
>>retrieving revision 1.24
>>diff -u -r1.24 acconfig.h
>>--- acconfig.h  4 Jan 2003 00:34:42 -0000       1.24
>>+++ acconfig.h  23 Apr 2003 22:52:44 -0000
>>@@ -95,6 +95,9 @@
>> /* Define if using Solaris thread debugging.  */
>> #undef HAVE_THREAD_DB_LIB
>>
>>+/* Define if you support the tkill syscall.  */
>>+#undef HAVE_TKILL_SYSCALL
>>+
>> /* Define on a GNU/Linux system to work around problems in sys/procfs.h.  */
>> #undef START_INFERIOR_TRAPS_EXPECTED
>> #undef sys_quotactl
>>Index: configure.in
>>===================================================================
>>RCS file: /cvs/src/src/gdb/configure.in,v
>>retrieving revision 1.126
>>diff -u -r1.126 configure.in
>>--- configure.in        26 Feb 2003 15:10:47 -0000      1.126
>>+++ configure.in        23 Apr 2003 22:52:44 -0000
>>@@ -384,6 +384,7 @@
>> AC_CHECK_FUNCS(setpgid setpgrp)
>> AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
>> AC_CHECK_FUNCS(socketpair)
>>+AC_CHECK_FUNCS(syscall)
>>
>> dnl AC_FUNC_SETPGRP does not work when cross compiling
>> dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
>>@@ -909,6 +910,24 @@
>> if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
>>   AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
>>             [Define if <thread_db.h> has the TD_NOTALLOC error code.])
>>+fi
>>+
>>+dnl See if we have a sys/syscall header file that has __NR_tkill.
>>+if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
>>+   AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
>>+                  gdb_cv_sys_syscall_h_has_tkill,
>>+     AC_TRY_COMPILE(
>>+       [#include <sys/syscall.h>],
>>+       [int i = __NR_tkill;],
>>+       gdb_cv_sys_syscall_h_has_tkill=yes,
>>+       gdb_cv_sys_syscall_h_has_tkill=no
>>+     )
>>+   )
>>+fi
>>+dnl See if we can issue tkill syscall.
>>+if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
>>+  AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
>>+            [Define if we can use the tkill syscall.])
>> fi
>>
>> dnl Handle optional features that can be enabled.
>>Index: i386-linux-nat.c
>>===================================================================
>>RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
>>retrieving revision 1.44
>>diff -u -r1.44 i386-linux-nat.c
>>--- i386-linux-nat.c    16 Apr 2003 15:22:02 -0000      1.44
>>+++ i386-linux-nat.c    23 Apr 2003 22:52:44 -0000
>>@@ -70,6 +70,9 @@
>> /* Defines I386_LINUX_ORIG_EAX_REGNUM.  */
>> #include "i386-linux-tdep.h"
>>
>>+/* Defines ps_err_e, struct ps_prochandle.  */
>>+#include "gdb_proc_service.h"
>>+
>> /* Prototypes for local functions.  */
>> static void dummy_sse_values (void);
>>
>>@@ -682,6 +685,21 @@
>>          offsetof (struct user, u_debugreg[regnum]), value);
>>   if (errno != 0)
>>     perror_with_name ("Couldn't write debug register");
>>+}
>>+
>>+extern ps_err_e
>>+ps_get_thread_area(const struct ps_prochandle *ph,
>>+                  lwpid_t lwpid, int idx, void **base)
>>+{
>>+  unsigned long int desc[3];
>>+#define PTRACE_GET_THREAD_AREA 25
>>+
>>+  if  (ptrace (PTRACE_GET_THREAD_AREA,
>>+              lwpid, (void *) idx, (unsigned long) &desc) < 0)
>>+    return PS_ERR;
>>+
>>+  *(int *)base = desc[1];
>>+  return PS_OK;
>> }
>>
>> void
> 
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2003-08-17 20:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-24 22:05 RFC: nptl threading patch for linux J. Johnston
2003-05-09 22:00 ` Daniel Jacobowitz
2003-05-09 22:36   ` Andrew Cagney
2003-05-10  0:58     ` Daniel Jacobowitz
2003-05-09 23:38   ` J. Johnston
2003-05-10  0:58     ` Daniel Jacobowitz
2003-05-10 21:18   ` Daniel Jacobowitz
2003-05-10 21:59     ` Daniel Jacobowitz
2003-05-12 19:36       ` J. Johnston
2003-08-17 20:49         ` Daniel Jacobowitz
2003-06-02 18:49 ` Michael Snyder
2003-06-04 20:52   ` J. Johnston

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox