[RFC]: fix for recycled thread ids

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [RFC]: fix for recycled thread ids
@ 2004-03-19  0:36 Jeff Johnston
  2004-03-19  1:53 ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-19  0:36 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2737 bytes --]

The following patch fixes a problem when a user application creates a thread 
shortly after another thread has completed.  For nptl, thread ids are addresses. 
  If a thread completes/dies, the tid is available for reuse by a new thread.

On RH9 and RHEL3, nptl threads do not have exit events associated with them.  I 
have already discussed this with Daniel J. who feels that the kernels are not 
doing the right thing, but regardless, the current and previous RH nptl kernels 
are behaving this way and gdb needs to handle it.  As such, when a new thread is 
created, if it is reusing the tid of a previous thread that gdb hasn't figured 
out isn't around any more, gdb ignores the create event and the new thread is 
not added.  Ignoring the event is done because it is possible for gdb to find 
out about the thread before it's creation event is reported and so the create 
event can be redundant information.

The following fix removes the problem by also checking if the original lwp given 
to the old thread in the list matches the lwp for the new thread.  IIRC, there 
is at least one platform that allows threads to change their lwps dynamically. 
I do not believe that platform uses the thread-db layer, but to be safe, I did 
not use a direct compare between the original lwp and the new lwp.  Instead, I 
added a target vector routine that by default never returns an unequal 
comparison.  For linux, the comparison routine points to a new routine in 
lin-lwp.c which does the expected comparison.  The comparison is based on C 
compare routines whereby 0 means equal and non-zero means unequal.

Now, if a thread id for a new thread is found to already be on the list, the 
target comparison is made between the original lwp stored for the old thread and 
the lwp for the new thread.  If the comparison is unequal, the old thread is 
deleted and then the new thread is added.  Otherwise, the new thread event is 
ignored as before.

Ok to commit?

-- Jeff J.

2004-03-18  Jeff Johnston  <jjohnstn@redhat.com>

         * thread-db.c (struct private_thread_info): Add orig_lwp field.
         (attach_thread): Save original lwp for thread after attaching.
         (check_event): When a create event occurs and the thread is already
         in the thread list, check if the old thread has gone away and the
         thread id is being recycled.
         * target.h (struct target_ops): Add to_thread_compare function.
         * target.c (update_current_target): Add to_thread_compare support.
         (debug_to_thread_compare): New function.
         (setup_target_debug): Add debug_to_thread_compare.
         * lin-lwp.c (lin_lwp_thread_compare): New function.
         (init_lin_lwp_ops): Add lin_lwp_thread_compare.


[-- Attachment #2: thread.patch --]
[-- Type: text/plain, Size: 5910 bytes --]

Index: thread-db.c
===================================================================
RCS file: /cvs/src/src/gdb/thread-db.c,v
retrieving revision 1.37
diff -u -p -r1.37 thread-db.c
--- thread-db.c	29 Feb 2004 02:39:47 -0000	1.37
+++ thread-db.c	18 Mar 2004 23:14:01 -0000
@@ -156,6 +156,8 @@ struct private_thread_info
 
   td_thrhandle_t th;
   td_thrinfo_t ti;
+
+  lwpid_t orig_lwp;
 };
 \f
 
@@ -714,6 +716,9 @@ attach_thread (ptid_t ptid, const td_thr
   ATTACH_LWP (BUILD_LWP (ti_p->ti_lid, GET_PID (ptid)), 0);
 #endif
 
+  /* Save original lwp for future reference.  */
+  tp->private->orig_lwp = ti_p->ti_lid;
+
   /* Enable thread event reporting for this thread.  */
   err = td_thr_event_enable_p (th_p, 1);
   if (err != TD_OK)
@@ -850,8 +855,35 @@ check_event (ptid_t ptid)
 
 	  /* We may already know about this thread, for instance when the
 	     user has issued the `info threads' command before the SIGTRAP
-	     for hitting the thread creation breakpoint was reported.  */
-	  if (!in_thread_list (ptid))
+	     for hitting the thread creation breakpoint was reported.
+	     However, we have a problem when threads die and new threads
+	     are created shortly after.
+
+	     If a thread dies and gdb is unaware the thread has died,
+	     a new thread could reuse the tid that is no longer being
+	     used.  Thus, if we find the new tid is already in our list,
+	     we must punt to the lower level to tell us if the new thread
+	     and old thread are one and the same.  If they are different,
+	     the old thread must have gone away and the tid is being
+	     recycled.
+
+	     If we know the original thread has gone away, we can
+	     delete the previous entry with the tid in the thread list and then
+	     attach the new thread.  Otherwise, we ignore the create event.  */
+
+	  if (in_thread_list (ptid))
+	    {
+	      struct thread_info *t = find_thread_pid (ptid);
+	      if (target_beneath->to_thread_compare (BUILD_LWP (t->private->orig_lwp, 
+								GET_PID (ptid)), 
+						     BUILD_LWP (ti.ti_lid, 
+								GET_PID (ptid))))
+		{
+		  delete_thread (ptid);
+		  attach_thread (ptid, msg.th_p, &ti, 1);
+		}
+	    }
+	  else
 	    attach_thread (ptid, msg.th_p, &ti, 1);
 
 	  break;
Index: target.h
===================================================================
RCS file: /cvs/src/src/gdb/target.h,v
retrieving revision 1.57
diff -u -p -r1.57 target.h
--- target.h	4 Feb 2004 21:49:55 -0000	1.57
+++ target.h	18 Mar 2004 23:14:01 -0000
@@ -369,6 +369,7 @@ struct target_ops
     int (*to_can_run) (void);
     void (*to_notice_signals) (ptid_t ptid);
     int (*to_thread_alive) (ptid_t ptid);
+    int (*to_thread_compare) (ptid_t ptid1, ptid_t ptid2);
     void (*to_find_new_threads) (void);
     char *(*to_pid_to_str) (ptid_t);
     char *(*to_extra_thread_info) (struct thread_info *);
Index: target.c
===================================================================
RCS file: /cvs/src/src/gdb/target.c,v
retrieving revision 1.71
diff -u -p -r1.71 target.c
--- target.c	9 Mar 2004 16:16:52 -0000	1.71
+++ target.c	18 Mar 2004 23:14:01 -0000
@@ -161,6 +161,8 @@ static void debug_to_notice_signals (pti
 
 static int debug_to_thread_alive (ptid_t);
 
+static int debug_to_thread_compare (ptid_t, ptid_t);
+
 static void debug_to_stop (void);
 
 /* Pointer to array of target architecture structures; the size of the
@@ -424,6 +426,7 @@ update_current_target (void)
       INHERIT (to_can_run, t);
       INHERIT (to_notice_signals, t);
       INHERIT (to_thread_alive, t);
+      INHERIT (to_thread_compare, t);
       INHERIT (to_find_new_threads, t);
       INHERIT (to_pid_to_str, t);
       INHERIT (to_extra_thread_info, t);
@@ -601,6 +604,9 @@ update_current_target (void)
   de_fault (to_thread_alive, 
 	    (int (*) (ptid_t)) 
 	    return_zero);
+  de_fault (to_thread_compare, 
+	    (int (*) (ptid_t, ptid_t)) 
+	    return_zero);
   de_fault (to_find_new_threads, 
 	    (void (*) (void)) 
 	    target_ignore);
@@ -2263,6 +2269,19 @@ debug_to_thread_alive (ptid_t ptid)
   return retval;
 }
 
+static int
+debug_to_thread_compare (ptid_t ptid1, ptid_t ptid2)
+{
+  int retval;
+
+  retval = debug_target.to_thread_compare (ptid1, ptid2);
+
+  fprintf_unfiltered (gdb_stdlog, "target_thread_compare (%ld, %ld) = %d\n",
+		      TIDGET (ptid1), TIDGET (ptid2), retval);
+
+  return retval;
+}
+
 static void
 debug_to_find_new_threads (void)
 {
@@ -2393,6 +2412,7 @@ setup_target_debug (void)
   current_target.to_can_run = debug_to_can_run;
   current_target.to_notice_signals = debug_to_notice_signals;
   current_target.to_thread_alive = debug_to_thread_alive;
+  current_target.to_thread_compare = debug_to_thread_compare;
   current_target.to_find_new_threads = debug_to_find_new_threads;
   current_target.to_stop = debug_to_stop;
   current_target.to_xfer_partial = debug_to_xfer_partial;
Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.52
diff -u -p -r1.52 lin-lwp.c
--- lin-lwp.c	8 Oct 2003 20:05:56 -0000	1.52
+++ lin-lwp.c	18 Mar 2004 23:14:01 -0000
@@ -1728,6 +1728,13 @@ lin_lwp_thread_alive (ptid_t ptid)
   return 1;
 }
 
+/* Compare thread lwps and return 0 if equal, non-zero if not equal.  */
+static int
+lin_lwp_thread_compare (ptid_t old, ptid_t new)
+{
+  return (GET_LWP (old) != GET_LWP (new));
+}
+
 static char *
 lin_lwp_pid_to_str (ptid_t ptid)
 {
@@ -1764,6 +1771,7 @@ init_lin_lwp_ops (void)
   lin_lwp_ops.to_create_inferior = lin_lwp_create_inferior;
   lin_lwp_ops.to_mourn_inferior = lin_lwp_mourn_inferior;
   lin_lwp_ops.to_thread_alive = lin_lwp_thread_alive;
+  lin_lwp_ops.to_thread_compare = lin_lwp_thread_compare;
   lin_lwp_ops.to_pid_to_str = lin_lwp_pid_to_str;
   lin_lwp_ops.to_post_startup_inferior = child_post_startup_inferior;
   lin_lwp_ops.to_post_attach = child_post_attach;

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19  0:36 [RFC]: fix for recycled thread ids Jeff Johnston
@ 2004-03-19  1:53 ` Daniel Jacobowitz
  2004-03-19 18:44   ` Jeff Johnston
  2004-03-19 20:33   ` [RFC]: fix for recycled thread ids Andrew Cagney
  0 siblings, 2 replies; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-19  1:53 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
> The following patch fixes a problem when a user application creates a 
> thread shortly after another thread has completed.  For nptl, thread ids 
> are addresses. If a thread completes/dies, the tid is available for reuse 
>  by a new thread.

Does NPTL re-use the TID quickly, or cycle around the way LT did so
that we only see this under high thread pressure?

> On RH9 and RHEL3, nptl threads do not have exit events associated with 
> them.  I have already discussed this with Daniel J. who feels that the 
> kernels are not doing the right thing, but regardless, the current and 
> previous RH nptl kernels are behaving this way and gdb needs to handle it.  
> As such, when a new thread is created, if it is reusing the tid of a 
> previous thread that gdb hasn't figured out isn't around any more, gdb 
> ignores the create event and the new thread is not added.  Ignoring the 
> event is done because it is possible for gdb to find out about the thread 
> before it's creation event is reported and so the create event can be 
> redundant information.

What I haven't seen a good explanation of is what problem this causes. 
If a thread goes away, and then a new thread using the same ID is
created, and then we stop, what do we lose besides the cosmetic fact
that there is no [New Thread] message?  Does anything go wrong?

Also, I would like the issue of whether or not it is a kernel bug
resolved before we discuss working around it in GDB.

> The following fix removes the problem by also checking if the original lwp 
> given to the old thread in the list matches the lwp for the new thread.  
> IIRC, there is at least one platform that allows threads to change their 
> lwps dynamically. I do not believe that platform uses the thread-db layer, 
> but to be safe, I did not use a direct compare between the original lwp and 
> the new lwp.  Instead, I added a target vector routine that by default 
> never returns an unequal comparison.  For linux, the comparison routine 
> points to a new routine in lin-lwp.c which does the expected comparison.  
> The comparison is based on C compare routines whereby 0 means equal and 
> non-zero means unequal.

At least one such "platform" was NGPT, which runs on Linux (and worked
OK with thread-db.c, too).  It's obsolete nowadays.

> Now, if a thread id for a new thread is found to already be on the list, 
> the target comparison is made between the original lwp stored for the old 
> thread and the lwp for the new thread.  If the comparison is unequal, the 
> old thread is deleted and then the new thread is added.  Otherwise, the new 
> thread event is ignored as before.

If we aren't going to support migration, then why not just store the
LWP ID in the thread PTID structure?  That's why it has three fields,
I'd guess.

I don't think that the target hook makes much sense, since thread-db.c
is basically Linux-specific at this point and since the contents being
passed to the hook are thread-db-specific.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19  1:53 ` Daniel Jacobowitz
@ 2004-03-19 18:44   ` Jeff Johnston
  2004-03-19 19:01     ` Daniel Jacobowitz
  2004-03-19 20:33   ` [RFC]: fix for recycled thread ids Andrew Cagney
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-19 18:44 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 4389 bytes --]

Daniel Jacobowitz wrote:
> On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
> 
>>The following patch fixes a problem when a user application creates a 
>>thread shortly after another thread has completed.  For nptl, thread ids 
>>are addresses. If a thread completes/dies, the tid is available for reuse 
>> by a new thread.
> 
> 
> Does NPTL re-use the TID quickly, or cycle around the way LT did so
> that we only see this under high thread pressure?
>

I can't say for sure as I don't maintain libthread_db.  The test case in 
question does create high thread pressure, but I think it would be a mistake to 
generalize and think that this couldn't happen in an existing application.

> 
>>On RH9 and RHEL3, nptl threads do not have exit events associated with 
>>them.  I have already discussed this with Daniel J. who feels that the 
>>kernels are not doing the right thing, but regardless, the current and 
>>previous RH nptl kernels are behaving this way and gdb needs to handle it.  
>>As such, when a new thread is created, if it is reusing the tid of a 
>>previous thread that gdb hasn't figured out isn't around any more, gdb 
>>ignores the create event and the new thread is not added.  Ignoring the 
>>event is done because it is possible for gdb to find out about the thread 
>>before it's creation event is reported and so the create event can be 
>>redundant information.
> 
> 
> What I haven't seen a good explanation of is what problem this causes. 
> If a thread goes away, and then a new thread using the same ID is
> created, and then we stop, what do we lose besides the cosmetic fact
> that there is no [New Thread] message?  Does anything go wrong?
> 
> Also, I would like the issue of whether or not it is a kernel bug
> resolved before we discuss working around it in GDB.
> 

The problem is if a global signal is passed on to the inferior program when 
there are threads we have not attached to, the process terminates.  A Ctrl-C is 
such a signal.  In the example program, we only attach to the first 100 threads 
and when the Ctrl-C is issued, we get:

ptrace: No such process.
thread_db_get_info: cannot get thread info: generic error

The end-user is cooked.

Regarding resolving this issue as a kernel error, any fix for RHEL3 won't get 
shipped until Update 3.  I know of no scheduled update for RH9 and this would 
not qualify as a security update.

If I read your subsequent comments right, you wouldn't have a problem with me 
just making the lwp compare directly in the thread-db.c code.  This would mean 
the change would be all of 6 lines of code.  Does it make a lot of sense to 
withhold this change in light of the time-frame a kernel fix could be issued and 
the simplicity of the change itself?

> 
>>The following fix removes the problem by also checking if the original lwp 
>>given to the old thread in the list matches the lwp for the new thread.  
>>IIRC, there is at least one platform that allows threads to change their 
>>lwps dynamically. I do not believe that platform uses the thread-db layer, 
>>but to be safe, I did not use a direct compare between the original lwp and 
>>the new lwp.  Instead, I added a target vector routine that by default 
>>never returns an unequal comparison.  For linux, the comparison routine 
>>points to a new routine in lin-lwp.c which does the expected comparison.  
>>The comparison is based on C compare routines whereby 0 means equal and 
>>non-zero means unequal.
> 
> 
> At least one such "platform" was NGPT, which runs on Linux (and worked
> OK with thread-db.c, too).  It's obsolete nowadays.
> 
> 
>>Now, if a thread id for a new thread is found to already be on the list, 
>>the target comparison is made between the original lwp stored for the old 
>>thread and the lwp for the new thread.  If the comparison is unequal, the 
>>old thread is deleted and then the new thread is added.  Otherwise, the new 
>>thread event is ignored as before.
> 
> 
> If we aren't going to support migration, then why not just store the
> LWP ID in the thread PTID structure?  That's why it has three fields,
> I'd guess.
> 
> I don't think that the target hook makes much sense, since thread-db.c
> is basically Linux-specific at this point and since the contents being
> passed to the hook are thread-db-specific.
> 

Would it make sense to rename thread-db.c to lin-thread-db.c?

-- Jeff J.

[-- Attachment #2: lphello.c --]
[-- Type: text/plain, Size: 2973 bytes --]

#include <signal.h>

#include <pthread.h>
#include <unistd.h>
#include <string.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

/**
 * Compile with: 
 * gcc -g -Wall -lpthread -o lphello lphello.c
 *
 * Author Magnus Ihse, ihse@bea.com 
 */
#define THREAD_COUNT 100
#define ITER_COUNT 500000

static volatile int finishedArray[THREAD_COUNT];
static int pKey;

void* setup_altstack(void) {
   stack_t ss;

   ss.ss_sp = malloc(20*1024);
   if (ss.ss_sp == 0) {
      return NULL;
   }
   ss.ss_size = 20*1024;
   ss.ss_flags = 0;
       
   if (sigaltstack(&ss, NULL) == -1) {
      //perror("sigaltstack");
      return NULL;
   }
   return ss.ss_sp;
}

void takedown_altstack(void* stack) {
   struct sigaltstack ss;
   int result;

   if (stack == NULL) {
      return;
   }
   
   ss.ss_flags = SS_DISABLE;
   ss.ss_sp = (void*)47;  // This value should be ignored when ss_flags is SS_DISABLE
   ss.ss_size = 29;       // This value should be ignored when ss_flags is SS_DISABLE
   
   {
      result = sigaltstack(&ss, NULL);
      free(stack);
   }
}

void *threadfunc(void *arg) {
   int mypos = (int)(size_t)arg;
   int i;
   long square = 1;
   void* altstack = setup_altstack();
   
   pthread_setspecific(pKey, arg);
   for (i=0; i < 1000; i++) {
      square = i*i + square*mypos;
   }
   finishedArray[mypos] = 1;
   takedown_altstack(altstack);
   
   return NULL;
}


int main(int argc, char ** argv) {
   pthread_t threads[THREAD_COUNT];
   pthread_attr_t attr;
   int result;
   int i;
   int iteration;
   int finished;

   pthread_attr_init(&attr);
   pthread_attr_setstacksize(&attr, 128*1024);

   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  
   pthread_key_create(&pKey, NULL);

   for (iteration = 0; iteration < ITER_COUNT; iteration++) {
      if ((iteration % 100) == 0) {
         printf("\nStarting run series %i: ", iteration);
      }

      if ((iteration % 10) == 0) {
         printf(".");
         fflush(stdout);
      }

      // Clear array
      for (i = 0; i< THREAD_COUNT; i++) {
	 finishedArray[i] = 0;
      }

      // Start threads
      for (i = 0; i< THREAD_COUNT; i++) {
	 result = pthread_create(&threads[i], &attr, threadfunc, (void*)(size_t)i);
	 if (result != 0) {
	    perror("pthread_create");
	    exit(1);
	 }
      }

      // Join threads
      for (i = 0; i< THREAD_COUNT; i++) {
         result = pthread_join(threads[i], NULL);
         if (result != 0) {
            perror("pthread_join");
            exit(EXIT_FAILURE);
         }
      }
      
      // Spin waiting for results
      finished = 1;
      do {
         struct timespec req, rem;

         req.tv_sec = 0;
         req.tv_nsec = 10 * 1000 * 1000;

         nanosleep(&req, &rem);
         
         for (i = 0; i< THREAD_COUNT; i++) {
            if (finishedArray[i] != 1) {
               finished = 0;
               break;
            }
         }
      } while (!finished);

   }

   return EXIT_SUCCESS;
}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19 18:44   ` Jeff Johnston
@ 2004-03-19 19:01     ` Daniel Jacobowitz
  2004-03-19 19:35       ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-19 19:01 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
> >
> >>The following patch fixes a problem when a user application creates a 
> >>thread shortly after another thread has completed.  For nptl, thread ids 
> >>are addresses. If a thread completes/dies, the tid is available for reuse 
> >>by a new thread.
> >
> >
> >Does NPTL re-use the TID quickly, or cycle around the way LT did so
> >that we only see this under high thread pressure?
> >
> 
> I can't say for sure as I don't maintain libthread_db.  The test case in 
> question does create high thread pressure, but I think it would be a 
> mistake to generalize and think that this couldn't happen in an existing 
> application.

I know you don't maintain NPTL, but this is the sort of question that
we need to understand before we can fix the problem correctly.  I see
that you've attached a testcase, so I'll take a look at it when I get
back from my trip on Monday.

> >>On RH9 and RHEL3, nptl threads do not have exit events associated with 
> >>them.  I have already discussed this with Daniel J. who feels that the 
> >>kernels are not doing the right thing, but regardless, the current and 
> >>previous RH nptl kernels are behaving this way and gdb needs to handle 
> >>it.  As such, when a new thread is created, if it is reusing the tid of a 
> >>previous thread that gdb hasn't figured out isn't around any more, gdb 
> >>ignores the create event and the new thread is not added.  Ignoring the 
> >>event is done because it is possible for gdb to find out about the thread 
> >>before it's creation event is reported and so the create event can be 
> >>redundant information.
> >
> >
> >What I haven't seen a good explanation of is what problem this causes. 
> >If a thread goes away, and then a new thread using the same ID is
> >created, and then we stop, what do we lose besides the cosmetic fact
> >that there is no [New Thread] message?  Does anything go wrong?
> >
> >Also, I would like the issue of whether or not it is a kernel bug
> >resolved before we discuss working around it in GDB.
> >
> 
> The problem is if a global signal is passed on to the inferior program when 
> there are threads we have not attached to, the process terminates.  A 
> Ctrl-C is such a signal.  In the example program, we only attach to the 
> first 100 threads and when the Ctrl-C is issued, we get:
> 
> ptrace: No such process.
> thread_db_get_info: cannot get thread info: generic error
> 
> The end-user is cooked.

OK.  So what you're saying is, the problem is that we do not see that
the new thread has been created, so we do not attach to it.  Is that
right?

Conceptually, we attach to LWPs, not to threads.  That suggests to me
that the correct fix is to ask the LWP layer if the LWP is attached
rather than looking it up in the thread list in the first place. 
We've already got an appropriate list of LWPs though we might need a
new accessor.

> Regarding resolving this issue as a kernel error, any fix for RHEL3 won't 
> get shipped until Update 3.  I know of no scheduled update for RH9 and this 
> would not qualify as a security update.

That's not what I said - I don't care whether an update is published
for any particular vendor's product.  I want us to understand whether
we are working around a kernel bug or fixing an actual bug in GDB.
That's another part of the problem that we need to understand in order
to fix it correctly.

As the author of the kernel code in question, I think that it's a
kernel bug.  Roland seemed to agree.

> Would it make sense to rename thread-db.c to lin-thread-db.c?

Probably not, but some explanatory comments may be in order.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19 19:01     ` Daniel Jacobowitz
@ 2004-03-19 19:35       ` Jeff Johnston
  2004-03-19 19:40         ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-19 19:35 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote:
> 
>>Daniel Jacobowitz wrote:
>>
>>>On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
>>>
>>>
>>>>The following patch fixes a problem when a user application creates a 
>>>>thread shortly after another thread has completed.  For nptl, thread ids 
>>>>are addresses. If a thread completes/dies, the tid is available for reuse 
>>>>by a new thread.
>>>
>>>
>>>Does NPTL re-use the TID quickly, or cycle around the way LT did so
>>>that we only see this under high thread pressure?
>>>
>>
>>I can't say for sure as I don't maintain libthread_db.  The test case in 
>>question does create high thread pressure, but I think it would be a 
>>mistake to generalize and think that this couldn't happen in an existing 
>>application.
> 
> 
> I know you don't maintain NPTL, but this is the sort of question that
> we need to understand before we can fix the problem correctly.  I see
> that you've attached a testcase, so I'll take a look at it when I get
> back from my trip on Monday.
>

Ok, thanks.

> 
>>>>On RH9 and RHEL3, nptl threads do not have exit events associated with 
>>>>them.  I have already discussed this with Daniel J. who feels that the 
>>>>kernels are not doing the right thing, but regardless, the current and 
>>>>previous RH nptl kernels are behaving this way and gdb needs to handle 
>>>>it.  As such, when a new thread is created, if it is reusing the tid of a 
>>>>previous thread that gdb hasn't figured out isn't around any more, gdb 
>>>>ignores the create event and the new thread is not added.  Ignoring the 
>>>>event is done because it is possible for gdb to find out about the thread 
>>>>before it's creation event is reported and so the create event can be 
>>>>redundant information.
>>>
>>>
>>>What I haven't seen a good explanation of is what problem this causes. 
>>>If a thread goes away, and then a new thread using the same ID is
>>>created, and then we stop, what do we lose besides the cosmetic fact
>>>that there is no [New Thread] message?  Does anything go wrong?
>>>
>>>Also, I would like the issue of whether or not it is a kernel bug
>>>resolved before we discuss working around it in GDB.
>>>
>>
>>The problem is if a global signal is passed on to the inferior program when 
>>there are threads we have not attached to, the process terminates.  A 
>>Ctrl-C is such a signal.  In the example program, we only attach to the 
>>first 100 threads and when the Ctrl-C is issued, we get:
>>
>>ptrace: No such process.
>>thread_db_get_info: cannot get thread info: generic error
>>
>>The end-user is cooked.
> 
> 
> OK.  So what you're saying is, the problem is that we do not see that
> the new thread has been created, so we do not attach to it.  Is that
> right?
>

Yes.

> Conceptually, we attach to LWPs, not to threads.  That suggests to me
> that the correct fix is to ask the LWP layer if the LWP is attached
> rather than looking it up in the thread list in the first place. 
> We've already got an appropriate list of LWPs though we might need a
> new accessor.
> 

I like that idea.  We still have to deal with the bogus thread list entry.  The 
routine prune_threads calls thread_db_alive and it won't realize the thread info 
it has is bogus because it will find the tid is valid.

> 
>>Regarding resolving this issue as a kernel error, any fix for RHEL3 won't 
>>get shipped until Update 3.  I know of no scheduled update for RH9 and this 
>>would not qualify as a security update.
> 
> 
> That's not what I said - I don't care whether an update is published
> for any particular vendor's product.  I want us to understand whether
> we are working around a kernel bug or fixing an actual bug in GDB.
> That's another part of the problem that we need to understand in order
> to fix it correctly.
> 
> As the author of the kernel code in question, I think that it's a
> kernel bug.  Roland seemed to agree.
> 
> 
>>Would it make sense to rename thread-db.c to lin-thread-db.c?
> 
> 
> Probably not, but some explanatory comments may be in order.
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19 19:35       ` Jeff Johnston
@ 2004-03-19 19:40         ` Daniel Jacobowitz
  2004-03-19 21:32           ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-19 19:40 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Fri, Mar 19, 2004 at 02:35:40PM -0500, Jeff Johnston wrote:
> >Conceptually, we attach to LWPs, not to threads.  That suggests to me
> >that the correct fix is to ask the LWP layer if the LWP is attached
> >rather than looking it up in the thread list in the first place. 
> >We've already got an appropriate list of LWPs though we might need a
> >new accessor.
> >
> 
> I like that idea.  We still have to deal with the bogus thread list entry.  
> The routine prune_threads calls thread_db_alive and it won't realize the 
> thread info it has is bogus because it will find the tid is valid.

Do you think this will be a problem?  My hope is that it will just look
as if the thread has 'migrated' to a new LWP.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19 19:40         ` Daniel Jacobowitz
@ 2004-03-19 21:32           ` Jeff Johnston
  2004-03-24 15:51             ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-19 21:32 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches



Daniel Jacobowitz wrote:
> On Fri, Mar 19, 2004 at 02:35:40PM -0500, Jeff Johnston wrote:
> 
>>>Conceptually, we attach to LWPs, not to threads.  That suggests to me
>>>that the correct fix is to ask the LWP layer if the LWP is attached
>>>rather than looking it up in the thread list in the first place. 
>>>We've already got an appropriate list of LWPs though we might need a
>>>new accessor.
>>>
>>
>>I like that idea.  We still have to deal with the bogus thread list entry.  
>>The routine prune_threads calls thread_db_alive and it won't realize the 
>>thread info it has is bogus because it will find the tid is valid.
> 
> 
> Do you think this will be a problem?  My hope is that it will just look
> as if the thread has 'migrated' to a new LWP.
> 

It will have invalid state associated with it.  For example, the thread info has 
a prev_pc field.  As to all the havoc that the state may or may not cause, I 
think it would be a very good idea to clean it up now.  Who's to say what state 
will be added to thread_info in the future.

-- Jeff J.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19 21:32           ` Jeff Johnston
@ 2004-03-24 15:51             ` Daniel Jacobowitz
  2004-03-24 16:56               ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-24 15:51 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Fri, Mar 19, 2004 at 04:32:36PM -0500, Jeff Johnston wrote:
> 
> 
> Daniel Jacobowitz wrote:
> >On Fri, Mar 19, 2004 at 02:35:40PM -0500, Jeff Johnston wrote:
> >
> >>>Conceptually, we attach to LWPs, not to threads.  That suggests to me
> >>>that the correct fix is to ask the LWP layer if the LWP is attached
> >>>rather than looking it up in the thread list in the first place. 
> >>>We've already got an appropriate list of LWPs though we might need a
> >>>new accessor.
> >>>
> >>
> >>I like that idea.  We still have to deal with the bogus thread list 
> >>entry.  The routine prune_threads calls thread_db_alive and it won't 
> >>realize the thread info it has is bogus because it will find the tid is 
> >>valid.
> >
> >
> >Do you think this will be a problem?  My hope is that it will just look
> >as if the thread has 'migrated' to a new LWP.
> >
> 
> It will have invalid state associated with it.  For example, the thread 
> info has a prev_pc field.  As to all the havoc that the state may or may 
> not cause, I think it would be a very good idea to clean it up now.  Who's 
> to say what state will be added to thread_info in the future.

It turns out that this is not only a problem, but the whole problem. 
Could you run this test under GDB, on RHEL, using strace?  Tell me
whether or not you see WIFEXITED results for every thread as it exits. 
I was assuming you did not, but I can reproduce the misbehavior here
even though I do get them.

The problem is that we get the LWP death events, but we treat the
threads and LWPs as completely independent sets.  We never find out
that the threads have died.

We don't enable thread death event reporting, because in glibc 2.1.3
there was a bug in the death reporting which would cause the debugged
program to segfault:

#if 0
  /* FIXME: kettenis/2000-04-23: The event reporting facility is
     broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
     now.  */
  td_event_addset (&events, TD_DEATH);
#endif

Fortunately, in <gnu/libc-version.h> there is a function to return the
runtime version of glibc.  We should be able to use that - and the not
100% valid, but generally valid and already assumed by thread_db,
assumption that a native GDB, when used to debug native programs, is
debugging the same version of the C library - to enable TD_DEATH when
it is safe to do so.  This will let us detach the threads when they
die.

That has its own risks, since the thread continues to run for a short
while after the death event is reported.  For instance, in NPTL the
thread reports the event and then cleans up after itself; in LT I don't
remember whether the manager or the thread does this, but I think it's
the same.  I already wrote limited code to handle this, if you search
for "zombie" in thread-db.c, so it should be OK.  The gist is that we
remove it from the thread list right away, but do not detach the
thread.  We resist attaching to zombies.

At least, all that is how it looks to me.  I'll experiment with
TD_DEATH before I speculate further.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-24 15:51             ` Daniel Jacobowitz
@ 2004-03-24 16:56               ` Daniel Jacobowitz
  2004-03-24 21:56                 ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-24 16:56 UTC (permalink / raw)
  To: Jeff Johnston, gdb-patches

On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote:
> Fortunately, in <gnu/libc-version.h> there is a function to return the
> runtime version of glibc.  We should be able to use that - and the not
> 100% valid, but generally valid and already assumed by thread_db,
> assumption that a native GDB, when used to debug native programs, is
> debugging the same version of the C library - to enable TD_DEATH when
> it is safe to do so.  This will let us detach the threads when they
> die.
> 
> That has its own risks, since the thread continues to run for a short
> while after the death event is reported.  For instance, in NPTL the
> thread reports the event and then cleans up after itself; in LT I don't
> remember whether the manager or the thread does this, but I think it's
> the same.  I already wrote limited code to handle this, if you search
> for "zombie" in thread-db.c, so it should be OK.  The gist is that we
> remove it from the thread list right away, but do not detach the
> thread.  We resist attaching to zombies.
> 
> At least, all that is how it looks to me.  I'll experiment with
> TD_DEATH before I speculate further.

Here's a proof-of-concept patch.  It is not quite right, in that now I
can hit C-c about four times before I get the error().  It does
illustrate what I was suggesting, though, minus the version checking.

The reason we still get the segfaults is not that surprising.  I spent
a little while coaxing it into producing a core dump, and got this:

(gdb) i threads
  7 process 10303  0x40034531 in __nptl_create_event () from /lib/tls/i686/cmov/libpthread.so.0
  6 process 12467  0xffffe410 in __kernel_vsyscall ()
  5 process 12468  0xffffe410 in __kernel_vsyscall ()
  4 process 12469  0x0804884e in threadfunc (arg=0x25) at lphello.c:64
  3 process 12470  0x40034540 in __nptl_death_event () from /lib/tls/i686/cmov/libpthread.so.0
  2 process 12471  0xffffe410 in __kernel_vsyscall ()
* 1 process 12472  0xffffe410 in __kernel_vsyscall ()

So one thread is running, four are probably sleeping, one is dying...
and one is creating another thread.  I'd bet money that we aren't
attached to the one being created yet.  I can produce all sorts of
other interesting GDB internal errors this way, too.

Does this patch fix the problems you were seeing with less pathological
test cases?  Eclipse, in this case, is a less pathological test case,
since it creates threads that actually do a bit of work.  If it does,
I'll add the necessary glibc version checking.  The only way to fix the
race on the create side is to add PTRACE_EVENT_CLONE support (this _is_
what it was for), but that will require some major surgery.

Index: thread-db.c
===================================================================
RCS file: /cvs/src/src/gdb/thread-db.c,v
retrieving revision 1.37
diff -u -p -r1.37 thread-db.c
--- thread-db.c	29 Feb 2004 02:39:47 -0000	1.37
+++ thread-db.c	24 Mar 2004 16:47:23 -0000
@@ -1,6 +1,6 @@
 /* libthread_db assisted debugging support, generic parts.
 
-   Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
+   Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
 static void thread_db_find_new_threads (void);
 static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
 			   const td_thrinfo_t *ti_p, int verbose);
+static void detach_thread (ptid_t ptid, int verbose);
 \f
 
 /* Building process ids.  */
@@ -154,6 +155,8 @@ struct private_thread_info
   unsigned int th_valid:1;
   unsigned int ti_valid:1;
 
+  unsigned int dying:1;
+
   td_thrhandle_t th;
   td_thrinfo_t ti;
 };
@@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
   /* Set the process wide mask saying which events we're interested in.  */
   td_event_emptyset (&events);
   td_event_addset (&events, TD_CREATE);
-#if 0
+#if 1
   /* FIXME: kettenis/2000-04-23: The event reporting facility is
      broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
      now.  */
@@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
   struct thread_info *tp;
   td_err_e err;
 
+	  /* We may already know about this thread, for instance when the
+	     user has issued the `info threads' command before the SIGTRAP
+	     for hitting the thread creation breakpoint was reported.  */
+  if (in_thread_list (ptid))
+    {
+      tp = find_thread_pid (ptid);
+      gdb_assert (tp != NULL);
+
+      if (!tp->private->dying)
+        return;
+
+      delete_thread (ptid);
+    }
+
   check_thread_signals ();
 
   /* Add the thread to GDB's thread list.  */
@@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
 static void
 detach_thread (ptid_t ptid, int verbose)
 {
+  struct thread_info *thread_info;
+
   if (verbose)
     printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
+
+  /* Don't delete the thread now, because it still reports as active until
+     it has executed a few instructions after the event breakpoint.  */
+  thread_info = find_thread_pid (ptid);
+  gdb_assert (thread_info != NULL);
+  thread_info->private->dying = 1;
 }
 
 static void
@@ -848,11 +873,7 @@ check_event (ptid_t ptid)
 	{
 	case TD_CREATE:
 
-	  /* We may already know about this thread, for instance when the
-	     user has issued the `info threads' command before the SIGTRAP
-	     for hitting the thread creation breakpoint was reported.  */
-	  if (!in_thread_list (ptid))
-	    attach_thread (ptid, msg.th_p, &ti, 1);
+	  attach_thread (ptid, msg.th_p, &ti, 1);
 
 	  break;
 
@@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh
 
   ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));
 
-  if (!in_thread_list (ptid))
-    attach_thread (ptid, th_p, &ti, 1);
+  attach_thread (ptid, th_p, &ti, 1);
 
   return 0;
 }


-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-24 16:56               ` Daniel Jacobowitz
@ 2004-03-24 21:56                 ` Jeff Johnston
  2004-03-25  0:46                   ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-24 21:56 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote:
> 
>>Fortunately, in <gnu/libc-version.h> there is a function to return the
>>runtime version of glibc.  We should be able to use that - and the not
>>100% valid, but generally valid and already assumed by thread_db,
>>assumption that a native GDB, when used to debug native programs, is
>>debugging the same version of the C library - to enable TD_DEATH when
>>it is safe to do so.  This will let us detach the threads when they
>>die.
>>
>>That has its own risks, since the thread continues to run for a short
>>while after the death event is reported.  For instance, in NPTL the
>>thread reports the event and then cleans up after itself; in LT I don't
>>remember whether the manager or the thread does this, but I think it's
>>the same.  I already wrote limited code to handle this, if you search
>>for "zombie" in thread-db.c, so it should be OK.  The gist is that we
>>remove it from the thread list right away, but do not detach the
>>thread.  We resist attaching to zombies.
>>
>>At least, all that is how it looks to me.  I'll experiment with
>>TD_DEATH before I speculate further.
> 
> 
> Here's a proof-of-concept patch.  It is not quite right, in that now I
> can hit C-c about four times before I get the error().  It does
> illustrate what I was suggesting, though, minus the version checking.
> 
> The reason we still get the segfaults is not that surprising.  I spent
> a little while coaxing it into producing a core dump, and got this:
> 
> (gdb) i threads
>   7 process 10303  0x40034531 in __nptl_create_event () from /lib/tls/i686/cmov/libpthread.so.0
>   6 process 12467  0xffffe410 in __kernel_vsyscall ()
>   5 process 12468  0xffffe410 in __kernel_vsyscall ()
>   4 process 12469  0x0804884e in threadfunc (arg=0x25) at lphello.c:64
>   3 process 12470  0x40034540 in __nptl_death_event () from /lib/tls/i686/cmov/libpthread.so.0
>   2 process 12471  0xffffe410 in __kernel_vsyscall ()
> * 1 process 12472  0xffffe410 in __kernel_vsyscall ()
> 
> So one thread is running, four are probably sleeping, one is dying...
> and one is creating another thread.  I'd bet money that we aren't
> attached to the one being created yet.  I can produce all sorts of
> other interesting GDB internal errors this way, too.
> 

I also tried the TD_DEATH method but gave up after running into segfaults.  I 
have to go back and recheck whether those segfaults could occur without the 
Crtl-C being issued.  I can't remember off-hand.

> Does this patch fix the problems you were seeing with less pathological
> test cases?  Eclipse, in this case, is a less pathological test case,
> since it creates threads that actually do a bit of work.  If it does,
> I'll add the necessary glibc version checking.  The only way to fix the
> race on the create side is to add PTRACE_EVENT_CLONE support (this _is_
> what it was for), but that will require some major surgery.
> 

As you know, I am also looking at the PTRACE_EVENT_CLONE stuff.  I'll try this 
patch out with the Eclipse test.  I also will include your lin-lwp linux event 
handler fix to see if that gets past the fork problem they were seeing.

> Index: thread-db.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/thread-db.c,v
> retrieving revision 1.37
> diff -u -p -r1.37 thread-db.c
> --- thread-db.c	29 Feb 2004 02:39:47 -0000	1.37
> +++ thread-db.c	24 Mar 2004 16:47:23 -0000
> @@ -1,6 +1,6 @@
>  /* libthread_db assisted debugging support, generic parts.
>  
> -   Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
> +   Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
>  
>     This file is part of GDB.
>  
> @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
>  static void thread_db_find_new_threads (void);
>  static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
>  			   const td_thrinfo_t *ti_p, int verbose);
> +static void detach_thread (ptid_t ptid, int verbose);
>  \f
>  
>  /* Building process ids.  */
> @@ -154,6 +155,8 @@ struct private_thread_info
>    unsigned int th_valid:1;
>    unsigned int ti_valid:1;
>  
> +  unsigned int dying:1;
> +
>    td_thrhandle_t th;
>    td_thrinfo_t ti;
>  };
> @@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
>    /* Set the process wide mask saying which events we're interested in.  */
>    td_event_emptyset (&events);
>    td_event_addset (&events, TD_CREATE);
> -#if 0
> +#if 1
>    /* FIXME: kettenis/2000-04-23: The event reporting facility is
>       broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
>       now.  */
> @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
>    struct thread_info *tp;
>    td_err_e err;
>  
> +	  /* We may already know about this thread, for instance when the
> +	     user has issued the `info threads' command before the SIGTRAP
> +	     for hitting the thread creation breakpoint was reported.  */
> +  if (in_thread_list (ptid))
> +    {
> +      tp = find_thread_pid (ptid);
> +      gdb_assert (tp != NULL);
> +
> +      if (!tp->private->dying)
> +        return;
> +
> +      delete_thread (ptid);
> +    }
> +
>    check_thread_signals ();
>  
>    /* Add the thread to GDB's thread list.  */
> @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
>  static void
>  detach_thread (ptid_t ptid, int verbose)
>  {
> +  struct thread_info *thread_info;
> +
>    if (verbose)
>      printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
> +
> +  /* Don't delete the thread now, because it still reports as active until
> +     it has executed a few instructions after the event breakpoint.  */
> +  thread_info = find_thread_pid (ptid);
> +  gdb_assert (thread_info != NULL);
> +  thread_info->private->dying = 1;
>  }
>  
>  static void
> @@ -848,11 +873,7 @@ check_event (ptid_t ptid)
>  	{
>  	case TD_CREATE:
>  
> -	  /* We may already know about this thread, for instance when the
> -	     user has issued the `info threads' command before the SIGTRAP
> -	     for hitting the thread creation breakpoint was reported.  */
> -	  if (!in_thread_list (ptid))
> -	    attach_thread (ptid, msg.th_p, &ti, 1);
> +	  attach_thread (ptid, msg.th_p, &ti, 1);
>  
>  	  break;
>  
> @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh
>  
>    ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));
>  
> -  if (!in_thread_list (ptid))
> -    attach_thread (ptid, th_p, &ti, 1);
> +  attach_thread (ptid, th_p, &ti, 1);
>  
>    return 0;
>  }
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-24 21:56                 ` Jeff Johnston
@ 2004-03-25  0:46                   ` Jeff Johnston
  2004-03-25  4:39                     ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-25  0:46 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: Daniel Jacobowitz, gdb-patches

Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> 
>> On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote:
>>
>>> Fortunately, in <gnu/libc-version.h> there is a function to return the
>>> runtime version of glibc.  We should be able to use that - and the not
>>> 100% valid, but generally valid and already assumed by thread_db,
>>> assumption that a native GDB, when used to debug native programs, is
>>> debugging the same version of the C library - to enable TD_DEATH when
>>> it is safe to do so.  This will let us detach the threads when they
>>> die.
>>>
>>> That has its own risks, since the thread continues to run for a short
>>> while after the death event is reported.  For instance, in NPTL the
>>> thread reports the event and then cleans up after itself; in LT I don't
>>> remember whether the manager or the thread does this, but I think it's
>>> the same.  I already wrote limited code to handle this, if you search
>>> for "zombie" in thread-db.c, so it should be OK.  The gist is that we
>>> remove it from the thread list right away, but do not detach the
>>> thread.  We resist attaching to zombies.
>>>
>>> At least, all that is how it looks to me.  I'll experiment with
>>> TD_DEATH before I speculate further.
>>
>>
>>
>> Here's a proof-of-concept patch.  It is not quite right, in that now I
>> can hit C-c about four times before I get the error().  It does
>> illustrate what I was suggesting, though, minus the version checking.
>>
>> The reason we still get the segfaults is not that surprising.  I spent
>> a little while coaxing it into producing a core dump, and got this:
>>
>> (gdb) i threads
>>   7 process 10303  0x40034531 in __nptl_create_event () from 
>> /lib/tls/i686/cmov/libpthread.so.0
>>   6 process 12467  0xffffe410 in __kernel_vsyscall ()
>>   5 process 12468  0xffffe410 in __kernel_vsyscall ()
>>   4 process 12469  0x0804884e in threadfunc (arg=0x25) at lphello.c:64
>>   3 process 12470  0x40034540 in __nptl_death_event () from 
>> /lib/tls/i686/cmov/libpthread.so.0
>>   2 process 12471  0xffffe410 in __kernel_vsyscall ()
>> * 1 process 12472  0xffffe410 in __kernel_vsyscall ()
>>
>> So one thread is running, four are probably sleeping, one is dying...
>> and one is creating another thread.  I'd bet money that we aren't
>> attached to the one being created yet.  I can produce all sorts of
>> other interesting GDB internal errors this way, too.
>>
> 
> I also tried the TD_DEATH method but gave up after running into 
> segfaults.  I have to go back and recheck whether those segfaults could 
> occur without the Crtl-C being issued.  I can't remember off-hand.
> 

Just to confirm, I don't see any segfaults just running, but this code is 
extremely brittle on my RHEL3-U1 system.  It is coming back to me why I 
abandoned TD_DEATH.

On the first Ctrl-C I usually get:

Program received signal SIGINT, Interrupt.
[Switching to Thread -1226028112 (LWP 16580)]
0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) info threads
[New Thread -1226163280 (LWP 16579)]
Can't attach LWP 16579: Operation not permitted
(gdb) info threads
[New Thread -1226298448 (LWP 16578)]
Can't attach LWP 16578: Operation not permitted
(gdb) info threads
   158 Thread -1226298448 (LWP 16578)  0xb75ce6b1 in __nptl_death_event ()
    from /lib/tls/libpthread.so.0
   157 Thread -1226163280 (LWP 16579)  0xb75ce6b1 in __nptl_death_event ()
    from /lib/tls/libpthread.so.0
* 156 Thread -1226028112 (LWP 16580)  0xb75ebc32 in _dl_sysinfo_int80 ()
    from /lib/ld-linux.so.2
   1 Thread -1218598080 (LWP 16423)  0xb75ce6a1 in __nptl_create_event ()
    from /lib/tls/libpthread.so.0
(gdb) c

After continuing, the program will go shortly and then die.

[Thread -1226163280 (zombie) exited]
[New Thread -1226433616 (LWP 16674)]
Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): generic 
error

My original fix doesn't break near as easily on RHEL-U1.  Hitting enough 
Ctrl-C's does eventually trigger an error (occassionally even a gdb assert), but 
this may be part of catching the race condition before we know about the thread.
Now, it is more than likely just a bug in the TD_DEATH event processing because 
we haven't exercised it.  Would it be worth looking at implementing your 
original alternative for my patch?

You asked me in a previous note to run with strace on and confirm that I was not 
getting WIFEXITTED for the dying threads.  I confirmed this.  I only get one 
WIFEXITTED and that is for the main thread.

I still need to do the Eclipse test.

-- Jeff J.


>> Does this patch fix the problems you were seeing with less pathological
>> test cases?  Eclipse, in this case, is a less pathological test case,
>> since it creates threads that actually do a bit of work.  If it does,
>> I'll add the necessary glibc version checking.  The only way to fix the
>> race on the create side is to add PTRACE_EVENT_CLONE support (this _is_
>> what it was for), but that will require some major surgery.
>>
> 
> As you know, I am also looking at the PTRACE_EVENT_CLONE stuff.  I'll 
> try this patch out with the Eclipse test.  I also will include your 
> lin-lwp linux event handler fix to see if that gets past the fork 
> problem they were seeing.
> 
>> Index: thread-db.c
>> ===================================================================
>> RCS file: /cvs/src/src/gdb/thread-db.c,v
>> retrieving revision 1.37
>> diff -u -p -r1.37 thread-db.c
>> --- thread-db.c    29 Feb 2004 02:39:47 -0000    1.37
>> +++ thread-db.c    24 Mar 2004 16:47:23 -0000
>> @@ -1,6 +1,6 @@
>>  /* libthread_db assisted debugging support, generic parts.
>>  
>> -   Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
>> +   Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
>>  
>>     This file is part of GDB.
>>  
>> @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
>>  static void thread_db_find_new_threads (void);
>>  static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
>>                 const td_thrinfo_t *ti_p, int verbose);
>> +static void detach_thread (ptid_t ptid, int verbose);
>>  \f
>>  
>>  /* Building process ids.  */
>> @@ -154,6 +155,8 @@ struct private_thread_info
>>    unsigned int th_valid:1;
>>    unsigned int ti_valid:1;
>>  
>> +  unsigned int dying:1;
>> +
>>    td_thrhandle_t th;
>>    td_thrinfo_t ti;
>>  };
>> @@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
>>    /* Set the process wide mask saying which events we're interested 
>> in.  */
>>    td_event_emptyset (&events);
>>    td_event_addset (&events, TD_CREATE);
>> -#if 0
>> +#if 1
>>    /* FIXME: kettenis/2000-04-23: The event reporting facility is
>>       broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
>>       now.  */
>> @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
>>    struct thread_info *tp;
>>    td_err_e err;
>>  
>> +      /* We may already know about this thread, for instance when the
>> +         user has issued the `info threads' command before the SIGTRAP
>> +         for hitting the thread creation breakpoint was reported.  */
>> +  if (in_thread_list (ptid))
>> +    {
>> +      tp = find_thread_pid (ptid);
>> +      gdb_assert (tp != NULL);
>> +
>> +      if (!tp->private->dying)
>> +        return;
>> +
>> +      delete_thread (ptid);
>> +    }
>> +
>>    check_thread_signals ();
>>  
>>    /* Add the thread to GDB's thread list.  */
>> @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
>>  static void
>>  detach_thread (ptid_t ptid, int verbose)
>>  {
>> +  struct thread_info *thread_info;
>> +
>>    if (verbose)
>>      printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
>> +
>> +  /* Don't delete the thread now, because it still reports as active 
>> until
>> +     it has executed a few instructions after the event breakpoint.  */
>> +  thread_info = find_thread_pid (ptid);
>> +  gdb_assert (thread_info != NULL);
>> +  thread_info->private->dying = 1;
>>  }
>>  
>>  static void
>> @@ -848,11 +873,7 @@ check_event (ptid_t ptid)
>>      {
>>      case TD_CREATE:
>>  
>> -      /* We may already know about this thread, for instance when the
>> -         user has issued the `info threads' command before the SIGTRAP
>> -         for hitting the thread creation breakpoint was reported.  */
>> -      if (!in_thread_list (ptid))
>> -        attach_thread (ptid, msg.th_p, &ti, 1);
>> +      attach_thread (ptid, msg.th_p, &ti, 1);
>>  
>>        break;
>>  
>> @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh
>>  
>>    ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));
>>  
>> -  if (!in_thread_list (ptid))
>> -    attach_thread (ptid, th_p, &ti, 1);
>> +  attach_thread (ptid, th_p, &ti, 1);
>>  
>>    return 0;
>>  }
>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-25  0:46                   ` Jeff Johnston
@ 2004-03-25  4:39                     ` Daniel Jacobowitz
  2004-03-25 16:34                       ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-25  4:39 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Wed, Mar 24, 2004 at 07:46:39PM -0500, Jeff Johnston wrote:
> Jeff Johnston wrote:
> >I also tried the TD_DEATH method but gave up after running into 
> >segfaults.  I have to go back and recheck whether those segfaults could 
> >occur without the Crtl-C being issued.  I can't remember off-hand.
> >
> 
> Just to confirm, I don't see any segfaults just running, but this code is 
> extremely brittle on my RHEL3-U1 system.  It is coming back to me why I 
> abandoned TD_DEATH.
> 
> On the first Ctrl-C I usually get:
> 
> Program received signal SIGINT, Interrupt.
> [Switching to Thread -1226028112 (LWP 16580)]
> 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
> (gdb) info threads
> [New Thread -1226163280 (LWP 16579)]
> Can't attach LWP 16579: Operation not permitted
> (gdb) info threads
> [New Thread -1226298448 (LWP 16578)]
> Can't attach LWP 16578: Operation not permitted

The usual cause of this message is attempting to attach to a thread
we're already attached to; on earlier versions of the patch I posted I
saw this error plenty often.  Notice that both that TID and LWP ID are
already in the thread list and trying to die:

> (gdb) info threads
>   158 Thread -1226298448 (LWP 16578)  0xb75ce6b1 in __nptl_death_event ()
>    from /lib/tls/libpthread.so.0
>   157 Thread -1226163280 (LWP 16579)  0xb75ce6b1 in __nptl_death_event ()
>    from /lib/tls/libpthread.so.0
> * 156 Thread -1226028112 (LWP 16580)  0xb75ebc32 in _dl_sysinfo_int80 ()
>    from /lib/ld-linux.so.2
>   1 Thread -1218598080 (LWP 16423)  0xb75ce6a1 in __nptl_create_event ()
>    from /lib/tls/libpthread.so.0
> (gdb) c
> 
> After continuing, the program will go shortly and then die.
> 
> [Thread -1226163280 (zombie) exited]
> [New Thread -1226433616 (LWP 16674)]
> Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): 
> generic error
> 
> My original fix doesn't break near as easily on RHEL-U1.  Hitting enough 
> Ctrl-C's does eventually trigger an error (occassionally even a gdb 
> assert), but this may be part of catching the race condition before we know 
> about the thread.
> Now, it is more than likely just a bug in the TD_DEATH event processing 
> because we haven't exercised it.  Would it be worth looking at implementing 
> your original alternative for my patch?

I assume you're using the same test you posted, here.

Hmm, upon thinking about this problem, it's because you used info
threads after stopping and I didn't think to try that.  We end up
removing and re-adding the thread several times.  We'd really like to
delay removing the thread until we see a new create message for it, at
this point.  I see several ways to massage the code for that effect;
it's not the clearest, but for testing purposes we can revert the last
bit of the patch I sent you.  This bit:

@@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh

   ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid));

-  if (!in_thread_list (ptid))
-    attach_thread (ptid, th_p, &ti, 1);
+  attach_thread (ptid, th_p, &ti, 1);

   return 0;
 }

After that we hit some other problems with GDB's book-keeping, though I
suspect they are related to the create race.  It's hard to tell.  The
process tends to zombie unexpectedly but that's the only error I've
seen with the bit above reverted.

Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
something together in the morning to verify whether that's the problem?

> You asked me in a previous note to run with strace on and confirm that I 
> was not getting WIFEXITTED for the dying threads.  I confirmed this.  I 
> only get one WIFEXITTED and that is for the main thread.

OK, so we can't count on the thread exits.  You might be able to enable
PTRACE_EVENT_EXIT to work around this, if RHEL3 has that - but one
needs to be careful with that or you can really confuse GDB.  Anyway,
we don't need that for now.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-25  4:39                     ` Daniel Jacobowitz
@ 2004-03-25 16:34                       ` Daniel Jacobowitz
  2004-03-25 20:22                         ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-25 16:34 UTC (permalink / raw)
  To: Jeff Johnston, gdb-patches

On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote:
> Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
> something together in the morning to verify whether that's the problem?

Here you go.  Again, this patch is obviously not ready to go into GDB,
but I have not been able to make it misbehave yet.  I don't know if all
the bits it needs work right in RHEL3, or if my testing was conclusive.

The highlights:
  - Includes most of the previous patch
  - Uses PTRACE_EVENT_CLONE to attach to new threads
  - Moves handling of events closer to the waitpid call

There are some potential races but I haven't hit any of them in
practice.  I suspect that with a heavy fork or vfork load (not clone)
you could produce interesting failure modes.

Give it a try, please.  If it works I'll clean it up.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.53
diff -u -p -r1.53 lin-lwp.c
--- lin-lwp.c	22 Mar 2004 20:18:33 -0000	1.53
+++ lin-lwp.c	25 Mar 2004 16:31:30 -0000
@@ -183,6 +183,8 @@ add_lwp (ptid_t ptid)
 
   memset (lp, 0, sizeof (struct lwp_info));
 
+  lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
+
   lp->ptid = ptid;
 
   lp->next = lwp_list;
@@ -278,7 +280,7 @@ lin_lwp_open (char *args, int from_tty)
 void
 lin_lwp_attach_lwp (ptid_t ptid, int verbose)
 {
-  struct lwp_info *lp;
+  struct lwp_info *lp, *found_lp;
 
   gdb_assert (is_lwp (ptid));
 
@@ -293,13 +295,13 @@ lin_lwp_attach_lwp (ptid_t ptid, int ver
   if (verbose)
     printf_filtered ("[New %s]\n", target_pid_to_str (ptid));
 
-  lp = find_lwp_pid (ptid);
+  found_lp = lp = find_lwp_pid (ptid);
   if (lp == NULL)
     lp = add_lwp (ptid);
 
   /* We assume that we're already attached to any LWP that has an
      id equal to the overall process id.  */
-  if (GET_LWP (ptid) != GET_PID (ptid))
+  if (GET_LWP (ptid) != GET_PID (ptid) && found_lp == NULL)
     {
       pid_t pid;
       int status;
@@ -658,6 +660,30 @@ wait_lwp (struct lwp_info *lp)
 
   gdb_assert (WIFSTOPPED (status));
 
+  /* Handle GNU/Linux's extended waitstatus for trace events.  */
+  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
+    {
+      fprintf_unfiltered (gdb_stdlog,
+			  "WL: Handling extended status 0x%06x\n",
+			  status);
+      linux_handle_extended_wait (GET_LWP (lp->ptid), status,
+				  &lp->waitstatus);
+      if (lp->waitstatus.kind == TARGET_WAITKIND_SPURIOUS)
+	{
+	  struct lwp_info *new_lp;
+	  new_lp = add_lwp (BUILD_LWP (lp->waitstatus.value.related_pid,
+				       GET_PID (inferior_ptid)));
+	  new_lp->cloned = 1;
+	  new_lp->stopped = 1;
+	  lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
+	  fprintf_unfiltered (gdb_stdlog,
+			      "WL: Got clone event from LWP %d, resuming\n",
+			      GET_LWP (lp->ptid));
+	  ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
+	  return wait_lwp (lp);
+	}
+    }
+
   return status;
 }
 
@@ -1161,7 +1187,14 @@ child_wait (ptid_t ptid, struct target_w
 
   /* Handle GNU/Linux's extended waitstatus for trace events.  */
   if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
-    return linux_handle_extended_wait (pid, status, ourstatus);
+    {
+      ptid_t wait_ptid;
+      wait_ptid = linux_handle_extended_wait (pid, status, ourstatus);
+      if (ourstatus->kind == TARGET_WAITKIND_SPURIOUS)
+	ptrace (PTRACE_DETACH, ourstatus->value.related_pid,
+		0, 0);
+      return wait_ptid;
+    }
 
   store_waitstatus (ourstatus, status);
   return pid_to_ptid (pid);
@@ -1371,6 +1404,31 @@ retry:
 		}
 	    }
 
+	  /* Handle GNU/Linux's extended waitstatus for trace events.  */
+	  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
+	    {
+	      fprintf_unfiltered (gdb_stdlog,
+				  "LLW: Handling extended status 0x%06x\n",
+				  status);
+	      linux_handle_extended_wait (GET_LWP (lp->ptid), status,
+					  &lp->waitstatus);
+	      if (lp->waitstatus.kind == TARGET_WAITKIND_SPURIOUS)
+		{
+		  struct lwp_info *new_lp;
+		  new_lp = add_lwp (BUILD_LWP (lp->waitstatus.value.related_pid,
+					       GET_PID (inferior_ptid)));
+		  new_lp->cloned = 1;
+		  new_lp->stopped = 1;
+		  lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
+		  fprintf_unfiltered (gdb_stdlog,
+				      "LLW: Got clone event from LWP %d, resuming\n",
+				      GET_LWP (lp->ptid));
+		  ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
+		  status = 0;
+		  continue;
+		}
+	    }
+
 	  /* Check if the thread has exited.  */
 	  if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
 	    {
@@ -1588,14 +1646,14 @@ retry:
   else
     trap_ptid = null_ptid;
 
-  /* Handle GNU/Linux's extended waitstatus for trace events.  */
-  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
+  if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
     {
-      linux_handle_extended_wait (GET_LWP (lp->ptid), status, ourstatus);
-      return trap_ptid;
+      *ourstatus = lp->waitstatus;
+      lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
     }
+  else
+    store_waitstatus (ourstatus, status);
 
-  store_waitstatus (ourstatus, status);
   return (threaded ? lp->ptid : pid_to_ptid (GET_LWP (lp->ptid)));
 }
 
Index: linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.5
diff -u -p -r1.5 linux-nat.c
--- linux-nat.c	17 Aug 2003 20:17:02 -0000	1.5
+++ linux-nat.c	25 Mar 2004 16:31:30 -0000
@@ -224,7 +224,8 @@ linux_enable_event_reporting (ptid_t pti
   if (! linux_supports_tracefork ())
     return;
 
-  options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC;
+  options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC
+    | PTRACE_O_TRACECLONE;
   if (linux_supports_tracevforkdone ())
     options |= PTRACE_O_TRACEVFORKDONE;
 
@@ -391,11 +392,8 @@ linux_handle_extended_wait (int pid, int
 {
   int event = status >> 16;
 
-  if (event == PTRACE_EVENT_CLONE)
-    internal_error (__FILE__, __LINE__,
-		    "unexpected clone event");
-
-  if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK)
+  if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK
+      || event == PTRACE_EVENT_CLONE)
     {
       unsigned long new_pid;
       int ret;
@@ -406,12 +404,10 @@ linux_handle_extended_wait (int pid, int
       if (! pull_pid_from_list (&stopped_pids, new_pid))
 	{
 	  /* The new child has a pending SIGSTOP.  We can't affect it until it
-	     hits the SIGSTOP, but we're already attached.
-
-	     It won't be a clone (we didn't ask for clones in the event mask)
-	     so we can just call waitpid and wait for the SIGSTOP.  */
+	     hits the SIGSTOP, but we're already attached.  */
 	  do {
-	    ret = waitpid (new_pid, &status, 0);
+	    ret = waitpid (new_pid, &status,
+			   (event == PTRACE_EVENT_CLONE) ? __WCLONE : 0);
 	  } while (ret == -1 && errno == EINTR);
 	  if (ret == -1)
 	    perror_with_name ("waiting for new child");
@@ -423,8 +419,13 @@ linux_handle_extended_wait (int pid, int
 			    "wait returned unexpected status 0x%x", status);
 	}
 
-      ourstatus->kind = (event == PTRACE_EVENT_FORK)
-	? TARGET_WAITKIND_FORKED : TARGET_WAITKIND_VFORKED;
+      if (event == PTRACE_EVENT_FORK)
+	ourstatus->kind = TARGET_WAITKIND_FORKED;
+      else if (event == PTRACE_EVENT_VFORK)
+	ourstatus->kind = TARGET_WAITKIND_VFORKED;
+      else
+	ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
+
       ourstatus->value.related_pid = new_pid;
       return inferior_ptid;
     }
Index: linux-nat.h
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.h,v
retrieving revision 1.5
diff -u -p -r1.5 linux-nat.h
--- linux-nat.h	14 Sep 2003 02:04:44 -0000	1.5
+++ linux-nat.h	25 Mar 2004 16:31:30 -0000
@@ -1,5 +1,5 @@
 /* Native debugging support for GNU/Linux (LWP layer).
-   Copyright 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Copyright 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -18,6 +18,8 @@
    Foundation, Inc., 59 Temple Place - Suite 330,
    Boston, MA 02111-1307, USA.  */
 
+#include "target.h"
+
 /* Structure describing an LWP.  */
 
 struct lwp_info
@@ -52,6 +54,10 @@ struct lwp_info
   /* Non-zero if we were stepping this LWP.  */
   int step;
 
+  /* If WAITSTATUS->KIND != TARGET_WAITKIND_SPURIOUS, the waitstatus
+     corresponding to STATUS above.  */
+  struct target_waitstatus waitstatus;
+
   /* Next LWP in list.  */
   struct lwp_info *next;
 };
@@ -60,7 +66,6 @@ struct lwp_info
    system".  */
 struct mem_attrib;
 struct target_ops;
-struct target_waitstatus;
 
 extern int linux_proc_xfer_memory (CORE_ADDR addr, char *myaddr, int len,
 				   int write, struct mem_attrib *attrib,
Index: thread-db.c
===================================================================
RCS file: /cvs/src/src/gdb/thread-db.c,v
retrieving revision 1.37
diff -u -p -r1.37 thread-db.c
--- thread-db.c	29 Feb 2004 02:39:47 -0000	1.37
+++ thread-db.c	25 Mar 2004 16:31:31 -0000
@@ -1,6 +1,6 @@
 /* libthread_db assisted debugging support, generic parts.
 
-   Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
+   Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr;
 static void thread_db_find_new_threads (void);
 static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
 			   const td_thrinfo_t *ti_p, int verbose);
+static void detach_thread (ptid_t ptid, int verbose);
 \f
 
 /* Building process ids.  */
@@ -154,6 +155,8 @@ struct private_thread_info
   unsigned int th_valid:1;
   unsigned int ti_valid:1;
 
+  unsigned int dying:1;
+
   td_thrhandle_t th;
   td_thrinfo_t ti;
 };
@@ -501,7 +504,7 @@ enable_thread_event_reporting (void)
   /* Set the process wide mask saying which events we're interested in.  */
   td_event_emptyset (&events);
   td_event_addset (&events, TD_CREATE);
-#if 0
+#if 1
   /* FIXME: kettenis/2000-04-23: The event reporting facility is
      broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
      now.  */
@@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr
   struct thread_info *tp;
   td_err_e err;
 
+	  /* We may already know about this thread, for instance when the
+	     user has issued the `info threads' command before the SIGTRAP
+	     for hitting the thread creation breakpoint was reported.  */
+  if (in_thread_list (ptid))
+    {
+      tp = find_thread_pid (ptid);
+      gdb_assert (tp != NULL);
+
+      if (!tp->private->dying)
+        return;
+
+      delete_thread (ptid);
+    }
+
   check_thread_signals ();
 
   /* Add the thread to GDB's thread list.  */
@@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t
 static void
 detach_thread (ptid_t ptid, int verbose)
 {
+  struct thread_info *thread_info;
+
   if (verbose)
     printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
+
+  /* Don't delete the thread now, because it still reports as active until
+     it has executed a few instructions after the event breakpoint.  */
+  thread_info = find_thread_pid (ptid);
+  gdb_assert (thread_info != NULL);
+  thread_info->private->dying = 1;
 }
 
 static void
@@ -848,11 +873,7 @@ check_event (ptid_t ptid)
 	{
 	case TD_CREATE:
 
-	  /* We may already know about this thread, for instance when the
-	     user has issued the `info threads' command before the SIGTRAP
-	     for hitting the thread creation breakpoint was reported.  */
-	  if (!in_thread_list (ptid))
-	    attach_thread (ptid, msg.th_p, &ti, 1);
+	  attach_thread (ptid, msg.th_p, &ti, 1);
 
 	  break;
 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-25 16:34                       ` Daniel Jacobowitz
@ 2004-03-25 20:22                         ` Jeff Johnston
  2004-03-26 17:59                           ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-25 20:22 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches


Daniel Jacobowitz wrote:
> On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote:
> 
>>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
>>something together in the morning to verify whether that's the problem?
> 
> 
> Here you go.  Again, this patch is obviously not ready to go into GDB,
> but I have not been able to make it misbehave yet.  I don't know if all
> the bits it needs work right in RHEL3, or if my testing was conclusive.
> 
> The highlights:
>   - Includes most of the previous patch
>   - Uses PTRACE_EVENT_CLONE to attach to new threads
>   - Moves handling of events closer to the waitpid call
> 
> There are some potential races but I haven't hit any of them in
> practice.  I suspect that with a heavy fork or vfork load (not clone)
> you could produce interesting failure modes.
> 
> Give it a try, please.  If it works I'll clean it up.
>

Works great.  I was unable to get it to misbehave on RHEL3.  Please go ahead 
cleaning it up.

-- Jeff J.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids)
  2004-03-25 20:22                         ` Jeff Johnston
@ 2004-03-26 17:59                           ` Daniel Jacobowitz
  2004-03-26 18:14                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston
  2004-03-29 18:07                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
  0 siblings, 2 replies; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-26 17:59 UTC (permalink / raw)
  To: gdb-patches

On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote:
> 
> Daniel Jacobowitz wrote:
> >On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote:
> >
> >>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
> >>something together in the morning to verify whether that's the problem?
> >
> >
> >Here you go.  Again, this patch is obviously not ready to go into GDB,
> >but I have not been able to make it misbehave yet.  I don't know if all
> >the bits it needs work right in RHEL3, or if my testing was conclusive.
> >
> >The highlights:
> >  - Includes most of the previous patch
> >  - Uses PTRACE_EVENT_CLONE to attach to new threads
> >  - Moves handling of events closer to the waitpid call
> >
> >There are some potential races but I haven't hit any of them in
> >practice.  I suspect that with a heavy fork or vfork load (not clone)
> >you could produce interesting failure modes.
> >
> >Give it a try, please.  If it works I'll clean it up.
> >
> 
> Works great.  I was unable to get it to misbehave on RHEL3.  Please go 
> ahead cleaning it up.

Here's the final patch. The description of the patch is unchanged from
the above.  The background, for those who did not read the whole
discussion: in NPTL, a signal delivered to the process when any thread
is not attached is likely to terminate the process.  So we need to be
careful to:
 - know when threads have exited, so that we can attach to new threads
   which reuse the same thread ID
 - attach to threads as early as possible

Both of these could cause problems in LinuxThreads, but they were less
drastic.

Tested with both LinuxThreads and NPTL.  I'll commit this to HEAD on
Monday, barring objections.  We've missed the boat for GDB 6.1 at this
point.

The testcase you posted has credits in it, so we can't just grab it for
the testsuite.  But all we should need to test this is a loop that
creates short-lived threads, and then verifying that we can send C-c,
get a prompt, issue info threads, and continue a couple of times
without seeing anyting matching "error:".  Interested in writing that
test? :)

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

2004-03-26  Daniel Jacobowitz  <drow@mvista.com>

	* Makefile.in (linux_nat_h): Update dependencies.
	* configure.in: Check for <gnu/libc-version.h>.
	* configure: Regenerate.
	* config.in: Regenerate.
	* linux-nat.h: Include "target.h".  Add waitstatus field to
	struct lwp_info.
	* lin-lwp.c (add_lwp): Initialize waitstatus.kind.
	(lin_lwp_attach_lwp): Don't attach to LWPs we have already attached
	to.
	(lin_lwp_handle_extended): New function.  Handle clone events.
	(wait_lwp): Use lin_lwp_handle_extended.  Update comment about
	thread exit events.
	(child_wait): Handle clone events.
	(lin_lwp_wait: Use lin_lwp_handle_extended and handle clone events.
	* linux-nat.c (linux_enable_event_reporting): Turn on
	PTRACE_O_TRACECLONE.
	(linux_handle_extended_wait): Handle clone events.
	* thread-db.c: Include <gnu/libc-version.h>.
	(struct private_thread_info): Add dying flag.
	(enable_thread_event_reporting): Enable TD_DEATH for glibc 2.2 and
	higher.
	(attach_thread): Update comments.  Handle dying threads.
	(detach_thread): Set the dying flag.
	(check_event): Always call attach_thread.

Index: Makefile.in
===================================================================
RCS file: /cvs/src/src/gdb/Makefile.in,v
retrieving revision 1.530
diff -u -p -r1.530 Makefile.in
--- Makefile.in	25 Mar 2004 01:27:26 -0000	1.530
+++ Makefile.in	26 Mar 2004 17:10:34 -0000
@@ -699,7 +699,7 @@ kod_h = kod.h
 language_h = language.h
 libunwind_frame_h = libunwind-frame.h $(libunwind_h)
 linespec_h = linespec.h
-linux_nat_h = linux-nat.h
+linux_nat_h = linux-nat.h $(target_h)
 m2_lang_h = m2-lang.h
 m68k_tdep_h = m68k-tdep.h
 macroexp_h = macroexp.h
Index: config.in
===================================================================
RCS file: /cvs/src/src/gdb/config.in,v
retrieving revision 1.61
diff -u -p -r1.61 config.in
--- config.in	20 Jan 2004 09:29:13 -0000	1.61
+++ config.in	26 Mar 2004 17:10:34 -0000
@@ -266,6 +266,9 @@
 /* Define if you have the <dirent.h> header file.  */
 #undef HAVE_DIRENT_H
 
+/* Define if you have the <gnu/libc-version.h> header file.  */
+#undef HAVE_GNU_LIBC_VERSION_H
+
 /* Define if you have the <libunwind-ia64.h> header file.  */
 #undef HAVE_LIBUNWIND_IA64_H
 
Index: configure
===================================================================
RCS file: /cvs/src/src/gdb/configure,v
retrieving revision 1.148
diff -u -p -r1.148 configure
--- configure	26 Feb 2004 00:41:46 -0000	1.148
+++ configure	26 Mar 2004 17:10:35 -0000
@@ -4756,7 +4756,7 @@ else
 fi
 done
 
-for ac_hdr in proc_service.h thread_db.h
+for ac_hdr in proc_service.h thread_db.h gnu/libc-version.h
 do
 ac_safe=`echo "$ac_hdr" | sed 'y%./+-%__p_%'`
 echo $ac_n "checking for $ac_hdr""... $ac_c" 1>&6
Index: configure.in
===================================================================
RCS file: /cvs/src/src/gdb/configure.in,v
retrieving revision 1.149
diff -u -p -r1.149 configure.in
--- configure.in	26 Feb 2004 00:41:46 -0000	1.149
+++ configure.in	26 Mar 2004 17:10:36 -0000
@@ -342,7 +342,7 @@ AC_CHECK_HEADERS(link.h)
 AC_CHECK_HEADERS(machine/reg.h)
 AC_CHECK_HEADERS(nlist.h)
 AC_CHECK_HEADERS(poll.h sys/poll.h)
-AC_CHECK_HEADERS(proc_service.h thread_db.h)
+AC_CHECK_HEADERS(proc_service.h thread_db.h gnu/libc-version.h)
 AC_CHECK_HEADERS(stddef.h)
 AC_CHECK_HEADERS(stdlib.h)
 AC_CHECK_HEADERS(stdint.h)
Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.53
diff -u -p -r1.53 lin-lwp.c
--- lin-lwp.c	22 Mar 2004 20:18:33 -0000	1.53
+++ lin-lwp.c	26 Mar 2004 17:10:36 -0000
@@ -1,5 +1,5 @@
 /* Multi-threaded debugging support for GNU/Linux (LWP layer).
-   Copyright 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Copyright 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -183,6 +183,8 @@ add_lwp (ptid_t ptid)
 
   memset (lp, 0, sizeof (struct lwp_info));
 
+  lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
+
   lp->ptid = ptid;
 
   lp->next = lwp_list;
@@ -278,7 +280,7 @@ lin_lwp_open (char *args, int from_tty)
 void
 lin_lwp_attach_lwp (ptid_t ptid, int verbose)
 {
-  struct lwp_info *lp;
+  struct lwp_info *lp, *found_lp;
 
   gdb_assert (is_lwp (ptid));
 
@@ -293,13 +295,17 @@ lin_lwp_attach_lwp (ptid_t ptid, int ver
   if (verbose)
     printf_filtered ("[New %s]\n", target_pid_to_str (ptid));
 
-  lp = find_lwp_pid (ptid);
+  found_lp = lp = find_lwp_pid (ptid);
   if (lp == NULL)
     lp = add_lwp (ptid);
 
-  /* We assume that we're already attached to any LWP that has an
-     id equal to the overall process id.  */
-  if (GET_LWP (ptid) != GET_PID (ptid))
+  /* We assume that we're already attached to any LWP that has an id
+     equal to the overall process id, and to any LWP that is already
+     in our list of LWPs.  If we're not seeing exit events from threads
+     and we've had PID wraparound since we last tried to stop all threads,
+     this assumption might be wrong; fortunately, this is very unlikely
+     to happen.  */
+  if (GET_LWP (ptid) != GET_PID (ptid) && found_lp == NULL)
     {
       pid_t pid;
       int status;
@@ -590,6 +596,41 @@ kill_lwp (int lwpid, int signo)
   return kill (lwpid, signo);
 }
 
+/* Handle a GNU/Linux extended wait response.  Most of the work we
+   just pass off to linux_handle_extended_wait, but if it reports a
+   clone event we need to add the new LWP to our list (and not report
+   the trap to higher layers).  This function returns non-zero if
+   the event should be ignored and we should wait again.  */
+
+static int
+lin_lwp_handle_extended (struct lwp_info *lp, int status)
+{
+  linux_handle_extended_wait (GET_LWP (lp->ptid), status,
+			      &lp->waitstatus);
+
+  /* TARGET_WAITKIND_SPURIOUS is used to indicate clone events.  */
+  if (lp->waitstatus.kind == TARGET_WAITKIND_SPURIOUS)
+    {
+      struct lwp_info *new_lp;
+      new_lp = add_lwp (BUILD_LWP (lp->waitstatus.value.related_pid,
+				   GET_PID (inferior_ptid)));
+      new_lp->cloned = 1;
+      new_lp->stopped = 1;
+
+      lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
+
+      if (debug_lin_lwp)
+	fprintf_unfiltered (gdb_stdlog,
+			    "LLHE: Got clone event from LWP %ld, resuming\n",
+			    GET_LWP (lp->ptid));
+      ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
+
+      return 1;
+    }
+
+  return 0;
+}
+
 /* Wait for LP to stop.  Returns the wait status, or 0 if the LWP has
    exited.  */
 
@@ -609,9 +650,11 @@ wait_lwp (struct lwp_info *lp)
       pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
       if (pid == -1 && errno == ECHILD)
 	{
-	  /* The thread has previously exited.  We need to delete it now
-	     because in the case of NPTL threads, there won't be an
-	     exit event unless it is the main thread.  */
+	  /* The thread has previously exited.  We need to delete it
+	     now because, for some vendor 2.4 kernels with NPTL
+	     support backported, there won't be an exit event unless
+	     it is the main thread.  2.6 kernels will report an exit
+	     event for each thread that exits, as expected.  */
 	  thread_dead = 1;
 	  if (debug_lin_lwp)
 	    fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n",
@@ -658,6 +701,17 @@ wait_lwp (struct lwp_info *lp)
 
   gdb_assert (WIFSTOPPED (status));
 
+  /* Handle GNU/Linux's extended waitstatus for trace events.  */
+  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
+    {
+      if (debug_lin_lwp)
+	fprintf_unfiltered (gdb_stdlog,
+			    "WL: Handling extended status 0x%06x\n",
+			    status);
+      if (lin_lwp_handle_extended (lp, status))
+	return wait_lwp (lp);
+    }
+
   return status;
 }
 
@@ -1097,6 +1151,8 @@ child_wait (ptid_t ptid, struct target_w
   int status;
   pid_t pid;
 
+  ourstatus->kind = TARGET_WAITKIND_IGNORE;
+
   do
     {
       set_sigint_trap ();	/* Causes SIGINT to be passed on to the
@@ -1143,6 +1199,25 @@ child_wait (ptid_t ptid, struct target_w
 	  save_errno = EINTR;
 	}
 
+      /* Handle GNU/Linux's extended waitstatus for trace events.  */
+      if (pid != -1 && WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP
+	  && status >> 16 != 0)
+	{
+	  linux_handle_extended_wait (pid, status, ourstatus);
+
+	  /* If we see a clone event, detach the child, and don't
+	     report the event.  It would be nice to offer some way to
+	     switch into a non-thread-db based threaded mode at this
+	     point.  */
+	  if (ourstatus->kind == TARGET_WAITKIND_SPURIOUS)
+	    {
+	      ptrace (PTRACE_DETACH, ourstatus->value.related_pid, 0, 0);
+	      ourstatus->kind = TARGET_WAITKIND_IGNORE;
+	      pid = -1;
+	      save_errno = EINTR;
+	    }
+	}
+
       clear_sigio_trap ();
       clear_sigint_trap ();
     }
@@ -1159,11 +1234,9 @@ child_wait (ptid_t ptid, struct target_w
       return minus_one_ptid;
     }
 
-  /* Handle GNU/Linux's extended waitstatus for trace events.  */
-  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
-    return linux_handle_extended_wait (pid, status, ourstatus);
+  if (ourstatus->kind == TARGET_WAITKIND_IGNORE)
+    store_waitstatus (ourstatus, status);
 
-  store_waitstatus (ourstatus, status);
   return pid_to_ptid (pid);
 }
 
@@ -1371,6 +1444,20 @@ retry:
 		}
 	    }
 
+	  /* Handle GNU/Linux's extended waitstatus for trace events.  */
+	  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
+	    {
+	      if (debug_lin_lwp)
+		fprintf_unfiltered (gdb_stdlog,
+				    "LLW: Handling extended status 0x%06x\n",
+				    status);
+	      if (lin_lwp_handle_extended (lp, status))
+		{
+		  status = 0;
+		  continue;
+		}
+	    }
+
 	  /* Check if the thread has exited.  */
 	  if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
 	    {
@@ -1588,14 +1675,14 @@ retry:
   else
     trap_ptid = null_ptid;
 
-  /* Handle GNU/Linux's extended waitstatus for trace events.  */
-  if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0)
+  if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE)
     {
-      linux_handle_extended_wait (GET_LWP (lp->ptid), status, ourstatus);
-      return trap_ptid;
+      *ourstatus = lp->waitstatus;
+      lp->waitstatus.kind = TARGET_WAITKIND_IGNORE;
     }
+  else
+    store_waitstatus (ourstatus, status);
 
-  store_waitstatus (ourstatus, status);
   return (threaded ? lp->ptid : pid_to_ptid (GET_LWP (lp->ptid)));
 }
 
Index: linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.5
diff -u -p -r1.5 linux-nat.c
--- linux-nat.c	17 Aug 2003 20:17:02 -0000	1.5
+++ linux-nat.c	26 Mar 2004 17:10:36 -0000
@@ -1,5 +1,5 @@
 /* GNU/Linux native-dependent code common to multiple platforms.
-   Copyright (C) 2003 Free Software Foundation, Inc.
+   Copyright (C) 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -224,7 +224,8 @@ linux_enable_event_reporting (ptid_t pti
   if (! linux_supports_tracefork ())
     return;
 
-  options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC;
+  options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC
+    | PTRACE_O_TRACECLONE;
   if (linux_supports_tracevforkdone ())
     options |= PTRACE_O_TRACEVFORKDONE;
 
@@ -391,11 +392,8 @@ linux_handle_extended_wait (int pid, int
 {
   int event = status >> 16;
 
-  if (event == PTRACE_EVENT_CLONE)
-    internal_error (__FILE__, __LINE__,
-		    "unexpected clone event");
-
-  if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK)
+  if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK
+      || event == PTRACE_EVENT_CLONE)
     {
       unsigned long new_pid;
       int ret;
@@ -406,12 +404,10 @@ linux_handle_extended_wait (int pid, int
       if (! pull_pid_from_list (&stopped_pids, new_pid))
 	{
 	  /* The new child has a pending SIGSTOP.  We can't affect it until it
-	     hits the SIGSTOP, but we're already attached.
-
-	     It won't be a clone (we didn't ask for clones in the event mask)
-	     so we can just call waitpid and wait for the SIGSTOP.  */
+	     hits the SIGSTOP, but we're already attached.  */
 	  do {
-	    ret = waitpid (new_pid, &status, 0);
+	    ret = waitpid (new_pid, &status,
+			   (event == PTRACE_EVENT_CLONE) ? __WCLONE : 0);
 	  } while (ret == -1 && errno == EINTR);
 	  if (ret == -1)
 	    perror_with_name ("waiting for new child");
@@ -423,8 +419,13 @@ linux_handle_extended_wait (int pid, int
 			    "wait returned unexpected status 0x%x", status);
 	}
 
-      ourstatus->kind = (event == PTRACE_EVENT_FORK)
-	? TARGET_WAITKIND_FORKED : TARGET_WAITKIND_VFORKED;
+      if (event == PTRACE_EVENT_FORK)
+	ourstatus->kind = TARGET_WAITKIND_FORKED;
+      else if (event == PTRACE_EVENT_VFORK)
+	ourstatus->kind = TARGET_WAITKIND_VFORKED;
+      else
+	ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
+
       ourstatus->value.related_pid = new_pid;
       return inferior_ptid;
     }
Index: linux-nat.h
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.h,v
retrieving revision 1.5
diff -u -p -r1.5 linux-nat.h
--- linux-nat.h	14 Sep 2003 02:04:44 -0000	1.5
+++ linux-nat.h	26 Mar 2004 17:10:36 -0000
@@ -1,5 +1,5 @@
 /* Native debugging support for GNU/Linux (LWP layer).
-   Copyright 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+   Copyright 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -18,6 +18,8 @@
    Foundation, Inc., 59 Temple Place - Suite 330,
    Boston, MA 02111-1307, USA.  */
 
+#include "target.h"
+
 /* Structure describing an LWP.  */
 
 struct lwp_info
@@ -52,6 +54,11 @@ struct lwp_info
   /* Non-zero if we were stepping this LWP.  */
   int step;
 
+  /* If WAITSTATUS->KIND != TARGET_WAITKIND_SPURIOUS, the waitstatus
+     for this LWP's last event.  This may correspond to STATUS above,
+     or to a local variable in lin_lwp_wait.  */
+  struct target_waitstatus waitstatus;
+
   /* Next LWP in list.  */
   struct lwp_info *next;
 };
@@ -60,7 +67,6 @@ struct lwp_info
    system".  */
 struct mem_attrib;
 struct target_ops;
-struct target_waitstatus;
 
 extern int linux_proc_xfer_memory (CORE_ADDR addr, char *myaddr, int len,
 				   int write, struct mem_attrib *attrib,
Index: thread-db.c
===================================================================
RCS file: /cvs/src/src/gdb/thread-db.c,v
retrieving revision 1.37
diff -u -p -r1.37 thread-db.c
--- thread-db.c	29 Feb 2004 02:39:47 -0000	1.37
+++ thread-db.c	26 Mar 2004 17:10:36 -0000
@@ -1,6 +1,6 @@
 /* libthread_db assisted debugging support, generic parts.
 
-   Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
+   Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc.
 
    This file is part of GDB.
 
@@ -35,6 +35,10 @@
 #include "regcache.h"
 #include "solib-svr4.h"
 
+#ifdef HAVE_GNU_LIBC_VERSION_H
+#include <gnu/libc-version.h>
+#endif
+
 #ifndef LIBTHREAD_DB_SO
 #define LIBTHREAD_DB_SO "libthread_db.so.1"
 #endif
@@ -130,6 +134,7 @@ static CORE_ADDR td_death_bp_addr;
 static void thread_db_find_new_threads (void);
 static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
 			   const td_thrinfo_t *ti_p, int verbose);
+static void detach_thread (ptid_t ptid, int verbose);
 \f
 
 /* Building process ids.  */
@@ -150,6 +155,9 @@ static void attach_thread (ptid_t ptid, 
 
 struct private_thread_info
 {
+  /* Flag set when we see a TD_DEATH event for this thread.  */
+  unsigned int dying:1;
+
   /* Cached thread state.  */
   unsigned int th_valid:1;
   unsigned int ti_valid:1;
@@ -491,6 +499,10 @@ enable_thread_event_reporting (void)
   td_thr_events_t events;
   td_notify_t notify;
   td_err_e err;
+#ifdef HAVE_GNU_LIBC_VERSION_H
+  const char *libc_version;
+  int libc_major, libc_minor;
+#endif
 
   /* We cannot use the thread event reporting facility if these
      functions aren't available.  */
@@ -501,12 +513,16 @@ enable_thread_event_reporting (void)
   /* Set the process wide mask saying which events we're interested in.  */
   td_event_emptyset (&events);
   td_event_addset (&events, TD_CREATE);
-#if 0
+
+#ifdef HAVE_GNU_LIBC_VERSION_H
   /* FIXME: kettenis/2000-04-23: The event reporting facility is
      broken for TD_DEATH events in glibc 2.1.3, so don't enable it for
      now.  */
-  td_event_addset (&events, TD_DEATH);
+  libc_version = gnu_get_libc_version ();
+  if (sscanf (libc_version, "%d.%d", &libc_major, &libc_minor) == 2
+      && (libc_major > 2 || (libc_major == 2 && libc_minor > 1)))
 #endif
+    td_event_addset (&events, TD_DEATH);
 
   err = td_ta_set_event_p (thread_agent, &events);
   if (err != TD_OK)
@@ -689,6 +705,10 @@ quit:
     target_new_objfile_chain (objfile);
 }
 
+/* Attach to a new thread.  This function is called when we receive a
+   TD_CREATE event or when we iterate over all threads and find one
+   that wasn't already in our list.  */
+
 static void
 attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
 	       const td_thrinfo_t *ti_p, int verbose)
@@ -696,6 +716,27 @@ attach_thread (ptid_t ptid, const td_thr
   struct thread_info *tp;
   td_err_e err;
 
+  /* If we're being called after a TD_CREATE event, we may already
+     know about this thread.  There are two ways this can happen.  We
+     may have iterated over all threads between the thread creation
+     and the TD_CREATE event, for instance when the user has issued
+     the `info threads' command before the SIGTRAP for hitting the
+     thread creation breakpoint was reported.  Alternatively, the
+     thread may have exited and a new one been created with the same
+     thread ID.  In the first case we don't need to do anything; in
+     the second case we should discard information about the dead
+     thread and attach to the new one.  */
+  if (in_thread_list (ptid))
+    {
+      tp = find_thread_pid (ptid);
+      gdb_assert (tp != NULL);
+
+      if (!tp->private->dying)
+        return;
+
+      delete_thread (ptid);
+    }
+
   check_thread_signals ();
 
   /* Add the thread to GDB's thread list.  */
@@ -741,8 +782,21 @@ thread_db_attach (char *args, int from_t
 static void
 detach_thread (ptid_t ptid, int verbose)
 {
+  struct thread_info *thread_info;
+
   if (verbose)
     printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid));
+
+  /* Don't delete the thread now, because it still reports as active
+     until it has executed a few instructions after the event
+     breakpoint - if we deleted it now, "info threads" would cause us
+     to re-attach to it.  Just mark it as having had a TD_DEATH
+     event.  This means that we won't delete it from our thread list
+     until we notice that it's dead (via prune_threads), or until
+     something re-uses its thread ID.  */
+  thread_info = find_thread_pid (ptid);
+  gdb_assert (thread_info != NULL);
+  thread_info->private->dying = 1;
 }
 
 static void
@@ -847,12 +901,9 @@ check_event (ptid_t ptid)
       switch (msg.event)
 	{
 	case TD_CREATE:
-
-	  /* We may already know about this thread, for instance when the
-	     user has issued the `info threads' command before the SIGTRAP
-	     for hitting the thread creation breakpoint was reported.  */
-	  if (!in_thread_list (ptid))
-	    attach_thread (ptid, msg.th_p, &ti, 1);
+	  /* Call attach_thread whether or not we already know about a
+	     thread with this thread ID.  */
+	  attach_thread (ptid, msg.th_p, &ti, 1);
 
 	  break;
 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available
  2004-03-26 17:59                           ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
@ 2004-03-26 18:14                             ` Jeff Johnston
  2004-03-26 21:13                               ` [patch] New thread test to exercise Daniel's Patch Jeff Johnston
  2004-03-29 18:07                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
  1 sibling, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-26 18:14 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote:
> 
>>Daniel Jacobowitz wrote:
>>
>>>On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote:
>>>
>>>
>>>>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
>>>>something together in the morning to verify whether that's the problem?
>>>
>>>
>>>Here you go.  Again, this patch is obviously not ready to go into GDB,
>>>but I have not been able to make it misbehave yet.  I don't know if all
>>>the bits it needs work right in RHEL3, or if my testing was conclusive.
>>>
>>>The highlights:
>>> - Includes most of the previous patch
>>> - Uses PTRACE_EVENT_CLONE to attach to new threads
>>> - Moves handling of events closer to the waitpid call
>>>
>>>There are some potential races but I haven't hit any of them in
>>>practice.  I suspect that with a heavy fork or vfork load (not clone)
>>>you could produce interesting failure modes.
>>>
>>>Give it a try, please.  If it works I'll clean it up.
>>>
>>
>>Works great.  I was unable to get it to misbehave on RHEL3.  Please go 
>>ahead cleaning it up.
> 
> 
> Here's the final patch. The description of the patch is unchanged from
> the above.  The background, for those who did not read the whole
> discussion: in NPTL, a signal delivered to the process when any thread
> is not attached is likely to terminate the process.  So we need to be
> careful to:
>  - know when threads have exited, so that we can attach to new threads
>    which reuse the same thread ID
>  - attach to threads as early as possible
> 
> Both of these could cause problems in LinuxThreads, but they were less
> drastic.
> 
> Tested with both LinuxThreads and NPTL.  I'll commit this to HEAD on
> Monday, barring objections.  We've missed the boat for GDB 6.1 at this
> point.
> 
> The testcase you posted has credits in it, so we can't just grab it for
> the testsuite.  But all we should need to test this is a loop that
> creates short-lived threads, and then verifying that we can send C-c,
> get a prompt, issue info threads, and continue a couple of times
> without seeing anyting matching "error:".  Interested in writing that
> test? :)
>

Sure.  Just tried debugging eclipse today with this patch applied and it works 
as well.

-- Jeff J.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-26 18:14                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston
@ 2004-03-26 21:13                               ` Jeff Johnston
  2004-03-26 21:19                                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-26 21:13 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: Daniel Jacobowitz, gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2704 bytes --]

Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> 
>> On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote:
>>
>>> Daniel Jacobowitz wrote:
>>>
>>>> On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote:
>>>>
>>>>
>>>>> Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
>>>>> something together in the morning to verify whether that's the 
>>>>> problem?
>>>>
>>>>
>>>>
>>>> Here you go.  Again, this patch is obviously not ready to go into GDB,
>>>> but I have not been able to make it misbehave yet.  I don't know if all
>>>> the bits it needs work right in RHEL3, or if my testing was conclusive.
>>>>
>>>> The highlights:
>>>> - Includes most of the previous patch
>>>> - Uses PTRACE_EVENT_CLONE to attach to new threads
>>>> - Moves handling of events closer to the waitpid call
>>>>
>>>> There are some potential races but I haven't hit any of them in
>>>> practice.  I suspect that with a heavy fork or vfork load (not clone)
>>>> you could produce interesting failure modes.
>>>>
>>>> Give it a try, please.  If it works I'll clean it up.
>>>>
>>>
>>> Works great.  I was unable to get it to misbehave on RHEL3.  Please 
>>> go ahead cleaning it up.
>>
>>
>>
>> Here's the final patch. The description of the patch is unchanged from
>> the above.  The background, for those who did not read the whole
>> discussion: in NPTL, a signal delivered to the process when any thread
>> is not attached is likely to terminate the process.  So we need to be
>> careful to:
>>  - know when threads have exited, so that we can attach to new threads
>>    which reuse the same thread ID
>>  - attach to threads as early as possible
>>
>> Both of these could cause problems in LinuxThreads, but they were less
>> drastic.
>>
>> Tested with both LinuxThreads and NPTL.  I'll commit this to HEAD on
>> Monday, barring objections.  We've missed the boat for GDB 6.1 at this
>> point.
>>
>> The testcase you posted has credits in it, so we can't just grab it for
>> the testsuite.  But all we should need to test this is a loop that
>> creates short-lived threads, and then verifying that we can send C-c,
>> get a prompt, issue info threads, and continue a couple of times
>> without seeing anyting matching "error:".  Interested in writing that
>> test? :)
>>
> 
> Sure.  Just tried debugging eclipse today with this patch applied and it 
> works as well.
> 
> -- Jeff J.
> 
> 

See the attached test.  There was some stuff in pthreads.exp about Crtl-C and 
alpha-*-osf*.  Do I need to account for this platform?

2004-03-26  Jeff Johnston  <jjohnstn@redhat.com>

         * gdb.threads/manythreads.c: New testcase.
         * gdb.threads/manythreads.exp: Ditto.


-- Jeff J.

[-- Attachment #2: testsuite.patch --]
[-- Type: text/plain, Size: 4383 bytes --]

Index: manythreads.exp
===================================================================
RCS file: manythreads.exp
diff -N manythreads.exp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ manythreads.exp	26 Mar 2004 21:03:17 -0000
@@ -0,0 +1,116 @@
+# manythreads.exp -- Expect script to test stopping many threads
+# Copyright (C) 2004 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  
+
+# Please email any bugs, comments, and/or additions to this file to:
+# bug-gdb@prep.ai.mit.edu
+
+# This file was written by Jeff Johnston. (jjohnstn@redhat.com)
+
+if $tracelevel then {
+	strace $tracelevel
+}
+
+set prms_id 0
+set bug_id 0
+
+# This only works with native configurations
+if ![isnative] then {
+    return
+}
+
+set testfile "manythreads"
+set srcfile ${testfile}.c
+set binfile ${objdir}/${subdir}/${testfile}
+if {[gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug libs=-lpthread}] != ""} {
+    return -1
+}
+
+gdb_start
+gdb_reinitialize_dir $srcdir/$subdir
+gdb_load ${binfile}
+send_gdb "set print sevenbit-strings\n" ; gdb_expect -re "$gdb_prompt $"
+runto_main
+
+# We'll need this when we send_gdb a ^C to GDB.  Need to do it before we
+# run the program and gdb starts saving and restoring tty states.
+# On Ultrix, we don't need it and it is really slow (because shell_escape
+# doesn't use vfork).
+if ![istarget "*-*-ultrix*"] then {
+    gdb_test "shell stty intr '^C'" ""
+}
+
+send_gdb "continue\n"
+gdb_expect {
+  -re "Continuing"
+    { pass "first continue" }
+  timeout
+    { fail "first continue (timeout)" }
+}
+
+# Send a Ctrl-C and verify that we can do info threads and continue
+after 100 {send_gdb "\003"}
+gdb_expect {
+  -re "Program received signal SIGINT.*$gdb_prompt $"
+    {
+      pass "stop threads"
+    }
+  timeout
+    {
+      fail "stop threads (timeout)"
+    }
+} 
+
+gdb_test "info threads" ".*1 Thread.*.LWP.*"
+
+send_gdb "continue\n"
+gdb_expect {
+  -re "Continuing"
+    { pass "second continue" }
+  timeout
+    { fail "second continue (timeout)" }
+}
+
+# Send another Ctrl-C and verify that we can do info threads and quit
+after 100 {send_gdb "\003"}
+gdb_expect {
+  -re "Program received signal SIGINT.*$gdb_prompt $"
+    {
+      pass "stop threads"
+    }
+  timeout
+    {
+      fail "stop threads (timeout)"
+    }
+} 
+
+send_gdb "quit\n"
+gdb_expect {
+    -re "The program is running.  Exit anyway\\? \\(y or n\\) $" {
+        send_gdb "y\n"
+        exp_continue
+    }
+    eof {
+        pass "GDB exits after stopping multithreaded program"
+    }
+    default {
+        fail "GDB exits after stopping multithreaded program"
+    }
+    timeout {
+        fail "GDB exits after stopping multithreaded program (timeout)"
+    }
+}
+
Index: manythreads.c
===================================================================
RCS file: manythreads.c
diff -N manythreads.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ manythreads.c	26 Mar 2004 21:03:17 -0000
@@ -0,0 +1,42 @@
+#include <pthread.h>
+#include <stdio.h>
+
+void *
+thread_function (void *arg)
+{
+  int x = (int)arg;
+
+  printf ("Thread <%d> executing\n", x);
+
+  return NULL;
+}
+
+int 
+main (int argc, char **argv)
+{
+  pthread_attr_t attr;
+  pthread_t threads[256];
+  int i, j;
+
+  pthread_attr_init (&attr);
+
+  /* Create a ton of quick-executing threads, then wait for them to
+     complete.  */
+  for (i = 0; i < 1000; ++i) 
+    {
+      for (j = 0; j < 256; ++j)
+	{
+	  pthread_create (&threads[j], &attr, thread_function, 
+			  (void *)(i * 1000 + j));
+	}
+
+      for (j = 0; j < 256; ++j)
+	{
+	  pthread_join (threads[j], NULL);
+	}
+    }
+
+  pthread_attr_destroy (&attr);
+
+  return 0;
+}

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-26 21:13                               ` [patch] New thread test to exercise Daniel's Patch Jeff Johnston
@ 2004-03-26 21:19                                 ` Daniel Jacobowitz
  2004-03-29 18:06                                   ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-26 21:19 UTC (permalink / raw)
  To: gdb-patches

On Fri, Mar 26, 2004 at 04:13:18PM -0500, Jeff Johnston wrote:
> See the attached test.  There was some stuff in pthreads.exp about Crtl-C 
> and alpha-*-osf*.  Do I need to account for this platform?

Well, it's still supported, but I have no idea whether the bit in that
test is necessary - or what it's supposed to accomplish.  So let's
ignore it.

> +# This only works with native configurations
> +if ![isnative] then {
> +    return
> +}

This shouldn't be necessary.

> +set testfile "manythreads"
> +set srcfile ${testfile}.c
> +set binfile ${objdir}/${subdir}/${testfile}
> +if {[gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug libs=-lpthread}] != ""} {

See gdb_compile_pthreads.

> +send_gdb "set print sevenbit-strings\n" ; gdb_expect -re "$gdb_prompt $"

gdb_test "set print sevenbit-strings" ""

> +send_gdb "continue\n"
> +gdb_expect {

Everywhere you're using gdb_expect, please use gdb_test_multiple
instead.  For the "after" tests, you can use gdb_test_multiple with
"" as the first argument.

> +# Send a Ctrl-C and verify that we can do info threads and continue
> +after 100 {send_gdb "\003"}

Is 100 ms enough to be interesting?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-26 21:19                                 ` Daniel Jacobowitz
@ 2004-03-29 18:06                                   ` Jeff Johnston
  2004-03-29 18:11                                     ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-29 18:06 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Fri, Mar 26, 2004 at 04:13:18PM -0500, Jeff Johnston wrote:
> 
>>See the attached test.  There was some stuff in pthreads.exp about Crtl-C 
>>and alpha-*-osf*.  Do I need to account for this platform?
> 
> 
> Well, it's still supported, but I have no idea whether the bit in that
> test is necessary - or what it's supposed to accomplish.  So let's
> ignore it.
> 
> 
>>+# This only works with native configurations
>>+if ![isnative] then {
>>+    return
>>+}
> 
> 
> This shouldn't be necessary.
> 

Ok.

> 
>>+set testfile "manythreads"
>>+set srcfile ${testfile}.c
>>+set binfile ${objdir}/${subdir}/${testfile}
>>+if {[gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug libs=-lpthread}] != ""} {
> 
> 
> See gdb_compile_pthreads.
>

Will do.

> 
>>+send_gdb "set print sevenbit-strings\n" ; gdb_expect -re "$gdb_prompt $"
> 
> 
> gdb_test "set print sevenbit-strings" ""
> 

Ok.

> 
>>+send_gdb "continue\n"
>>+gdb_expect {
> 
> 
> Everywhere you're using gdb_expect, please use gdb_test_multiple
> instead.  For the "after" tests, you can use gdb_test_multiple with
> "" as the first argument.
>

I tried this initially but I kept getting "Error: internal buffer is full".  I 
tried lowering the "after" time which is why it ended up 100 below but that 
didn't solve the problem.  Any suggestions on how to avoid the "full_buffer" error.

> 
>>+# Send a Ctrl-C and verify that we can do info threads and continue
>>+after 100 {send_gdb "\003"}
> 
> 
> Is 100 ms enough to be interesting?
>

It still kicks off a fair number of threads that have started/exited, but I can 
make this bigger.

-- Jeff J.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 18:06                                   ` Jeff Johnston
@ 2004-03-29 18:11                                     ` Daniel Jacobowitz
  2004-03-29 20:18                                       ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-29 18:11 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote:
> >>+send_gdb "continue\n"
> >>+gdb_expect {
> >
> >
> >Everywhere you're using gdb_expect, please use gdb_test_multiple
> >instead.  For the "after" tests, you can use gdb_test_multiple with
> >"" as the first argument.
> >
> 
> I tried this initially but I kept getting "Error: internal buffer is full". 
> I tried lowering the "after" time which is why it ended up 100 below but 
> that didn't solve the problem.  Any suggestions on how to avoid the 
> "full_buffer" error.

I'm not sure about the regexes, since I'm just pulling this out of the
top of my head, but you want something like this:

  -re "error:.*$gdb_prompt $" {
    fail "msg"
  }
  -re "\\\[New thread \[^\]\]*\\\]\r\n" {
    exp_continue
  }
  -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
    exp_continue
  }

That will consume any thread create/delete messages, instead of leaving
them in the buffer.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 18:11                                     ` Daniel Jacobowitz
@ 2004-03-29 20:18                                       ` Jeff Johnston
  2004-03-29 20:42                                         ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-29 20:18 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote:
> 
>>>>+send_gdb "continue\n"
>>>>+gdb_expect {
>>>
>>>
>>>Everywhere you're using gdb_expect, please use gdb_test_multiple
>>>instead.  For the "after" tests, you can use gdb_test_multiple with
>>>"" as the first argument.
>>>
>>
>>I tried this initially but I kept getting "Error: internal buffer is full". 
>>I tried lowering the "after" time which is why it ended up 100 below but 
>>that didn't solve the problem.  Any suggestions on how to avoid the 
>>"full_buffer" error.
> 
> 
> I'm not sure about the regexes, since I'm just pulling this out of the
> top of my head, but you want something like this:
> 
>   -re "error:.*$gdb_prompt $" {
>     fail "msg"
>   }
>   -re "\\\[New thread \[^\]\]*\\\]\r\n" {
>     exp_continue
>   }
>   -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
>     exp_continue
>   }
> 
> That will consume any thread create/delete messages, instead of leaving
> them in the buffer.
> 

Forgive my ignorance of the test macros, but how does this terminate?  I tried 
doing it this way and it hangs.  It never gets a gdb prompt and spits out 
endless messages because it is essentially an infinite loop.  I only care to see 
the "Continuing" message and don't care about the rest of the stuff.

I stole the original send_gdb / gdb_expect code from schedlock.exp which has the 
same problem.

-- Jeff J.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 20:18                                       ` Jeff Johnston
@ 2004-03-29 20:42                                         ` Daniel Jacobowitz
  2004-03-29 21:04                                           ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-29 20:42 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Mon, Mar 29, 2004 at 03:18:35PM -0500, Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote:
> >
> >>>>+send_gdb "continue\n"
> >>>>+gdb_expect {
> >>>
> >>>
> >>>Everywhere you're using gdb_expect, please use gdb_test_multiple
> >>>instead.  For the "after" tests, you can use gdb_test_multiple with
> >>>"" as the first argument.
> >>>
> >>
> >>I tried this initially but I kept getting "Error: internal buffer is 
> >>full". I tried lowering the "after" time which is why it ended up 100 
> >>below but that didn't solve the problem.  Any suggestions on how to avoid 
> >>the "full_buffer" error.
> >
> >
> >I'm not sure about the regexes, since I'm just pulling this out of the
> >top of my head, but you want something like this:
> >
> >  -re "error:.*$gdb_prompt $" {
> >    fail "msg"
> >  }
> >  -re "\\\[New thread \[^\]\]*\\\]\r\n" {
> >    exp_continue
> >  }
> >  -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
> >    exp_continue
> >  }
> >
> >That will consume any thread create/delete messages, instead of leaving
> >them in the buffer.
> >
> 
> Forgive my ignorance of the test macros, but how does this terminate?  I 
> tried doing it this way and it hangs.  It never gets a gdb prompt and spits 
> out endless messages because it is essentially an infinite loop.  I only 
> care to see the "Continuing" message and don't care about the rest of the 
> stuff.
> 
> I stole the original send_gdb / gdb_expect code from schedlock.exp which 
> has the same problem.

Hmm, I must have misunderstood.  Which part of the test gives you a
full buffer error?  I'm thinking of something like this:

gdb_test_multiple "continue" {
  -re "error:.*$gdb_prompt $" {
    fail "msg"
  }
  -re "Continuing" {
    pass "msg"
  }
}

after 1000 { send_gdb "\003" }
gdb_test_multiple "" {
  -re "error:.*$gdb_prompt $" {
    fail "msg"
  }
  -re "$gdb_prompt $" {
    pass "msg"
  }
  -re "\\\[New thread \[^\]\]*\\\]\r\n" {
    exp_continue
  }
  -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
    exp_continue
  }
}

Those will catch any error messages and treat them as failures.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 20:42                                         ` Daniel Jacobowitz
@ 2004-03-29 21:04                                           ` Jeff Johnston
  2004-03-29 21:34                                             ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-29 21:04 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Mon, Mar 29, 2004 at 03:18:35PM -0500, Jeff Johnston wrote:
> 
>>Daniel Jacobowitz wrote:
>>
>>>On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote:
>>>
>>>
>>>>>>+send_gdb "continue\n"
>>>>>>+gdb_expect {
>>>>>
>>>>>
>>>>>Everywhere you're using gdb_expect, please use gdb_test_multiple
>>>>>instead.  For the "after" tests, you can use gdb_test_multiple with
>>>>>"" as the first argument.
>>>>>
>>>>
>>>>I tried this initially but I kept getting "Error: internal buffer is 
>>>>full". I tried lowering the "after" time which is why it ended up 100 
>>>>below but that didn't solve the problem.  Any suggestions on how to avoid 
>>>>the "full_buffer" error.
>>>
>>>
>>>I'm not sure about the regexes, since I'm just pulling this out of the
>>>top of my head, but you want something like this:
>>>
>>> -re "error:.*$gdb_prompt $" {
>>>   fail "msg"
>>> }
>>> -re "\\\[New thread \[^\]\]*\\\]\r\n" {
>>>   exp_continue
>>> }
>>> -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
>>>   exp_continue
>>> }
>>>
>>>That will consume any thread create/delete messages, instead of leaving
>>>them in the buffer.
>>>
>>
>>Forgive my ignorance of the test macros, but how does this terminate?  I 
>>tried doing it this way and it hangs.  It never gets a gdb prompt and spits 
>>out endless messages because it is essentially an infinite loop.  I only 
>>care to see the "Continuing" message and don't care about the rest of the 
>>stuff.
>>
>>I stole the original send_gdb / gdb_expect code from schedlock.exp which 
>>has the same problem.
> 
> 
> Hmm, I must have misunderstood.  Which part of the test gives you a
> full buffer error?  I'm thinking of something like this:
> 
> gdb_test_multiple "continue" {
>   -re "error:.*$gdb_prompt $" {
>     fail "msg"
>   }
>   -re "Continuing" {
>     pass "msg"
>   }
> }
> 
> after 1000 { send_gdb "\003" }
> gdb_test_multiple "" {
>   -re "error:.*$gdb_prompt $" {
>     fail "msg"
>   }
>   -re "$gdb_prompt $" {
>     pass "msg"
>   }
>   -re "\\\[New thread \[^\]\]*\\\]\r\n" {
>     exp_continue
>   }
>   -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
>     exp_continue
>   }
> }
> 
> Those will catch any error messages and treat them as failures.
> 

Ok, I had the thread checks in the Continue test.  If I change it to the way 
above, now I get 3 "Process no longer exists "messages but the test completes. :(

I'm really starting to hate these test macros.  What is the reasoning behind 
avoiding send_gdb / gdb_expect?

-- Jeff J.




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 21:04                                           ` Jeff Johnston
@ 2004-03-29 21:34                                             ` Daniel Jacobowitz
  2004-03-29 21:59                                               ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-29 21:34 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote:
> Ok, I had the thread checks in the Continue test.  If I change it to the 
> way above, now I get 3 "Process no longer exists "messages but the test 
> completes. :(
> 
> I'm really starting to hate these test macros.  What is the reasoning 
> behind avoiding send_gdb / gdb_expect?

They don't automatically handle things like internal errors,
disconnects, et cetera.  Want to post your current version and I'll
give it a try?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 21:34                                             ` Daniel Jacobowitz
@ 2004-03-29 21:59                                               ` Jeff Johnston
  2004-03-29 22:32                                                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-29 21:59 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 536 bytes --]

Daniel Jacobowitz wrote:
> On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote:
> 
>>Ok, I had the thread checks in the Continue test.  If I change it to the 
>>way above, now I get 3 "Process no longer exists "messages but the test 
>>completes. :(
>>
>>I'm really starting to hate these test macros.  What is the reasoning 
>>behind avoiding send_gdb / gdb_expect?
> 
> 
> They don't automatically handle things like internal errors,
> disconnects, et cetera.  Want to post your current version and I'll
> give it a try?
> 

[-- Attachment #2: manythreads.c --]
[-- Type: text/plain, Size: 664 bytes --]

#include <pthread.h>
#include <stdio.h>

void *
thread_function (void *arg)
{
  int x = (int)arg;

  printf ("Thread <%d> executing\n", x);

  return NULL;
}

int 
main (int argc, char **argv)
{
  pthread_attr_t attr;
  pthread_t threads[256];
  int i, j;

  pthread_attr_init (&attr);

  /* Create a ton of quick-executing threads, then wait for them to
     complete.  */
  for (i = 0; i < 1000; ++i) 
    {
      for (j = 0; j < 256; ++j)
	{
	  pthread_create (&threads[j], &attr, thread_function, 
			  (void *)(i * 1000 + j));
	}

      for (j = 0; j < 256; ++j)
	{
	  pthread_join (threads[j], NULL);
	}
    }

  pthread_attr_destroy (&attr);

  return 0;
}

[-- Attachment #3: manythreads.exp --]
[-- Type: text/plain, Size: 3495 bytes --]

# manythreads.exp -- Expect script to test stopping many threads
# Copyright (C) 2004 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  

# Please email any bugs, comments, and/or additions to this file to:
# bug-gdb@prep.ai.mit.edu

# This file was written by Jeff Johnston. (jjohnstn@redhat.com)

if $tracelevel then {
	strace $tracelevel
}

set prms_id 0
set bug_id 0

set testfile "manythreads"
set srcfile ${testfile}.c
set binfile ${objdir}/${subdir}/${testfile}

if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable [list debug "incdir=${objdir}"]] != "" } {
    return -1
}

gdb_start
gdb_reinitialize_dir $srcdir/$subdir
gdb_load ${binfile}
gdb_test "set print sevenbit-strings" ""
runto_main

# We'll need this when we send_gdb a ^C to GDB.  Need to do it before we
# run the program and gdb starts saving and restoring tty states.
# On Ultrix, we don't need it and it is really slow (because shell_escape
# doesn't use vfork).
if ![istarget "*-*-ultrix*"] then {
    gdb_test "shell stty intr '^C'" ""
}

gdb_test_multiple "continue" "first continue" {
  -re "ERROR:.*$gdb_prompt $" {
    fail "$message"
  }
  -re "Continuing" {
    pass "$message"
  }
}

# Send a Ctrl-C and verify that we can do info threads and continue
after 1000 {send_gdb "\003"}
gdb_test_multiple "" "stop threads 1" {
  -re "\\\[New thread \[^\]\]*\\\]\r\n" {
    exp_continue
  }
  -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
    exp_continue
  }
  -re "Thread.*executing\r\n" {
    exp_continue
  }
  -re "Program received signal SIGINT.*$gdb_prompt $"
    {
      pass "$message"
    }
  timeout
    {
      fail "$message (timeout)"
    }
}

gdb_test "info threads" ".*1 Thread.*.LWP.*"

gdb_test_multiple "continue" "second continue" {
  -re "ERROR:.*$gdb_prompt $" {
    fail "$message"
  }
  -re "Continuing" {
    pass "$message"
  }
}

# Send another Ctrl-C and verify that we can do info threads and quit
after 1000 {send_gdb "\003"}
gdb_test_multiple "" "stop threads 2" {
  -re "\\\[New thread \[^\]\]*\\\]\r\n" {
    exp_continue
  }
  -re "\\\[Thread \[^\]\]* exited\\\]\r\n" {
    exp_continue
  }
  -re "Thread.*executing\r\n" {
    exp_continue
  }
  -re "Program received signal SIGINT.*$gdb_prompt $"
    {
      pass "stop threads 2"
    }
  timeout
    {
      fail "stop threads 2 (timeout)"
    }
} 

gdb_test_multiple "quit" "GDB exits after stopping multithreaded program" {
    -re "The program is running.  Exit anyway\\? \\(y or n\\) $" {
        gdb_test_multiple "y" "$message" {
          eof {
            pass "GDB exits after stopping multithreaded program"
          }
        }
    }
    eof {
        pass "GDB exits after stopping multithreaded program"
    }
    timeout {
        fail "GDB exits after stopping multithreaded program (timeout)"
    }
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 21:59                                               ` Jeff Johnston
@ 2004-03-29 22:32                                                 ` Daniel Jacobowitz
  2004-03-29 22:58                                                   ` Jeff Johnston
  0 siblings, 1 reply; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-29 22:32 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

On Mon, Mar 29, 2004 at 04:59:30PM -0500, Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote:
> >
> >>Ok, I had the thread checks in the Continue test.  If I change it to the 
> >>way above, now I get 3 "Process no longer exists "messages but the test 
> >>completes. :(
> >>
> >>I'm really starting to hate these test macros.  What is the reasoning 
> >>behind avoiding send_gdb / gdb_expect?
> >
> >
> >They don't automatically handle things like internal errors,
> >disconnects, et cetera.  Want to post your current version and I'll
> >give it a try?
> >

>   -re "Program received signal SIGINT.*$gdb_prompt $"
>     {
>       pass "$message"
>     }

Nothing automatically sets $message.  The ERROR: you saw is DejaGNU's
generic failure message for things like syntax errors in expect blocks;
I sent a patch to the dejagnu list a month or so ago to print more
helpful information.  Try the attached script instead.

Oddly, running a fixed manythreads.exp with an unpatched GDB, I get a
SIGSEGV in pthread_join.  It shows up as a FAIL (yay).  The patched GDB
shows up as nine PASSes (yay).  Re-running it a number of times, the
SIGSEGV came and went intermittently.

Running the test with LinuxThreads an internal error (lp->status == 0
assertion failed) came and went also.

I guess that makes it a good test.... now someone will have to _fix_
those.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

[-- Attachment #2: manythreads.exp --]
[-- Type: text/plain, Size: 3423 bytes --]

# manythreads.exp -- Expect script to test stopping many threads
# Copyright (C) 2004 Free Software Foundation, Inc.

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# 
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
# 
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  

# Please email any bugs, comments, and/or additions to this file to:
# bug-gdb@prep.ai.mit.edu

# This file was written by Jeff Johnston. (jjohnstn@redhat.com)

if $tracelevel then {
	strace $tracelevel
}

set prms_id 0
set bug_id 0

set testfile "manythreads"
set srcfile ${testfile}.c
set binfile ${objdir}/${subdir}/${testfile}

if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable [list debug "incdir=${objdir}"]] != "" } {
    return -1
}

gdb_start
gdb_reinitialize_dir $srcdir/$subdir
gdb_load ${binfile}
gdb_test "set print sevenbit-strings" ""
runto_main

# We'll need this when we send_gdb a ^C to GDB.  Need to do it before we
# run the program and gdb starts saving and restoring tty states.
# On Ultrix, we don't need it and it is really slow (because shell_escape
# doesn't use vfork).
if ![istarget "*-*-ultrix*"] then {
    gdb_test "shell stty intr '^C'" ""
}

set message "first continue"
gdb_test_multiple "continue" "first continue" {
  -re "error:.*$gdb_prompt $" {
    fail "$message"
  }
  -re "Continuing" {
    pass "$message"
  }
}

# Send a Ctrl-C and verify that we can do info threads and continue
after 1000 {send_gdb "\003"}
set message "stop threads 1"
gdb_test_multiple "" "stop threads 1" {
  -re "\\\[New \[^\]\]*\\\]\r\n" {
    exp_continue
  }
  -re "\\\[\[^\]\]* exited\\\]\r\n" {
    exp_continue
  }
  -re "Thread \[^\n\]* executing\r\n" {
    exp_continue
  }
  -re "Program received signal SIGINT.*$gdb_prompt $"
    {
      pass "$message"
    }
  timeout
    {
      fail "$message (timeout)"
    }
}

gdb_test "info threads" ".*1 Thread.*.LWP.*"

set message "second continue"
gdb_test_multiple "continue" "second continue" {
  -re "error:.*$gdb_prompt $" {
    fail "$message"
  }
  -re "Continuing" {
    pass "$message"
  }
}

# Send another Ctrl-C and verify that we can do info threads and quit
after 1000 {send_gdb "\003"}
set message "stop threads 2"
gdb_test_multiple "" "stop threads 2" {
  -re "\\\[New \[^\]\]*\\\]\r\n" {
    exp_continue
  }
  -re "\\\[\[^\]\]* exited\\\]\r\n" {
    exp_continue
  }
  -re "Thread \[^\n\]* executing\r\n" {
    exp_continue
  }
  -re "Program received signal SIGINT.*$gdb_prompt $"
    {
      pass "stop threads 2"
    }
} 

gdb_test_multiple "quit" "GDB exits after stopping multithreaded program" {
    -re "The program is running.  Exit anyway\\? \\(y or n\\) $" {
	send_gdb "y\n"
	exp_continue
    }
    eof {
        pass "GDB exits after stopping multithreaded program"
    }
    timeout {
        fail "GDB exits after stopping multithreaded program (timeout)"
    }
}


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 22:32                                                 ` Daniel Jacobowitz
@ 2004-03-29 22:58                                                   ` Jeff Johnston
  2004-03-29 23:57                                                     ` Daniel Jacobowitz
  0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-29 22:58 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Mon, Mar 29, 2004 at 04:59:30PM -0500, Jeff Johnston wrote:
> 
>>Daniel Jacobowitz wrote:
>>
>>>On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote:
>>>
>>>
>>>>Ok, I had the thread checks in the Continue test.  If I change it to the 
>>>>way above, now I get 3 "Process no longer exists "messages but the test 
>>>>completes. :(
>>>>
>>>>I'm really starting to hate these test macros.  What is the reasoning 
>>>>behind avoiding send_gdb / gdb_expect?
>>>
>>>
>>>They don't automatically handle things like internal errors,
>>>disconnects, et cetera.  Want to post your current version and I'll
>>>give it a try?
>>>
> 
> 
>>  -re "Program received signal SIGINT.*$gdb_prompt $"
>>    {
>>      pass "$message"
>>    }
> 
> 
> Nothing automatically sets $message.  The ERROR: you saw is DejaGNU's
> generic failure message for things like syntax errors in expect blocks;
> I sent a patch to the dejagnu list a month or so ago to print more
> helpful information.  Try the attached script instead.
> 

Thanks for restoring my sanity.  I got mixed up by the fact that the 
gdb_test_multiple macro uses $message inside itself based on the passed 
argument.  I thought the conditions I specify would be able to use the same 
variable.

It runs successfully for me on my RHEL3 system.

> Oddly, running a fixed manythreads.exp with an unpatched GDB, I get a
> SIGSEGV in pthread_join.  It shows up as a FAIL (yay).  The patched GDB
> shows up as nine PASSes (yay).  Re-running it a number of times, the
> SIGSEGV came and went intermittently.
> 
> Running the test with LinuxThreads an internal error (lp->status == 0
> assertion failed) came and went also.
> 
> I guess that makes it a good test.... now someone will have to _fix_
> those.
> 

Do I resubmit the patched test or has it been approved?

-- Jeff J.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] New thread test to exercise Daniel's Patch
  2004-03-29 22:58                                                   ` Jeff Johnston
@ 2004-03-29 23:57                                                     ` Daniel Jacobowitz
  0 siblings, 0 replies; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-29 23:57 UTC (permalink / raw)
  To: Jeff Johnston; +Cc: gdb-patches

On Mon, Mar 29, 2004 at 05:58:44PM -0500, Jeff Johnston wrote:
> Do I resubmit the patched test or has it been approved?

It'll have to be approved by the gdb.threads maintainers, so I
recommend resubmitting a full patch/changelog and copying Michael S.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids)
  2004-03-26 17:59                           ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
  2004-03-26 18:14                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston
@ 2004-03-29 18:07                             ` Daniel Jacobowitz
  1 sibling, 0 replies; 30+ messages in thread
From: Daniel Jacobowitz @ 2004-03-29 18:07 UTC (permalink / raw)
  To: gdb-patches

On Fri, Mar 26, 2004 at 12:59:06PM -0500, Daniel Jacobowitz wrote:
> On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote:
> > 
> > Daniel Jacobowitz wrote:
> > >On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote:
> > >
> > >>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put
> > >>something together in the morning to verify whether that's the problem?
> > >
> > >
> > >Here you go.  Again, this patch is obviously not ready to go into GDB,
> > >but I have not been able to make it misbehave yet.  I don't know if all
> > >the bits it needs work right in RHEL3, or if my testing was conclusive.
> > >
> > >The highlights:
> > >  - Includes most of the previous patch
> > >  - Uses PTRACE_EVENT_CLONE to attach to new threads
> > >  - Moves handling of events closer to the waitpid call
> > >
> > >There are some potential races but I haven't hit any of them in
> > >practice.  I suspect that with a heavy fork or vfork load (not clone)
> > >you could produce interesting failure modes.
> > >
> > >Give it a try, please.  If it works I'll clean it up.
> > >
> > 
> > Works great.  I was unable to get it to misbehave on RHEL3.  Please go 
> > ahead cleaning it up.
> 
> Here's the final patch. The description of the patch is unchanged from
> the above.  The background, for those who did not read the whole
> discussion: in NPTL, a signal delivered to the process when any thread
> is not attached is likely to terminate the process.  So we need to be
> careful to:
>  - know when threads have exited, so that we can attach to new threads
>    which reuse the same thread ID
>  - attach to threads as early as possible
> 
> Both of these could cause problems in LinuxThreads, but they were less
> drastic.
> 
> Tested with both LinuxThreads and NPTL.  I'll commit this to HEAD on
> Monday, barring objections.  We've missed the boat for GDB 6.1 at this
> point.

I've checked this in for HEAD.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC]: fix for recycled thread ids
  2004-03-19  1:53 ` Daniel Jacobowitz
  2004-03-19 18:44   ` Jeff Johnston
@ 2004-03-19 20:33   ` Andrew Cagney
  1 sibling, 0 replies; 30+ messages in thread
From: Andrew Cagney @ 2004-03-19 20:33 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Jeff Johnston, gdb-patches

> On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
> 
>>> The following patch fixes a problem when a user application creates a 
>>> thread shortly after another thread has completed.  For nptl, thread ids 
>>> are addresses. If a thread completes/dies, the tid is available for reuse 
>>>  by a new thread.
> 
> 
> Does NPTL re-use the TID quickly, or cycle around the way LT did so
> that we only see this under high thread pressure?

Immediatly.  Since they are pointers NPTL re-cycles them as soon as they 
are freed (on the next thread create and, obviously, in preference to 
allocating a new buffer) -- very different to the linuxthreads model.

As a consequence any heavily threaded app quickly shows the problem - 
java (esp eclipse) for instance.

Andrew



^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2004-03-29 23:57 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-19  0:36 [RFC]: fix for recycled thread ids Jeff Johnston
2004-03-19  1:53 ` Daniel Jacobowitz
2004-03-19 18:44   ` Jeff Johnston
2004-03-19 19:01     ` Daniel Jacobowitz
2004-03-19 19:35       ` Jeff Johnston
2004-03-19 19:40         ` Daniel Jacobowitz
2004-03-19 21:32           ` Jeff Johnston
2004-03-24 15:51             ` Daniel Jacobowitz
2004-03-24 16:56               ` Daniel Jacobowitz
2004-03-24 21:56                 ` Jeff Johnston
2004-03-25  0:46                   ` Jeff Johnston
2004-03-25  4:39                     ` Daniel Jacobowitz
2004-03-25 16:34                       ` Daniel Jacobowitz
2004-03-25 20:22                         ` Jeff Johnston
2004-03-26 17:59                           ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
2004-03-26 18:14                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston
2004-03-26 21:13                               ` [patch] New thread test to exercise Daniel's Patch Jeff Johnston
2004-03-26 21:19                                 ` Daniel Jacobowitz
2004-03-29 18:06                                   ` Jeff Johnston
2004-03-29 18:11                                     ` Daniel Jacobowitz
2004-03-29 20:18                                       ` Jeff Johnston
2004-03-29 20:42                                         ` Daniel Jacobowitz
2004-03-29 21:04                                           ` Jeff Johnston
2004-03-29 21:34                                             ` Daniel Jacobowitz
2004-03-29 21:59                                               ` Jeff Johnston
2004-03-29 22:32                                                 ` Daniel Jacobowitz
2004-03-29 22:58                                                   ` Jeff Johnston
2004-03-29 23:57                                                     ` Daniel Jacobowitz
2004-03-29 18:07                             ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz
2004-03-19 20:33   ` [RFC]: fix for recycled thread ids Andrew Cagney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox