* [RFC]: fix for recycled thread ids
@ 2004-03-19 0:36 Jeff Johnston
2004-03-19 1:53 ` Daniel Jacobowitz
0 siblings, 1 reply; 30+ messages in thread
From: Jeff Johnston @ 2004-03-19 0:36 UTC (permalink / raw)
To: gdb-patches
[-- Attachment #1: Type: text/plain, Size: 2737 bytes --]
The following patch fixes a problem when a user application creates a thread
shortly after another thread has completed. For nptl, thread ids are addresses.
If a thread completes/dies, the tid is available for reuse by a new thread.
On RH9 and RHEL3, nptl threads do not have exit events associated with them. I
have already discussed this with Daniel J. who feels that the kernels are not
doing the right thing, but regardless, the current and previous RH nptl kernels
are behaving this way and gdb needs to handle it. As such, when a new thread is
created, if it is reusing the tid of a previous thread that gdb hasn't figured
out isn't around any more, gdb ignores the create event and the new thread is
not added. Ignoring the event is done because it is possible for gdb to find
out about the thread before it's creation event is reported and so the create
event can be redundant information.
The following fix removes the problem by also checking if the original lwp given
to the old thread in the list matches the lwp for the new thread. IIRC, there
is at least one platform that allows threads to change their lwps dynamically.
I do not believe that platform uses the thread-db layer, but to be safe, I did
not use a direct compare between the original lwp and the new lwp. Instead, I
added a target vector routine that by default never returns an unequal
comparison. For linux, the comparison routine points to a new routine in
lin-lwp.c which does the expected comparison. The comparison is based on C
compare routines whereby 0 means equal and non-zero means unequal.
Now, if a thread id for a new thread is found to already be on the list, the
target comparison is made between the original lwp stored for the old thread and
the lwp for the new thread. If the comparison is unequal, the old thread is
deleted and then the new thread is added. Otherwise, the new thread event is
ignored as before.
Ok to commit?
-- Jeff J.
2004-03-18 Jeff Johnston <jjohnstn@redhat.com>
* thread-db.c (struct private_thread_info): Add orig_lwp field.
(attach_thread): Save original lwp for thread after attaching.
(check_event): When a create event occurs and the thread is already
in the thread list, check if the old thread has gone away and the
thread id is being recycled.
* target.h (struct target_ops): Add to_thread_compare function.
* target.c (update_current_target): Add to_thread_compare support.
(debug_to_thread_compare): New function.
(setup_target_debug): Add debug_to_thread_compare.
* lin-lwp.c (lin_lwp_thread_compare): New function.
(init_lin_lwp_ops): Add lin_lwp_thread_compare.
[-- Attachment #2: thread.patch --]
[-- Type: text/plain, Size: 5910 bytes --]
Index: thread-db.c
===================================================================
RCS file: /cvs/src/src/gdb/thread-db.c,v
retrieving revision 1.37
diff -u -p -r1.37 thread-db.c
--- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37
+++ thread-db.c 18 Mar 2004 23:14:01 -0000
@@ -156,6 +156,8 @@ struct private_thread_info
td_thrhandle_t th;
td_thrinfo_t ti;
+
+ lwpid_t orig_lwp;
};
\f
@@ -714,6 +716,9 @@ attach_thread (ptid_t ptid, const td_thr
ATTACH_LWP (BUILD_LWP (ti_p->ti_lid, GET_PID (ptid)), 0);
#endif
+ /* Save original lwp for future reference. */
+ tp->private->orig_lwp = ti_p->ti_lid;
+
/* Enable thread event reporting for this thread. */
err = td_thr_event_enable_p (th_p, 1);
if (err != TD_OK)
@@ -850,8 +855,35 @@ check_event (ptid_t ptid)
/* We may already know about this thread, for instance when the
user has issued the `info threads' command before the SIGTRAP
- for hitting the thread creation breakpoint was reported. */
- if (!in_thread_list (ptid))
+ for hitting the thread creation breakpoint was reported.
+ However, we have a problem when threads die and new threads
+ are created shortly after.
+
+ If a thread dies and gdb is unaware the thread has died,
+ a new thread could reuse the tid that is no longer being
+ used. Thus, if we find the new tid is already in our list,
+ we must punt to the lower level to tell us if the new thread
+ and old thread are one and the same. If they are different,
+ the old thread must have gone away and the tid is being
+ recycled.
+
+ If we know the original thread has gone away, we can
+ delete the previous entry with the tid in the thread list and then
+ attach the new thread. Otherwise, we ignore the create event. */
+
+ if (in_thread_list (ptid))
+ {
+ struct thread_info *t = find_thread_pid (ptid);
+ if (target_beneath->to_thread_compare (BUILD_LWP (t->private->orig_lwp,
+ GET_PID (ptid)),
+ BUILD_LWP (ti.ti_lid,
+ GET_PID (ptid))))
+ {
+ delete_thread (ptid);
+ attach_thread (ptid, msg.th_p, &ti, 1);
+ }
+ }
+ else
attach_thread (ptid, msg.th_p, &ti, 1);
break;
Index: target.h
===================================================================
RCS file: /cvs/src/src/gdb/target.h,v
retrieving revision 1.57
diff -u -p -r1.57 target.h
--- target.h 4 Feb 2004 21:49:55 -0000 1.57
+++ target.h 18 Mar 2004 23:14:01 -0000
@@ -369,6 +369,7 @@ struct target_ops
int (*to_can_run) (void);
void (*to_notice_signals) (ptid_t ptid);
int (*to_thread_alive) (ptid_t ptid);
+ int (*to_thread_compare) (ptid_t ptid1, ptid_t ptid2);
void (*to_find_new_threads) (void);
char *(*to_pid_to_str) (ptid_t);
char *(*to_extra_thread_info) (struct thread_info *);
Index: target.c
===================================================================
RCS file: /cvs/src/src/gdb/target.c,v
retrieving revision 1.71
diff -u -p -r1.71 target.c
--- target.c 9 Mar 2004 16:16:52 -0000 1.71
+++ target.c 18 Mar 2004 23:14:01 -0000
@@ -161,6 +161,8 @@ static void debug_to_notice_signals (pti
static int debug_to_thread_alive (ptid_t);
+static int debug_to_thread_compare (ptid_t, ptid_t);
+
static void debug_to_stop (void);
/* Pointer to array of target architecture structures; the size of the
@@ -424,6 +426,7 @@ update_current_target (void)
INHERIT (to_can_run, t);
INHERIT (to_notice_signals, t);
INHERIT (to_thread_alive, t);
+ INHERIT (to_thread_compare, t);
INHERIT (to_find_new_threads, t);
INHERIT (to_pid_to_str, t);
INHERIT (to_extra_thread_info, t);
@@ -601,6 +604,9 @@ update_current_target (void)
de_fault (to_thread_alive,
(int (*) (ptid_t))
return_zero);
+ de_fault (to_thread_compare,
+ (int (*) (ptid_t, ptid_t))
+ return_zero);
de_fault (to_find_new_threads,
(void (*) (void))
target_ignore);
@@ -2263,6 +2269,19 @@ debug_to_thread_alive (ptid_t ptid)
return retval;
}
+static int
+debug_to_thread_compare (ptid_t ptid1, ptid_t ptid2)
+{
+ int retval;
+
+ retval = debug_target.to_thread_compare (ptid1, ptid2);
+
+ fprintf_unfiltered (gdb_stdlog, "target_thread_compare (%ld, %ld) = %d\n",
+ TIDGET (ptid1), TIDGET (ptid2), retval);
+
+ return retval;
+}
+
static void
debug_to_find_new_threads (void)
{
@@ -2393,6 +2412,7 @@ setup_target_debug (void)
current_target.to_can_run = debug_to_can_run;
current_target.to_notice_signals = debug_to_notice_signals;
current_target.to_thread_alive = debug_to_thread_alive;
+ current_target.to_thread_compare = debug_to_thread_compare;
current_target.to_find_new_threads = debug_to_find_new_threads;
current_target.to_stop = debug_to_stop;
current_target.to_xfer_partial = debug_to_xfer_partial;
Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.52
diff -u -p -r1.52 lin-lwp.c
--- lin-lwp.c 8 Oct 2003 20:05:56 -0000 1.52
+++ lin-lwp.c 18 Mar 2004 23:14:01 -0000
@@ -1728,6 +1728,13 @@ lin_lwp_thread_alive (ptid_t ptid)
return 1;
}
+/* Compare thread lwps and return 0 if equal, non-zero if not equal. */
+static int
+lin_lwp_thread_compare (ptid_t old, ptid_t new)
+{
+ return (GET_LWP (old) != GET_LWP (new));
+}
+
static char *
lin_lwp_pid_to_str (ptid_t ptid)
{
@@ -1764,6 +1771,7 @@ init_lin_lwp_ops (void)
lin_lwp_ops.to_create_inferior = lin_lwp_create_inferior;
lin_lwp_ops.to_mourn_inferior = lin_lwp_mourn_inferior;
lin_lwp_ops.to_thread_alive = lin_lwp_thread_alive;
+ lin_lwp_ops.to_thread_compare = lin_lwp_thread_compare;
lin_lwp_ops.to_pid_to_str = lin_lwp_pid_to_str;
lin_lwp_ops.to_post_startup_inferior = child_post_startup_inferior;
lin_lwp_ops.to_post_attach = child_post_attach;
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [RFC]: fix for recycled thread ids 2004-03-19 0:36 [RFC]: fix for recycled thread ids Jeff Johnston @ 2004-03-19 1:53 ` Daniel Jacobowitz 2004-03-19 18:44 ` Jeff Johnston 2004-03-19 20:33 ` [RFC]: fix for recycled thread ids Andrew Cagney 0 siblings, 2 replies; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-19 1:53 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote: > The following patch fixes a problem when a user application creates a > thread shortly after another thread has completed. For nptl, thread ids > are addresses. If a thread completes/dies, the tid is available for reuse > by a new thread. Does NPTL re-use the TID quickly, or cycle around the way LT did so that we only see this under high thread pressure? > On RH9 and RHEL3, nptl threads do not have exit events associated with > them. I have already discussed this with Daniel J. who feels that the > kernels are not doing the right thing, but regardless, the current and > previous RH nptl kernels are behaving this way and gdb needs to handle it. > As such, when a new thread is created, if it is reusing the tid of a > previous thread that gdb hasn't figured out isn't around any more, gdb > ignores the create event and the new thread is not added. Ignoring the > event is done because it is possible for gdb to find out about the thread > before it's creation event is reported and so the create event can be > redundant information. What I haven't seen a good explanation of is what problem this causes. If a thread goes away, and then a new thread using the same ID is created, and then we stop, what do we lose besides the cosmetic fact that there is no [New Thread] message? Does anything go wrong? Also, I would like the issue of whether or not it is a kernel bug resolved before we discuss working around it in GDB. > The following fix removes the problem by also checking if the original lwp > given to the old thread in the list matches the lwp for the new thread. > IIRC, there is at least one platform that allows threads to change their > lwps dynamically. I do not believe that platform uses the thread-db layer, > but to be safe, I did not use a direct compare between the original lwp and > the new lwp. Instead, I added a target vector routine that by default > never returns an unequal comparison. For linux, the comparison routine > points to a new routine in lin-lwp.c which does the expected comparison. > The comparison is based on C compare routines whereby 0 means equal and > non-zero means unequal. At least one such "platform" was NGPT, which runs on Linux (and worked OK with thread-db.c, too). It's obsolete nowadays. > Now, if a thread id for a new thread is found to already be on the list, > the target comparison is made between the original lwp stored for the old > thread and the lwp for the new thread. If the comparison is unequal, the > old thread is deleted and then the new thread is added. Otherwise, the new > thread event is ignored as before. If we aren't going to support migration, then why not just store the LWP ID in the thread PTID structure? That's why it has three fields, I'd guess. I don't think that the target hook makes much sense, since thread-db.c is basically Linux-specific at this point and since the contents being passed to the hook are thread-db-specific. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 1:53 ` Daniel Jacobowitz @ 2004-03-19 18:44 ` Jeff Johnston 2004-03-19 19:01 ` Daniel Jacobowitz 2004-03-19 20:33 ` [RFC]: fix for recycled thread ids Andrew Cagney 1 sibling, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-19 18:44 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches [-- Attachment #1: Type: text/plain, Size: 4389 bytes --] Daniel Jacobowitz wrote: > On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote: > >>The following patch fixes a problem when a user application creates a >>thread shortly after another thread has completed. For nptl, thread ids >>are addresses. If a thread completes/dies, the tid is available for reuse >> by a new thread. > > > Does NPTL re-use the TID quickly, or cycle around the way LT did so > that we only see this under high thread pressure? > I can't say for sure as I don't maintain libthread_db. The test case in question does create high thread pressure, but I think it would be a mistake to generalize and think that this couldn't happen in an existing application. > >>On RH9 and RHEL3, nptl threads do not have exit events associated with >>them. I have already discussed this with Daniel J. who feels that the >>kernels are not doing the right thing, but regardless, the current and >>previous RH nptl kernels are behaving this way and gdb needs to handle it. >>As such, when a new thread is created, if it is reusing the tid of a >>previous thread that gdb hasn't figured out isn't around any more, gdb >>ignores the create event and the new thread is not added. Ignoring the >>event is done because it is possible for gdb to find out about the thread >>before it's creation event is reported and so the create event can be >>redundant information. > > > What I haven't seen a good explanation of is what problem this causes. > If a thread goes away, and then a new thread using the same ID is > created, and then we stop, what do we lose besides the cosmetic fact > that there is no [New Thread] message? Does anything go wrong? > > Also, I would like the issue of whether or not it is a kernel bug > resolved before we discuss working around it in GDB. > The problem is if a global signal is passed on to the inferior program when there are threads we have not attached to, the process terminates. A Ctrl-C is such a signal. In the example program, we only attach to the first 100 threads and when the Ctrl-C is issued, we get: ptrace: No such process. thread_db_get_info: cannot get thread info: generic error The end-user is cooked. Regarding resolving this issue as a kernel error, any fix for RHEL3 won't get shipped until Update 3. I know of no scheduled update for RH9 and this would not qualify as a security update. If I read your subsequent comments right, you wouldn't have a problem with me just making the lwp compare directly in the thread-db.c code. This would mean the change would be all of 6 lines of code. Does it make a lot of sense to withhold this change in light of the time-frame a kernel fix could be issued and the simplicity of the change itself? > >>The following fix removes the problem by also checking if the original lwp >>given to the old thread in the list matches the lwp for the new thread. >>IIRC, there is at least one platform that allows threads to change their >>lwps dynamically. I do not believe that platform uses the thread-db layer, >>but to be safe, I did not use a direct compare between the original lwp and >>the new lwp. Instead, I added a target vector routine that by default >>never returns an unequal comparison. For linux, the comparison routine >>points to a new routine in lin-lwp.c which does the expected comparison. >>The comparison is based on C compare routines whereby 0 means equal and >>non-zero means unequal. > > > At least one such "platform" was NGPT, which runs on Linux (and worked > OK with thread-db.c, too). It's obsolete nowadays. > > >>Now, if a thread id for a new thread is found to already be on the list, >>the target comparison is made between the original lwp stored for the old >>thread and the lwp for the new thread. If the comparison is unequal, the >>old thread is deleted and then the new thread is added. Otherwise, the new >>thread event is ignored as before. > > > If we aren't going to support migration, then why not just store the > LWP ID in the thread PTID structure? That's why it has three fields, > I'd guess. > > I don't think that the target hook makes much sense, since thread-db.c > is basically Linux-specific at this point and since the contents being > passed to the hook are thread-db-specific. > Would it make sense to rename thread-db.c to lin-thread-db.c? -- Jeff J. [-- Attachment #2: lphello.c --] [-- Type: text/plain, Size: 2973 bytes --] #include <signal.h> #include <pthread.h> #include <unistd.h> #include <string.h> #include <netdb.h> #include <stdio.h> #include <stdlib.h> /** * Compile with: * gcc -g -Wall -lpthread -o lphello lphello.c * * Author Magnus Ihse, ihse@bea.com */ #define THREAD_COUNT 100 #define ITER_COUNT 500000 static volatile int finishedArray[THREAD_COUNT]; static int pKey; void* setup_altstack(void) { stack_t ss; ss.ss_sp = malloc(20*1024); if (ss.ss_sp == 0) { return NULL; } ss.ss_size = 20*1024; ss.ss_flags = 0; if (sigaltstack(&ss, NULL) == -1) { //perror("sigaltstack"); return NULL; } return ss.ss_sp; } void takedown_altstack(void* stack) { struct sigaltstack ss; int result; if (stack == NULL) { return; } ss.ss_flags = SS_DISABLE; ss.ss_sp = (void*)47; // This value should be ignored when ss_flags is SS_DISABLE ss.ss_size = 29; // This value should be ignored when ss_flags is SS_DISABLE { result = sigaltstack(&ss, NULL); free(stack); } } void *threadfunc(void *arg) { int mypos = (int)(size_t)arg; int i; long square = 1; void* altstack = setup_altstack(); pthread_setspecific(pKey, arg); for (i=0; i < 1000; i++) { square = i*i + square*mypos; } finishedArray[mypos] = 1; takedown_altstack(altstack); return NULL; } int main(int argc, char ** argv) { pthread_t threads[THREAD_COUNT]; pthread_attr_t attr; int result; int i; int iteration; int finished; pthread_attr_init(&attr); pthread_attr_setstacksize(&attr, 128*1024); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); pthread_key_create(&pKey, NULL); for (iteration = 0; iteration < ITER_COUNT; iteration++) { if ((iteration % 100) == 0) { printf("\nStarting run series %i: ", iteration); } if ((iteration % 10) == 0) { printf("."); fflush(stdout); } // Clear array for (i = 0; i< THREAD_COUNT; i++) { finishedArray[i] = 0; } // Start threads for (i = 0; i< THREAD_COUNT; i++) { result = pthread_create(&threads[i], &attr, threadfunc, (void*)(size_t)i); if (result != 0) { perror("pthread_create"); exit(1); } } // Join threads for (i = 0; i< THREAD_COUNT; i++) { result = pthread_join(threads[i], NULL); if (result != 0) { perror("pthread_join"); exit(EXIT_FAILURE); } } // Spin waiting for results finished = 1; do { struct timespec req, rem; req.tv_sec = 0; req.tv_nsec = 10 * 1000 * 1000; nanosleep(&req, &rem); for (i = 0; i< THREAD_COUNT; i++) { if (finishedArray[i] != 1) { finished = 0; break; } } } while (!finished); } return EXIT_SUCCESS; } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 18:44 ` Jeff Johnston @ 2004-03-19 19:01 ` Daniel Jacobowitz 2004-03-19 19:35 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-19 19:01 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote: > Daniel Jacobowitz wrote: > >On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote: > > > >>The following patch fixes a problem when a user application creates a > >>thread shortly after another thread has completed. For nptl, thread ids > >>are addresses. If a thread completes/dies, the tid is available for reuse > >>by a new thread. > > > > > >Does NPTL re-use the TID quickly, or cycle around the way LT did so > >that we only see this under high thread pressure? > > > > I can't say for sure as I don't maintain libthread_db. The test case in > question does create high thread pressure, but I think it would be a > mistake to generalize and think that this couldn't happen in an existing > application. I know you don't maintain NPTL, but this is the sort of question that we need to understand before we can fix the problem correctly. I see that you've attached a testcase, so I'll take a look at it when I get back from my trip on Monday. > >>On RH9 and RHEL3, nptl threads do not have exit events associated with > >>them. I have already discussed this with Daniel J. who feels that the > >>kernels are not doing the right thing, but regardless, the current and > >>previous RH nptl kernels are behaving this way and gdb needs to handle > >>it. As such, when a new thread is created, if it is reusing the tid of a > >>previous thread that gdb hasn't figured out isn't around any more, gdb > >>ignores the create event and the new thread is not added. Ignoring the > >>event is done because it is possible for gdb to find out about the thread > >>before it's creation event is reported and so the create event can be > >>redundant information. > > > > > >What I haven't seen a good explanation of is what problem this causes. > >If a thread goes away, and then a new thread using the same ID is > >created, and then we stop, what do we lose besides the cosmetic fact > >that there is no [New Thread] message? Does anything go wrong? > > > >Also, I would like the issue of whether or not it is a kernel bug > >resolved before we discuss working around it in GDB. > > > > The problem is if a global signal is passed on to the inferior program when > there are threads we have not attached to, the process terminates. A > Ctrl-C is such a signal. In the example program, we only attach to the > first 100 threads and when the Ctrl-C is issued, we get: > > ptrace: No such process. > thread_db_get_info: cannot get thread info: generic error > > The end-user is cooked. OK. So what you're saying is, the problem is that we do not see that the new thread has been created, so we do not attach to it. Is that right? Conceptually, we attach to LWPs, not to threads. That suggests to me that the correct fix is to ask the LWP layer if the LWP is attached rather than looking it up in the thread list in the first place. We've already got an appropriate list of LWPs though we might need a new accessor. > Regarding resolving this issue as a kernel error, any fix for RHEL3 won't > get shipped until Update 3. I know of no scheduled update for RH9 and this > would not qualify as a security update. That's not what I said - I don't care whether an update is published for any particular vendor's product. I want us to understand whether we are working around a kernel bug or fixing an actual bug in GDB. That's another part of the problem that we need to understand in order to fix it correctly. As the author of the kernel code in question, I think that it's a kernel bug. Roland seemed to agree. > Would it make sense to rename thread-db.c to lin-thread-db.c? Probably not, but some explanatory comments may be in order. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 19:01 ` Daniel Jacobowitz @ 2004-03-19 19:35 ` Jeff Johnston 2004-03-19 19:40 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-19 19:35 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote: > >>Daniel Jacobowitz wrote: >> >>>On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote: >>> >>> >>>>The following patch fixes a problem when a user application creates a >>>>thread shortly after another thread has completed. For nptl, thread ids >>>>are addresses. If a thread completes/dies, the tid is available for reuse >>>>by a new thread. >>> >>> >>>Does NPTL re-use the TID quickly, or cycle around the way LT did so >>>that we only see this under high thread pressure? >>> >> >>I can't say for sure as I don't maintain libthread_db. The test case in >>question does create high thread pressure, but I think it would be a >>mistake to generalize and think that this couldn't happen in an existing >>application. > > > I know you don't maintain NPTL, but this is the sort of question that > we need to understand before we can fix the problem correctly. I see > that you've attached a testcase, so I'll take a look at it when I get > back from my trip on Monday. > Ok, thanks. > >>>>On RH9 and RHEL3, nptl threads do not have exit events associated with >>>>them. I have already discussed this with Daniel J. who feels that the >>>>kernels are not doing the right thing, but regardless, the current and >>>>previous RH nptl kernels are behaving this way and gdb needs to handle >>>>it. As such, when a new thread is created, if it is reusing the tid of a >>>>previous thread that gdb hasn't figured out isn't around any more, gdb >>>>ignores the create event and the new thread is not added. Ignoring the >>>>event is done because it is possible for gdb to find out about the thread >>>>before it's creation event is reported and so the create event can be >>>>redundant information. >>> >>> >>>What I haven't seen a good explanation of is what problem this causes. >>>If a thread goes away, and then a new thread using the same ID is >>>created, and then we stop, what do we lose besides the cosmetic fact >>>that there is no [New Thread] message? Does anything go wrong? >>> >>>Also, I would like the issue of whether or not it is a kernel bug >>>resolved before we discuss working around it in GDB. >>> >> >>The problem is if a global signal is passed on to the inferior program when >>there are threads we have not attached to, the process terminates. A >>Ctrl-C is such a signal. In the example program, we only attach to the >>first 100 threads and when the Ctrl-C is issued, we get: >> >>ptrace: No such process. >>thread_db_get_info: cannot get thread info: generic error >> >>The end-user is cooked. > > > OK. So what you're saying is, the problem is that we do not see that > the new thread has been created, so we do not attach to it. Is that > right? > Yes. > Conceptually, we attach to LWPs, not to threads. That suggests to me > that the correct fix is to ask the LWP layer if the LWP is attached > rather than looking it up in the thread list in the first place. > We've already got an appropriate list of LWPs though we might need a > new accessor. > I like that idea. We still have to deal with the bogus thread list entry. The routine prune_threads calls thread_db_alive and it won't realize the thread info it has is bogus because it will find the tid is valid. > >>Regarding resolving this issue as a kernel error, any fix for RHEL3 won't >>get shipped until Update 3. I know of no scheduled update for RH9 and this >>would not qualify as a security update. > > > That's not what I said - I don't care whether an update is published > for any particular vendor's product. I want us to understand whether > we are working around a kernel bug or fixing an actual bug in GDB. > That's another part of the problem that we need to understand in order > to fix it correctly. > > As the author of the kernel code in question, I think that it's a > kernel bug. Roland seemed to agree. > > >>Would it make sense to rename thread-db.c to lin-thread-db.c? > > > Probably not, but some explanatory comments may be in order. > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 19:35 ` Jeff Johnston @ 2004-03-19 19:40 ` Daniel Jacobowitz 2004-03-19 21:32 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-19 19:40 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Fri, Mar 19, 2004 at 02:35:40PM -0500, Jeff Johnston wrote: > >Conceptually, we attach to LWPs, not to threads. That suggests to me > >that the correct fix is to ask the LWP layer if the LWP is attached > >rather than looking it up in the thread list in the first place. > >We've already got an appropriate list of LWPs though we might need a > >new accessor. > > > > I like that idea. We still have to deal with the bogus thread list entry. > The routine prune_threads calls thread_db_alive and it won't realize the > thread info it has is bogus because it will find the tid is valid. Do you think this will be a problem? My hope is that it will just look as if the thread has 'migrated' to a new LWP. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 19:40 ` Daniel Jacobowitz @ 2004-03-19 21:32 ` Jeff Johnston 2004-03-24 15:51 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-19 21:32 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Fri, Mar 19, 2004 at 02:35:40PM -0500, Jeff Johnston wrote: > >>>Conceptually, we attach to LWPs, not to threads. That suggests to me >>>that the correct fix is to ask the LWP layer if the LWP is attached >>>rather than looking it up in the thread list in the first place. >>>We've already got an appropriate list of LWPs though we might need a >>>new accessor. >>> >> >>I like that idea. We still have to deal with the bogus thread list entry. >>The routine prune_threads calls thread_db_alive and it won't realize the >>thread info it has is bogus because it will find the tid is valid. > > > Do you think this will be a problem? My hope is that it will just look > as if the thread has 'migrated' to a new LWP. > It will have invalid state associated with it. For example, the thread info has a prev_pc field. As to all the havoc that the state may or may not cause, I think it would be a very good idea to clean it up now. Who's to say what state will be added to thread_info in the future. -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 21:32 ` Jeff Johnston @ 2004-03-24 15:51 ` Daniel Jacobowitz 2004-03-24 16:56 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-24 15:51 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Fri, Mar 19, 2004 at 04:32:36PM -0500, Jeff Johnston wrote: > > > Daniel Jacobowitz wrote: > >On Fri, Mar 19, 2004 at 02:35:40PM -0500, Jeff Johnston wrote: > > > >>>Conceptually, we attach to LWPs, not to threads. That suggests to me > >>>that the correct fix is to ask the LWP layer if the LWP is attached > >>>rather than looking it up in the thread list in the first place. > >>>We've already got an appropriate list of LWPs though we might need a > >>>new accessor. > >>> > >> > >>I like that idea. We still have to deal with the bogus thread list > >>entry. The routine prune_threads calls thread_db_alive and it won't > >>realize the thread info it has is bogus because it will find the tid is > >>valid. > > > > > >Do you think this will be a problem? My hope is that it will just look > >as if the thread has 'migrated' to a new LWP. > > > > It will have invalid state associated with it. For example, the thread > info has a prev_pc field. As to all the havoc that the state may or may > not cause, I think it would be a very good idea to clean it up now. Who's > to say what state will be added to thread_info in the future. It turns out that this is not only a problem, but the whole problem. Could you run this test under GDB, on RHEL, using strace? Tell me whether or not you see WIFEXITED results for every thread as it exits. I was assuming you did not, but I can reproduce the misbehavior here even though I do get them. The problem is that we get the LWP death events, but we treat the threads and LWPs as completely independent sets. We never find out that the threads have died. We don't enable thread death event reporting, because in glibc 2.1.3 there was a bug in the death reporting which would cause the debugged program to segfault: #if 0 /* FIXME: kettenis/2000-04-23: The event reporting facility is broken for TD_DEATH events in glibc 2.1.3, so don't enable it for now. */ td_event_addset (&events, TD_DEATH); #endif Fortunately, in <gnu/libc-version.h> there is a function to return the runtime version of glibc. We should be able to use that - and the not 100% valid, but generally valid and already assumed by thread_db, assumption that a native GDB, when used to debug native programs, is debugging the same version of the C library - to enable TD_DEATH when it is safe to do so. This will let us detach the threads when they die. That has its own risks, since the thread continues to run for a short while after the death event is reported. For instance, in NPTL the thread reports the event and then cleans up after itself; in LT I don't remember whether the manager or the thread does this, but I think it's the same. I already wrote limited code to handle this, if you search for "zombie" in thread-db.c, so it should be OK. The gist is that we remove it from the thread list right away, but do not detach the thread. We resist attaching to zombies. At least, all that is how it looks to me. I'll experiment with TD_DEATH before I speculate further. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-24 15:51 ` Daniel Jacobowitz @ 2004-03-24 16:56 ` Daniel Jacobowitz 2004-03-24 21:56 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-24 16:56 UTC (permalink / raw) To: Jeff Johnston, gdb-patches On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote: > Fortunately, in <gnu/libc-version.h> there is a function to return the > runtime version of glibc. We should be able to use that - and the not > 100% valid, but generally valid and already assumed by thread_db, > assumption that a native GDB, when used to debug native programs, is > debugging the same version of the C library - to enable TD_DEATH when > it is safe to do so. This will let us detach the threads when they > die. > > That has its own risks, since the thread continues to run for a short > while after the death event is reported. For instance, in NPTL the > thread reports the event and then cleans up after itself; in LT I don't > remember whether the manager or the thread does this, but I think it's > the same. I already wrote limited code to handle this, if you search > for "zombie" in thread-db.c, so it should be OK. The gist is that we > remove it from the thread list right away, but do not detach the > thread. We resist attaching to zombies. > > At least, all that is how it looks to me. I'll experiment with > TD_DEATH before I speculate further. Here's a proof-of-concept patch. It is not quite right, in that now I can hit C-c about four times before I get the error(). It does illustrate what I was suggesting, though, minus the version checking. The reason we still get the segfaults is not that surprising. I spent a little while coaxing it into producing a core dump, and got this: (gdb) i threads 7 process 10303 0x40034531 in __nptl_create_event () from /lib/tls/i686/cmov/libpthread.so.0 6 process 12467 0xffffe410 in __kernel_vsyscall () 5 process 12468 0xffffe410 in __kernel_vsyscall () 4 process 12469 0x0804884e in threadfunc (arg=0x25) at lphello.c:64 3 process 12470 0x40034540 in __nptl_death_event () from /lib/tls/i686/cmov/libpthread.so.0 2 process 12471 0xffffe410 in __kernel_vsyscall () * 1 process 12472 0xffffe410 in __kernel_vsyscall () So one thread is running, four are probably sleeping, one is dying... and one is creating another thread. I'd bet money that we aren't attached to the one being created yet. I can produce all sorts of other interesting GDB internal errors this way, too. Does this patch fix the problems you were seeing with less pathological test cases? Eclipse, in this case, is a less pathological test case, since it creates threads that actually do a bit of work. If it does, I'll add the necessary glibc version checking. The only way to fix the race on the create side is to add PTRACE_EVENT_CLONE support (this _is_ what it was for), but that will require some major surgery. Index: thread-db.c =================================================================== RCS file: /cvs/src/src/gdb/thread-db.c,v retrieving revision 1.37 diff -u -p -r1.37 thread-db.c --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37 +++ thread-db.c 24 Mar 2004 16:47:23 -0000 @@ -1,6 +1,6 @@ /* libthread_db assisted debugging support, generic parts. - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc. + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr; static void thread_db_find_new_threads (void); static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, const td_thrinfo_t *ti_p, int verbose); +static void detach_thread (ptid_t ptid, int verbose); \f /* Building process ids. */ @@ -154,6 +155,8 @@ struct private_thread_info unsigned int th_valid:1; unsigned int ti_valid:1; + unsigned int dying:1; + td_thrhandle_t th; td_thrinfo_t ti; }; @@ -501,7 +504,7 @@ enable_thread_event_reporting (void) /* Set the process wide mask saying which events we're interested in. */ td_event_emptyset (&events); td_event_addset (&events, TD_CREATE); -#if 0 +#if 1 /* FIXME: kettenis/2000-04-23: The event reporting facility is broken for TD_DEATH events in glibc 2.1.3, so don't enable it for now. */ @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr struct thread_info *tp; td_err_e err; + /* We may already know about this thread, for instance when the + user has issued the `info threads' command before the SIGTRAP + for hitting the thread creation breakpoint was reported. */ + if (in_thread_list (ptid)) + { + tp = find_thread_pid (ptid); + gdb_assert (tp != NULL); + + if (!tp->private->dying) + return; + + delete_thread (ptid); + } + check_thread_signals (); /* Add the thread to GDB's thread list. */ @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t static void detach_thread (ptid_t ptid, int verbose) { + struct thread_info *thread_info; + if (verbose) printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid)); + + /* Don't delete the thread now, because it still reports as active until + it has executed a few instructions after the event breakpoint. */ + thread_info = find_thread_pid (ptid); + gdb_assert (thread_info != NULL); + thread_info->private->dying = 1; } static void @@ -848,11 +873,7 @@ check_event (ptid_t ptid) { case TD_CREATE: - /* We may already know about this thread, for instance when the - user has issued the `info threads' command before the SIGTRAP - for hitting the thread creation breakpoint was reported. */ - if (!in_thread_list (ptid)) - attach_thread (ptid, msg.th_p, &ti, 1); + attach_thread (ptid, msg.th_p, &ti, 1); break; @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid)); - if (!in_thread_list (ptid)) - attach_thread (ptid, th_p, &ti, 1); + attach_thread (ptid, th_p, &ti, 1); return 0; } -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-24 16:56 ` Daniel Jacobowitz @ 2004-03-24 21:56 ` Jeff Johnston 2004-03-25 0:46 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-24 21:56 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote: > >>Fortunately, in <gnu/libc-version.h> there is a function to return the >>runtime version of glibc. We should be able to use that - and the not >>100% valid, but generally valid and already assumed by thread_db, >>assumption that a native GDB, when used to debug native programs, is >>debugging the same version of the C library - to enable TD_DEATH when >>it is safe to do so. This will let us detach the threads when they >>die. >> >>That has its own risks, since the thread continues to run for a short >>while after the death event is reported. For instance, in NPTL the >>thread reports the event and then cleans up after itself; in LT I don't >>remember whether the manager or the thread does this, but I think it's >>the same. I already wrote limited code to handle this, if you search >>for "zombie" in thread-db.c, so it should be OK. The gist is that we >>remove it from the thread list right away, but do not detach the >>thread. We resist attaching to zombies. >> >>At least, all that is how it looks to me. I'll experiment with >>TD_DEATH before I speculate further. > > > Here's a proof-of-concept patch. It is not quite right, in that now I > can hit C-c about four times before I get the error(). It does > illustrate what I was suggesting, though, minus the version checking. > > The reason we still get the segfaults is not that surprising. I spent > a little while coaxing it into producing a core dump, and got this: > > (gdb) i threads > 7 process 10303 0x40034531 in __nptl_create_event () from /lib/tls/i686/cmov/libpthread.so.0 > 6 process 12467 0xffffe410 in __kernel_vsyscall () > 5 process 12468 0xffffe410 in __kernel_vsyscall () > 4 process 12469 0x0804884e in threadfunc (arg=0x25) at lphello.c:64 > 3 process 12470 0x40034540 in __nptl_death_event () from /lib/tls/i686/cmov/libpthread.so.0 > 2 process 12471 0xffffe410 in __kernel_vsyscall () > * 1 process 12472 0xffffe410 in __kernel_vsyscall () > > So one thread is running, four are probably sleeping, one is dying... > and one is creating another thread. I'd bet money that we aren't > attached to the one being created yet. I can produce all sorts of > other interesting GDB internal errors this way, too. > I also tried the TD_DEATH method but gave up after running into segfaults. I have to go back and recheck whether those segfaults could occur without the Crtl-C being issued. I can't remember off-hand. > Does this patch fix the problems you were seeing with less pathological > test cases? Eclipse, in this case, is a less pathological test case, > since it creates threads that actually do a bit of work. If it does, > I'll add the necessary glibc version checking. The only way to fix the > race on the create side is to add PTRACE_EVENT_CLONE support (this _is_ > what it was for), but that will require some major surgery. > As you know, I am also looking at the PTRACE_EVENT_CLONE stuff. I'll try this patch out with the Eclipse test. I also will include your lin-lwp linux event handler fix to see if that gets past the fork problem they were seeing. > Index: thread-db.c > =================================================================== > RCS file: /cvs/src/src/gdb/thread-db.c,v > retrieving revision 1.37 > diff -u -p -r1.37 thread-db.c > --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37 > +++ thread-db.c 24 Mar 2004 16:47:23 -0000 > @@ -1,6 +1,6 @@ > /* libthread_db assisted debugging support, generic parts. > > - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc. > + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. > > This file is part of GDB. > > @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr; > static void thread_db_find_new_threads (void); > static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, > const td_thrinfo_t *ti_p, int verbose); > +static void detach_thread (ptid_t ptid, int verbose); > \f > > /* Building process ids. */ > @@ -154,6 +155,8 @@ struct private_thread_info > unsigned int th_valid:1; > unsigned int ti_valid:1; > > + unsigned int dying:1; > + > td_thrhandle_t th; > td_thrinfo_t ti; > }; > @@ -501,7 +504,7 @@ enable_thread_event_reporting (void) > /* Set the process wide mask saying which events we're interested in. */ > td_event_emptyset (&events); > td_event_addset (&events, TD_CREATE); > -#if 0 > +#if 1 > /* FIXME: kettenis/2000-04-23: The event reporting facility is > broken for TD_DEATH events in glibc 2.1.3, so don't enable it for > now. */ > @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr > struct thread_info *tp; > td_err_e err; > > + /* We may already know about this thread, for instance when the > + user has issued the `info threads' command before the SIGTRAP > + for hitting the thread creation breakpoint was reported. */ > + if (in_thread_list (ptid)) > + { > + tp = find_thread_pid (ptid); > + gdb_assert (tp != NULL); > + > + if (!tp->private->dying) > + return; > + > + delete_thread (ptid); > + } > + > check_thread_signals (); > > /* Add the thread to GDB's thread list. */ > @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t > static void > detach_thread (ptid_t ptid, int verbose) > { > + struct thread_info *thread_info; > + > if (verbose) > printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid)); > + > + /* Don't delete the thread now, because it still reports as active until > + it has executed a few instructions after the event breakpoint. */ > + thread_info = find_thread_pid (ptid); > + gdb_assert (thread_info != NULL); > + thread_info->private->dying = 1; > } > > static void > @@ -848,11 +873,7 @@ check_event (ptid_t ptid) > { > case TD_CREATE: > > - /* We may already know about this thread, for instance when the > - user has issued the `info threads' command before the SIGTRAP > - for hitting the thread creation breakpoint was reported. */ > - if (!in_thread_list (ptid)) > - attach_thread (ptid, msg.th_p, &ti, 1); > + attach_thread (ptid, msg.th_p, &ti, 1); > > break; > > @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh > > ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid)); > > - if (!in_thread_list (ptid)) > - attach_thread (ptid, th_p, &ti, 1); > + attach_thread (ptid, th_p, &ti, 1); > > return 0; > } > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-24 21:56 ` Jeff Johnston @ 2004-03-25 0:46 ` Jeff Johnston 2004-03-25 4:39 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-25 0:46 UTC (permalink / raw) To: Jeff Johnston; +Cc: Daniel Jacobowitz, gdb-patches Jeff Johnston wrote: > Daniel Jacobowitz wrote: > >> On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote: >> >>> Fortunately, in <gnu/libc-version.h> there is a function to return the >>> runtime version of glibc. We should be able to use that - and the not >>> 100% valid, but generally valid and already assumed by thread_db, >>> assumption that a native GDB, when used to debug native programs, is >>> debugging the same version of the C library - to enable TD_DEATH when >>> it is safe to do so. This will let us detach the threads when they >>> die. >>> >>> That has its own risks, since the thread continues to run for a short >>> while after the death event is reported. For instance, in NPTL the >>> thread reports the event and then cleans up after itself; in LT I don't >>> remember whether the manager or the thread does this, but I think it's >>> the same. I already wrote limited code to handle this, if you search >>> for "zombie" in thread-db.c, so it should be OK. The gist is that we >>> remove it from the thread list right away, but do not detach the >>> thread. We resist attaching to zombies. >>> >>> At least, all that is how it looks to me. I'll experiment with >>> TD_DEATH before I speculate further. >> >> >> >> Here's a proof-of-concept patch. It is not quite right, in that now I >> can hit C-c about four times before I get the error(). It does >> illustrate what I was suggesting, though, minus the version checking. >> >> The reason we still get the segfaults is not that surprising. I spent >> a little while coaxing it into producing a core dump, and got this: >> >> (gdb) i threads >> 7 process 10303 0x40034531 in __nptl_create_event () from >> /lib/tls/i686/cmov/libpthread.so.0 >> 6 process 12467 0xffffe410 in __kernel_vsyscall () >> 5 process 12468 0xffffe410 in __kernel_vsyscall () >> 4 process 12469 0x0804884e in threadfunc (arg=0x25) at lphello.c:64 >> 3 process 12470 0x40034540 in __nptl_death_event () from >> /lib/tls/i686/cmov/libpthread.so.0 >> 2 process 12471 0xffffe410 in __kernel_vsyscall () >> * 1 process 12472 0xffffe410 in __kernel_vsyscall () >> >> So one thread is running, four are probably sleeping, one is dying... >> and one is creating another thread. I'd bet money that we aren't >> attached to the one being created yet. I can produce all sorts of >> other interesting GDB internal errors this way, too. >> > > I also tried the TD_DEATH method but gave up after running into > segfaults. I have to go back and recheck whether those segfaults could > occur without the Crtl-C being issued. I can't remember off-hand. > Just to confirm, I don't see any segfaults just running, but this code is extremely brittle on my RHEL3-U1 system. It is coming back to me why I abandoned TD_DEATH. On the first Ctrl-C I usually get: Program received signal SIGINT, Interrupt. [Switching to Thread -1226028112 (LWP 16580)] 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 (gdb) info threads [New Thread -1226163280 (LWP 16579)] Can't attach LWP 16579: Operation not permitted (gdb) info threads [New Thread -1226298448 (LWP 16578)] Can't attach LWP 16578: Operation not permitted (gdb) info threads 158 Thread -1226298448 (LWP 16578) 0xb75ce6b1 in __nptl_death_event () from /lib/tls/libpthread.so.0 157 Thread -1226163280 (LWP 16579) 0xb75ce6b1 in __nptl_death_event () from /lib/tls/libpthread.so.0 * 156 Thread -1226028112 (LWP 16580) 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 1 Thread -1218598080 (LWP 16423) 0xb75ce6a1 in __nptl_create_event () from /lib/tls/libpthread.so.0 (gdb) c After continuing, the program will go shortly and then die. [Thread -1226163280 (zombie) exited] [New Thread -1226433616 (LWP 16674)] Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): generic error My original fix doesn't break near as easily on RHEL-U1. Hitting enough Ctrl-C's does eventually trigger an error (occassionally even a gdb assert), but this may be part of catching the race condition before we know about the thread. Now, it is more than likely just a bug in the TD_DEATH event processing because we haven't exercised it. Would it be worth looking at implementing your original alternative for my patch? You asked me in a previous note to run with strace on and confirm that I was not getting WIFEXITTED for the dying threads. I confirmed this. I only get one WIFEXITTED and that is for the main thread. I still need to do the Eclipse test. -- Jeff J. >> Does this patch fix the problems you were seeing with less pathological >> test cases? Eclipse, in this case, is a less pathological test case, >> since it creates threads that actually do a bit of work. If it does, >> I'll add the necessary glibc version checking. The only way to fix the >> race on the create side is to add PTRACE_EVENT_CLONE support (this _is_ >> what it was for), but that will require some major surgery. >> > > As you know, I am also looking at the PTRACE_EVENT_CLONE stuff. I'll > try this patch out with the Eclipse test. I also will include your > lin-lwp linux event handler fix to see if that gets past the fork > problem they were seeing. > >> Index: thread-db.c >> =================================================================== >> RCS file: /cvs/src/src/gdb/thread-db.c,v >> retrieving revision 1.37 >> diff -u -p -r1.37 thread-db.c >> --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37 >> +++ thread-db.c 24 Mar 2004 16:47:23 -0000 >> @@ -1,6 +1,6 @@ >> /* libthread_db assisted debugging support, generic parts. >> >> - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc. >> + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. >> >> This file is part of GDB. >> >> @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr; >> static void thread_db_find_new_threads (void); >> static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, >> const td_thrinfo_t *ti_p, int verbose); >> +static void detach_thread (ptid_t ptid, int verbose); >> \f >> >> /* Building process ids. */ >> @@ -154,6 +155,8 @@ struct private_thread_info >> unsigned int th_valid:1; >> unsigned int ti_valid:1; >> >> + unsigned int dying:1; >> + >> td_thrhandle_t th; >> td_thrinfo_t ti; >> }; >> @@ -501,7 +504,7 @@ enable_thread_event_reporting (void) >> /* Set the process wide mask saying which events we're interested >> in. */ >> td_event_emptyset (&events); >> td_event_addset (&events, TD_CREATE); >> -#if 0 >> +#if 1 >> /* FIXME: kettenis/2000-04-23: The event reporting facility is >> broken for TD_DEATH events in glibc 2.1.3, so don't enable it for >> now. */ >> @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr >> struct thread_info *tp; >> td_err_e err; >> >> + /* We may already know about this thread, for instance when the >> + user has issued the `info threads' command before the SIGTRAP >> + for hitting the thread creation breakpoint was reported. */ >> + if (in_thread_list (ptid)) >> + { >> + tp = find_thread_pid (ptid); >> + gdb_assert (tp != NULL); >> + >> + if (!tp->private->dying) >> + return; >> + >> + delete_thread (ptid); >> + } >> + >> check_thread_signals (); >> >> /* Add the thread to GDB's thread list. */ >> @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t >> static void >> detach_thread (ptid_t ptid, int verbose) >> { >> + struct thread_info *thread_info; >> + >> if (verbose) >> printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid)); >> + >> + /* Don't delete the thread now, because it still reports as active >> until >> + it has executed a few instructions after the event breakpoint. */ >> + thread_info = find_thread_pid (ptid); >> + gdb_assert (thread_info != NULL); >> + thread_info->private->dying = 1; >> } >> >> static void >> @@ -848,11 +873,7 @@ check_event (ptid_t ptid) >> { >> case TD_CREATE: >> >> - /* We may already know about this thread, for instance when the >> - user has issued the `info threads' command before the SIGTRAP >> - for hitting the thread creation breakpoint was reported. */ >> - if (!in_thread_list (ptid)) >> - attach_thread (ptid, msg.th_p, &ti, 1); >> + attach_thread (ptid, msg.th_p, &ti, 1); >> >> break; >> >> @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh >> >> ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid)); >> >> - if (!in_thread_list (ptid)) >> - attach_thread (ptid, th_p, &ti, 1); >> + attach_thread (ptid, th_p, &ti, 1); >> >> return 0; >> } >> >> > > ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-25 0:46 ` Jeff Johnston @ 2004-03-25 4:39 ` Daniel Jacobowitz 2004-03-25 16:34 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-25 4:39 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Wed, Mar 24, 2004 at 07:46:39PM -0500, Jeff Johnston wrote: > Jeff Johnston wrote: > >I also tried the TD_DEATH method but gave up after running into > >segfaults. I have to go back and recheck whether those segfaults could > >occur without the Crtl-C being issued. I can't remember off-hand. > > > > Just to confirm, I don't see any segfaults just running, but this code is > extremely brittle on my RHEL3-U1 system. It is coming back to me why I > abandoned TD_DEATH. > > On the first Ctrl-C I usually get: > > Program received signal SIGINT, Interrupt. > [Switching to Thread -1226028112 (LWP 16580)] > 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 > (gdb) info threads > [New Thread -1226163280 (LWP 16579)] > Can't attach LWP 16579: Operation not permitted > (gdb) info threads > [New Thread -1226298448 (LWP 16578)] > Can't attach LWP 16578: Operation not permitted The usual cause of this message is attempting to attach to a thread we're already attached to; on earlier versions of the patch I posted I saw this error plenty often. Notice that both that TID and LWP ID are already in the thread list and trying to die: > (gdb) info threads > 158 Thread -1226298448 (LWP 16578) 0xb75ce6b1 in __nptl_death_event () > from /lib/tls/libpthread.so.0 > 157 Thread -1226163280 (LWP 16579) 0xb75ce6b1 in __nptl_death_event () > from /lib/tls/libpthread.so.0 > * 156 Thread -1226028112 (LWP 16580) 0xb75ebc32 in _dl_sysinfo_int80 () > from /lib/ld-linux.so.2 > 1 Thread -1218598080 (LWP 16423) 0xb75ce6a1 in __nptl_create_event () > from /lib/tls/libpthread.so.0 > (gdb) c > > After continuing, the program will go shortly and then die. > > [Thread -1226163280 (zombie) exited] > [New Thread -1226433616 (LWP 16674)] > Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): > generic error > > My original fix doesn't break near as easily on RHEL-U1. Hitting enough > Ctrl-C's does eventually trigger an error (occassionally even a gdb > assert), but this may be part of catching the race condition before we know > about the thread. > Now, it is more than likely just a bug in the TD_DEATH event processing > because we haven't exercised it. Would it be worth looking at implementing > your original alternative for my patch? I assume you're using the same test you posted, here. Hmm, upon thinking about this problem, it's because you used info threads after stopping and I didn't think to try that. We end up removing and re-adding the thread several times. We'd really like to delay removing the thread until we see a new create message for it, at this point. I see several ways to massage the code for that effect; it's not the clearest, but for testing purposes we can revert the last bit of the patch I sent you. This bit: @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid)); - if (!in_thread_list (ptid)) - attach_thread (ptid, th_p, &ti, 1); + attach_thread (ptid, th_p, &ti, 1); return 0; } After that we hit some other problems with GDB's book-keeping, though I suspect they are related to the create race. It's hard to tell. The process tends to zombie unexpectedly but that's the only error I've seen with the bit above reverted. Do you have any code for PTRACE_EVENT_CLONE yet, or should I put something together in the morning to verify whether that's the problem? > You asked me in a previous note to run with strace on and confirm that I > was not getting WIFEXITTED for the dying threads. I confirmed this. I > only get one WIFEXITTED and that is for the main thread. OK, so we can't count on the thread exits. You might be able to enable PTRACE_EVENT_EXIT to work around this, if RHEL3 has that - but one needs to be careful with that or you can really confuse GDB. Anyway, we don't need that for now. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-25 4:39 ` Daniel Jacobowitz @ 2004-03-25 16:34 ` Daniel Jacobowitz 2004-03-25 20:22 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-25 16:34 UTC (permalink / raw) To: Jeff Johnston, gdb-patches On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote: > Do you have any code for PTRACE_EVENT_CLONE yet, or should I put > something together in the morning to verify whether that's the problem? Here you go. Again, this patch is obviously not ready to go into GDB, but I have not been able to make it misbehave yet. I don't know if all the bits it needs work right in RHEL3, or if my testing was conclusive. The highlights: - Includes most of the previous patch - Uses PTRACE_EVENT_CLONE to attach to new threads - Moves handling of events closer to the waitpid call There are some potential races but I haven't hit any of them in practice. I suspect that with a heavy fork or vfork load (not clone) you could produce interesting failure modes. Give it a try, please. If it works I'll clean it up. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer Index: lin-lwp.c =================================================================== RCS file: /cvs/src/src/gdb/lin-lwp.c,v retrieving revision 1.53 diff -u -p -r1.53 lin-lwp.c --- lin-lwp.c 22 Mar 2004 20:18:33 -0000 1.53 +++ lin-lwp.c 25 Mar 2004 16:31:30 -0000 @@ -183,6 +183,8 @@ add_lwp (ptid_t ptid) memset (lp, 0, sizeof (struct lwp_info)); + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; + lp->ptid = ptid; lp->next = lwp_list; @@ -278,7 +280,7 @@ lin_lwp_open (char *args, int from_tty) void lin_lwp_attach_lwp (ptid_t ptid, int verbose) { - struct lwp_info *lp; + struct lwp_info *lp, *found_lp; gdb_assert (is_lwp (ptid)); @@ -293,13 +295,13 @@ lin_lwp_attach_lwp (ptid_t ptid, int ver if (verbose) printf_filtered ("[New %s]\n", target_pid_to_str (ptid)); - lp = find_lwp_pid (ptid); + found_lp = lp = find_lwp_pid (ptid); if (lp == NULL) lp = add_lwp (ptid); /* We assume that we're already attached to any LWP that has an id equal to the overall process id. */ - if (GET_LWP (ptid) != GET_PID (ptid)) + if (GET_LWP (ptid) != GET_PID (ptid) && found_lp == NULL) { pid_t pid; int status; @@ -658,6 +660,30 @@ wait_lwp (struct lwp_info *lp) gdb_assert (WIFSTOPPED (status)); + /* Handle GNU/Linux's extended waitstatus for trace events. */ + if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) + { + fprintf_unfiltered (gdb_stdlog, + "WL: Handling extended status 0x%06x\n", + status); + linux_handle_extended_wait (GET_LWP (lp->ptid), status, + &lp->waitstatus); + if (lp->waitstatus.kind == TARGET_WAITKIND_SPURIOUS) + { + struct lwp_info *new_lp; + new_lp = add_lwp (BUILD_LWP (lp->waitstatus.value.related_pid, + GET_PID (inferior_ptid))); + new_lp->cloned = 1; + new_lp->stopped = 1; + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; + fprintf_unfiltered (gdb_stdlog, + "WL: Got clone event from LWP %d, resuming\n", + GET_LWP (lp->ptid)); + ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); + return wait_lwp (lp); + } + } + return status; } @@ -1161,7 +1187,14 @@ child_wait (ptid_t ptid, struct target_w /* Handle GNU/Linux's extended waitstatus for trace events. */ if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) - return linux_handle_extended_wait (pid, status, ourstatus); + { + ptid_t wait_ptid; + wait_ptid = linux_handle_extended_wait (pid, status, ourstatus); + if (ourstatus->kind == TARGET_WAITKIND_SPURIOUS) + ptrace (PTRACE_DETACH, ourstatus->value.related_pid, + 0, 0); + return wait_ptid; + } store_waitstatus (ourstatus, status); return pid_to_ptid (pid); @@ -1371,6 +1404,31 @@ retry: } } + /* Handle GNU/Linux's extended waitstatus for trace events. */ + if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) + { + fprintf_unfiltered (gdb_stdlog, + "LLW: Handling extended status 0x%06x\n", + status); + linux_handle_extended_wait (GET_LWP (lp->ptid), status, + &lp->waitstatus); + if (lp->waitstatus.kind == TARGET_WAITKIND_SPURIOUS) + { + struct lwp_info *new_lp; + new_lp = add_lwp (BUILD_LWP (lp->waitstatus.value.related_pid, + GET_PID (inferior_ptid))); + new_lp->cloned = 1; + new_lp->stopped = 1; + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; + fprintf_unfiltered (gdb_stdlog, + "LLW: Got clone event from LWP %d, resuming\n", + GET_LWP (lp->ptid)); + ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); + status = 0; + continue; + } + } + /* Check if the thread has exited. */ if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1) { @@ -1588,14 +1646,14 @@ retry: else trap_ptid = null_ptid; - /* Handle GNU/Linux's extended waitstatus for trace events. */ - if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) + if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE) { - linux_handle_extended_wait (GET_LWP (lp->ptid), status, ourstatus); - return trap_ptid; + *ourstatus = lp->waitstatus; + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; } + else + store_waitstatus (ourstatus, status); - store_waitstatus (ourstatus, status); return (threaded ? lp->ptid : pid_to_ptid (GET_LWP (lp->ptid))); } Index: linux-nat.c =================================================================== RCS file: /cvs/src/src/gdb/linux-nat.c,v retrieving revision 1.5 diff -u -p -r1.5 linux-nat.c --- linux-nat.c 17 Aug 2003 20:17:02 -0000 1.5 +++ linux-nat.c 25 Mar 2004 16:31:30 -0000 @@ -224,7 +224,8 @@ linux_enable_event_reporting (ptid_t pti if (! linux_supports_tracefork ()) return; - options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC; + options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC + | PTRACE_O_TRACECLONE; if (linux_supports_tracevforkdone ()) options |= PTRACE_O_TRACEVFORKDONE; @@ -391,11 +392,8 @@ linux_handle_extended_wait (int pid, int { int event = status >> 16; - if (event == PTRACE_EVENT_CLONE) - internal_error (__FILE__, __LINE__, - "unexpected clone event"); - - if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK) + if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK + || event == PTRACE_EVENT_CLONE) { unsigned long new_pid; int ret; @@ -406,12 +404,10 @@ linux_handle_extended_wait (int pid, int if (! pull_pid_from_list (&stopped_pids, new_pid)) { /* The new child has a pending SIGSTOP. We can't affect it until it - hits the SIGSTOP, but we're already attached. - - It won't be a clone (we didn't ask for clones in the event mask) - so we can just call waitpid and wait for the SIGSTOP. */ + hits the SIGSTOP, but we're already attached. */ do { - ret = waitpid (new_pid, &status, 0); + ret = waitpid (new_pid, &status, + (event == PTRACE_EVENT_CLONE) ? __WCLONE : 0); } while (ret == -1 && errno == EINTR); if (ret == -1) perror_with_name ("waiting for new child"); @@ -423,8 +419,13 @@ linux_handle_extended_wait (int pid, int "wait returned unexpected status 0x%x", status); } - ourstatus->kind = (event == PTRACE_EVENT_FORK) - ? TARGET_WAITKIND_FORKED : TARGET_WAITKIND_VFORKED; + if (event == PTRACE_EVENT_FORK) + ourstatus->kind = TARGET_WAITKIND_FORKED; + else if (event == PTRACE_EVENT_VFORK) + ourstatus->kind = TARGET_WAITKIND_VFORKED; + else + ourstatus->kind = TARGET_WAITKIND_SPURIOUS; + ourstatus->value.related_pid = new_pid; return inferior_ptid; } Index: linux-nat.h =================================================================== RCS file: /cvs/src/src/gdb/linux-nat.h,v retrieving revision 1.5 diff -u -p -r1.5 linux-nat.h --- linux-nat.h 14 Sep 2003 02:04:44 -0000 1.5 +++ linux-nat.h 25 Mar 2004 16:31:30 -0000 @@ -1,5 +1,5 @@ /* Native debugging support for GNU/Linux (LWP layer). - Copyright 2000, 2001, 2002, 2003 Free Software Foundation, Inc. + Copyright 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -18,6 +18,8 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ +#include "target.h" + /* Structure describing an LWP. */ struct lwp_info @@ -52,6 +54,10 @@ struct lwp_info /* Non-zero if we were stepping this LWP. */ int step; + /* If WAITSTATUS->KIND != TARGET_WAITKIND_SPURIOUS, the waitstatus + corresponding to STATUS above. */ + struct target_waitstatus waitstatus; + /* Next LWP in list. */ struct lwp_info *next; }; @@ -60,7 +66,6 @@ struct lwp_info system". */ struct mem_attrib; struct target_ops; -struct target_waitstatus; extern int linux_proc_xfer_memory (CORE_ADDR addr, char *myaddr, int len, int write, struct mem_attrib *attrib, Index: thread-db.c =================================================================== RCS file: /cvs/src/src/gdb/thread-db.c,v retrieving revision 1.37 diff -u -p -r1.37 thread-db.c --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37 +++ thread-db.c 25 Mar 2004 16:31:31 -0000 @@ -1,6 +1,6 @@ /* libthread_db assisted debugging support, generic parts. - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc. + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr; static void thread_db_find_new_threads (void); static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, const td_thrinfo_t *ti_p, int verbose); +static void detach_thread (ptid_t ptid, int verbose); \f /* Building process ids. */ @@ -154,6 +155,8 @@ struct private_thread_info unsigned int th_valid:1; unsigned int ti_valid:1; + unsigned int dying:1; + td_thrhandle_t th; td_thrinfo_t ti; }; @@ -501,7 +504,7 @@ enable_thread_event_reporting (void) /* Set the process wide mask saying which events we're interested in. */ td_event_emptyset (&events); td_event_addset (&events, TD_CREATE); -#if 0 +#if 1 /* FIXME: kettenis/2000-04-23: The event reporting facility is broken for TD_DEATH events in glibc 2.1.3, so don't enable it for now. */ @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr struct thread_info *tp; td_err_e err; + /* We may already know about this thread, for instance when the + user has issued the `info threads' command before the SIGTRAP + for hitting the thread creation breakpoint was reported. */ + if (in_thread_list (ptid)) + { + tp = find_thread_pid (ptid); + gdb_assert (tp != NULL); + + if (!tp->private->dying) + return; + + delete_thread (ptid); + } + check_thread_signals (); /* Add the thread to GDB's thread list. */ @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t static void detach_thread (ptid_t ptid, int verbose) { + struct thread_info *thread_info; + if (verbose) printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid)); + + /* Don't delete the thread now, because it still reports as active until + it has executed a few instructions after the event breakpoint. */ + thread_info = find_thread_pid (ptid); + gdb_assert (thread_info != NULL); + thread_info->private->dying = 1; } static void @@ -848,11 +873,7 @@ check_event (ptid_t ptid) { case TD_CREATE: - /* We may already know about this thread, for instance when the - user has issued the `info threads' command before the SIGTRAP - for hitting the thread creation breakpoint was reported. */ - if (!in_thread_list (ptid)) - attach_thread (ptid, msg.th_p, &ti, 1); + attach_thread (ptid, msg.th_p, &ti, 1); break; ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-25 16:34 ` Daniel Jacobowitz @ 2004-03-25 20:22 ` Jeff Johnston 2004-03-26 17:59 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-25 20:22 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote: > >>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put >>something together in the morning to verify whether that's the problem? > > > Here you go. Again, this patch is obviously not ready to go into GDB, > but I have not been able to make it misbehave yet. I don't know if all > the bits it needs work right in RHEL3, or if my testing was conclusive. > > The highlights: > - Includes most of the previous patch > - Uses PTRACE_EVENT_CLONE to attach to new threads > - Moves handling of events closer to the waitpid call > > There are some potential races but I haven't hit any of them in > practice. I suspect that with a heavy fork or vfork load (not clone) > you could produce interesting failure modes. > > Give it a try, please. If it works I'll clean it up. > Works great. I was unable to get it to misbehave on RHEL3. Please go ahead cleaning it up. -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) 2004-03-25 20:22 ` Jeff Johnston @ 2004-03-26 17:59 ` Daniel Jacobowitz 2004-03-26 18:14 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston 2004-03-29 18:07 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz 0 siblings, 2 replies; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-26 17:59 UTC (permalink / raw) To: gdb-patches On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote: > > Daniel Jacobowitz wrote: > >On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote: > > > >>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put > >>something together in the morning to verify whether that's the problem? > > > > > >Here you go. Again, this patch is obviously not ready to go into GDB, > >but I have not been able to make it misbehave yet. I don't know if all > >the bits it needs work right in RHEL3, or if my testing was conclusive. > > > >The highlights: > > - Includes most of the previous patch > > - Uses PTRACE_EVENT_CLONE to attach to new threads > > - Moves handling of events closer to the waitpid call > > > >There are some potential races but I haven't hit any of them in > >practice. I suspect that with a heavy fork or vfork load (not clone) > >you could produce interesting failure modes. > > > >Give it a try, please. If it works I'll clean it up. > > > > Works great. I was unable to get it to misbehave on RHEL3. Please go > ahead cleaning it up. Here's the final patch. The description of the patch is unchanged from the above. The background, for those who did not read the whole discussion: in NPTL, a signal delivered to the process when any thread is not attached is likely to terminate the process. So we need to be careful to: - know when threads have exited, so that we can attach to new threads which reuse the same thread ID - attach to threads as early as possible Both of these could cause problems in LinuxThreads, but they were less drastic. Tested with both LinuxThreads and NPTL. I'll commit this to HEAD on Monday, barring objections. We've missed the boat for GDB 6.1 at this point. The testcase you posted has credits in it, so we can't just grab it for the testsuite. But all we should need to test this is a loop that creates short-lived threads, and then verifying that we can send C-c, get a prompt, issue info threads, and continue a couple of times without seeing anyting matching "error:". Interested in writing that test? :) -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer 2004-03-26 Daniel Jacobowitz <drow@mvista.com> * Makefile.in (linux_nat_h): Update dependencies. * configure.in: Check for <gnu/libc-version.h>. * configure: Regenerate. * config.in: Regenerate. * linux-nat.h: Include "target.h". Add waitstatus field to struct lwp_info. * lin-lwp.c (add_lwp): Initialize waitstatus.kind. (lin_lwp_attach_lwp): Don't attach to LWPs we have already attached to. (lin_lwp_handle_extended): New function. Handle clone events. (wait_lwp): Use lin_lwp_handle_extended. Update comment about thread exit events. (child_wait): Handle clone events. (lin_lwp_wait: Use lin_lwp_handle_extended and handle clone events. * linux-nat.c (linux_enable_event_reporting): Turn on PTRACE_O_TRACECLONE. (linux_handle_extended_wait): Handle clone events. * thread-db.c: Include <gnu/libc-version.h>. (struct private_thread_info): Add dying flag. (enable_thread_event_reporting): Enable TD_DEATH for glibc 2.2 and higher. (attach_thread): Update comments. Handle dying threads. (detach_thread): Set the dying flag. (check_event): Always call attach_thread. Index: Makefile.in =================================================================== RCS file: /cvs/src/src/gdb/Makefile.in,v retrieving revision 1.530 diff -u -p -r1.530 Makefile.in --- Makefile.in 25 Mar 2004 01:27:26 -0000 1.530 +++ Makefile.in 26 Mar 2004 17:10:34 -0000 @@ -699,7 +699,7 @@ kod_h = kod.h language_h = language.h libunwind_frame_h = libunwind-frame.h $(libunwind_h) linespec_h = linespec.h -linux_nat_h = linux-nat.h +linux_nat_h = linux-nat.h $(target_h) m2_lang_h = m2-lang.h m68k_tdep_h = m68k-tdep.h macroexp_h = macroexp.h Index: config.in =================================================================== RCS file: /cvs/src/src/gdb/config.in,v retrieving revision 1.61 diff -u -p -r1.61 config.in --- config.in 20 Jan 2004 09:29:13 -0000 1.61 +++ config.in 26 Mar 2004 17:10:34 -0000 @@ -266,6 +266,9 @@ /* Define if you have the <dirent.h> header file. */ #undef HAVE_DIRENT_H +/* Define if you have the <gnu/libc-version.h> header file. */ +#undef HAVE_GNU_LIBC_VERSION_H + /* Define if you have the <libunwind-ia64.h> header file. */ #undef HAVE_LIBUNWIND_IA64_H Index: configure =================================================================== RCS file: /cvs/src/src/gdb/configure,v retrieving revision 1.148 diff -u -p -r1.148 configure --- configure 26 Feb 2004 00:41:46 -0000 1.148 +++ configure 26 Mar 2004 17:10:35 -0000 @@ -4756,7 +4756,7 @@ else fi done -for ac_hdr in proc_service.h thread_db.h +for ac_hdr in proc_service.h thread_db.h gnu/libc-version.h do ac_safe=`echo "$ac_hdr" | sed 'y%./+-%__p_%'` echo $ac_n "checking for $ac_hdr""... $ac_c" 1>&6 Index: configure.in =================================================================== RCS file: /cvs/src/src/gdb/configure.in,v retrieving revision 1.149 diff -u -p -r1.149 configure.in --- configure.in 26 Feb 2004 00:41:46 -0000 1.149 +++ configure.in 26 Mar 2004 17:10:36 -0000 @@ -342,7 +342,7 @@ AC_CHECK_HEADERS(link.h) AC_CHECK_HEADERS(machine/reg.h) AC_CHECK_HEADERS(nlist.h) AC_CHECK_HEADERS(poll.h sys/poll.h) -AC_CHECK_HEADERS(proc_service.h thread_db.h) +AC_CHECK_HEADERS(proc_service.h thread_db.h gnu/libc-version.h) AC_CHECK_HEADERS(stddef.h) AC_CHECK_HEADERS(stdlib.h) AC_CHECK_HEADERS(stdint.h) Index: lin-lwp.c =================================================================== RCS file: /cvs/src/src/gdb/lin-lwp.c,v retrieving revision 1.53 diff -u -p -r1.53 lin-lwp.c --- lin-lwp.c 22 Mar 2004 20:18:33 -0000 1.53 +++ lin-lwp.c 26 Mar 2004 17:10:36 -0000 @@ -1,5 +1,5 @@ /* Multi-threaded debugging support for GNU/Linux (LWP layer). - Copyright 2000, 2001, 2002, 2003 Free Software Foundation, Inc. + Copyright 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -183,6 +183,8 @@ add_lwp (ptid_t ptid) memset (lp, 0, sizeof (struct lwp_info)); + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; + lp->ptid = ptid; lp->next = lwp_list; @@ -278,7 +280,7 @@ lin_lwp_open (char *args, int from_tty) void lin_lwp_attach_lwp (ptid_t ptid, int verbose) { - struct lwp_info *lp; + struct lwp_info *lp, *found_lp; gdb_assert (is_lwp (ptid)); @@ -293,13 +295,17 @@ lin_lwp_attach_lwp (ptid_t ptid, int ver if (verbose) printf_filtered ("[New %s]\n", target_pid_to_str (ptid)); - lp = find_lwp_pid (ptid); + found_lp = lp = find_lwp_pid (ptid); if (lp == NULL) lp = add_lwp (ptid); - /* We assume that we're already attached to any LWP that has an - id equal to the overall process id. */ - if (GET_LWP (ptid) != GET_PID (ptid)) + /* We assume that we're already attached to any LWP that has an id + equal to the overall process id, and to any LWP that is already + in our list of LWPs. If we're not seeing exit events from threads + and we've had PID wraparound since we last tried to stop all threads, + this assumption might be wrong; fortunately, this is very unlikely + to happen. */ + if (GET_LWP (ptid) != GET_PID (ptid) && found_lp == NULL) { pid_t pid; int status; @@ -590,6 +596,41 @@ kill_lwp (int lwpid, int signo) return kill (lwpid, signo); } +/* Handle a GNU/Linux extended wait response. Most of the work we + just pass off to linux_handle_extended_wait, but if it reports a + clone event we need to add the new LWP to our list (and not report + the trap to higher layers). This function returns non-zero if + the event should be ignored and we should wait again. */ + +static int +lin_lwp_handle_extended (struct lwp_info *lp, int status) +{ + linux_handle_extended_wait (GET_LWP (lp->ptid), status, + &lp->waitstatus); + + /* TARGET_WAITKIND_SPURIOUS is used to indicate clone events. */ + if (lp->waitstatus.kind == TARGET_WAITKIND_SPURIOUS) + { + struct lwp_info *new_lp; + new_lp = add_lwp (BUILD_LWP (lp->waitstatus.value.related_pid, + GET_PID (inferior_ptid))); + new_lp->cloned = 1; + new_lp->stopped = 1; + + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; + + if (debug_lin_lwp) + fprintf_unfiltered (gdb_stdlog, + "LLHE: Got clone event from LWP %ld, resuming\n", + GET_LWP (lp->ptid)); + ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); + + return 1; + } + + return 0; +} + /* Wait for LP to stop. Returns the wait status, or 0 if the LWP has exited. */ @@ -609,9 +650,11 @@ wait_lwp (struct lwp_info *lp) pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE); if (pid == -1 && errno == ECHILD) { - /* The thread has previously exited. We need to delete it now - because in the case of NPTL threads, there won't be an - exit event unless it is the main thread. */ + /* The thread has previously exited. We need to delete it + now because, for some vendor 2.4 kernels with NPTL + support backported, there won't be an exit event unless + it is the main thread. 2.6 kernels will report an exit + event for each thread that exits, as expected. */ thread_dead = 1; if (debug_lin_lwp) fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n", @@ -658,6 +701,17 @@ wait_lwp (struct lwp_info *lp) gdb_assert (WIFSTOPPED (status)); + /* Handle GNU/Linux's extended waitstatus for trace events. */ + if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) + { + if (debug_lin_lwp) + fprintf_unfiltered (gdb_stdlog, + "WL: Handling extended status 0x%06x\n", + status); + if (lin_lwp_handle_extended (lp, status)) + return wait_lwp (lp); + } + return status; } @@ -1097,6 +1151,8 @@ child_wait (ptid_t ptid, struct target_w int status; pid_t pid; + ourstatus->kind = TARGET_WAITKIND_IGNORE; + do { set_sigint_trap (); /* Causes SIGINT to be passed on to the @@ -1143,6 +1199,25 @@ child_wait (ptid_t ptid, struct target_w save_errno = EINTR; } + /* Handle GNU/Linux's extended waitstatus for trace events. */ + if (pid != -1 && WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP + && status >> 16 != 0) + { + linux_handle_extended_wait (pid, status, ourstatus); + + /* If we see a clone event, detach the child, and don't + report the event. It would be nice to offer some way to + switch into a non-thread-db based threaded mode at this + point. */ + if (ourstatus->kind == TARGET_WAITKIND_SPURIOUS) + { + ptrace (PTRACE_DETACH, ourstatus->value.related_pid, 0, 0); + ourstatus->kind = TARGET_WAITKIND_IGNORE; + pid = -1; + save_errno = EINTR; + } + } + clear_sigio_trap (); clear_sigint_trap (); } @@ -1159,11 +1234,9 @@ child_wait (ptid_t ptid, struct target_w return minus_one_ptid; } - /* Handle GNU/Linux's extended waitstatus for trace events. */ - if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) - return linux_handle_extended_wait (pid, status, ourstatus); + if (ourstatus->kind == TARGET_WAITKIND_IGNORE) + store_waitstatus (ourstatus, status); - store_waitstatus (ourstatus, status); return pid_to_ptid (pid); } @@ -1371,6 +1444,20 @@ retry: } } + /* Handle GNU/Linux's extended waitstatus for trace events. */ + if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) + { + if (debug_lin_lwp) + fprintf_unfiltered (gdb_stdlog, + "LLW: Handling extended status 0x%06x\n", + status); + if (lin_lwp_handle_extended (lp, status)) + { + status = 0; + continue; + } + } + /* Check if the thread has exited. */ if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1) { @@ -1588,14 +1675,14 @@ retry: else trap_ptid = null_ptid; - /* Handle GNU/Linux's extended waitstatus for trace events. */ - if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP && status >> 16 != 0) + if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE) { - linux_handle_extended_wait (GET_LWP (lp->ptid), status, ourstatus); - return trap_ptid; + *ourstatus = lp->waitstatus; + lp->waitstatus.kind = TARGET_WAITKIND_IGNORE; } + else + store_waitstatus (ourstatus, status); - store_waitstatus (ourstatus, status); return (threaded ? lp->ptid : pid_to_ptid (GET_LWP (lp->ptid))); } Index: linux-nat.c =================================================================== RCS file: /cvs/src/src/gdb/linux-nat.c,v retrieving revision 1.5 diff -u -p -r1.5 linux-nat.c --- linux-nat.c 17 Aug 2003 20:17:02 -0000 1.5 +++ linux-nat.c 26 Mar 2004 17:10:36 -0000 @@ -1,5 +1,5 @@ /* GNU/Linux native-dependent code common to multiple platforms. - Copyright (C) 2003 Free Software Foundation, Inc. + Copyright (C) 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -224,7 +224,8 @@ linux_enable_event_reporting (ptid_t pti if (! linux_supports_tracefork ()) return; - options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC; + options = PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEEXEC + | PTRACE_O_TRACECLONE; if (linux_supports_tracevforkdone ()) options |= PTRACE_O_TRACEVFORKDONE; @@ -391,11 +392,8 @@ linux_handle_extended_wait (int pid, int { int event = status >> 16; - if (event == PTRACE_EVENT_CLONE) - internal_error (__FILE__, __LINE__, - "unexpected clone event"); - - if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK) + if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK + || event == PTRACE_EVENT_CLONE) { unsigned long new_pid; int ret; @@ -406,12 +404,10 @@ linux_handle_extended_wait (int pid, int if (! pull_pid_from_list (&stopped_pids, new_pid)) { /* The new child has a pending SIGSTOP. We can't affect it until it - hits the SIGSTOP, but we're already attached. - - It won't be a clone (we didn't ask for clones in the event mask) - so we can just call waitpid and wait for the SIGSTOP. */ + hits the SIGSTOP, but we're already attached. */ do { - ret = waitpid (new_pid, &status, 0); + ret = waitpid (new_pid, &status, + (event == PTRACE_EVENT_CLONE) ? __WCLONE : 0); } while (ret == -1 && errno == EINTR); if (ret == -1) perror_with_name ("waiting for new child"); @@ -423,8 +419,13 @@ linux_handle_extended_wait (int pid, int "wait returned unexpected status 0x%x", status); } - ourstatus->kind = (event == PTRACE_EVENT_FORK) - ? TARGET_WAITKIND_FORKED : TARGET_WAITKIND_VFORKED; + if (event == PTRACE_EVENT_FORK) + ourstatus->kind = TARGET_WAITKIND_FORKED; + else if (event == PTRACE_EVENT_VFORK) + ourstatus->kind = TARGET_WAITKIND_VFORKED; + else + ourstatus->kind = TARGET_WAITKIND_SPURIOUS; + ourstatus->value.related_pid = new_pid; return inferior_ptid; } Index: linux-nat.h =================================================================== RCS file: /cvs/src/src/gdb/linux-nat.h,v retrieving revision 1.5 diff -u -p -r1.5 linux-nat.h --- linux-nat.h 14 Sep 2003 02:04:44 -0000 1.5 +++ linux-nat.h 26 Mar 2004 17:10:36 -0000 @@ -1,5 +1,5 @@ /* Native debugging support for GNU/Linux (LWP layer). - Copyright 2000, 2001, 2002, 2003 Free Software Foundation, Inc. + Copyright 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -18,6 +18,8 @@ Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */ +#include "target.h" + /* Structure describing an LWP. */ struct lwp_info @@ -52,6 +54,11 @@ struct lwp_info /* Non-zero if we were stepping this LWP. */ int step; + /* If WAITSTATUS->KIND != TARGET_WAITKIND_SPURIOUS, the waitstatus + for this LWP's last event. This may correspond to STATUS above, + or to a local variable in lin_lwp_wait. */ + struct target_waitstatus waitstatus; + /* Next LWP in list. */ struct lwp_info *next; }; @@ -60,7 +67,6 @@ struct lwp_info system". */ struct mem_attrib; struct target_ops; -struct target_waitstatus; extern int linux_proc_xfer_memory (CORE_ADDR addr, char *myaddr, int len, int write, struct mem_attrib *attrib, Index: thread-db.c =================================================================== RCS file: /cvs/src/src/gdb/thread-db.c,v retrieving revision 1.37 diff -u -p -r1.37 thread-db.c --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37 +++ thread-db.c 26 Mar 2004 17:10:36 -0000 @@ -1,6 +1,6 @@ /* libthread_db assisted debugging support, generic parts. - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc. + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. This file is part of GDB. @@ -35,6 +35,10 @@ #include "regcache.h" #include "solib-svr4.h" +#ifdef HAVE_GNU_LIBC_VERSION_H +#include <gnu/libc-version.h> +#endif + #ifndef LIBTHREAD_DB_SO #define LIBTHREAD_DB_SO "libthread_db.so.1" #endif @@ -130,6 +134,7 @@ static CORE_ADDR td_death_bp_addr; static void thread_db_find_new_threads (void); static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, const td_thrinfo_t *ti_p, int verbose); +static void detach_thread (ptid_t ptid, int verbose); \f /* Building process ids. */ @@ -150,6 +155,9 @@ static void attach_thread (ptid_t ptid, struct private_thread_info { + /* Flag set when we see a TD_DEATH event for this thread. */ + unsigned int dying:1; + /* Cached thread state. */ unsigned int th_valid:1; unsigned int ti_valid:1; @@ -491,6 +499,10 @@ enable_thread_event_reporting (void) td_thr_events_t events; td_notify_t notify; td_err_e err; +#ifdef HAVE_GNU_LIBC_VERSION_H + const char *libc_version; + int libc_major, libc_minor; +#endif /* We cannot use the thread event reporting facility if these functions aren't available. */ @@ -501,12 +513,16 @@ enable_thread_event_reporting (void) /* Set the process wide mask saying which events we're interested in. */ td_event_emptyset (&events); td_event_addset (&events, TD_CREATE); -#if 0 + +#ifdef HAVE_GNU_LIBC_VERSION_H /* FIXME: kettenis/2000-04-23: The event reporting facility is broken for TD_DEATH events in glibc 2.1.3, so don't enable it for now. */ - td_event_addset (&events, TD_DEATH); + libc_version = gnu_get_libc_version (); + if (sscanf (libc_version, "%d.%d", &libc_major, &libc_minor) == 2 + && (libc_major > 2 || (libc_major == 2 && libc_minor > 1))) #endif + td_event_addset (&events, TD_DEATH); err = td_ta_set_event_p (thread_agent, &events); if (err != TD_OK) @@ -689,6 +705,10 @@ quit: target_new_objfile_chain (objfile); } +/* Attach to a new thread. This function is called when we receive a + TD_CREATE event or when we iterate over all threads and find one + that wasn't already in our list. */ + static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, const td_thrinfo_t *ti_p, int verbose) @@ -696,6 +716,27 @@ attach_thread (ptid_t ptid, const td_thr struct thread_info *tp; td_err_e err; + /* If we're being called after a TD_CREATE event, we may already + know about this thread. There are two ways this can happen. We + may have iterated over all threads between the thread creation + and the TD_CREATE event, for instance when the user has issued + the `info threads' command before the SIGTRAP for hitting the + thread creation breakpoint was reported. Alternatively, the + thread may have exited and a new one been created with the same + thread ID. In the first case we don't need to do anything; in + the second case we should discard information about the dead + thread and attach to the new one. */ + if (in_thread_list (ptid)) + { + tp = find_thread_pid (ptid); + gdb_assert (tp != NULL); + + if (!tp->private->dying) + return; + + delete_thread (ptid); + } + check_thread_signals (); /* Add the thread to GDB's thread list. */ @@ -741,8 +782,21 @@ thread_db_attach (char *args, int from_t static void detach_thread (ptid_t ptid, int verbose) { + struct thread_info *thread_info; + if (verbose) printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid)); + + /* Don't delete the thread now, because it still reports as active + until it has executed a few instructions after the event + breakpoint - if we deleted it now, "info threads" would cause us + to re-attach to it. Just mark it as having had a TD_DEATH + event. This means that we won't delete it from our thread list + until we notice that it's dead (via prune_threads), or until + something re-uses its thread ID. */ + thread_info = find_thread_pid (ptid); + gdb_assert (thread_info != NULL); + thread_info->private->dying = 1; } static void @@ -847,12 +901,9 @@ check_event (ptid_t ptid) switch (msg.event) { case TD_CREATE: - - /* We may already know about this thread, for instance when the - user has issued the `info threads' command before the SIGTRAP - for hitting the thread creation breakpoint was reported. */ - if (!in_thread_list (ptid)) - attach_thread (ptid, msg.th_p, &ti, 1); + /* Call attach_thread whether or not we already know about a + thread with this thread ID. */ + attach_thread (ptid, msg.th_p, &ti, 1); break; ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available 2004-03-26 17:59 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz @ 2004-03-26 18:14 ` Jeff Johnston 2004-03-26 21:13 ` [patch] New thread test to exercise Daniel's Patch Jeff Johnston 2004-03-29 18:07 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz 1 sibling, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-26 18:14 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote: > >>Daniel Jacobowitz wrote: >> >>>On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote: >>> >>> >>>>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put >>>>something together in the morning to verify whether that's the problem? >>> >>> >>>Here you go. Again, this patch is obviously not ready to go into GDB, >>>but I have not been able to make it misbehave yet. I don't know if all >>>the bits it needs work right in RHEL3, or if my testing was conclusive. >>> >>>The highlights: >>> - Includes most of the previous patch >>> - Uses PTRACE_EVENT_CLONE to attach to new threads >>> - Moves handling of events closer to the waitpid call >>> >>>There are some potential races but I haven't hit any of them in >>>practice. I suspect that with a heavy fork or vfork load (not clone) >>>you could produce interesting failure modes. >>> >>>Give it a try, please. If it works I'll clean it up. >>> >> >>Works great. I was unable to get it to misbehave on RHEL3. Please go >>ahead cleaning it up. > > > Here's the final patch. The description of the patch is unchanged from > the above. The background, for those who did not read the whole > discussion: in NPTL, a signal delivered to the process when any thread > is not attached is likely to terminate the process. So we need to be > careful to: > - know when threads have exited, so that we can attach to new threads > which reuse the same thread ID > - attach to threads as early as possible > > Both of these could cause problems in LinuxThreads, but they were less > drastic. > > Tested with both LinuxThreads and NPTL. I'll commit this to HEAD on > Monday, barring objections. We've missed the boat for GDB 6.1 at this > point. > > The testcase you posted has credits in it, so we can't just grab it for > the testsuite. But all we should need to test this is a loop that > creates short-lived threads, and then verifying that we can send C-c, > get a prompt, issue info threads, and continue a couple of times > without seeing anyting matching "error:". Interested in writing that > test? :) > Sure. Just tried debugging eclipse today with this patch applied and it works as well. -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-26 18:14 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston @ 2004-03-26 21:13 ` Jeff Johnston 2004-03-26 21:19 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-26 21:13 UTC (permalink / raw) To: Jeff Johnston; +Cc: Daniel Jacobowitz, gdb-patches [-- Attachment #1: Type: text/plain, Size: 2704 bytes --] Jeff Johnston wrote: > Daniel Jacobowitz wrote: > >> On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote: >> >>> Daniel Jacobowitz wrote: >>> >>>> On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote: >>>> >>>> >>>>> Do you have any code for PTRACE_EVENT_CLONE yet, or should I put >>>>> something together in the morning to verify whether that's the >>>>> problem? >>>> >>>> >>>> >>>> Here you go. Again, this patch is obviously not ready to go into GDB, >>>> but I have not been able to make it misbehave yet. I don't know if all >>>> the bits it needs work right in RHEL3, or if my testing was conclusive. >>>> >>>> The highlights: >>>> - Includes most of the previous patch >>>> - Uses PTRACE_EVENT_CLONE to attach to new threads >>>> - Moves handling of events closer to the waitpid call >>>> >>>> There are some potential races but I haven't hit any of them in >>>> practice. I suspect that with a heavy fork or vfork load (not clone) >>>> you could produce interesting failure modes. >>>> >>>> Give it a try, please. If it works I'll clean it up. >>>> >>> >>> Works great. I was unable to get it to misbehave on RHEL3. Please >>> go ahead cleaning it up. >> >> >> >> Here's the final patch. The description of the patch is unchanged from >> the above. The background, for those who did not read the whole >> discussion: in NPTL, a signal delivered to the process when any thread >> is not attached is likely to terminate the process. So we need to be >> careful to: >> - know when threads have exited, so that we can attach to new threads >> which reuse the same thread ID >> - attach to threads as early as possible >> >> Both of these could cause problems in LinuxThreads, but they were less >> drastic. >> >> Tested with both LinuxThreads and NPTL. I'll commit this to HEAD on >> Monday, barring objections. We've missed the boat for GDB 6.1 at this >> point. >> >> The testcase you posted has credits in it, so we can't just grab it for >> the testsuite. But all we should need to test this is a loop that >> creates short-lived threads, and then verifying that we can send C-c, >> get a prompt, issue info threads, and continue a couple of times >> without seeing anyting matching "error:". Interested in writing that >> test? :) >> > > Sure. Just tried debugging eclipse today with this patch applied and it > works as well. > > -- Jeff J. > > See the attached test. There was some stuff in pthreads.exp about Crtl-C and alpha-*-osf*. Do I need to account for this platform? 2004-03-26 Jeff Johnston <jjohnstn@redhat.com> * gdb.threads/manythreads.c: New testcase. * gdb.threads/manythreads.exp: Ditto. -- Jeff J. [-- Attachment #2: testsuite.patch --] [-- Type: text/plain, Size: 4383 bytes --] Index: manythreads.exp =================================================================== RCS file: manythreads.exp diff -N manythreads.exp --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ manythreads.exp 26 Mar 2004 21:03:17 -0000 @@ -0,0 +1,116 @@ +# manythreads.exp -- Expect script to test stopping many threads +# Copyright (C) 2004 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + +# Please email any bugs, comments, and/or additions to this file to: +# bug-gdb@prep.ai.mit.edu + +# This file was written by Jeff Johnston. (jjohnstn@redhat.com) + +if $tracelevel then { + strace $tracelevel +} + +set prms_id 0 +set bug_id 0 + +# This only works with native configurations +if ![isnative] then { + return +} + +set testfile "manythreads" +set srcfile ${testfile}.c +set binfile ${objdir}/${subdir}/${testfile} +if {[gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug libs=-lpthread}] != ""} { + return -1 +} + +gdb_start +gdb_reinitialize_dir $srcdir/$subdir +gdb_load ${binfile} +send_gdb "set print sevenbit-strings\n" ; gdb_expect -re "$gdb_prompt $" +runto_main + +# We'll need this when we send_gdb a ^C to GDB. Need to do it before we +# run the program and gdb starts saving and restoring tty states. +# On Ultrix, we don't need it and it is really slow (because shell_escape +# doesn't use vfork). +if ![istarget "*-*-ultrix*"] then { + gdb_test "shell stty intr '^C'" "" +} + +send_gdb "continue\n" +gdb_expect { + -re "Continuing" + { pass "first continue" } + timeout + { fail "first continue (timeout)" } +} + +# Send a Ctrl-C and verify that we can do info threads and continue +after 100 {send_gdb "\003"} +gdb_expect { + -re "Program received signal SIGINT.*$gdb_prompt $" + { + pass "stop threads" + } + timeout + { + fail "stop threads (timeout)" + } +} + +gdb_test "info threads" ".*1 Thread.*.LWP.*" + +send_gdb "continue\n" +gdb_expect { + -re "Continuing" + { pass "second continue" } + timeout + { fail "second continue (timeout)" } +} + +# Send another Ctrl-C and verify that we can do info threads and quit +after 100 {send_gdb "\003"} +gdb_expect { + -re "Program received signal SIGINT.*$gdb_prompt $" + { + pass "stop threads" + } + timeout + { + fail "stop threads (timeout)" + } +} + +send_gdb "quit\n" +gdb_expect { + -re "The program is running. Exit anyway\\? \\(y or n\\) $" { + send_gdb "y\n" + exp_continue + } + eof { + pass "GDB exits after stopping multithreaded program" + } + default { + fail "GDB exits after stopping multithreaded program" + } + timeout { + fail "GDB exits after stopping multithreaded program (timeout)" + } +} + Index: manythreads.c =================================================================== RCS file: manythreads.c diff -N manythreads.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ manythreads.c 26 Mar 2004 21:03:17 -0000 @@ -0,0 +1,42 @@ +#include <pthread.h> +#include <stdio.h> + +void * +thread_function (void *arg) +{ + int x = (int)arg; + + printf ("Thread <%d> executing\n", x); + + return NULL; +} + +int +main (int argc, char **argv) +{ + pthread_attr_t attr; + pthread_t threads[256]; + int i, j; + + pthread_attr_init (&attr); + + /* Create a ton of quick-executing threads, then wait for them to + complete. */ + for (i = 0; i < 1000; ++i) + { + for (j = 0; j < 256; ++j) + { + pthread_create (&threads[j], &attr, thread_function, + (void *)(i * 1000 + j)); + } + + for (j = 0; j < 256; ++j) + { + pthread_join (threads[j], NULL); + } + } + + pthread_attr_destroy (&attr); + + return 0; +} ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-26 21:13 ` [patch] New thread test to exercise Daniel's Patch Jeff Johnston @ 2004-03-26 21:19 ` Daniel Jacobowitz 2004-03-29 18:06 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-26 21:19 UTC (permalink / raw) To: gdb-patches On Fri, Mar 26, 2004 at 04:13:18PM -0500, Jeff Johnston wrote: > See the attached test. There was some stuff in pthreads.exp about Crtl-C > and alpha-*-osf*. Do I need to account for this platform? Well, it's still supported, but I have no idea whether the bit in that test is necessary - or what it's supposed to accomplish. So let's ignore it. > +# This only works with native configurations > +if ![isnative] then { > + return > +} This shouldn't be necessary. > +set testfile "manythreads" > +set srcfile ${testfile}.c > +set binfile ${objdir}/${subdir}/${testfile} > +if {[gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug libs=-lpthread}] != ""} { See gdb_compile_pthreads. > +send_gdb "set print sevenbit-strings\n" ; gdb_expect -re "$gdb_prompt $" gdb_test "set print sevenbit-strings" "" > +send_gdb "continue\n" > +gdb_expect { Everywhere you're using gdb_expect, please use gdb_test_multiple instead. For the "after" tests, you can use gdb_test_multiple with "" as the first argument. > +# Send a Ctrl-C and verify that we can do info threads and continue > +after 100 {send_gdb "\003"} Is 100 ms enough to be interesting? -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-26 21:19 ` Daniel Jacobowitz @ 2004-03-29 18:06 ` Jeff Johnston 2004-03-29 18:11 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-29 18:06 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Fri, Mar 26, 2004 at 04:13:18PM -0500, Jeff Johnston wrote: > >>See the attached test. There was some stuff in pthreads.exp about Crtl-C >>and alpha-*-osf*. Do I need to account for this platform? > > > Well, it's still supported, but I have no idea whether the bit in that > test is necessary - or what it's supposed to accomplish. So let's > ignore it. > > >>+# This only works with native configurations >>+if ![isnative] then { >>+ return >>+} > > > This shouldn't be necessary. > Ok. > >>+set testfile "manythreads" >>+set srcfile ${testfile}.c >>+set binfile ${objdir}/${subdir}/${testfile} >>+if {[gdb_compile "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug libs=-lpthread}] != ""} { > > > See gdb_compile_pthreads. > Will do. > >>+send_gdb "set print sevenbit-strings\n" ; gdb_expect -re "$gdb_prompt $" > > > gdb_test "set print sevenbit-strings" "" > Ok. > >>+send_gdb "continue\n" >>+gdb_expect { > > > Everywhere you're using gdb_expect, please use gdb_test_multiple > instead. For the "after" tests, you can use gdb_test_multiple with > "" as the first argument. > I tried this initially but I kept getting "Error: internal buffer is full". I tried lowering the "after" time which is why it ended up 100 below but that didn't solve the problem. Any suggestions on how to avoid the "full_buffer" error. > >>+# Send a Ctrl-C and verify that we can do info threads and continue >>+after 100 {send_gdb "\003"} > > > Is 100 ms enough to be interesting? > It still kicks off a fair number of threads that have started/exited, but I can make this bigger. -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 18:06 ` Jeff Johnston @ 2004-03-29 18:11 ` Daniel Jacobowitz 2004-03-29 20:18 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-29 18:11 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote: > >>+send_gdb "continue\n" > >>+gdb_expect { > > > > > >Everywhere you're using gdb_expect, please use gdb_test_multiple > >instead. For the "after" tests, you can use gdb_test_multiple with > >"" as the first argument. > > > > I tried this initially but I kept getting "Error: internal buffer is full". > I tried lowering the "after" time which is why it ended up 100 below but > that didn't solve the problem. Any suggestions on how to avoid the > "full_buffer" error. I'm not sure about the regexes, since I'm just pulling this out of the top of my head, but you want something like this: -re "error:.*$gdb_prompt $" { fail "msg" } -re "\\\[New thread \[^\]\]*\\\]\r\n" { exp_continue } -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { exp_continue } That will consume any thread create/delete messages, instead of leaving them in the buffer. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 18:11 ` Daniel Jacobowitz @ 2004-03-29 20:18 ` Jeff Johnston 2004-03-29 20:42 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-29 20:18 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote: > >>>>+send_gdb "continue\n" >>>>+gdb_expect { >>> >>> >>>Everywhere you're using gdb_expect, please use gdb_test_multiple >>>instead. For the "after" tests, you can use gdb_test_multiple with >>>"" as the first argument. >>> >> >>I tried this initially but I kept getting "Error: internal buffer is full". >>I tried lowering the "after" time which is why it ended up 100 below but >>that didn't solve the problem. Any suggestions on how to avoid the >>"full_buffer" error. > > > I'm not sure about the regexes, since I'm just pulling this out of the > top of my head, but you want something like this: > > -re "error:.*$gdb_prompt $" { > fail "msg" > } > -re "\\\[New thread \[^\]\]*\\\]\r\n" { > exp_continue > } > -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { > exp_continue > } > > That will consume any thread create/delete messages, instead of leaving > them in the buffer. > Forgive my ignorance of the test macros, but how does this terminate? I tried doing it this way and it hangs. It never gets a gdb prompt and spits out endless messages because it is essentially an infinite loop. I only care to see the "Continuing" message and don't care about the rest of the stuff. I stole the original send_gdb / gdb_expect code from schedlock.exp which has the same problem. -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 20:18 ` Jeff Johnston @ 2004-03-29 20:42 ` Daniel Jacobowitz 2004-03-29 21:04 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-29 20:42 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Mon, Mar 29, 2004 at 03:18:35PM -0500, Jeff Johnston wrote: > Daniel Jacobowitz wrote: > >On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote: > > > >>>>+send_gdb "continue\n" > >>>>+gdb_expect { > >>> > >>> > >>>Everywhere you're using gdb_expect, please use gdb_test_multiple > >>>instead. For the "after" tests, you can use gdb_test_multiple with > >>>"" as the first argument. > >>> > >> > >>I tried this initially but I kept getting "Error: internal buffer is > >>full". I tried lowering the "after" time which is why it ended up 100 > >>below but that didn't solve the problem. Any suggestions on how to avoid > >>the "full_buffer" error. > > > > > >I'm not sure about the regexes, since I'm just pulling this out of the > >top of my head, but you want something like this: > > > > -re "error:.*$gdb_prompt $" { > > fail "msg" > > } > > -re "\\\[New thread \[^\]\]*\\\]\r\n" { > > exp_continue > > } > > -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { > > exp_continue > > } > > > >That will consume any thread create/delete messages, instead of leaving > >them in the buffer. > > > > Forgive my ignorance of the test macros, but how does this terminate? I > tried doing it this way and it hangs. It never gets a gdb prompt and spits > out endless messages because it is essentially an infinite loop. I only > care to see the "Continuing" message and don't care about the rest of the > stuff. > > I stole the original send_gdb / gdb_expect code from schedlock.exp which > has the same problem. Hmm, I must have misunderstood. Which part of the test gives you a full buffer error? I'm thinking of something like this: gdb_test_multiple "continue" { -re "error:.*$gdb_prompt $" { fail "msg" } -re "Continuing" { pass "msg" } } after 1000 { send_gdb "\003" } gdb_test_multiple "" { -re "error:.*$gdb_prompt $" { fail "msg" } -re "$gdb_prompt $" { pass "msg" } -re "\\\[New thread \[^\]\]*\\\]\r\n" { exp_continue } -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { exp_continue } } Those will catch any error messages and treat them as failures. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 20:42 ` Daniel Jacobowitz @ 2004-03-29 21:04 ` Jeff Johnston 2004-03-29 21:34 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-29 21:04 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Mon, Mar 29, 2004 at 03:18:35PM -0500, Jeff Johnston wrote: > >>Daniel Jacobowitz wrote: >> >>>On Mon, Mar 29, 2004 at 01:06:07PM -0500, Jeff Johnston wrote: >>> >>> >>>>>>+send_gdb "continue\n" >>>>>>+gdb_expect { >>>>> >>>>> >>>>>Everywhere you're using gdb_expect, please use gdb_test_multiple >>>>>instead. For the "after" tests, you can use gdb_test_multiple with >>>>>"" as the first argument. >>>>> >>>> >>>>I tried this initially but I kept getting "Error: internal buffer is >>>>full". I tried lowering the "after" time which is why it ended up 100 >>>>below but that didn't solve the problem. Any suggestions on how to avoid >>>>the "full_buffer" error. >>> >>> >>>I'm not sure about the regexes, since I'm just pulling this out of the >>>top of my head, but you want something like this: >>> >>> -re "error:.*$gdb_prompt $" { >>> fail "msg" >>> } >>> -re "\\\[New thread \[^\]\]*\\\]\r\n" { >>> exp_continue >>> } >>> -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { >>> exp_continue >>> } >>> >>>That will consume any thread create/delete messages, instead of leaving >>>them in the buffer. >>> >> >>Forgive my ignorance of the test macros, but how does this terminate? I >>tried doing it this way and it hangs. It never gets a gdb prompt and spits >>out endless messages because it is essentially an infinite loop. I only >>care to see the "Continuing" message and don't care about the rest of the >>stuff. >> >>I stole the original send_gdb / gdb_expect code from schedlock.exp which >>has the same problem. > > > Hmm, I must have misunderstood. Which part of the test gives you a > full buffer error? I'm thinking of something like this: > > gdb_test_multiple "continue" { > -re "error:.*$gdb_prompt $" { > fail "msg" > } > -re "Continuing" { > pass "msg" > } > } > > after 1000 { send_gdb "\003" } > gdb_test_multiple "" { > -re "error:.*$gdb_prompt $" { > fail "msg" > } > -re "$gdb_prompt $" { > pass "msg" > } > -re "\\\[New thread \[^\]\]*\\\]\r\n" { > exp_continue > } > -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { > exp_continue > } > } > > Those will catch any error messages and treat them as failures. > Ok, I had the thread checks in the Continue test. If I change it to the way above, now I get 3 "Process no longer exists "messages but the test completes. :( I'm really starting to hate these test macros. What is the reasoning behind avoiding send_gdb / gdb_expect? -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 21:04 ` Jeff Johnston @ 2004-03-29 21:34 ` Daniel Jacobowitz 2004-03-29 21:59 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-29 21:34 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote: > Ok, I had the thread checks in the Continue test. If I change it to the > way above, now I get 3 "Process no longer exists "messages but the test > completes. :( > > I'm really starting to hate these test macros. What is the reasoning > behind avoiding send_gdb / gdb_expect? They don't automatically handle things like internal errors, disconnects, et cetera. Want to post your current version and I'll give it a try? -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 21:34 ` Daniel Jacobowitz @ 2004-03-29 21:59 ` Jeff Johnston 2004-03-29 22:32 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-29 21:59 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches [-- Attachment #1: Type: text/plain, Size: 536 bytes --] Daniel Jacobowitz wrote: > On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote: > >>Ok, I had the thread checks in the Continue test. If I change it to the >>way above, now I get 3 "Process no longer exists "messages but the test >>completes. :( >> >>I'm really starting to hate these test macros. What is the reasoning >>behind avoiding send_gdb / gdb_expect? > > > They don't automatically handle things like internal errors, > disconnects, et cetera. Want to post your current version and I'll > give it a try? > [-- Attachment #2: manythreads.c --] [-- Type: text/plain, Size: 664 bytes --] #include <pthread.h> #include <stdio.h> void * thread_function (void *arg) { int x = (int)arg; printf ("Thread <%d> executing\n", x); return NULL; } int main (int argc, char **argv) { pthread_attr_t attr; pthread_t threads[256]; int i, j; pthread_attr_init (&attr); /* Create a ton of quick-executing threads, then wait for them to complete. */ for (i = 0; i < 1000; ++i) { for (j = 0; j < 256; ++j) { pthread_create (&threads[j], &attr, thread_function, (void *)(i * 1000 + j)); } for (j = 0; j < 256; ++j) { pthread_join (threads[j], NULL); } } pthread_attr_destroy (&attr); return 0; } [-- Attachment #3: manythreads.exp --] [-- Type: text/plain, Size: 3495 bytes --] # manythreads.exp -- Expect script to test stopping many threads # Copyright (C) 2004 Free Software Foundation, Inc. # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. # Please email any bugs, comments, and/or additions to this file to: # bug-gdb@prep.ai.mit.edu # This file was written by Jeff Johnston. (jjohnstn@redhat.com) if $tracelevel then { strace $tracelevel } set prms_id 0 set bug_id 0 set testfile "manythreads" set srcfile ${testfile}.c set binfile ${objdir}/${subdir}/${testfile} if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable [list debug "incdir=${objdir}"]] != "" } { return -1 } gdb_start gdb_reinitialize_dir $srcdir/$subdir gdb_load ${binfile} gdb_test "set print sevenbit-strings" "" runto_main # We'll need this when we send_gdb a ^C to GDB. Need to do it before we # run the program and gdb starts saving and restoring tty states. # On Ultrix, we don't need it and it is really slow (because shell_escape # doesn't use vfork). if ![istarget "*-*-ultrix*"] then { gdb_test "shell stty intr '^C'" "" } gdb_test_multiple "continue" "first continue" { -re "ERROR:.*$gdb_prompt $" { fail "$message" } -re "Continuing" { pass "$message" } } # Send a Ctrl-C and verify that we can do info threads and continue after 1000 {send_gdb "\003"} gdb_test_multiple "" "stop threads 1" { -re "\\\[New thread \[^\]\]*\\\]\r\n" { exp_continue } -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { exp_continue } -re "Thread.*executing\r\n" { exp_continue } -re "Program received signal SIGINT.*$gdb_prompt $" { pass "$message" } timeout { fail "$message (timeout)" } } gdb_test "info threads" ".*1 Thread.*.LWP.*" gdb_test_multiple "continue" "second continue" { -re "ERROR:.*$gdb_prompt $" { fail "$message" } -re "Continuing" { pass "$message" } } # Send another Ctrl-C and verify that we can do info threads and quit after 1000 {send_gdb "\003"} gdb_test_multiple "" "stop threads 2" { -re "\\\[New thread \[^\]\]*\\\]\r\n" { exp_continue } -re "\\\[Thread \[^\]\]* exited\\\]\r\n" { exp_continue } -re "Thread.*executing\r\n" { exp_continue } -re "Program received signal SIGINT.*$gdb_prompt $" { pass "stop threads 2" } timeout { fail "stop threads 2 (timeout)" } } gdb_test_multiple "quit" "GDB exits after stopping multithreaded program" { -re "The program is running. Exit anyway\\? \\(y or n\\) $" { gdb_test_multiple "y" "$message" { eof { pass "GDB exits after stopping multithreaded program" } } } eof { pass "GDB exits after stopping multithreaded program" } timeout { fail "GDB exits after stopping multithreaded program (timeout)" } } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 21:59 ` Jeff Johnston @ 2004-03-29 22:32 ` Daniel Jacobowitz 2004-03-29 22:58 ` Jeff Johnston 0 siblings, 1 reply; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-29 22:32 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches [-- Attachment #1: Type: text/plain, Size: 1513 bytes --] On Mon, Mar 29, 2004 at 04:59:30PM -0500, Jeff Johnston wrote: > Daniel Jacobowitz wrote: > >On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote: > > > >>Ok, I had the thread checks in the Continue test. If I change it to the > >>way above, now I get 3 "Process no longer exists "messages but the test > >>completes. :( > >> > >>I'm really starting to hate these test macros. What is the reasoning > >>behind avoiding send_gdb / gdb_expect? > > > > > >They don't automatically handle things like internal errors, > >disconnects, et cetera. Want to post your current version and I'll > >give it a try? > > > -re "Program received signal SIGINT.*$gdb_prompt $" > { > pass "$message" > } Nothing automatically sets $message. The ERROR: you saw is DejaGNU's generic failure message for things like syntax errors in expect blocks; I sent a patch to the dejagnu list a month or so ago to print more helpful information. Try the attached script instead. Oddly, running a fixed manythreads.exp with an unpatched GDB, I get a SIGSEGV in pthread_join. It shows up as a FAIL (yay). The patched GDB shows up as nine PASSes (yay). Re-running it a number of times, the SIGSEGV came and went intermittently. Running the test with LinuxThreads an internal error (lp->status == 0 assertion failed) came and went also. I guess that makes it a good test.... now someone will have to _fix_ those. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer [-- Attachment #2: manythreads.exp --] [-- Type: text/plain, Size: 3423 bytes --] # manythreads.exp -- Expect script to test stopping many threads # Copyright (C) 2004 Free Software Foundation, Inc. # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. # Please email any bugs, comments, and/or additions to this file to: # bug-gdb@prep.ai.mit.edu # This file was written by Jeff Johnston. (jjohnstn@redhat.com) if $tracelevel then { strace $tracelevel } set prms_id 0 set bug_id 0 set testfile "manythreads" set srcfile ${testfile}.c set binfile ${objdir}/${subdir}/${testfile} if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable [list debug "incdir=${objdir}"]] != "" } { return -1 } gdb_start gdb_reinitialize_dir $srcdir/$subdir gdb_load ${binfile} gdb_test "set print sevenbit-strings" "" runto_main # We'll need this when we send_gdb a ^C to GDB. Need to do it before we # run the program and gdb starts saving and restoring tty states. # On Ultrix, we don't need it and it is really slow (because shell_escape # doesn't use vfork). if ![istarget "*-*-ultrix*"] then { gdb_test "shell stty intr '^C'" "" } set message "first continue" gdb_test_multiple "continue" "first continue" { -re "error:.*$gdb_prompt $" { fail "$message" } -re "Continuing" { pass "$message" } } # Send a Ctrl-C and verify that we can do info threads and continue after 1000 {send_gdb "\003"} set message "stop threads 1" gdb_test_multiple "" "stop threads 1" { -re "\\\[New \[^\]\]*\\\]\r\n" { exp_continue } -re "\\\[\[^\]\]* exited\\\]\r\n" { exp_continue } -re "Thread \[^\n\]* executing\r\n" { exp_continue } -re "Program received signal SIGINT.*$gdb_prompt $" { pass "$message" } timeout { fail "$message (timeout)" } } gdb_test "info threads" ".*1 Thread.*.LWP.*" set message "second continue" gdb_test_multiple "continue" "second continue" { -re "error:.*$gdb_prompt $" { fail "$message" } -re "Continuing" { pass "$message" } } # Send another Ctrl-C and verify that we can do info threads and quit after 1000 {send_gdb "\003"} set message "stop threads 2" gdb_test_multiple "" "stop threads 2" { -re "\\\[New \[^\]\]*\\\]\r\n" { exp_continue } -re "\\\[\[^\]\]* exited\\\]\r\n" { exp_continue } -re "Thread \[^\n\]* executing\r\n" { exp_continue } -re "Program received signal SIGINT.*$gdb_prompt $" { pass "stop threads 2" } } gdb_test_multiple "quit" "GDB exits after stopping multithreaded program" { -re "The program is running. Exit anyway\\? \\(y or n\\) $" { send_gdb "y\n" exp_continue } eof { pass "GDB exits after stopping multithreaded program" } timeout { fail "GDB exits after stopping multithreaded program (timeout)" } } ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 22:32 ` Daniel Jacobowitz @ 2004-03-29 22:58 ` Jeff Johnston 2004-03-29 23:57 ` Daniel Jacobowitz 0 siblings, 1 reply; 30+ messages in thread From: Jeff Johnston @ 2004-03-29 22:58 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: gdb-patches Daniel Jacobowitz wrote: > On Mon, Mar 29, 2004 at 04:59:30PM -0500, Jeff Johnston wrote: > >>Daniel Jacobowitz wrote: >> >>>On Mon, Mar 29, 2004 at 04:04:00PM -0500, Jeff Johnston wrote: >>> >>> >>>>Ok, I had the thread checks in the Continue test. If I change it to the >>>>way above, now I get 3 "Process no longer exists "messages but the test >>>>completes. :( >>>> >>>>I'm really starting to hate these test macros. What is the reasoning >>>>behind avoiding send_gdb / gdb_expect? >>> >>> >>>They don't automatically handle things like internal errors, >>>disconnects, et cetera. Want to post your current version and I'll >>>give it a try? >>> > > >> -re "Program received signal SIGINT.*$gdb_prompt $" >> { >> pass "$message" >> } > > > Nothing automatically sets $message. The ERROR: you saw is DejaGNU's > generic failure message for things like syntax errors in expect blocks; > I sent a patch to the dejagnu list a month or so ago to print more > helpful information. Try the attached script instead. > Thanks for restoring my sanity. I got mixed up by the fact that the gdb_test_multiple macro uses $message inside itself based on the passed argument. I thought the conditions I specify would be able to use the same variable. It runs successfully for me on my RHEL3 system. > Oddly, running a fixed manythreads.exp with an unpatched GDB, I get a > SIGSEGV in pthread_join. It shows up as a FAIL (yay). The patched GDB > shows up as nine PASSes (yay). Re-running it a number of times, the > SIGSEGV came and went intermittently. > > Running the test with LinuxThreads an internal error (lp->status == 0 > assertion failed) came and went also. > > I guess that makes it a good test.... now someone will have to _fix_ > those. > Do I resubmit the patched test or has it been approved? -- Jeff J. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] New thread test to exercise Daniel's Patch 2004-03-29 22:58 ` Jeff Johnston @ 2004-03-29 23:57 ` Daniel Jacobowitz 0 siblings, 0 replies; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-29 23:57 UTC (permalink / raw) To: Jeff Johnston; +Cc: gdb-patches On Mon, Mar 29, 2004 at 05:58:44PM -0500, Jeff Johnston wrote: > Do I resubmit the patched test or has it been approved? It'll have to be approved by the gdb.threads maintainers, so I recommend resubmitting a full patch/changelog and copying Michael S. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) 2004-03-26 17:59 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz 2004-03-26 18:14 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston @ 2004-03-29 18:07 ` Daniel Jacobowitz 1 sibling, 0 replies; 30+ messages in thread From: Daniel Jacobowitz @ 2004-03-29 18:07 UTC (permalink / raw) To: gdb-patches On Fri, Mar 26, 2004 at 12:59:06PM -0500, Daniel Jacobowitz wrote: > On Thu, Mar 25, 2004 at 03:22:13PM -0500, Jeff Johnston wrote: > > > > Daniel Jacobowitz wrote: > > >On Wed, Mar 24, 2004 at 11:39:50PM -0500, Daniel Jacobowitz wrote: > > > > > >>Do you have any code for PTRACE_EVENT_CLONE yet, or should I put > > >>something together in the morning to verify whether that's the problem? > > > > > > > > >Here you go. Again, this patch is obviously not ready to go into GDB, > > >but I have not been able to make it misbehave yet. I don't know if all > > >the bits it needs work right in RHEL3, or if my testing was conclusive. > > > > > >The highlights: > > > - Includes most of the previous patch > > > - Uses PTRACE_EVENT_CLONE to attach to new threads > > > - Moves handling of events closer to the waitpid call > > > > > >There are some potential races but I haven't hit any of them in > > >practice. I suspect that with a heavy fork or vfork load (not clone) > > >you could produce interesting failure modes. > > > > > >Give it a try, please. If it works I'll clean it up. > > > > > > > Works great. I was unable to get it to misbehave on RHEL3. Please go > > ahead cleaning it up. > > Here's the final patch. The description of the patch is unchanged from > the above. The background, for those who did not read the whole > discussion: in NPTL, a signal delivered to the process when any thread > is not attached is likely to terminate the process. So we need to be > careful to: > - know when threads have exited, so that we can attach to new threads > which reuse the same thread ID > - attach to threads as early as possible > > Both of these could cause problems in LinuxThreads, but they were less > drastic. > > Tested with both LinuxThreads and NPTL. I'll commit this to HEAD on > Monday, barring objections. We've missed the boat for GDB 6.1 at this > point. I've checked this in for HEAD. -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC]: fix for recycled thread ids 2004-03-19 1:53 ` Daniel Jacobowitz 2004-03-19 18:44 ` Jeff Johnston @ 2004-03-19 20:33 ` Andrew Cagney 1 sibling, 0 replies; 30+ messages in thread From: Andrew Cagney @ 2004-03-19 20:33 UTC (permalink / raw) To: Daniel Jacobowitz; +Cc: Jeff Johnston, gdb-patches > On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote: > >>> The following patch fixes a problem when a user application creates a >>> thread shortly after another thread has completed. For nptl, thread ids >>> are addresses. If a thread completes/dies, the tid is available for reuse >>> by a new thread. > > > Does NPTL re-use the TID quickly, or cycle around the way LT did so > that we only see this under high thread pressure? Immediatly. Since they are pointers NPTL re-cycles them as soon as they are freed (on the next thread create and, obviously, in preference to allocating a new buffer) -- very different to the linuxthreads model. As a consequence any heavily threaded app quickly shows the problem - java (esp eclipse) for instance. Andrew ^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2004-03-29 23:57 UTC | newest] Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2004-03-19 0:36 [RFC]: fix for recycled thread ids Jeff Johnston 2004-03-19 1:53 ` Daniel Jacobowitz 2004-03-19 18:44 ` Jeff Johnston 2004-03-19 19:01 ` Daniel Jacobowitz 2004-03-19 19:35 ` Jeff Johnston 2004-03-19 19:40 ` Daniel Jacobowitz 2004-03-19 21:32 ` Jeff Johnston 2004-03-24 15:51 ` Daniel Jacobowitz 2004-03-24 16:56 ` Daniel Jacobowitz 2004-03-24 21:56 ` Jeff Johnston 2004-03-25 0:46 ` Jeff Johnston 2004-03-25 4:39 ` Daniel Jacobowitz 2004-03-25 16:34 ` Daniel Jacobowitz 2004-03-25 20:22 ` Jeff Johnston 2004-03-26 17:59 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz 2004-03-26 18:14 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available Jeff Johnston 2004-03-26 21:13 ` [patch] New thread test to exercise Daniel's Patch Jeff Johnston 2004-03-26 21:19 ` Daniel Jacobowitz 2004-03-29 18:06 ` Jeff Johnston 2004-03-29 18:11 ` Daniel Jacobowitz 2004-03-29 20:18 ` Jeff Johnston 2004-03-29 20:42 ` Daniel Jacobowitz 2004-03-29 21:04 ` Jeff Johnston 2004-03-29 21:34 ` Daniel Jacobowitz 2004-03-29 21:59 ` Jeff Johnston 2004-03-29 22:32 ` Daniel Jacobowitz 2004-03-29 22:58 ` Jeff Johnston 2004-03-29 23:57 ` Daniel Jacobowitz 2004-03-29 18:07 ` [patch] Use TD_DEATH and PTRACE_EVENT_CLONE when available (was: Re: [RFC]: fix for recycled thread ids) Daniel Jacobowitz 2004-03-19 20:33 ` [RFC]: fix for recycled thread ids Andrew Cagney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox