* [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-07 15:07 UTC
To: gdb-patches
When doing a lin_lwp_attach_lwp(), it is sometimes possible to receive
a SIGCHLD signal thus causing the waitpid() call to fail with EINTR.
This in turn causes the second assert() in lin_lwp_attach_lwp() to
fail.
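
For readers unfamiliar with the failure mode: a blocking waitpid() returns -1 with errno set to EINTR when a signal handler runs while it is waiting. The sketch below shows one common way to tolerate that, namely retrying the call. It is an alternative shown only for illustration, not the approach taken by the patch in this message (which blocks SIGCHLD instead). The helper names waitpid_retry and reap_child_exit_code are hypothetical and do not come from lin-lwp.c.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from lin-lwp.c): call waitpid() again whenever
   a signal handler interrupts it, so EINTR never reaches the caller.  */
static pid_t
waitpid_retry (pid_t pid, int *status, int options)
{
  pid_t ret;

  do
    ret = waitpid (pid, status, options);
  while (ret == -1 && errno == EINTR);

  return ret;
}

/* Fork a child that exits with CODE, reap it with waitpid_retry(), and
   return the exit code seen by the parent, or -1 on any failure.  */
static int
reap_child_exit_code (int code)
{
  int status = 0;
  pid_t child;

  assert (code >= 0 && code <= 255);

  child = fork ();
  if (child == -1)
    return -1;
  if (child == 0)
    _exit (code);

  if (waitpid_retry (child, &status, 0) != child)
    return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Retrying hides the interruption from the caller, whereas blocking the signal prevents the interruption from happening in the first place.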
Reproducing this problem can be somewhat difficult. I've only been
able to reproduce it on a dual processor Linux/x86 machine. I did
manage to reproduce it using the linux-dp program as follows:
    1) Start linux-dp (again, running on an SMP machine).
    2) Determine the process id. Let's call it PID. (Make appropriate
       substitution below.)
    3) Invoke gdb:
            gdb linux-dp
    4) Do:
            set height 0
            break print_philosopher
    5) Do:
            while 1
              attach PID
              continue
              detach
            end
    6) Go get a cup of coffee, take a nap, etc. When you come back, you
       may be fortunate enough to see the following:
        ...
        Breakpoint 1, print_philosopher (n=2, left=95 '_', right=95 '_')
            at ../../../src/gdb/testsuite/gdb.threads/linux-dp.c:105
        105       shared_printf ("%*s%c %d %c\n", (n * 4) + 2, "", left, n, right);
        ../../src/gdb/lin-lwp.c:416: gdb-internal-error: lin_lwp_attach: Assertion `pid == GET_PID (inferior_ptid) && WIFSTOPPED (status) && WSTOPSIG (status) == SIGSTOP' failed.
        An internal GDB error was detected. This may make further
        debugging unreliable. Continue this debugging session? (y or n)
Jim Blandy deserves credit for arriving at the above procedure for
reproducing this bug. (The actual test case that we used for tracking
down this problem is different and is able to demonstrate it in a very
short period of time. Having *lots* of running threads greatly
increases the chances of being able to reproduce it quickly.)
The fix is below. Okay to commit?
* lin-lwp.c (lin_lwp_attach_lwp): Make sure SIGCHLD is in set of
blocked signals.
Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.30
diff -u -p -r1.30 lin-lwp.c
--- lin-lwp.c 2001/10/14 11:30:37 1.30
+++ lin-lwp.c 2001/11/19 19:08:10
@@ -352,6 +352,14 @@ lin_lwp_attach_lwp (ptid_t ptid, int ver
 
   gdb_assert (is_lwp (ptid));
 
+  /* Make sure SIGCHLD is blocked. We don't want SIGCHLD events
+     to interrupt either the ptrace() or waitpid() calls below. */
+  if (! sigismember (&blocked_mask, SIGCHLD))
+    {
+      sigaddset (&blocked_mask, SIGCHLD);
+      sigprocmask (SIG_BLOCK, &blocked_mask, NULL);
+    }
+
   if (verbose)
     printf_filtered ("[New %s]\n", target_pid_to_str (ptid));
 

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: H . J . Lu @ 2001-11-07 15:15 UTC
To: Kevin Buettner; +Cc: gdb-patches

On Mon, Nov 19, 2001 at 12:30:45PM -0700, Kevin Buettner wrote:
> When doing a lin_lwp_attach_lwp(), it is sometimes possible to receive
> a SIGCHLD signal thus causing the waitpid() call to fail with EINTR.
> This in turn causes the second assert() in lin_lwp_attach_lwp() to
> fail.
>
> Reproducing this problem can be somewhat difficult. I've only been
> able to reproduce it on a dual processor Linux/x86 machine. I did
> manage to reproduce it using the linux-dp program as follows:
>

Have you looked at

http://sources.redhat.com/ml/gdb/2001-09/msg00139.html

Specifically,

# gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
# a.out
# gdb a.out
...
(gdb) att 14226
Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
...
lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid == GET_LWP (lp->ptid)' failed.
An internal GDB error was detected. This may make further

ex11.c is from glibc and 14226 is the first thread. Your patch may fix
it also.

H.J.

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-07 19:05 UTC
To: H . J . Lu; +Cc: gdb-patches

On Nov 19, 11:38am, H . J . Lu wrote:

> On Mon, Nov 19, 2001 at 12:30:45PM -0700, Kevin Buettner wrote:
> > When doing a lin_lwp_attach_lwp(), it is sometimes possible to receive
> > a SIGCHLD signal thus causing the waitpid() call to fail with EINTR.
> > This in turn causes the second assert() in lin_lwp_attach_lwp() to
> > fail.
> >
> > Reproducing this problem can be somewhat difficult. I've only been
> > able to reproduce it on a dual processor Linux/x86 machine. I did
> > manage to reproduce it using the linux-dp program as follows:
> >
>
> Have you looked at
>
> http://sources.redhat.com/ml/gdb/2001-09/msg00139.html
>
> Specifically,
>
> # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
> # a.out
> # gdb a.out
> ...
> (gdb) att 14226
> Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
> ...
> lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid == GET_LWP (lp->ptid)' failed.
> An internal GDB error was detected. This may make further
>
> ex11.c is from glibc and 14226 is the first thread. Your patch may fix
> it also.

I think this might be a different problem. I haven't been able to
reproduce the exact problem that you mentioned above, either with
or without my patch.

I do occasionally see "Cannot find new threads: generic error". I'm
going to try to figure this one out...

Kevin

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-07 19:37 UTC
To: H . J . Lu; +Cc: gdb-patches

On Nov 19, 5:42pm, Kevin Buettner wrote:

> > # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
> > # a.out
> > # gdb a.out
> > ...
> > (gdb) att 14226
> > Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
> > ...
> > lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid == GET_LWP (lp->ptid)' failed.
> > An internal GDB error was detected. This may make further
> >
> > ex11.c is from glibc and 14226 is the first thread. Your patch may fix
> > it also.
>
> I think this might be a different problem. I haven't been able to
> reproduce the exact problem that you mentioned above, either with
> or without my patch.
>
> I do occasionally see "Cannot find new threads: generic error". I'm
> going to try to figure this one out...

The "Cannot find new threads: generic error" that I'm seeing in this
program is happening because td_ta_thr_iter() in libthread_db.so is
requesting that gdb read the memory associated with a struct
_pthread_descr_struct (which has size 1056). It turns out that gdb
is able to read 1024 bytes of this struct, but no more. (ptrace()
returns EIO when attempting to read more.)

I'm not sure why this is happening. Could it be that the manager thread
is in the midst of updating the descriptor data structures?

Kevin
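
The short read described in this message (1024 of 1056 bytes) can be modelled without a real inferior. The sketch below is only an illustration of a word-at-a-time read that stops at the first failure; it is not GDB's or libthread_db's code. The callback type word_reader, the simulated target (fake_memory, fake_read_word, FAKE_BASE) and demo_read_descriptor are all hypothetical. The only figures taken from the message are the sizes 1024 and 1056.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: fetch one word from the target at ADDR.  Returns
   0 on success, -1 on failure (as a PTRACE_PEEKDATA wrapper might when
   ptrace() reports EIO).  */
typedef int (*word_reader) (unsigned long addr, long *out);

/* Copy up to LEN bytes starting at ADDR into BUF, one word at a time.
   Stops at the first failed read and returns the number of bytes copied,
   so the caller can tell a short read from a complete one.  */
static size_t
read_prefix (word_reader read_word, unsigned long addr, void *buf, size_t len)
{
  unsigned char *dst = buf;
  size_t done = 0;

  assert (read_word != NULL && buf != NULL);

  while (done < len)
    {
      long word;
      size_t chunk = len - done < sizeof word ? len - done : sizeof word;

      if (read_word (addr + done, &word) != 0)
        break;
      memcpy (dst + done, &word, chunk);
      done += chunk;
    }
  return done;
}

/* Simulated target: only FAKE_READABLE bytes starting at FAKE_BASE exist.  */
#define FAKE_BASE      0x1000UL
#define FAKE_READABLE  1024UL
#define DESCR_SIZE     1056UL   /* object size quoted in the message */

static unsigned char fake_memory[FAKE_READABLE];

static int
fake_read_word (unsigned long addr, long *out)
{
  if (addr < FAKE_BASE || addr + sizeof (long) > FAKE_BASE + FAKE_READABLE)
    return -1;                  /* models the EIO failure */
  memcpy (out, fake_memory + (addr - FAKE_BASE), sizeof (long));
  return 0;
}

/* Try to read a DESCR_SIZE-byte object from the simulated target and
   return how many bytes were actually readable.  */
static size_t
demo_read_descriptor (void)
{
  unsigned char buf[DESCR_SIZE];

  return read_prefix (fake_read_word, FAKE_BASE, buf, sizeof buf);
}
```

In this model demo_read_descriptor() returns 1024, short of the 1056 bytes requested. A caller that treats any short read as a hard error would report a generic failure of the kind described above.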

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: H . J . Lu @ 2001-11-07 20:26 UTC
To: Kevin Buettner; +Cc: gdb-patches

On Mon, Nov 19, 2001 at 07:24:46PM -0700, Kevin Buettner wrote:
> On Nov 19, 5:42pm, Kevin Buettner wrote:
>
> > > # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
> > > # a.out
> > > # gdb a.out
> > > ...
> > > (gdb) att 14226
> > > Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
> > > ...
> > > lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid == GET_LWP (lp->ptid)' failed.
> > > An internal GDB error was detected. This may make further
> > >
> > > ex11.c is from glibc and 14226 is the first thread. Your patch may fix
> > > it also.
> >
> > I think this might be a different problem. I haven't been able to
> > reproduce the exact problem that you mentioned above, either with
> > or without my patch.
> >
> > I do occasionally see "Cannot find new threads: generic error". I'm
> > going to try to figure this one out...
>
> The "Cannot find new threads: generic error" that I'm seeing in this
> program is happening because td_ta_thr_iter() in libthread_db.so is
> requesting that gdb read the memory associated with a struct
> _pthread_descr_struct (which has size 1056). It turns out that gdb
> is able to read 1024 bytes of this struct, but no more. (ptrace()
> returns EIO when attempting to read more.)
>
> I'm not sure why this is happening. Could it be that the manager thread
> is in the midst of updating the descriptor data structures?

Can you try gdb 5.1 to see what happens with ex11.c?

H.J.

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-07 22:45 UTC
To: H . J . Lu; +Cc: gdb-patches

On Nov 19, 6:51pm, H . J . Lu wrote:

> > > > # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
> > > > # a.out
> > > > # gdb a.out
> > > > ...
> > > > (gdb) att 14226
> > > > Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
> > > > ...
> > > > lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid == GET_LWP (lp->ptid)' failed.
> > > > An internal GDB error was detected. This may make further
> > > >
> > > > ex11.c is from glibc and 14226 is the first thread. Your patch may fix
> > > > it also.
> > >
> > > I think this might be a different problem. I haven't been able to
> > > reproduce the exact problem that you mentioned above, either with
> > > or without my patch.
> > >
> > > I do occasionally see "Cannot find new threads: generic error". I'm
> > > going to try to figure this one out...
> >
> > The "Cannot find new threads: generic error" that I'm seeing in this
> > program is happening because td_ta_thr_iter() in libthread_db.so is
> > requesting that gdb read the memory associated with a struct
> > _pthread_descr_struct (which has size 1056). It turns out that gdb
> > is able to read 1024 bytes of this struct, but no more. (ptrace()
> > returns EIO when attempting to read more.)
> >
> > I'm not sure why this is happening. Could it be that the manager thread
> > is in the midst of updating the descriptor data structures?
>
> Can you try gdb 5.1 to see what happens with ex11.c?

I just tried it. It seems to behave about the same. It usually attaches
okay, but I am able to reproduce the "Cannot find new threads: generic error"
by using Jim's trick of repeatedly attaching and detaching.

Kevin

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-07 22:48 UTC
To: H . J . Lu; +Cc: gdb-patches

On Nov 19, 11:53pm, Kevin Buettner wrote:

> I just tried it. It seems to behave about the same. It usually attaches
> okay, but I am able to reproduce the "Cannot find new threads: generic error"
> by using Jim's trick of repeatedly attaching and detaching.

I now know a little bit more about this problem.

First, I should explain that I'm using Jim Blandy's trick of

    set height 0
    break <somewhere that will be hit by one or more threads>
    while 1
      attach <pid of main process>
      continue
      detach
    end

The "Cannot find new threads: generic error" seems to occur just (or
shortly) after one of the early threads has exited. Tomorrow, I'll
write a test to see if it's possible at all to attach to a
multithreaded process in which one of the threads has exited.

Kevin

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: H . J . Lu @ 2001-11-07 23:21 UTC
To: Kevin Buettner; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

On Tue, Nov 20, 2001 at 12:17:10AM -0700, Kevin Buettner wrote:
> On Nov 19, 11:53pm, Kevin Buettner wrote:
>
> > I just tried it. It seems to behave about the same. It usually attaches
> > okay, but I am able to reproduce the "Cannot find new threads: generic error"
> > by using Jim's trick of repeatedly attaching and detaching.
>
> I now know a little bit more about this problem.
>
> First, I should explain that I'm using Jim Blandy's trick of
>
>     set height 0
>     break <somewhere that will be hit by one or more threads>
>     while 1
>       attach <pid of main process>
>       continue
>       detach
>     end
>
> The "Cannot find new threads: generic error" seems to occur just (or
> shortly) after one of the early threads has exited. Tomorrow, I'll
> write a test to see if it's possible at all to attach to a
> multithreaded process in which one of the threads has exited.

Here is a modified ex11.c. It is easier to reproduce the gdb bug.

H.J.

[-- Attachment #2: ex11.c --]
[-- Type: text/plain, Size: 3606 bytes --]

/* Test program for timedout read/write lock functions.
   Copyright (C) 2000 Free Software Foundation, Inc.
   Contributed by Ulrich Drepper <drepper@redhat.com>, 2000.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB. If not,
   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
   Boston, MA 02111-1307, USA. */

#include <errno.h>
#include <error.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NWRITERS 15
#define WRITETRIES 10
#define NREADERS 15
#define READTRIES 15

#define TIMEOUT 1000000
#define DELAY   1000000

static pthread_rwlock_t lock = PTHREAD_RWLOCK_WRITER_NONRECURSIVE_INITIALIZER_NP;

static void *
writer_thread (void *nr)
{
  struct timespec ts;
  struct timespec delay;
  int n;

  ts.tv_sec = 1000000;
  ts.tv_nsec = TIMEOUT;

  delay.tv_sec = 1000;
  delay.tv_nsec = DELAY;

  for (n = 0; n < WRITETRIES; ++n)
    {
      do
        {
          clock_gettime (CLOCK_REALTIME, &ts);

          ts.tv_nsec += 2 * TIMEOUT;

          // printf ("writer thread %ld tries again\n", (long int) nr);
        }
      //while (pthread_rwlock_wrlock (&lock), 0);
      while (pthread_rwlock_timedwrlock (&lock, &ts) == ETIMEDOUT);

      printf ("writer thread %ld succeeded\n", (long int) nr);

      nanosleep (&delay, NULL);

      pthread_rwlock_unlock (&lock);

      printf ("writer thread %ld released\n", (long int) nr);
    }

  return NULL;
}

static void *
reader_thread (void *nr)
{
  struct timespec ts;
  struct timespec delay;
  int n;

  delay.tv_sec = 1000000;
  delay.tv_nsec = DELAY;

  for (n = 0; n < READTRIES; ++n)
    {
      do
        {
          clock_gettime (CLOCK_REALTIME, &ts);

          ts.tv_nsec += TIMEOUT;

          // printf ("reader thread %ld tries again\n", (long int) nr);
        }
      //while (pthread_rwlock_rdlock (&lock), 0);
      while (pthread_rwlock_timedrdlock (&lock, &ts) == ETIMEDOUT);

      printf ("reader thread %ld succeeded\n", (long int) nr);

      nanosleep (&delay, NULL);

      pthread_rwlock_unlock (&lock);

      printf ("reader thread %ld released\n", (long int) nr);
    }

  return NULL;
}

int
main (void)
{
  pthread_t thwr[NWRITERS];
  pthread_t thrd[NREADERS];
  int n;
  void *res;

  /* Make standard error the same as standard output. */
  dup2 (1, 2);

  /* Make sure we see all message, even those on stdout. */
  setvbuf (stdout, NULL, _IONBF, 0);

  for (n = 0; n < NWRITERS; ++n)
    {
      int err = pthread_create (&thwr[n], NULL, writer_thread,
                                (void *) (long int) n);

      if (err != 0)
        error (EXIT_FAILURE, err, "cannot create writer thread");
    }

  for (n = 0; n < NREADERS; ++n)
    {
      int err = pthread_create (&thrd[n], NULL, reader_thread,
                                (void *) (long int) n);

      if (err != 0)
        error (EXIT_FAILURE, err, "cannot create reader thread");
    }

  /* Wait for all the threads. */
  for (n = 0; n < NWRITERS; ++n)
    pthread_join (thwr[n], &res);
  for (n = 0; n < NREADERS; ++n)
    pthread_join (thrd[n], &res);

  return 0;
}

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-08 10:55 UTC
To: H . J . Lu; +Cc: gdb-patches

On Nov 19, 11:55pm, H . J . Lu wrote:

> Here is a modified ex11.c. It is easier to reproduce the gdb bug.

It is easier to attach to - the original finishes too quickly. But,
I still haven't been able to reproduce the problem that you're seeing.
(I tried both the 5.1 branch and the current development sources on
a stock Red Hat 7.2 machine.)

Is there some particular glibc or kernel version that I need to use?

Kevin

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: H . J . Lu @ 2001-11-08 10:25 UTC
To: Kevin Buettner; +Cc: gdb-patches

On Tue, Nov 20, 2001 at 03:20:44PM -0700, Kevin Buettner wrote:
> On Nov 19, 11:55pm, H . J . Lu wrote:
>
> > Here is a modified ex11.c. It is easier to reproduce the gdb bug.
>
> It is easier to attach to - the original finishes too quickly. But,
> I still haven't been able to reproduce the problem that you're seeing.
> (I tried both the 5.1 branch and the current development sources on
> a stock Red Hat 7.2 machine.)

It may be timing related.

>
> Is there some particular glibc or kernel version that I need to use?

I am using kernel 2.4.9-12 and glibc 2.2.4-19 from Red Hat.

H.J.

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Mark Kettenis @ 2001-11-08 13:03 UTC
To: Kevin Buettner; +Cc: gdb-patches

Kevin Buettner <kevinb@cygnus.com> writes:

> When doing a lin_lwp_attach_lwp(), it is sometimes possible to receive
> a SIGCHLD signal thus causing the waitpid() call to fail with EINTR.
> This in turn causes the second assert() in lin_lwp_attach_lwp() to
> fail.

Thanks for tracking this down. Note that getting this EINTR is a
side-effect of the tricks we play with sigsuspend in lin_lwp_wait.

> The fix is below. Okay to commit?

Looks fine to me. lin_lwp_attach_lwp() should not be called unless
the inferior is already running so blocking SIGCHLD here is OK.

Mark

* Re: [PATCH RFA] lin-lwp.c: Block SIGCHLD events when attaching
From: Kevin Buettner @ 2001-11-08 19:19 UTC
To: Mark Kettenis; +Cc: gdb-patches

On Nov 21, 12:52pm, Mark Kettenis wrote:

> Kevin Buettner <kevinb@cygnus.com> writes:
>
> > When doing a lin_lwp_attach_lwp(), it is sometimes possible to receive
> > a SIGCHLD signal thus causing the waitpid() call to fail with EINTR.
> > This in turn causes the second assert() in lin_lwp_attach_lwp() to
> > fail.
>
> Thanks for tracking this down. Note that getting this EINTR is a
> side-effect of the tricks we play with sigsuspend in lin_lwp_wait.
>
> > The fix is below. Okay to commit?
>
> Looks fine to me. lin_lwp_attach_lwp() should not be called unless
> the inferior is already running so blocking SIGCHLD here is OK.

Committed.