From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1907 invoked by alias); 25 Mar 2004 00:46:43 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 1896 invoked from network); 25 Mar 2004 00:46:41 -0000 Received: from unknown (HELO mx1.redhat.com) (66.187.233.31) by sources.redhat.com with SMTP; 25 Mar 2004 00:46:41 -0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.10/8.12.10) with ESMTP id i2P0kfWA012335 for ; Wed, 24 Mar 2004 19:46:41 -0500 Received: from pobox.toronto.redhat.com (pobox.toronto.redhat.com [172.16.14.4]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id i2P0kej06679; Wed, 24 Mar 2004 19:46:40 -0500 Received: from touchme.toronto.redhat.com (IDENT:postfix@touchme.toronto.redhat.com [172.16.14.9]) by pobox.toronto.redhat.com (8.12.8/8.12.8) with ESMTP id i2P0kexo007972; Wed, 24 Mar 2004 19:46:40 -0500 Received: from redhat.com (toocool.toronto.redhat.com [172.16.14.72]) by touchme.toronto.redhat.com (Postfix) with ESMTP id C505680008E; Wed, 24 Mar 2004 19:46:39 -0500 (EST) Message-ID: <40622BEF.8030403@redhat.com> Date: Thu, 25 Mar 2004 00:46:00 -0000 From: Jeff Johnston User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 MIME-Version: 1.0 To: Jeff Johnston Cc: Daniel Jacobowitz , gdb-patches@sources.redhat.com Subject: Re: [RFC]: fix for recycled thread ids References: <405A4089.1080605@redhat.com> <20040319015351.GA28443@nevyn.them.org> <405B3F83.4030503@redhat.com> <20040319190126.GA16950@nevyn.them.org> <405B4B8C.2060801@redhat.com> <20040319194011.GA18776@nevyn.them.org> <405B66F4.4090101@redhat.com> <20040324155130.GA27748@nevyn.them.org> <20040324165625.GA10256@nevyn.them.org> <4062041F.1010302@redhat.com> In-Reply-To: <4062041F.1010302@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2004-03/txt/msg00584.txt.bz2 Jeff Johnston wrote: > Daniel Jacobowitz wrote: > >> On Wed, Mar 24, 2004 at 10:51:30AM -0500, Daniel Jacobowitz wrote: >> >>> Fortunately, in there is a function to return the >>> runtime version of glibc. We should be able to use that - and the not >>> 100% valid, but generally valid and already assumed by thread_db, >>> assumption that a native GDB, when used to debug native programs, is >>> debugging the same version of the C library - to enable TD_DEATH when >>> it is safe to do so. This will let us detach the threads when they >>> die. >>> >>> That has its own risks, since the thread continues to run for a short >>> while after the death event is reported. For instance, in NPTL the >>> thread reports the event and then cleans up after itself; in LT I don't >>> remember whether the manager or the thread does this, but I think it's >>> the same. I already wrote limited code to handle this, if you search >>> for "zombie" in thread-db.c, so it should be OK. The gist is that we >>> remove it from the thread list right away, but do not detach the >>> thread. We resist attaching to zombies. >>> >>> At least, all that is how it looks to me. I'll experiment with >>> TD_DEATH before I speculate further. >> >> >> >> Here's a proof-of-concept patch. It is not quite right, in that now I >> can hit C-c about four times before I get the error(). It does >> illustrate what I was suggesting, though, minus the version checking. >> >> The reason we still get the segfaults is not that surprising. I spent >> a little while coaxing it into producing a core dump, and got this: >> >> (gdb) i threads >> 7 process 10303 0x40034531 in __nptl_create_event () from >> /lib/tls/i686/cmov/libpthread.so.0 >> 6 process 12467 0xffffe410 in __kernel_vsyscall () >> 5 process 12468 0xffffe410 in __kernel_vsyscall () >> 4 process 12469 0x0804884e in threadfunc (arg=0x25) at lphello.c:64 >> 3 process 12470 0x40034540 in __nptl_death_event () from >> /lib/tls/i686/cmov/libpthread.so.0 >> 2 process 12471 0xffffe410 in __kernel_vsyscall () >> * 1 process 12472 0xffffe410 in __kernel_vsyscall () >> >> So one thread is running, four are probably sleeping, one is dying... >> and one is creating another thread. I'd bet money that we aren't >> attached to the one being created yet. I can produce all sorts of >> other interesting GDB internal errors this way, too. >> > > I also tried the TD_DEATH method but gave up after running into > segfaults. I have to go back and recheck whether those segfaults could > occur without the Crtl-C being issued. I can't remember off-hand. > Just to confirm, I don't see any segfaults just running, but this code is extremely brittle on my RHEL3-U1 system. It is coming back to me why I abandoned TD_DEATH. On the first Ctrl-C I usually get: Program received signal SIGINT, Interrupt. [Switching to Thread -1226028112 (LWP 16580)] 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 (gdb) info threads [New Thread -1226163280 (LWP 16579)] Can't attach LWP 16579: Operation not permitted (gdb) info threads [New Thread -1226298448 (LWP 16578)] Can't attach LWP 16578: Operation not permitted (gdb) info threads 158 Thread -1226298448 (LWP 16578) 0xb75ce6b1 in __nptl_death_event () from /lib/tls/libpthread.so.0 157 Thread -1226163280 (LWP 16579) 0xb75ce6b1 in __nptl_death_event () from /lib/tls/libpthread.so.0 * 156 Thread -1226028112 (LWP 16580) 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2 1 Thread -1218598080 (LWP 16423) 0xb75ce6a1 in __nptl_create_event () from /lib/tls/libpthread.so.0 (gdb) c After continuing, the program will go shortly and then die. [Thread -1226163280 (zombie) exited] [New Thread -1226433616 (LWP 16674)] Cannot enable thread event reporting for Thread -1226433616 (LWP 16674): generic error My original fix doesn't break near as easily on RHEL-U1. Hitting enough Ctrl-C's does eventually trigger an error (occassionally even a gdb assert), but this may be part of catching the race condition before we know about the thread. Now, it is more than likely just a bug in the TD_DEATH event processing because we haven't exercised it. Would it be worth looking at implementing your original alternative for my patch? You asked me in a previous note to run with strace on and confirm that I was not getting WIFEXITTED for the dying threads. I confirmed this. I only get one WIFEXITTED and that is for the main thread. I still need to do the Eclipse test. -- Jeff J. >> Does this patch fix the problems you were seeing with less pathological >> test cases? Eclipse, in this case, is a less pathological test case, >> since it creates threads that actually do a bit of work. If it does, >> I'll add the necessary glibc version checking. The only way to fix the >> race on the create side is to add PTRACE_EVENT_CLONE support (this _is_ >> what it was for), but that will require some major surgery. >> > > As you know, I am also looking at the PTRACE_EVENT_CLONE stuff. I'll > try this patch out with the Eclipse test. I also will include your > lin-lwp linux event handler fix to see if that gets past the fork > problem they were seeing. > >> Index: thread-db.c >> =================================================================== >> RCS file: /cvs/src/src/gdb/thread-db.c,v >> retrieving revision 1.37 >> diff -u -p -r1.37 thread-db.c >> --- thread-db.c 29 Feb 2004 02:39:47 -0000 1.37 >> +++ thread-db.c 24 Mar 2004 16:47:23 -0000 >> @@ -1,6 +1,6 @@ >> /* libthread_db assisted debugging support, generic parts. >> >> - Copyright 1999, 2000, 2001, 2003 Free Software Foundation, Inc. >> + Copyright 1999, 2000, 2001, 2003, 2004 Free Software Foundation, Inc. >> >> This file is part of GDB. >> >> @@ -130,6 +130,7 @@ static CORE_ADDR td_death_bp_addr; >> static void thread_db_find_new_threads (void); >> static void attach_thread (ptid_t ptid, const td_thrhandle_t *th_p, >> const td_thrinfo_t *ti_p, int verbose); >> +static void detach_thread (ptid_t ptid, int verbose); >> >> >> /* Building process ids. */ >> @@ -154,6 +155,8 @@ struct private_thread_info >> unsigned int th_valid:1; >> unsigned int ti_valid:1; >> >> + unsigned int dying:1; >> + >> td_thrhandle_t th; >> td_thrinfo_t ti; >> }; >> @@ -501,7 +504,7 @@ enable_thread_event_reporting (void) >> /* Set the process wide mask saying which events we're interested >> in. */ >> td_event_emptyset (&events); >> td_event_addset (&events, TD_CREATE); >> -#if 0 >> +#if 1 >> /* FIXME: kettenis/2000-04-23: The event reporting facility is >> broken for TD_DEATH events in glibc 2.1.3, so don't enable it for >> now. */ >> @@ -696,6 +699,20 @@ attach_thread (ptid_t ptid, const td_thr >> struct thread_info *tp; >> td_err_e err; >> >> + /* We may already know about this thread, for instance when the >> + user has issued the `info threads' command before the SIGTRAP >> + for hitting the thread creation breakpoint was reported. */ >> + if (in_thread_list (ptid)) >> + { >> + tp = find_thread_pid (ptid); >> + gdb_assert (tp != NULL); >> + >> + if (!tp->private->dying) >> + return; >> + >> + delete_thread (ptid); >> + } >> + >> check_thread_signals (); >> >> /* Add the thread to GDB's thread list. */ >> @@ -741,8 +758,16 @@ thread_db_attach (char *args, int from_t >> static void >> detach_thread (ptid_t ptid, int verbose) >> { >> + struct thread_info *thread_info; >> + >> if (verbose) >> printf_unfiltered ("[%s exited]\n", target_pid_to_str (ptid)); >> + >> + /* Don't delete the thread now, because it still reports as active >> until >> + it has executed a few instructions after the event breakpoint. */ >> + thread_info = find_thread_pid (ptid); >> + gdb_assert (thread_info != NULL); >> + thread_info->private->dying = 1; >> } >> >> static void >> @@ -848,11 +873,7 @@ check_event (ptid_t ptid) >> { >> case TD_CREATE: >> >> - /* We may already know about this thread, for instance when the >> - user has issued the `info threads' command before the SIGTRAP >> - for hitting the thread creation breakpoint was reported. */ >> - if (!in_thread_list (ptid)) >> - attach_thread (ptid, msg.th_p, &ti, 1); >> + attach_thread (ptid, msg.th_p, &ti, 1); >> >> break; >> >> @@ -1121,8 +1142,7 @@ find_new_threads_callback (const td_thrh >> >> ptid = BUILD_THREAD (ti.ti_tid, GET_PID (inferior_ptid)); >> >> - if (!in_thread_list (ptid)) >> - attach_thread (ptid, th_p, &ti, 1); >> + attach_thread (ptid, th_p, &ti, 1); >> >> return 0; >> } >> >> > >