From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 15997 invoked by alias); 19 Mar 2004 19:01:29 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 15968 invoked from network); 19 Mar 2004 19:01:28 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17) by sources.redhat.com with SMTP; 19 Mar 2004 19:01:28 -0000
Received: from drow by nevyn.them.org with local (Exim 4.30 #1 (Debian)) id 1B4PFG-0004Sz-QT; Fri, 19 Mar 2004 14:01:26 -0500
Date: Fri, 19 Mar 2004 19:01:00 -0000
From: Daniel Jacobowitz
To: Jeff Johnston
Cc: gdb-patches@sources.redhat.com
Subject: Re: [RFC]: fix for recycled thread ids
Message-ID: <20040319190126.GA16950@nevyn.them.org>
Mail-Followup-To: Jeff Johnston , gdb-patches@sources.redhat.com
References: <405A4089.1080605@redhat.com> <20040319015351.GA28443@nevyn.them.org> <405B3F83.4030503@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <405B3F83.4030503@redhat.com>
User-Agent: Mutt/1.5.1i
X-SW-Source: 2004-03/txt/msg00465.txt.bz2

On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
> >
> >>The following patch fixes a problem when a user application creates a
> >>thread shortly after another thread has completed. For nptl, thread ids
> >>are addresses. If a thread completes/dies, the tid is available for reuse
> >>by a new thread.
> >
> >
> >Does NPTL re-use the TID quickly, or cycle around the way LT did so
> >that we only see this under high thread pressure?
> >
>
> I can't say for sure as I don't maintain libthread_db. The test case in
> question does create high thread pressure, but I think it would be a
> mistake to generalize and think that this couldn't happen in an existing
> application.

I know you don't maintain NPTL, but this is the sort of question that
we need to understand before we can fix the problem correctly. I see
that you've attached a testcase, so I'll take a look at it when I get
back from my trip on Monday.

> >>On RH9 and RHEL3, nptl threads do not have exit events associated with
> >>them. I have already discussed this with Daniel J. who feels that the
> >>kernels are not doing the right thing, but regardless, the current and
> >>previous RH nptl kernels are behaving this way and gdb needs to handle
> >>it. As such, when a new thread is created, if it is reusing the tid of a
> >>previous thread that gdb hasn't figured out isn't around any more, gdb
> >>ignores the create event and the new thread is not added. Ignoring the
> >>event is done because it is possible for gdb to find out about the thread
> >>before it's creation event is reported and so the create event can be
> >>redundant information.
> >
> >
> >What I haven't seen a good explanation of is what problem this causes.
> >If a thread goes away, and then a new thread using the same ID is
> >created, and then we stop, what do we lose besides the cosmetic fact
> >that there is no [New Thread] message? Does anything go wrong?
> >
> >Also, I would like the issue of whether or not it is a kernel bug
> >resolved before we discuss working around it in GDB.
> >
>
> The problem is if a global signal is passed on to the inferior program when
> there are threads we have not attached to, the process terminates. A
> Ctrl-C is such a signal. In the example program, we only attach to the
> first 100 threads and when the Ctrl-C is issued, we get:
>
> ptrace: No such process.
> thread_db_get_info: cannot get thread info: generic error
>
> The end-user is cooked.

OK. So what you're saying is, the problem is that we do not see that
the new thread has been created, so we do not attach to it. Is that
right?

Conceptually, we attach to LWPs, not to threads.
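
To make the thread-versus-LWP distinction concrete, here is a minimal sketch. It is not GDB code: every name in it (`thread_entry`, `thread_known`, `lwp_attached`, `record_attach`, `create_event_by_tid`, `create_event_by_lwp`) is hypothetical, and it assumes that a new thread reusing an old thread id still gets a fresh LWP id. It only contrasts a create-event check keyed on the thread id with one keyed on the LWP id.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are NOT GDB's data structures. */
struct thread_entry {
    unsigned long tid;  /* thread id; an address under NPTL, so reusable */
    long lwp;           /* kernel LWP id the thread runs on */
};

#define MAX_ENTRIES 16

static struct thread_entry thread_list[MAX_ENTRIES];
static size_t n_threads;
static long attached_lwps[MAX_ENTRIES];
static size_t n_lwps;

/* Is this thread id already in the thread list? */
static bool thread_known(unsigned long tid)
{
    for (size_t i = 0; i < n_threads; i++)
        if (thread_list[i].tid == tid)
            return true;
    return false;
}

/* Is this LWP already in the list of attached LWPs? */
static bool lwp_attached(long lwp)
{
    for (size_t i = 0; i < n_lwps; i++)
        if (attached_lwps[i] == lwp)
            return true;
    return false;
}

/* Record an attach: remember the LWP and add or refresh the thread entry. */
static void record_attach(unsigned long tid, long lwp)
{
    if (n_lwps < MAX_ENTRIES)
        attached_lwps[n_lwps++] = lwp;
    for (size_t i = 0; i < n_threads; i++) {
        if (thread_list[i].tid == tid) {
            thread_list[i].lwp = lwp;   /* stale entry: tid was reused */
            return;
        }
    }
    if (n_threads < MAX_ENTRIES) {
        thread_list[n_threads].tid = tid;
        thread_list[n_threads].lwp = lwp;
        n_threads++;
    }
}

/* Check keyed on thread id: a reused tid makes the event look redundant,
   so the new LWP is never attached.  Returns true if we attached. */
static bool create_event_by_tid(unsigned long tid, long lwp)
{
    if (thread_known(tid))
        return false;
    record_attach(tid, lwp);
    return true;
}

/* Check keyed on LWP id: a reused tid on a new LWP still gets attached. */
static bool create_event_by_lwp(unsigned long tid, long lwp)
{
    if (lwp_attached(lwp))
        return false;
    record_attach(tid, lwp);
    return true;
}
```

In this model, after `record_attach(0x40001000UL, 101)` for a thread that later exits without an exit event, `create_event_by_tid(0x40001000UL, 102)` returns false and LWP 102 stays unattached, while `create_event_by_lwp(0x40001000UL, 102)` returns true.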
That suggests to me that the correct fix is to ask the LWP layer if
the LWP is attached rather than looking it up in the thread list in
the first place. We've already got an appropriate list of LWPs though
we might need a new accessor.

> Regarding resolving this issue as a kernel error, any fix for RHEL3 won't
> get shipped until Update 3. I know of no scheduled update for RH9 and this
> would not qualify as a security update.

That's not what I said - I don't care whether an update is published
for any particular vendor's product. I want us to understand whether
we are working around a kernel bug or fixing an actual bug in GDB.
That's another part of the problem that we need to understand in order
to fix it correctly.

As the author of the kernel code in question, I think that it's a
kernel bug. Roland seemed to agree.

> Would it make sense to rename thread-db.c to lin-thread-db.c?

Probably not, but some explanatory comments may be in order.

--
Daniel Jacobowitz
MontaVista Software
Debian GNU/Linux Developer