From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 564 invoked by alias); 19 Mar 2004 19:35:42 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 551 invoked from network); 19 Mar 2004 19:35:41 -0000
Received: from unknown (HELO touchme.toronto.redhat.com) (216.129.200.20)
  by sources.redhat.com with SMTP; 19 Mar 2004 19:35:41 -0000
Received: from redhat.com (toocool.toronto.redhat.com [172.16.14.72])
  by touchme.toronto.redhat.com (Postfix) with ESMTP id 947278000DA;
  Fri, 19 Mar 2004 14:35:39 -0500 (EST)
Message-ID: <405B4B8C.2060801@redhat.com>
Date: Fri, 19 Mar 2004 19:35:00 -0000
From: Jeff Johnston
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 Netscape/7.1
MIME-Version: 1.0
To: Daniel Jacobowitz
Cc: gdb-patches@sources.redhat.com
Subject: Re: [RFC]: fix for recycled thread ids
References: <405A4089.1080605@redhat.com> <20040319015351.GA28443@nevyn.them.org> <405B3F83.4030503@redhat.com> <20040319190126.GA16950@nevyn.them.org>
In-Reply-To: <20040319190126.GA16950@nevyn.them.org>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2004-03/txt/msg00467.txt.bz2

Daniel Jacobowitz wrote:
> On Fri, Mar 19, 2004 at 01:44:19PM -0500, Jeff Johnston wrote:
>
>>Daniel Jacobowitz wrote:
>>
>>>On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
>>>
>>>
>>>>The following patch fixes a problem when a user application creates a
>>>>thread shortly after another thread has completed. For nptl, thread ids
>>>>are addresses. If a thread completes/dies, the tid is available for reuse
>>>>by a new thread.
>>>
>>>
>>>Does NPTL re-use the TID quickly, or cycle around the way LT did so
>>>that we only see this under high thread pressure?
>>>
>>
>>I can't say for sure as I don't maintain libthread_db.
>>The test case in
>>question does create high thread pressure, but I think it would be a
>>mistake to generalize and think that this couldn't happen in an existing
>>application.
>
>
> I know you don't maintain NPTL, but this is the sort of question that
> we need to understand before we can fix the problem correctly. I see
> that you've attached a testcase, so I'll take a look at it when I get
> back from my trip on Monday.
>

Ok, thanks.

>
>>>>On RH9 and RHEL3, nptl threads do not have exit events associated with
>>>>them. I have already discussed this with Daniel J. who feels that the
>>>>kernels are not doing the right thing, but regardless, the current and
>>>>previous RH nptl kernels are behaving this way and gdb needs to handle
>>>>it. As such, when a new thread is created, if it is reusing the tid of a
>>>>previous thread that gdb hasn't figured out isn't around any more, gdb
>>>>ignores the create event and the new thread is not added. Ignoring the
>>>>event is done because it is possible for gdb to find out about the thread
>>>>before its creation event is reported and so the create event can be
>>>>redundant information.
>>>
>>>
>>>What I haven't seen a good explanation of is what problem this causes.
>>>If a thread goes away, and then a new thread using the same ID is
>>>created, and then we stop, what do we lose besides the cosmetic fact
>>>that there is no [New Thread] message? Does anything go wrong?
>>>
>>>Also, I would like the issue of whether or not it is a kernel bug
>>>resolved before we discuss working around it in GDB.
>>>
>>
>>The problem is if a global signal is passed on to the inferior program when
>>there are threads we have not attached to, the process terminates. A
>>Ctrl-C is such a signal. In the example program, we only attach to the
>>first 100 threads and when the Ctrl-C is issued, we get:
>>
>>ptrace: No such process.
>>thread_db_get_info: cannot get thread info: generic error
>>
>>The end-user is cooked.
>
>
> OK.
> So what you're saying is, the problem is that we do not see that
> the new thread has been created, so we do not attach to it. Is that
> right?
>

Yes.

> Conceptually, we attach to LWPs, not to threads. That suggests to me
> that the correct fix is to ask the LWP layer if the LWP is attached
> rather than looking it up in the thread list in the first place.
> We've already got an appropriate list of LWPs though we might need a
> new accessor.
>

I like that idea. We still have to deal with the bogus thread list entry.
The routine prune_threads calls thread_db_alive and it won't realize the
thread info it has is bogus because it will find the tid is valid.

>
>>Regarding resolving this issue as a kernel error, any fix for RHEL3 won't
>>get shipped until Update 3. I know of no scheduled update for RH9 and this
>>would not qualify as a security update.
>
>
> That's not what I said - I don't care whether an update is published
> for any particular vendor's product. I want us to understand whether
> we are working around a kernel bug or fixing an actual bug in GDB.
> That's another part of the problem that we need to understand in order
> to fix it correctly.
>
> As the author of the kernel code in question, I think that it's a
> kernel bug. Roland seemed to agree.
>
>
>>Would it make sense to rename thread-db.c to lin-thread-db.c?
>
>
> Probably not, but some explanatory comments may be in order.
>
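
To make the LWP-based idea discussed above concrete, here is a rough sketch of what such an accessor might look like. This is not GDB's actual code: struct lwp_entry, lwp_is_attached and handle_create_event are hypothetical names made up for illustration, and the real LWP list in GDB may well be organized differently. The point is only that the create event is judged by LWP id, so a recycled thread id (an address under nptl) cannot hide a new thread.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an entry in the LWP layer's list.
   This is an assumption for illustration, not GDB's real structure. */
struct lwp_entry {
    long lwpid;                 /* kernel LWP id */
    struct lwp_entry *next;
};

/* Hypothetical accessor: report whether LWPID is already attached. */
bool lwp_is_attached(const struct lwp_entry *list, long lwpid)
{
    for (const struct lwp_entry *p = list; p != NULL; p = p->next)
        if (p->lwpid == lwpid)
            return true;
    return false;
}

/* Handle a thread-create event.  TID is the thread-library id, which
   may be recycled from a dead thread; it is deliberately NOT used to
   decide whether the event is redundant.  Returns true if the LWP was
   newly recorded, false if it was already attached (redundant event)
   or if memory could not be allocated. */
bool handle_create_event(struct lwp_entry **list, unsigned long tid, long lwpid)
{
    (void) tid;                 /* recycled ids make this unreliable */

    if (lwp_is_attached(*list, lwpid))
        return false;

    struct lwp_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return false;

    e->lwpid = lwpid;
    e->next = *list;
    *list = e;
    return true;
}
```

With this shape, two create events carrying the same tid but different LWP ids are both recorded, while a repeated event for an LWP that is already attached is ignored as redundant. The stale thread list entry that prune_threads fails to notice would still need separate handling.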