Daniel Jacobowitz wrote: > On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote: > >>The following patch fixes a problem when a user application creates a >>thread shortly after another thread has completed. For nptl, thread ids >>are addresses. If a thread completes/dies, the tid is available for reuse >> by a new thread. > > > Does NPTL re-use the TID quickly, or cycle around the way LT did so > that we only see this under high thread pressure? > I can't say for sure as I don't maintain libthread_db. The test case in question does create high thread pressure, but I think it would be a mistake to generalize and think that this couldn't happen in an existing application. > >>On RH9 and RHEL3, nptl threads do not have exit events associated with >>them. I have already discussed this with Daniel J. who feels that the >>kernels are not doing the right thing, but regardless, the current and >>previous RH nptl kernels are behaving this way and gdb needs to handle it. >>As such, when a new thread is created, if it is reusing the tid of a >>previous thread that gdb hasn't figured out isn't around any more, gdb >>ignores the create event and the new thread is not added. Ignoring the >>event is done because it is possible for gdb to find out about the thread >>before it's creation event is reported and so the create event can be >>redundant information. > > > What I haven't seen a good explanation of is what problem this causes. > If a thread goes away, and then a new thread using the same ID is > created, and then we stop, what do we lose besides the cosmetic fact > that there is no [New Thread] message? Does anything go wrong? > > Also, I would like the issue of whether or not it is a kernel bug > resolved before we discuss working around it in GDB. > The problem is if a global signal is passed on to the inferior program when there are threads we have not attached to, the process terminates. A Ctrl-C is such a signal. In the example program, we only attach to the first 100 threads and when the Ctrl-C is issued, we get: ptrace: No such process. thread_db_get_info: cannot get thread info: generic error The end-user is cooked. Regarding resolving this issue as a kernel error, any fix for RHEL3 won't get shipped until Update 3. I know of no scheduled update for RH9 and this would not qualify as a security update. If I read your subsequent comments right, you wouldn't have a problem with me just making the lwp compare directly in the thread-db.c code. This would mean the change would be all of 6 lines of code. Does it make a lot of sense to withhold this change in light of the time-frame a kernel fix could be issued and the simplicity of the change itself? > >>The following fix removes the problem by also checking if the original lwp >>given to the old thread in the list matches the lwp for the new thread. >>IIRC, there is at least one platform that allows threads to change their >>lwps dynamically. I do not believe that platform uses the thread-db layer, >>but to be safe, I did not use a direct compare between the original lwp and >>the new lwp. Instead, I added a target vector routine that by default >>never returns an unequal comparison. For linux, the comparison routine >>points to a new routine in lin-lwp.c which does the expected comparison. >>The comparison is based on C compare routines whereby 0 means equal and >>non-zero means unequal. > > > At least one such "platform" was NGPT, which runs on Linux (and worked > OK with thread-db.c, too). It's obsolete nowadays. > > >>Now, if a thread id for a new thread is found to already be on the list, >>the target comparison is made between the original lwp stored for the old >>thread and the lwp for the new thread. If the comparison is unequal, the >>old thread is deleted and then the new thread is added. Otherwise, the new >>thread event is ignored as before. > > > If we aren't going to support migration, then why not just store the > LWP ID in the thread PTID structure? That's why it has three fields, > I'd guess. > > I don't think that the target hook makes much sense, since thread-db.c > is basically Linux-specific at this point and since the contents being > passed to the hook are thread-db-specific. > Would it make sense to rename thread-db.c to lin-thread-db.c? -- Jeff J.