From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 32437 invoked by alias); 17 Sep 2011 15:29:38 -0000 Received: (qmail 32420 invoked by uid 22791); 17 Sep 2011 15:29:37 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL,BAYES_00,TW_FC X-Spam-Check-By: sourceware.org Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 17 Sep 2011 15:29:14 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=EU1-MAIL.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1R4wpN-0007Bi-M2 from pedro_alves@mentor.com ; Sat, 17 Sep 2011 08:29:14 -0700 Received: from scottsdale.localnet ([172.16.63.104]) by EU1-MAIL.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.1830); Sat, 17 Sep 2011 16:29:10 +0100 From: Pedro Alves To: gdb-patches@sourceware.org Subject: Re: wrong assumptions about pthread_t being numeric Date: Sat, 17 Sep 2011 15:47:00 -0000 User-Agent: KMail/1.13.6 (Linux/2.6.38-11-generic; KDE/4.7.0; x86_64; ; ) Cc: John Spencer References: <4E73D06F.603@barfooze.de> <4E73F1A4.2020606@barfooze.de> <201109171558.33536.pedro@codesourcery.com> In-Reply-To: <201109171558.33536.pedro@codesourcery.com> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201109171629.09383.pedro@codesourcery.com> X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2011-09/txt/msg00344.txt.bz2 On Saturday 17 September 2011 15:58:33, Pedro Alves wrote: > On Saturday 17 September 2011 02:02:28, John Spencer wrote: > > On 09/17/2011 02:30 AM, Pedro Alves wrote: > > > On Saturday 17 September 2011 00:13:10, John Spencer wrote: > > >> On 09/17/2011 01:00 AM, Pedro Alves wrote: > > >>> These are only built natively on solaris and aix respectively, so > > >>> let's just leave them alone. > > >>> > > >> I expected it to be desirable for a product in industrial use to be > > >> standard-compliant and not invoking undefined behavior. > > > Those files are tied to those platforms' thread_db/libc implementations. > > > There's absolutely no need to handle some other hipotetical libc that > > > defines pthread_t diferently there. If it appears, we'll handle it. > > > > > Chances are, some other changes would be necessary to make it really > > > work, not just build. > > > > > > > exactly. > > No, you're completely missing the point. The code in question is > not _using_ posix threads, as a normal application would. > It is interfacing with the system's libc to get at the thread support > library's internals, and letting the debugger control and instead the state > of threads. There's no portability requirement here. The major requirement > is that the code works with the libc it is intended to be interfacing with. > > pthread_t on a given system is not _required_ to be an arithmetic type, but > it certainly can. Sorry, I strongly object to changing the solaris or AIX > thread support just because you've learned that on some other systems > pthread_t may not be numeric. > > > for example in musl's case it is wrong to compare the > > underlying type (which is a struct pointer) with 0. > > musl does not support solaris nor aix... > > > in the implementation specific part it would practically be valid to > > compare it with > > NULL, but that's still invalid regarding the definition in POSIX. > > The gdb code in question _is_ peeking at implementation specific > parts! That's the point that's apparently gone missing. > > E.g., with such a program being debugged: > > pthread_t thread; > res = pthread_create (&thread, NULL, entry_func, NULL); > ... > > You can do: > > (gdb) p /x thread > $2 = 0x7ffff7829700 > > And correlate that with the thread in gdb's thread list: > > (gdb) info threads > 3 Thread 0x7ffff7028700 (LWP 29643) 0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82 > 2 Thread 0x7ffff7829700 (LWP 29640) 0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82 > * 1 Thread 0x7ffff7fcd720 (LWP 29618) main () at threads.c:50 > > See, it's thread 2. If on some other platform pthread_t is > > struct pthread_t { int foo; int bar; }; > > Then we'll need to have specific code in gdb's libthread_db.so handling > to print that. But on some other platform, it could be, > > struct pthread_t { long tid; int some_other_id_who_knows; }; > > and again, we'd need some code to be able to print this. > > So, you see, as we find platforms that need adapting to, we'll > do it, but it doesn't make much sense to do it before stumbling > on them. > > > so the few places that use a thread id had to be changed everywhere, to > > not have dozens of special case > > sections for any libc/OS combination... > > Again, let's not invent work until we find such a combination... > There's no way you can know what such a libc/thread_db's pthread_t > would look like until it exists. > > > also there are a few spots where the thread_t is numerically compared to > > some arbitrary int value > > (i have not yet evaluated where those originate from). > > Let us know when you find them... > > Maybe you're thinking of this in gdbserver/thread-db.c, where we > have: > > /* If the new thread ID is zero, a final thread ID will be available > later. Do not enable thread debugging yet. */ > if (ti.ti_tid == 0) > return 0; > > but again, this is an implementation detail of glibc that gdb needs > to know about! There's no portable null pthread_t, and even if > there was, there's no telling what other libc's do in this scenario. > Again, there's _no_ way to make this portable, because we _are_ part > of the implementation. > > Code doing the same in gdb, where the comments are a bit more clear: > > if (ti.ti_tid == 0) > { > /* A thread ID of zero means that this is the main thread, but > glibc has not yet initialized thread-local storage and the > pthread library. We do not know what the thread's TID will > be yet. Just enable event reporting and otherwise ignore > it. */ > > > this also can't > > work when thread_t is no number. > > this happens for example in the if statement leading to the posted > > printf format string errors. > > > > i expect that the number of actual fixes would be small, but it might > > require another way to get to a thread id. > > > > if the id is only used for textual representation, it would be > > sufficient if gdb would assign a new unique number > > on each successful pthread_create. > > if it is also used to find a thread internally, it had to have a list of > > tid numbers -> pthread_t. > > Again, when we have a libc/thread_db that requires that, we'll do it. > Until then, we'll keep the code simple, thank you. > > > > > > > >>>> thread-db.c: In function 'find_one_thread': > > >>>> thread-db.c:295:7: error: format '%ld' expects type 'long int', but > > >>>> argument 3 has type 'thread_t' > > >>>> thread-db.c: In function 'attach_thread': > > >>>> thread-db.c:335:7: error: format '%ld' expects type 'long int', but > > >>>> argument 3 has type 'thread_t' > > >>>> thread-db.c:341:9: error: format '%ld' expects type 'long int', but > > >>>> argument 2 has type 'thread_t' > > >>> So just cast it to long, and you're done. > > >>> > > >> pthread_t could legally be a struct, which you can't just cast to a long. > > > No need to complicate things for an hipotetical scenario. The set of > > > libc's in existence is finite. If we were to handle a struct pthread_t, > > > we'd need to be able to print it, and so we'd need some libc specific > > > way to do it, something autoconf'ed. There's no need to invent work. > > > > > i disagree. adding a proper solution once is superior to creating dozens > > of special case hacks. > > What's the "proper" solution then? The only possible way I can think > of to handle some libc what uses a struct for pthread_t, is with some way at > runtime to know which libc/thread_db gdb is talking to, failing that, > #ifdefs and/or autoconfig'ury. Since a simple cast will > work for glibc, android, musl, and whatever other libc's happen be > be building gdbserver today, what's the point? I'll even go a step further, and suggest you do expose pthread_t as a typedef to long in public headers, and cast it to pointer internally in musl, just like glibc and most other libc's do. Hiding the fact that you implement things with a pointer could even be a good thing. It's just easier for everybody else that way. There's really no point in being different for the sake of it. > > > also it saves a lot of time in the long term. > > No, it wastes time right now coming up with a fix for an made up > problem that does not exist in practice today. > > -- Pedro Alves