From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22522 invoked by alias); 17 Sep 2011 14:58:55 -0000 Received: (qmail 22514 invoked by uid 22791); 17 Sep 2011 14:58:54 -0000 X-SWARE-Spam-Status: No, hits=-1.8 required=5.0 tests=AWL,BAYES_00,TW_FC X-Spam-Check-By: sourceware.org Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 17 Sep 2011 14:58:38 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=EU1-MAIL.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1R4wLl-0005ra-PF from pedro_alves@mentor.com ; Sat, 17 Sep 2011 07:58:37 -0700 Received: from scottsdale.localnet ([172.16.63.104]) by EU1-MAIL.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.1830); Sat, 17 Sep 2011 15:58:35 +0100 From: Pedro Alves To: John Spencer Subject: Re: wrong assumptions about pthread_t being numeric Date: Sat, 17 Sep 2011 15:29:00 -0000 User-Agent: KMail/1.13.6 (Linux/2.6.38-11-generic; KDE/4.7.0; x86_64; ; ) Cc: gdb-patches@sourceware.org References: <4E73D06F.603@barfooze.de> <201109170130.42276.pedro@codesourcery.com> <4E73F1A4.2020606@barfooze.de> In-Reply-To: <4E73F1A4.2020606@barfooze.de> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201109171558.33536.pedro@codesourcery.com> X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2011-09/txt/msg00343.txt.bz2 On Saturday 17 September 2011 02:02:28, John Spencer wrote: > On 09/17/2011 02:30 AM, Pedro Alves wrote: > > On Saturday 17 September 2011 00:13:10, John Spencer wrote: > >> On 09/17/2011 01:00 AM, Pedro Alves wrote: > >>> These are only built natively on solaris and aix respectively, so > >>> let's just leave them alone. > >>> > >> I expected it to be desirable for a product in industrial use to be > >> standard-compliant and not invoking undefined behavior. > > Those files are tied to those platforms' thread_db/libc implementations. > > There's absolutely no need to handle some other hipotetical libc that > > defines pthread_t diferently there. If it appears, we'll handle it. > > > Chances are, some other changes would be necessary to make it really > > work, not just build. > > > > exactly. No, you're completely missing the point. The code in question is not _using_ posix threads, as a normal application would. It is interfacing with the system's libc to get at the thread support library's internals, and letting the debugger control and instead the state of threads. There's no portability requirement here. The major requirement is that the code works with the libc it is intended to be interfacing with. pthread_t on a given system is not _required_ to be an arithmetic type, but it certainly can. Sorry, I strongly object to changing the solaris or AIX thread support just because you've learned that on some other systems pthread_t may not be numeric. > for example in musl's case it is wrong to compare the > underlying type (which is a struct pointer) with 0. musl does not support solaris nor aix... > in the implementation specific part it would practically be valid to > compare it with > NULL, but that's still invalid regarding the definition in POSIX. The gdb code in question _is_ peeking at implementation specific parts! That's the point that's apparently gone missing. E.g., with such a program being debugged: pthread_t thread; res = pthread_create (&thread, NULL, entry_func, NULL); ... You can do: (gdb) p /x thread $2 = 0x7ffff7829700 And correlate that with the thread in gdb's thread list: (gdb) info threads 3 Thread 0x7ffff7028700 (LWP 29643) 0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82 2 Thread 0x7ffff7829700 (LWP 29640) 0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82 * 1 Thread 0x7ffff7fcd720 (LWP 29618) main () at threads.c:50 See, it's thread 2. If on some other platform pthread_t is struct pthread_t { int foo; int bar; }; Then we'll need to have specific code in gdb's libthread_db.so handling to print that. But on some other platform, it could be, struct pthread_t { long tid; int some_other_id_who_knows; }; and again, we'd need some code to be able to print this. So, you see, as we find platforms that need adapting to, we'll do it, but it doesn't make much sense to do it before stumbling on them. > so the few places that use a thread id had to be changed everywhere, to > not have dozens of special case > sections for any libc/OS combination... Again, let's not invent work until we find such a combination... There's no way you can know what such a libc/thread_db's pthread_t would look like until it exists. > also there are a few spots where the thread_t is numerically compared to > some arbitrary int value > (i have not yet evaluated where those originate from). Let us know when you find them... Maybe you're thinking of this in gdbserver/thread-db.c, where we have: /* If the new thread ID is zero, a final thread ID will be available later. Do not enable thread debugging yet. */ if (ti.ti_tid == 0) return 0; but again, this is an implementation detail of glibc that gdb needs to know about! There's no portable null pthread_t, and even if there was, there's no telling what other libc's do in this scenario. Again, there's _no_ way to make this portable, because we _are_ part of the implementation. Code doing the same in gdb, where the comments are a bit more clear: if (ti.ti_tid == 0) { /* A thread ID of zero means that this is the main thread, but glibc has not yet initialized thread-local storage and the pthread library. We do not know what the thread's TID will be yet. Just enable event reporting and otherwise ignore it. */ > this also can't > work when thread_t is no number. > this happens for example in the if statement leading to the posted > printf format string errors. > i expect that the number of actual fixes would be small, but it might > require another way to get to a thread id. > > if the id is only used for textual representation, it would be > sufficient if gdb would assign a new unique number > on each successful pthread_create. > if it is also used to find a thread internally, it had to have a list of > tid numbers -> pthread_t. Again, when we have a libc/thread_db that requires that, we'll do it. Until then, we'll keep the code simple, thank you. > > > >>>> thread-db.c: In function 'find_one_thread': > >>>> thread-db.c:295:7: error: format '%ld' expects type 'long int', but > >>>> argument 3 has type 'thread_t' > >>>> thread-db.c: In function 'attach_thread': > >>>> thread-db.c:335:7: error: format '%ld' expects type 'long int', but > >>>> argument 3 has type 'thread_t' > >>>> thread-db.c:341:9: error: format '%ld' expects type 'long int', but > >>>> argument 2 has type 'thread_t' > >>> So just cast it to long, and you're done. > >>> > >> pthread_t could legally be a struct, which you can't just cast to a long. > > No need to complicate things for an hipotetical scenario. The set of > > libc's in existence is finite. If we were to handle a struct pthread_t, > > we'd need to be able to print it, and so we'd need some libc specific > > way to do it, something autoconf'ed. There's no need to invent work. > > > i disagree. adding a proper solution once is superior to creating dozens > of special case hacks. What's the "proper" solution then? The only possible way I can think of to handle some libc what uses a struct for pthread_t, is with some way at runtime to know which libc/thread_db gdb is talking to, failing that, #ifdefs and/or autoconfig'ury. Since a simple cast will work for glibc, android, musl, and whatever other libc's happen be be building gdbserver today, what's the point? > also it saves a lot of time in the long term. No, it wastes time right now coming up with a fix for an made up problem that does not exist in practice today. -- Pedro Alves