From: John Spencer <maillist-gdbpatches@barfooze.de>
To: gdb-patches@sourceware.org
Cc: Pedro Alves <pedro@codesourcery.com>
Subject: Re: wrong assumptions about pthread_t being numeric
Date: Sat, 01 Oct 2011 01:02:00 -0000 [thread overview]
Message-ID: <4E8665ED.4070905@barfooze.de> (raw)
In-Reply-To: <201109171629.09383.pedro@codesourcery.com>
On 09/17/2011 05:29 PM, Pedro Alves wrote:
> On Saturday 17 September 2011 15:58:33, Pedro Alves wrote:
>> On Saturday 17 September 2011 02:02:28, John Spencer wrote:
>>> On 09/17/2011 02:30 AM, Pedro Alves wrote:
>>>> On Saturday 17 September 2011 00:13:10, John Spencer wrote:
>>>>> On 09/17/2011 01:00 AM, Pedro Alves wrote:
>>>>>> These are only built natively on solaris and aix respectively, so
>>>>>> let's just leave them alone.
>>>>>>
>>>>> I expected it to be desirable for a product in industrial use to be
>>>>> standard-compliant and not invoking undefined behavior.
>>>> Those files are tied to those platforms' thread_db/libc implementations.
>>>> There's absolutely no need to handle some other hipotetical libc that
>>>> defines pthread_t diferently there. If it appears, we'll handle it.
>>>> Chances are, some other changes would be necessary to make it really
>>>> work, not just build.
>>>>
>>> exactly.
>> No, you're completely missing the point. The code in question is
>> not _using_ posix threads, as a normal application would.
>> It is interfacing with the system's libc to get at the thread support
>> library's internals, and letting the debugger control and instead the state
>> of threads. There's no portability requirement here. The major requirement
>> is that the code works with the libc it is intended to be interfacing with.
>>
>> pthread_t on a given system is not _required_ to be an arithmetic type, but
>> it certainly can. Sorry, I strongly object to changing the solaris or AIX
>> thread support just because you've learned that on some other systems
>> pthread_t may not be numeric.
>>
>>> for example in musl's case it is wrong to compare the
>>> underlying type (which is a struct pointer) with 0.
>> musl does not support solaris nor aix...
>>
>>> in the implementation specific part it would practically be valid to
>>> compare it with
>>> NULL, but that's still invalid regarding the definition in POSIX.
>> The gdb code in question _is_ peeking at implementation specific
>> parts! That's the point that's apparently gone missing.
>>
>> E.g., with such a program being debugged:
>>
>> pthread_t thread;
>> res = pthread_create (&thread, NULL, entry_func, NULL);
>> ...
>>
>> You can do:
>>
>> (gdb) p /x thread
>> $2 = 0x7ffff7829700
>>
>> And correlate that with the thread in gdb's thread list:
>>
>> (gdb) info threads
>> 3 Thread 0x7ffff7028700 (LWP 29643) 0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
>> 2 Thread 0x7ffff7829700 (LWP 29640) 0x00007ffff78d75ad in nanosleep () at ../sysdeps/unix/syscall-template.S:82
>> * 1 Thread 0x7ffff7fcd720 (LWP 29618) main () at threads.c:50
>>
>> See, it's thread 2. If on some other platform pthread_t is
>>
>> struct pthread_t { int foo; int bar; };
>>
>> Then we'll need to have specific code in gdb's libthread_db.so handling
>> to print that. But on some other platform, it could be,
>>
>> struct pthread_t { long tid; int some_other_id_who_knows; };
>>
>> and again, we'd need some code to be able to print this.
>>
>> So, you see, as we find platforms that need adapting to, we'll
>> do it, but it doesn't make much sense to do it before stumbling
>> on them.
>>
>>> so the few places that use a thread id had to be changed everywhere, to
>>> not have dozens of special case
>>> sections for any libc/OS combination...
>> Again, let's not invent work until we find such a combination...
>> There's no way you can know what such a libc/thread_db's pthread_t
>> would look like until it exists.
>>
>>> also there are a few spots where the thread_t is numerically compared to
>>> some arbitrary int value
>>> (i have not yet evaluated where those originate from).
>> Let us know when you find them...
>>
>> Maybe you're thinking of this in gdbserver/thread-db.c, where we
>> have:
>>
>> /* If the new thread ID is zero, a final thread ID will be available
>> later. Do not enable thread debugging yet. */
>> if (ti.ti_tid == 0)
>> return 0;
>>
>> but again, this is an implementation detail of glibc that gdb needs
>> to know about! There's no portable null pthread_t, and even if
>> there was, there's no telling what other libc's do in this scenario.
>> Again, there's _no_ way to make this portable, because we _are_ part
>> of the implementation.
>>
>> Code doing the same in gdb, where the comments are a bit more clear:
>>
>> if (ti.ti_tid == 0)
>> {
>> /* A thread ID of zero means that this is the main thread, but
>> glibc has not yet initialized thread-local storage and the
>> pthread library. We do not know what the thread's TID will
>> be yet. Just enable event reporting and otherwise ignore
>> it. */
>>
>>> this also can't
>>> work when thread_t is no number.
>>> this happens for example in the if statement leading to the posted
>>> printf format string errors.
>>
>>> i expect that the number of actual fixes would be small, but it might
>>> require another way to get to a thread id.
>>>
>>> if the id is only used for textual representation, it would be
>>> sufficient if gdb would assign a new unique number
>>> on each successful pthread_create.
>>> if it is also used to find a thread internally, it had to have a list of
>>> tid numbers -> pthread_t.
>> Again, when we have a libc/thread_db that requires that, we'll do it.
>> Until then, we'll keep the code simple, thank you.
>>
>>>
>>>>>>> thread-db.c: In function 'find_one_thread':
>>>>>>> thread-db.c:295:7: error: format '%ld' expects type 'long int', but
>>>>>>> argument 3 has type 'thread_t'
>>>>>>> thread-db.c: In function 'attach_thread':
>>>>>>> thread-db.c:335:7: error: format '%ld' expects type 'long int', but
>>>>>>> argument 3 has type 'thread_t'
>>>>>>> thread-db.c:341:9: error: format '%ld' expects type 'long int', but
>>>>>>> argument 2 has type 'thread_t'
>>>>>> So just cast it to long, and you're done.
>>>>>>
>>>>> pthread_t could legally be a struct, which you can't just cast to a long.
>>>> No need to complicate things for an hipotetical scenario. The set of
>>>> libc's in existence is finite. If we were to handle a struct pthread_t,
>>>> we'd need to be able to print it, and so we'd need some libc specific
>>>> way to do it, something autoconf'ed. There's no need to invent work.
>>>>
>>> i disagree. adding a proper solution once is superior to creating dozens
>>> of special case hacks.
>> What's the "proper" solution then? The only possible way I can think
>> of to handle some libc what uses a struct for pthread_t, is with some way at
>> runtime to know which libc/thread_db gdb is talking to, failing that,
>> #ifdefs and/or autoconfig'ury. Since a simple cast will
>> work for glibc, android, musl, and whatever other libc's happen be
>> be building gdbserver today, what's the point?
> I'll even go a step further, and suggest you do expose pthread_t
> as a typedef to long in public headers, and cast it to pointer internally
> in musl, just like glibc and most other libc's do. Hiding the fact
> that you implement things with a pointer could even be a good thing.
>
> It's just easier for everybody else that way. There's really no point in
> being different for the sake of it.
>
that's definitely going to far, you can't dictate how libcs should
handle an opaque type internally just to keep broken code working.
since it seems to much effort to change it to a proper solution, there
should be at least a explicit function/macro which takes a thread_t and
converts it to long, since it is assumed in a couple of spots that it is
of this type.
that is exactly what my patch does.
and as you wished, it fixes the current issue with minimal effort.
>>> also it saves a lot of time in the long term.
>> No, it wastes time right now coming up with a fix for an made up
>> problem that does not exist in practice today.
>>
>>
next prev parent reply other threads:[~2011-10-01 1:02 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-16 23:01 John Spencer
2011-09-16 23:16 ` Pedro Alves
2011-09-17 0:31 ` John Spencer
2011-09-17 1:05 ` Pedro Alves
2011-09-17 1:23 ` John Spencer
2011-09-17 2:29 ` Matt Rice
2011-09-17 6:47 ` Joel Brobecker
2011-09-18 0:11 ` John Spencer
2011-09-29 2:31 ` John Spencer
2011-09-29 8:39 ` Kai Tietz
2011-10-01 1:08 ` John Spencer
2011-09-29 11:10 ` Pedro Alves
2011-09-17 15:29 ` Pedro Alves
2011-09-17 15:47 ` Pedro Alves
2011-10-01 1:02 ` John Spencer [this message]
2011-10-01 2:00 ` Pedro Alves
2011-10-01 9:00 ` Mark Kettenis
2011-10-01 9:14 ` Kevin Pouget
2011-10-05 8:02 ` John Spencer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4E8665ED.4070905@barfooze.de \
--to=maillist-gdbpatches@barfooze.de \
--cc=gdb-patches@sourceware.org \
--cc=pedro@codesourcery.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox