From: Simon Marchi via Gdb-patches <gdb-patches@sourceware.org>
To: Tom de Vries <tdevries@suse.de>, gdb-patches@sourceware.org
Subject: Re: [PATCH][gdb/tdep] Use pid to choose x86_64 process 64/32-bitness
Date: Fri, 7 May 2021 15:27:12 -0400	[thread overview]
Message-ID: <b619347f-62a9-bccf-193a-4855acc22daf@polymtl.ca> (raw)
In-Reply-To: <20210507084402.GA14817@delia>

On 2021-05-07 4:44 a.m., Tom de Vries wrote:
> Hi,
> 
> In a linux kernel mailing list discussion, it was mentioned that "gdb has
> this odd thing where it takes the 64-bit vs 32-bit data for the whole process
> from one thread, and picks the worst possible thread to do it (ie explicitly
> not even the main thread, ...)" [1].
> 
> The picking of the thread is done here in
> x86_linux_nat_target::read_description:
> ...
>   /* GNU/Linux LWP ID's are process ID's.  */
>   tid = inferior_ptid.lwp ();
>   if (tid == 0)
>     tid = inferior_ptid.pid (); /* Not a threaded program.  */
> ...
> 
> To understand what this code does, let's investigate a scenario in which
> inferior_ptid.lwp () != inferior_ptid.pid ().
> 
> Say we start exec jit-attach-pie, identified with pid x.  The main thread
> starts another thread that sleeps, and then the main thread waits for the
> sleeping thread.  So we have two threads, identified with LWP IDs x and x+1:
> ...
> PID  LWP  CMD
> x    x    ./jit-attach-pie
> x    x+1  ./jit-attach-pie
> ...
> [ The thread with LWP x is known as the thread group leader. ]
> 
> When attaching to this exec using the pid, gdb does a stop_all_threads which
> iterates over all the threads, first LWP x, and then LWP x+1.
> 
> So the state we arrive with at x86_linux_nat_target::read_description is:
> ...
> (gdb) p inferior_ptid
> $1 = {m_pid = x, m_lwp = x+1, m_tid = 0}
> ...
> and consequently we probe 64/32-bitness from thread LWP x+1.
> 
> [ Note that this is different from when gdb doesn't attach but instead
> launches the exec itself, in which case there's no stop_all_threads needed,
> and the probed thread is LWP x. ]

Well, it's mostly because when gdb launches the program itself, there's
only one thread to begin with, and the io_uring threads are likely not
there yet.

> According to aforementioned remark, a better choice would have been the main
> thread, that is, LWP x.

That doesn't tell the whole story though.  Up to now, probing a
non-main thread was perhaps a slightly strange thing to do, but it
worked, because all threads were set up the same way in the respect
that matters here: they all had the same CS/DS contents (the registers
we use to detect the architecture).  The failure happens when io_uring
threads are present.  These threads are a strange kind: they look like
ordinary userspace threads, except that they never return to
userspace.  Because of that, they don't have a meaningful userspace
register state, and reading their registers returns garbage.  So if we
pick one of these threads to probe the architecture of the process
(which, as you mention below, is bogus, but that's how it works now),
we get bad results.
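
For context, the probe itself boils down to a PTRACE_PEEKUSER read of
the CS selector.  Here is a minimal standalone sketch of the idea, not
gdb's exact code; the 0x33 value is the usual Linux 64-bit __USER_CS
selector and is my assumption here:
...
/* Sketch: decide whether thread TID is running 64-bit code by reading
   its CS selector, roughly what x86_linux_nat_target::read_description
   does today.  */
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <stddef.h>
#include <errno.h>

static int
thread_is_64bit (pid_t tid)
{
  errno = 0;
  long cs = ptrace (PTRACE_PEEKUSER, tid,
		    offsetof (struct user_regs_struct, cs), 0);
  if (errno != 0)
    return -1;		/* Register read failed; caller must decide.  */
  return cs == 0x33;	/* Usual Linux 64-bit user code selector.  */
}
...
For an io-wq thread the read itself succeeds, but the value it returns
isn't meaningful, which is the failure described above.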

From what I read in the lkml thread you linked, the kernel's plan is
to mitigate this by filling in the register state of these threads
with something GDB is happy with, even though that's unnecessary
otherwise.  I don't know what the state of that is, though.  But I
agree it would be good to fix this on our side as well.

> 
> This patch implements that choice, by simply doing:
> ...
>   tid = inferior_ptid.pid ();

I think that's ok.

> ...
> 
> The fact that gdb makes a per-process choice for 64/32-bitness is a problem in
> itself: each thread can be in either 64 or 32 bit mode.  That is a problem
> that this patch doesn't fix.

Not only that, but from what I understood, it's not right to decide
once and for all whether a thread is 32-bit or 64-bit: it can switch
between the two modes.  So all we can really know is what mode a
thread is in right now, while it is stopped.  Who knows what mode it
will be in five instructions from now.

This is actually a problem for people who debug early boot code with
QEMU, because there the "thread" goes from 32-bit to 64-bit mode at
some point.  To handle this, it sounds like GDB should re-evaluate the
architecture of each thread every time it stops.  That sounds
expensive, but maybe this mode could live behind a knob that is
disabled by default, for these special cases.

There might also be features that can only work if we assume the
architecture of a thread never changes; I'm not sure.

Also, we already have partial machinery for threads with different
architectures; that's why we have target_ops::thread_architecture.
But read_description is used to set inferior::gdbarch, which is kind
of the "default" arch of the process, used for many things.  And the
default implementation of thread_architecture,
process_stratum_target::thread_architecture, just returns
inf->gdbarch.  So in practice, today, on Linux we really do have one
single architecture for the whole inferior.  The exception is
aarch64-linux, which can return different architectures for different
SVE configurations.
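
For the curious, the aarch64 override is along these lines (a
paraphrased sketch from memory, not the exact source):
...
/* Sketch of aarch64_linux_nat_target::thread_architecture: look up a
   gdbarch matching the thread's current SVE vector length instead of
   just returning inf->gdbarch.  */
struct gdbarch *
aarch64_linux_nat_target::thread_architecture (ptid_t ptid)
{
  /* Vector quotient of this particular thread, read via ptrace.  */
  uint64_t vq = aarch64_sve_get_vq (ptid.lwp ());

  struct gdbarch_info info;
  info.bfd_arch_info = bfd_lookup_arch (bfd_arch_aarch64, bfd_mach_aarch64);
  info.id = (int *) (vq == 0 ? -1 : vq);  /* Encode the VQ in the key.  */
  return gdbarch_find_by_info (info);
}
...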

> Tested on x86_64-linux.
> 
> [1] https://lore.kernel.org/io-uring/\
>   CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com/

Hmm, the link is not clickable / does not lead to the right place,
because it is split in two.

> 
> Any comments?

1. I think it would be good to have a test for this.  I wanted to try
making (or finding) a program that uses io_uring and see the problem
with my own eyes, but I haven't had time yet.  Still, we could have
such a test in our testsuite (checking first that the kernel actually
supports io_uring, of course).  We could validate that when we select
one of the io threads and do things like backtrace, the failure is
handled gracefully.
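
For reference, here is a minimal sketch of such a test program,
assuming liburing is available.  IOSQE_ASYNC should force the request
onto an io-wq worker, though the workers are created lazily and may
exit when idle, so the window to observe them can be short:
...
/* Sketch: force an io_uring request onto an async worker so the
   kernel creates an io-wq thread, then wait so gdb can attach.
   Build with: gcc uring-test.c -luring  */
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
  struct io_uring ring;
  static char buf[4096];
  int fd = open ("/etc/hostname", O_RDONLY);

  if (fd < 0 || io_uring_queue_init (8, &ring, 0) < 0)
    return 1;

  struct io_uring_sqe *sqe = io_uring_get_sqe (&ring);
  io_uring_prep_read (sqe, fd, buf, sizeof buf, 0);
  sqe->flags |= IOSQE_ASYNC;	/* Punt to an io-wq worker thread.  */
  io_uring_submit (&ring);

  struct io_uring_cqe *cqe;
  io_uring_wait_cqe (&ring, &cqe);

  printf ("pid: %d\n", (int) getpid ());
  pause ();			/* Attach with gdb now.  */
  return 0;
}
...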

2. There are other architectures than x86-64 where it looks like we
would also probe a non-main thread, ARM for example:

    https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/arm-linux-nat.c;h=662dade0a122f8adf25da91577b1de375df7785b;hb=HEAD

It might not fail the way it does on x86-64, that depends on the
kernel implementation, but we should probably change them the same
way.

3. Have you tested attaching to a program where the main thread has
exited?  It might still work (and there might be a test for it in our
testsuite) because the thread group leader stays around as a zombie as
long as other threads exist.  But it would be good to test it.
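
Such a test program can be tiny, something like this sketch:
...
/* Sketch: the thread group leader exits while a second thread keeps
   the process alive, so attaching sees a zombie leader.  */
#include <pthread.h>
#include <unistd.h>

static void *
worker (void *arg)
{
  pause ();		/* Stay alive for gdb to attach to.  */
  return NULL;
}

int
main (void)
{
  pthread_t thread;
  pthread_create (&thread, NULL, worker, NULL);
  pthread_exit (NULL);	/* Leader becomes a zombie; the process lives on.  */
}
...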

Simon

Thread overview: 10+ messages
2021-05-07  8:44 Tom de Vries
2021-05-07 19:27 ` Simon Marchi via Gdb-patches [this message]
2021-05-07 20:06   ` Luis Machado via Gdb-patches
2021-05-19 16:32   ` Tom de Vries
2021-05-20 10:42     ` Alan Hayward via Gdb-patches
2021-05-20 16:07       ` Simon Marchi via Gdb-patches
2021-05-21 10:50         ` Alan Hayward via Gdb-patches
2021-05-22  8:32           ` Tom de Vries
2021-05-22  9:56             ` Tom de Vries
2021-05-23  1:27               ` Simon Marchi via Gdb-patches
