Mirror of the gdb-patches mailing list
From: Tom de Vries <tdevries@suse.de>
To: Simon Marchi <simon.marchi@polymtl.ca>, gdb-patches@sourceware.org
Cc: Andreas Arnez <arnez@linux.ibm.com>
Subject: Re: [PATCH][gdb/tdep] Use pid to choose x86_64 process 64/32-bitness
Date: Wed, 19 May 2021 18:32:40 +0200	[thread overview]
Message-ID: <e1f3b82e-db5a-56bc-01f8-58724105a954@suse.de> (raw)
In-Reply-To: <b619347f-62a9-bccf-193a-4855acc22daf@polymtl.ca>


On 5/7/21 9:27 PM, Simon Marchi wrote:
> On 2021-05-07 4:44 a.m., Tom de Vries wrote:
>> Hi,
>>
>> In a linux kernel mailing list discussion, it was mentioned that "gdb has
>> this odd thing where it takes the 64-bit vs 32-bit data for the whole process
>> from one thread, and picks the worst possible thread to do it (ie explicitly
>> not even the main thread, ...)" [1].
>>
>> The picking of the thread is done here in
>> x86_linux_nat_target::read_description:
>> ...
>>   /* GNU/Linux LWP ID's are process ID's.  */
>>   tid = inferior_ptid.lwp ();
>>   if (tid == 0)
>>     tid = inferior_ptid.pid (); /* Not a threaded program.  */
>> ...
>>
>> To understand what this code does, let's investigate a scenario in which
>> inferior_ptid.lwp () != inferior_ptid.pid ().
>>
>> Say we start exec jit-attach-pie, identified with pid x.  The main thread
>> starts another thread that sleeps, and then the main thread waits for the
>> sleeping thread.  So we have two threads, identified with LWP IDs x and x+1:
>> ...
>> PID  LWP  CMD
>> x    x    ./jit-attach-pie
>> x    x+1  ./jit-attach-pie
>> ...
>> [ The thread with LWP x is known as the thread group leader. ]
>>
>> When attaching to this exec using the pid, gdb does a stop_all_threads which
>> iterates over all the threads, first LWP x, and then LWP x+1.
>>
>> So the state we arrive with at x86_linux_nat_target::read_description is:
>> ...
>> (gdb) p inferior_ptid
>> $1 = {m_pid = x, m_lwp = x+1, m_tid = 0}
>> ...
>> and consequently we probe 64/32-bitness from thread LWP x+1.
>>
>> [ Note that this is different from when gdb doesn't attach but instead
>> launches the exec itself, in which case there's no stop_all_threads needed,
>> and the probed thread is LWP x. ]
> 
> Well, it's mostly because when running, there's only one thread to begin
> with,

Correct, I've updated the comment.

> and the io_uring threads are likely not there yet.
> 

I'm trying to discuss things here first relative to the current
situation, so without the kernel patch that makes io_uring threads
user-visible.

>> According to the aforementioned remark, a better choice would have been the
>> main thread, that is, LWP x.
> 
> That doesn't tell the whole story though.

Ack, updated log message.

> Up to now, probing the
> non-main thread was maybe a slightly strange thing to do, but it worked
> because all threads were set up the same way for that matter.  It worked
> under the assumption that all threads had the same CS/DS contents (the
> registers we use to detect architecture).  The failure happens when
> there are io_uring threads present.  These threads are of a strange kind
> because they are like any userspace threads, except that they never
> return to userspace.  Because of that, they don't have a meaningful
> userspace register state, reading their registers returns garbage.  So
> if we choose one of these threads for probing the architecture of the
> process (which, as you mentioned below, is bogus, but that's how it
> works now), then we get bad results.
> 
> From what I read (from the lkml thread you linked), the kernel's plan is
> to mitigate it by filling the register state of these threads to
> something GDB is happy with, even though that's unnecessary otherwise.
> I don't know what's the state of that though.  But I agree it would be
> good to fix this on our side as well.
> 
>>
>> This patch implements that choice, by simply doing:
>> ...
>>   tid = inferior_ptid.pid ();
> 
> I think that's ok.
> 
>> ...
>>
>> The fact that gdb makes a per-process choice for 64/32-bitness is a problem in
>> itself: each thread can be in either 64 or 32 bit mode.  That is a problem
>> that this patch doesn't fix.
> 
> Not only that, but from what I understood, it's not right to decide once
> and for all whether a thread is 32 or 64.  It could switch between the
> two modes.  So all we can really know is what mode a thread is
> currently, while it is stopped.  But who knows what mode it's going to
> be in 5 instructions from now.
> 

Ack, I've updated the message to make that more clear.

> This actually is a problem to people who debug early boot code with
> QEMU, because there the "thread" goes from 32 to 64 bit mode at some
> point.  To handle this, it sounds like GDB should re-evaluate the
> architecture of each thread every time it stops.  That sounds expensive,
> but maybe this mode could exist behind a knob that is disabled by
> default, for these special cases.
> 
> There might also be features that can only work if we assume the
> architecture of a thread never changes, I'm not sure.
> 

It would already be good if users could tell gdb that things have
switched; currently not even that is possible, see PR27838.

> Also, we already have kind of partial machinery for having threads with
> different architectures, that's why we have
> target_ops::thread_architecture.  But read_description is used to set
> inferior::gdbarch, which is kind of the "default" arch of the process,
> used for many things.  And the default implementation of
> thread_architecture, process_stratum_target::thread_architecture, just
> returns inf->gdbarch.  So in practice, today, for Linux it's true that
> we have one single architecture for the whole inferior.  The exception
> is aarch64-linux, which can return different architectures for different
> SVE configurations.
> 
>> Tested on x86_64-linux.
>>
>> [1] https://lore.kernel.org/io-uring/\
>>   CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com/
> 
> Hmm, the link is not clickable / does not lead to the right place,
> because it is split in two.
> 

Yeah, I don't like lines overrunning, but OK, fixed.

>>
>> Any comments?
> 
> 1. I think it would be good to have a test for this.  I wanted to actually
> try making (or finding) a program that uses io_uring and see the problem
> with my own eyes, but I didn't have time yet.  But there could be such a
> test in our testsuite (we would check that the kernel actually supports
> io_uring, of course).  We could validate that when we select one of the
> io threads and do things like backtrace, the failure is handled
> gracefully.
> 

Yeah, that sounds like something I could try once the kernel exposing
io_uring threads to user-space lands in tumbleweed.

> 2. There are other architectures than x86-64 where it looks like we would
> also probe a non-main thread, like ARM:
> 
>     https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/arm-linux-nat.c;h=662dade0a122f8adf25da91577b1de375df7785b;hb=HEAD
> 
> It might not fail like it does on x86-64, it depends on the kernel
> implementation.  But we should probably change them the same way.
> 

Ack, I've scanned the sources of other ports, and added a few more
(cc-ing port maintainers).

> 3. Have you tested attaching to a program where the main thread has
> exited?  It might still work (and there might be a test for it in our
> testsuite) because the thread group leader stays around as a zombie as
> long as other threads exist.  But it would be good to test it.

Good question.  I've tried it, and both the system gdb and the patched
upstream gdb give the same result:
...
$ gdb -q -p $(pidof a.out)
Attaching to process 28988
warning: process 28988 is a zombie - the process has already terminated
ptrace: Operation not permitted.
(gdb)
$ ~/gdb_versions/devel/gdb -q -p $(pidof a.out)
Attaching to process 28988
warning: process 28988 is a zombie - the process has already terminated
ptrace: Operation not permitted.
(gdb) q
...

Updated patch attached.  Any further comments?

Thanks,
- Tom

[-- Attachment #2: 0005-gdb-tdep-Use-pid-to-choose-process-64-32-bitness.patch --]

[gdb/tdep] Use pid to choose process 64/32-bitness

In a linux kernel mailing list discussion, it was mentioned that "gdb has
this odd thing where it takes the 64-bit vs 32-bit data for the whole process
from one thread, and picks the worst possible thread to do it (ie explicitly
not even the main thread, ...)" [1].

The picking of the thread is done here in
x86_linux_nat_target::read_description:
...
  /* GNU/Linux LWP ID's are process ID's.  */
  tid = inferior_ptid.lwp ();
  if (tid == 0)
    tid = inferior_ptid.pid (); /* Not a threaded program.  */
...

To understand what this code does, let's investigate a scenario in which
inferior_ptid.lwp () != inferior_ptid.pid ().

Say we start exec jit-attach-pie, identified with pid x.  The main thread
starts another thread that sleeps, and then the main thread waits for the
sleeping thread.  So we have two threads, identified with LWP IDs x and x+1:
...
PID  LWP  CMD
x    x    ./jit-attach-pie
x    x+1  ./jit-attach-pie
...
[ The thread with LWP x is known as the thread group leader. ]

When attaching to this exec using the pid, gdb does a stop_all_threads which
iterates over all the threads, first LWP x, and then LWP x+1.

So the state we arrive with at x86_linux_nat_target::read_description is:
...
(gdb) p inferior_ptid
$1 = {m_pid = x, m_lwp = x+1, m_tid = 0}
...
and consequently we probe 64/32-bitness from thread LWP x+1.

[ Note that this is different from when gdb doesn't attach but instead
launches the exec itself, in which case there's just one thread to begin with,
and consequently the probed thread is LWP x. ]

According to the aforementioned remark, a better choice would have been the
main thread, that is, LWP x.

This patch implements that choice, by simply doing:
...
  tid = inferior_ptid.pid ();
...

The fact that gdb makes a permanent per-process choice for 64/32-bitness is a
problem in itself: each thread can be in either 64-bit or 32-bit mode, and can
switch back and forth.  That is a problem that this patch doesn't fix.

Now finally: why does this matter in the context of the linux kernel
discussion?  The discussion was related to a patch that exposed io_uring
threads to user-space.  This made it possible that one of those threads would
be picked out to select 64/32-bitness.  Given that such threads are atypical
user-space threads, in the sense that they never return to user-space and
don't have a meaningful user-space register state, reading their registers
returns garbage, so it could f.i. occur that in a 64-bit process with all
normal user-space threads in 64-bit mode, the probing would return 32-bit.

It may be that this is worked around on the kernel side by providing
user-space register state in those threads such that current gdb is happy.
Nevertheless, it seems prudent to fix this on the gdb side as well.

Tested on x86_64-linux.

[1] https://lore.kernel.org/io-uring/CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com/

gdb/ChangeLog:

2021-05-07  Tom de Vries  <tdevries@suse.de>

	PR tdep/27822
	* x86-linux-nat.c (x86_linux_nat_target::read_description): Use
	pid to determine if process is 64-bit or 32-bit.
	* aarch64-linux-nat.c (aarch64_linux_nat_target::read_description):
	Same.
	* arm-linux-nat.c (arm_linux_nat_target::read_description): Same.
	* ppc-linux-nat.c (ppc_linux_nat_target::read_description): Same.
	* s390-linux-nat.c (s390_linux_nat_target::read_description): Same.

---
 gdb/aarch64-linux-nat.c | 2 +-
 gdb/arm-linux-nat.c     | 2 +-
 gdb/ppc-linux-nat.c     | 4 +---
 gdb/s390-linux-nat.c    | 2 +-
 gdb/x86-linux-nat.c     | 5 +----
 5 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/gdb/aarch64-linux-nat.c b/gdb/aarch64-linux-nat.c
index ae8db2988c2..61224022f6a 100644
--- a/gdb/aarch64-linux-nat.c
+++ b/gdb/aarch64-linux-nat.c
@@ -723,7 +723,7 @@ aarch64_linux_nat_target::read_description ()
   gdb_byte regbuf[ARM_VFP3_REGS_SIZE];
   struct iovec iovec;
 
-  tid = inferior_ptid.lwp ();
+  tid = inferior_ptid.pid ();
 
   iovec.iov_base = regbuf;
   iovec.iov_len = ARM_VFP3_REGS_SIZE;
diff --git a/gdb/arm-linux-nat.c b/gdb/arm-linux-nat.c
index 662dade0a12..b5b470b6876 100644
--- a/gdb/arm-linux-nat.c
+++ b/gdb/arm-linux-nat.c
@@ -537,7 +537,7 @@ arm_linux_nat_target::read_description ()
     {
       elf_gregset_t gpregs;
       struct iovec iov;
-      int tid = inferior_ptid.lwp ();
+      int tid = inferior_ptid.pid ();
 
       iov.iov_base = &gpregs;
       iov.iov_len = sizeof (gpregs);
diff --git a/gdb/ppc-linux-nat.c b/gdb/ppc-linux-nat.c
index 171f5b386fa..06a30efeaef 100644
--- a/gdb/ppc-linux-nat.c
+++ b/gdb/ppc-linux-nat.c
@@ -1946,9 +1946,7 @@ ppc_linux_nat_target::auxv_parse (gdb_byte **readptr,
 const struct target_desc *
 ppc_linux_nat_target::read_description ()
 {
-  int tid = inferior_ptid.lwp ();
-  if (tid == 0)
-    tid = inferior_ptid.pid ();
+  int tid = inferior_ptid.pid ();
 
   if (have_ptrace_getsetevrregs)
     {
diff --git a/gdb/s390-linux-nat.c b/gdb/s390-linux-nat.c
index 41b50ce4800..8f6eb61505b 100644
--- a/gdb/s390-linux-nat.c
+++ b/gdb/s390-linux-nat.c
@@ -988,7 +988,7 @@ s390_linux_nat_target::auxv_parse (gdb_byte **readptr,
 const struct target_desc *
 s390_linux_nat_target::read_description ()
 {
-  int tid = s390_inferior_tid ();
+  int tid = inferior_ptid.pid ();
 
   have_regset_last_break
     = check_regset (tid, NT_S390_LAST_BREAK, 8);
diff --git a/gdb/x86-linux-nat.c b/gdb/x86-linux-nat.c
index 85c7f0ddc94..adea1ad0092 100644
--- a/gdb/x86-linux-nat.c
+++ b/gdb/x86-linux-nat.c
@@ -113,10 +113,7 @@ x86_linux_nat_target::read_description ()
   static uint64_t xcr0;
   uint64_t xcr0_features_bits;
 
-  /* GNU/Linux LWP ID's are process ID's.  */
-  tid = inferior_ptid.lwp ();
-  if (tid == 0)
-    tid = inferior_ptid.pid (); /* Not a threaded program.  */
+  tid = inferior_ptid.pid ();
 
 #ifdef __x86_64__
   {

Thread overview: 10+ messages
2021-05-07  8:44 Tom de Vries
2021-05-07 19:27 ` Simon Marchi via Gdb-patches
2021-05-07 20:06   ` Luis Machado via Gdb-patches
2021-05-19 16:32   ` Tom de Vries [this message]
2021-05-20 10:42     ` Alan Hayward via Gdb-patches
2021-05-20 16:07       ` Simon Marchi via Gdb-patches
2021-05-21 10:50         ` Alan Hayward via Gdb-patches
2021-05-22  8:32           ` Tom de Vries
2021-05-22  9:56             ` Tom de Vries
2021-05-23  1:27               ` Simon Marchi via Gdb-patches
