[PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup
@ 2026-03-17  7:19 Tom de Vries
  2026-03-19 12:34 ` Andrew Burgess
  0 siblings, 1 reply; 5+ messages in thread
From: Tom de Vries @ 2026-03-17  7:19 UTC (permalink / raw)
  To: gdb-patches

With test-case gdb.threads/next-fork-other-thread.exp and target board
unix/-m32 I run into:
...
(gdb) next^M
[Switching to Thread 0xf643ab40 (LWP 3267939)]^M
^M
Thread 5 "next-fork-other" hit Breakpoint 1, main () at \
  next-fork-other-thread.c:73^M
73        alarm (60);^M
(gdb) FAIL: $exp: fork_func=fork: target-non-stop=off: non-stop=off: \
  displaced-stepping=on: i=$n: next to break here
...

Before we go into how this happens, let's first look at the inferior.

In main, 4 threads are started with the same thread function, leaving all 5
threads in a loop:
- the main thread is stuck in a loop calling sleep, and gdb steps through this
  loop using next
- the other, non-main threads are stuck in a loop where each thread:
  - forks off a child process that exits immediately
  - waits for the child process to exit
  - calls sleep

The FAIL happens as follows (following snippets from this gdb.log [1]).

One of the non-main threads enters __syscall_cancel_arch to do a sycall (
either to sleep, or to wait).

Then the non-main thread stops:
...
  [infrun] handle_signal_stop: [2937316.2937324.0] hit another thread's \
    single-step breakpoint^M
  [infrun] handle_signal_stop: delayed software breakpoint trap, ignoring^M
  [infrun] switch_back_to_stepped_thread: need to step [2937316.2937324.0] \
    over single-step breakpoint^M
...
because we "hit another thread's single-step breakpoint".

AFAIU, this is because of the main thread stepping through
__syscall_cancel_arch.

To handle this, we're going to try displaced stepping.

The syscall instruction is copied:
...
  [displaced] displaced_step_prepare_throw: original insn 0xf7cdce69: \
    cd 80      int    $0x80^M
  [displaced] prepare: selected buffer at 0x80490d2^M
...
to a buffer at _start+2:
...
080490d0 <_start>:
 80490d0:       31 ed                   xor    %ebp,%ebp
 80490d2:       5e                      pop    %esi
...
and we're going to resume execution there.

However, right after resuming we get a GDB_SIGNAL_CHLD for the same thread.

Part of handling that is finalizing the displaced stepping:
...
  [displaced] finish: restored 2937316.2937324.0 0x80490d2^M
  [displaced] i386_displaced_step_fixup: fixup (0xf7cdce69, 0x80490d2), \
    insn = 0xcd 0x80 ...^M
  [displaced] i386_displaced_step_fixup: syscall changed %eip; not relocating^M
  [infrun] handle_signal_stop: stop_pc=0x80490d2^M
...

The stop pc is 0x80490d2, the address of the copied instruction.  In other
words, we've stopped without making progress.

The problem is that the address is in the displaced stepping buffer, and needs
relocating, but instead we have "syscall changed %eip; not relocating".

The code in i386_displaced_step_fixup doesn't recognize this situation:
...
      if (i386_syscall_p (insn, &insn_len)
	  && pc != to + (insn - insn_start) + insn_len
	  /* GDB can get control back after the insn after the syscall.
	     Presumably this is a kernel bug.
	     i386_displaced_step_copy_insn ensures it's a nop,
	     we add one to the length for it.  */
	  && pc != to + (insn - insn_start) + insn_len + 1)
 	displaced_debug_printf ("syscall changed %%eip; not relocating");
...

It only handles the cases where the stop pc is:
- the address after the syscall insn, or
- the address after the nop after the syscall insn

So, instead of relocating the stop pc back to the original 0xf7cdce69, it
stays 0x80490d2.  After resuming at that address, the thread:
- executes the syscall,
- executes the rest of _start,
- enters main, and
- runs into the breakpoint at the start of main.

Since commit cf141dd8ccd ("gdb: fix reg corruption from displaced stepping on
amd64"), we do handle the "pc == to" case in amd64_displaced_step_fixup:
...
       if (amd64_syscall_p (insn_details, &insn_len)
 	  /* GDB can get control back after the insn after the syscall.
	     Presumably this is a kernel bug.  Fixup ensures its a nop, we
	     add one to the length for it.  */
	  && (pc < to || pc > (to + insn_len + 1)))
 	displaced_debug_printf ("syscall changed %%rip; not relocating");
...

Fix this in the same way.

Tested on x86_64-linux, with target board unix/-m32.

On openSUSE Tumbleweed (kernel version 6.19.7), this patch fixes the test-case.

On openSUSE Leap 16.0 (kernel version 6.12.0), we still run into PR29040.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33997

[1] https://sourceware.org/bugzilla/attachment.cgi?id=16660
---
 gdb/i386-tdep.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
index b9013e183c2..93357b41b10 100644
--- a/gdb/i386-tdep.c
+++ b/gdb/i386-tdep.c
@@ -813,12 +813,11 @@ i386_displaced_step_fixup (struct gdbarch *gdbarch,
 	 it unrelocated.  Goodness help us if there are PC-relative
 	 system calls.  */
       if (i386_syscall_p (insn, &insn_len)
-	  && pc != to + (insn - insn_start) + insn_len
 	  /* GDB can get control back after the insn after the syscall.
 	     Presumably this is a kernel bug.
 	     i386_displaced_step_copy_insn ensures it's a nop,
 	     we add one to the length for it.  */
-	  && pc != to + (insn - insn_start) + insn_len + 1)
+	  && (pc < to || pc > to + (insn - insn_start) + insn_len + 1))
 	displaced_debug_printf ("syscall changed %%eip; not relocating");
       else
 	{

base-commit: 9cc83ec0ce9b4b75e8cd2b0c46f23d4cbf4b2f2b
-- 
2.51.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup
  2026-03-17  7:19 [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup Tom de Vries
@ 2026-03-19 12:34 ` Andrew Burgess
  2026-03-19 17:40   ` Schimpe, Christina
  2026-03-20 16:33   ` Tom de Vries
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Burgess @ 2026-03-19 12:34 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches


Hi Tom,

Thanks for fixing this.  Just a couple of minor notes:

Tom de Vries <tdevries@suse.de> writes:

> With test-case gdb.threads/next-fork-other-thread.exp and target board
> unix/-m32 I run into:
> ...
> (gdb) next^M
> [Switching to Thread 0xf643ab40 (LWP 3267939)]^M
> ^M
> Thread 5 "next-fork-other" hit Breakpoint 1, main () at \
>   next-fork-other-thread.c:73^M
> 73        alarm (60);^M
> (gdb) FAIL: $exp: fork_func=fork: target-non-stop=off: non-stop=off: \
>   displaced-stepping=on: i=$n: next to break here
> ...

Please can you clean up the log output to remove ^M line endings when
quoting it in commit messages.  Commit messages are written once, but
read possibly multiple times, so it is right that we invest more effort
when writing them in order to make them easier to read.  Leaving the ^M
endings in place just adds unnecessary noise.

>
> Before we go into how this happens, let's first look at the inferior.
>
> In main, 4 threads are started with the same thread function, leaving all 5
> threads in a loop:
> - the main thread is stuck in a loop calling sleep, and gdb steps through this
>   loop using next
> - the other, non-main threads are stuck in a loop where each thread:
>   - forks off a child process that exits immediately
>   - waits for the child process to exit
>   - calls sleep
>
> The FAIL happens as follows (following snippets from this gdb.log [1]).
>
> One of the non-main threads enters __syscall_cancel_arch to do a sycall (
> either to sleep, or to wait).
>
> Then the non-main thread stops:
> ...
>   [infrun] handle_signal_stop: [2937316.2937324.0] hit another thread's \
>     single-step breakpoint^M
>   [infrun] handle_signal_stop: delayed software breakpoint trap, ignoring^M
>   [infrun] switch_back_to_stepped_thread: need to step [2937316.2937324.0] \
>     over single-step breakpoint^M
> ...
> because we "hit another thread's single-step breakpoint".
>
> AFAIU, this is because of the main thread stepping through
> __syscall_cancel_arch.
>
> To handle this, we're going to try displaced stepping.
>
> The syscall instruction is copied:
> ...
>   [displaced] displaced_step_prepare_throw: original insn 0xf7cdce69: \
>     cd 80      int    $0x80^M
>   [displaced] prepare: selected buffer at 0x80490d2^M
> ...
> to a buffer at _start+2:
> ...
> 080490d0 <_start>:
>  80490d0:       31 ed                   xor    %ebp,%ebp
>  80490d2:       5e                      pop    %esi
> ...
> and we're going to resume execution there.
>
> However, right after resuming we get a GDB_SIGNAL_CHLD for the same thread.
>
> Part of handling that is finalizing the displaced stepping:
> ...
>   [displaced] finish: restored 2937316.2937324.0 0x80490d2^M
>   [displaced] i386_displaced_step_fixup: fixup (0xf7cdce69, 0x80490d2), \
>     insn = 0xcd 0x80 ...^M
>   [displaced] i386_displaced_step_fixup: syscall changed %eip; not relocating^M
>   [infrun] handle_signal_stop: stop_pc=0x80490d2^M
> ...
>
> The stop pc is 0x80490d2, the address of the copied instruction.  In other
> words, we've stopped without making progress.
>
> The problem is that the address is in the displaced stepping buffer, and needs
> relocating, but instead we have "syscall changed %eip; not relocating".
>
> The code in i386_displaced_step_fixup doesn't recognize this situation:
> ...
>       if (i386_syscall_p (insn, &insn_len)
> 	  && pc != to + (insn - insn_start) + insn_len
> 	  /* GDB can get control back after the insn after the syscall.
> 	     Presumably this is a kernel bug.
> 	     i386_displaced_step_copy_insn ensures it's a nop,
> 	     we add one to the length for it.  */
> 	  && pc != to + (insn - insn_start) + insn_len + 1)
>  	displaced_debug_printf ("syscall changed %%eip; not relocating");
> ...
>
> It only handles the cases where the stop pc is:
> - the address after the syscall insn, or
> - the address after the nop after the syscall insn
>
> So, instead of relocating the stop pc back to the original 0xf7cdce69, it
> stays 0x80490d2.  After resuming at that address, the thread:
> - executes the syscall,
> - executes the rest of _start,
> - enters main, and
> - runs into the breakpoint at the start of main.
>
> Since commit cf141dd8ccd ("gdb: fix reg corruption from displaced stepping on
> amd64"), we do handle the "pc == to" case in amd64_displaced_step_fixup:
> ...
>        if (amd64_syscall_p (insn_details, &insn_len)
>  	  /* GDB can get control back after the insn after the syscall.
> 	     Presumably this is a kernel bug.  Fixup ensures its a nop, we
> 	     add one to the length for it.  */
> 	  && (pc < to || pc > (to + insn_len + 1)))
>  	displaced_debug_printf ("syscall changed %%rip; not relocating");
> ...
>
> Fix this in the same way.
>
> Tested on x86_64-linux, with target board unix/-m32.
>
> On openSUSE Tumbleweed (kernel version 6.19.7), this patch fixes the test-case.
>
> On openSUSE Leap 16.0 (kernel version 6.12.0), we still run into PR29040.

I still see these failures:

  FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=auto: i=4: next to other line
  FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=on: i=2: next to other line
  FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=off: i=4: next to for loop

The first two are described by PR gdb/29040, but the third is a
different error message, but I don't know if the third failure is a
consequence of the first two failures or not.  Still I do think the fix
you are proposing here is correct, so with the commit message cleaned
up:

Approved-By: Andrew Burgess <aburgess@redhat.com>

Thanks,
Andrew

>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33997
>
> [1] https://sourceware.org/bugzilla/attachment.cgi?id=16660
> ---
>  gdb/i386-tdep.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
> index b9013e183c2..93357b41b10 100644
> --- a/gdb/i386-tdep.c
> +++ b/gdb/i386-tdep.c
> @@ -813,12 +813,11 @@ i386_displaced_step_fixup (struct gdbarch *gdbarch,
>  	 it unrelocated.  Goodness help us if there are PC-relative
>  	 system calls.  */
>        if (i386_syscall_p (insn, &insn_len)
> -	  && pc != to + (insn - insn_start) + insn_len
>  	  /* GDB can get control back after the insn after the syscall.
>  	     Presumably this is a kernel bug.
>  	     i386_displaced_step_copy_insn ensures it's a nop,
>  	     we add one to the length for it.  */
> -	  && pc != to + (insn - insn_start) + insn_len + 1)
> +	  && (pc < to || pc > to + (insn - insn_start) + insn_len + 1))
>  	displaced_debug_printf ("syscall changed %%eip; not relocating");
>        else
>  	{
>
> base-commit: 9cc83ec0ce9b4b75e8cd2b0c46f23d4cbf4b2f2b
> -- 
> 2.51.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup
  2026-03-19 12:34 ` Andrew Burgess
@ 2026-03-19 17:40   ` Schimpe, Christina
  2026-03-19 22:34     ` Tom de Vries
  2026-03-20 16:33   ` Tom de Vries
  1 sibling, 1 reply; 5+ messages in thread
From: Schimpe, Christina @ 2026-03-19 17:40 UTC (permalink / raw)
  To: Andrew Burgess, Tom de Vries, gdb-patches

> -----Original Message-----
> From: Andrew Burgess <aburgess@redhat.com>
> Sent: Donnerstag, 19. März 2026 13:34
> To: Tom de Vries <tdevries@suse.de>; gdb-patches@sourceware.org
> Subject: Re: [PATCH] [gdb/tdep] Fix unrelocated pc in
> i386_displaced_step_fixup
> 
> 
> Hi Tom,
> 
> Thanks for fixing this.  Just a couple of minor notes:
> 
> Tom de Vries <tdevries@suse.de> writes:
> 
> > With test-case gdb.threads/next-fork-other-thread.exp and target board
> > unix/-m32 I run into:
> > ...
> > (gdb) next^M
> > [Switching to Thread 0xf643ab40 (LWP 3267939)]^M ^M Thread 5
> > "next-fork-other" hit Breakpoint 1, main () at \
> >   next-fork-other-thread.c:73^M
> > 73        alarm (60);^M
> > (gdb) FAIL: $exp: fork_func=fork: target-non-stop=off: non-stop=off: \
> >   displaced-stepping=on: i=$n: next to break here ...
> 
> Please can you clean up the log output to remove ^M line endings when
> quoting it in commit messages.  Commit messages are written once, but read
> possibly multiple times, so it is right that we invest more effort when writing
> them in order to make them easier to read.  Leaving the ^M endings in place
> just adds unnecessary noise.
> 
> >
> > Before we go into how this happens, let's first look at the inferior.
> >
> > In main, 4 threads are started with the same thread function, leaving
> > all 5 threads in a loop:
> > - the main thread is stuck in a loop calling sleep, and gdb steps through this
> >   loop using next
> > - the other, non-main threads are stuck in a loop where each thread:
> >   - forks off a child process that exits immediately
> >   - waits for the child process to exit
> >   - calls sleep
> >
> > The FAIL happens as follows (following snippets from this gdb.log [1]).
> >
> > One of the non-main threads enters __syscall_cancel_arch to do a
> > sycall ( either to sleep, or to wait).
> >
> > Then the non-main thread stops:
> > ...
> >   [infrun] handle_signal_stop: [2937316.2937324.0] hit another thread's \
> >     single-step breakpoint^M
> >   [infrun] handle_signal_stop: delayed software breakpoint trap, ignoring^M
> >   [infrun] switch_back_to_stepped_thread: need to step
> [2937316.2937324.0] \
> >     over single-step breakpoint^M
> > ...
> > because we "hit another thread's single-step breakpoint".
> >
> > AFAIU, this is because of the main thread stepping through
> > __syscall_cancel_arch.
> >
> > To handle this, we're going to try displaced stepping.
> >
> > The syscall instruction is copied:
> > ...
> >   [displaced] displaced_step_prepare_throw: original insn 0xf7cdce69: \
> >     cd 80      int    $0x80^M
> >   [displaced] prepare: selected buffer at 0x80490d2^M ...
> > to a buffer at _start+2:
> > ...
> > 080490d0 <_start>:
> >  80490d0:       31 ed                   xor    %ebp,%ebp
> >  80490d2:       5e                      pop    %esi
> > ...
> > and we're going to resume execution there.
> >
> > However, right after resuming we get a GDB_SIGNAL_CHLD for the same
> thread.
> >
> > Part of handling that is finalizing the displaced stepping:
> > ...
> >   [displaced] finish: restored 2937316.2937324.0 0x80490d2^M
> >   [displaced] i386_displaced_step_fixup: fixup (0xf7cdce69, 0x80490d2), \
> >     insn = 0xcd 0x80 ...^M
> >   [displaced] i386_displaced_step_fixup: syscall changed %eip; not
> relocating^M
> >   [infrun] handle_signal_stop: stop_pc=0x80490d2^M ...
> >
> > The stop pc is 0x80490d2, the address of the copied instruction.  In
> > other words, we've stopped without making progress.
> >
> > The problem is that the address is in the displaced stepping buffer,
> > and needs relocating, but instead we have "syscall changed %eip; not
> relocating".
> >
> > The code in i386_displaced_step_fixup doesn't recognize this situation:
> > ...
> >       if (i386_syscall_p (insn, &insn_len)
> > 	  && pc != to + (insn - insn_start) + insn_len
> > 	  /* GDB can get control back after the insn after the syscall.
> > 	     Presumably this is a kernel bug.
> > 	     i386_displaced_step_copy_insn ensures it's a nop,
> > 	     we add one to the length for it.  */
> > 	  && pc != to + (insn - insn_start) + insn_len + 1)
> >  	displaced_debug_printf ("syscall changed %%eip; not relocating");
> > ...
> >
> > It only handles the cases where the stop pc is:
> > - the address after the syscall insn, or
> > - the address after the nop after the syscall insn
> >
> > So, instead of relocating the stop pc back to the original 0xf7cdce69,
> > it stays 0x80490d2.  After resuming at that address, the thread:
> > - executes the syscall,
> > - executes the rest of _start,
> > - enters main, and
> > - runs into the breakpoint at the start of main.
> >
> > Since commit cf141dd8ccd ("gdb: fix reg corruption from displaced
> > stepping on amd64"), we do handle the "pc == to" case in
> amd64_displaced_step_fixup:
> > ...
> >        if (amd64_syscall_p (insn_details, &insn_len)
> >  	  /* GDB can get control back after the insn after the syscall.
> > 	     Presumably this is a kernel bug.  Fixup ensures its a nop, we
> > 	     add one to the length for it.  */
> > 	  && (pc < to || pc > (to + insn_len + 1)))
> >  	displaced_debug_printf ("syscall changed %%rip; not relocating");
> > ...
> >
> > Fix this in the same way.
> >
> > Tested on x86_64-linux, with target board unix/-m32.
> >
> > On openSUSE Tumbleweed (kernel version 6.19.7), this patch fixes the test-
> case.
> >
> > On openSUSE Leap 16.0 (kernel version 6.12.0), we still run into PR29040.
> 
> I still see these failures:
> 
>   FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-
> stop=off: non-stop=off: displaced-stepping=auto: i=4: next to other line
>   FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-
> stop=off: non-stop=off: displaced-stepping=on: i=2: next to other line
>   FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-
> stop=off: non-stop=off: displaced-stepping=off: i=4: next to for loop
> 
> The first two are described by PR gdb/29040, but the third is a different error
> message, but I don't know if the third failure is a consequence of the first two
> failures or not.  Still I do think the fix you are proposing here is correct, so
> with the commit message cleaned
> up:
> 
> Approved-By: Andrew Burgess <aburgess@redhat.com>
> 
> Thanks,
> Andrew
> 
> >
> > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33997
> >
> > [1] https://sourceware.org/bugzilla/attachment.cgi?id=16660
> > ---
> >  gdb/i386-tdep.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c index
> > b9013e183c2..93357b41b10 100644
> > --- a/gdb/i386-tdep.c
> > +++ b/gdb/i386-tdep.c
> > @@ -813,12 +813,11 @@ i386_displaced_step_fixup (struct gdbarch
> *gdbarch,
> >  	 it unrelocated.  Goodness help us if there are PC-relative
> >  	 system calls.  */
> >        if (i386_syscall_p (insn, &insn_len)
> > -	  && pc != to + (insn - insn_start) + insn_len
> >  	  /* GDB can get control back after the insn after the syscall.
> >  	     Presumably this is a kernel bug.
> >  	     i386_displaced_step_copy_insn ensures it's a nop,
> >  	     we add one to the length for it.  */
> > -	  && pc != to + (insn - insn_start) + insn_len + 1)
> > +	  && (pc < to || pc > to + (insn - insn_start) + insn_len + 1))
> >  	displaced_debug_printf ("syscall changed %%eip; not relocating");
> >        else
> >  	{
> >
> > base-commit: 9cc83ec0ce9b4b75e8cd2b0c46f23d4cbf4b2f2b
> > --
> > 2.51.0
> 

Thank you for working on this.

I cannot reproduce the issue(s), as I do not have that specific system available.
In any case, the fix makes sense to me. However, I am only adding my Reviewed-by
tag here as this is an area I do not yet feel completely confident about.
Andrew approved this anyways already (thanks!). So with the commit message fixed:

Reviewed-By: Christina Schimpe <christina.schimpe@intel.com>

Christina
Intel Deutschland GmbH
Registered Address: Dornacher Straße 1, 85622 Feldkirchen, Germany
Tel: +49 89 991 430, www.intel.de
Managing Directors: Harry Demas, Jeffrey Schneiderman, Yin Chong Sorrell
Chairperson of the Supervisory Board: Nicole Lau
Registered Seat: Munich
Commercial Register: Amtsgericht München HRB 186928

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup
  2026-03-19 17:40   ` Schimpe, Christina
@ 2026-03-19 22:34     ` Tom de Vries
  0 siblings, 0 replies; 5+ messages in thread
From: Tom de Vries @ 2026-03-19 22:34 UTC (permalink / raw)
  To: Schimpe, Christina, Andrew Burgess, gdb-patches

On 3/19/26 6:40 PM, Schimpe, Christina wrote:
>> -----Original Message-----
>> From: Andrew Burgess <aburgess@redhat.com>
>> Sent: Donnerstag, 19. März 2026 13:34
>> To: Tom de Vries <tdevries@suse.de>; gdb-patches@sourceware.org
>> Subject: Re: [PATCH] [gdb/tdep] Fix unrelocated pc in
>> i386_displaced_step_fixup
>>
>>
>> Hi Tom,
>>
>> Thanks for fixing this.  Just a couple of minor notes:
>>
>> Tom de Vries <tdevries@suse.de> writes:
>>
>>> With test-case gdb.threads/next-fork-other-thread.exp and target board
>>> unix/-m32 I run into:
>>> ...
>>> (gdb) next^M
>>> [Switching to Thread 0xf643ab40 (LWP 3267939)]^M ^M Thread 5
>>> "next-fork-other" hit Breakpoint 1, main () at \
>>>    next-fork-other-thread.c:73^M
>>> 73        alarm (60);^M
>>> (gdb) FAIL: $exp: fork_func=fork: target-non-stop=off: non-stop=off: \
>>>    displaced-stepping=on: i=$n: next to break here ...
>>
>> Please can you clean up the log output to remove ^M line endings when
>> quoting it in commit messages.  Commit messages are written once, but read
>> possibly multiple times, so it is right that we invest more effort when writing
>> them in order to make them easier to read.  Leaving the ^M endings in place
>> just adds unnecessary noise.
>>
>>>
>>> Before we go into how this happens, let's first look at the inferior.
>>>
>>> In main, 4 threads are started with the same thread function, leaving
>>> all 5 threads in a loop:
>>> - the main thread is stuck in a loop calling sleep, and gdb steps through this
>>>    loop using next
>>> - the other, non-main threads are stuck in a loop where each thread:
>>>    - forks off a child process that exits immediately
>>>    - waits for the child process to exit
>>>    - calls sleep
>>>
>>> The FAIL happens as follows (following snippets from this gdb.log [1]).
>>>
>>> One of the non-main threads enters __syscall_cancel_arch to do a
>>> sycall ( either to sleep, or to wait).
>>>
>>> Then the non-main thread stops:
>>> ...
>>>    [infrun] handle_signal_stop: [2937316.2937324.0] hit another thread's \
>>>      single-step breakpoint^M
>>>    [infrun] handle_signal_stop: delayed software breakpoint trap, ignoring^M
>>>    [infrun] switch_back_to_stepped_thread: need to step
>> [2937316.2937324.0] \
>>>      over single-step breakpoint^M
>>> ...
>>> because we "hit another thread's single-step breakpoint".
>>>
>>> AFAIU, this is because of the main thread stepping through
>>> __syscall_cancel_arch.
>>>
>>> To handle this, we're going to try displaced stepping.
>>>
>>> The syscall instruction is copied:
>>> ...
>>>    [displaced] displaced_step_prepare_throw: original insn 0xf7cdce69: \
>>>      cd 80      int    $0x80^M
>>>    [displaced] prepare: selected buffer at 0x80490d2^M ...
>>> to a buffer at _start+2:
>>> ...
>>> 080490d0 <_start>:
>>>   80490d0:       31 ed                   xor    %ebp,%ebp
>>>   80490d2:       5e                      pop    %esi
>>> ...
>>> and we're going to resume execution there.
>>>
>>> However, right after resuming we get a GDB_SIGNAL_CHLD for the same
>> thread.
>>>
>>> Part of handling that is finalizing the displaced stepping:
>>> ...
>>>    [displaced] finish: restored 2937316.2937324.0 0x80490d2^M
>>>    [displaced] i386_displaced_step_fixup: fixup (0xf7cdce69, 0x80490d2), \
>>>      insn = 0xcd 0x80 ...^M
>>>    [displaced] i386_displaced_step_fixup: syscall changed %eip; not
>> relocating^M
>>>    [infrun] handle_signal_stop: stop_pc=0x80490d2^M ...
>>>
>>> The stop pc is 0x80490d2, the address of the copied instruction.  In
>>> other words, we've stopped without making progress.
>>>
>>> The problem is that the address is in the displaced stepping buffer,
>>> and needs relocating, but instead we have "syscall changed %eip; not
>> relocating".
>>>
>>> The code in i386_displaced_step_fixup doesn't recognize this situation:
>>> ...
>>>        if (i386_syscall_p (insn, &insn_len)
>>> 	  && pc != to + (insn - insn_start) + insn_len
>>> 	  /* GDB can get control back after the insn after the syscall.
>>> 	     Presumably this is a kernel bug.
>>> 	     i386_displaced_step_copy_insn ensures it's a nop,
>>> 	     we add one to the length for it.  */
>>> 	  && pc != to + (insn - insn_start) + insn_len + 1)
>>>   	displaced_debug_printf ("syscall changed %%eip; not relocating");
>>> ...
>>>
>>> It only handles the cases where the stop pc is:
>>> - the address after the syscall insn, or
>>> - the address after the nop after the syscall insn
>>>
>>> So, instead of relocating the stop pc back to the original 0xf7cdce69,
>>> it stays 0x80490d2.  After resuming at that address, the thread:
>>> - executes the syscall,
>>> - executes the rest of _start,
>>> - enters main, and
>>> - runs into the breakpoint at the start of main.
>>>
>>> Since commit cf141dd8ccd ("gdb: fix reg corruption from displaced
>>> stepping on amd64"), we do handle the "pc == to" case in
>> amd64_displaced_step_fixup:
>>> ...
>>>         if (amd64_syscall_p (insn_details, &insn_len)
>>>   	  /* GDB can get control back after the insn after the syscall.
>>> 	     Presumably this is a kernel bug.  Fixup ensures its a nop, we
>>> 	     add one to the length for it.  */
>>> 	  && (pc < to || pc > (to + insn_len + 1)))
>>>   	displaced_debug_printf ("syscall changed %%rip; not relocating");
>>> ...
>>>
>>> Fix this in the same way.
>>>
>>> Tested on x86_64-linux, with target board unix/-m32.
>>>
>>> On openSUSE Tumbleweed (kernel version 6.19.7), this patch fixes the test-
>> case.
>>>
>>> On openSUSE Leap 16.0 (kernel version 6.12.0), we still run into PR29040.
>>
>> I still see these failures:
>>
>>    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-
>> stop=off: non-stop=off: displaced-stepping=auto: i=4: next to other line
>>    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-
>> stop=off: non-stop=off: displaced-stepping=on: i=2: next to other line
>>    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-
>> stop=off: non-stop=off: displaced-stepping=off: i=4: next to for loop
>>
>> The first two are described by PR gdb/29040, but the third is a different error
>> message, but I don't know if the third failure is a consequence of the first two
>> failures or not.  Still I do think the fix you are proposing here is correct, so
>> with the commit message cleaned
>> up:
>>
>> Approved-By: Andrew Burgess <aburgess@redhat.com>
>>
>> Thanks,
>> Andrew
>>
>>>
>>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33997
>>>
>>> [1] https://sourceware.org/bugzilla/attachment.cgi?id=16660
>>> ---
>>>   gdb/i386-tdep.c | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c index
>>> b9013e183c2..93357b41b10 100644
>>> --- a/gdb/i386-tdep.c
>>> +++ b/gdb/i386-tdep.c
>>> @@ -813,12 +813,11 @@ i386_displaced_step_fixup (struct gdbarch
>> *gdbarch,
>>>   	 it unrelocated.  Goodness help us if there are PC-relative
>>>   	 system calls.  */
>>>         if (i386_syscall_p (insn, &insn_len)
>>> -	  && pc != to + (insn - insn_start) + insn_len
>>>   	  /* GDB can get control back after the insn after the syscall.
>>>   	     Presumably this is a kernel bug.
>>>   	     i386_displaced_step_copy_insn ensures it's a nop,
>>>   	     we add one to the length for it.  */
>>> -	  && pc != to + (insn - insn_start) + insn_len + 1)
>>> +	  && (pc < to || pc > to + (insn - insn_start) + insn_len + 1))
>>>   	displaced_debug_printf ("syscall changed %%eip; not relocating");
>>>         else
>>>   	{
>>>
>>> base-commit: 9cc83ec0ce9b4b75e8cd2b0c46f23d4cbf4b2f2b
>>> --
>>> 2.51.0
>>
> 
> Thank you for working on this.
> 
> I cannot reproduce the issue(s), as I do not have that specific system available.
> In any case, the fix makes sense to me. However, I am only adding my Reviewed-by
> tag here as this is an area I do not yet feel completely confident about.
> Andrew approved this anyways already (thanks!). So with the commit message fixed:
> 
> Reviewed-By: Christina Schimpe <christina.schimpe@intel.com>

Thanks both for the reviews, pushed.

- Tom

> Christina
> Intel Deutschland GmbH
> Registered Address: Dornacher Straße 1, 85622 Feldkirchen, Germany
> Tel: +49 89 991 430, www.intel.de
> Managing Directors: Harry Demas, Jeffrey Schneiderman, Yin Chong Sorrell
> Chairperson of the Supervisory Board: Nicole Lau
> Registered Seat: Munich
> Commercial Register: Amtsgericht München HRB 186928


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup
  2026-03-19 12:34 ` Andrew Burgess
  2026-03-19 17:40   ` Schimpe, Christina
@ 2026-03-20 16:33   ` Tom de Vries
  1 sibling, 0 replies; 5+ messages in thread
From: Tom de Vries @ 2026-03-20 16:33 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

On 3/19/26 1:34 PM, Andrew Burgess wrote:
> I still see these failures:
> 
>    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=auto: i=4: next to other line
>    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=on: i=2: next to other line
>    FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=off: non-stop=off: displaced-stepping=off: i=4: next to for loop
> 
> The first two are described by PR gdb/29040, but the third is a
> different error message, but I don't know if the third failure is a
> consequence of the first two failures or not.

The failures are independent, gdb gets a clean_restart for each 
different displaced-stepping value (on/off/auto).

But AFAUI, the FAILs are essentially the same: the main thread is 
stepping through a loop when a non-main thread runs into an abort.

The test name varies with were we happen to be in the loop at that moment.

Thanks,
- Tom

> Still I do think the fix
> you are proposing here is correct, so with the commit message cleaned
> up:


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-03-20 16:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-17  7:19 [PATCH] [gdb/tdep] Fix unrelocated pc in i386_displaced_step_fixup Tom de Vries
2026-03-19 12:34 ` Andrew Burgess
2026-03-19 17:40   ` Schimpe, Christina
2026-03-19 22:34     ` Tom de Vries
2026-03-20 16:33   ` Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox