Displaced stepping not always working as expected

Mirror of the gdb mailing list

* Displaced stepping not always working as expected
@ 2011-09-20 19:54 Marc Khouzam
  2011-09-21  6:09 ` Yao Qi
  2011-09-21 10:20 ` Pedro Alves
  0 siblings, 2 replies; 7+ messages in thread
From: Marc Khouzam @ 2011-09-20 19:54 UTC (permalink / raw)
  To: 'gdb@sourceware.org'

Hi,

I just need a hint on where next to look...

I've been asked to look into problems with non-stop on 
a user-mode-linux virtual machine
(http://user-mode-linux.sourceforge.net/)

On that AMD 64bit machine, I cannot step or resume past a breakpoint
when using non-stop with a multi-threaded program _if_ any of the
threads is still running.  If I interrupt all threads, then displaced
stepping works.

During the failure case, I confirmed that the displaced
instruction does _not_ get executed (the memory it should have 
changed stays the same).  So, the PC stays in the same place
and the step does not move forward.

I tried to turn on 'set debug infrun 1', but I get the exact same
logs during the failure as during a success case.

Sometimes, if I keep trying to step, it will finally work (could be
after 3 attempts, could be after 100 attempts or more).  It seems
related to what the other running thread is doing at the time.

Can someone let me know where in GDB I can look to see why a displaced
instruction is not being executed?  Or maybe other debug logs to enable?

For more details, below are stripped logs showing the problem as
concisely as possible.

Thanks a lot for any guidance

Marc




Displaced logs showing PC stuck:
===============================
displaced: stepping Thread 0x40b21940 (LWP 763) now
displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0 
displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00 
====> PC being displaced
displaced: displaced pc to 0x4006d2
====> Instruction being run
displaced: run 0x4006d2: 83 6d fc 01
               ^^^^^^^^
displaced: restored 0x4006d2
displaced: fixup (0x40083e, 0x4006d2), insn = 0x83 0x6d ...
====> PC being relocated from the same address 
====> as the displaced instruction!
displaced: relocated %rip from 0x4006d2 to 0x40083e
                               ^^^^^^^^
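
The relocation in the last log line is plain offset arithmetic, so a small C sketch may help when reading these logs. It assumes the fixup simply replays, from the original instruction address, however far the PC moved inside the scratch area. The names `relocate_pc` and `bytes_advanced` are illustrative only and are not GDB functions.

```c
#include <stdint.h>

/* Translate a PC inside the scratch area back into the original code:
   the distance travelled in the scratch copy is replayed from the
   original instruction address.  (Assumed model, not GDB's code.)  */
uint64_t relocate_pc(uint64_t pc, uint64_t scratch, uint64_t orig)
{
    return orig + (pc - scratch);
}

/* Number of bytes the thread advanced in the scratch area.
   Zero means the copied instruction never executed.  */
uint64_t bytes_advanced(uint64_t pc, uint64_t scratch)
{
    return pc - scratch;
}
```

With the addresses from the logs, `relocate_pc(0x4006d2, 0x4006d2, 0x40083e)` gives `0x40083e` (zero bytes advanced, the stuck case), while `relocate_pc(0x4006d6, 0x4006d2, 0x40083e)` gives `0x400842`, which matches a step over the 4-byte instruction `83 6d fc 01`.
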
'next' operation stuck at line 9 of my program:
==============================================
(gdb) n
infrun: clear_proceed_status_thread (Thread 0x40b21940 (LWP 763))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: resume (step=1, signal=0), trap_expected=1
displaced: stepping Thread 0x40b21940 (LWP 763) now
displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0 
displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00 
displaced: displaced pc to 0x4006d2
displaced: run 0x4006d2: 83 6d fc 01 
infrun: target_wait (-1, status) =
infrun:   760 [Thread 0x40b21940 (LWP 763)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: Switching context from Thread 0x40b21940 (LWP 763) to Thread 0x40b21940 (LWP 763)
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
displaced: restored 0x4006d2
displaced: fixup (0x40083e, 0x4006d2), insn = 0x83 0x6d ...
displaced: relocated %rip from 0x4006d2 to 0x40083e
infrun: stop_pc = 0x40083e
infrun: BPSTAT_WHAT_STOP_NOISY
infrun: stop_stepping

Breakpoint 2, thread_exec1 (ptr=0x40095c) at multithread.c:9
9               i--;
(gdb) infrun: target_wait (-1, status) =
infrun:   -1 [process -1],
infrun:   status->kind = ignore
infrun: TARGET_WAITKIND_IGNORE
infrun: prepare_to_wait


'next' operation that finally gets to line 10:
(exact same output as failure except PC gets incremented)
=========================================================
(gdb) n
infrun: clear_proceed_status_thread (Thread 0x40b21940 (LWP 763))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: resume (step=1, signal=0), trap_expected=1
displaced: stepping Thread 0x40b21940 (LWP 763) now
displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0 
displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00 
displaced: displaced pc to 0x4006d2
displaced: run 0x4006d2: 83 6d fc 01 
infrun: target_wait (-1, status) =
infrun:   760 [Thread 0x40b21940 (LWP 763)],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: Switching context from Thread 0x40b21940 (LWP 763) to Thread 0x40b21940 (LWP 763)
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
displaced: restored 0x4006d2
displaced: fixup (0x40083e, 0x4006d2), insn = 0x83 0x6d ...
displaced: relocated %rip from 0x4006d6 to 0x400842
infrun: stop_pc = 0x400842
infrun: stepped to a different line
infrun: stop_stepping
10              printf("in the second thread %d\n", i);
(gdb) infrun: target_wait (-1, status) =
infrun:   -1 [process -1],
infrun:   status->kind = ignore
infrun: TARGET_WAITKIND_IGNORE
infrun: prepare_to_wait



* Re: Displaced stepping not always working as expected
  2011-09-20 19:54 Displaced stepping not always working as expected Marc Khouzam
@ 2011-09-21  6:09 ` Yao Qi
  2011-09-21 10:23   ` Pedro Alves
  2011-09-21 10:20 ` Pedro Alves
  1 sibling, 1 reply; 7+ messages in thread
From: Yao Qi @ 2011-09-21  6:09 UTC (permalink / raw)
  To: Marc Khouzam; +Cc: 'gdb@sourceware.org'

On Tue, 2011-09-20 at 15:54 -0400, Marc Khouzam wrote:

> Can someone let me know where in GDB I can look to see why a displaced
> instruction is not being executed?  Or maybe other debug logs to enable?
> 

Usually, I use 'set debug displaced 1' and 'set debug infrun 1'
together.  It looks like you have already used them.  I don't know of
any other debug logs to turn on.

>                                ^^^^^^^^
> 'next' operation stuck at line 9 of my program:
> ==============================================
> (gdb) n
> infrun: clear_proceed_status_thread (Thread 0x40b21940 (LWP 763))
> infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
> infrun: resume (step=1, signal=0), trap_expected=1
> displaced: stepping Thread 0x40b21940 (LWP 763) now
> displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0 
> displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00 
> displaced: displaced pc to 0x4006d2
> displaced: run 0x4006d2: 83 6d fc 01 
> infrun: target_wait (-1, status) =
> infrun:   760 [Thread 0x40b21940 (LWP 763)],
> infrun:   status->kind = stopped, signal = SIGTRAP
> infrun: Switching context from Thread 0x40b21940 (LWP 763) to Thread 0x40b21940 (LWP 763)

This log line looks strange to me.  Why does LWP 763 switch to itself?

Usually, what I see here is "from Thread A to Thread B", with A != B.
In that case, I know the current displaced stepping infrastructure doesn't
handle a thread context switch during displaced stepping.  That is,
some insns are copied to the scratch pad and executed in thread A, but
thread B gets an event first.  GDB then forgets that thread A is
displaced stepping after it handles the event for thread B.  This problem
is quite similar to a context switch during software single-step.
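
To make the scenario concrete, here is a minimal C sketch of the bookkeeping involved. It is purely illustrative: `struct displaced_step`, `owner_tid`, and `handle_event` are hypothetical names, not GDB's actual data structures. The sketch only assumes that the debugger must keep remembering which thread owns the scratch pad until that same thread reports its event.

```c
#include <stdbool.h>

/* Hypothetical record of an in-flight displaced step.  */
struct displaced_step {
    int owner_tid;   /* thread using the scratch pad, or -1 if free */
};

/* True when the reported event belongs to the thread that is
   currently displaced stepping.  */
bool event_finishes_step(const struct displaced_step *d, int event_tid)
{
    return d->owner_tid != -1 && d->owner_tid == event_tid;
}

/* Handle one event.  Only the owner's event releases the scratch pad;
   an event from any other thread leaves the record untouched, so the
   pending step of the owner is not forgotten.  */
void handle_event(struct displaced_step *d, int event_tid)
{
    if (event_finishes_step(d, event_tid))
        d->owner_tid = -1;   /* the restore and fixup would happen here */
}
```

In this model, an event from thread B while thread A owns the scratch pad leaves `owner_tid` unchanged. The failure mode described above corresponds to clearing or overwriting that record too early.
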

I don't think this is the same problem you are asking about here.  I have
drafted a patch for the problem I described above, but it still needs some time.


* Re: Displaced stepping not always working as expected
  2011-09-20 19:54 Displaced stepping not always working as expected Marc Khouzam
  2011-09-21  6:09 ` Yao Qi
@ 2011-09-21 10:20 ` Pedro Alves
  2011-09-21 20:46   ` Marc Khouzam
  1 sibling, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2011-09-21 10:20 UTC (permalink / raw)
  To: gdb; +Cc: Marc Khouzam

On Tuesday 20 September 2011 20:54:24, Marc Khouzam wrote:
> Hi,
> 
> I just need a hint on where next to look...

> I've been asked to look into problems with non-stop on 
> a user-mode-linux virtual machine
> (http://user-mode-linux.sourceforge.net/)

So does this only happen with UML?  UML uses ptrace internally for
its own business, I wouldn't be surprised if there's something
wonky going on at that level.

> On that AMD 64bit machine, I cannot step or resume past a breakpoint
> when using non-stop with a multi-threaded program _if_ any of the
> threads is still running.  If I interrupt all threads, then displaced
> stepping works.

I wouldn't be surprised if the UM kernel is reporting a spurious
SIGTRAP to gdb.  Try "set debug lin-lwp 1" as well, but I don't
think it'll tell you much.  Maybe peeking at eflags or the siginfo
of that SIGTRAP reveals something.

> During the failure case, I confirmed that the displaced
> instruction does _not_ get executed (the memory it should have 
> changed stays the same).  So, the PC stays in the same place
> and the step does not move forward.
> 
> I tried to turn on 'set debug infrun 1', but I get the exact same
> logs during the failure as during a success case.
> 
> Sometimes, if I keep trying to step, it will finally work (could be
> after 3 attempts, could be after 100 attempts or more).  It seems
> related to what the other running thread is doing at the time.
> 
> Can someone let me know where in GDB I can look to see why a displaced
> instruction is not being executed?  Or maybe other debug logs to enable?

Try "set debug lin-lwp 1", and see if the resume was preempted and
for some bizarre reason the core is getting a cached wait status
instead of really resuming the thread.

Otherwise, this smells like a UML problem.

-- 
Pedro Alves



* Re: Displaced stepping not always working as expected
  2011-09-21  6:09 ` Yao Qi
@ 2011-09-21 10:23   ` Pedro Alves
  2011-09-21 15:39     ` Yao Qi
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2011-09-21 10:23 UTC (permalink / raw)
  To: gdb; +Cc: Yao Qi, Marc Khouzam

On Wednesday 21 September 2011 07:08:43, Yao Qi wrote:
> This log line looks strange to me.  Why does LWP 763 switch to itself?

This is non-stop mode, and fetch_inferior_event always prints the
"context switch":

  if (non_stop
      && ecs->ws.kind != TARGET_WAITKIND_IGNORE
      && ecs->ws.kind != TARGET_WAITKIND_EXITED
      && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
    /* In non-stop mode, each thread is handled individually.  Switch
       early, so the global state is set correctly for this
       thread.  */
    context_switch (ecs->ptid);

-- 
Pedro Alves



* Re: Displaced stepping not always working as expected
  2011-09-21 10:23   ` Pedro Alves
@ 2011-09-21 15:39     ` Yao Qi
  2011-09-21 15:44       ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Yao Qi @ 2011-09-21 15:39 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb, Marc Khouzam

On 09/21/2011 06:23 PM, Pedro Alves wrote:
> On Wednesday 21 September 2011 07:08:43, Yao Qi wrote:
>> This log line looks strange to me.  Why does LWP 763 switch to itself?
> 
> This is non-stop mode, and fetch_inferior_event always prints the
> "context switch":
> 
>   if (non_stop
>       && ecs->ws.kind != TARGET_WAITKIND_IGNORE
>       && ecs->ws.kind != TARGET_WAITKIND_EXITED
>       && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
>     /* In non-stop mode, each thread is handled individually.  Switch
>        early, so the global state is set correctly for this
>        thread.  */
>     context_switch (ecs->ptid);
> 

Pedro,
I don't quite understand this piece of code and the comments here, but I
think that debug log "context switch from Thread A to Thread A" is not
useful, if not confusing.  How about this patch?

-- 
Yao (齐尧)

	gdb/
	* infrun.c (context_switch): Print debug message when switching to
	a different thread.

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 9a2de5c..225034c 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2852,7 +2852,7 @@ nullify_last_target_wait_ptid (void)
 static void
 context_switch (ptid_t ptid)
 {
-  if (debug_infrun)
+  if (debug_infrun && !ptid_equal (ptid, inferior_ptid))
     {
       fprintf_unfiltered (gdb_stdlog, "infrun: Switching context from %s ",
 			  target_pid_to_str (inferior_ptid));



* Re: Displaced stepping not always working as expected
  2011-09-21 15:39     ` Yao Qi
@ 2011-09-21 15:44       ` Pedro Alves
  0 siblings, 0 replies; 7+ messages in thread
From: Pedro Alves @ 2011-09-21 15:44 UTC (permalink / raw)
  To: Yao Qi; +Cc: gdb, Marc Khouzam

On Wednesday 21 September 2011 16:39:06, Yao Qi wrote:
> On 09/21/2011 06:23 PM, Pedro Alves wrote:
> > On Wednesday 21 September 2011 07:08:43, Yao Qi wrote:
> >> This log line looks strange to me.  Why does LWP 763 switch to itself?
> > 
> > This is non-stop mode, and fetch_inferior_event always prints the
> > "context switch":
> > 
> >   if (non_stop
> >       && ecs->ws.kind != TARGET_WAITKIND_IGNORE
> >       && ecs->ws.kind != TARGET_WAITKIND_EXITED
> >       && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
> >     /* In non-stop mode, each thread is handled individually.  Switch
> >        early, so the global state is set correctly for this
> >        thread.  */
> >     context_switch (ecs->ptid);
> > 
> 
> Pedro,
> I don't quite understand this piece of code and the comments here, but I
> think that debug log "context switch from Thread A to Thread A" is not
> useful, if not confusing.  

I guess you'd have to go back to a time when context_switch really did more
than just switching the thread.  It used to swap a _bunch_ of globals
with the copies in the thread structure.  It was horrible.

> How about this patch?

Sure.  Okay.

-- 
Pedro Alves



* RE: Displaced stepping not always working as expected
  2011-09-21 10:20 ` Pedro Alves
@ 2011-09-21 20:46   ` Marc Khouzam
  0 siblings, 0 replies; 7+ messages in thread
From: Marc Khouzam @ 2011-09-21 20:46 UTC (permalink / raw)
  To: 'Pedro Alves', gdb

> -----Original Message-----
> From: Pedro Alves [mailto:pedro@codesourcery.com] 
> Sent: Wednesday, September 21, 2011 6:20 AM
> To: gdb@sourceware.org
> Cc: Marc Khouzam
> Subject: Re: Displaced stepping not always working as expected
> 
> On Tuesday 20 September 2011 20:54:24, Marc Khouzam wrote:
> > Hi,
> > 
> > I just need a hint on where next to look...
> 
> > I've been asked to look into problems with non-stop on 
> > a user-mode-linux virtual machine
> > (http://user-mode-linux.sourceforge.net/)
> 
> So does this only happen with UML?  UML uses ptrace internally for
> its own business, I wouldn't be surprised if there's something
> wonky going on at that level.

Yes, only on UML.  In fact, only on a particular installation of UML:
I am not able to reproduce the problem on my own installation of UML.

So, I also believe it is because of the UML.  But I'm hoping it might be
something that can be fixed.  Which is why I'm trying to pin-point
the cause.

> > On that AMD 64bit machine, I cannot step or resume past a breakpoint
> > when using non-stop with a multi-threaded program _if_ any of the
> > threads is still running.  If I interrupt all threads, then 
> displaced
> > stepping works.
> 
> I wouldn't be surprised if the UM kernel is reporting a spurious
> SIGTRAP to gdb.  Try "set debug lin-lwp 1" as well, but I don't
> think it'll tell you much.  Maybe peeking at eflags or the siginfo
> of that SIGTRAP reveals something.

Thanks.
No luck with "set debug lin-lwp 1" but I will try to open up the
siginfo.  I've spent the better part of the day getting familiar
with the relevant code.

> Try "set debug lin-lwp 1", and see if the resume was preempted and
> for some bizarre reason the core is getting a cached wait status
> instead of really resuming the thread.
> 
> Otherwise, this smells like a UML problem.

One bizarre thing I noticed when trying "set debug target 1"
is that more threads get started than there should!

Normally, I get (in summary):

set non-stop on
set target-async on
b 8
r&
Breakpoint 1, thread_exec1 (ptr=0x400888) at multithread.c:8
8               i++;
info thr
  Id   Target Id         Frame 
  2    Thread 0x40804940 (LWP 946) "multi3" thread_exec1 (ptr=0x400888) at multithread.c:8
* 1    Thread 0x40003800 (LWP 943) "multi3" (running)

If I add "set debug target 1" before 'r&', I get:

info thr
  Id   Target Id         Frame 
  4    Thread 0x41806940 (LWP 940) "multi3" thread_exec1 (ptr=0x400888) at multithread.c:8
target_core_of_thread (935) = 0
  3    Thread 0x41005940 (LWP 939) "multi3" (running)
target_core_of_thread (935) = 0
  2    Thread 0x40804940 (LWP 938) "multi3" (running)
target_core_of_thread (935) = 0
* 1    Thread 0x40003800 (LWP 935) "multi3" (running)

What the???  The program (copied below), only starts two threads.

I wonder if libthread is causing some problem when using UML?
When using gdbserver, I do often get a gdbserver warning when attaching:
"PID mismatch!  Expected 789, got 791"
where 791 is the LWP of a thread other than the main one. 
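
One way to cross-check the thread count independently of gdb is to ask the kernel directly. The sketch below counts the entries under `/proc/<pid>/task`, which on Linux lists one directory per LWP. `count_lwps` is a hypothetical helper name, and whether the UML guest's `/proc` agrees with what gdb shows is exactly the open question, so treat it as a diagnostic aid only.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Count the LWPs of a process by listing /proc/<pid>/task.
   Returns -1 if the directory cannot be opened.  */
int count_lwps(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".."; every other entry is a numeric LWP id.  */
        if (entry->d_name[0] != '.')
            count++;
    }
    closedir(dir);
    return count;
}
```

For the test program below, which creates one extra thread, the expected result for the inferior's PID would be 2 once `pthread_create` has returned. A larger number would suggest the extra threads are real and not an artifact of gdb's thread list.
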

Anyway, I'll keep digging.  Thanks for the pointers.

Marc

My test program
===============

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *thread_exec1(void *ptr) {
    int i;
    for (i=0;i<500;i++) {
        i++;
        i--;
        printf("in the second thread %d\n", i);
        sleep(1);
    }
    return NULL;
}

int main() {
    pthread_t thread2;
    int iret2 = pthread_create( &thread2, NULL, thread_exec1, (void*) "Thread 2");

    printf("in the first thread\n");
    int i;
    for (i=0;i<30;i++) {
        sleep(2);    // while here, non-stop can't step over breakpoints
    }
    printf("ABOUT TO CALL JOIN\n");

    pthread_join(thread2, NULL);   // but while here, it can!
    return 0;
}


