* Displaced stepping not always working as expected
@ 2011-09-20 19:54 Marc Khouzam
2011-09-21 6:09 ` Yao Qi
2011-09-21 10:20 ` Pedro Alves
0 siblings, 2 replies; 7+ messages in thread
From: Marc Khouzam @ 2011-09-20 19:54 UTC (permalink / raw)
To: 'gdb@sourceware.org'
Hi,
I just need a hint on where next to look...
I've been asked to look into problems with non-stop on
a user-mode-linux virtual machine
(http://user-mode-linux.sourceforge.net/)
On that AMD 64bit machine, I cannot step or resume past a breakpoint
when using non-stop with a multi-threaded program _if_ any of the
threads is still running. If I interrupt all threads, then displaced
stepping works.
During the failure case, I confirmed that the displaced
instruction does _not_ get executed (the memory it should have
changed stays the same). So, the PC stays in the same place
and the step does not move forward.
I tried to turn on 'set debug infrun 1', but I get the exact same
logs during the failure as during a success case.
Sometimes, if I keep trying to step, it will finally work (could be
after 3 attempts, could be after 100 attempts or more). It seems
related to what the other running thread is doing at the time.
Can someone let me know where in GDB I can look to see why a displaced
instruction is not being executed? Or maybe other debug logs to enable?
For more details, below are stripped logs showing the problem as
concisely as possible.
Thanks a lot for any guidance
Marc
Displaced logs showing PC stuck:
===============================
displaced: stepping Thread 0x40b21940 (LWP 763) now
displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0
displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00
====> PC being displaced
displaced: displaced pc to 0x4006d2
====> Instruction being run
displaced: run 0x4006d2: 83 6d fc 01
^^^^^^^^
displaced: restored 0x4006d2
displaced: fixup (0x40083e, 0x4006d2), insn = 0x83 0x6d ...
====> PC being relocated from the same address
====> as the displaced instruction!
displaced: relocated %rip from 0x4006d2 to 0x40083e
^^^^^^^^
'next' operation stuck at line 9 of my program:
==============================================
(gdb) n
infrun: clear_proceed_status_thread (Thread 0x40b21940 (LWP 763))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: resume (step=1, signal=0), trap_expected=1
displaced: stepping Thread 0x40b21940 (LWP 763) now
displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0
displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00
displaced: displaced pc to 0x4006d2
displaced: run 0x4006d2: 83 6d fc 01
infrun: target_wait (-1, status) =
infrun: 760 [Thread 0x40b21940 (LWP 763)],
infrun: status->kind = stopped, signal = SIGTRAP
infrun: Switching context from Thread 0x40b21940 (LWP 763) to Thread 0x40b21940 (LWP 763)
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
displaced: restored 0x4006d2
displaced: fixup (0x40083e, 0x4006d2), insn = 0x83 0x6d ...
displaced: relocated %rip from 0x4006d2 to 0x40083e
infrun: stop_pc = 0x40083e
infrun: BPSTAT_WHAT_STOP_NOISY
infrun: stop_stepping
Breakpoint 2, thread_exec1 (ptr=0x40095c) at multithread.c:9
9 i--;
(gdb) infrun: target_wait (-1, status) =
infrun: -1 [process -1],
infrun: status->kind = ignore
infrun: TARGET_WAITKIND_IGNORE
infrun: prepare_to_wait
'next' operation that finally gets to line 10:
(exact same output as failure except PC gets incremented)
=========================================================
(gdb) n
infrun: clear_proceed_status_thread (Thread 0x40b21940 (LWP 763))
infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
infrun: resume (step=1, signal=0), trap_expected=1
displaced: stepping Thread 0x40b21940 (LWP 763) now
displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0
displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00
displaced: displaced pc to 0x4006d2
displaced: run 0x4006d2: 83 6d fc 01
infrun: target_wait (-1, status) =
infrun: 760 [Thread 0x40b21940 (LWP 763)],
infrun: status->kind = stopped, signal = SIGTRAP
infrun: Switching context from Thread 0x40b21940 (LWP 763) to Thread 0x40b21940 (LWP 763)
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
displaced: restored 0x4006d2
displaced: fixup (0x40083e, 0x4006d2), insn = 0x83 0x6d ...
displaced: relocated %rip from 0x4006d6 to 0x400842
infrun: stop_pc = 0x400842
infrun: stepped to a different line
infrun: stop_stepping
10 printf("in the second thread %d\n", i);
(gdb) infrun: target_wait (-1, status) =
infrun: -1 [process -1],
infrun: status->kind = ignore
infrun: TARGET_WAITKIND_IGNORE
infrun: prepare_to_wait
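The difference between the two logs comes down to the PC mapping in the "relocated %rip" lines. Below is a minimal sketch of that arithmetic in C, assuming a simple non-branching instruction such as the 4-byte `83 6d fc 01` above. The helper names `relocate_pc` and `step_made_progress` are hypothetical and this is not GDB's actual fixup code, which also has to handle jumps, calls, and RIP-relative operands.

```c
#include <stdint.h>

/* Map a PC observed in the scratch area back to the original code.
   orig_addr is where the instruction really lives (0x40083e in the
   logs), scratch_addr is where the copy was placed (0x4006d2).  */
uint64_t relocate_pc(uint64_t orig_addr, uint64_t scratch_addr,
                     uint64_t pc_after_step)
{
    /* Keep the offset travelled inside the scratch copy.  */
    return orig_addr + (pc_after_step - scratch_addr);
}

/* Nonzero if the PC moved away from the start of the scratch copy,
   i.e. the copied instruction appears to have executed.  */
int step_made_progress(uint64_t scratch_addr, uint64_t pc_after_step)
{
    return pc_after_step != scratch_addr;
}
```

With the values from the logs, `relocate_pc(0x40083e, 0x4006d2, 0x4006d6)` gives 0x400842 (the working step), while a PC still at 0x4006d2 maps straight back to 0x40083e, which matches the stuck case where the SIGTRAP arrived before the instruction ran.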
* Re: Displaced stepping not always working as expected
2011-09-20 19:54 Displaced stepping not always working as expected Marc Khouzam
@ 2011-09-21 6:09 ` Yao Qi
2011-09-21 10:23 ` Pedro Alves
2011-09-21 10:20 ` Pedro Alves
1 sibling, 1 reply; 7+ messages in thread
From: Yao Qi @ 2011-09-21 6:09 UTC (permalink / raw)
To: Marc Khouzam; +Cc: 'gdb@sourceware.org'
On Tue, 2011-09-20 at 15:54 -0400, Marc Khouzam wrote:
> Can someone let me know where in GDB I can look to see why a displaced
> instruction is not being executed? Or maybe other debug logs to enable?
>
Usually, I use 'set debug displaced 1' and 'set debug infrun 1'
together. It looks like you have already used them. I don't know of any
other debug log to turn on.
> ^^^^^^^^
> 'next' operation stuck at line 9 of my program:
> ==============================================
> (gdb) n
> infrun: clear_proceed_status_thread (Thread 0x40b21940 (LWP 763))
> infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
> infrun: resume (step=1, signal=0), trap_expected=1
> displaced: stepping Thread 0x40b21940 (LWP 763) now
> displaced: saved 0x4006d2: 49 89 d1 5e 48 89 e2 48 83 e4 f0 50 54 49 c7 c0
> displaced: copy 0x40083e->0x4006d2: 83 6d fc 01 8b 75 fc bf 8c 09 40 00 b8 00 00 00
> displaced: displaced pc to 0x4006d2
> displaced: run 0x4006d2: 83 6d fc 01
> infrun: target_wait (-1, status) =
> infrun: 760 [Thread 0x40b21940 (LWP 763)],
> infrun: status->kind = stopped, signal = SIGTRAP
> infrun: Switching context from Thread 0x40b21940 (LWP 763) to Thread 0x40b21940 (LWP 763)
This log line looks strange to me. Why does LWP 763 switch to itself?
Usually, what I see here is "from Thread A to Thread B", where A != B.
In that case, I know the current displaced stepping infrastructure doesn't
handle a "thread context switch during displaced stepping". That is, when
some insns are copied to the scratch pad and executed in thread A, but
thread B gets an event first, gdb will forget that thread A is
displaced stepping after handling the event for thread B. This problem is
quite similar to a context switch during software single-step.
I don't think this is the same problem you asked about here. I have drafted a
patch for the problem I described above, but it still needs some time.
* Re: Displaced stepping not always working as expected
2011-09-20 19:54 Displaced stepping not always working as expected Marc Khouzam
2011-09-21 6:09 ` Yao Qi
@ 2011-09-21 10:20 ` Pedro Alves
2011-09-21 20:46 ` Marc Khouzam
1 sibling, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2011-09-21 10:20 UTC (permalink / raw)
To: gdb; +Cc: Marc Khouzam
On Tuesday 20 September 2011 20:54:24, Marc Khouzam wrote:
> Hi,
>
> I just need a hint on where next to look...
> I've been asked to look into problems with non-stop on
> a user-mode-linux virtual machine
> (http://user-mode-linux.sourceforge.net/)
So does this only happen with UML? UML uses ptrace internally for
its own business, I wouldn't be surprised if there's something
wonky going on at that level.
> On that AMD 64bit machine, I cannot step or resume past a breakpoint
> when using non-stop with a multi-threaded program _if_ any of the
> threads is still running. If I interrupt all threads, then displaced
> stepping works.
I wouldn't be surprised if the UM kernel is reporting a spurious
SIGTRAP to gdb. Try "set debug lin-lwp 1" as well, but I don't
think it'll tell you much. Maybe peeking at eflags or the siginfo
of that SIGTRAP reveals something.
> During the failure case, I confirmed that the displaced
> instruction does _not_ get executed (the memory it should have
> changed stays the same). So, the PC stays in the same place
> and the step does not move forward.
>
> I tried to turn on 'set debug infrun 1', but I get the exact same
> logs during the failure as during a success case.
>
> Sometimes, if I keep trying to step, it will finally work (could be
> after 3 attempts, could be after 100 attempts or more). It seems
> related to what the other running thread is doing at the time.
>
> Can someone let me know where in GDB I can look to see why a displaced
> instruction is not being executed? Or maybe other debug logs to enable?
Try "set debug lin-lwp 1", and see if the resume was preempted and
for some bizarre reason the core is getting a cached wait status
instead of really resuming the thread.
Otherwise, this smells like a UML problem.
--
Pedro Alves
* Re: Displaced stepping not always working as expected
2011-09-21 6:09 ` Yao Qi
@ 2011-09-21 10:23 ` Pedro Alves
2011-09-21 15:39 ` Yao Qi
0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2011-09-21 10:23 UTC (permalink / raw)
To: gdb; +Cc: Yao Qi, Marc Khouzam
On Wednesday 21 September 2011 07:08:43, Yao Qi wrote:
> The line of log looks strange to me. Why LWP 763 switch to itself?
This is non-stop mode, and fetch_inferior_event always prints the
"context switch":
  if (non_stop
      && ecs->ws.kind != TARGET_WAITKIND_IGNORE
      && ecs->ws.kind != TARGET_WAITKIND_EXITED
      && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
    /* In non-stop mode, each thread is handled individually.  Switch
       early, so the global state is set correctly for this
       thread.  */
    context_switch (ecs->ptid);
--
Pedro Alves
* Re: Displaced stepping not always working as expected
2011-09-21 10:23 ` Pedro Alves
@ 2011-09-21 15:39 ` Yao Qi
2011-09-21 15:44 ` Pedro Alves
0 siblings, 1 reply; 7+ messages in thread
From: Yao Qi @ 2011-09-21 15:39 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb, Marc Khouzam
On 09/21/2011 06:23 PM, Pedro Alves wrote:
> On Wednesday 21 September 2011 07:08:43, Yao Qi wrote:
>> The line of log looks strange to me. Why LWP 763 switch to itself?
>
> This is non-stop mode, and fetch_inferior_event always prints the
> "context switch":
>
> if (non_stop
> && ecs->ws.kind != TARGET_WAITKIND_IGNORE
> && ecs->ws.kind != TARGET_WAITKIND_EXITED
> && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
> /* In non-stop mode, each thread is handled individually. Switch
> early, so the global state is set correctly for this
> thread. */
> context_switch (ecs->ptid);
>
Pedro,
I don't quite understand this piece of code and the comments here, but I
think that the debug log "context switch from Thread A to Thread A" is not
useful, and may even be confusing. How about this patch?
--
Yao (齐尧)
gdb/
        * infrun.c (context_switch): Print debug message when switching to
        a different thread.

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 9a2de5c..225034c 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -2852,7 +2852,7 @@ nullify_last_target_wait_ptid (void)
 static void
 context_switch (ptid_t ptid)
 {
-  if (debug_infrun)
+  if (debug_infrun && !ptid_equal (ptid, inferior_ptid))
     {
       fprintf_unfiltered (gdb_stdlog, "infrun: Switching context from %s ",
                           target_pid_to_str (inferior_ptid));
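
The effect of the added condition can be sketched in isolation. This is a standalone C illustration, assuming a simplified thread identifier with pid, lwp and tid fields; the names `ptid_sketch`, `ptid_sketch_equal` and `should_log_context_switch` are hypothetical stand-ins, not GDB's own definitions.

```c
/* Simplified stand-in for a thread identifier.  */
struct ptid_sketch {
    int pid;
    long lwp;
    long tid;
};

/* Two identifiers are equal only if every field matches.  */
int ptid_sketch_equal(struct ptid_sketch a, struct ptid_sketch b)
{
    return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

/* Nonzero if the "Switching context" message should be printed:
   debugging is on and the switch really changes the thread.  */
int should_log_context_switch(int debug_enabled,
                              struct ptid_sketch new_ptid,
                              struct ptid_sketch current_ptid)
{
    return debug_enabled && !ptid_sketch_equal(new_ptid, current_ptid);
}
```

With this guard, a switch from LWP 763 to LWP 763 as in the logs earlier in the thread would no longer print anything, while a switch between two different LWPs still would.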
* Re: Displaced stepping not always working as expected
2011-09-21 15:39 ` Yao Qi
@ 2011-09-21 15:44 ` Pedro Alves
0 siblings, 0 replies; 7+ messages in thread
From: Pedro Alves @ 2011-09-21 15:44 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb, Marc Khouzam
On Wednesday 21 September 2011 16:39:06, Yao Qi wrote:
> On 09/21/2011 06:23 PM, Pedro Alves wrote:
> > On Wednesday 21 September 2011 07:08:43, Yao Qi wrote:
> >> The line of log looks strange to me. Why LWP 763 switch to itself?
> >
> > This is non-stop mode, and fetch_inferior_event always prints the
> > "context switch":
> >
> > if (non_stop
> > && ecs->ws.kind != TARGET_WAITKIND_IGNORE
> > && ecs->ws.kind != TARGET_WAITKIND_EXITED
> > && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
> > /* In non-stop mode, each thread is handled individually. Switch
> > early, so the global state is set correctly for this
> > thread. */
> > context_switch (ecs->ptid);
> >
>
> Pedro,
> I don't quite understand this piece of code and the comments here, but I
> think that debug log "context switch from Thread A to Thread A" is not
> useful, if not confusing.
I guess you'd have to go back to a time when context_switch really did more
than just switch the thread. It used to swap a _bunch_ of globals
with the copies in the thread structure. It was horrible.
> How about this patch?
Sure. Okay.
--
Pedro Alves
* RE: Displaced stepping not always working as expected
2011-09-21 10:20 ` Pedro Alves
@ 2011-09-21 20:46 ` Marc Khouzam
0 siblings, 0 replies; 7+ messages in thread
From: Marc Khouzam @ 2011-09-21 20:46 UTC (permalink / raw)
To: 'Pedro Alves', gdb
> -----Original Message-----
> From: Pedro Alves [mailto:pedro@codesourcery.com]
> Sent: Wednesday, September 21, 2011 6:20 AM
> To: gdb@sourceware.org
> Cc: Marc Khouzam
> Subject: Re: Displaced stepping not always working as expected
>
> On Tuesday 20 September 2011 20:54:24, Marc Khouzam wrote:
> > Hi,
> >
> > I just need a hint on where next to look...
>
> > I've been asked to look into problems with non-stop on
> > a user-mode-linux virtual machine
> > (http://user-mode-linux.sourceforge.net/)
>
> So does this only happen with UML? UML uses ptrace internally for
> its own business, I wouldn't be surprised if there's something
> wonky going on at that level.
Yes, only on UML. In fact, only on a particular installation of UML:
I am not able to reproduce the problem on my own installation of UML.
So, I also believe it is because of the UML. But I'm hoping it might be
something that can be fixed. Which is why I'm trying to pin-point
the cause.
> > On that AMD 64bit machine, I cannot step or resume past a breakpoint
> > when using non-stop with a multi-threaded program _if_ any of the
> > threads is still running. If I interrupt all threads, then
> displaced
> > stepping works.
>
> I wouldn't be surprised if the UM kernel is reporting a spurious
> SIGTRAP to gdb. Try "set debug lin-lwp 1" as well, but I don't
> think it'll tell you much. Maybe peeking at eflags or the siginfo
> of that SIGTRAP reveals something.
Thanks.
No luck with "set debug lin-lwp 1" but I will try to open up the
siginfo. I've spent the better part of the day getting familiar
with the relevant code.
> Try "set debug lin-lwp 1", and see if the resume was preempted and
> for some bizarre reason the core is getting a cached wait status
> instead of really resuming the thread.
>
> Otherwise, this smells like a UML problem.
One bizarre thing I noticed when trying "set debug target 1"
is that more threads get started than there should!
Normally, I get (in summary):
set non-stop on
set target-async on
b 8
r&
Breakpoint 1, thread_exec1 (ptr=0x400888) at multithread.c:8
8 i++;
info thr
Id Target Id Frame
2 Thread 0x40804940 (LWP 946) "multi3" thread_exec1 (ptr=0x400888) at multithread.c:8
* 1 Thread 0x40003800 (LWP 943) "multi3" (running)
If I add "set debug target 1" before 'r&', I get:
info thr
Id Target Id Frame
4 Thread 0x41806940 (LWP 940) "multi3" thread_exec1 (ptr=0x400888) at multithread.c:8
target_core_of_thread (935) = 0
3 Thread 0x41005940 (LWP 939) "multi3" (running)
target_core_of_thread (935) = 0
2 Thread 0x40804940 (LWP 938) "multi3" (running)
target_core_of_thread (935) = 0
* 1 Thread 0x40003800 (LWP 935) "multi3" (running)
What the??? The program (copied below), only starts two threads.
I wonder if libthread is causing some problem when using UML?
When using gdbserver, I do often get a gdbserver warning when attaching:
"PID mismatch! Expected 789, got 791"
where 791 is the LWP of a thread other than the main one.
Anyway, I'll keep digging. Thanks for the pointers.
Marc
My test program
===============
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *thread_exec1(void *ptr) {
    int i;
    for (i=0;i<500;i++) {
        i++;
        i--;
        printf("in the second thread %d\n", i);
        sleep(1);
    }
    return NULL; /* the function is declared to return void * */
}

int main() {
    pthread_t thread2;
    int iret2 = pthread_create( &thread2, NULL, thread_exec1, (void*) "Thread 2");
    printf("in the first thread\n");
    int i;
    for (i=0;i<30;i++) {
        sleep(2); // while here, non-stop can't step over breakpoints
    }
    printf("ABOUT TO CALL JOIN\n");
    pthread_join(thread2, NULL); // but while here, it can!
    return 0;
}