* RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
From: Joel Brobecker @ 2014-11-20 5:11 UTC
To: gdb-patches

Hello,

I was wondering what you guys would think of a patch like this.
I am a bit uncertain, because I don't understand everything
that is happening - and the problem is that this is happening
with a fairly massive and complex program that I don't have access
to, on a system that is also fairly opaque. When I'm lucky, getting
answers is only very hard.

I am still trying to reproduce the problem locally in order to
find out more, but I couldn't understand why, in principle,
one thread couldn't receive multiple notifications during
the same single-step if the system decides to queue up signals?
If that were the case, wouldn't the attached patch make sense?
(currently untested against the program that triggered the issue,
as I think I understand how inline-frame works, and what it does,
but I am not sure I get it all).

Thank you!
-- 
Joel
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS 2014-11-20 5:11 RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS Joel Brobecker @ 2014-11-20 5:12 ` Joel Brobecker 2014-11-20 9:55 ` Pedro Alves 0 siblings, 1 reply; 9+ messages in thread From: Joel Brobecker @ 2014-11-20 5:12 UTC (permalink / raw) To: gdb-patches [-- Attachment #1: Type: text/plain, Size: 903 bytes --] [Fixing ENOPATCH... sigh.] > I was wondering what you guys would think of a patch like this. > I am a bit uncertain, because I don't understand everything > that is happening - and the problem is that this is happening > with a fairly massive and complex program that I don't have access > to, on a system that is also fairly opaque. When I'm lucky, getting > answers is only very hard. > > I am still trying to reproduce the problem locally in order to > find out more, but I couldn't understand why, in principle, > one thread couldn't receive multiple notifications during > the same single-step if the system decides to queue up signals? > If that were the case, wouldn't the attached patch make sense? > (currently untested against the program that triggered the issue, > as I think I understand how inline-frame works, and what it does, > but I am not sure I get it all). Thanks again! -- Joel [-- Attachment #2: 0001-skip_inline_frames-failed-assertion-resuming-from-br.patch --] [-- Type: text/x-diff, Size: 4704 bytes --] From f7ad35aa92a7007194582b1e23a110fc06b50cd1 Mon Sep 17 00:00:00 2001 From: Joel Brobecker <brobecker@adacore.com> Date: Thu, 20 Nov 2014 08:38:08 +0400 Subject: [PATCH] skip_inline_frames failed assertion resuming from breakpoint on LynxOS A user reported a failed assertion while debugging their program on a LynxOS system (thus via GDBserver), when trying to resume the program's execution after having reached a breakpoint: (gdb) continue [...] 
../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed. Turning infrun debug traces helps understand a little better what happens: (gdb) continue Continuing. infrun: clear_proceed_status_thread (Thread 126) [...] infrun: clear_proceed_status_thread (Thread 142) [...] infrun: clear_proceed_status_thread (Thread 146) infrun: clear_proceed_status_thread (Thread 125) infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0) infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838 infrun: wait_for_inferior () infrun: target_wait (-1, status) = infrun: 42000 [Thread 146], infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34 infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x10a187f4 infrun: context switch infrun: Switching context from Thread 142 to Thread 146 infrun: random signal (GDB_SIGNAL_REALTIME_34) infrun: switching back to stepped thread infrun: Switching context from Thread 146 to Thread 142 infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838 infrun: prepare_to_wait [...handling of similar events for threads 145, 144 and 143 snipped...] infrun: prepare_to_wait infrun: target_wait (-1, status) = infrun: 42000 [Thread 146], infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34 infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x10a187f4 infrun: context switch infrun: Switching context from Thread 142 to Thread 146 ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed. It all happens while we're trying to single-step out of the breakpoint. We keep resuming the inferior trying to single-step the thread that hit the breakpoint, but each time we get a notification that another thread received a particular signal. 
This is OK until the same thread actually received a signal a second
time, without having actually run further (same PC). That's when
we hit the assertion in skip_inline_frames.

This patch avoids the assertion by recognizing that a thread can
indeed potentially receive multiple events without changing PC,
and by therefore changing skip_inline_frames to return immediately
if we have already computed the inline_state for this thread's PC.

gdb/ChangeLog:

        * inline-frame.c (skip_inline_frames): Do not raise a failed
        assertion if find_inline_frame_state finds an inlined frame
        state for PTID.  Return early instead.

Tested on x86_64-linux.
---
 gdb/inline-frame.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/gdb/inline-frame.c b/gdb/inline-frame.c
index cecb2af..c60820c 100644
--- a/gdb/inline-frame.c
+++ b/gdb/inline-frame.c
@@ -307,6 +307,24 @@ skip_inline_frames (ptid_t ptid)
   int skip_count = 0;
   struct inline_state *state;
 
+  if (find_inline_frame_state (ptid) != NULL)
+    {
+      /* This thread is receiving multiple notifications without
+         making progress in its execution (same PC).
+
+         This was seen happening on LynxOS where a program appears
+         to have a number of signals being queued then delivered
+         while trying to single-step a thread out of a breakpoint.
+         The single-step operation makes no progress until all signals
+         get delivered first, which can result in the same thread
+         receiving multiple signals during the same single-step
+         attempt.
+
+         We have already computed the inline_state for that thread,
+         so there is no need to redo it again.  */
+      return;
+    }
+
   /* This function is called right after reinitializing the frame
      cache.  We try not to do more unwinding than absolutely
      necessary, for performance.  */
@@ -335,7 +353,6 @@ skip_inline_frames (ptid_t ptid)
         }
     }
 
-  gdb_assert (find_inline_frame_state (ptid) == NULL);
   state = allocate_inline_frame_state (ptid);
   state->skipped_frames = skip_count;
   state->saved_pc = this_pc;
-- 
1.9.1
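The early-return idea in the patch above can be illustrated with a small, self-contained sketch. This is only a toy model under stated assumptions, not GDB's code: the fixed-size table and every `toy_*` name are hypothetical stand-ins for GDB's real `inline_state` bookkeeping. It only shows that a second notification for the same thread leaves the already-recorded state untouched instead of tripping an assertion.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_THREADS 8

/* Hypothetical stand-in for GDB's per-thread inline state.  */
struct toy_inline_state
{
  int in_use;
  int tid;
  unsigned long saved_pc;
  int skipped_frames;
};

static struct toy_inline_state toy_states[TOY_MAX_THREADS];

/* Return the state recorded for TID, or NULL if there is none.  */
static struct toy_inline_state *
toy_find_state (int tid)
{
  for (size_t i = 0; i < TOY_MAX_THREADS; i++)
    if (toy_states[i].in_use && toy_states[i].tid == tid)
      return &toy_states[i];
  return NULL;
}

/* Record inline state for TID.  Returns 1 if a new entry was
   created, 0 if an entry already existed (the early return that
   the patch proposes in place of the assertion), and -1 if the
   toy table is full.  */
static int
toy_skip_inline_frames (int tid, unsigned long pc, int skip_count)
{
  assert (skip_count >= 0);

  if (toy_find_state (tid) != NULL)
    return 0;   /* Already computed for this thread; nothing to redo.  */

  for (size_t i = 0; i < TOY_MAX_THREADS; i++)
    if (!toy_states[i].in_use)
      {
        toy_states[i].in_use = 1;
        toy_states[i].tid = tid;
        toy_states[i].saved_pc = pc;
        toy_states[i].skipped_frames = skip_count;
        return 1;
      }
  return -1;
}
```

Calling `toy_skip_inline_frames (146, 0x10a187f4UL, 2)` twice in a row mirrors the trace: the first call records the state, the second one returns without touching it.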
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
From: Pedro Alves @ 2014-11-20 9:55 UTC
To: Joel Brobecker, gdb-patches

On 11/20/2014 05:12 AM, Joel Brobecker wrote:
>> I am still trying to reproduce the problem locally in order to
>> find out more, but I couldn't understand why, in principle,
>> one thread couldn't receive multiple notifications during
>> the same single-step if the system decides to queue up signals?
>> If that were the case, wouldn't the attached patch make sense?
>> (currently untested against the program that triggered the issue,
>> as I think I understand how inline-frame works, and what it does,
>> but I am not sure I get it all).
> Thanks again!
> -- Joel
>
>
> 0001-skip_inline_frames-failed-assertion-resuming-from-br.patch
>
>
> From f7ad35aa92a7007194582b1e23a110fc06b50cd1 Mon Sep 17 00:00:00 2001
> From: Joel Brobecker <brobecker@adacore.com>
> Date: Thu, 20 Nov 2014 08:38:08 +0400
> Subject: [PATCH] skip_inline_frames failed assertion resuming from breakpoint
> on LynxOS
>
> A user reported a failed assertion while debugging their program
> on a LynxOS system (thus via GDBserver), when trying to resume
> the program's execution after having reached a breakpoint:
>
> (gdb) continue
> [...]
> ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.
>
> Turning infrun debug traces helps understand a little better what
> happens:
>
> (gdb) continue
> Continuing.
> infrun: clear_proceed_status_thread (Thread 126)
> [...]
> infrun: clear_proceed_status_thread (Thread 142)
> [...]
> infrun: clear_proceed_status_thread (Thread 146)
> infrun: clear_proceed_status_thread (Thread 125)
> infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
> infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838

trap_expected=1 indicates that GDB is about to step thread 142 _only_, leaving
everything else stopped. Can you enable "set debug remote 1" as well?

> infrun: wait_for_inferior ()
> infrun: target_wait (-1, status) =
> infrun: 42000 [Thread 146],
> infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34

So how come we see an event for thread 146? That thread shouldn't
have been resumed, so GDB shouldn't be getting an event for it.

This is sounding like a bug in the target.

> infrun: infwait_normal_state
> infrun: TARGET_WAITKIND_STOPPED
> infrun: stop_pc = 0x10a187f4
> infrun: context switch
> infrun: Switching context from Thread 142 to Thread 146
> infrun: random signal (GDB_SIGNAL_REALTIME_34)
> infrun: switching back to stepped thread
> infrun: Switching context from Thread 146 to Thread 142
> infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
> infrun: prepare_to_wait
> [...handling of similar events for threads 145, 144 and 143 snipped...]
> infrun: prepare_to_wait
> infrun: target_wait (-1, status) =
> infrun: 42000 [Thread 146],
> infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
> infrun: infwait_normal_state
> infrun: TARGET_WAITKIND_STOPPED
> infrun: stop_pc = 0x10a187f4
> infrun: context switch
> infrun: Switching context from Thread 142 to Thread 146
> ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

Thanks,
Pedro Alves
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS 2014-11-20 9:55 ` Pedro Alves @ 2014-11-20 17:11 ` Joel Brobecker 2014-11-21 10:43 ` Pedro Alves 0 siblings, 1 reply; 9+ messages in thread From: Joel Brobecker @ 2014-11-20 17:11 UTC (permalink / raw) To: Pedro Alves; +Cc: gdb-patches Hi Pedro, > > infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0) > > infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838 > > trap_expected=1 indicates that GDB is about to step thread 142 _only_, leaving > everything else stopped. Can you enable "set debug remote 1" as well? Correct (we are single-stepping out of a breakpoint). Here is the output with remote debugging: | Continuing. | infrun: clear_proceed_status_thread (Thread 126) | infrun: clear_proceed_status_thread (Thread 147) | infrun: clear_proceed_status_thread (Thread 134) | infrun: clear_proceed_status_thread (Thread 135) | infrun: clear_proceed_status_thread (Thread 133) | infrun: clear_proceed_status_thread (Thread 136) | infrun: clear_proceed_status_thread (Thread 127) | infrun: clear_proceed_status_thread (Thread 129) | infrun: clear_proceed_status_thread (Thread 128) | infrun: clear_proceed_status_thread (Thread 130) | infrun: clear_proceed_status_thread (Thread 132) | infrun: clear_proceed_status_thread (Thread 141) | infrun: clear_proceed_status_thread (Thread 131) | infrun: clear_proceed_status_thread (Thread 137) | infrun: clear_proceed_status_thread (Thread 138) | infrun: clear_proceed_status_thread (Thread 139) | infrun: clear_proceed_status_thread (Thread 140) | infrun: clear_proceed_status_thread (Thread 142) | infrun: clear_proceed_status_thread (Thread 143) | infrun: clear_proceed_status_thread (Thread 144) | infrun: clear_proceed_status_thread (Thread 145) | infrun: clear_proceed_status_thread (Thread 146) | infrun: clear_proceed_status_thread (Thread 125) | infrun: proceed (addr=0xffffffff, 
signal=GDB_SIGNAL_DEFAULT, step=0) | infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838 | Sending packet: $Hg8e#4c...Packet received: OK | Sending packet: $m10684838,4#73...Packet received: 4ba1db21 | Sending packet: $QPassSignals:#f3...Packet received: OK | Sending packet: $vCont;s:8e#8f...infrun: wait_for_inferior () | Packet received: T2e01:3a440910;40:10a187f4;thread:92; | infrun: target_wait (-1, status) = | infrun: 42000 [Thread 146], | infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34 | infrun: infwait_normal_state | infrun: TARGET_WAITKIND_STOPPED | infrun: stop_pc = 0x10a187f4 | infrun: context switch | infrun: Switching context from Thread 142 to Thread 146 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $g#67...Packet received: 000000c33a4409102003b21020ed76a83a4408d80000000000000007000100010001005b20ed76a820ed79380000000010abd7a10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a4409103fc34833395728754082c13483339573000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003e112e0be826d69500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030220000481099d6dc109db33420000000fff80000 | infrun: random signal (GDB_SIGNAL_REALTIME_34) | Sending packet: $T8e#f1...Packet received: OK | infrun: switching back to stepped thread | infrun: Switching context from Thread 146 to Thread 142 | Sending packet: $Hg8e#4c...Packet received: OK | Sending 
packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000 | infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838 | Sending packet: $m10684838,4#73...Packet received: 4ba1db21 | Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait | Packet received: T2f01:3a55b910;40:10a187f4;thread:91; | infrun: target_wait (-1, status) = | infrun: 42000 [Thread 145], | infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_35 | infrun: infwait_normal_state | infrun: TARGET_WAITKIND_STOPPED | infrun: stop_pc = 0x10a187f4 | infrun: context switch | infrun: Switching context from Thread 142 to Thread 145 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $g#67...Packet received: 
000000c33a55b9102003b21020ed76b03a55b8d800000000000001fe000000010000000120ed76b0100703ac00000000280000020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a55b9100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030280000081099d6dc109db33400000000fff80000 | infrun: random signal (GDB_SIGNAL_REALTIME_35) | Sending packet: $T8e#f1...Packet received: OK | infrun: switching back to stepped thread | infrun: Switching context from Thread 145 to Thread 142 | Sending packet: $Hg8e#4c...Packet received: OK | Sending packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000 | infrun: resume (step=1, signal=GDB_SIGNAL_0), 
trap_expected=1, current thread [Thread 142] at 0x10684838 | Sending packet: $m10684838,4#73...Packet received: 4ba1db21 | Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait | Packet received: T3001:3a65e910;40:10a187f4;thread:90; | infrun: target_wait (-1, status) = | infrun: 42000 [Thread 144], | infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_36 | infrun: infwait_normal_state | infrun: TARGET_WAITKIND_STOPPED | infrun: stop_pc = 0x10a187f4 | infrun: context switch | infrun: Switching context from Thread 142 to Thread 144 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $g#67...Packet received: 000000c33a65e9102003b21020ed76b83a65e8d820f44dcc00000001000000020000000220ed76b800000060000016e020ef70900000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a65e910408206d1cf98259e4081f6d1cf98259e000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004081f6d1cf98259e00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030230000481099d6dc109db33400000000fff80000 | infrun: random signal (GDB_SIGNAL_REALTIME_36) | Sending packet: $T8e#f1...Packet received: OK | infrun: switching back to stepped thread | infrun: Switching context from Thread 144 to Thread 142 | Sending packet: $Hg8e#4c...Packet received: OK | Sending packet: $g#67...Packet received: 
103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000 | infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838 | Sending packet: $m10684838,4#73...Packet received: 4ba1db21 | Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait | Packet received: T3101:3a791910;40:10a187f4;thread:8f; | infrun: target_wait (-1, status) = | infrun: 42000 [Thread 143], | infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_37 | infrun: infwait_normal_state | infrun: TARGET_WAITKIND_STOPPED | infrun: stop_pc = 0x10a187f4 | infrun: context switch | infrun: Switching context from Thread 142 to Thread 143 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $m10a187f0,4#c5...Packet received: 44000002 | Sending packet: $g#67...Packet received: 
000000c33a7919102003b21020ed76c03a7918d800000000000002123a7919905448524420ed76c0b07da7b020f07728200000040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402000e6883a7919100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010a187f40000f030280000081099d6dc109db33400000000fff80000 | infrun: random signal (GDB_SIGNAL_REALTIME_37) | Sending packet: $T8e#f1...Packet received: OK | infrun: switching back to stepped thread | infrun: Switching context from Thread 143 to Thread 142 | Sending packet: $Hg8e#4c...Packet received: OK | Sending packet: $g#67...Packet received: 103422e83a8948e02003b21000000000000000000000000900000008000000090000000020037a301068480800000000220000420000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020f071402001791c3a89499000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000106848380002f43042000042103422e81068480820000000fff80000 | infrun: resume (step=1, signal=GDB_SIGNAL_0), 
trap_expected=1, current thread [Thread 142] at 0x10684838
| Sending packet: $m10684838,4#73...Packet received: 4ba1db21
| Sending packet: $vCont;s:8e#8f...infrun: prepare_to_wait
| Packet received: T2e01:3a440910;40:10a187f4;thread:92;
| infrun: target_wait (-1, status) =
| infrun: 42000 [Thread 146],
| infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
| infrun: infwait_normal_state
| infrun: TARGET_WAITKIND_STOPPED
| infrun: stop_pc = 0x10a187f4
| infrun: context switch
| infrun: Switching context from Thread 142 to Thread 146
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| Sending packet: $m10a187f0,4#c5...Packet received: 44000002
| ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

> > infrun: wait_for_inferior ()
> > infrun: target_wait (-1, status) =
> > infrun: 42000 [Thread 146],
> > infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
>
> So how come we see an event for thread 146? That thread shouldn't
> have been resumed, so GDB shouldn't be getting an event for it.
>
> This is sounding like a bug in the target.

I thought about this too, and there might be a ptrace request
I can use to absolutely limit the resumption to the one thread.
I say "might" because only testing will show if the request is
supported, and works, on all versions of LynxOS.

But I have always been reluctant to do so for 2 reasons [1]:
  - It affects the program's scheduling;
  - Can the program lock up if we're trying to single-step
    a thread that's blocked?

Also, what made me consider this change independently of the questions
above is that it seems to me that the situation we are facing here
is easily handled. So, to avoid headaches from other "buggy" targets,
containing this situation seemed friendlier. Don't we also have other
targets that don't have the capability to resume one single thread?

-- 
Joel

[1]: I realize that this opens the door for other threads executing
     this instruction without triggering a breakpoint. I can't explain
     why I am more concerned by scheduling interference than the
     probability of missing a breakpoint. I may bite the bullet
     at some point...
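For readers following the remote-protocol trace above, here is a small sketch of how the packets can be read. It relies on two facts of the GDB remote serial protocol: the two hex digits after `#` are the modulo-256 sum of the payload bytes, and a `T` stop reply carries the signal number as two hex digits followed by `name:value;` pairs, one of which may be `thread:`. The function names are hypothetical, and the parser assumes plain hex thread ids as they appear in this log (no multiprocess `pPID.TID` form).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes, i.e. the value sent as two
   hex digits after '#' in a "$payload#xx" packet.  */
static unsigned int
rsp_checksum (const char *payload)
{
  unsigned int sum = 0;

  assert (payload != NULL);
  for (; *payload != '\0'; payload++)
    sum += (unsigned char) *payload;
  return sum & 0xff;
}

/* Signal number of a "Txx..." stop reply, or -1 if REPLY is not a
   'T' stop reply.  */
static long
stop_reply_signal (const char *reply)
{
  char hex[3];

  if (reply[0] != 'T' || reply[1] == '\0' || reply[2] == '\0')
    return -1;
  hex[0] = reply[1];
  hex[1] = reply[2];
  hex[2] = '\0';
  return strtol (hex, NULL, 16);
}

/* Thread id from the "thread:" field of a stop reply, or -1 if the
   field is absent.  Assumes a plain hexadecimal thread id.  */
static long
stop_reply_thread (const char *reply)
{
  const char *field = strstr (reply, "thread:");

  if (field == NULL)
    return -1;
  return strtol (field + strlen ("thread:"), NULL, 16);
}
```

Applied to the log, `vCont;s:8e` sums to 0x8f, matching `$vCont;s:8e#8f`, and the reply `T2e01:3a440910;40:10a187f4;thread:92;` decodes to signal 0x2e for thread 0x92, i.e. thread 146. So the step request names thread 0x8e (142) while the stop reply comes from thread 146.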
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
From: Pedro Alves @ 2014-11-21 10:43 UTC
To: Joel Brobecker; +Cc: gdb-patches

On 11/20/2014 05:11 PM, Joel Brobecker wrote:
> Hi Pedro,
>
>>> infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
>>> infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
>>
>> trap_expected=1 indicates that GDB is about to step thread 142 _only_, leaving
>> everything else stopped. Can you enable "set debug remote 1" as well?
>
> Correct (we are single-stepping out of a breakpoint).
>
> Here is the output with remote debugging:

> | Sending packet: $vCont;s:8e#8f...infrun: wait_for_inferior ()

Alright, GDB really did resume only thread 0x8e/142.

>> So how come we see an event for thread 146? That thread shouldn't
>> have been resumed, so GDB shouldn't be getting an event for it.
>>
>> This is sounding like a bug in the target.
>
> I thought about this too, and there might be a ptrace request
> I can use to absolutely limit the resumption to the one thread.
> I say "might" because only testing will show if the request is
> supported, and works, on all versions of LynxOS.

I had a feeling we had discussed this before... See:

  https://sourceware.org/ml/gdb-patches/2013-05/msg00436.html

The (very) old gdb/lynx-nat.c code in GDB used to do this, so it
should work. Could you try it? We're going to keep hitting all
sorts of issues until this is finally done.

>
> But I have always been reluctant to do so for 2 reasons [1]:
> - It affects the program's scheduling;

That's hardly an issue, when the program had just completely
stopped for a breakpoint. :-)

> - Can the program lock up if we're trying to single-step
> a thread that's blocked?

The thread just hit a breakpoint, so it was not blocked in the sense
of the kernel not allowing it to be scheduled before.

The main issue is that we're trying to move the thread past a
breakpoint. Barring displaced stepping support, to move the
thread past the breakpoint, we have to remove the breakpoint from
the target temporarily. But then we _cannot_ resume other threads
but the one that is stopped at the breakpoint, because then those
other threads could fly by the removed breakpoint and miss it.

Regarding lock up, the only issue I see is if the instruction the
breakpoint was put on is a syscall instruction that calls into the
kernel and that could block. That's a corner case that we, e.g.,
never found the need to handle on Linux. Syscalls tend to be wrapped
in libc functions, so users don't normally put breakpoints on syscall
instructions. But still, there would be ways to handle it. E.g.,
when stepping, ask the kernel to report syscall entry, and if a
syscall entry is detected, we know the instruction has executed, so
we can reinsert breakpoints, and resume execution of all threads
again. Similarly to how we always want to be notified of signals
when we step. (From infrun.c: "If we have removed breakpoints because
we are stepping over one (in any thread), we need to receive all
signals to avoid accidentally skipping a breakpoint during execution
of a signal handler.")

> Also, what made me consider this change independently of the questions
> above is that it seems to me that the situation we are facing here
> is easily handled. So, to avoid headaches from other "buggy" targets,
> containing this situation seemed friendlier. Don't we also have other
> targets that don't have the capability to resume one single thread?

I honestly hope not. Resuming only a particular thread is a very
basic debug API feature.

Thanks,
Pedro Alves
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
From: Joel Brobecker @ 2014-12-13 15:46 UTC
To: Pedro Alves; +Cc: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 1970 bytes --]

Hi Pedro,

> The main issue is that we're trying to move the thread past a
> breakpoint. Barring displaced stepping support, to move the
> thread past the breakpoint, we have to remove the breakpoint from
> the target temporarily. But then we _cannot_ resume other threads
> but the one that is stopped at the breakpoint, because then those
> other threads could fly by the removed breakpoint and miss it.

Attached is a patch that does just that, tested on ppc-lynx5 and
ppc-lynx178. I waited a while before posting it here, because
I wanted to put it under observation for a while first...

gdb/gdbserver/ChangeLog:

        * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
        Remove FIXME comment about assumption about N.

OK to commit?

Note that parallel to that, I came across another issue, which
I am going to call a limitation for now: consider the case where
we have 2 threads, A and B, and we are trying to next/step some code
in thread A. While doing so, thread B receives a signal, and
therefore reports it to GDB. GDB sees that this signal is
configured as nostop/noprint/pass, so presumably, you would think
that we'd resume the inferior passing that signal to thread B.
However, how do you do that while at the same time stepping
thread A?

IIRC, what happens currently in this case is that GDB keeps trying
to resume/step thread A, and the kernel keeps telling GDB "no,
thread B just received a signal", and so GDB and the kernel go into
that infinite loop where nothing advances. I'm not quite sure why
we keep getting the signal for thread B, if it's a new signal each
time, or if it's about the signal not being passed back (the program
I saw this in is fairly large and complicated). In any case, I don't
see how we could improve this situation without setting sss-like
breakpoints... Something I'm not really eager to do, at least for
now, since "set scheduler-locking step" seems to work around the
issue.

Thanks!
-- 
Joel

[-- Attachment #2: 0001-gdbserver-lynxos-Use-PTRACE_SINGLESTEP_ONE-when-sing.patch --]
[-- Type: text/x-diff, Size: 3831 bytes --]

From ea7e173463120d24417a7706f98fff850f9aaa1a Mon Sep 17 00:00:00 2001
From: Joel Brobecker <brobecker@adacore.com>
Date: Tue, 25 Nov 2014 11:12:10 -0500
Subject: [PATCH] [gdbserver/lynxos] Use PTRACE_SINGLESTEP_ONE when
 single-stepping one thread.

Currently, when we receive a request to single-step one single thread
(e.g., when single-stepping out of a breakpoint), we use the
PTRACE_SINGLESTEP ptrace request, which does single-step
the corresponding thread, but also resumes execution of all
other threads in the inferior.

This causes problems when debugging programs where another thread
receives multiple debug events while trying to single-step a specific
thread out of a breakpoint (with infrun traces turned on):

    (gdb) continue
    Continuing.
    infrun: clear_proceed_status_thread (Thread 126)
    [...]
    infrun: clear_proceed_status_thread (Thread 142)
    [...]
    infrun: clear_proceed_status_thread (Thread 146)
    infrun: clear_proceed_status_thread (Thread 125)
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT, step=0)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: wait_for_inferior ()
    infrun: target_wait (-1, status) =
    infrun: 42000 [Thread 146],
    infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    infrun: random signal (GDB_SIGNAL_REALTIME_34)
    infrun: switching back to stepped thread
    infrun: Switching context from Thread 146 to Thread 142
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 142] at 0x10684838
    infrun: prepare_to_wait
    [...handling of similar events for threads 145, 144 and 143 snipped...]
    infrun: prepare_to_wait
    infrun: target_wait (-1, status) =
    infrun: 42000 [Thread 146],
    infrun: status->kind = stopped, signal = GDB_SIGNAL_REALTIME_34
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x10a187f4
    infrun: context switch
    infrun: Switching context from Thread 142 to Thread 146
    ../../src/gdb/inline-frame.c:339: internal-error: skip_inline_frames: Assertion `find_inline_frame_state (ptid) == NULL' failed.

What happens is that GDB keeps sending requests to resume one specific
thread, and keeps receiving debugging events for other threads.
Things break down when one of the other threads receives a debug
event for the second time (thread 146 in the example above).

This patch fixes the problem by making sure that only one thread
gets resumed, thus preventing the other threads from generating
an unexpected event.

gdb/gdbserver/ChangeLog:

        * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
        Remove FIXME comment about assumption about N.
---
 gdb/gdbserver/lynx-low.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gdb/gdbserver/lynx-low.c b/gdb/gdbserver/lynx-low.c
index 6178e03..3b83669 100644
--- a/gdb/gdbserver/lynx-low.c
+++ b/gdb/gdbserver/lynx-low.c
@@ -320,10 +320,11 @@ lynx_attach (unsigned long pid)
 static void
 lynx_resume (struct thread_resume *resume_info, size_t n)
 {
-  /* FIXME: Assume for now that n == 1.  */
   ptid_t ptid = resume_info[0].thread;
-  const int request = (resume_info[0].kind == resume_step
-                       ? PTRACE_SINGLESTEP : PTRACE_CONT);
+  const int request
+    = (resume_info[0].kind == resume_step
+       ? (n == 1 ? PTRACE_SINGLESTEP_ONE : PTRACE_SINGLESTEP)
+       : PTRACE_CONT);
   const int signal = resume_info[0].sig;
 
   /* If given a minus_one_ptid, then try using the current_process'
-- 
1.9.1
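The request selection made by the patch above can be read as a small pure function of the resume kind and the number of resume entries. The following is only a minimal sketch of that decision, not the gdbserver code: the enum names and the function sketch_pick_request are hypothetical stand-ins, and the numeric values are placeholders rather than the real LynxOS ptrace request codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ptrace requests named in the patch.
   The values are placeholders, not the real LynxOS constants.  */
enum sketch_request
{
  SKETCH_PTRACE_CONT,
  SKETCH_PTRACE_SINGLESTEP,
  SKETCH_PTRACE_SINGLESTEP_ONE
};

/* Hypothetical stand-in for the resume kinds used by the patch.  */
enum sketch_resume_kind
{
  SKETCH_RESUME_CONTINUE,
  SKETCH_RESUME_STEP
};

/* Pick the request for a resume of kind KIND with N resume entries.
   A step with exactly one entry steps only that thread; a step with
   more entries falls back to the request that also resumes the other
   threads; anything else is a plain continue.  */
static enum sketch_request
sketch_pick_request (enum sketch_resume_kind kind, size_t n)
{
  assert (n >= 1);

  if (kind != SKETCH_RESUME_STEP)
    return SKETCH_PTRACE_CONT;

  return n == 1 ? SKETCH_PTRACE_SINGLESTEP_ONE : SKETCH_PTRACE_SINGLESTEP;
}
```

Under these assumptions, stepping out of a breakpoint (a step with n == 1) maps to the one-thread request, which is the case that previously let the other threads run and report events.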
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-12-13 15:46 ` Joel Brobecker
@ 2014-12-15 13:11 ` Pedro Alves
  2014-12-15 14:58 ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Pedro Alves @ 2014-12-15 13:11 UTC
To: Joel Brobecker; +Cc: gdb-patches

On 12/13/2014 03:46 PM, Joel Brobecker wrote:
> Hi Pedro,
>
>> The main issue is that we're trying to move the thread past a
>> breakpoint. Barring displaced stepping support, to move the
>> thread past the breakpoint, we have to remove the breakpoint from
>> the target temporarily. But then we _cannot_ resume other threads
>> but the one that is stopped at the breakpoint, because then those
>> other threads could fly by the removed breakpoint and miss it.
>
> Attached is a patch that does just that, tested on ppc-lynx5 and
> ppc-lynx178. I waited a while before posting it here, because
> I wanted to put it in observation for a while first...
>
> gdb/gdbserver/ChangeLog:
>
>         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
>         Remove FIXME comment about assumption about N.
>
> OK to commit?

Sure, OK.

>
> Note that parallel to that, I came across another issue, which I am
> going to call a limitation for now: consider the case where we have
> 2 threads, A and B, and we are trying to next/step some code in thread
> A. While doing so, thread B receives a signal, and therefore reports
> it to GDB. GDB sees that this signal is configured as
> nostop/noprint/pass, so presumably, you would think that we'd resume
> the inferior passing that signal to thread B. However, how do you do
> that while at the same time stepping thread A?

GDB nowadays sends a single vCont packet that both steps thread A,
continues thread B with a signal and continues all other threads with
no signal (previously in some cases it'd just lose control of the
inferior, or deliver the signal to the wrong thread).
Something like:

  vCont;s:A;C SIG:B;c

See the switch_back_to_stepped_thread calls within:

  if (random_signal)
    {

at the tail end of handle_signal_stop, and
remote.c:append_pending_thread_resumptions.

There are tests in the testsuite that result in packets just like that.

>
> IIRC, what happens currently in this case is that GDB keeps trying
> to resume/step thread A, and the kernel keeps telling GDB "no,
> thread B just received a signal", and so GDB and the kernel go
> into that infinite loop where nothing advances. I'm not quite sure
> why we keep getting the signal for thread B, if it's a new signal
> each time, or if it's about the signal not being passed back (the
> program I saw this in is fairly large and complicated).
>
> In any case, I don't see how we could improve this situation
> without setting sss-like breakpoints... Something I'm not really
> eager to do, at least for now, since "set scheduler-locking step"
> seems to work around the issue.

Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
for the stepped threads, and PTRACE_CONT_ONE for the others,
instead of PTRACE_CONT ? For the case above, lynx_resume would
end up issuing:

  PTRACE_STEP_ONE, thread A, sig 0
  PTRACE_CONT_ONE, thread B, sig SIG
  PTRACE_CONT_ONE, thread C, sig 0
  PTRACE_CONT_ONE, thread D, sig 0
  ...

Otherwise, yeah, sounds like handling the step request with
breakpoints instead might be the solution.

Thanks,
Pedro Alves
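The per-thread iteration suggested above can be sketched as a planning step that turns one entry per thread into one request per thread. This is only an illustration under stated assumptions: the types sketch_thread and sketch_issued, the function sketch_plan_resume, and the two request names are hypothetical, and the sketch records the requests instead of calling ptrace, since whether LynxOS accepts such requests while other threads are already running is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two per-thread requests mentioned
   above; placeholder values, not the real LynxOS constants.  */
enum sketch_request
{
  SKETCH_CONT_ONE,
  SKETCH_STEP_ONE
};

/* What the debugger asked for, one entry per thread.  */
struct sketch_thread
{
  int id;    /* hypothetical thread identifier */
  int step;  /* nonzero if this thread must be single-stepped */
  int sig;   /* signal to deliver on resume, 0 for none */
};

/* One request that would be issued for one thread.  */
struct sketch_issued
{
  enum sketch_request request;
  int id;
  int sig;
};

/* Fill OUT (which must have room for N entries) with one request per
   thread: a step for stepped threads, a continue for all the others,
   each carrying that thread's own signal.  Returns the number of
   requests planned.  */
static size_t
sketch_plan_resume (const struct sketch_thread *threads, size_t n,
                    struct sketch_issued *out)
{
  size_t i;

  assert (threads != NULL && out != NULL);

  for (i = 0; i < n; i++)
    {
      out[i].request = threads[i].step ? SKETCH_STEP_ONE : SKETCH_CONT_ONE;
      out[i].id = threads[i].id;
      out[i].sig = threads[i].sig;
    }

  return n;
}
```

Keeping the plan separate from the code that issues the requests is a choice made for this sketch only; it makes the mapping easy to check without a LynxOS target.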
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-12-15 13:11 ` Pedro Alves
@ 2014-12-15 14:58 ` Joel Brobecker
  2014-12-15 16:01 ` Pedro Alves
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2014-12-15 14:58 UTC
To: Pedro Alves; +Cc: gdb-patches

> > gdb/gdbserver/ChangeLog:
> >
> >         * lynx-low.c (lynx_resume): Use PTRACE_SINGLESTEP_ONE if N == 1.
> >         Remove FIXME comment about assumption about N.
> >
> > OK to commit?
>
> Sure, OK.

Thank you, pushed!

> GDB nowadays sends a single vCont packet that both steps thread A,
> continues thread B with a signal and continues all other threads with
> no signal (previously in some cases it'd just lose control of the
> inferior, or deliver the signal to the wrong thread). Something like:
>
>   vCont;s:A;C SIG:B;c
[...]
> Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
> for the stepped threads, and PTRACE_CONT_ONE for the others,
> instead of PTRACE_CONT ? For the case above, lynx_resume would
> end up issuing:
>
>   PTRACE_STEP_ONE, thread A, sig 0
>   PTRACE_CONT_ONE, thread B, sig SIG
>   PTRACE_CONT_ONE, thread C, sig 0
>   PTRACE_CONT_ONE, thread D, sig 0

Interesting. Do you mean sending those requests without waiting
for the inferior to stop? I'd have to verify that it's possible
to send ptrace requests while the inferior is "in flight", but
wouldn't you then have possible race conditions?

-- 
Joel
* Re: RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS
  2014-12-15 14:58 ` Joel Brobecker
@ 2014-12-15 16:01 ` Pedro Alves
  0 siblings, 0 replies; 9+ messages in thread
From: Pedro Alves @ 2014-12-15 16:01 UTC
To: Joel Brobecker; +Cc: gdb-patches

On 12/15/2014 02:58 PM, Joel Brobecker wrote:

>> GDB nowadays sends a single vCont packet that both steps thread A,
>> continues thread B with a signal and continues all other threads with
>> no signal (previously in some cases it'd just lose control of the
>> inferior, or deliver the signal to the wrong thread). Something like:
>>
>>   vCont;s:A;C SIG:B;c
> [...]
>> Couldn't you iterate over the threads, and use PTRACE_STEP_ONE
>> for the stepped threads, and PTRACE_CONT_ONE for the others,
>> instead of PTRACE_CONT ? For the case above, lynx_resume would
>> end up issuing:
>>
>>   PTRACE_STEP_ONE, thread A, sig 0
>>   PTRACE_CONT_ONE, thread B, sig SIG
>>   PTRACE_CONT_ONE, thread C, sig 0
>>   PTRACE_CONT_ONE, thread D, sig 0
>
> Interesting. Do you mean sending those requests without waiting
> for the inferior to stop?

Yes. This is what we do e.g., on Linux. It just sounds like
Lynx's PTRACE_CONT_ONE is like Linux's PTRACE_CONT. Linux has no
equivalent of Lynx's PTRACE_CONT (resume all threads with a single
request).

> I'd have to verify that it's possible
> to send ptrace requests while the inferior is "in flight", but
> wouldn't you then have possible race conditions?

Not sure what sort of race conditions you mean, but keep in mind
that I'm pretty clueless about Lynx. :-)

Thanks,
Pedro Alves
end of thread, other threads:[~2014-12-15 16:01 UTC | newest]

Thread overview: 9+ messages
2014-11-20  5:11 RFC: skip_inline_frames failed assertion resuming from breakpoint on LynxOS Joel Brobecker
2014-11-20  5:12 ` Joel Brobecker
2014-11-20  9:55   ` Pedro Alves
2014-11-20 17:11     ` Joel Brobecker
2014-11-21 10:43       ` Pedro Alves
2014-12-13 15:46         ` Joel Brobecker
2014-12-15 13:11           ` Pedro Alves
2014-12-15 14:58             ` Joel Brobecker
2014-12-15 16:01               ` Pedro Alves