From: Sterling Augustine <saugustine@google.com>
To: Pedro Alves <palves@redhat.com>
Cc: gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
Date: Thu, 23 Jan 2014 17:55:00 -0000 [thread overview]
Message-ID: <CAEG7qUzG1K08CLgeXcv6U7gZ+ryVuXfS9+9R9e6NBzK7EAEcgg@mail.gmail.com> (raw)
In-Reply-To: <52DFCFCF.4030101@redhat.com>
On Wed, Jan 22, 2014 at 6:03 AM, Pedro Alves <palves@redhat.com> wrote:
> On 01/21/2014 10:17 PM, Sterling Augustine wrote:
>
>> But the problem is that the thread out of the step range is not the
>> currently stepped thread.
>
> Then I'm quite confused. Each thread has its own step range. How come
> the thread that is not stepped has a step range at all then?
It is stepped. We are switching back to it after a complicated set of
events, see below.
>> Process_event_stop_test calls
>> switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
>> the extra logic in process_event_stop_test to fix up the step range,
>> and leading to the assertion error.
>
> The issue seems to me, as previously discussed, not really about missing
> the "fix up" of the step range, but rather that we overstep the thread
> by mistake.
That is incorrect. The thread's stepping range looks something like
this, in source code:
x = 1; foo(); x = 2;
With the step range equivalent to the single line.
But the stepped-thread gets stopped in foo--that's what all the extra
logic in process_event_stop_test does to fix up the step range.
So the thread is not past the step range at all and will hit it
eventually, but it is outside it. There is logic in
process_event_stop_test to handle this exact case.
> Running the thread through process_event_stop_test makes us
> detect that the step finished (before we ever get to fix up the step
> range).
The step didn't finish. The thread stopped deeper in the stack.
> The thing missing is a testcase clearly showing that that's indeed
> the issue in question. I spent a few days trying to write one from
> scratch a while ago, but failed, because linux-nat.c always gives
> preference to reporting the stepping/SIGTRAP thread if there are multiple
> simultaneous events, and it seemed like another signal needs to be
> involved to trigger this.
Even if I could release this app (which I can't), it is several
gigabytes big, and it takes a while to hit the case--it is obviously a
race that is unusual extremely uncommon.
I have spent a solid week trying to write a case to hit this
independently, but I can't.
> Perhaps we could confirm all this already in a log produced by
> your extra debug outputs ran against your big app?
I have attached a severely trimmed log to the bug here:
https://sourceware.org/bugzilla/show_bug.cgi?id=16292
The relevant lines in the log as far as I can tell are:
157292: LWP 28899 Takes a trace/breakpoint trap, and is inside its step range.
157293: LWP 29437 has a breakpoint pushed back.
157307: LWP 28899 is resumed and prepared to step (still inside step range).
159355: GDB switches contexts from LWP 28899 to LWP 29437
159808: LWP 29437 Takes a trace/breakpoint trap, and GDB sends SIGSTOP
to all other threads
161427: LWP 28899 Takes a Profiling timer expired trap, instead of a
SIGSTOP. outside of its step range.
161466: switch_to_stepped_thread decides to restart LWP 28899
(switch_to_stepped_thread).
161488: assertion failure.
> I'm not convinced the first two branches in switch_back_to_stepped_thread
> should be changed at all. So without those that reduces to exactly the
> original patch I had shown you originally:
>
> https://github.com/palves/gdb/commit/b6b55ba610f8db5d89ec7405c93013a10d9a1c20
>
> Does that alone fix things for you?
Yes, by itself it does work, but I thought you had rejected that for
some reason. The second patch you made breaks things by bypassing the
step-range fix up in process_event_stop_test.
> In that branch, I then later rewrote that fix differently:
>
> https://github.com/palves/gdb/commit/1d56ddf439b6f7e7fa9759cf1f8e02106eea6af5
>
> The idea of that "better fix" was to handle the case mentioned in
> this comment:
>
> + There might be some cases where this loses signal
> + information, if a signal has arrived at exactly the same
> + time that the PC changed, but this is the best we can do
> + with the information available.
>
> by setting a breakpoint at the current PC, and re-resuming the thread.
> That means that if there was indeed some other signal/event pending,
> we'd collect it first. But that's unfinished, and breaks hardware-step
> targets again in the process, for it only handles software-step targets.
> The thing preventing moving this forward is a testcase (or a log showing
> clearly that the problem is what I say above it is, which should show
> the steps needed to construct a testcase).
This doesn't seem to address the case at hand. In any case, the second
patch does not fix it.
I would be fine with your first patch though.
next prev parent reply other threads:[~2014-01-23 17:55 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-21 22:17 Sterling Augustine
2014-01-22 14:04 ` Pedro Alves
2014-01-23 17:55 ` Sterling Augustine [this message]
2014-01-23 18:53 ` Pedro Alves
2014-01-24 12:44 ` Pedro Alves
2014-02-07 19:52 ` Pedro Alves
2014-02-07 20:12 ` Pedro Alves
2014-02-26 17:12 ` Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAEG7qUzG1K08CLgeXcv6U7gZ+ryVuXfS9+9R9e6NBzK7EAEcgg@mail.gmail.com \
--to=saugustine@google.com \
--cc=gdb-patches@sourceware.org \
--cc=palves@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox