From: Simon Marchi via Gdb-patches <gdb-patches@sourceware.org>
To: gdb-patches@sourceware.org
Cc: Simon Marchi <simon.marchi@efficios.com>
Subject: Re: [PATCH v3 4/5] gdb: generalize commit_resume, avoid commit-resuming when threads have pending statuses
Date: Tue, 12 Jan 2021 12:14:06 -0500 [thread overview]
Message-ID: <ef5a6aad-fa3c-7887-9322-e6c9dbf87a34@polymtl.ca> (raw)
In-Reply-To: <20210108041734.3873826-5-simon.marchi@polymtl.ca>
On 2021-01-07 11:17 p.m., Simon Marchi via Gdb-patches wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
>
> The rationale for this patch comes from the ROCm port [1], the goal
> being to reduce the number of back and forths between GDB and the target
> when doing successive operations. I'll start with explaining the
> rationale and then go over the implementation. In the ROCm / GPU world,
> the term "wave" is somewhat equivalent to a "thread" in GDB. So if you
> read if from a GPU stand point, just s/thread/wave/.
>
> ROCdbgapi, the library used by GDB [2] to communicate with the GPU
> target, gives the illusion that it's possible for the debugger to
> control (start and stop) individual threads. But in reality, this is
> not how it works. Under the hood, all threads of a queue are controlled
> as a group. To stop one thread in a group of running ones, the state of
> all threads is retrieved from the GPU, all threads are destroyed, and all
> threads but the one we want to stop are re-created from the saved state.
> The net result, from the point of view of GDB, is that the library
> stopped one thread. The same thing goes if we want to resume one thread
> while others are running: the state of all running threads is retrieved
> from the GPU, they are all destroyed, and they are all re-created,
> including the thread we want to resume.
>
> This leads to some inefficiencies when combined with how GDB works, here
> are two examples:
>
> - Stopping all threads: because the target operates in non-stop mode,
> when the user interface mode is all-stop, GDB must stop all threads
> individually when presenting a stop. Let's suppose we have 1000
> threads and the user does ^C. GDB asks the target to stop one
> thread. Behind the scenes, the library retrieves 1000 thread states
> and restores the 999 others still running ones. GDB asks the target
> to stop another one. The target retrieves 999 thread states and
> restores the 998 remaining ones. That means that to stop 1000
> threads, we did 1000 back and forths with the GPU. It would have
> been much better to just retrieve the states once and stop there.
>
> - Resuming with pending events: suppose the 1000 threads hit a
> breakpoint at the same time. The breakpoint is conditional and
> evaluates to true for the first thread, to false for all others. GDB
> pulls one event (for the first thread) from the target, decides that
> it should present a stop, so stops all threads using
> stop_all_threads. All these other threads have a breakpoint event to
> report, which is saved in `thread_info::suspend::waitstatus` for
> later. When the user does "continue", GDB resumes that one thread
> that did hit the breakpoint. It then processes the pending events
> one by one as if they just arrived. It picks one, evaluates the
> condition to false, and resumes the thread. It picks another one,
> evaluates the condition to false, and resumes the thread. And so on.
> In between each resumption, there is a full state retrieval and
> re-creation. It would be much nicer if we could wait a little bit
> before sending those threads on the GPU, until it processed all those
> pending events.
>
> To address this kind of performance issue, ROCdbgapi has a concept
> called "forward progress required", which is a boolean state that allows
> its user (i.e. GDB) to say "I'm doing a bunch of operations, you can
> hold off putting the threads on the GPU until I'm done" (the "forward
> progress not required" state). Turning forward progress back on
> indicates to the library that all threads that are supposed to be
> running should now be really running on the GPU.
>
> It turns out that GDB has a similar concept, though not as general,
> commit_resume. On difference is that commit_resume is not stateful: the
> target can't look up "does the core need me to schedule resumed threads
> for execution right now". It is also specifically linked to the resume
> method, it is not used in other contexts. The target accumulates
> resumption requests through target_ops::resume calls, and then commits
> those resumptions when target_ops::commit_resume is called. The target
> has no way to check if it's ok to leave resumed threads stopped in other
> target methods.
>
> To bridge the gap, this patch generalizes the commit_resume concept in
> GDB to match the forward progress concept of ROCdbgapi. The current
> name (commit_resume) can be interpreted as "commit the previous resume
> calls". I renamed the concept to "commit_resumed", as in "commit the
> threads that are resumed".
>
> In the new version, we have two things in process_stratum_target:
>
> - the commit_resumed_state field: indicates whether GDB requires this
> target to have resumed threads committed to the execution
> target/device. If false, the target is allowed to leave resumed
> threads un-committed at the end of whatever method it is executing.
>
> - the commit_resumed method: called when commit_resumed_state
> transitions from false to true. While commit_resumed_state was
> false, the target may have left some resumed threads un-committed.
> This method being called tells it that it should commit them back to
> the execution device.
>
> Let's take the "Stopping all threads" scenario from above and see how it
> would work with the ROCm target with this change. Before stopping all
> threads, GDB would set the target's commit_resumed_state field to false.
> It would then ask the target to stop the first thread. The target would
> retrieve all threads' state from the GPU and mark that one as stopped.
> Since commit_resumed_state is false, it leaves all the other threads
> (still resumed) stopped. GDB would then proceed to call target_stop for
> all the other threads. Since resumed threads are not committed, this
> doesn't do any back and forth with the GPU.
>
> To simplify the implementation of targets, I made it so that when
> calling certain target methods, the contract between the core and the
> targets guarantees that commit_resumed_state is false. This way, the
> target doesn't need two paths, one commit_resumed_state == true and one
> for commit_resumed_state == false. It can just assert that
> commit_resumed_state is false and work with that assumption. This also
> helps catch places where we forgot to disable commit_resumed_state
> before calling the method, which represents a probable optimization
> opportunity.
>
> To have some confidence that this contract between the core and the
> targets is respected, I added assertions in the linux-nat target
> methods, even though the linux-nat target doesn't actually use that
> feature. Since linux-nat is tested much more than other targets, this
> will help catch these issues quicker.
>
> To ensure that commit_resumed_state is always turned back on (only if
> necessary, see below) and the commit_resumed method is called when doing
> so, I introduced the scoped_disabled_commit_resumed RAII object, which
> replaces make_scoped_defer_process_target_commit_resume. On
> construction, it clears the commit_resumed_state flag of all process
> targets. On destruction, it turns it back on (if necessary) and calls
> the commit_resumed method. The nested case is handled by having a
> "nesting" counter: only when the counter goes back to 0 is
> commit_resumed_state turned back on.
>
> On destruction, commit-resumed is not re-enabled for a given target if:
>
> 1. this target has no threads resumed, or
> 2. this target at least one thread with a pending status known to the
> core (saved in thread_info::suspend::waitstatus).
>
> The first point is not technically necessary, because a proper
> commit_resumed implementation would be a no-op if the target has no
> resumed threads. But since we have a flag do to a quick check, I think
> it doesn't hurt.
>
> The second point is more important: together with the
> scoped_disable_commit_resumed instance added in fetch_inferior_event, it
> makes it so the "Resuming with pending events" described above is
> handled efficiently. Here's what happens in that case:
>
> 1. The user types "continue".
> 2. Upon destruction, the scoped_disable_commit_resumed in the `proceed`
> function does not enable commit-resumed, as it sees other threads
> have pending statuses.
> 3. fetch_inferior_event is called to handle another event, one thread
> is resumed. Because there are still more threads with pending
> statuses, the destructor of scoped_disable_commit_resumed in
> fetch_inferior_event still doesn't enable commit-resumed.
> 4. Rinse and repeat step 3, until the last pending status is handled by
> fetch_inferior_event. In that case, scoped_disable_commit_resumed's
> destructor sees there are no more threads with pending statues, so
> it asks the target to commit resumed threads.
>
> This allows us to avoid all unnecessary back and forths, there is a
> single commit_resumed call.
>
> This change required remote_target::remote_stop_ns to learn how to
> handle stopping threads that were resumed but pending vCont. The
> simplest example where that happens is when using the remote target in
> all-stop, but with "maint set target-non-stop on", to force it to
> operate in non-stop mode under the hood. If two threads hit a
> breakpoint at the same time, GDB will receive two stop replies. It will
> present the stop for one thread and save the other one in
> thread_info::suspend::waitstatus.
>
> Before this patch, when doing "continue", GDB first resumes the thread
> without a pending status:
>
> Sending packet: $vCont;c:p172651.172676#f3
>
> It then consumes the pending status in the next fetch_inferior_event
> call:
>
> [infrun] do_target_wait_1: Using pending wait status status->kind = stopped, signal = GDB_SIGNAL_TRAP for Thread 1517137.1517137.
> [infrun] target_wait (-1.0.0, status) =
> [infrun] 1517137.1517137.0 [Thread 1517137.1517137],
> [infrun] status->kind = stopped, signal = GDB_SIGNAL_TRAP
>
> It then realizes it needs to stop all threads to present the stop, so
> stops the thread it just resumed:
>
> [infrun] stop_all_threads: Thread 1517137.1517137 not executing
> [infrun] stop_all_threads: Thread 1517137.1517174 executing, need stop
> remote_stop called
> Sending packet: $vCont;t:p172651.172676#04
>
> This is an unnecessary resume/stop. With this patch, we don't commit
> resumed threads after proceeding, because of the pending status:
>
> [infrun] maybe_commit_resumed_all_process_targets: not requesting commit-resumed for target extended-remote, a thread has a pending waitstatus
>
> When GDB handles the pending status and stop_all_threads runs, we stop a
> resumed but pending vCont thread:
>
> remote_stop_ns: Enqueueing phony stop reply for thread pending vCont-resume (1520940, 1520976, 0)
>
> That thread was never actually resumed on the remote stub / gdbserver.
> This is why remote_stop_ns needed to learn this new trick of enqueueing
> phony stop replies.
>
> Note that this patch only considers pending statuses known to the core
> of GDB, that is the events that were pulled out of the target and stored
> in `thread_info::suspend::waitstatus`. In some cases, we could also
> avoid unnecessary back and forth when the target has events that it has
> not yet reported the core. I plan to implement this as a subsequent
> patch, once this series has settled.
I think this patch introduces some regressions, when running
$ while make check TESTS="gdb.threads/interrupt-while-step-over.exp" RUNTESTFLAGS="--target_board=native-extended-gdbserver"; do done
I'll sometimes get:
/home/smarchi/src/binutils-gdb/gdb/inline-frame.c:383: internal-error: void skip_inline_frames(thread_info*, bpstat): Assertion `find_inline_frame_state (thread) == NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
Simon
next prev parent reply other threads:[~2021-01-12 17:14 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-08 4:17 [PATCH v3 0/5] Reduce back and forth with target when threads have pending statuses + better handling of 'S' packets Simon Marchi via Gdb-patches
2021-01-08 4:17 ` [PATCH v3 1/5] gdb: make the remote target track its own thread resume state Simon Marchi via Gdb-patches
2021-01-08 15:41 ` Pedro Alves
2021-01-08 18:56 ` Simon Marchi via Gdb-patches
2021-01-18 5:16 ` Sebastian Huber
2021-01-18 6:04 ` Simon Marchi via Gdb-patches
2021-01-18 10:36 ` Sebastian Huber
2021-01-18 13:53 ` Simon Marchi via Gdb-patches
2021-01-08 4:17 ` [PATCH v3 2/5] gdb: remove target_ops::commit_resume implementation in record-{btrace, full}.c Simon Marchi via Gdb-patches
2021-01-08 15:43 ` [PATCH v3 2/5] gdb: remove target_ops::commit_resume implementation in record-{btrace,full}.c Pedro Alves
2021-01-08 19:00 ` Simon Marchi via Gdb-patches
2021-01-08 4:17 ` [PATCH v3 3/5] gdb: move commit_resume to process_stratum_target Simon Marchi via Gdb-patches
2021-01-08 18:12 ` Andrew Burgess
2021-01-08 19:01 ` Simon Marchi via Gdb-patches
2021-01-09 20:29 ` Pedro Alves
2021-01-08 4:17 ` [PATCH v3 4/5] gdb: generalize commit_resume, avoid commit-resuming when threads have pending statuses Simon Marchi via Gdb-patches
2021-01-08 18:34 ` Andrew Burgess
2021-01-08 19:04 ` Simon Marchi via Gdb-patches
2021-01-09 20:34 ` Pedro Alves
2021-01-11 20:28 ` Simon Marchi via Gdb-patches
2021-01-22 2:46 ` Simon Marchi via Gdb-patches
2021-01-22 22:07 ` Simon Marchi via Gdb-patches
2021-01-12 17:14 ` Simon Marchi via Gdb-patches [this message]
2021-01-12 18:04 ` Simon Marchi via Gdb-patches
2021-01-15 19:17 ` Simon Marchi via Gdb-patches
2021-01-08 4:17 ` [PATCH v3 5/5] gdb: better handling of 'S' packets Simon Marchi via Gdb-patches
2021-01-08 18:19 ` Andrew Burgess
2021-01-08 19:11 ` Simon Marchi via Gdb-patches
2021-01-09 21:26 ` Pedro Alves
2021-01-11 20:36 ` Simon Marchi via Gdb-patches
2021-01-12 3:07 ` Simon Marchi via Gdb-patches
2021-01-13 20:17 ` Pedro Alves
2021-01-14 1:28 ` Simon Marchi via Gdb-patches
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ef5a6aad-fa3c-7887-9322-e6c9dbf87a34@polymtl.ca \
--to=gdb-patches@sourceware.org \
--cc=simon.marchi@efficios.com \
--cc=simon.marchi@polymtl.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox