Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Simon Marchi via Gdb-patches <gdb-patches@sourceware.org>
To: Andrew Burgess <andrew.burgess@embecosm.com>
Cc: Simon Marchi <simon.marchi@efficios.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH v3 4/5] gdb: generalize commit_resume, avoid commit-resuming when threads have pending statuses
Date: Fri, 8 Jan 2021 14:04:40 -0500	[thread overview]
Message-ID: <998e1449-f74c-d9ac-242a-d956196fbbe7@polymtl.ca> (raw)
In-Reply-To: <20210108183459.GD2945@embecosm.com>

On 2021-01-08 1:34 p.m., Andrew Burgess wrote:
> * Simon Marchi <simon.marchi@polymtl.ca> [2021-01-07 23:17:33 -0500]:
> 
>> From: Simon Marchi <simon.marchi@efficios.com>
>>
>> The rationale for this patch comes from the ROCm port [1], the goal
>> being to reduce the number of back and forths between GDB and the target
>> when doing successive operations.  I'll start with explaining the
>> rationale and then go over the implementation.  In the ROCm / GPU world,
>> the term "wave" is somewhat equivalent to a "thread" in GDB.  So if you
>> read if from a GPU stand point, just s/thread/wave/.
>>
>> ROCdbgapi, the library used by GDB [2] to communicate with the GPU
>> target, gives the illusion that it's possible for the debugger to
>> control (start and stop) individual threads.  But in reality, this is
>> not how it works.  Under the hood, all threads of a queue are controlled
>> as a group.  To stop one thread in a group of running ones, the state of
>> all threads is retrieved from the GPU, all threads are destroyed, and all
>> threads but the one we want to stop are re-created from the saved state.
>> The net result, from the point of view of GDB, is that the library
>> stopped one thread.  The same thing goes if we want to resume one thread
>> while others are running: the state of all running threads is retrieved
>> from the GPU, they are all destroyed, and they are all re-created,
>> including the thread we want to resume.
>>
>> This leads to some inefficiencies when combined with how GDB works, here
>> are two examples:
>>
>>  - Stopping all threads: because the target operates in non-stop mode,
>>    when the user interface mode is all-stop, GDB must stop all threads
>>    individually when presenting a stop.  Let's suppose we have 1000
>>    threads and the user does ^C.  GDB asks the target to stop one
>>    thread.  Behind the scenes, the library retrieves 1000 thread states
>>    and restores the 999 others still running ones.  GDB asks the target
>>    to stop another one.  The target retrieves 999 thread states and
>>    restores the 998 remaining ones.  That means that to stop 1000
>>    threads, we did 1000 back and forths with the GPU.  It would have
>>    been much better to just retrieve the states once and stop there.
>>
>>  - Resuming with pending events: suppose the 1000 threads hit a
>>    breakpoint at the same time.  The breakpoint is conditional and
>>    evaluates to true for the first thread, to false for all others.  GDB
>>    pulls one event (for the first thread) from the target, decides that
>>    it should present a stop, so stops all threads using
>>    stop_all_threads.  All these other threads have a breakpoint event to
>>    report, which is saved in `thread_info::suspend::waitstatus` for
>>    later.  When the user does "continue", GDB resumes that one thread
>>    that did hit the breakpoint.  It then processes the pending events
>>    one by one as if they just arrived.  It picks one, evaluates the
>>    condition to false, and resumes the thread.  It picks another one,
>>    evaluates the condition to false, and resumes the thread.  And so on.
>>    In between each resumption, there is a full state retrieval and
>>    re-creation.  It would be much nicer if we could wait a little bit
>>    before sending those threads on the GPU, until it processed all those
>>    pending events.
>>
>> To address this kind of performance issue, ROCdbgapi has a concept
>> called "forward progress required", which is a boolean state that allows
>> its user (i.e. GDB) to say "I'm doing a bunch of operations, you can
>> hold off putting the threads on the GPU until I'm done" (the "forward
>> progress not required" state).  Turning forward progress back on
>> indicates to the library that all threads that are supposed to be
>> running should now be really running on the GPU.
>>
>> It turns out that GDB has a similar concept, though not as general,
>> commit_resume.  On difference is that commit_resume is not stateful: the
> 
> typo: 'On difference' ?

Fixed to "One difference".

> 
>> target can't look up "does the core need me to schedule resumed threads
>> for execution right now".  It is also specifically linked to the resume
>> method, it is not used in other contexts.  The target accumulates
>> resumption requests through target_ops::resume calls, and then commits
>> those resumptions when target_ops::commit_resume is called.  The target
>> has no way to check if it's ok to leave resumed threads stopped in other
>> target methods.
>>
>> To bridge the gap, this patch generalizes the commit_resume concept in
>> GDB to match the forward progress concept of ROCdbgapi.  The current
>> name (commit_resume) can be interpreted as "commit the previous resume
>> calls".  I renamed the concept to "commit_resumed", as in "commit the
>> threads that are resumed".
>>
>> In the new version, we have two things in process_stratum_target:
>>
>>  - the commit_resumed_state field: indicates whether GDB requires this
>>    target to have resumed threads committed to the execution
>>    target/device.  If false, the target is allowed to leave resumed
>>    threads un-committed at the end of whatever method it is executing.
>>
>>  - the commit_resumed method: called when commit_resumed_state
>>    transitions from false to true.  While commit_resumed_state was
>>    false, the target may have left some resumed threads un-committed.
>>    This method being called tells it that it should commit them back to
>>    the execution device.
>>
>> Let's take the "Stopping all threads" scenario from above and see how it
>> would work with the ROCm target with this change.  Before stopping all
>> threads, GDB would set the target's commit_resumed_state field to false.
>> It would then ask the target to stop the first thread.  The target would
>> retrieve all threads' state from the GPU and mark that one as stopped.
>> Since commit_resumed_state is false, it leaves all the other threads
>> (still resumed) stopped.  GDB would then proceed to call target_stop for
>> all the other threads.  Since resumed threads are not committed, this
>> doesn't do any back and forth with the GPU.
>>
>> To simplify the implementation of targets, I made it so that when
>> calling certain target methods, the contract between the core and the
>> targets guarantees that commit_resumed_state is false.  This way, the
>> target doesn't need two paths, one commit_resumed_state == true and one
>> for commit_resumed_state == false.  It can just assert that
>> commit_resumed_state is false and work with that assumption.  This also
>> helps catch places where we forgot to disable commit_resumed_state
>> before calling the method, which represents a probable optimization
>> opportunity.
>>
>> To have some confidence that this contract between the core and the
>> targets is respected, I added assertions in the linux-nat target
>> methods, even though the linux-nat target doesn't actually use that
>> feature.  Since linux-nat is tested much more than other targets, this
>> will help catch these issues quicker.
>>
>> To ensure that commit_resumed_state is always turned back on (only if
>> necessary, see below) and the commit_resumed method is called when doing
>> so, I introduced the scoped_disabled_commit_resumed RAII object, which
>> replaces make_scoped_defer_process_target_commit_resume.  On
>> construction, it clears the commit_resumed_state flag of all process
>> targets.  On destruction, it turns it back on (if necessary) and calls
>> the commit_resumed method.  The nested case is handled by having a
>> "nesting" counter: only when the counter goes back to 0 is
>> commit_resumed_state turned back on.
>>
>> On destruction, commit-resumed is not re-enabled for a given target if:
>>
>>  1. this target has no threads resumed, or
>>  2. this target at least one thread with a pending status known to the
> 
> Missing word: 'this target at least'?

Fixed to "this target has at least".

> I read through the commit message and convinced myself that it made
> sense.  I ran out of time to look at the actual code.

That's already very nice, thanks!

Simon

  reply	other threads:[~2021-01-08 19:04 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-08  4:17 [PATCH v3 0/5] Reduce back and forth with target when threads have pending statuses + better handling of 'S' packets Simon Marchi via Gdb-patches
2021-01-08  4:17 ` [PATCH v3 1/5] gdb: make the remote target track its own thread resume state Simon Marchi via Gdb-patches
2021-01-08 15:41   ` Pedro Alves
2021-01-08 18:56     ` Simon Marchi via Gdb-patches
2021-01-18  5:16   ` Sebastian Huber
2021-01-18  6:04     ` Simon Marchi via Gdb-patches
2021-01-18 10:36       ` Sebastian Huber
2021-01-18 13:53         ` Simon Marchi via Gdb-patches
2021-01-08  4:17 ` [PATCH v3 2/5] gdb: remove target_ops::commit_resume implementation in record-{btrace, full}.c Simon Marchi via Gdb-patches
2021-01-08 15:43   ` [PATCH v3 2/5] gdb: remove target_ops::commit_resume implementation in record-{btrace,full}.c Pedro Alves
2021-01-08 19:00     ` Simon Marchi via Gdb-patches
2021-01-08  4:17 ` [PATCH v3 3/5] gdb: move commit_resume to process_stratum_target Simon Marchi via Gdb-patches
2021-01-08 18:12   ` Andrew Burgess
2021-01-08 19:01     ` Simon Marchi via Gdb-patches
2021-01-09 20:29   ` Pedro Alves
2021-01-08  4:17 ` [PATCH v3 4/5] gdb: generalize commit_resume, avoid commit-resuming when threads have pending statuses Simon Marchi via Gdb-patches
2021-01-08 18:34   ` Andrew Burgess
2021-01-08 19:04     ` Simon Marchi via Gdb-patches [this message]
2021-01-09 20:34   ` Pedro Alves
2021-01-11 20:28     ` Simon Marchi via Gdb-patches
2021-01-22  2:46       ` Simon Marchi via Gdb-patches
2021-01-22 22:07       ` Simon Marchi via Gdb-patches
2021-01-12 17:14   ` Simon Marchi via Gdb-patches
2021-01-12 18:04     ` Simon Marchi via Gdb-patches
2021-01-15 19:17   ` Simon Marchi via Gdb-patches
2021-01-08  4:17 ` [PATCH v3 5/5] gdb: better handling of 'S' packets Simon Marchi via Gdb-patches
2021-01-08 18:19   ` Andrew Burgess
2021-01-08 19:11     ` Simon Marchi via Gdb-patches
2021-01-09 21:26   ` Pedro Alves
2021-01-11 20:36     ` Simon Marchi via Gdb-patches
2021-01-12  3:07       ` Simon Marchi via Gdb-patches
2021-01-13 20:17         ` Pedro Alves
2021-01-14  1:28           ` Simon Marchi via Gdb-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=998e1449-f74c-d9ac-242a-d956196fbbe7@polymtl.ca \
    --to=gdb-patches@sourceware.org \
    --cc=andrew.burgess@embecosm.com \
    --cc=simon.marchi@efficios.com \
    --cc=simon.marchi@polymtl.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox