From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v3 4/5] gdb: generalize commit_resume, avoid
 commit-resuming when threads have pending statuses
From: Pedro Alves <pedro@palves.net>
To: Simon Marchi <simon.marchi@polymtl.ca>, gdb-patches@sourceware.org
References: <20210108041734.3873826-1-simon.marchi@polymtl.ca>
 <20210108041734.3873826-5-simon.marchi@polymtl.ca>
Message-ID: <93a38356-a5ea-b6e4-d86d-ed8db5f9545e@palves.net>
Date: Sat, 9 Jan 2021 20:34:14 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101
 Thunderbird/78.6.0
MIME-Version: 1.0
In-Reply-To: <20210108041734.3873826-5-simon.marchi@polymtl.ca>
Content-Type: multipart/mixed; boundary="------------6990BCA83D49BB10287A4A5A"
Content-Language: en-US
Cc: Simon Marchi <simon.marchi@efficios.com>
Errors-To: gdb-patches-bounces@sourceware.org
Sender: "Gdb-patches" <gdb-patches-bounces@sourceware.org>

This is a multi-part message in MIME format.
--------------6990BCA83D49BB10287A4A5A
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 08/01/21 04:17, Simon Marchi wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
> 
> The rationale for this patch comes from the ROCm port [1], the goal
> being to reduce the number of back and forths between GDB and the target
> when doing successive operations.  I'll start with explaining the
> rationale and then go over the implementation.  In the ROCm / GPU world,
> the term "wave" is somewhat equivalent to a "thread" in GDB.  So if you
> read it from a GPU standpoint, just s/thread/wave/.
> 
> ROCdbgapi, the library used by GDB [2] to communicate with the GPU
> target, gives the illusion that it's possible for the debugger to
> control (start and stop) individual threads.  But in reality, this is
> not how it works.  Under the hood, all threads of a queue are controlled
> as a group.  To stop one thread in a group of running ones, the state of
> all threads is retrieved from the GPU, all threads are destroyed, and all
> threads but the one we want to stop are re-created from the saved state.
> The net result, from the point of view of GDB, is that the library
> stopped one thread.  The same thing goes if we want to resume one thread
> while others are running: the state of all running threads is retrieved
> from the GPU, they are all destroyed, and they are all re-created,
> including the thread we want to resume.
> 
> This leads to some inefficiencies when combined with how GDB works, here
> are two examples:
> 
>  - Stopping all threads: because the target operates in non-stop mode,
>    when the user interface mode is all-stop, GDB must stop all threads
>    individually when presenting a stop.  Let's suppose we have 1000
>    threads and the user does ^C.  GDB asks the target to stop one
>    thread.  Behind the scenes, the library retrieves 1000 thread states
>    and restores the 999 other still-running ones.  GDB asks the target
>    to stop another one.  The target retrieves 999 thread states and
>    restores the 998 remaining ones.  That means that to stop 1000
>    threads, we did 1000 back and forths with the GPU.  It would have
>    been much better to just retrieve the states once and stop there.
> 
>  - Resuming with pending events: suppose the 1000 threads hit a
>    breakpoint at the same time.  The breakpoint is conditional and
>    evaluates to true for the first thread, to false for all others.  GDB
>    pulls one event (for the first thread) from the target, decides that
>    it should present a stop, so stops all threads using
>    stop_all_threads.  All these other threads have a breakpoint event to
>    report, which is saved in `thread_info::suspend::waitstatus` for
>    later.  When the user does "continue", GDB resumes that one thread
>    that did hit the breakpoint.  It then processes the pending events
>    one by one as if they just arrived.  It picks one, evaluates the
>    condition to false, and resumes the thread.  It picks another one,
>    evaluates the condition to false, and resumes the thread.  And so on.
>    In between each resumption, there is a full state retrieval and
>    re-creation.  It would be much nicer if we could wait a little bit
>    before sending those threads on the GPU, until it processed all those
>    pending events.

A potential downside of deferring in this latter scenario, with regular
host debugging, is that threads are currently resumed immediately, so
the inferior's threads potentially spend less time paused.  That would
matter for the native target if we implemented commit_resume there.
With remote, the trade-off is probably more in favor of deferring, given
the higher latency.

However, since we don't currently implement commit_resume for the native
target, it shouldn't have any effect there.

To confirm this, I tried the testcase we used when debugging the
displaced stepping buffers series, with 100 threads continuously
stepping over a breakpoint, for 10 seconds.  Surprisingly, with the
native target, I see a consistent ~3% slowdown caused by this series.

I don't see any material difference with gdbserver.

(higher is better)

native, pristine

 avg              440.240000
 avg              436.670000
 avg              451.310000
 avg              432.840000
 avg              437.060000
 ===========================
 avg of avg       439.624000

native, patched

 avg              420.940000
 avg              428.130000
 avg              425.230000
 avg              428.080000
 avg              424.880000
 ===========================
 avg of avg       425.452000


gdbserver, pristine:

 avg              633.490000
 avg              639.910000
 avg              642.300000
 avg              626.160000
 avg              626.460000
 ===========================
 avg of avg       633.664000


gdbserver, patched

 avg              630.970000
 avg              628.960000
 avg              638.340000
 avg              627.030000
 avg              638.390000
 ===========================
 avg of avg       632.738000
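
For reference, the slowdown figures follow directly from the "avg of
avg" values above.  A minimal sketch of the arithmetic, in C++
(slowdown_percent is a made-up helper name, not part of the test
program):

```cpp
// Relative slowdown, in percent, of PATCHED vs PRISTINE, for a
// higher-is-better metric.  Hypothetical helper, for illustration only.
double
slowdown_percent (double pristine, double patched)
{
  return (pristine - patched) / pristine * 100.0;
}
```

With the numbers above, slowdown_percent (439.624, 425.452) is about 3.2
for native, and slowdown_percent (633.664, 632.738) is about 0.15 for
gdbserver.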

Tests were run like this:

  $ gcc disp-step-buffers-test.c -o disp-step-buffers-test -g3 -O2 -pthread
  $ g="./gdb -data-directory=data-directory"
  $ time $g -q --batch disp-step-buffers-test -ex "b 16 if 0" -ex "r"
  $ time $g -q --batch disp-step-buffers-test -ex "set sysroot" -ex "target remote | ../gdbserver/gdbserver - disp-step-buffers-test" -ex "b 16 if 0" -ex "c" 

I've attached disp-step-buffers-test.c.

I'm surprised that native debugging is quite a bit slower here, compared to
gdbserver.  I don't recall observing that earlier.  Maybe I just missed
it then.

I wouldn't have thought we were doing so much work that it would be
noticeable with the native target (pristine vs patched, the 3%
slowdown).  I wonder whether that is caused by the constant std::set
allocation in all_process_targets.  But then it's strange that we don't
see that same slowdown when remote debugging.  I'm surprised.

> 
> To address this kind of performance issue, ROCdbgapi has a concept
> called "forward progress required", which is a boolean state that allows
> its user (i.e. GDB) to say "I'm doing a bunch of operations, you can
> hold off putting the threads on the GPU until I'm done" (the "forward
> progress not required" state).  Turning forward progress back on
> indicates to the library that all threads that are supposed to be
> running should now be really running on the GPU.
> 
> It turns out that GDB has a similar concept, though not as general,
> commit_resume.  One difference is that commit_resume is not stateful: the
> target can't look up "does the core need me to schedule resumed threads
> for execution right now".  It is also specifically linked to the resume
> method, it is not used in other contexts.  The target accumulates
> resumption requests through target_ops::resume calls, and then commits
> those resumptions when target_ops::commit_resume is called.  The target
> has no way to check if it's ok to leave resumed threads stopped in other
> target methods.
> 
> To bridge the gap, this patch generalizes the commit_resume concept in
> GDB to match the forward progress concept of ROCdbgapi.  The current
> name (commit_resume) can be interpreted as "commit the previous resume
> calls".  I renamed the concept to "commit_resumed", as in "commit the
> threads that are resumed".

Makes sense.

> 
> In the new version, we have two things in process_stratum_target:
> 
>  - the commit_resumed_state field: indicates whether GDB requires this
>    target to have resumed threads committed to the execution
>    target/device.  If false, the target is allowed to leave resumed
>    threads un-committed at the end of whatever method it is executing.
> 
>  - the commit_resumed method: called when commit_resumed_state
>    transitions from false to true.  While commit_resumed_state was
>    false, the target may have left some resumed threads un-committed.
>    This method being called tells it that it should commit them back to
>    the execution device.
> 
> Let's take the "Stopping all threads" scenario from above and see how it
> would work with the ROCm target with this change.  Before stopping all
> threads, GDB would set the target's commit_resumed_state field to false.
> It would then ask the target to stop the first thread.  The target would
> retrieve all threads' state from the GPU and mark that one as stopped.
> Since commit_resumed_state is false, it leaves all the other threads
> (still resumed) stopped.  GDB would then proceed to call target_stop for
> all the other threads.  Since resumed threads are not committed, this
> doesn't do any back and forth with the GPU.
> 
> To simplify the implementation of targets, I made it so that when
> calling certain target methods, the contract between the core and the
> targets guarantees that commit_resumed_state is false.  This way, the
> target doesn't need two paths, one commit_resumed_state == true and one
> for commit_resumed_state == false.  It can just assert that
> commit_resumed_state is false and work with that assumption.  This also
> helps catch places where we forgot to disable commit_resumed_state
> before calling the method, which represents a probable optimization
> opportunity.
> 
> To have some confidence that this contract between the core and the
> targets is respected, I added assertions in the linux-nat target
> methods, even though the linux-nat target doesn't actually use that
> feature.  Since linux-nat is tested much more than other targets, this
> will help catch these issues quicker.

Did you consider adding the assertions to target.c instead, in the
target_resume/target_wait/target_stop wrapper methods?  That would
cover all targets.
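
A rough sketch of the idea, with simplified stand-ins (toy_target,
toy_native_target and toy_target_resume are made up for illustration and
are not the real target.c code): the assertion lives once in the wrapper,
so every target implementation is covered without touching each one.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for process_stratum_target.
struct toy_target
{
  virtual ~toy_target () = default;
  virtual void resume () = 0;

  bool commit_resumed_state = false;
};

// One concrete target; it needs no assertion of its own.
struct toy_native_target : toy_target
{
  int resumes = 0;

  void resume () override { resumes++; }
};

// Wrapper in the style of the target_resume wrapper: the contract is
// checked here, once, for all targets.
void
toy_target_resume (toy_target &t)
{
  assert (!t.commit_resumed_state);
  t.resume ();
}
```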

> 
> To ensure that commit_resumed_state is always turned back on (only if
> necessary, see below) and the commit_resumed method is called when doing
> so, I introduced the scoped_disable_commit_resumed RAII object, which
> replaces make_scoped_defer_process_target_commit_resume.  On
> construction, it clears the commit_resumed_state flag of all process
> targets.  On destruction, it turns it back on (if necessary) and calls
> the commit_resumed method.  

This part makes me nervous, and I think it will cause us problems.  I'm
really not sure it's a good idea.  The issue is that the commit_resumed
method can throw, and we'll be in a dtor, which means that we will need
to swallow the error; there's no way to propagate it out and abort the
current function.  That's why we currently have explicit commit calls,
and the scoped object just tweaks the "defer commit" flag.  Would it
work to build on the current design instead of moving the commit to the
dtor?
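
To illustrate the shape of that alternative, a minimal sketch with
made-up names (defer_commit, toy_commit_resumed, scoped_defer_commit and
resume_many are hypothetical, not the actual GDB declarations): the
scoped object only toggles a flag, and the commit is an explicit call
after the scope, so an error propagates like any other exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins, not the actual GDB declarations.
static bool defer_commit = false;
static int commits = 0;

// Pretend target hook.  A real one could throw, e.g. on a remote error.
void
toy_commit_resumed (bool fail)
{
  if (fail)
    throw std::runtime_error ("commit failed");
  commits++;
}

// Only tweaks the flag, so the destructor has nothing that can throw.
struct scoped_defer_commit
{
  scoped_defer_commit () : m_prev (defer_commit) { defer_commit = true; }
  ~scoped_defer_commit () { defer_commit = m_prev; }

  scoped_defer_commit (const scoped_defer_commit &) = delete;
  scoped_defer_commit &operator= (const scoped_defer_commit &) = delete;

private:
  bool m_prev;
};

void
resume_many (bool fail)
{
  {
    scoped_defer_commit defer;
    assert (defer_commit);
    /* ... several resume calls would go here ... */
  }

  /* Explicit commit, outside of any destructor.  */
  if (!defer_commit)
    toy_commit_resumed (fail);
}
```

Since C++11, destructors are implicitly noexcept, so an exception
escaping one terminates the program; with the explicit call above, the
caller of resume_many sees the error instead.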

> The nested case is handled by having a
> "nesting" counter: only when the counter goes back to 0 is
> commit_resumed_state turned back on.

It wasn't obvious to me from the description why we need both
commit_resumed_state and a counter.  As in, wouldn't just the counter
work?  Like, if the count is 0, the state is on; if >0, it is off.
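
A counter-only version could look roughly like this (a sketch under the
assumption that the state is purely a function of the nesting depth;
disable_depth, commit_resumed_enabled and scoped_disable are made-up
names):

```cpp
#include <cassert>

// Hypothetical global nesting depth.  Commit-resumed is considered "on"
// only when no scoped_disable object is live.
static int disable_depth = 0;

bool
commit_resumed_enabled ()
{
  return disable_depth == 0;
}

struct scoped_disable
{
  scoped_disable () { ++disable_depth; }

  ~scoped_disable ()
  {
    assert (disable_depth > 0);
    --disable_depth;
  }

  scoped_disable (const scoped_disable &) = delete;
  scoped_disable &operator= (const scoped_disable &) = delete;
};
```

Note that this does not model the case described in the commit message
where the state is left off for a target after the last scoped object is
destroyed (no resumed threads, or a pending status); if that must be
remembered per target, the counter alone would not capture it.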

Can different targets ever have different commit resumed states?
The only spot I see that tweaks the flag outside of the scoped object,
is record-full.c, but I think that's only to avoid hitting the assertion?
Do you plan on adding more spots that would override the state even if
a scoped_disable_commit_resumed object is live?

> 
> On destruction, commit-resumed is not re-enabled for a given target if:
> 
>  1. this target has no threads resumed, or
>  2. this target has at least one thread with a pending status known to the
>     core (saved in thread_info::suspend::waitstatus).

Should also check whether the thread with the pending status is resumed.
/me reads patch, oh, you did that.  Good.  Please mention it here:
... one resumed thread ...
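
The two conditions, including the "resumed" qualifier, can be sketched as
a small predicate (toy_thread and should_commit_resumed are hypothetical
and much simpler than the real thread_info and infrun code):

```cpp
#include <vector>

// Hypothetical, simplified stand-in for thread_info.
struct toy_thread
{
  bool resumed;
  bool has_pending_status;
};

// Return true if commit-resumed should be re-enabled for a target with
// the given threads.
bool
should_commit_resumed (bool threads_executing,
                       const std::vector<toy_thread> &threads)
{
  /* 1. Nothing is running on this target, nothing to commit.  */
  if (!threads_executing)
    return false;

  /* 2. A resumed thread has a pending status; the core will consume
     that event first, so hold off.  */
  for (const toy_thread &thr : threads)
    if (thr.resumed && thr.has_pending_status)
      return false;

  return true;
}
```

A pending status on a thread that is not resumed does not block the
commit in this sketch, which is the distinction being asked for above.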

> 
> The first point is not technically necessary, because a proper
> commit_resumed implementation would be a no-op if the target has no
> resumed threads.  But since we have a flag to do a quick check, I think
> it doesn't hurt.
> 
> The second point is more important: together with the
> scoped_disable_commit_resumed instance added in fetch_inferior_event, it
> makes it so the "Resuming with pending events" described above is
> handled efficiently.  Here's what happens in that case:
> 
>  1. The user types "continue".
>  2. Upon destruction, the scoped_disable_commit_resumed in the `proceed`
>     function does not enable commit-resumed, as it sees other threads
>     have pending statuses.
>  3. fetch_inferior_event is called to handle another event, one thread
>     is resumed.  Because there are still more threads with pending
>     statuses, the destructor of scoped_disable_commit_resumed in
>     fetch_inferior_event still doesn't enable commit-resumed.
>  4. Rinse and repeat step 3, until the last pending status is handled by
>     fetch_inferior_event.  In that case, scoped_disable_commit_resumed's
>     destructor sees there are no more threads with pending statuses, so
>     it asks the target to commit resumed threads.
> 
> This allows us to avoid all unnecessary back and forths, there is a
> single commit_resumed call.
> 
> This change required remote_target::remote_stop_ns to learn how to
> handle stopping threads that were resumed but pending vCont.  The
> simplest example where that happens is when using the remote target in
> all-stop, but with "maint set target-non-stop on", to force it to
> operate in non-stop mode under the hood.  If two threads hit a
> breakpoint at the same time, GDB will receive two stop replies.  It will
> present the stop for one thread and save the other one in
> thread_info::suspend::waitstatus.
> 
> Before this patch, when doing "continue", GDB first resumes the thread
> without a pending status:
> 
>     Sending packet: $vCont;c:p172651.172676#f3
> 
> It then consumes the pending status in the next fetch_inferior_event
> call:
> 
>     [infrun] do_target_wait_1: Using pending wait status status->kind = stopped, signal = GDB_SIGNAL_TRAP for Thread 1517137.1517137.
>     [infrun] target_wait (-1.0.0, status) =
>     [infrun]   1517137.1517137.0 [Thread 1517137.1517137],
>     [infrun]   status->kind = stopped, signal = GDB_SIGNAL_TRAP
> 
> It then realizes it needs to stop all threads to present the stop, so
> stops the thread it just resumed:
> 
>     [infrun] stop_all_threads:   Thread 1517137.1517137 not executing
>     [infrun] stop_all_threads:   Thread 1517137.1517174 executing, need stop
>     remote_stop called
>     Sending packet: $vCont;t:p172651.172676#04
> 
> This is an unnecessary resume/stop.  With this patch, we don't commit
> resumed threads after proceeding, because of the pending status:
> 
>     [infrun] maybe_commit_resumed_all_process_targets: not requesting commit-resumed for target extended-remote, a thread has a pending waitstatus
> 
> When GDB handles the pending status and stop_all_threads runs, we stop a
> resumed but pending vCont thread:
> 
>     remote_stop_ns: Enqueueing phony stop reply for thread pending vCont-resume (1520940, 1520976, 0)
> 
> That thread was never actually resumed on the remote stub / gdbserver.
> This is why remote_stop_ns needed to learn this new trick of enqueueing
> phony stop replies.
> 
> Note that this patch only considers pending statuses known to the core
> of GDB, that is the events that were pulled out of the target and stored
> in `thread_info::suspend::waitstatus`.  In some cases, we could also
> avoid unnecessary back and forth when the target has events that it has
> not yet reported to the core.  I plan to implement this as a subsequent
> patch, once this series has settled.
> 
> gdb/ChangeLog:
> 
> 	* infrun.h (struct scoped_disable_commit_resumed): New.
> 	* infrun.c (do_target_resume): Remove
> 	maybe_commit_resume_process_target call.
> 	(maybe_commit_resume_all_process_targets): Rename to...
> 	(maybe_commit_resumed_all_process_targets): ... this.  Skip
> 	targets that have no executing threads or resumed threads with
> 	a pending status.
> 	(scoped_disable_commit_resumed_depth): New.
> 	(scoped_disable_commit_resumed::scoped_disable_commit_resumed):
> 	New.
> 	(scoped_disable_commit_resumed::~scoped_disable_commit_resumed):
> 	New.
> 	(proceed): Use scoped_disable_commit_resumed.
> 	(fetch_inferior_event): Use scoped_disable_commit_resumed.
> 	* process-stratum-target.h (class process_stratum_target):
> 	<commit_resume>: Rename to...
> 	<commit_resumed>: ... this.
> 	<commit_resumed_state>: New.
> 	(all_process_targets): New.
> 	(maybe_commit_resume_process_target): Remove.
> 	(make_scoped_defer_process_target_commit_resume): Remove.
> 	* process-stratum-target.c (all_process_targets): New.
> 	(defer_process_target_commit_resume): Remove.
> 	(maybe_commit_resume_process_target): Remove.
> 	(make_scoped_defer_process_target_commit_resume): Remove.
> 	* linux-nat.c (linux_nat_target::resume): Add gdb_assert.
> 	(linux_nat_target::wait): Add gdb_assert.
> 	(linux_nat_target::stop): Add gdb_assert.
> 	* infcmd.c (run_command_1): Use scoped_disable_commit_resumed.
> 	(attach_command): Use scoped_disable_commit_resumed.
> 	(detach_command): Use scoped_disable_commit_resumed.
> 	(interrupt_target_1): Use scoped_disable_commit_resumed.
> 	* mi/mi-main.c (exec_continue): Use
> 	scoped_disable_commit_resumed.
> 	* record-full.c (record_full_wait_1): Change
> 	commit_resumed_state around calling commit_resumed.
> 	* remote.c (class remote_target) <commit_resume>: Rename to...
> 	<commit_resumed>: ... this.
> 	(remote_target::resume): Add gdb_assert.
> 	(remote_target::commit_resume): Rename to...
> 	(remote_target::commit_resumed): ... this.  Check if there is
> 	any thread pending vCont resume.
> 	(struct stop_reply): Move up.
> 	(remote_target::remote_stop_ns): Generate stop replies for
> 	resumed but pending vCont threads.
> 	(remote_target::wait_ns): Add gdb_assert.
> 
> [1] https://github.com/ROCm-Developer-Tools/ROCgdb/
> [2] https://github.com/ROCm-Developer-Tools/ROCdbgapi
> 
> Change-Id: I836135531a29214b21695736deb0a81acf8cf566
> ---
>  gdb/infcmd.c                 |   8 +++
>  gdb/infrun.c                 | 116 +++++++++++++++++++++++++++++++----
>  gdb/infrun.h                 |  41 +++++++++++++
>  gdb/linux-nat.c              |   5 ++
>  gdb/mi/mi-main.c             |   2 +
>  gdb/process-stratum-target.c |  37 +++++------
>  gdb/process-stratum-target.h |  63 +++++++++++--------
>  gdb/record-full.c            |   4 +-
>  gdb/remote.c                 | 111 +++++++++++++++++++++++----------
>  9 files changed, 292 insertions(+), 95 deletions(-)
> 
> diff --git a/gdb/infcmd.c b/gdb/infcmd.c
> index 6f0ed952de67..b7595e42e265 100644
> --- a/gdb/infcmd.c
> +++ b/gdb/infcmd.c
> @@ -488,6 +488,8 @@ run_command_1 (const char *args, int from_tty, enum run_how run_how)
>        uiout->flush ();
>      }
>  
> +  scoped_disable_commit_resumed disable_commit_resumed ("running");
> +
>    /* We call get_inferior_args() because we might need to compute
>       the value now.  */
>    run_target->create_inferior (exec_file,
> @@ -2591,6 +2593,8 @@ attach_command (const char *args, int from_tty)
>    if (non_stop && !attach_target->supports_non_stop ())
>      error (_("Cannot attach to this target in non-stop mode"));
>  
> +  scoped_disable_commit_resumed disable_commit_resumed ("attaching");
> +
>    attach_target->attach (args, from_tty);
>    /* to_attach should push the target, so after this point we
>       shouldn't refer to attach_target again.  */
> @@ -2746,6 +2750,8 @@ detach_command (const char *args, int from_tty)
>    if (inferior_ptid == null_ptid)
>      error (_("The program is not being run."));
>  
> +  scoped_disable_commit_resumed disable_commit_resumed ("detaching");
> +

This one looks incorrect -- target_detach -> prepare_for_detach
may need to finish off displaced steps, and resume the target
in the process.  This here will inhibit it.  I have some WIP patches
that will stop prepare_for_detach from doing that though, so it'll
end up being correct after.

>    query_if_trace_running (from_tty);
>  
>    disconnect_tracing ();
> @@ -2814,6 +2820,8 @@ stop_current_target_threads_ns (ptid_t ptid)
>  void
>  interrupt_target_1 (bool all_threads)
>  {
> +  scoped_disable_commit_resumed inhibit ("interrupting");
> +
>    if (non_stop)
>      {
>        if (all_threads)
> diff --git a/gdb/infrun.c b/gdb/infrun.c
> index 1a27af51b7e9..92a1102cb595 100644
> --- a/gdb/infrun.c
> +++ b/gdb/infrun.c
> @@ -2172,8 +2172,6 @@ do_target_resume (ptid_t resume_ptid, bool step, enum gdb_signal sig)
>  
>    target_resume (resume_ptid, step, sig);
>  
> -  maybe_commit_resume_process_target (tp->inf->process_target ());
> -
>    if (target_can_async_p ())
>      target_async (1);
>  }
> @@ -2760,17 +2758,109 @@ schedlock_applies (struct thread_info *tp)
>  					    execution_direction)));
>  }
>  
> -/* Calls maybe_commit_resume_process_target on all process targets.  */
> +/* Maybe require all process stratum targets to commit their resumed threads.
> +
> +   A specific process stratum target is not required to do so if:
> +
> +   - it has no resumed threads
> +   - it has a thread with a pending status  */
>  
>  static void
> -maybe_commit_resume_all_process_targets ()
> +maybe_commit_resumed_all_process_targets ()
>  {
> -  scoped_restore_current_thread restore_thread;
> +  /* This is an optional to avoid unnecessary thread switches. */

Missing double space after period.

But, just scoped_restore_current_thread itself doesn't switch the
thread.  Is this trying to save something else?  It seems pointless
to me offhand.

> +  gdb::optional<scoped_restore_current_thread> restore_thread;
>  
>    for (process_stratum_target *target : all_non_exited_process_targets ())
>      {
> +      gdb_assert (!target->commit_resumed_state);

Not sure I understand this assertion.  Isn't this another thing
showing that the per-target state isn't really necessary, and we
could just use the global state?

> +
> +      if (!target->threads_executing)
> +	{
> +	  infrun_debug_printf ("not re-enabling forward progress for target "
> +			       "%s, no executing threads",
> +			       target->shortname ());
> +	  continue;
> +	}

...

> diff --git a/gdb/infrun.h b/gdb/infrun.h
> index 7160b60f1368..5c32c0c97f6e 100644
> --- a/gdb/infrun.h
> +++ b/gdb/infrun.h

> +
> +struct scoped_disable_commit_resumed
> +{
> +  scoped_disable_commit_resumed (const char *reason);

explicit

> index 1436a550ac04..9877f0d81931 100644
> --- a/gdb/process-stratum-target.c
> +++ b/gdb/process-stratum-target.c
> @@ -99,6 +99,20 @@ all_non_exited_process_targets ()
>  
>  /* See process-stratum-target.h.  */
>  
> +std::set<process_stratum_target *>
> +all_process_targets ()
> +{
> +  /* Inferiors may share targets.  To eliminate duplicates, use a set.  */
> +  std::set<process_stratum_target *> targets;
> +  for (inferior *inf : all_inferiors ())
> +    if (inf->process_target () != nullptr)
> +      targets.insert (inf->process_target ());
> +
> +  return targets;
> +}

An alternative that would avoid creating this temporary std::set
(along with its internal heap allocations) on every call would be to expose
target-connection.c:process_targets.
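
The general shape, as a sketch only: assuming a long-lived list that is
kept up to date as targets come and go, callers can iterate over it by
const reference instead of building a new container each time.  All
names below (toy_target, toy_process_targets and the toy_* functions) are
made up and say nothing about how target-connection.c is actually
written.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for process_stratum_target.
struct toy_target {};

// Long-lived list, updated when a target is added or removed.
static std::vector<toy_target *> toy_process_targets;

void
toy_add_target (toy_target *t)
{
  auto &v = toy_process_targets;

  /* Inferiors may share targets; keep each target only once.  */
  if (std::find (v.begin (), v.end (), t) == v.end ())
    v.push_back (t);
}

void
toy_remove_target (toy_target *t)
{
  auto &v = toy_process_targets;
  v.erase (std::remove (v.begin (), v.end (), t), v.end ());
}

// Callers iterate over the existing container; nothing is allocated
// on this path.
const std::vector<toy_target *> &
toy_all_process_targets ()
{
  return toy_process_targets;
}
```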

> +
> +/* See process-stratum-target.h.  */
> +
>  void
>  switch_to_target_no_thread (process_stratum_target *target)
>  {
> @@ -108,26 +122,3 @@ switch_to_target_no_thread (process_stratum_target *target)
>        break;
>      }
>  }
> -
> -/* If true, `maybe_commit_resume_process_target` is a no-op.  */
> -
> -static bool defer_process_target_commit_resume;
> -
> -/* See target.h.  */
> -
> -void
> -maybe_commit_resume_process_target (process_stratum_target *proc_target)
> -{
> -  if (defer_process_target_commit_resume)
> -    return;
> -
> -  proc_target->commit_resume ();
> -}
> -
> -/* See process-stratum-target.h.  */
> -
> -scoped_restore_tmpl<bool>
> -make_scoped_defer_process_target_commit_resume ()
> -{
> -  return make_scoped_restore (&defer_process_target_commit_resume, true);
> -}
> diff --git a/gdb/process-stratum-target.h b/gdb/process-stratum-target.h
> index c8060c46be93..3cea911dee09 100644
> --- a/gdb/process-stratum-target.h
> +++ b/gdb/process-stratum-target.h
> @@ -63,19 +63,10 @@ class process_stratum_target : public target_ops
>    bool has_registers () override;
>    bool has_execution (inferior *inf) override;
>  
> -  /* Commit a series of resumption requests previously prepared with
> -     resume calls.
> +  /* Ensure that all resumed threads are committed to the target.
>  
> -     GDB always calls `commit_resume` on the process stratum target after
> -     calling `resume` on a target stack.  A process stratum target may thus use
> -     this method in coordination with its `resume` method to batch resumption
> -     requests.  In that case, the target doesn't actually resume in its
> -     `resume` implementation.  Instead, it takes note of resumption intent in
> -     `resume`, and defers the actual resumption `commit_resume`.
> -
> -     E.g., the remote target uses this to coalesce multiple resumption requests
> -     in a single vCont packet.  */
> -  virtual void commit_resume () {}
> +     See the description of COMMIT_RESUMED_STATE for more details.  */
> +  virtual void commit_resumed () {}
>  
>    /* True if any thread is, or may be executing.  We need to track
>       this separately because until we fully sync the thread list, we
> @@ -86,6 +77,35 @@ class process_stratum_target : public target_ops
>  
>    /* The connection number.  Visible in "info connections".  */
>    int connection_number = 0;
> +
> +  /* Whether resumed threads must be committed to the target.
> +
> +     When true, resumed threads must be committed to the execution target.
> +
> +     When false, the process stratum target may leave resumed threads stopped
> +     when it's convenient or efficient to do so.  When the core requires resumed
> +     threads to be committed again, this is set back to true and calls the
> +     `commit_resumed` method to allow the target to do so.
> +
> +     To simplify the implementation of process stratum targets, the following
> +     methods are guaranteed to be called with COMMIT_RESUMED_STATE set to
> +     false:
> +
> +       - resume
> +       - stop
> +       - wait

Should we mention this in the documentation of each of these methods?

> +
> +     Knowing this, the process stratum target doesn't need to implement
> +     different behaviors depending on the COMMIT_RESUMED_STATE, and can
> +     simply assert that it is false.
> +
> +     Process stratum targets can take advantage of this to batch resumption
> +     requests, for example.  In that case, the target doesn't actually resume in
> +     its `resume` implementation.  Instead, it takes note of the resumption
> +     intent in `resume` and defers the actual resumption to `commit_resumed`.
> +     For example, the remote target uses this to coalesce multiple resumption
> +     requests in a single vCont packet.  */
> +  bool commit_resumed_state = false;
>  };


> @@ -6656,6 +6660,9 @@ remote_target::commit_resume ()
>  	  continue;
>  	}
>  
> +      if (priv->resume_state () == resume_state::RESUMED_PENDING_VCONT)
> +	any_pending_vcont_resume = true;
> +
>        /* If a thread is the parent of an unfollowed fork, then we
>  	 can't do a global wildcard, as that would resume the fork
>  	 child.  */
> @@ -6663,6 +6670,11 @@ remote_target::commit_resume ()
>  	may_global_wildcard_vcont = 0;
>      }
>  
> +  /* We didn't have any resumed thread pending a vCont resume, so nothing to
> +     do.  */
> +  if (!any_pending_vcont_resume)
> +    return;

Is this just an optimization you noticed, or something more related to
this patch?

--------------6990BCA83D49BB10287A4A5A
Content-Type: text/x-csrc; charset=UTF-8;
 name="disp-step-buffers-test.c"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="disp-step-buffers-test.c"

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

#define NUM_THREADS 100

static pthread_t child_thread[NUM_THREADS];
static unsigned long long counters[NUM_THREADS];
static volatile int done;

static void *
child_function (void *arg)
{
  while (!done)
    counters[(long) arg]++;   // set breakpoint here
  return NULL;
}

int
main (void)
{
  long i;

  for (i = 0; i < NUM_THREADS; i++)
    pthread_create (&child_thread[i], NULL, child_function, (void *) i);

  sleep (10);

  done = 1;

  for (i = 0; i < NUM_THREADS; i++)
    pthread_join (child_thread[i], NULL);

  double avg = 0;
  for (i = 0; i < NUM_THREADS; i++)
    {
      printf ("thread %02ld, count %llu\n", i, counters[i]);
      avg += counters[i];
    }

  double f = avg;
  f /= NUM_THREADS;

  printf ("avg              %f\n", f);

  return 0;
}

--------------6990BCA83D49BB10287A4A5A--