Subject: Re: [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux
To: Pedro Alves, Simon Marchi, gdb-patches@sourceware.org
References: <20201110214614.2842615-1-simon.marchi@efficios.com> <20201110214614.2842615-13-simon.marchi@efficios.com> <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net>
  <88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca>
From: Simon Marchi
Message-ID: <91426053-1ce6-3154-3635-cef5390248f4@simark.ca>
Date: Wed, 25 Nov 2020 15:07:36 -0500
In-Reply-To: <88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca>

On 2020-11-25 1:26 a.m., Simon Marchi wrote:
> Since patches C and D are about reducing the work in start_step_over, I
> think it shows clearly that looping over all threads there is what gives
> the biggest hit.  Before this patch, we break out as soon as we manage
> to initiate a displaced step, whereas in patch 9 we constantly go
> through all threads in the list

Sorry, that's incorrect.  Before this patch, when a displaced step is
already started in an inferior, we skip any subsequent thread of that
inferior.  This is done very early, before calling
thread_still_needs_step_over.  If I modify patch C to do the same, the
numbers look more like this:

(And to clarify my previous mail: the labels were unclear when I wrote
"#9 + A", "9 + B" and so on.  It should have said "#9 + A",
"#9 + A + B", and so on.  Otherwise it gives the impression that I
tested the fixup patches individually, which I didn't.)

  master:               109,362
  #9:                    19,815
  #9 + A:                19,463
  #9 + A + B:            20,152
  #9 + A + B + C:       101,170
  #9 + A + B + C + D:   103,948

And for fun, with two buffers:

  #9 + A + B + C + D + #10 to #12:  105,864

So, with those mitigations (especially patch C), it's not as bad as it
was, but it is still slower than master -- even when using two buffers,
which is meant to speed things up.  So it's not good.
So, patch C implements this: when a displaced step prepare returns
"unavailable" for an inferior, we don't try to start any more displaced
steps for that inferior (for the rest of that start_step_over call).
Unfortunately, that does not allow implementing "perfect" displaced
step buffer sharing.  Maybe we can settle for a compromise, though,
since sharing buffers is just an optimization and not required:
implement something like patch C, but also implement buffer sharing
for threads that need to step over the same PC.  If the threads are
ordered in the chain in a lucky way, such that multiple threads at the
same PC are handled before the buffers are all occupied, then these
threads will share a buffer.  Once all buffers are occupied, we quit,
even if there might be more threads further down the list able to
share a buffer.

To illustrate, let's say you have these threads in the step over chain,
which require stepping over PCs A, B and C:

  T1 at PC A
  T2 at PC A
  T3 at PC B
  T4 at PC A
  T5 at PC C
  T6 at PC A

With two buffers, we'll be able to accommodate the first 4 threads.
When we try to prepare the displaced step for T5, the implementation
will return "unavailable", so T6 won't even be considered.  So in the
end, T1, T2 and T4 will share a buffer, while T3 will be by itself in
the other buffer.

I think that strikes a good balance: as long as we haven't gotten an
"unavailable", the expensive work we do to prepare displaced steps is
useful work, since we'll actually resume these threads.  After that,
though, we would be plowing through a list of hundreds of threads,
doing expensive work for each, in the hope of finding some for which
the prepare may work.  That's not a very good investment (well, it
very much depends on the workload).

We still have to figure out why the performance regresses compared to
master.  One thing is that we still do an extra resume attempt each
time: one resume where the displaced step prepare succeeds, and one
more where it fails.
That's twice as much work as before, where we did one successful
prepare and then skipped the next ones.  I'll make an experimental
change such that the arch can say "the prepare worked, but now I'm out
of buffers", just to see whether (and by how much) that improves
performance.

But I'm hitting send on this email now, so that you don't spend too
much time rebuking the lies of my previous email :).

Simon