From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id 1P+vNn8hwF9RNwAAWB0awg (envelope-from ) for ; Thu, 26 Nov 2020 16:43:27 -0500 Received: by simark.ca (Postfix, from userid 112) id D66B61F0AC; Thu, 26 Nov 2020 16:43:27 -0500 (EST) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on simark.ca X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=MAILING_LIST_MULTI,RDNS_NONE autolearn=no autolearn_force=no version=3.4.2 Received: from sourceware.org (unknown [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id 7E09B1E776 for ; Thu, 26 Nov 2020 16:43:27 -0500 (EST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 13326397241B; Thu, 26 Nov 2020 21:43:27 +0000 (GMT) Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id AF6E8397242C for ; Thu, 26 Nov 2020 21:43:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AF6E8397242C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark@simark.ca Received: from [10.0.0.11] (173-246-6-90.qc.cable.ebox.net [173.246.6.90]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id 085F91E552; Thu, 26 Nov 2020 16:43:23 -0500 (EST) Subject: Re: [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux To: Pedro Alves , Simon Marchi , gdb-patches@sourceware.org References: <20201110214614.2842615-1-simon.marchi@efficios.com> <20201110214614.2842615-13-simon.marchi@efficios.com> <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net> <88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca> <91426053-1ce6-3154-3635-cef5390248f4@simark.ca> <9d717009-facf-66b8-6846-e687bee2ad1d@simark.ca> From: Simon Marchi Message-ID: Date: Thu, 26 Nov 2020 16:43:22 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <9d717009-facf-66b8-6846-e687bee2ad1d@simark.ca> Content-Type: text/plain; charset=utf-8 Content-Language: fr Content-Transfer-Encoding: 7bit X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces@sourceware.org Sender: "Gdb-patches" On 2020-11-25 3:56 p.m., Simon Marchi wrote: > On 2020-11-25 3:07 p.m., Simon Marchi wrote: >> We still have to figure out why the performance regresses compared to >> master. One thing is that we still do an extra resume attempt each >> time. We do one resume where the disp step prepare succeeds and one >> more where it fails. That's twice as much work as before, where we did >> one successful prepare and then skipped the next ones. I'll make an >> experimental change such that the arch can say "the prepare worked, but >> now I'm out of buffer", just to see how that improves (or not) >> performance. > > That made just a small difference. Let's call this new patch E: > > #9 + A + B + C + D + E: 105,727 Ok, I played a bit more with this and found a case where we do more work that may explain the difference. handle_signal_stop calls finish_step_over, which calls start_step_over. Let's say that the displaced step buffer is used by some thread and another thread either hits the breakpoint, we go into start_step_over to see if we can start a new step over. We can't, because the event thread isn't the one that is using the displaced stepping buffer, the buffer is still occupied. In the pre-patch code, that early check in start_step_over: /* If this inferior already has a displaced step in process, don't start a new one. */ if (displaced_step_in_progress (tp->inf)) continue; means that we won't even try one resumption, we'll skip over all threads in the chain (the only cost is the cost of iteration). In the patch I previously called "C", start_step_over starts with the assumption that the inferior has an available buffer, and needs to do at least one resumption attempt that returns "unavailable" before it assumes there are no more available. That one unnecessary resumption attempt happens over and over and ends up being costly. I changed my patch C to make the "unavailable" property persistent, instead of just using it locally in start_step_over. If a displaced step prepare returns "unavailable", we mark that this inferior has no more buffers available, and only reset this flag when a displaced step for this inferior finishes. Doing so, when a thread hits the breakpoint and start_step_over is called, the unavailable flag is already set. Last time we tried to prepare a displaced step, there were no buffers available. And if no displaced step finished for this inferior since, then why bother asking? As a result, we don't even try one resume if the buffers are already take, just like with the pre-patch code. I re-did some tests with the new patch C: master: 110,736 9: 19,973 9 + A + B + C: 107,711 9 + A + B + C + E: 111,135 Everything up to using two displaced stepping buffers: 110,697 Note that I dropped patch D, which was to break out of the start_step_over loop when getting "unavailable". It didn't help much and wasn't desired functionality anyway. And for the reminder, patch E is that when returning "_OK", the arch can also tell us "that was the last one, any more request would return unavailable". So that allows to do just one resumption attempt (at most) per start_step_over invocation, rather than 2. What this shows is that by using everything up to patch E, we are pretty much on par with the performance, but with the more flexible architecture. That makes sense, as (I think) we do pretty much the same work. Re-thinking about patch C and E, I'm thinking that we could let the architecture's displaced stepping implementation decide when it sets the "unavailable" flag, instead of the core of GDB setting it. The architecture would then choose the policy it wants to adopt: - set the flag when it used its last buffer (equivalent to patch E) - set the flag when it returns "_UNAVAILABLE" (equivalent to patch C') - never set the unavavaible flag (equivalent to patch 9) It also shows that using two buffers on x86-64 doesn't improve performance (unless there's another unfortunate bottleneck I haven't found). But that's what I expected. My hypothesis is that the actual step portion of the displaced step is so fast that we never really use the buffers in parallel. If we start two displaced steps, by the time we prepare the second one, the first one has already finished and is waiting for GDB to clean it up. If for some reason we could displaced step an instruction that took, say, 1 second to execute, then having many displaced step buffers would be useful, because GDB would be able to start them all and have them really run in parallel. I'll prepare a user branch with all those fixup patches so you can take a look, and if you think it makes sense I'll be ready for a v2. Simon