Subject: Re: [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux
To: Simon Marchi , Simon Marchi , gdb-patches@sourceware.org
References: <20201110214614.2842615-1-simon.marchi@efficios.com> <20201110214614.2842615-13-simon.marchi@efficios.com> <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net> <88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca> <91426053-1ce6-3154-3635-cef5390248f4@simark.ca> <9d717009-facf-66b8-6846-e687bee2ad1d@simark.ca>
From: Pedro Alves
Date: Sat, 28 Nov 2020 18:56:06 +0000
List-Id: Gdb-patches mailing list

On 11/26/20 9:43 PM, Simon Marchi wrote:
> On 2020-11-25 3:56 p.m., Simon Marchi wrote:
>> On 2020-11-25 3:07 p.m., Simon Marchi wrote:
>>> We still have to figure out why the performance regresses compared to
>>> master.
>>> One thing is that we still do an extra resume attempt each
>>> time.  We do one resume where the disp step prepare succeeds and one
>>> more where it fails.  That's twice as much work as before, where we did
>>> one successful prepare and then skipped the next ones.  I'll make an
>>> experimental change such that the arch can say "the prepare worked, but
>>> now I'm out of buffers", just to see how that improves (or not)
>>> performance.
>>
>> That made just a small difference.  Let's call this new patch E:
>>
>> #9 + A + B + C + D + E: 105,727
>
> Ok, I played a bit more with this and found a case where we do more
> work that may explain the difference.
>
> handle_signal_stop calls finish_step_over, which calls start_step_over.
> Let's say that the displaced step buffer is used by some thread and
> another thread hits the breakpoint; we go into start_step_over to
> see if we can start a new step over.  We can't, because the event thread
> isn't the one that is using the displaced stepping buffer; the buffer is
> still occupied.
>
> In the pre-patch code, that early check in start_step_over:
>
>      /* If this inferior already has a displaced step in process,
>         don't start a new one.  */
>      if (displaced_step_in_progress (tp->inf))
>        continue;
>
> means that we won't even try one resumption; we'll skip over all threads
> in the chain (the only cost is the cost of iteration).
>
> In the patch I previously called "C", start_step_over starts with the
> assumption that the inferior has an available buffer, and needs to do at
> least one resumption attempt that returns "unavailable" before it
> assumes there are no more available.  That one unnecessary resumption
> attempt happens over and over and ends up being costly.
>
> I changed my patch C to make the "unavailable" property persistent,
> instead of just using it locally in start_step_over.
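For illustration, the persistent "unavailable" bookkeeping described
above could be sketched roughly like this (all names here are
hypothetical, not the actual GDB code):

```cpp
#include <cassert>

// Hypothetical sketch of the persistent "unavailable" flag idea; the
// names and types do not match the real GDB implementation.
struct inferior_sketch
{
  // Set when a displaced-step prepare returned "unavailable"; cleared
  // when a displaced step for this inferior finishes.
  bool disp_step_buffers_full = false;

  // Assume a single buffer, as on pre-patch amd64/Linux.
  int free_buffers = 1;
};

enum class prepare_status { OK, UNAVAILABLE };

prepare_status
displaced_step_prepare (inferior_sketch &inf)
{
  // Cheap early-out: the last prepare said "no buffers", and no
  // displaced step finished since, so don't even attempt a resume.
  if (inf.disp_step_buffers_full)
    return prepare_status::UNAVAILABLE;

  if (inf.free_buffers == 0)
    {
      inf.disp_step_buffers_full = true;
      return prepare_status::UNAVAILABLE;
    }

  inf.free_buffers--;
  return prepare_status::OK;
}

void
displaced_step_finish (inferior_sketch &inf)
{
  inf.free_buffers++;
  // A buffer was released, so preparing may succeed again.
  inf.disp_step_buffers_full = false;
}
```

The point of the flag is that once it is set, start_step_over pays only
the cost of iteration, just like the pre-patch early check.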
> If a displaced step prepare returns "unavailable", we mark that this
> inferior has no more buffers available, and only reset this flag when a
> displaced step for this inferior finishes.  That way, when a thread hits
> the breakpoint and start_step_over is called, the unavailable flag is
> already set.  The last time we tried to prepare a displaced step, there
> were no buffers available, and if no displaced step has finished for
> this inferior since, then why bother asking?  As a result, we don't even
> try one resume if the buffers are already taken, just like with the
> pre-patch code.
>
> I re-did some tests with the new patch C:
>
> master:             110,736
> 9:                   19,973
> 9 + A + B + C:      107,711
> 9 + A + B + C + E:  111,135
> Everything up to
> using two displaced
> stepping buffers:   110,697
>
> Note that I dropped patch D, which was to break out of the
> start_step_over loop when getting "unavailable".  It didn't help much
> and wasn't desired functionality anyway.
>
> And as a reminder, patch E is that when returning "_OK", the arch can
> also tell us "that was the last one; any more requests would return
> unavailable".  That allows doing just one resumption attempt (at most)
> per start_step_over invocation, rather than two.
>
> What this shows is that by using everything up to patch E, we are
> pretty much on par performance-wise, but with the more flexible
> architecture.  That makes sense, as (I think) we do pretty much the
> same work.
>
> Re-thinking about patches C and E, I'm thinking that we could let the
> architecture's displaced stepping implementation decide when it sets the
> "unavailable" flag, instead of the core of GDB setting it.
> The architecture would then choose the policy it wants to adopt:
>
> - set the flag when it used its last buffer (equivalent to patch E)
> - set the flag when it returns "_UNAVAILABLE" (equivalent to patch C')
> - never set the unavailable flag (equivalent to patch 9)
>
> It also shows that using two buffers on x86-64 doesn't improve
> performance (unless there's another unfortunate bottleneck I haven't
> found).  But that's what I expected.  My hypothesis is that the actual
> step portion of the displaced step is so fast that we never really use
> the buffers in parallel.  If we start two displaced steps, by the time
> we prepare the second one, the first one has already finished and is
> waiting for GDB to clean it up.  If for some reason we could
> displaced-step an instruction that took, say, 1 second to execute, then
> having many displaced step buffers would be useful, because GDB would be
> able to start them all and have them really run in parallel.
>
> I'll prepare a user branch with all those fixup patches so you can take
> a look, and if you think it makes sense I'll be ready for a v2.

I took a look at your branch, and it makes sense to me.  I ran the perf
test, and the performance gap is practically gone on my end as well.
I'm happy with that.
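For illustration, the "patch E" idea quoted above, where a successful
prepare can also report "that was my last buffer" so the core can stop
attempting further step-overs without a wasted resume, could be
sketched roughly like this (hypothetical names, not the actual patch):

```cpp
#include <cassert>

// Hypothetical sketch: on success, the architecture distinguishes
// "OK, and more buffers remain" from "OK, but that was the last one".
// Names are illustrative only.
enum class prepare_status { OK, OK_LAST, UNAVAILABLE };

struct buffer_pool
{
  int free_buffers;
};

prepare_status
arch_prepare (buffer_pool &pool)
{
  if (pool.free_buffers == 0)
    return prepare_status::UNAVAILABLE;

  pool.free_buffers--;

  // Report whether any buffers remain, so the caller can mark the
  // inferior "full" now instead of discovering it on the next attempt.
  return pool.free_buffers == 0 ? prepare_status::OK_LAST
                                : prepare_status::OK;
}
```

Under this sketch, start_step_over would stop walking the step-over
chain as soon as it sees OK_LAST, giving at most one resumption attempt
per invocation, which matches the behavior the numbers above describe.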