From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id B21bOyH5vV/zAQAAWB0awg (envelope-from ) for ; Wed, 25 Nov 2020 01:26:41 -0500 Received: by simark.ca (Postfix, from userid 112) id E5DBD1F0AC; Wed, 25 Nov 2020 01:26:41 -0500 (EST) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on simark.ca X-Spam-Level: X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id 3CE8A1F086 for ; Wed, 25 Nov 2020 01:26:40 -0500 (EST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 72B783870913; Wed, 25 Nov 2020 06:26:39 +0000 (GMT) Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id 3242A3857809 for ; Wed, 25 Nov 2020 06:26:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 3242A3857809 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark@simark.ca Received: from [10.0.0.11] (173-246-6-90.qc.cable.ebox.net [173.246.6.90]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id A0A901E552; Wed, 25 Nov 2020 01:26:35 -0500 (EST) Subject: Re: [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux To: Pedro Alves , Simon Marchi , gdb-patches@sourceware.org References: <20201110214614.2842615-1-simon.marchi@efficios.com> <20201110214614.2842615-13-simon.marchi@efficios.com> <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net> From: Simon Marchi Message-ID: <88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca> Date: Wed, 25 Nov 2020 01:26:35 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net> Content-Type: text/plain; charset=utf-8 Content-Language: fr Content-Transfer-Encoding: 7bit X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces@sourceware.org Sender: "Gdb-patches" On 2020-11-24 8:42 p.m., Pedro Alves wrote: >> Even when stress-testing the implementation, by making many threads do >> displaced steps over and over, I didn't see a significant performance (I >> confirmed that the two buffers were used by checking the "set debug >> displaced" logs though). However, this patch mostly helps make the >> feature testable by anybody with an AMD64/Linux machine, so I think it's >> useful. > > It should speed up the use case of multiple threads hitting a conditional > breakpoint that evals false at the same time. The more buffers, the less > time threads need to spend waiting for their turn in the displaced stepping > queue. If that doesn't speed things up, then probably something isn't quite > right. > > /me gives it a try. > > Here's a testcase that can be used to observe speed and fairness. > It has 10 threads running a single tight loop for 3 seconds. After > the 3 seconds, the threads exit and the number of iterations each > thread managed to do is printed. The idea is to set a breakpoint > with a condition that evals false on the loop. The higher the > number, the better. > > I compiled it with: > > $ gcc disp-step-buffers-test.c -o disp-step-buffers-test -g3 -O2 -pthread > > Here's a run outside gdb: > > $ /tmp/disp-step-buffers-test > thread 0, count 12785417966 > thread 1, count 12784090476 > thread 2, count 12773373753 > thread 3, count 12782542707 > thread 4, count 12767835404 > thread 5, count 12754382637 > thread 6, count 12783454775 > thread 7, count 12605966064 > thread 8, count 12635255271 > thread 9, count 12783436261 > > Here's the test: > > $ cat /tmp/disp-step-buffers-test.c > #include > #include > #include > #include > > #define NUM_THREADS 10 > > static pthread_t child_thread[NUM_THREADS]; > static unsigned long long counters[NUM_THREADS]; > static volatile int done; > > static void * > child_function (void *arg) > { > while (!done) > counters[(long) arg]++; // set breakpoint here > return NULL; > } > > int > main (void) > { > long i; > > for (i = 0; i < NUM_THREADS; i++) > pthread_create (&child_thread[i], NULL, child_function, (void *) i); > > sleep (3); > > done = 1; > > for (i = 0; i < NUM_THREADS; i++) > pthread_join (child_thread[i], NULL); > > for (i = 0; i < NUM_THREADS; i++) > printf ("thread %ld, count %llu\n", i, counters[i]); > return 0; > } > > And here are the results under gdb. > > Test #1, pristine master (119e99bb7f50): > > (gdb) b 16 if 0 > Breakpoint 1 at 0x1290: file disp-step-buffers-test.c, line 16. > (gdb) r > Starting program: /tmp/disp-step-buffers-test > thread 0, count 1661 > thread 1, count 1663 > thread 2, count 1646 > thread 3, count 1663 > thread 4, count 1622 > thread 5, count 1661 > thread 6, count 1659 > thread 7, count 1662 > thread 8, count 1660 > thread 9, count 1660 > [Inferior 1 (process 18852) exited normally] > (gdb) > > Test #2, patches 1 through 11 applied, i.e., 1 buffer: > > (gdb) b 16 if 0 > Breakpoint 1 at 0x1290: file disp-step-buffers-test.c, line 16. > (gdb) r > Starting program: /tmp/disp-step-buffers-test > ... > thread 0, count 966 > thread 1, count 950 > thread 2, count 951 > thread 3, count 950 > thread 4, count 946 > thread 5, count 948 > thread 6, count 946 > thread 7, count 979 > thread 8, count 951 > thread 9, count 966 > [Inferior 1 (process 16099) exited normally] > (gdb) > > Test #3, patches 1 through 12 applied, i.e., 2 buffers: > > (gdb) r > Starting program: /tmp/disp-step-buffers-test > ... > thread 0, count 1124 > thread 1, count 1128 > thread 2, count 1127 > thread 3, count 1123 > thread 4, count 1121 > thread 5, count 1125 > thread 6, count 1126 > thread 7, count 1126 > thread 8, count 1122 > thread 9, count 1122 > [Inferior 1 (process 14983) exited normally] > > Wow, test #2 was surprising, it's twice as slow as current > master. That was unexpected. > > Test #3 with the two buffers improves the speed a bit, but still > quite behind current master. I think this should be sorted out, > at least understood. > > I checked, and it's patch #9 the one that introduces the slowdown: > > gdb: move displaced stepping logic to gdbarch, allow starting concurrent displaced steps > > so, I'd suspect some new linear thread walks are causing this? > E.g., displaced_step_in_progress. Maybe we need to keep counters > of number of threads displaced stepping instead of walking all threads, > for example. I didn't expect this to be observable with only 10 threads. > I'd expect syscall time / ptrace, etc. to dominate. Maybe it's > something else? > > The above numbers were with an -O0 gdb. I tried again against a gdb built > at -O2, and the absolute numbers were of course better, but the > slowdown is still observed. I also tried with different numbers > of threads. Here's what I see: > > |threads | master | patch #11 | patch #12| > |----------------------------------------| > | 10 | 3959 | 2847 | 3055| > | 20 | 1806 | 1055 | 1134| > | 50 | 690 | 258 | 258| > I can reproduce those results, thanks for the benchmark progam. I previously tried to measure how much time it takes to have N threads do M displaced steps. measuring the whole execution run time of GDB, but that wasn't very precise. Your method appears to give more reproducible results. I modified your program slightly such that it sums the counters for all threads at the end and prints it. I think that looking at the metric "total number of displaced steps done in that period of time (across all threads)" makes sense. I focused on patch 9 and ignored 10-12, since it's patch 9 that is the main culprit. I made 4 experimental patches on top of patch 9, which are appended at the bottom of this email (I think you can apply all of them with on big git-am). They are: A. Implement displaced_step_in_progress without linear walk on thread list I followed your suggestion and added a counter of the number of displaced steps in progress in each inferior. That allows implementing displaced_step_in_progress_any_thread and displaced_step_in_progress more efficiently thank walking all threads. B. Append remaining threads to step in O(1) I also followed you suggestion and made it so we append remaining threads to step by fiddling with linked list pointers, in start_step_over. C. Stop trying to prepare displaced steps for an inferior when no available buffers When the target tells us there is no buffer available for a thread, skip any subsequent thread of the same inferior for the rest of start_step_over. This won't reduce the number of iterations in start_step_over, but it means we'll do much less unnecessary attempts at resuming threads when we know it won't work. I didn't include this in my original patchset, because this won't work if we want to allow arches to do buffer sharing. It could return UNAVAILABLE for a thread, but then return OK for a subsequent thread, if that thread happens to be able to share a buffer with another thread displaced stepping the same PC. Perhaps we could have a way to ask arches "do you do buffer sharing?", and if not we can apply this optimization. However, since I was planning on implementing buffer sharing as a follow-up, we'll be back in the same boat then... But I still included this patch so you can see the speedup it provides. D. Break loop on unavailable This is not correct either, but it shows that lots of time is lost in start_step_over. It make start_step_over break out of the loop as soon as we get one unavailable. If we had hindsight and could know that no thread further in the list will be able to initiate a displaced step, we could break out of the loop immediatly. And this is the kind of speedup we'd get. I ran the test with 100 threads to try to expose more the inefficiencies and for a bit longer (20 seconds) to try to reduce the variations. master: 109,015 #9: 20,304 #9 + A: 19,807 #9 + B: 20,499 #9 + C: 66,468 #9 + D: 103,406 Since patches C and D are about reducing the work in start_step_over, I think it shows clearly that looping over all threads there is what gives the biggest hit. Before this patch, we break out as soon as we manage to initiate a displaced step, whereas in patch 9 we constantly go through all threads in the list Patch C saves us the cost of all those unnecessary switch_to_thread/keep_going_pass_signal. Patch D saves us the cost of what is above that (mostly thread_still_needs_step_over I guess), plus the actual iteration cost. Simon >From 12f90dbe8522afef9c619766c403095f6adcac77 Mon Sep 17 00:00:00 2001 From: Simon Marchi Date: Tue, 24 Nov 2020 22:12:55 -0500 Subject: [PATCH 1/4] Implement displaced_step_in_progress without linear walk on thread list Keep a counter of the number of active displaced steps in each inferior. This allows implementing displaced_step_in_progress and displaced_step_in_progress_any_thread more efficiently. Change-Id: If2609856f339055ad326652021e6ca911419bdc9 --- gdb/displaced-stepping.h | 3 +++ gdb/infrun.c | 17 ++++++++--------- 2 files changed, 11 insertions(+), 9 deletions(-) diff --git a/gdb/displaced-stepping.h b/gdb/displaced-stepping.h index 6c1da46777c2..8de2b98bc7dc 100644 --- a/gdb/displaced-stepping.h +++ b/gdb/displaced-stepping.h @@ -100,11 +100,14 @@ struct displaced_step_inferior_state void reset () { failed_before = false; + active_count = 0; } /* True if preparing a displaced step ever failed. If so, we won't try displaced stepping for this inferior again. */ bool failed_before; + + unsigned int active_count; }; /* Per-thread displaced stepping state. */ diff --git a/gdb/infrun.c b/gdb/infrun.c index 521f4a65f0f7..f659de2a474f 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -1483,13 +1483,7 @@ displaced_step_in_progress_thread (thread_info *thread) static bool displaced_step_in_progress (inferior *inf) { - for (thread_info *thread : inf->non_exited_threads ()) - { - if (displaced_step_in_progress_thread (thread)) - return true; - } - - return false; + return inf->displaced_step_state.active_count > 0; } /* Return true if any thread is doing a displaced step. */ @@ -1497,9 +1491,9 @@ displaced_step_in_progress (inferior *inf) static bool displaced_step_in_progress_any_thread () { - for (thread_info *thread : all_non_exited_threads ()) + for (inferior *inf : all_non_exited_inferiors ()) { - if (displaced_step_in_progress_thread (thread)) + if (displaced_step_in_progress (inf)) return true; } @@ -1709,6 +1703,8 @@ displaced_step_prepare_throw (thread_info *tp) paddress (gdbarch, original_pc), paddress (gdbarch, displaced_pc)); + tp->inf->displaced_step_state.active_count++; + return DISPLACED_STEP_PREPARE_STATUS_OK; } @@ -1780,6 +1776,9 @@ displaced_step_finish (thread_info *event_thread, enum gdb_signal signal) displaced_step_reset_cleanup cleanup (displaced); + gdb_assert (event_thread->inf->displaced_step_state.active_count > 0); + event_thread->inf->displaced_step_state.active_count--; + /* Do the fixup, and release the resources acquired to do the displaced step. */ return gdbarch_displaced_step_finish (displaced->get_original_gdbarch (), -- 2.29.2 >From 41d2e7ede90945888fd3b24c819d66a3d7b947ae Mon Sep 17 00:00:00 2001 From: Simon Marchi Date: Tue, 24 Nov 2020 22:43:15 -0500 Subject: [PATCH 2/4] Append remaining threads to step in O(1) Add a function to enqueue a whole chain to the global thread step over chain. Change-Id: Ifca37af2006ce132348aa66c97294c13bcd09bf4 --- gdb/gdbthread.h | 2 ++ gdb/infrun.c | 12 +----------- gdb/thread.c | 22 ++++++++++++++++++++++ 3 files changed, 25 insertions(+), 11 deletions(-) diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h index 71dc14711be5..cbaee420546e 100644 --- a/gdb/gdbthread.h +++ b/gdb/gdbthread.h @@ -748,6 +748,8 @@ extern bool value_in_thread_stack_temporaries (struct value *, /* Add TP to the end of the global pending step-over chain. */ extern void global_thread_step_over_chain_enqueue (struct thread_info *tp); +extern void global_thread_step_over_chain_enqueue_chain + (thread_info *chain_head); /* Remove TP from step-over chain LIST_P. */ diff --git a/gdb/infrun.c b/gdb/infrun.c index f659de2a474f..4c6a06d053e3 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -1954,17 +1954,7 @@ start_step_over (void) infrun_debug_printf ("putting back %d threads to step in global queue", thread_step_over_chain_length (threads_to_step)); - while (threads_to_step != nullptr) - { - thread_info *thread = threads_to_step; - - /* Remove from that list. */ - thread_step_over_chain_remove (&threads_to_step, thread); - - /* Add to global list. */ - global_thread_step_over_chain_enqueue (thread); - - } + global_thread_step_over_chain_enqueue_chain (threads_to_step); } return started; diff --git a/gdb/thread.c b/gdb/thread.c index a3c9aed1754a..d82e67732812 100644 --- a/gdb/thread.c +++ b/gdb/thread.c @@ -442,6 +442,28 @@ global_thread_step_over_chain_enqueue (struct thread_info *tp) /* See gdbthread.h. */ +void +global_thread_step_over_chain_enqueue_chain (thread_info *chain_head) +{ + gdb_assert (chain_head->step_over_next != nullptr); + gdb_assert (chain_head->step_over_prev != nullptr); + + if (global_thread_step_over_chain_head == nullptr) + global_thread_step_over_chain_head = chain_head; + else + { + thread_info *global_last = global_thread_step_over_chain_head->step_over_prev; + thread_info *chain_last = chain_head->step_over_prev; + + chain_last->step_over_next = global_thread_step_over_chain_head; + global_last->step_over_next = chain_head; + global_thread_step_over_chain_head->step_over_prev = chain_last; + chain_head->step_over_prev = global_last; + } +} + +/* See gdbthread.h. */ + void global_thread_step_over_chain_remove (struct thread_info *tp) { -- 2.29.2 >From 5ebafa794f3f74dcfe9a412bc58cab2a2c5d04d4 Mon Sep 17 00:00:00 2001 From: Simon Marchi Date: Mon, 16 Mar 2020 17:21:55 -0400 Subject: [PATCH 3/4] Stop trying to prepare displaced steps for an inferior when no available buffers Once we got one _UNAVAILABLE for a given inferior, don't try to prepare other threads for displaced stepping for that inferior during this execution of start_step_over. Change-Id: I13063e3c21729f7c7556c3ede38069a39ee98f1c --- gdb/displaced-stepping.h | 2 ++ gdb/infrun.c | 10 ++++++++++ 2 files changed, 12 insertions(+) diff --git a/gdb/displaced-stepping.h b/gdb/displaced-stepping.h index 8de2b98bc7dc..001260c6d921 100644 --- a/gdb/displaced-stepping.h +++ b/gdb/displaced-stepping.h @@ -108,6 +108,8 @@ struct displaced_step_inferior_state bool failed_before; unsigned int active_count; + + bool unavailable = false; }; /* Per-thread displaced stepping state. */ diff --git a/gdb/infrun.c b/gdb/infrun.c index 4c6a06d053e3..e05bb95347e8 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -1687,6 +1687,7 @@ displaced_step_prepare_throw (thread_info *tp) target_pid_to_str (tp->ptid).c_str ()); global_thread_step_over_chain_enqueue (tp); + tp->inf->displaced_step_state.unavailable = true; return DISPLACED_STEP_PREPARE_STATUS_UNAVAILABLE; } @@ -1845,6 +1846,9 @@ start_step_over (void) infrun_debug_printf ("stealing global queue of threads to step, length = %d", thread_step_over_chain_length (threads_to_step)); + for (inferior *inf : all_inferiors ()) + inf->displaced_step_state.unavailable = false; + for (thread_info *tp = threads_to_step; tp != NULL; tp = next) { struct execution_control_state ecss; @@ -1894,6 +1898,12 @@ start_step_over (void) if (!target_is_non_stop_p () && !step_what) continue; + if (tp->inf->displaced_step_state.unavailable) + { + global_thread_step_over_chain_enqueue (tp); + continue; + } + switch_to_thread (tp); reset_ecs (ecs, tp); keep_going_pass_signal (ecs); -- 2.29.2 >From b941a5e270602579e6c22569db8281f3aba6953d Mon Sep 17 00:00:00 2001 From: Simon Marchi Date: Tue, 24 Nov 2020 23:21:07 -0500 Subject: [PATCH 4/4] Break loop on unavailable This gives a preview of how it could be if we were able to break out of the loop once we know no more displaced steps prepare are going to succeed. Change-Id: I7a22acfbbbd4673367f3f93fb946cbc529d8f028 --- gdb/infrun.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gdb/infrun.c b/gdb/infrun.c index e05bb95347e8..4c96cdf0685c 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -1901,7 +1901,7 @@ start_step_over (void) if (tp->inf->displaced_step_state.unavailable) { global_thread_step_over_chain_enqueue (tp); - continue; + break; } switch_to_thread (tp); -- 2.29.2