From: Simon Marchi <simark@simark.ca>
To: Pedro Alves <pedro@palves.net>,
Simon Marchi <simon.marchi@efficios.com>,
gdb-patches@sourceware.org
Subject: Re: [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux
Date: Wed, 25 Nov 2020 01:26:35 -0500 [thread overview]
Message-ID: <88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca> (raw)
In-Reply-To: <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net>
On 2020-11-24 8:42 p.m., Pedro Alves wrote:
>> Even when stress-testing the implementation, by making many threads do
>> displaced steps over and over, I didn't see a significant performance (I
>> confirmed that the two buffers were used by checking the "set debug
>> displaced" logs though). However, this patch mostly helps make the
>> feature testable by anybody with an AMD64/Linux machine, so I think it's
>> useful.
>
> It should speed up the use case of multiple threads hitting a conditional
> breakpoint that evals false at the same time. The more buffers, the less
> time threads need to spend waiting for their turn in the displaced stepping
> queue. If that doesn't speed things up, then probably something isn't quite
> right.
>
> /me gives it a try.
>
> Here's a testcase that can be used to observe speed and fairness.
> It has 10 threads running a single tight loop for 3 seconds. After
> the 3 seconds, the threads exit and the number of iterations each
> thread managed to do is printed. The idea is to set a breakpoint
> with a condition that evals false on the loop. The higher the
> number, the better.
>
> I compiled it with:
>
> $ gcc disp-step-buffers-test.c -o disp-step-buffers-test -g3 -O2 -pthread
>
> Here's a run outside gdb:
>
> $ /tmp/disp-step-buffers-test
> thread 0, count 12785417966
> thread 1, count 12784090476
> thread 2, count 12773373753
> thread 3, count 12782542707
> thread 4, count 12767835404
> thread 5, count 12754382637
> thread 6, count 12783454775
> thread 7, count 12605966064
> thread 8, count 12635255271
> thread 9, count 12783436261
>
> Here's the test:
>
> $ cat /tmp/disp-step-buffers-test.c
> #include <pthread.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> #define NUM_THREADS 10
>
> static pthread_t child_thread[NUM_THREADS];
> static unsigned long long counters[NUM_THREADS];
> static volatile int done;
>
> static void *
> child_function (void *arg)
> {
> while (!done)
> counters[(long) arg]++; // set breakpoint here
> return NULL;
> }
>
> int
> main (void)
> {
> long i;
>
> for (i = 0; i < NUM_THREADS; i++)
> pthread_create (&child_thread[i], NULL, child_function, (void *) i);
>
> sleep (3);
>
> done = 1;
>
> for (i = 0; i < NUM_THREADS; i++)
> pthread_join (child_thread[i], NULL);
>
> for (i = 0; i < NUM_THREADS; i++)
> printf ("thread %ld, count %llu\n", i, counters[i]);
> return 0;
> }
>
> And here are the results under gdb.
>
> Test #1, pristine master (119e99bb7f50):
>
> (gdb) b 16 if 0
> Breakpoint 1 at 0x1290: file disp-step-buffers-test.c, line 16.
> (gdb) r
> Starting program: /tmp/disp-step-buffers-test
> thread 0, count 1661
> thread 1, count 1663
> thread 2, count 1646
> thread 3, count 1663
> thread 4, count 1622
> thread 5, count 1661
> thread 6, count 1659
> thread 7, count 1662
> thread 8, count 1660
> thread 9, count 1660
> [Inferior 1 (process 18852) exited normally]
> (gdb)
>
> Test #2, patches 1 through 11 applied, i.e., 1 buffer:
>
> (gdb) b 16 if 0
> Breakpoint 1 at 0x1290: file disp-step-buffers-test.c, line 16.
> (gdb) r
> Starting program: /tmp/disp-step-buffers-test
> ...
> thread 0, count 966
> thread 1, count 950
> thread 2, count 951
> thread 3, count 950
> thread 4, count 946
> thread 5, count 948
> thread 6, count 946
> thread 7, count 979
> thread 8, count 951
> thread 9, count 966
> [Inferior 1 (process 16099) exited normally]
> (gdb)
>
> Test #3, patches 1 through 12 applied, i.e., 2 buffers:
>
> (gdb) r
> Starting program: /tmp/disp-step-buffers-test
> ...
> thread 0, count 1124
> thread 1, count 1128
> thread 2, count 1127
> thread 3, count 1123
> thread 4, count 1121
> thread 5, count 1125
> thread 6, count 1126
> thread 7, count 1126
> thread 8, count 1122
> thread 9, count 1122
> [Inferior 1 (process 14983) exited normally]
>
> Wow, test #2 was surprising, it's twice as slow as current
> master. That was unexpected.
>
> Test #3 with the two buffers improves the speed a bit, but still
> quite behind current master. I think this should be sorted out,
> at least understood.
>
> I checked, and it's patch #9 the one that introduces the slowdown:
>
> gdb: move displaced stepping logic to gdbarch, allow starting concurrent displaced steps
>
> so, I'd suspect some new linear thread walks are causing this?
> E.g., displaced_step_in_progress. Maybe we need to keep counters
> of number of threads displaced stepping instead of walking all threads,
> for example. I didn't expect this to be observable with only 10 threads.
> I'd expect syscall time / ptrace, etc. to dominate. Maybe it's
> something else?
>
> The above numbers were with an -O0 gdb. I tried again against a gdb built
> at -O2, and the absolute numbers were of course better, but the
> slowdown is still observed. I also tried with different numbers
> of threads. Here's what I see:
>
> |threads | master | patch #11 | patch #12|
> |----------------------------------------|
> | 10 | 3959 | 2847 | 3055|
> | 20 | 1806 | 1055 | 1134|
> | 50 | 690 | 258 | 258|
>
I can reproduce those results, thanks for the benchmark progam. I
previously tried to measure how much time it takes to have N threads do
M displaced steps. measuring the whole execution run time of GDB, but
that wasn't very precise. Your method appears to give more reproducible
results.
I modified your program slightly such that it sums the counters for all
threads at the end and prints it. I think that looking at the metric
"total number of displaced steps done in that period of time (across all
threads)" makes sense.
I focused on patch 9 and ignored 10-12, since it's patch 9 that is the main
culprit. I made 4 experimental patches on top of patch 9, which are appended
at the bottom of this email (I think you can apply all of them with on big
git-am). They are:
A. Implement displaced_step_in_progress without linear walk on thread list
I followed your suggestion and added a counter of the number of
displaced steps in progress in each inferior. That allows
implementing displaced_step_in_progress_any_thread and
displaced_step_in_progress more efficiently thank walking all
threads.
B. Append remaining threads to step in O(1)
I also followed you suggestion and made it so we append remaining
threads to step by fiddling with linked list pointers, in
start_step_over.
C. Stop trying to prepare displaced steps for an inferior when no available buffers
When the target tells us there is no buffer available for a thread,
skip any subsequent thread of the same inferior for the rest of
start_step_over. This won't reduce the number of iterations in
start_step_over, but it means we'll do much less unnecessary attempts
at resuming threads when we know it won't work.
I didn't include this in my original patchset, because this won't
work if we want to allow arches to do buffer sharing. It could
return UNAVAILABLE for a thread, but then return OK for a subsequent
thread, if that thread happens to be able to share a buffer with
another thread displaced stepping the same PC.
Perhaps we could have a way to ask arches "do you do buffer
sharing?", and if not we can apply this optimization. However, since
I was planning on implementing buffer sharing as a follow-up, we'll
be back in the same boat then... But I still included this patch so
you can see the speedup it provides.
D. Break loop on unavailable
This is not correct either, but it shows that lots of time is lost in
start_step_over. It make start_step_over break out of the loop as
soon as we get one unavailable. If we had hindsight and could know
that no thread further in the list will be able to initiate a
displaced step, we could break out of the loop immediatly. And this
is the kind of speedup we'd get.
I ran the test with 100 threads to try to expose more the inefficiencies
and for a bit longer (20 seconds) to try to reduce the variations.
master: 109,015
#9: 20,304
#9 + A: 19,807
#9 + B: 20,499
#9 + C: 66,468
#9 + D: 103,406
Since patches C and D are about reducing the work in start_step_over, I
think it shows clearly that looping over all threads there is what gives
the biggest hit. Before this patch, we break out as soon as we manage
to initiate a displaced step, whereas in patch 9 we constantly go
through all threads in the list
Patch C saves us the cost of all those unnecessary
switch_to_thread/keep_going_pass_signal.
Patch D saves us the cost of what is above that (mostly
thread_still_needs_step_over I guess), plus the actual iteration cost.
Simon
From 12f90dbe8522afef9c619766c403095f6adcac77 Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Tue, 24 Nov 2020 22:12:55 -0500
Subject: [PATCH 1/4] Implement displaced_step_in_progress without linear walk
on thread list
Keep a counter of the number of active displaced steps in each inferior.
This allows implementing displaced_step_in_progress and
displaced_step_in_progress_any_thread more efficiently.
Change-Id: If2609856f339055ad326652021e6ca911419bdc9
---
gdb/displaced-stepping.h | 3 +++
gdb/infrun.c | 17 ++++++++---------
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/gdb/displaced-stepping.h b/gdb/displaced-stepping.h
index 6c1da46777c2..8de2b98bc7dc 100644
--- a/gdb/displaced-stepping.h
+++ b/gdb/displaced-stepping.h
@@ -100,11 +100,14 @@ struct displaced_step_inferior_state
void reset ()
{
failed_before = false;
+ active_count = 0;
}
/* True if preparing a displaced step ever failed. If so, we won't
try displaced stepping for this inferior again. */
bool failed_before;
+
+ unsigned int active_count;
};
/* Per-thread displaced stepping state. */
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 521f4a65f0f7..f659de2a474f 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1483,13 +1483,7 @@ displaced_step_in_progress_thread (thread_info *thread)
static bool
displaced_step_in_progress (inferior *inf)
{
- for (thread_info *thread : inf->non_exited_threads ())
- {
- if (displaced_step_in_progress_thread (thread))
- return true;
- }
-
- return false;
+ return inf->displaced_step_state.active_count > 0;
}
/* Return true if any thread is doing a displaced step. */
@@ -1497,9 +1491,9 @@ displaced_step_in_progress (inferior *inf)
static bool
displaced_step_in_progress_any_thread ()
{
- for (thread_info *thread : all_non_exited_threads ())
+ for (inferior *inf : all_non_exited_inferiors ())
{
- if (displaced_step_in_progress_thread (thread))
+ if (displaced_step_in_progress (inf))
return true;
}
@@ -1709,6 +1703,8 @@ displaced_step_prepare_throw (thread_info *tp)
paddress (gdbarch, original_pc),
paddress (gdbarch, displaced_pc));
+ tp->inf->displaced_step_state.active_count++;
+
return DISPLACED_STEP_PREPARE_STATUS_OK;
}
@@ -1780,6 +1776,9 @@ displaced_step_finish (thread_info *event_thread, enum gdb_signal signal)
displaced_step_reset_cleanup cleanup (displaced);
+ gdb_assert (event_thread->inf->displaced_step_state.active_count > 0);
+ event_thread->inf->displaced_step_state.active_count--;
+
/* Do the fixup, and release the resources acquired to do the displaced
step. */
return gdbarch_displaced_step_finish (displaced->get_original_gdbarch (),
--
2.29.2
From 41d2e7ede90945888fd3b24c819d66a3d7b947ae Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Tue, 24 Nov 2020 22:43:15 -0500
Subject: [PATCH 2/4] Append remaining threads to step in O(1)
Add a function to enqueue a whole chain to the global thread step over
chain.
Change-Id: Ifca37af2006ce132348aa66c97294c13bcd09bf4
---
gdb/gdbthread.h | 2 ++
gdb/infrun.c | 12 +-----------
gdb/thread.c | 22 ++++++++++++++++++++++
3 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/gdb/gdbthread.h b/gdb/gdbthread.h
index 71dc14711be5..cbaee420546e 100644
--- a/gdb/gdbthread.h
+++ b/gdb/gdbthread.h
@@ -748,6 +748,8 @@ extern bool value_in_thread_stack_temporaries (struct value *,
/* Add TP to the end of the global pending step-over chain. */
extern void global_thread_step_over_chain_enqueue (struct thread_info *tp);
+extern void global_thread_step_over_chain_enqueue_chain
+ (thread_info *chain_head);
/* Remove TP from step-over chain LIST_P. */
diff --git a/gdb/infrun.c b/gdb/infrun.c
index f659de2a474f..4c6a06d053e3 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1954,17 +1954,7 @@ start_step_over (void)
infrun_debug_printf ("putting back %d threads to step in global queue",
thread_step_over_chain_length (threads_to_step));
- while (threads_to_step != nullptr)
- {
- thread_info *thread = threads_to_step;
-
- /* Remove from that list. */
- thread_step_over_chain_remove (&threads_to_step, thread);
-
- /* Add to global list. */
- global_thread_step_over_chain_enqueue (thread);
-
- }
+ global_thread_step_over_chain_enqueue_chain (threads_to_step);
}
return started;
diff --git a/gdb/thread.c b/gdb/thread.c
index a3c9aed1754a..d82e67732812 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -442,6 +442,28 @@ global_thread_step_over_chain_enqueue (struct thread_info *tp)
/* See gdbthread.h. */
+void
+global_thread_step_over_chain_enqueue_chain (thread_info *chain_head)
+{
+ gdb_assert (chain_head->step_over_next != nullptr);
+ gdb_assert (chain_head->step_over_prev != nullptr);
+
+ if (global_thread_step_over_chain_head == nullptr)
+ global_thread_step_over_chain_head = chain_head;
+ else
+ {
+ thread_info *global_last = global_thread_step_over_chain_head->step_over_prev;
+ thread_info *chain_last = chain_head->step_over_prev;
+
+ chain_last->step_over_next = global_thread_step_over_chain_head;
+ global_last->step_over_next = chain_head;
+ global_thread_step_over_chain_head->step_over_prev = chain_last;
+ chain_head->step_over_prev = global_last;
+ }
+}
+
+/* See gdbthread.h. */
+
void
global_thread_step_over_chain_remove (struct thread_info *tp)
{
--
2.29.2
From 5ebafa794f3f74dcfe9a412bc58cab2a2c5d04d4 Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@efficios.com>
Date: Mon, 16 Mar 2020 17:21:55 -0400
Subject: [PATCH 3/4] Stop trying to prepare displaced steps for an inferior
when no available buffers
Once we got one _UNAVAILABLE for a given inferior, don't try to prepare
other threads for displaced stepping for that inferior during this
execution of start_step_over.
Change-Id: I13063e3c21729f7c7556c3ede38069a39ee98f1c
---
gdb/displaced-stepping.h | 2 ++
gdb/infrun.c | 10 ++++++++++
2 files changed, 12 insertions(+)
diff --git a/gdb/displaced-stepping.h b/gdb/displaced-stepping.h
index 8de2b98bc7dc..001260c6d921 100644
--- a/gdb/displaced-stepping.h
+++ b/gdb/displaced-stepping.h
@@ -108,6 +108,8 @@ struct displaced_step_inferior_state
bool failed_before;
unsigned int active_count;
+
+ bool unavailable = false;
};
/* Per-thread displaced stepping state. */
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 4c6a06d053e3..e05bb95347e8 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1687,6 +1687,7 @@ displaced_step_prepare_throw (thread_info *tp)
target_pid_to_str (tp->ptid).c_str ());
global_thread_step_over_chain_enqueue (tp);
+ tp->inf->displaced_step_state.unavailable = true;
return DISPLACED_STEP_PREPARE_STATUS_UNAVAILABLE;
}
@@ -1845,6 +1846,9 @@ start_step_over (void)
infrun_debug_printf ("stealing global queue of threads to step, length = %d",
thread_step_over_chain_length (threads_to_step));
+ for (inferior *inf : all_inferiors ())
+ inf->displaced_step_state.unavailable = false;
+
for (thread_info *tp = threads_to_step; tp != NULL; tp = next)
{
struct execution_control_state ecss;
@@ -1894,6 +1898,12 @@ start_step_over (void)
if (!target_is_non_stop_p () && !step_what)
continue;
+ if (tp->inf->displaced_step_state.unavailable)
+ {
+ global_thread_step_over_chain_enqueue (tp);
+ continue;
+ }
+
switch_to_thread (tp);
reset_ecs (ecs, tp);
keep_going_pass_signal (ecs);
--
2.29.2
From b941a5e270602579e6c22569db8281f3aba6953d Mon Sep 17 00:00:00 2001
From: Simon Marchi <simon.marchi@polymtl.ca>
Date: Tue, 24 Nov 2020 23:21:07 -0500
Subject: [PATCH 4/4] Break loop on unavailable
This gives a preview of how it could be if we were able to break out of
the loop once we know no more displaced steps prepare are going to
succeed.
Change-Id: I7a22acfbbbd4673367f3f93fb946cbc529d8f028
---
gdb/infrun.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gdb/infrun.c b/gdb/infrun.c
index e05bb95347e8..4c96cdf0685c 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1901,7 +1901,7 @@ start_step_over (void)
if (tp->inf->displaced_step_state.unavailable)
{
global_thread_step_over_chain_enqueue (tp);
- continue;
+ break;
}
switch_to_thread (tp);
--
2.29.2
next prev parent reply other threads:[~2020-11-25 6:26 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-10 21:46 [PATCH 00/12] Concurrent displaced stepping Simon Marchi via Gdb-patches
2020-11-10 21:46 ` [PATCH 01/12] gdb: add inferior_execd observable Simon Marchi via Gdb-patches
2020-11-25 1:28 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 02/12] gdb: clear inferior displaced stepping state on exec Simon Marchi via Gdb-patches
2020-11-25 1:28 ` Pedro Alves
2020-12-01 4:27 ` Simon Marchi
2020-11-10 21:46 ` [PATCH 03/12] gdb: rename things related to step over chains Simon Marchi via Gdb-patches
2020-11-25 1:28 ` Pedro Alves
2020-11-25 13:16 ` Simon Marchi via Gdb-patches
2020-11-10 21:46 ` [PATCH 04/12] gdb: rename displaced_step_closure to displaced_step_copy_insn_closure Simon Marchi via Gdb-patches
2020-11-25 1:29 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 05/12] gdb: rename displaced_step_fixup to displaced_step_finish Simon Marchi via Gdb-patches
2020-11-25 1:29 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 06/12] gdb: introduce status enum for displaced step prepare/finish Simon Marchi via Gdb-patches
2020-11-11 23:36 ` Andrew Burgess
2020-11-25 13:17 ` Simon Marchi via Gdb-patches
2020-11-25 1:30 ` Pedro Alves
2020-11-25 13:20 ` Simon Marchi via Gdb-patches
2020-11-10 21:46 ` [PATCH 07/12] gdb: pass inferior to get_linux_inferior_data Simon Marchi via Gdb-patches
2020-11-25 1:30 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 08/12] gdb: move displaced stepping types to displaced-stepping.{h, c} Simon Marchi via Gdb-patches
2020-11-25 1:30 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 09/12] gdb: move displaced stepping logic to gdbarch, allow starting concurrent displaced steps Simon Marchi via Gdb-patches
2020-11-25 1:40 ` Pedro Alves
2020-11-25 19:29 ` Simon Marchi via Gdb-patches
2020-11-25 19:35 ` Simon Marchi
2020-11-26 14:25 ` Pedro Alves
2020-11-30 19:13 ` Simon Marchi via Gdb-patches
2020-11-26 14:24 ` Pedro Alves
2020-11-30 20:26 ` Simon Marchi
2020-11-10 21:46 ` [PATCH 10/12] gdb: change linux gdbarch data from post to pre-init Simon Marchi via Gdb-patches
2020-11-25 1:41 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 11/12] gdb: make displaced stepping implementation capable of managing multiple buffers Simon Marchi via Gdb-patches
2020-11-25 1:41 ` Pedro Alves
2020-11-30 18:58 ` Simon Marchi
2020-11-30 19:01 ` Simon Marchi
2020-11-10 21:46 ` [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux Simon Marchi via Gdb-patches
2020-11-25 1:42 ` Pedro Alves
2020-11-25 6:26 ` Simon Marchi [this message]
2020-11-25 20:07 ` Simon Marchi
2020-11-25 20:56 ` Simon Marchi
2020-11-26 21:43 ` Simon Marchi
2020-11-26 22:34 ` Simon Marchi
2020-11-28 18:56 ` Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=88518922-ffb6-f221-f3b8-569c5577ae5a@simark.ca \
--to=simark@simark.ca \
--cc=gdb-patches@sourceware.org \
--cc=pedro@palves.net \
--cc=simon.marchi@efficios.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox