From: Pedro Alves <pedro@palves.net>
To: Simon Marchi <simon.marchi@efficios.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux
Date: Wed, 25 Nov 2020 01:42:20 +0000 [thread overview]
Message-ID: <7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net> (raw)
In-Reply-To: <20201110214614.2842615-13-simon.marchi@efficios.com>
On 11/10/20 9:46 PM, Simon Marchi via Gdb-patches wrote:
> As observed on a binary compiled on AMD64 Ubuntu 20.04, against glibc
> 2.31 (I think it's the libc that provides this startup code, right?),
Right. This is crt0.o. It's sysdeps/x86_64/start.S in this case:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/start.S;h=7477b632f7ad6ffe6cd1975cdc4ec2317f11a6c3;hb=HEAD
> there are enough bytes at the executable's entry point to hold more than
> one displaced step buffer. gdbarch_max_insn_length is 16, and the
> code at _start looks like:
>
> 0000000000001040 <_start>:
> 1040: f3 0f 1e fa endbr64
> 1044: 31 ed xor %ebp,%ebp
> 1046: 49 89 d1 mov %rdx,%r9
> 1049: 5e pop %rsi
> 104a: 48 89 e2 mov %rsp,%rdx
> 104d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
> 1051: 50 push %rax
> 1052: 54 push %rsp
> 1053: 4c 8d 05 56 01 00 00 lea 0x156(%rip),%r8 # 11b0 <__libc_csu_fini>
> 105a: 48 8d 0d df 00 00 00 lea 0xdf(%rip),%rcx # 1140 <__libc_csu_init>
> 1061: 48 8d 3d c1 00 00 00 lea 0xc1(%rip),%rdi # 1129 <main>
> 1068: ff 15 72 2f 00 00 callq *0x2f72(%rip) # 3fe0 <__libc_start_main@GLIBC_2.2.5>
> 106e: f4 hlt
> 106f: 90 nop
>
> The two buffers would occupy [0x1040, 0x1060).
>
> I checked on Alpine, which uses the musl C library, the startup code
> looks like:
>
> 0000000000001048 <_start>:
> 1048: 48 31 ed xor %rbp,%rbp
> 104b: 48 89 e7 mov %rsp,%rdi
> 104e: 48 8d 35 e3 2d 00 00 lea 0x2de3(%rip),%rsi # 3e38 <_DYNAMIC>
> 1055: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
> 1059: e8 00 00 00 00 callq 105e <_start_c>
>
> 000000000000105e <_start_c>:
> 105e: 48 8b 37 mov (%rdi),%rsi
> 1061: 48 8d 57 08 lea 0x8(%rdi),%rdx
> 1065: 45 31 c9 xor %r9d,%r9d
> 1068: 4c 8d 05 47 01 00 00 lea 0x147(%rip),%r8 # 11b6 <_fini>
> 106f: 48 8d 0d 8a ff ff ff lea -0x76(%rip),%rcx # 1000 <_init>
> 1076: 48 8d 3d 0c 01 00 00 lea 0x10c(%rip),%rdi # 1189 <main>
> 107d: e9 9e ff ff ff jmpq 1020 <__libc_start_main@plt>
>
> Even though there's a _start_c symbol, it all appears to be code that
> runs once at the very beginning of the program, so it looks fine if the
> two buffers occupy [0x1048, 0x1068).
>
> One important thing I discovered while doing this is that when debugging
> a dynamically-linked executable, breakpoints in the shared library
> loader are hit before executing the _start code, and these breakpoints
> may be displaced-stepped. So it's very important that the buffer bytes
> are restored properly after doing the displaced steps, otherwise the
> _start code will be corrupted once we try to execute it.
>
> Another thing that made me think about is that library constructors (as
> in `__attribute__((constructor))`) run before _start. And they are free
> to spawn threads. What if one of these threads executes a displaced
> step, therefore changing the bytes at _start, while the main thread
> executes _start? That doesn't sound good and I don't know how we could
> prevent it. But this is a problem that predates the current patch.
Maybe we should put the buffer/buffers at the beginning of the
dynamic loader instead.
>
> Even when stress-testing the implementation, by making many threads do
> displaced steps over and over, I didn't see a significant performance (I
> confirmed that the two buffers were used by checking the "set debug
> displaced" logs though). However, this patch mostly helps make the
> feature testable by anybody with an AMD64/Linux machine, so I think it's
> useful.
It should speed up the use case of multiple threads hitting a conditional
breakpoint that evals false at the same time. The more buffers, the less
time threads need to spend waiting for their turn in the displaced stepping
queue. If that doesn't speed things up, then probably something isn't quite
right.
/me gives it a try.
Here's a testcase that can be used to observe speed and fairness.
It has 10 threads running a single tight loop for 3 seconds. After
the 3 seconds, the threads exit and the number of iterations each
thread managed to do is printed. The idea is to set a breakpoint
with a condition that evals false on the loop. The higher the
number, the better.
I compiled it with:
$ gcc disp-step-buffers-test.c -o disp-step-buffers-test -g3 -O2 -pthread
Here's a run outside gdb:
$ /tmp/disp-step-buffers-test
thread 0, count 12785417966
thread 1, count 12784090476
thread 2, count 12773373753
thread 3, count 12782542707
thread 4, count 12767835404
thread 5, count 12754382637
thread 6, count 12783454775
thread 7, count 12605966064
thread 8, count 12635255271
thread 9, count 12783436261
Here's the test:
$ cat /tmp/disp-step-buffers-test.c
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#define NUM_THREADS 10
static pthread_t child_thread[NUM_THREADS];
static unsigned long long counters[NUM_THREADS];
static volatile int done;
static void *
child_function (void *arg)
{
while (!done)
counters[(long) arg]++; // set breakpoint here
return NULL;
}
int
main (void)
{
long i;
for (i = 0; i < NUM_THREADS; i++)
pthread_create (&child_thread[i], NULL, child_function, (void *) i);
sleep (3);
done = 1;
for (i = 0; i < NUM_THREADS; i++)
pthread_join (child_thread[i], NULL);
for (i = 0; i < NUM_THREADS; i++)
printf ("thread %ld, count %llu\n", i, counters[i]);
return 0;
}
And here are the results under gdb.
Test #1, pristine master (119e99bb7f50):
(gdb) b 16 if 0
Breakpoint 1 at 0x1290: file disp-step-buffers-test.c, line 16.
(gdb) r
Starting program: /tmp/disp-step-buffers-test
thread 0, count 1661
thread 1, count 1663
thread 2, count 1646
thread 3, count 1663
thread 4, count 1622
thread 5, count 1661
thread 6, count 1659
thread 7, count 1662
thread 8, count 1660
thread 9, count 1660
[Inferior 1 (process 18852) exited normally]
(gdb)
Test #2, patches 1 through 11 applied, i.e., 1 buffer:
(gdb) b 16 if 0
Breakpoint 1 at 0x1290: file disp-step-buffers-test.c, line 16.
(gdb) r
Starting program: /tmp/disp-step-buffers-test
...
thread 0, count 966
thread 1, count 950
thread 2, count 951
thread 3, count 950
thread 4, count 946
thread 5, count 948
thread 6, count 946
thread 7, count 979
thread 8, count 951
thread 9, count 966
[Inferior 1 (process 16099) exited normally]
(gdb)
Test #3, patches 1 through 12 applied, i.e., 2 buffers:
(gdb) r
Starting program: /tmp/disp-step-buffers-test
...
thread 0, count 1124
thread 1, count 1128
thread 2, count 1127
thread 3, count 1123
thread 4, count 1121
thread 5, count 1125
thread 6, count 1126
thread 7, count 1126
thread 8, count 1122
thread 9, count 1122
[Inferior 1 (process 14983) exited normally]
Wow, test #2 was surprising, it's twice as slow as current
master. That was unexpected.
Test #3 with the two buffers improves the speed a bit, but still
quite behind current master. I think this should be sorted out,
at least understood.
I checked, and it's patch #9 the one that introduces the slowdown:
gdb: move displaced stepping logic to gdbarch, allow starting concurrent displaced steps
so, I'd suspect some new linear thread walks are causing this?
E.g., displaced_step_in_progress. Maybe we need to keep counters
of number of threads displaced stepping instead of walking all threads,
for example. I didn't expect this to be observable with only 10 threads.
I'd expect syscall time / ptrace, etc. to dominate. Maybe it's
something else?
The above numbers were with an -O0 gdb. I tried again against a gdb built
at -O2, and the absolute numbers were of course better, but the
slowdown is still observed. I also tried with different numbers
of threads. Here's what I see:
|threads | master | patch #11 | patch #12|
|----------------------------------------|
| 10 | 3959 | 2847 | 3055|
| 20 | 1806 | 1055 | 1134|
| 50 | 690 | 258 | 258|
next prev parent reply other threads:[~2020-11-25 1:42 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-10 21:46 [PATCH 00/12] Concurrent displaced stepping Simon Marchi via Gdb-patches
2020-11-10 21:46 ` [PATCH 01/12] gdb: add inferior_execd observable Simon Marchi via Gdb-patches
2020-11-25 1:28 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 02/12] gdb: clear inferior displaced stepping state on exec Simon Marchi via Gdb-patches
2020-11-25 1:28 ` Pedro Alves
2020-12-01 4:27 ` Simon Marchi
2020-11-10 21:46 ` [PATCH 03/12] gdb: rename things related to step over chains Simon Marchi via Gdb-patches
2020-11-25 1:28 ` Pedro Alves
2020-11-25 13:16 ` Simon Marchi via Gdb-patches
2020-11-10 21:46 ` [PATCH 04/12] gdb: rename displaced_step_closure to displaced_step_copy_insn_closure Simon Marchi via Gdb-patches
2020-11-25 1:29 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 05/12] gdb: rename displaced_step_fixup to displaced_step_finish Simon Marchi via Gdb-patches
2020-11-25 1:29 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 06/12] gdb: introduce status enum for displaced step prepare/finish Simon Marchi via Gdb-patches
2020-11-11 23:36 ` Andrew Burgess
2020-11-25 13:17 ` Simon Marchi via Gdb-patches
2020-11-25 1:30 ` Pedro Alves
2020-11-25 13:20 ` Simon Marchi via Gdb-patches
2020-11-10 21:46 ` [PATCH 07/12] gdb: pass inferior to get_linux_inferior_data Simon Marchi via Gdb-patches
2020-11-25 1:30 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 08/12] gdb: move displaced stepping types to displaced-stepping.{h, c} Simon Marchi via Gdb-patches
2020-11-25 1:30 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 09/12] gdb: move displaced stepping logic to gdbarch, allow starting concurrent displaced steps Simon Marchi via Gdb-patches
2020-11-25 1:40 ` Pedro Alves
2020-11-25 19:29 ` Simon Marchi via Gdb-patches
2020-11-25 19:35 ` Simon Marchi
2020-11-26 14:25 ` Pedro Alves
2020-11-30 19:13 ` Simon Marchi via Gdb-patches
2020-11-26 14:24 ` Pedro Alves
2020-11-30 20:26 ` Simon Marchi
2020-11-10 21:46 ` [PATCH 10/12] gdb: change linux gdbarch data from post to pre-init Simon Marchi via Gdb-patches
2020-11-25 1:41 ` Pedro Alves
2020-11-10 21:46 ` [PATCH 11/12] gdb: make displaced stepping implementation capable of managing multiple buffers Simon Marchi via Gdb-patches
2020-11-25 1:41 ` Pedro Alves
2020-11-30 18:58 ` Simon Marchi
2020-11-30 19:01 ` Simon Marchi
2020-11-10 21:46 ` [PATCH 12/12] gdb: use two displaced step buffers on amd64/Linux Simon Marchi via Gdb-patches
2020-11-25 1:42 ` Pedro Alves [this message]
2020-11-25 6:26 ` Simon Marchi
2020-11-25 20:07 ` Simon Marchi
2020-11-25 20:56 ` Simon Marchi
2020-11-26 21:43 ` Simon Marchi
2020-11-26 22:34 ` Simon Marchi
2020-11-28 18:56 ` Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7040e2ee-4d28-a83e-22df-20b2ace082bb@palves.net \
--to=pedro@palves.net \
--cc=gdb-patches@sourceware.org \
--cc=simon.marchi@efficios.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox