Re: [PATCH] Use mmap for symbol tables

Mirror of the gdb-patches mailing list

From: Eirik Fuller <eirik@hackrat.com>
To: Jim Blandy <jimb@red-bean.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Use mmap for symbol tables
Date: Mon, 30 Jan 2006 11:44:00 -0000
Message-ID: <43DDFC13.90909@hackrat.com>
In-Reply-To: <8f2776cb0601292104y1c29f4adl6e681b20cd86c177@mail.gmail.com>

> What kind of effect does this have on performance?  Is there a speed
> benefit to it, or is it just that it allows multiple GDB's to share
> memory?

I haven't done careful timings, but I suspect it speeds up gdb.

Without the patch there are two cases.  (I work at Network Appliance, so
it's no surprise that I access all of the symbol tables through NFS :-).

On a fresh mount, there are two stages.  The first stage is scattered
system CPU time and saturated network reads (that's the read syscall).
The second stage is 100% user time CPU on one processor.

On a mount that's been around long enough to have the entire symbol
table cached, the first stage is 100% CPU time, all system time, with no
extra network traffic (but for a shorter time than the first stage in a
fresh mount); that's the read syscall from cache.  The second stage is
100% user time CPU, same as the fresh mount case.

With the patch, the first stage (the read syscall) goes away; access to
the symbol table is all page faults.  With a fresh mount there's a
mixture of system time and user time, not quite 100% total because of
I/O delays; network traffic runs about half of the capacity of my
100BaseT interface (with suitable threading or readahead I could
probably peg the network interface, and have 100% user time CPU).  With
the symbol table already cached, it looks like the cached case without
the patch, minus the brief interval of 100% system time spent in the
read syscall.
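A minimal sketch of the two loading strategies being compared, assuming a POSIX system; it is not the patch itself, and the names `load_with_read` and `load_with_mmap` are hypothetical. With read(2) every byte is copied into private heap memory before anything looks at it (the system-time stage above). With mmap(2) the call returns right away and pages are faulted in from the page cache as they are touched; because the mapping is PROT_READ and MAP_SHARED, those pages are the page cache's own, so several processes mapping the same file share one copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy the whole file into a private heap buffer.
   Every byte passes through read(2) before the caller sees any of it.
   Release the result with free(). */
static void *load_with_read(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return NULL;
    }

    size_t len = (size_t) st.st_size;
    char *buf = malloc(len ? len : 1);
    size_t done = 0;
    while (buf != NULL && done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0) {          /* error (including EINTR) or unexpected EOF */
            free(buf);
            buf = NULL;
        } else {
            done += (size_t) n;
        }
    }
    close(fd);
    if (buf != NULL)
        *len_out = len;
    return buf;
}

/* Hypothetical helper: map the whole file read-only and shared.
   Nothing is copied here; pages are faulted in on first access.
   Release the result with munmap(p, *len_out). */
static void *load_with_mmap(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {   /* mmap rejects length 0 */
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t) st.st_size;
    return p;
}
```

Both return the same bytes; the difference is only when, and into whose memory, the data arrives.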

So for a fresh mount, the patch essentially interleaves the network
traffic and the 100% user space CPU.  As I mentioned, I don't have any
specific numbers for the elapsed time, but I'm suddenly realizing that
having such numbers might be helpful.  :-)

I suspect the biggest impact on performance isn't the startup time, but
rather the gdb memory footprint.  With the patch, I never start pushing
stuff into swap on a 2GB system.  Without the patch, it only takes a few
instances of gdb to fill up memory with what I think of as the green bar
in xosview (the xosview memory bar labels USED+SHAR/BUFF/CACHE/FREE as
green/orange/red/light blue).  The BUFF part is always minimal; the mmap
patch reduces growth of the green bar so that only the red bar really
grows (and it grows without the patch too).

Actually, with that memory leak I mentioned, it only takes one instance
of gdb to fill the green bar, if I reload the symbol table a few times.

> You don't want bugs in GDB to corrupt your executables.

That's true, which is why my patch uses PROT_READ.  In several years of
using the patch (sorry for keeping it to myself that long), PROT_READ
has never caused a segfault, but if gdb ever tries to modify the mmap
region, that will crash gdb rather than corrupt the executable.  In
my case I typically don't have write access to the symbol tables anyway,
but even for executables I own, PROT_READ won't let me change them.
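A minimal sketch of why PROT_READ turns a stray write into a crash rather than file corruption. The function `stray_write_crashes` is hypothetical, not GDB code: it maps a file PROT_READ and MAP_SHARED, lets a forked child write through the mapping, and reports whether the child died with SIGSEGV (the signal Linux delivers for this; other systems may differ). Opening the file O_RDONLY adds a second layer: POSIX requires mmap to fail with EACCES if PROT_WRITE and MAP_SHARED are requested on a descriptor not open for writing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demonstration: returns 1 if a write through a PROT_READ
   mapping of `path` kills the writer with SIGSEGV, 0 if it survived,
   and -1 on setup failure. */
static int stray_write_crashes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    size_t len = (size_t) st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: suppress the core dump, then simulate a debugger bug
           that scribbles on the mapped symbol table. */
        struct rlimit no_core = { .rlim_cur = 0, .rlim_max = 0 };
        setrlimit(RLIMIT_CORE, &no_core);
        ((volatile char *) p)[0] = 'X';
        _exit(0);              /* reached only if the write did not fault */
    }

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    munmap(p, len);
    if (w != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

After the child dies, the file on disk is unchanged.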

> I understand that it would make your BFD code more complicated, but it
> seems to me you want to map individual sections, not entire files.
> Again, this will still share memory with the block cache, so aside
> from the complexity I don't see the downside.

I don't see the upside of making the code more complicated.  The
downside of the extra complication is that it makes the patch less
likely ever to exist at all.  :-)

Could you be more specific about why multiple mmap regions per file are
preferable?  (It might help to keep in mind that I'm using PROT_READ and
MAP_SHARED.)  The only downside I can see is the (relatively small)
fraction of each symbol table which is not accessed via mmap, but that
doesn't use memory, just virtual address space (if it does use memory,
that contradicts the "not accessed" part).
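For comparison, a minimal sketch of the extra bookkeeping that mapping individual sections would involve. mmap only accepts page-aligned file offsets, so a section starting at an arbitrary file offset has to be mapped from the preceding page boundary, and the caller must keep both the mapping base (for munmap) and the adjusted pointer to the section's first byte. The `section_map` struct and the `map_section`/`unmap_section` functions are hypothetical, not part of BFD.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of one mapped section: `base` and `maplen`
   describe the page-aligned mapping, `data` is the section's first byte. */
struct section_map {
    void *base;
    size_t maplen;
    const unsigned char *data;
};

/* Map `size` bytes at file offset `off` of the open file `fd`.
   Round `off` down to a page boundary and remember how far into the
   mapping the section actually starts.  Returns 0 on success. */
static int map_section(int fd, off_t off, size_t size, struct section_map *out)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || off < 0 || size == 0)
        return -1;

    off_t aligned = off - off % page;
    size_t slack = (size_t) (off - aligned);

    void *p = mmap(NULL, size + slack, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    out->base = p;
    out->maplen = size + slack;
    out->data = (const unsigned char *) p + slack;
    return 0;
}

static void unmap_section(struct section_map *m)
{
    munmap(m->base, m->maplen);
}
```

With a single whole-file mapping at offset 0, none of this alignment arithmetic is needed.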

> The last time this was brought up, there was concern about mmap's
> reliability and portability, but if I remember right, people weren't
> specific about exactly where the problems were to be expected.

I can't offer any speculation about what might be wrong with using mmap,
but I can offer years of experience showing that it works fine for me on
Linux 2.4 kernels with NFS mounts over TCP.  I suspect 2.6 kernels work
fine too, but they haven't had nearly as much testing.  It's possible
other mmap implementations aren't as good.

> If this is a decent performance win, I think we should consider it,
> and sort out those portability issues as they arise.

One very simple way to evaluate such a patch is to build it in
automatically at configure time on any system that claims to provide
mmap, but leave it disabled by default, with a variable that enables it
on the fly.
That would make it easy to measure the effect on performance, because a
single gdb binary could be used either way.  I could roll up a modified
patch to that effect, perhaps with the other things I mentioned (changes
to the configure script, a ChangeLog entry, documentation, test cases).
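A minimal sketch of that arrangement, assuming a hypothetical global `use_mmap_for_symbols` that defaults to off and a loader that falls back to read(2) whenever mmap is disabled or fails. In a real build the mmap branch would also be guarded by a configure-time check (autoconf's AC_FUNC_MMAP, which defines HAVE_MMAP, is one option); that guard is omitted here. The timing helper touches one byte per page as a crude stand-in for symbol reading, so the same binary can report elapsed time for load plus first access under either setting.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical run-time switch, off by default. */
static int use_mmap_for_symbols = 0;

/* Load `path` by mapping it (if enabled) or by reading it into the heap.
   Falls back to read(2) if mmap is disabled or fails.  *mapped tells the
   caller whether to release the buffer with munmap() or free(). */
static void *load_symbol_file(const char *path, size_t *len, int *mapped)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    *len = (size_t) st.st_size;
    *mapped = 0;

    if (use_mmap_for_symbols) {
        void *p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            close(fd);
            *mapped = 1;
            return p;
        }
        /* mmap failed: fall through to the read(2) path. */
    }

    char *buf = malloc(*len);
    size_t done = 0;
    while (buf != NULL && done < *len) {
        ssize_t n = read(fd, buf + done, *len - done);
        if (n <= 0) {          /* error or unexpected EOF */
            free(buf);
            buf = NULL;
        } else {
            done += (size_t) n;
        }
    }
    close(fd);
    return buf;
}

/* Load the file, touch one byte per page, and return the elapsed
   wall-clock time in milliseconds (or -1.0 if loading failed).
   *sum receives the total of the bytes touched. */
static double time_load_and_touch(const char *path, unsigned long *sum)
{
    struct timespec t0, t1;
    size_t len = 0;
    int mapped = 0;
    size_t page = (size_t) sysconf(_SC_PAGESIZE);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    const unsigned char *p = load_symbol_file(path, &len, &mapped);
    if (p == NULL)
        return -1.0;
    unsigned long s = 0;
    for (size_t i = 0; i < len; i += page)
        s += p[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (mapped)
        munmap((void *) p, len);
    else
        free((void *) p);
    *sum = s;
    return (double) (t1.tv_sec - t0.tv_sec) * 1e3
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

Running it once with the flag off and once with it on, against both a fresh NFS mount and a warm cache, would give the kind of elapsed-time numbers the message notes are missing.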

I would also be interested in hearing about the results if anyone else
tries the patch I already sent.  I'd expect the improvement to be more
dramatic on large symbol tables than small ones.  As an example, one of
the symbol tables I used for my recent testing is over 300MB, with over
20MB of text segment alone.
