From: Eirik Fuller
Date: Mon, 30 Jan 2006 11:44:00 -0000
To: Jim Blandy
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Use mmap for symbol tables
Message-ID: <43DDFC13.90909@hackrat.com>
In-Reply-To: <8f2776cb0601292104y1c29f4adl6e681b20cd86c177@mail.gmail.com>

> What kind of effect does this have on performance? Is there a speed
> benefit to it, or is it just that it allows multiple GDB's to share
> memory?

I haven't done careful timings, but I suspect it speeds up gdb. Without the patch there are two cases. (I work at Network Appliance, so it's no surprise that I access all of the symbol tables through NFS :-).

On a fresh mount, there are two stages. The first stage is scattered system CPU time and saturated network reads (that's the read syscall). The second stage is 100% user time CPU on one processor.
On a mount that's been around long enough to have the entire symbol table cached, the first stage is 100% CPU time, all system time, with no extra network traffic (but for a shorter time than the first stage on a fresh mount); that's the read syscall pulling from cache. The second stage is 100% user time CPU, same as the fresh mount case.

With the patch, the first stage (the read syscall) goes away; access to the symbol table is all page faults. With a fresh mount there's a mixture of system time and user time, not quite 100% total because of I/O delays; network traffic runs at about half the capacity of my 100BaseT interface (with suitable threading or readahead I could probably peg the network interface and have 100% user time CPU). With the symbol table in cache, it looks much like the unpatched case, minus the brief 100% system time read syscall interval.

So for a fresh mount, the patch essentially interleaves the network traffic with the 100% user space CPU. As I mentioned, I don't have any specific numbers for the elapsed time, but I'm suddenly realizing that having such numbers might be helpful. :-)

I suspect the biggest impact on performance isn't the startup time, but rather the gdb memory footprint. With the patch, I never start pushing stuff into swap on a 2GB system. Without the patch, it only takes a few instances of gdb to fill up memory with what I think of as the green bar in xosview (the xosview memory bar labels USED+SHAR/BUFF/CACHE/FREE as green/orange/red/light blue). The BUFF part is always minimal; the mmap patch reduces growth of the green bar so that only the red bar really grows (and it grows without the patch too). Actually, with that memory leak I mentioned, it only takes one instance of gdb to fill the green bar, if I reload the symbol table a few times.

> You don't want bugs in GDB to corrupt your executables.

That's true, which is why my patch uses PROT_READ.
In several years of using the patch (sorry for keeping it to myself that long), that has never caused a segfault; if gdb ever does try to modify the mapped region, that will crash gdb rather than corrupt the executable. In my case I typically don't have write access to the symbol tables anyway, but even for executables I own, PROT_READ won't let me change them.

> I understand that it would make your BFD code more complicated, but it
> seems to me you want to map individual sections, not entire files.
> Again, this will still share memory with the block cache, so aside
> from the complexity I don't see the downside.

I don't see the upside of making the code more complicated. The downside of the extra complication is that it makes the patch that much less likely ever to exist. :-)

Could you be more specific about why multiple mmap regions per file are preferable? (It might help to keep in mind that I'm using PROT_READ and MAP_SHARED.) The only downside I can see is the (relatively small) fraction of each symbol table which is not accessed via mmap, but that costs no memory, just virtual address space (if it did use memory, that would contradict the "not accessed" part).

> The last time this was brought up, there was concern about mmap's
> reliability and portability, but if I remember right, people weren't
> specific about exactly where the problems were to be expected.

I have nothing to offer in the way of speculation about what might go wrong with mmap, but I can offer years of experience saying it works fine for me, on Linux systems using TCP NFS mounts over 2.4 kernels. I suspect 2.6 kernels work fine too, but they haven't gotten nearly as much testing from me as 2.4 kernels. It's possible other mmap implementations aren't as good.

> If this is a decent performance win, I think we should consider it,
> and sort out those portability issues as they arise.
One very simple way to evaluate such a patch is to enable it at configure time on any system which claims to provide mmap, but disable it by default, with a variable that turns it on on the fly. That would make it easy to measure the effect on performance, because a single gdb binary could be used either way.

I could roll up a modified patch to that effect, perhaps with the other things I mentioned (changes to the configure script, a ChangeLog entry, documentation, test cases). I would also be interested in hearing the results if anyone else tries the patch I already sent. I'd expect the improvement to be more dramatic on large symbol tables than on small ones. As an example, one of the symbol tables I used for my recent testing is over 300MB, with over 20MB of text segment alone.