From: Eirik Fuller
Date: Mon, 30 Jan 2006 11:44:00 -0000
To: Jim Blandy
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH] Use mmap for symbol tables
Message-ID: <43DDFC13.90909@hackrat.com>
In-Reply-To: <8f2776cb0601292104y1c29f4adl6e681b20cd86c177@mail.gmail.com>

> What kind of effect does this have on performance? Is there a speed
> benefit to it, or is it just that it allows multiple GDB's to share
> memory?

I haven't done careful timings, but I suspect it speeds up gdb. Without the patch there are two cases. (I work at Network Appliance, so it's no surprise that I access all of the symbol tables through NFS :-).

On a fresh mount, there are two stages. The first stage is scattered system CPU time and saturated network reads (that's the read syscall). The second stage is 100% user time CPU on one processor.
On a mount that's been around long enough to have the entire symbol table cached, the first stage is 100% CPU time, all system time, with no extra network traffic (but for a shorter time than the first stage on a fresh mount); that's the read syscall pulling from cache. The second stage is 100% user time CPU, same as the fresh mount case.

With the patch, the first stage (the read syscall) goes away; access to the symbol table is all page faults. With a fresh mount there's a mixture of system time and user time, not quite 100% total because of I/O delays; network traffic runs at about half the capacity of my 100BaseT interface (with suitable threading or readahead I could probably peg the network interface and have 100% user time CPU). With the symbol table in cache, it looks much like the unpatched case, minus the brief 100% system time read syscall interval.

So for a fresh mount, the patch essentially interleaves the network traffic with the 100% user space CPU. As I mentioned, I don't have any specific numbers for the elapsed time, but I'm suddenly realizing that having such numbers might be helpful. :-)

I suspect the biggest impact on performance isn't the startup time, but rather the gdb memory footprint. With the patch, I never start pushing stuff into swap on a 2GB system. Without the patch, it only takes a few instances of gdb to fill up memory with what I think of as the green bar in xosview (the xosview memory bar labels USED+SHAR/BUFF/CACHE/FREE as green/orange/red/light blue). The BUFF part is always minimal; the mmap patch reduces growth of the green bar so that only the red bar really grows (and it grows without the patch too). Actually, with that memory leak I mentioned, it only takes one instance of gdb to fill the green bar, if I reload the symbol table a few times.

> You don't want bugs in GDB to corrupt your executables.

That's true, which is why my patch uses PROT_READ.
In several years of using the patch (sorry for keeping it to myself that long), that has never caused a segfault; if gdb ever does try to modify the mapped region, that will crash gdb rather than corrupt the executable. In my case I typically don't have write access to the symbol tables anyway, but even for executables I own, PROT_READ won't let me change them.

> I understand that it would make your BFD code more complicated, but it
> seems to me you want to map individual sections, not entire files.
> Again, this will still share memory with the block cache, so aside
> from the complexity I don't see the downside.

I don't see the upside of making the code more complicated. The downside of the extra complication is that it makes the patch that much less likely ever to exist. :-)

Could you be more specific about why multiple mmap regions per file are preferable? (It might help to keep in mind that I'm using PROT_READ and MAP_SHARED.) The only downside I can see is the (relatively small) fraction of each symbol table which is not accessed via mmap, but that costs no memory, just virtual address space (if it did use memory, that would contradict the "not accessed" part).

> The last time this was brought up, there was concern about mmap's
> reliability and portability, but if I remember right, people weren't
> specific about exactly where the problems were to be expected.

I have nothing to offer in the way of speculation about what might go wrong with mmap, but I can offer years of experience saying it works fine for me, on Linux systems using TCP NFS mounts over 2.4 kernels. I suspect 2.6 kernels work fine too, but they haven't gotten nearly as much testing from me as 2.4 kernels. It's possible other mmap implementations aren't as good.

> If this is a decent performance win, I think we should consider it,
> and sort out those portability issues as they arise.
One very simple way to evaluate such a patch is to enable it at configure time on any system which claims to provide mmap, but disable it by default, with a variable that turns it on on the fly. That would make it easy to measure the effect on performance, because a single gdb binary could be used either way.

I could roll up a modified patch to that effect, perhaps with the other things I mentioned (changes to the configure script, a ChangeLog entry, documentation, test cases). I would also be interested in hearing the results if anyone else tries the patch I already sent. I'd expect the improvement to be more dramatic on large symbol tables than on small ones. As an example, one of the symbol tables I used for my recent testing is over 300MB, with over 20MB of text segment alone.