From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 19063 invoked by alias); 29 Jan 2006 23:36:36 -0000 Received: (qmail 19055 invoked by uid 22791); 29 Jan 2006 23:36:35 -0000 X-Spam-Check-By: sourceware.org Received: from ns.hackrat.net (HELO ns.hackrat.org) (157.22.130.2) by sourceware.org (qpsmtpd/0.31) with ESMTP; Sun, 29 Jan 2006 23:36:31 +0000 Received: by ns.hackrat.org (Postfix, from userid 3807) id 3EFA6690067; Sun, 29 Jan 2006 15:36:30 -0800 (PST) From: Eirik Fuller To: Subject: [PATCH] Use mmap for symbol tables Message-Id: <20060129233630.3EFA6690067@ns.hackrat.org> Date: Sun, 29 Jan 2006 23:36:00 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2006-01/txt/msg00466.txt.bz2 I'm enclosing a patch which modifies gdb to use mmap instead of malloc and seek and read. It's a rough cut, in a number of respects. I have not written a ChangeLog entry or test case or documentation. I assume there are other places in gdb that can use this, but I've only made changes for a few of them (the biggest memory users for the symbol table formats I use most). I have not provided a way to select this at compile time, or test for features it requires at configure time. The patch doesn't even indent the code properly, to make it a little bit easier to see what's different. My immediate goal is to solicit opinions on whether the basic approach is sound. Comments in gdb/dwarf2read.c indicate the idea is not new. I first started using an approach similar to what's in the enclosed patch long before I noticed those comments; those comments finally motivated me to dust this off. This patch takes a very simple approach. It mmaps the entire symbol table file the first time any caller tries to use the new function (I am in no way loyal to the name bfd_fetch, but I think it's preferable to, say, bfd_mmap in that it conveys the notion that a single mmap call can lead to multiple accesses). It munmaps the file contents only when its BFD is deleted. If there's any drawback to the simple approach of mmaping the entire file, it's that it uses more than bare minimum of virtual address space. I haven't seen a symbol table big enough for that to matter. For the symbol tables I use, most of that address space will be used anyway (the symbol table data occupies far more than half of the file size in the symbol tables I use), but it fits in the file system cache rather than in swap space, and concurrent gdb processes on the same symbol table can save considerable amounts of system memory. If there's any one drawback to the general idea of using mmap, it's that in some cases (like symbol tables over NFS) the file data can become unavailable, if an NFS server goes down or the file disappears (unlinked open files don't work reliably over NFS). I suspect those same drawbacks can kick in even without mmap, if gdb needs to flesh out partial symbols long after it first reads the symbol table. It wouldn't be difficult to add code to make this behavior optional even when it's compiled in, if that seems useful. This patch has one interesting benefit; it papers over a memory leak in elfxx-mips.c; the READ macro in _bfd_mips_elf_read_ecoff_info uses bfd_malloc instead of, say, bfd_alloc or the obstack machinery. Choosing another symbol table with the file command, or even just unloading the current one, doesn't seem to release that memory. I can investigate that leak further if that seems useful, but for my own purposes it's easier to use mmap for the big allocations. It's possible there are similar leaks for other symbol table formats, but I haven't fully explored that question. I'm open to suggestions about how to improve this patch, or how to make its adoption more likely. I don't think I've done any copright assignment paperwork for gdb. I discussed it with my manager years ago, but I don't specifically remember anything coming of that. I plan to fix that some day; perhaps this patch will help provide impetus for that. Here's the patch: --- bfd/bfd-in2.h.orig 2006-01-18 13:07:48.000000000 -0800 +++ bfd/bfd-in2.h 2006-01-29 14:30:38.000000000 -0800 @@ -476,6 +476,7 @@ extern file_ptr bfd_tell (bfd *); extern int bfd_flush (bfd *); extern int bfd_stat (bfd *, struct stat *); +extern void *bfd_fetch (file_ptr fp, bfd_size_type size, bfd *abfd); /* Deprecated old routines. */ #if __GNUC__ @@ -4270,6 +4271,9 @@ BFD_SEND (obfd, _bfd_copy_private_symbol_data, \ (ibfd, isymbol, obfd, osymbol)) +#include +#include + /* Extracted from bfd.c. */ struct bfd { @@ -4435,6 +4439,10 @@ struct objalloc *, but we use void * to avoid requiring the inclusion of objalloc.h. */ void *memory; + struct { + void *contents; + size_t length; + } mmap; }; typedef enum bfd_error --- bfd/opncls.c.orig 2005-11-03 08:06:11.000000000 -0800 +++ bfd/opncls.c 2006-01-29 14:39:22.000000000 -0800 @@ -117,6 +117,10 @@ { bfd_hash_table_free (&abfd->section_htab); objalloc_free ((struct objalloc *) abfd->memory); + + if (abfd->mmap.contents) + munmap(abfd->mmap.contents, abfd->mmap.length); + free (abfd); } @@ -301,6 +305,30 @@ return bfd_fopen (filename, target, mode, fd); } +void * +bfd_fetch (file_ptr fp, bfd_size_type size, bfd* abfd) +{ + if (!abfd->mmap.contents) + { + struct stat buf; + int fd = open(abfd->filename, O_RDONLY); + void *map; + size_t length; + fstat(fd, &buf); + length = buf.st_size; + map = mmap(0, length, PROT_READ, MAP_SHARED, fd, 0); + if (map && map != (void *) -1) { + abfd->mmap.length = length; + abfd->mmap.contents = map; + } + close(fd); + } + if (abfd->mmap.contents && abfd->mmap.length >= fp + size) + return abfd->mmap.contents + fp; + else + return NULL; +} + /* FUNCTION bfd_openstreamr --- bfd/ecoff.c.orig 2005-12-26 19:06:27.000000000 -0800 +++ bfd/ecoff.c 2006-01-29 12:44:16.000000000 -0800 @@ -556,18 +556,23 @@ ecoff_data (abfd)->sym_filepos = 0; return TRUE; } + pos = ecoff_data (abfd)->sym_filepos; + pos += backend->debug_swap.external_hdr_size; + + raw = bfd_fetch(pos, raw_size, abfd); + + if (!raw) { raw = bfd_alloc (abfd, raw_size); if (raw == NULL) return FALSE; - pos = ecoff_data (abfd)->sym_filepos; - pos += backend->debug_swap.external_hdr_size; if (bfd_seek (abfd, pos, SEEK_SET) != 0 || bfd_bread (raw, raw_size, abfd) != raw_size) { bfd_release (abfd, raw); return FALSE; } + } ecoff_data (abfd)->raw_syments = raw; --- bfd/elfxx-mips.c.orig 2005-12-30 21:02:22.000000000 -0800 +++ bfd/elfxx-mips.c 2006-01-29 12:46:13.000000000 -0800 @@ -768,12 +768,15 @@ else \ { \ bfd_size_type amt = (bfd_size_type) size * symhdr->count; \ + debug->ptr = bfd_fetch (symhdr->offset, amt, abfd); \ + if (debug->ptr == NULL) { \ debug->ptr = bfd_malloc (amt); \ if (debug->ptr == NULL) \ goto error_return; \ if (bfd_seek (abfd, symhdr->offset, SEEK_SET) != 0 \ || bfd_bread (debug->ptr, amt, abfd) != amt) \ goto error_return; \ + } \ } READ (line, cbLineOffset, cbLine, sizeof (unsigned char), unsigned char *); --- gdb/dbxread.c.orig 2005-12-17 14:33:59.000000000 -0800 +++ gdb/dbxread.c 2006-01-29 12:47:10.000000000 -0800 @@ -708,6 +708,11 @@ error (_("ridiculous string table size (%d bytes)."), DBX_STRINGTAB_SIZE (objfile)); + DBX_STRINGTAB (objfile) = bfd_fetch(STRING_TABLE_OFFSET, + DBX_STRINGTAB_SIZE (objfile), + sym_bfd); + + if (!DBX_STRINGTAB (objfile)) { DBX_STRINGTAB (objfile) = (char *) obstack_alloc (&objfile->objfile_obstack, DBX_STRINGTAB_SIZE (objfile)); @@ -723,6 +728,7 @@ sym_bfd); if (val != DBX_STRINGTAB_SIZE (objfile)) perror_with_name (name); + } } } } --- gdb/dwarf2read.c.orig 2006-01-17 14:30:29.000000000 -0800 +++ gdb/dwarf2read.c 2006-01-29 13:14:11.000000000 -0800 @@ -4953,6 +4953,9 @@ if (size == 0) return NULL; + buf = bfd_fetch(sectp->filepos, size, abfd); + + if (!buf) { buf = obstack_alloc (&objfile->objfile_obstack, size); retbuf = symfile_relocate_debug_section (abfd, sectp, buf); if (retbuf != NULL) @@ -4962,6 +4965,7 @@ || bfd_bread (buf, size, abfd) != size) error (_("Dwarf Error: Can't read DWARF data from '%s'"), bfd_get_filename (abfd)); + } return buf; }