[PATCH] Use mmap for symbol tables

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Eirik Fuller <eirik@hackrat.com>
To: <gdb-patches@sourceware.org>
Subject: [PATCH] Use mmap for symbol tables
Date: Sun, 29 Jan 2006 23:36:00 -0000	[thread overview]
Message-ID: <20060129233630.3EFA6690067@ns.hackrat.org> (raw)

I'm enclosing a patch which modifies gdb to use mmap instead of malloc
and seek and read.  It's a rough cut, in a number of respects.  I have
not written a ChangeLog entry or test case or documentation.  I assume
there are other places in gdb that can use this, but I've only made
changes for a few of them (the biggest memory users for the symbol
table formats I use most).  I have not provided a way to select this
at compile time, or test for features it requires at configure time.
The patch doesn't even indent the code properly, to make it a little
bit easier to see what's different.

My immediate goal is to solicit opinions on whether the basic approach
is sound.  Comments in gdb/dwarf2read.c indicate the idea is not new.
I first started using an approach similar to what's in the enclosed
patch long before I noticed those comments; those comments finally
motivated me to dust this off.

This patch takes a very simple approach.  It mmaps the entire symbol
table file the first time any caller tries to use the new function (I
am in no way loyal to the name bfd_fetch, but I think it's preferable
to, say, bfd_mmap in that it conveys the notion that a single mmap
call can lead to multiple accesses).  It munmaps the file contents
only when its BFD is deleted.

If there's any drawback to the simple approach of mmaping the entire
file, it's that it uses more than bare minimum of virtual address
space.  I haven't seen a symbol table big enough for that to matter.
For the symbol tables I use, most of that address space will be used
anyway (the symbol table data occupies far more than half of the file
size in the symbol tables I use), but it fits in the file system cache
rather than in swap space, and concurrent gdb processes on the same
symbol table can save considerable amounts of system memory.

If there's any one drawback to the general idea of using mmap, it's
that in some cases (like symbol tables over NFS) the file data can
become unavailable, if an NFS server goes down or the file disappears
(unlinked open files don't work reliably over NFS).  I suspect those
same drawbacks can kick in even without mmap, if gdb needs to flesh
out partial symbols long after it first reads the symbol table.  It
wouldn't be difficult to add code to make this behavior optional even
when it's compiled in, if that seems useful.

This patch has one interesting benefit; it papers over a memory leak
in elfxx-mips.c; the READ macro in _bfd_mips_elf_read_ecoff_info uses
bfd_malloc instead of, say, bfd_alloc or the obstack machinery.
Choosing another symbol table with the file command, or even just
unloading the current one, doesn't seem to release that memory.  I can
investigate that leak further if that seems useful, but for my own
purposes it's easier to use mmap for the big allocations.  It's
possible there are similar leaks for other symbol table formats, but I
haven't fully explored that question.

I'm open to suggestions about how to improve this patch, or how to
make its adoption more likely.  I don't think I've done any copright
assignment paperwork for gdb.  I discussed it with my manager years
ago, but I don't specifically remember anything coming of that.  I
plan to fix that some day; perhaps this patch will help provide
impetus for that.

Here's the patch:

--- bfd/bfd-in2.h.orig	2006-01-18 13:07:48.000000000 -0800
+++ bfd/bfd-in2.h	2006-01-29 14:30:38.000000000 -0800
@@ -476,6 +476,7 @@
 extern file_ptr bfd_tell (bfd *);
 extern int bfd_flush (bfd *);
 extern int bfd_stat (bfd *, struct stat *);
+extern void *bfd_fetch (file_ptr fp, bfd_size_type size, bfd *abfd);

 /* Deprecated old routines.  */
 #if __GNUC__
@@ -4270,6 +4271,9 @@
   BFD_SEND (obfd, _bfd_copy_private_symbol_data, \
             (ibfd, isymbol, obfd, osymbol))

+#include <unistd.h>
+#include <sys/mman.h>
+
 /* Extracted from bfd.c.  */
 struct bfd
 {
@@ -4435,6 +4439,10 @@
      struct objalloc *, but we use void * to avoid requiring the inclusion
      of objalloc.h.  */
   void *memory;
+  struct {
+    void *contents;
+    size_t length;
+  } mmap;
 };

 typedef enum bfd_error
--- bfd/opncls.c.orig	2005-11-03 08:06:11.000000000 -0800
+++ bfd/opncls.c	2006-01-29 14:39:22.000000000 -0800
@@ -117,6 +117,10 @@
 {
   bfd_hash_table_free (&abfd->section_htab);
   objalloc_free ((struct objalloc *) abfd->memory);
+
+  if (abfd->mmap.contents)
+    munmap(abfd->mmap.contents, abfd->mmap.length);
+
   free (abfd);
 }

@@ -301,6 +305,30 @@
   return bfd_fopen (filename, target, mode, fd);
 }

+void *
+bfd_fetch (file_ptr fp, bfd_size_type size, bfd* abfd)
+{
+  if (!abfd->mmap.contents)
+    {
+      struct stat buf;
+      int fd = open(abfd->filename, O_RDONLY);
+      void *map;
+      size_t length;
+      fstat(fd, &buf);
+      length = buf.st_size;
+      map = mmap(0, length, PROT_READ, MAP_SHARED, fd, 0);
+      if (map && map != (void *) -1) {
+	abfd->mmap.length = length;
+	abfd->mmap.contents = map;
+      }
+      close(fd);
+    }
+  if (abfd->mmap.contents && abfd->mmap.length >= fp + size)
+    return abfd->mmap.contents + fp;
+  else
+    return NULL;
+}
+
 /*
 FUNCTION
 	bfd_openstreamr
--- bfd/ecoff.c.orig	2005-12-26 19:06:27.000000000 -0800
+++ bfd/ecoff.c	2006-01-29 12:44:16.000000000 -0800
@@ -556,18 +556,23 @@
       ecoff_data (abfd)->sym_filepos = 0;
       return TRUE;
     }
+  pos = ecoff_data (abfd)->sym_filepos;
+  pos += backend->debug_swap.external_hdr_size;
+
+  raw = bfd_fetch(pos, raw_size, abfd);
+
+  if (!raw) {
   raw = bfd_alloc (abfd, raw_size);
   if (raw == NULL)
     return FALSE;

-  pos = ecoff_data (abfd)->sym_filepos;
-  pos += backend->debug_swap.external_hdr_size;
   if (bfd_seek (abfd, pos, SEEK_SET) != 0
       || bfd_bread (raw, raw_size, abfd) != raw_size)
     {
       bfd_release (abfd, raw);
       return FALSE;
     }
+  }

   ecoff_data (abfd)->raw_syments = raw;

--- bfd/elfxx-mips.c.orig	2005-12-30 21:02:22.000000000 -0800
+++ bfd/elfxx-mips.c	2006-01-29 12:46:13.000000000 -0800
@@ -768,12 +768,15 @@
   else									\
     {									\
       bfd_size_type amt = (bfd_size_type) size * symhdr->count;		\
+      debug->ptr = bfd_fetch (symhdr->offset, amt, abfd);		\
+      if (debug->ptr == NULL) {						\
       debug->ptr = bfd_malloc (amt);					\
       if (debug->ptr == NULL)						\
 	goto error_return;						\
       if (bfd_seek (abfd, symhdr->offset, SEEK_SET) != 0		\
 	  || bfd_bread (debug->ptr, amt, abfd) != amt)			\
 	goto error_return;						\
+      }									\
     }

   READ (line, cbLineOffset, cbLine, sizeof (unsigned char), unsigned char *);
--- gdb/dbxread.c.orig	2005-12-17 14:33:59.000000000 -0800
+++ gdb/dbxread.c	2006-01-29 12:47:10.000000000 -0800
@@ -708,6 +708,11 @@
 	    error (_("ridiculous string table size (%d bytes)."),
 		   DBX_STRINGTAB_SIZE (objfile));

+	  DBX_STRINGTAB (objfile) = bfd_fetch(STRING_TABLE_OFFSET,
+					      DBX_STRINGTAB_SIZE (objfile),
+					      sym_bfd);
+
+	  if (!DBX_STRINGTAB (objfile)) {
 	  DBX_STRINGTAB (objfile) =
 	    (char *) obstack_alloc (&objfile->objfile_obstack,
 				    DBX_STRINGTAB_SIZE (objfile));
@@ -723,6 +728,7 @@
 			   sym_bfd);
 	  if (val != DBX_STRINGTAB_SIZE (objfile))
 	    perror_with_name (name);
+	  }
 	}
     }
 }
--- gdb/dwarf2read.c.orig	2006-01-17 14:30:29.000000000 -0800
+++ gdb/dwarf2read.c	2006-01-29 13:14:11.000000000 -0800
@@ -4953,6 +4953,9 @@
   if (size == 0)
     return NULL;

+  buf = bfd_fetch(sectp->filepos, size, abfd);
+
+  if (!buf) {
   buf = obstack_alloc (&objfile->objfile_obstack, size);
   retbuf = symfile_relocate_debug_section (abfd, sectp, buf);
   if (retbuf != NULL)
@@ -4962,6 +4965,7 @@
       || bfd_bread (buf, size, abfd) != size)
     error (_("Dwarf Error: Can't read DWARF data from '%s'"),
 	   bfd_get_filename (abfd));
+  }

   return buf;
 }

next             reply	other threads:[~2006-01-29 23:36 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-01-29 23:36 Eirik Fuller [this message]
2006-01-30  5:04 ` Jim Blandy
2006-01-30 11:44   ` Eirik Fuller
2006-01-30 18:07     ` Jim Blandy
2006-01-30 18:59       ` Eirik Fuller
2006-01-30 22:11         ` Jim Blandy
2006-01-31  0:38           ` Eirik Fuller
2006-01-31  1:49             ` Jim Blandy
2006-01-31  3:12               ` Eirik Fuller
2006-01-31 21:48             ` Mark Kettenis
2006-02-01 17:52               ` Eirik Fuller
2006-02-01  6:04       ` Michael Snyder
2006-01-30 11:34 ` Andrew STUBBS
2006-01-30 11:42   ` Corinna Vinschen
2006-01-30 11:48     ` Andrew STUBBS
2006-01-31  2:23 ` Daniel Jacobowitz
2006-01-31  3:31   ` Eirik Fuller
2006-01-31  3:38     ` Daniel Jacobowitz
2006-02-07 22:05     ` Eirik Fuller
2006-02-20 15:52       ` Daniel Jacobowitz
2006-01-31  5:28   ` Jim Blandy
2006-01-31 13:59     ` Daniel Jacobowitz
2006-01-31  4:40 David Anderson
2006-01-31  5:00 ` Eirik Fuller
2006-01-31  5:34   ` Jim Blandy
2006-01-31 14:00     ` Daniel Jacobowitz
2006-01-31 18:39       ` Jim Blandy
2006-02-01 18:11         ` Eirik Fuller
2006-01-31 17:45 ` David Anderson
2006-01-31 18:24 ` Jim Blandy
2006-01-31  4:53 David Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060129233630.3EFA6690067@ns.hackrat.org \
    --to=eirik@hackrat.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox