Re: (patch) hpjyg09: bcache optimizations

From: Jim Blandy <jimb@cygnus.com>
To: Jimmy Guo <guo@cup.hp.com>
Cc: gdb-patches@sourceware.cygnus.com
Subject: Re: (patch) hpjyg09: bcache optimizations
Date: Fri, 05 Nov 1999 09:59:00 -0000
Message-ID: <npaeoskevw.fsf@zwingli.cygnus.com>
In-Reply-To: <Pine.LNX.4.10.9911041332020.15719-100000@hpcll168.cup.hp.com>

The first red flag here is that this patch adds a new target macro,
BCACHE_ALLOW_DUPLICATES (in config/pa/tm-hppa.h), which is actually
dependent on the application being debugged, not the architecture.
Whether the bcache helps you depends on the contents of your debug
info, not whether it's running on a PA microprocessor.  Hmm.

I'd like to look at this problem on a larger scale:

I've read Srikanth's comment in the patch, explaining why the bcache
imposes too much overhead for some code while it is still quite
effective for other code.

But this situation still sounds weird.  The bcache is a very simple
thing --- bcache.h and bcache.c total 289 lines of code.  You give it
a string of bytes, and it either adds a copy of your string to its
hash table, or finds an identical copy already present.  Either way,
it hands you back a pointer to its stashed copy.  So, a bcache is
helpful if you expect a lot of duplicates.  In Linux's libc.so, the
bcache reduces the space required for the psymtabs by half.
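
To make the interface described above concrete, here is a minimal sketch
in C. It is an illustration only, not the code in bcache.c: the names
toy_bcache, toy_bcache_add and toy_hash, the bucket count, and the hash
function are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024            /* illustrative size, not GDB's */

struct entry {
    struct entry *next;          /* next entry on the same hash chain */
    size_t len;                  /* number of bytes stored in data[] */
    unsigned char data[];        /* the stashed copy (C99 flexible array) */
};

struct toy_bcache {
    struct entry *bucket[NBUCKETS];
    size_t unique;               /* distinct byte strings stored */
    size_t lookups;              /* total calls to toy_bcache_add */
};

/* Placeholder multiplicative hash; any reasonable hash would do here. */
static unsigned long toy_hash(const unsigned char *p, size_t len)
{
    unsigned long h = 5381;
    while (len--)
        h = h * 33 + *p++;
    return h;
}

/* Return a pointer to the cache's copy of bytes[0..len), adding a copy
   if no identical string is present.  Returns NULL if malloc fails.
   A zero-initialized struct toy_bcache is an empty cache.  */
const void *toy_bcache_add(struct toy_bcache *bc, const void *bytes, size_t len)
{
    assert(bc != NULL && bytes != NULL);
    struct entry **head = &bc->bucket[toy_hash(bytes, len) % NBUCKETS];
    bc->lookups++;

    /* An identical copy already present?  Hand back that one.  */
    for (struct entry *e = *head; e != NULL; e = e->next)
        if (e->len == len && memcmp(e->data, bytes, len) == 0)
            return e->data;

    /* Otherwise stash a new copy at the front of the chain.  */
    struct entry *e = malloc(sizeof *e + len);
    if (e == NULL)
        return NULL;
    e->len = len;
    memcpy(e->data, bytes, len);
    e->next = *head;
    *head = e;
    bc->unique++;
    return e->data;
}
```

With this sketch, the gap between lookups and unique is the number of
duplicates the cache absorbed, which is the quantity that decides
whether a bcache pays for itself.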

What I can't understand is why such a simple data structure needs to
have such a terrible overhead.  Srikanth writes:

    See that the overhead (which is really every byte that is not used to
    store GDB's data i.e., cells used for housekeeping info like pointers,
    hash chain heads etc.,) is of the order of O(m * n * 64k) where m is
    the number of load modules compiled with -g, and n is the number of
    strings that have unique length. This spells doom for applications with
    a large number of shared libraries (m increases) and C++ (n increases
    since we also stick demangled names into the cache.)  When debugging
    HP's C compiler more than 140 MB or about 48% of memory is due to the
    bcache overhead. Not the data just the overhead !

I'm sure that's true, but it seems completely unnecessary.  One should
be able to design a bcache with much less overhead.

Looking at the output of "maint print statistics" from running GDB on
itself, I see:

Average hash table population: 8%
Average hash table population: (not applicable)
Average hash table population: (not applicable)
Average hash table population: 0%
Average hash table population: 1%
Average hash table population: 3%
Average hash table population: 1%

(The "not applicable" items are for bcaches with no entries.)

I don't think those lower-level hash tables are being used very
effectively, if the hash function is distributing the items across ~3%
of the buckets.  A good hash function distributes items across as many
different buckets as possible.  I would guess that the bcache has
never been tuned.
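
The population figure can be reproduced outside GDB with a few lines of
C. The following is a rough sketch under assumptions: measure_distribution,
struct dist_stats, and both sample hash functions are hypothetical names
invented here, and hash_by_length is a deliberately poor hash chosen to
show clustering, not a claim about what bcache.c does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dist_stats {
    size_t used_buckets;     /* buckets holding at least one key */
    size_t max_chain;        /* most keys landing in any one bucket */
    unsigned percent_used;   /* used_buckets as a percentage, rounded down */
};

typedef unsigned long (*hash_fn)(const char *s, size_t len);

/* Deliberately poor: every key of the same length collides.  */
unsigned long hash_by_length(const char *s, size_t len)
{
    (void) s;
    return (unsigned long) len;
}

/* A simple multiplicative hash that looks at every byte.  */
unsigned long hash_mul33(const char *s, size_t len)
{
    unsigned long h = 5381;
    while (len--)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Hash each NUL-terminated key into one of nbuckets counters and report
   how the keys spread out.  Returns 0 on success, -1 if calloc fails.  */
int measure_distribution(hash_fn h, const char *const *keys, size_t nkeys,
                         size_t nbuckets, struct dist_stats *out)
{
    assert(h != NULL && keys != NULL && out != NULL && nbuckets > 0);
    size_t *count = calloc(nbuckets, sizeof *count);
    if (count == NULL)
        return -1;
    memset(out, 0, sizeof *out);

    for (size_t i = 0; i < nkeys; i++) {
        size_t b = h(keys[i], strlen(keys[i])) % nbuckets;
        if (count[b]++ == 0)
            out->used_buckets++;       /* first key in this bucket */
        if (count[b] > out->max_chain)
            out->max_chain = count[b];
    }
    out->percent_used = (unsigned) (out->used_buckets * 100 / nbuckets);
    free(count);
    return 0;
}
```

Feeding the same set of keys to two candidate hash functions and
comparing used_buckets and max_chain gives a quick read on which one
spreads the items better.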

So, while I understand the impulse to just disable the bcache, because
it's causing you trouble in your driving uses of the debugger, I think
the better solution, for both Cygnus and HP, is to fix the data
structure so it actually works.  I would recommend:

- using a single-level hash table, with an initial size of about 32k
  buckets (based on looking at GDB debugging GDB)
- growing the hash table when the average chain length grows beyond 
  a certain limit, so the time overhead remains the same as the
  problem size grows
- choosing a hash function by running GDB on your C compiler, doing
  "maint print statistics" to see how it's distributing items across
  buckets, and then tweaking the function to minimize your maximum
  chain length (this is kind of fun, and takes less time than you'd
  think, since hash functions have almost no *required* semantics)
- adding code to print_bcache_statistics to print the number of
  bytes spent on bcache overhead
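
As a rough sketch of how the first, second, and fourth recommendations
might fit together, assuming nothing about the existing bcache.c: every
name below (grow_bcache, gb_init, gb_add, gb_overhead_bytes) and the
MAX_AVG_CHAIN threshold are hypothetical, the hash function is a
placeholder to be tuned, and the overhead figure ignores whatever malloc
itself adds per block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_AVG_CHAIN 4   /* hypothetical limit on average chain length */

struct node {
    struct node *next;
    unsigned long hash;   /* cached so growing never rehashes the bytes */
    size_t len;
    unsigned char data[];
};

struct grow_bcache {
    struct node **bucket;
    size_t nbuckets, nentries;
    size_t data_bytes;    /* bytes of caller data actually stored */
};

static unsigned long gb_hash(const unsigned char *p, size_t len)
{
    unsigned long h = 5381;   /* placeholder hash; tune against real data */
    while (len--)
        h = h * 33 + *p++;
    return h;
}

int gb_init(struct grow_bcache *bc, size_t nbuckets)
{
    assert(bc != NULL && nbuckets > 0);
    memset(bc, 0, sizeof *bc);
    bc->bucket = calloc(nbuckets, sizeof *bc->bucket);
    if (bc->bucket == NULL)
        return -1;
    bc->nbuckets = nbuckets;
    return 0;
}

/* Double the bucket array and relink the existing nodes.  Nodes are not
   moved, so pointers already handed out stay valid.  */
static void gb_grow(struct grow_bcache *bc)
{
    size_t n = bc->nbuckets * 2;
    struct node **nb = calloc(n, sizeof *nb);
    if (nb == NULL)
        return;               /* keep the old table; chains just get longer */
    for (size_t i = 0; i < bc->nbuckets; i++)
        for (struct node *e = bc->bucket[i], *next; e != NULL; e = next) {
            next = e->next;
            e->next = nb[e->hash % n];
            nb[e->hash % n] = e;
        }
    free(bc->bucket);
    bc->bucket = nb;
    bc->nbuckets = n;
}

const void *gb_add(struct grow_bcache *bc, const void *bytes, size_t len)
{
    assert(bc != NULL && bc->bucket != NULL && bytes != NULL);
    unsigned long h = gb_hash(bytes, len);
    for (struct node *e = bc->bucket[h % bc->nbuckets]; e; e = e->next)
        if (e->hash == h && e->len == len && memcmp(e->data, bytes, len) == 0)
            return e->data;

    if (bc->nentries >= bc->nbuckets * MAX_AVG_CHAIN)
        gb_grow(bc);          /* keep the average chain length bounded */

    struct node *e = malloc(sizeof *e + len);
    if (e == NULL)
        return NULL;
    e->hash = h;
    e->len = len;
    memcpy(e->data, bytes, len);
    e->next = bc->bucket[h % bc->nbuckets];
    bc->bucket[h % bc->nbuckets] = e;
    bc->nentries++;
    bc->data_bytes += len;
    return e->data;
}

/* Housekeeping bytes: the bucket array plus one node header per entry.  */
size_t gb_overhead_bytes(const struct grow_bcache *bc)
{
    return bc->nbuckets * sizeof *bc->bucket
           + bc->nentries * sizeof (struct node);
}
```

In a layout like this the overhead is one bucket array plus one fixed
header per stored string, so it scales with the number of entries, and
comparing gb_overhead_bytes against data_bytes gives the kind of number
the statistics output ought to report.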

Yes, this is more work than commenting it out.  But we can't
perpetually choose the quick fix over the real cure.  Especially when
the quick fix adds complexity, like more CPP conditionals and target
parameters that aren't target parameters.

In this light, I don't think we should apply this patch.

From guo@cup.hp.com Fri Nov 05 10:24:00 1999
From: Jimmy Guo <guo@cup.hp.com>
To: Jim Blandy <jimb@cygnus.com>
Cc: gdb-patches@sourceware.cygnus.com
Subject: Re: (patch) hpjyg09: bcache optimizations
Date: Fri, 05 Nov 1999 10:24:00 -0000
Message-id: <Pine.LNX.4.10.9911051016470.11755-100000@hpcll168.cup.hp.com>
References: <npaeoskevw.fsf@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00189.html
Content-length: 462

Given the questions and concerns over this patch, I will 'hash' it a bit
more in the direction as suggested here ... just ignore this patch.

- Jimmy Guo

>Yes, this is more work than commenting it out.  But we can't
>perpetually choose the quick fix over the real cure.  Especially when
>the quick fix adds complexity, like more CPP conditionals and target
>parameters that aren't target parameters.
>
>In this light, I don't think we should apply this patch.

