Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
* Fear and loathing in lookup_block_symbol
@ 2001-09-06 11:41 Jason Molenda
  2001-09-06 11:44 ` Jason Molenda
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jason Molenda @ 2001-09-06 11:41 UTC (permalink / raw)
  To: gdb-patches

This is a little long and involved, but there is a big problem in one of 
the lowest level symbol lookup functions, so (someone) please bear with 
it all.

Daniel put together some great C++ symbol lookups last summer
	http://sources.redhat.com/ml/gdb-patches/2000-08/msg00199.html

Real problems - I'd just started digging into #3 on his set of patches 
myself.
Elena looks at the patch a little while longer, raises some issues (like 
why the namespace check is no longer done in lookup_block_symbol - good 
catch Elena):
	http://sources.redhat.com/ml/gdb-patches/2000-09/msg00237.html

There is some back and forth discussion.  Elena commits some of the 
changes (all of them?  I haven't read all the archives that thoroughly, 
and it's not relevant)
	http://sources.redhat.com/ml/gdb-cvs/2000-10/msg00014.html

Incidentally, Daniel (tisk tisk :) had assured her that the namespace 
check wasn't necessary before the commit happened.  There is some more 
back-and-forth on these patches -- Daniel wasn't especially thorough in 
updating all callers of lookup_block_symbol, but Peter Schauer jumps in 
and fixes several of these.  Daniel says he'll just yank the whole lot 
of them and submits a patch to revert it all, but I don't think that 
ever happened (again, my goal isn't accurate historical 
representation -- there's a bug to be fixed here. :-)

Side bar - we see gdb/15 pop up and Michael Chastain chase it down
	http://sources.redhat.com/ml/gdb-patches/2001-01/msg00353.html
It's because the namespace check is no longer being done.  Michael is 
mystified
as to how this happened.

OK OK, I'll get to the real bug.  The point is that these patches have a 
long and seedy history. :-)  And at the same time, I want to throw out 
props to Daniel -- despite everything, he is getting at real, BIG, gdb 
inefficiencies, and he's trying to fix them.  A little more follow-up 
and completeness would have been good, that's all.

In the patch that Elena committed (URL above), which makes C++ symbol 
lookup use the binary search method has a mistake.  lookup_block_symbol 
has two search methods.  One is a binary search; it gets "close" to the 
symbol (alphabetically 'below' the correct entry) and then searches over 
the next few symbols.  The other method is a carefully written linear 
search.  In the binary search method, this linear-stepping second half 
is necessary because we might have multiple symbol names in different 
namespaces (e.g. function vs struct namespaces).

Daniel's patch inadvertently changes this second half of the binary 
search method -- instead of stepping over a few symbols, gdb now binary 
searches to the middle of the block symbols, then linearly steps over 
all symbols until it gets to the end of the block.  This is very 
expensive.  Specifically, I'm talking about this patch here:

@@ -1247,19 +1282,8 @@ lookup_block_symbol (register const stru
        while (bot < top)
         {
           sym = BLOCK_SYM (block, bot);
-         inc = SYMBOL_NAME (sym)[0] - name[0];
-         if (inc == 0)
-           {
-             inc = STRCMP (SYMBOL_NAME (sym), name);
-           }
-         if (inc == 0 && SYMBOL_NAMESPACE (sym) == namespace)
-           {
-             return (sym);
-           }
-         if (inc > 0)
-           {
-             break;
-           }
+         if (SYMBOL_MATCHES_NAME (sym, name))
+           return sym;
           bot++;
         }
      }

The old code is straightforward - the binary search has put us at the 
beginning of matching symbols (the 'bot' index). If the symbol names 
match, and the namespaces match, we return the sym.  If we've stepped 
beyond the block symbols that could match our symbol name, we break out 
of the loop.  Daniel's change loses this critical "break out of the 
loop" part, and iterates over all the block symbols from the 
binary-search start point until the end.

We can't copy the old code's style exactly.  SYMBOL_MATCHES_NAME calls 
strcmp_iw() on the demangled names (to ignore whitespace), but strcmp_iw 
is poorly named; it doesn't give you a less-than greater-than return 
value like strcmp().  It'd be better termed streq_iw() or something 
unlike strcmp().  But we can make vast improvements without digging into 
this problem by changing the code to something like:

       while (bot < top)
         {
           sym = BLOCK_SYM (block, bot);
           if (SYMBOL_NAMESPACE (sym) == namespace &&
               SYMBOL_MATCHES_NAME (sym, name))
             {
               return sym;
             }
           if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
             {
               break;
             }
           bot++;
         }

The search is not optimal (we'll probably end up stepping over more 
symbols than is necessary), but it's a huge improvement over the current 
one.

A specific example:  One of our demo programs that we ship with the 
development tools is a little notepad GUI app.  It has these opaque 
structure pointers that I mentioned last week, which gdb will never find 
a definition of.  These opaque structure pointers are often in programs 
as local variables, so stepping with a Locals window open highlights any 
delay here.  With the old iterate-over-half-the-symbols approach, "maint 
time 1" reported 1.6 - 1.7 seconds to look grope through all the symbols 
looking for this structure.  With this break-out clause added, "maint 
time 1" reports 0.0 seconds (it's below the measurement threshold).    
(and before that second call to check_typedef() was removed, doing "info 
locals" would take around 3.5 seconds.)

If someone will agree that this approach is reasonable, I'll check in a 
patch.  I've already run the above patch through the testsuite, it adds 
no new regressions.

Jason

PS-  This patch was brought to you by Tom Tromey's gdb-profile patch.  
Yes, gdb-profile, it does the things a profiling patch ought to do.  I 
fired up his patch on gdb, found this naughty little 
lookup_block_symbol() function in about one minute, and started digging 
in.  Very helpful.  Don't you wish YOU could use Tom Tromey's 
gdb-profile patch?


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fear and loathing in lookup_block_symbol
  2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
@ 2001-09-06 11:44 ` Jason Molenda
  2001-09-07  1:00 ` Jason Molenda
  2001-09-14  7:40 ` Andrew Cagney
  2 siblings, 0 replies; 7+ messages in thread
From: Jason Molenda @ 2001-09-06 11:44 UTC (permalink / raw)
  To: gdb-patches

On Thursday, September 6, 2001, at 11:41 AM, Jason Molenda wrote:
> Side bar - we see gdb/15 pop up and Michael Chastain chase it down
> 	http://sources.redhat.com/ml/gdb-patches/2001-01/msg00353.html
> It's because the namespace check is no longer being done.  Michael is 
> mystified
> as to how this happened.
>

I passed the wrong URL for my side bar.  Doh.
	http://sources.redhat.com/ml/gdb-patches/2001-01/msg00337.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fear and loathing in lookup_block_symbol
  2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
  2001-09-06 11:44 ` Jason Molenda
@ 2001-09-07  1:00 ` Jason Molenda
  2001-09-07  1:02   ` Jason Molenda
  2001-09-14  7:40 ` Andrew Cagney
  2 siblings, 1 reply; 7+ messages in thread
From: Jason Molenda @ 2001-09-07  1:00 UTC (permalink / raw)
  To: gdb-patches

On Thu, Sep 06, 2001 at 11:41:01AM -0700, Jason Molenda wrote:
> This is a little long and involved, but there is a big problem in one of 
> the lowest level symbol lookup functions, so (someone) please bear with 
> it all.


I'll offer a little incentive by explaining how much of a performance
loss this is.  As I mentioned in the note, gdb usually does its
per-block symbol lookup with a binary search.  This bug degrades
that log-time algorithm so that it clocks in around O(1/2 * nsyms)
time.

This adds up quick.  I did some measurements of the impact it has
if you're debugging gdb itself.  If the binary search algorithm is
working correctly, doing "print kjdkjdkjd" in (top-gdb) will
ultimately do 865 strcmp()'s.  With the bug, doing "print kjdkjd"
will do 33,000 string comparisons.  And it's linear, so if you
choose an alphabetically lower symbol, say "aaaa", the number of
string comparisons goes up - to around 82,000.

As an aside, in Daniel's August 2000 mondo-patch, one of the things
he addressed is code like this:

  ALL_SYMTABS (objfile, s)
  {
    bv = BLOCKVECTOR (s);
    block = BLOCKVECTOR_BLOCK (bv, GLOBAL_BLOCK);
    sym = lookup_block_symbol (block, name, namespace);
[...]

The problem here is that there are many symtabs in a object file,
but few global blocks.  In gdb's case, there are around 330 symtabs,
and only 23 global blocks.  The result?  We call lookup_block_symbol
many, many times more than necessary, and often (307 times in this
case) it's just re-searching the same old blocks.  

Daniel's approach was to add a sixteen entry hash table in
lookup_symbol which would try to catch re-searching of a block.
I only looked it over briefly, but I don't think the patch is
suitable as-is.  However, he is trying to address an important
problem in gdb and it behooves us to look at this and clean up his
approach, or come up with another way of addressing the problem.

I don't know much about symtabs.  If a symtab (e.g. "foobaz.h")
has no global symbols, wouldn't it make sense for it to have no
global block associated with it?  Or a zero-entry block?  I don't
really know here, I'm just trying to think of another approach to
the problem - if we could eliminate the many-symtabs-to-few-global-blocks,
that would be another route.


This binary search breakage is the important one to address.  As
bad as it was when debugging gdb, I'd hate to see what happens if
you were debugging a C++ program with all those header files those
folks love to use.  Link in a little KDE or Mozilla code and you
might as well get coffee while your debugger is searching for a
nonexistant symbol. :-)  This _does_ represent a regression over
gdb 5.0, so IMHO it'd be worth addressing in 5.1 (it looks lke
y'all have branched for that release already).


The symtab-vs-global-block thing looks like an area ripe for some
easy speedup, but when you've got the binary search algorithm
working right it removes a lot of the incentive to bother with this.
C++ programmers may have a different opinion on this, though. :-)


Jason


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fear and loathing in lookup_block_symbol
  2001-09-07  1:00 ` Jason Molenda
@ 2001-09-07  1:02   ` Jason Molenda
  2001-09-07  1:04     ` Jason Molenda
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Molenda @ 2001-09-07  1:02 UTC (permalink / raw)
  To: gdb-patches

On Fri, Sep 07, 2001 at 12:59:17AM -0700, Jason Molenda wrote:

> nonexistant symbol. :-)  This _does_ represent a regression over
> gdb 5.0, so IMHO it'd be worth addressing in 5.1 (it looks lke
> y'all have branched for that release already).

I'm such a retard, I just noticed from the tags in the cvs files that
5.1 is a branch based off of the gdb 5.0 branch.  As one would expect.

J


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fear and loathing in lookup_block_symbol
  2001-09-07  1:02   ` Jason Molenda
@ 2001-09-07  1:04     ` Jason Molenda
  2001-09-14  7:24       ` Andrew Cagney
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Molenda @ 2001-09-07  1:04 UTC (permalink / raw)
  To: gdb-patches

On Fri, Sep 07, 2001 at 01:01:45AM -0700, Jason Molenda wrote:
> On Fri, Sep 07, 2001 at 12:59:17AM -0700, Jason Molenda wrote:
> 
> > nonexistant symbol. :-)  This _does_ represent a regression over
> > gdb 5.0, so IMHO it'd be worth addressing in 5.1 (it looks lke
> > y'all have branched for that release already).
> 
> I'm such a retard, I just noticed from the tags in the cvs files that
> 5.1 is a branch based off of the gdb 5.0 branch.  As one would expect.

No, I'm wrong again!  It will be a regression vs 5.0 after all.
(curse you, hyper-fast qmail which I can't shut down before my
messages goes out ;-)

Hey, look whose bedtime it's past.  Bye-bye.

J


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fear and loathing in lookup_block_symbol
  2001-09-07  1:04     ` Jason Molenda
@ 2001-09-14  7:24       ` Andrew Cagney
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Cagney @ 2001-09-14  7:24 UTC (permalink / raw)
  To: Jason Molenda; +Cc: gdb-patches

> No, I'm wrong again!  It will be a regression vs 5.0 after all.
> (curse you, hyper-fast qmail which I can't shut down before my
> messages goes out  [;-)] 

yes, it is this one.  N.M (5.0, 5.1) gets taken from the trunk.  A 5.1.1 
would get taken from the 5.1 branch though.

andrew



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fear and loathing in lookup_block_symbol
  2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
  2001-09-06 11:44 ` Jason Molenda
  2001-09-07  1:00 ` Jason Molenda
@ 2001-09-14  7:40 ` Andrew Cagney
  2 siblings, 0 replies; 7+ messages in thread
From: Andrew Cagney @ 2001-09-14  7:40 UTC (permalink / raw)
  To: Jason Molenda; +Cc: gdb-patches

> We can't copy the old code's style exactly.  SYMBOL_MATCHES_NAME calls strcmp_iw() on the demangled names (to ignore whitespace), but strcmp_iw is poorly named; it doesn't give you a less-than greater-than return value like strcmp().  It'd be better termed streq_iw() or something unlike strcmp().  But we can make vast improvements without digging into this problem by changing the code to something like:

I'd consider a change to fix strcmp_iw() semantics so that it more 
correctlu resembles strcmp(), to be an obvious fix.

as for streq(), we're trying to kill off those STREQ() macro's so this 
could confuse things :-)

andrew



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2001-09-14  7:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
2001-09-06 11:44 ` Jason Molenda
2001-09-07  1:00 ` Jason Molenda
2001-09-07  1:02   ` Jason Molenda
2001-09-07  1:04     ` Jason Molenda
2001-09-14  7:24       ` Andrew Cagney
2001-09-14  7:40 ` Andrew Cagney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox