* Fear and loathing in lookup_block_symbol
@ 2001-09-06 11:41 Jason Molenda
2001-09-06 11:44 ` Jason Molenda
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jason Molenda @ 2001-09-06 11:41 UTC (permalink / raw)
To: gdb-patches
This is a little long and involved, but there is a big problem in one of
the lowest level symbol lookup functions, so (someone) please bear with
it all.
Daniel put together some great C++ symbol lookups last summer
http://sources.redhat.com/ml/gdb-patches/2000-08/msg00199.html
Real problems - I'd just started digging into #3 on his set of patches
myself.
Elena looks at the patch a little while longer, raises some issues (like
why the namespace check is no longer done in lookup_block_symbol - good
catch Elena):
http://sources.redhat.com/ml/gdb-patches/2000-09/msg00237.html
There is some back and forth discussion. Elena commits some of the
changes (all of them? I haven't read all the archives that thoroughly,
and it's not relevant)
http://sources.redhat.com/ml/gdb-cvs/2000-10/msg00014.html
Incidentally, Daniel (tisk tisk :) had assured her that the namespace
check wasn't necessary before the commit happened. There is some more
back-and-forth on these patches -- Daniel wasn't especially thorough in
updating all callers of lookup_block_symbol, but Peter Schauer jumps in
and fixes several of these. Daniel says he'll just yank the whole lot
of them and submits a patch to revert it all, but I don't think that
ever happened (again, my goal isn't accurate historical
representation -- there's a bug to be fixed here. :-)
Side bar - we see gdb/15 pop up and Michael Chastain chase it down
http://sources.redhat.com/ml/gdb-patches/2001-01/msg00353.html
It's because the namespace check is no longer being done. Michael is
mystified
as to how this happened.
OK OK, I'll get to the real bug. The point is that these patches have a
long and seedy history. :-) And at the same time, I want to throw out
props to Daniel -- despite everything, he is getting at real, BIG, gdb
inefficiencies, and he's trying to fix them. A little more follow-up
and completeness would have been good, that's all.
In the patch that Elena committed (URL above), which makes C++ symbol
lookup use the binary search method has a mistake. lookup_block_symbol
has two search methods. One is a binary search; it gets "close" to the
symbol (alphabetically 'below' the correct entry) and then searches over
the next few symbols. The other method is a carefully written linear
search. In the binary search method, this linear-stepping second half
is necessary because we might have multiple symbol names in different
namespaces (e.g. function vs struct namespaces).
Daniel's patch inadvertently changes this second half of the binary
search method -- instead of stepping over a few symbols, gdb now binary
searches to the middle of the block symbols, then linearly steps over
all symbols until it gets to the end of the block. This is very
expensive. Specifically, I'm talking about this patch here:
@@ -1247,19 +1282,8 @@ lookup_block_symbol (register const stru
while (bot < top)
{
sym = BLOCK_SYM (block, bot);
- inc = SYMBOL_NAME (sym)[0] - name[0];
- if (inc == 0)
- {
- inc = STRCMP (SYMBOL_NAME (sym), name);
- }
- if (inc == 0 && SYMBOL_NAMESPACE (sym) == namespace)
- {
- return (sym);
- }
- if (inc > 0)
- {
- break;
- }
+ if (SYMBOL_MATCHES_NAME (sym, name))
+ return sym;
bot++;
}
}
The old code is straightforward - the binary search has put us at the
beginning of matching symbols (the 'bot' index). If the symbol names
match, and the namespaces match, we return the sym. If we've stepped
beyond the block symbols that could match our symbol name, we break out
of the loop. Daniel's change loses this critical "break out of the
loop" part, and iterates over all the block symbols from the
binary-search start point until the end.
We can't copy the old code's style exactly. SYMBOL_MATCHES_NAME calls
strcmp_iw() on the demangled names (to ignore whitespace), but strcmp_iw
is poorly named; it doesn't give you a less-than greater-than return
value like strcmp(). It'd be better termed streq_iw() or something
unlike strcmp(). But we can make vast improvements without digging into
this problem by changing the code to something like:
while (bot < top)
{
sym = BLOCK_SYM (block, bot);
if (SYMBOL_NAMESPACE (sym) == namespace &&
SYMBOL_MATCHES_NAME (sym, name))
{
return sym;
}
if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
{
break;
}
bot++;
}
The search is not optimal (we'll probably end up stepping over more
symbols than is necessary), but it's a huge improvement over the current
one.
A specific example: One of our demo programs that we ship with the
development tools is a little notepad GUI app. It has these opaque
structure pointers that I mentioned last week, which gdb will never find
a definition of. These opaque structure pointers are often in programs
as local variables, so stepping with a Locals window open highlights any
delay here. With the old iterate-over-half-the-symbols approach, "maint
time 1" reported 1.6 - 1.7 seconds to look grope through all the symbols
looking for this structure. With this break-out clause added, "maint
time 1" reports 0.0 seconds (it's below the measurement threshold).
(and before that second call to check_typedef() was removed, doing "info
locals" would take around 3.5 seconds.)
If someone will agree that this approach is reasonable, I'll check in a
patch. I've already run the above patch through the testsuite, it adds
no new regressions.
Jason
PS- This patch was brought to you by Tom Tromey's gdb-profile patch.
Yes, gdb-profile, it does the things a profiling patch ought to do. I
fired up his patch on gdb, found this naughty little
lookup_block_symbol() function in about one minute, and started digging
in. Very helpful. Don't you wish YOU could use Tom Tromey's
gdb-profile patch?
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fear and loathing in lookup_block_symbol
2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
@ 2001-09-06 11:44 ` Jason Molenda
2001-09-07 1:00 ` Jason Molenda
2001-09-14 7:40 ` Andrew Cagney
2 siblings, 0 replies; 7+ messages in thread
From: Jason Molenda @ 2001-09-06 11:44 UTC (permalink / raw)
To: gdb-patches
On Thursday, September 6, 2001, at 11:41 AM, Jason Molenda wrote:
> Side bar - we see gdb/15 pop up and Michael Chastain chase it down
> http://sources.redhat.com/ml/gdb-patches/2001-01/msg00353.html
> It's because the namespace check is no longer being done. Michael is
> mystified
> as to how this happened.
>
I passed the wrong URL for my side bar. Doh.
http://sources.redhat.com/ml/gdb-patches/2001-01/msg00337.html
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fear and loathing in lookup_block_symbol
2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
2001-09-06 11:44 ` Jason Molenda
@ 2001-09-07 1:00 ` Jason Molenda
2001-09-07 1:02 ` Jason Molenda
2001-09-14 7:40 ` Andrew Cagney
2 siblings, 1 reply; 7+ messages in thread
From: Jason Molenda @ 2001-09-07 1:00 UTC (permalink / raw)
To: gdb-patches
On Thu, Sep 06, 2001 at 11:41:01AM -0700, Jason Molenda wrote:
> This is a little long and involved, but there is a big problem in one of
> the lowest level symbol lookup functions, so (someone) please bear with
> it all.
I'll offer a little incentive by explaining how much of a performance
loss this is. As I mentioned in the note, gdb usually does its
per-block symbol lookup with a binary search. This bug degrades
that log-time algorithm so that it clocks in around O(1/2 * nsyms)
time.
This adds up quick. I did some measurements of the impact it has
if you're debugging gdb itself. If the binary search algorithm is
working correctly, doing "print kjdkjdkjd" in (top-gdb) will
ultimately do 865 strcmp()'s. With the bug, doing "print kjdkjd"
will do 33,000 string comparisons. And it's linear, so if you
choose an alphabetically lower symbol, say "aaaa", the number of
string comparisons goes up - to around 82,000.
As an aside, in Daniel's August 2000 mondo-patch, one of the things
he addressed is code like this:
ALL_SYMTABS (objfile, s)
{
bv = BLOCKVECTOR (s);
block = BLOCKVECTOR_BLOCK (bv, GLOBAL_BLOCK);
sym = lookup_block_symbol (block, name, namespace);
[...]
The problem here is that there are many symtabs in a object file,
but few global blocks. In gdb's case, there are around 330 symtabs,
and only 23 global blocks. The result? We call lookup_block_symbol
many, many times more than necessary, and often (307 times in this
case) it's just re-searching the same old blocks.
Daniel's approach was to add a sixteen entry hash table in
lookup_symbol which would try to catch re-searching of a block.
I only looked it over briefly, but I don't think the patch is
suitable as-is. However, he is trying to address an important
problem in gdb and it behooves us to look at this and clean up his
approach, or come up with another way of addressing the problem.
I don't know much about symtabs. If a symtab (e.g. "foobaz.h")
has no global symbols, wouldn't it make sense for it to have no
global block associated with it? Or a zero-entry block? I don't
really know here, I'm just trying to think of another approach to
the problem - if we could eliminate the many-symtabs-to-few-global-blocks,
that would be another route.
This binary search breakage is the important one to address. As
bad as it was when debugging gdb, I'd hate to see what happens if
you were debugging a C++ program with all those header files those
folks love to use. Link in a little KDE or Mozilla code and you
might as well get coffee while your debugger is searching for a
nonexistant symbol. :-) This _does_ represent a regression over
gdb 5.0, so IMHO it'd be worth addressing in 5.1 (it looks lke
y'all have branched for that release already).
The symtab-vs-global-block thing looks like an area ripe for some
easy speedup, but when you've got the binary search algorithm
working right it removes a lot of the incentive to bother with this.
C++ programmers may have a different opinion on this, though. :-)
Jason
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fear and loathing in lookup_block_symbol
2001-09-07 1:00 ` Jason Molenda
@ 2001-09-07 1:02 ` Jason Molenda
2001-09-07 1:04 ` Jason Molenda
0 siblings, 1 reply; 7+ messages in thread
From: Jason Molenda @ 2001-09-07 1:02 UTC (permalink / raw)
To: gdb-patches
On Fri, Sep 07, 2001 at 12:59:17AM -0700, Jason Molenda wrote:
> nonexistant symbol. :-) This _does_ represent a regression over
> gdb 5.0, so IMHO it'd be worth addressing in 5.1 (it looks lke
> y'all have branched for that release already).
I'm such a retard, I just noticed from the tags in the cvs files that
5.1 is a branch based off of the gdb 5.0 branch. As one would expect.
J
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fear and loathing in lookup_block_symbol
2001-09-07 1:02 ` Jason Molenda
@ 2001-09-07 1:04 ` Jason Molenda
2001-09-14 7:24 ` Andrew Cagney
0 siblings, 1 reply; 7+ messages in thread
From: Jason Molenda @ 2001-09-07 1:04 UTC (permalink / raw)
To: gdb-patches
On Fri, Sep 07, 2001 at 01:01:45AM -0700, Jason Molenda wrote:
> On Fri, Sep 07, 2001 at 12:59:17AM -0700, Jason Molenda wrote:
>
> > nonexistant symbol. :-) This _does_ represent a regression over
> > gdb 5.0, so IMHO it'd be worth addressing in 5.1 (it looks lke
> > y'all have branched for that release already).
>
> I'm such a retard, I just noticed from the tags in the cvs files that
> 5.1 is a branch based off of the gdb 5.0 branch. As one would expect.
No, I'm wrong again! It will be a regression vs 5.0 after all.
(curse you, hyper-fast qmail which I can't shut down before my
messages goes out ;-)
Hey, look whose bedtime it's past. Bye-bye.
J
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fear and loathing in lookup_block_symbol
2001-09-07 1:04 ` Jason Molenda
@ 2001-09-14 7:24 ` Andrew Cagney
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Cagney @ 2001-09-14 7:24 UTC (permalink / raw)
To: Jason Molenda; +Cc: gdb-patches
> No, I'm wrong again! It will be a regression vs 5.0 after all.
> (curse you, hyper-fast qmail which I can't shut down before my
> messages goes out [;-)]
yes, it is this one. N.M (5.0, 5.1) gets taken from the trunk. A 5.1.1
would get taken from the 5.1 branch though.
andrew
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Fear and loathing in lookup_block_symbol
2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
2001-09-06 11:44 ` Jason Molenda
2001-09-07 1:00 ` Jason Molenda
@ 2001-09-14 7:40 ` Andrew Cagney
2 siblings, 0 replies; 7+ messages in thread
From: Andrew Cagney @ 2001-09-14 7:40 UTC (permalink / raw)
To: Jason Molenda; +Cc: gdb-patches
> We can't copy the old code's style exactly. SYMBOL_MATCHES_NAME calls strcmp_iw() on the demangled names (to ignore whitespace), but strcmp_iw is poorly named; it doesn't give you a less-than greater-than return value like strcmp(). It'd be better termed streq_iw() or something unlike strcmp(). But we can make vast improvements without digging into this problem by changing the code to something like:
I'd consider a change to fix strcmp_iw() semantics so that it more
correctlu resembles strcmp(), to be an obvious fix.
as for streq(), we're trying to kill off those STREQ() macro's so this
could confuse things :-)
andrew
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2001-09-14 7:40 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-09-06 11:41 Fear and loathing in lookup_block_symbol Jason Molenda
2001-09-06 11:44 ` Jason Molenda
2001-09-07 1:00 ` Jason Molenda
2001-09-07 1:02 ` Jason Molenda
2001-09-07 1:04 ` Jason Molenda
2001-09-14 7:24 ` Andrew Cagney
2001-09-14 7:40 ` Andrew Cagney
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox