From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jason Molenda To: gdb-patches@sources.redhat.com Subject: Fear and loathing in lookup_block_symbol Date: Thu, 06 Sep 2001 11:41:00 -0000 Message-id: X-SW-Source: 2001-09/msg00076.html This is a little long and involved, but there is a big problem in one of the lowest level symbol lookup functions, so (someone) please bear with it all. Daniel put together some great C++ symbol lookups last summer http://sources.redhat.com/ml/gdb-patches/2000-08/msg00199.html Real problems - I'd just started digging into #3 on his set of patches myself. Elena looks at the patch a little while longer, raises some issues (like why the namespace check is no longer done in lookup_block_symbol - good catch Elena): http://sources.redhat.com/ml/gdb-patches/2000-09/msg00237.html There is some back and forth discussion. Elena commits some of the changes (all of them? I haven't read all the archives that thoroughly, and it's not relevant) http://sources.redhat.com/ml/gdb-cvs/2000-10/msg00014.html Incidentally, Daniel (tisk tisk :) had assured her that the namespace check wasn't necessary before the commit happened. There is some more back-and-forth on these patches -- Daniel wasn't especially thorough in updating all callers of lookup_block_symbol, but Peter Schauer jumps in and fixes several of these. Daniel says he'll just yank the whole lot of them and submits a patch to revert it all, but I don't think that ever happened (again, my goal isn't accurate historical representation -- there's a bug to be fixed here. :-) Side bar - we see gdb/15 pop up and Michael Chastain chase it down http://sources.redhat.com/ml/gdb-patches/2001-01/msg00353.html It's because the namespace check is no longer being done. Michael is mystified as to how this happened. OK OK, I'll get to the real bug. The point is that these patches have a long and seedy history. :-) And at the same time, I want to throw out props to Daniel -- despite everything, he is getting at real, BIG, gdb inefficiencies, and he's trying to fix them. A little more follow-up and completeness would have been good, that's all. In the patch that Elena committed (URL above), which makes C++ symbol lookup use the binary search method has a mistake. lookup_block_symbol has two search methods. One is a binary search; it gets "close" to the symbol (alphabetically 'below' the correct entry) and then searches over the next few symbols. The other method is a carefully written linear search. In the binary search method, this linear-stepping second half is necessary because we might have multiple symbol names in different namespaces (e.g. function vs struct namespaces). Daniel's patch inadvertently changes this second half of the binary search method -- instead of stepping over a few symbols, gdb now binary searches to the middle of the block symbols, then linearly steps over all symbols until it gets to the end of the block. This is very expensive. Specifically, I'm talking about this patch here: @@ -1247,19 +1282,8 @@ lookup_block_symbol (register const stru while (bot < top) { sym = BLOCK_SYM (block, bot); - inc = SYMBOL_NAME (sym)[0] - name[0]; - if (inc == 0) - { - inc = STRCMP (SYMBOL_NAME (sym), name); - } - if (inc == 0 && SYMBOL_NAMESPACE (sym) == namespace) - { - return (sym); - } - if (inc > 0) - { - break; - } + if (SYMBOL_MATCHES_NAME (sym, name)) + return sym; bot++; } } The old code is straightforward - the binary search has put us at the beginning of matching symbols (the 'bot' index). If the symbol names match, and the namespaces match, we return the sym. If we've stepped beyond the block symbols that could match our symbol name, we break out of the loop. Daniel's change loses this critical "break out of the loop" part, and iterates over all the block symbols from the binary-search start point until the end. We can't copy the old code's style exactly. SYMBOL_MATCHES_NAME calls strcmp_iw() on the demangled names (to ignore whitespace), but strcmp_iw is poorly named; it doesn't give you a less-than greater-than return value like strcmp(). It'd be better termed streq_iw() or something unlike strcmp(). But we can make vast improvements without digging into this problem by changing the code to something like: while (bot < top) { sym = BLOCK_SYM (block, bot); if (SYMBOL_NAMESPACE (sym) == namespace && SYMBOL_MATCHES_NAME (sym, name)) { return sym; } if (SYMBOL_SOURCE_NAME (sym)[0] > name[0]) { break; } bot++; } The search is not optimal (we'll probably end up stepping over more symbols than is necessary), but it's a huge improvement over the current one. A specific example: One of our demo programs that we ship with the development tools is a little notepad GUI app. It has these opaque structure pointers that I mentioned last week, which gdb will never find a definition of. These opaque structure pointers are often in programs as local variables, so stepping with a Locals window open highlights any delay here. With the old iterate-over-half-the-symbols approach, "maint time 1" reported 1.6 - 1.7 seconds to look grope through all the symbols looking for this structure. With this break-out clause added, "maint time 1" reports 0.0 seconds (it's below the measurement threshold). (and before that second call to check_typedef() was removed, doing "info locals" would take around 3.5 seconds.) If someone will agree that this approach is reasonable, I'll check in a patch. I've already run the above patch through the testsuite, it adds no new regressions. Jason PS- This patch was brought to you by Tom Tromey's gdb-profile patch. Yes, gdb-profile, it does the things a profiling patch ought to do. I fired up his patch on gdb, found this naughty little lookup_block_symbol() function in about one minute, and started digging in. Very helpful. Don't you wish YOU could use Tom Tromey's gdb-profile patch?