From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jason Molenda To: Eli Zaretskii Cc: gdb-patches@sources.redhat.com Subject: Re: [RFA] bug in symtab.c:lookup_block_symbol()'s search method Date: Sat, 15 Sep 2001 00:54:00 -0000 Message-id: <20010915005255.A2369@shell17.ba.best.com> References: <20010909074800.A8112@shell17.ba.best.com> <3B9D054A.4C3CC2B1@cygnus.com> <20010910113226.A23487@shell17.ba.best.com> <87zo82swwa.fsf@cgsoftware.com> <20010910130347.A5628@shell17.ba.best.com> <8766aq7nki.fsf@cgsoftware.com> <3BA219EF.3000300@cygnus.com> <9003-Fri14Sep2001190223+0300-eliz@is.elta.co.il> <20010914091241.A28921@shell17.ba.best.com> <1659-Fri14Sep2001204927+0300-eliz@is.elta.co.il> X-SW-Source: 2001-09/msg00188.html On Fri, Sep 14, 2001 at 08:49:28PM +0300, Eli Zaretskii wrote: > Even if the performance hit is significant, I fail to understand how > can someone say the entire program is broken, or that it cannot be > released. Can we please get things back into their proportion? I'll stand what I said earlier. > > Why, MacOS X just happens to be one. In some of our > > libraries, there are a lot of these opaque struct pointers in use. > > Looking up one of those variables requires gdb to sit-and-spin for > > 5-10 seconds, before realizing that it has no definition. > > On what machine? Timing data is useless without saying on what kind > of CPU and with what clock speed did you see that. I re-ran the timings to verify on a 733 MHz G4 PPC Mac OS X machine. My test program is called "SimpleText", it is 80k of C source, it is a little notepad GUI program that is included with the Apple development tools. It links in two of our fundamental shared libraries. As I mentioned earlier, our Carbon library makes heavy use of these "opaque struct pointers" - in the Carbon header files you'll see things like "typedef struct OpaqueWindowPtr *WindowPtr", and people will pass variables of type "WindowPtr" around to various system calls. If you have a variable of this type, every time you examine it, gdb will try to find a definition of struct OpaqueWindowPtr. With the gdb 5.1 symtab.c, typing "p aWindowPtr" in SimpleText will take 3.5 seconds to complete (as reported by 'maint time 1'). With the patches (or with gdb 5.0), typing "p aWindowPtr" in SimpleText takes between 0.01 and 0.00 seconds (as reported by 'maint time 1' -- it's at the limits of maint time's granularity. Hence my earlier characterization as "unmeasurable".) I've already written this more than once, but I'll cover the technical details again. lookup_block_symbol is given a symbol name to find, and a pointer to a block to search--the block has a list of symbol names present there. l_b_s() does a binary search to get to the bottom range of possible matches of the symbol, then uses a simple linear search to step over a few symbols. Once it is beyond the range of plausible matches, it exits out of the linear search and returns a no-match result. Dan's patch dropped the "exits out of linear search" -- now it binary searches to the beginning of plausible ranges, and steps through to the end of the block. It's now an O(1/2*N) algorithm for worst-case, i.e. non-matches. More concretely. With the current gdb 5.1 sources, in SimpleText, running "p aWindowPtr" will cause gdb to loop 6,785,175 times in that linear-search after the binary search. With the patch I submitted, or the gdb 5.0 sources, running "p aWindowPtr" will run this loop 4,390 times. > Anyway, I don't consider 5-10 seconds such a long time. We still have > in GDB operations that take much more, and we don't consider it > ``broken'' because of that. It used to run in a hundredth of a second. Now it's taking over three seconds, on one of our fastest machines. When you hit the 'next' button in a GUI and it seizes up for four - ten seconds, and it didn't used to do that. How else can it be described? This behavior is not an act of god, an inevitable consequence of making gdb better, or an unavoidable tradeoff. It's a mistake. The argument against this this change is that there _might_ be a language which has space chars at the beginning of a demangled string. lookup_block_symbol() is already very, very broken if that's the case. The initial binary search uses a combination of first-letter comparisons and strcmp() -- the latter of which is clearly out of the question for making unmangled comparisons. > > If you're using GDB in under an IDE and you have a Locals window > > open, and one of those locals is an opaque structure, whenever you > > step into our out of that frame, you'll have this 5-10 second delay. > > So display the hourglass for 10 seconds and be done with that. No one > will really notice, except you and me. The world is full with good > software that sometimes has 10-sec delays, to say nothing of bad > software. I strongly disagree. A word processor that delays for two seconds every time you add a new character is not acceptable. A debugger that can take ten seconds to complete a "next" is not acceptable. If we're paging in a big shared library, there's nothing we can do about it. If we're doing a linear search instead of a binary seach, there _is_ something we can do about it. I'm not just making things up here. Our IDE at Apple currently has "next" performance of around 0.5 - 0.7 seconds -- and we hear no end of complaints about how it is too slow. There is a very tangible speed at which users want a GUI debugger to respond to a simple Next, and we're not meeting that yet. If 0.7 seconds is too slow, you can sure bet that 5 seconds will gain their ire. Jason