From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 25589 invoked by alias); 20 Feb 2003 22:41:38 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 25569 invoked from network); 20 Feb 2003 22:41:37 -0000 Received: from unknown (HELO jackfruit.Stanford.EDU) (171.64.38.136) by 172.16.49.205 with SMTP; 20 Feb 2003 22:41:37 -0000 Received: (from carlton@localhost) by jackfruit.Stanford.EDU (8.11.6/8.11.6) id h1KMfJi03861; Thu, 20 Feb 2003 14:41:19 -0800 X-Authentication-Warning: jackfruit.Stanford.EDU: carlton set sender to carlton@math.stanford.edu using -f To: Daniel Jacobowitz Cc: gdb-patches@sources.redhat.com, Elena Zannoni , Jim Blandy Subject: Re: RFA/symtab: Let search_symbols find exact matches References: <20030210160107.GA587@nevyn.them.org> From: David Carlton Date: Thu, 20 Feb 2003 22:41:00 -0000 In-Reply-To: <20030210160107.GA587@nevyn.them.org> Message-ID: User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-SW-Source: 2003-02/txt/msg00504.txt.bz2 On Mon, 10 Feb 2003 11:01:07 -0500, Daniel Jacobowitz said: > This patch renames search_symbols to search_symbols_aux, and lets it take an > argument that specified regex or exact (well, strcmp_iw) matching. Then > search_symbols becomes a wrapper. > The new function is used by make_symbol_overload_list, which shaves 50% time > off of some tests in the C++ directory if your system libc has debugging > symbols; it removes the bogusity where all psymtabs were converted to > symtabs during overload resolution. Whew. This cuts memory for > namespace.exp from 70MB to 7MB, allowing some of my hardware to actually run > the test without crashing. Here's take two on my comments on this. First, I should say that I totally agree that the aforementioned bogusity in make_symbol_overload_list should go; as discussed in , though, there are ways of doing that that involve much less drastic changes. So the question is: do search_symbols and make_symbol_overload_list have enough of a common core such that extracting that common core into a single function would clean up the code, reducing maintenance overhead, or would that common code be artificial, increasing maintenance overhead? Here's an analysis of the two functions. It may well be incorrect; I understand GDB's symbol mechanisms pretty well (better than I'd like, I sometimes think...), and I understand make_symbol_overload_list pretty well, but I haven't spent too much time with search_symbols yet. Loosely speaking, this is what search_symbols does: 1) If the regexp is non-NULL, it cleans it up a bit and compiles it. 2) It looks at every partial symtab; in each one, it looks at every partial symbol until it finds one that matches, at which point it reads in that partial symtab. 3) It (usually) looks at every single minimal symbol, to see if thre's a corresponding symbol. 4) It then looks at every single global and static symbol, to find the ones that match. 5) If there were minimal symbols without corresponding symbols, then it looks at every single minimal symbol again. Here, on the other hand, is what make_symbol_overload_list does. 1) It looks at every partial symtab, and reads them all in (in an amusingly verbose fashion). (My e-mail referenced above contains a patch fixing this.) 2) It looks at local variables, to find the ones that match. 3) It looks at every single global and static symbol, to find the ones that match. So: what's the same, what's different. Things that only search_symbols does: * It does some regexp-related stuff at the beginning. * It looks at every single minimal symbol at least once, possibly twice. Things that only make_symbol_overload_list does: * It looks at local variables. Things that both functions do: * Determine which psymtabs to read in. * Look at every single global and static symbol, to find the ones that match. So already I'm suspicious: those differences seem potentially significant to me. Some of them may well be due to accidents in the way the two functions were written (which would be exactly the sort of accidents that patches like Daniel's would help prevent!); but I think that, at the least, we need more analysis to see if that's the case. When doing this sort of refactoring, I think the proper order is to first minimize the differences and only then to combine them. (Shrinking the functions first will help, too: see below.) But even the similarities aren't as close as they seem. In both cases, make_symbol_overload_list can potentially use a much faster algorithm than search_symbols can, one that avoids looking at the vast majority of partial or regular symbols. For the psymtab case, it can just do one call to lookup_partial_symbol per psymtab, rather than look at (almost) every partial symbol. For the symbols, all the matching symbols will be inside the same hash bucket, so we could conceivably just look in one bucket per symtab (well, two, one for the globals and one for the statics), instead of looking at every symbol. (We don't currently have an appropriate iterator to do that; I do have one in my branch, that I plan to try to move to mainline in a few weeks.) So. I like the basic idea of combining the functions as a longer-term goal, but I'm pretty sure it's not a good idea right now. I think a better thing to do would be to get a good handle on the difference between search_symbol and make_symbol_overload_list. (And make_symbol_completion_list: it may well be easier to combine make_symbol_overload_list and make_symbol_completion_list as a first step.) One possible concrete recommendation as a step towards this refactoring would be to extract the body of search_symbol into 5 separate functions along the lines of my analysis above, to extract the body of make_symbol_overload_list into 3 separate functions along the lines of my analysis above, and similarly with make_symbol_completion_list. (This is like what I did with lookup_symbol_aux.) That will enable us to do two things: * By looking at the resulting much shorter versions of those functions, it'll be much clearer what each one does in broad terms. We can then think about whether the differences between them are important or not. * When an extracted function from, say, make_symbol_overload_list is playing a similar role to an extracted function from, say, make_symbol_completion_list, then examine those functions to see if they can be profitably unified into a single function. That may sound like a lot of work, but having gone through it a few times, it's really not so bad. And it's a great way to understand and clean up functions. It doesn't interact well with GDB's approval process, though; I have some ideas about that, too, but I don't think this is the place for them. David Carlton carlton@math.stanford.edu