From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-22646-listarch-gdb-patches=sources.redhat.com@sources.redhat.com>
Received: (qmail 25589 invoked by alias); 20 Feb 2003 22:41:38 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 25569 invoked from network); 20 Feb 2003 22:41:37 -0000
Received: from unknown (HELO jackfruit.Stanford.EDU) (171.64.38.136)
  by 172.16.49.205 with SMTP; 20 Feb 2003 22:41:37 -0000
Received: (from carlton@localhost)
	by jackfruit.Stanford.EDU (8.11.6/8.11.6) id h1KMfJi03861;
	Thu, 20 Feb 2003 14:41:19 -0800
X-Authentication-Warning: jackfruit.Stanford.EDU: carlton set sender to carlton@math.stanford.edu using -f
To: Daniel Jacobowitz <drow@mvista.com>
Cc: gdb-patches@sources.redhat.com, Elena Zannoni <ezannoni@redhat.com>,
   Jim Blandy <jimb@redhat.com>
Subject: Re: RFA/symtab: Let search_symbols find exact matches
References: <20030210160107.GA587@nevyn.them.org>
From: David Carlton <carlton@math.stanford.edu>
Date: Thu, 20 Feb 2003 22:41:00 -0000
In-Reply-To: <20030210160107.GA587@nevyn.them.org>
Message-ID: <ro1k7fu4ltd.fsf@jackfruit.Stanford.EDU>
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2003-02/txt/msg00504.txt.bz2

On Mon, 10 Feb 2003 11:01:07 -0500, Daniel Jacobowitz <drow@mvista.com> said:

> This patch renames search_symbols to search_symbols_aux, and lets it take an
> argument that specified regex or exact (well, strcmp_iw) matching.  Then
> search_symbols becomes a wrapper.

> The new function is used by make_symbol_overload_list, which shaves 50% time
> off of some tests in the C++ directory if your system libc has debugging
> symbols; it removes the bogusity where all psymtabs were converted to
> symtabs during overload resolution.  Whew.  This cuts memory for
> namespace.exp from 70MB to 7MB, allowing some of my hardware to actually run
> the test without crashing.

Here's take two on my comments on this.  First, I should say that I
totally agree that the aforementioned bogusity in
make_symbol_overload_list should go; as discussed in
<http://sources.redhat.com/ml/gdb-patches/2003-02/msg00261.html>,
though, there are ways of doing that that involve much less drastic
changes.

So the question is: do search_symbols and make_symbol_overload_list
have enough of a common core such that extracting that common core
into a single function would clean up the code, reducing maintenance
overhead, or would that common code be artificial, increasing
maintenance overhead?

Here's an analysis of the two functions.  It may well be incorrect; I
understand GDB's symbol mechanisms pretty well (better than I'd like,
I sometimes think...), and I understand make_symbol_overload_list
pretty well, but I haven't spent too much time with search_symbols
yet.


Loosely speaking, this is what search_symbols does:

1) If the regexp is non-NULL, it cleans it up a bit and compiles it.

2) It looks at every partial symtab; in each one, it looks at every
   partial symbol until it finds one that matches, at which point it
   reads in that partial symtab.

3) It (usually) looks at every single minimal symbol, to see if thre's
   a corresponding symbol.

4) It then looks at every single global and static symbol, to find the
   ones that match.

5) If there were minimal symbols without corresponding symbols, then
   it looks at every single minimal symbol again.

Here, on the other hand, is what make_symbol_overload_list does.

1) It looks at every partial symtab, and reads them all in (in an
   amusingly verbose fashion).  (My e-mail referenced above contains a
   patch fixing this.)

2) It looks at local variables, to find the ones that match.

3) It looks at every single global and static symbol, to find the ones
   that match.


So: what's the same, what's different.

Things that only search_symbols does:

* It does some regexp-related stuff at the beginning.

* It looks at every single minimal symbol at least once, possibly
  twice.

Things that only make_symbol_overload_list does:

* It looks at local variables.

Things that both functions do:

* Determine which psymtabs to read in.

* Look at every single global and static symbol, to find the ones that
  match.


So already I'm suspicious: those differences seem potentially
significant to me.  Some of them may well be due to accidents in the
way the two functions were written (which would be exactly the sort of
accidents that patches like Daniel's would help prevent!); but I think
that, at the least, we need more analysis to see if that's the case.
When doing this sort of refactoring, I think the proper order is to
first minimize the differences and only then to combine them.
(Shrinking the functions first will help, too: see below.)

But even the similarities aren't as close as they seem.  In both
cases, make_symbol_overload_list can potentially use a much faster
algorithm than search_symbols can, one that avoids looking at the vast
majority of partial or regular symbols.  For the psymtab case, it can
just do one call to lookup_partial_symbol per psymtab, rather than
look at (almost) every partial symbol.  For the symbols, all the
matching symbols will be inside the same hash bucket, so we could
conceivably just look in one bucket per symtab (well, two, one for the
globals and one for the statics), instead of looking at every symbol.
(We don't currently have an appropriate iterator to do that; I do have
one in my branch, that I plan to try to move to mainline in a few
weeks.)


So.  I like the basic idea of combining the functions as a longer-term
goal, but I'm pretty sure it's not a good idea right now.  I think a
better thing to do would be to get a good handle on the difference
between search_symbol and make_symbol_overload_list.  (And
make_symbol_completion_list: it may well be easier to combine
make_symbol_overload_list and make_symbol_completion_list as a first
step.)

One possible concrete recommendation as a step towards this
refactoring would be to extract the body of search_symbol into 5
separate functions along the lines of my analysis above, to extract
the body of make_symbol_overload_list into 3 separate functions along
the lines of my analysis above, and similarly with
make_symbol_completion_list.  (This is like what I did with
lookup_symbol_aux.)  That will enable us to do two things:

* By looking at the resulting much shorter versions of those
  functions, it'll be much clearer what each one does in broad terms.
  We can then think about whether the differences between them are
  important or not.

* When an extracted function from, say, make_symbol_overload_list is
  playing a similar role to an extracted function from, say,
  make_symbol_completion_list, then examine those functions to see if
  they can be profitably unified into a single function.

That may sound like a lot of work, but having gone through it a few
times, it's really not so bad.  And it's a great way to understand and
clean up functions.  It doesn't interact well with GDB's approval
process, though; I have some ideas about that, too, but I don't think
this is the place for them.

David Carlton
carlton@math.stanford.edu