From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-19191-listarch-gdb-patches=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 3152 invoked by alias); 2 Oct 2002 18:04:50 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 3142 invoked from network); 2 Oct 2002 18:04:49 -0000
Received: from unknown (HELO crack.them.org) (65.125.64.184)
  by sources.redhat.com with SMTP; 2 Oct 2002 18:04:49 -0000
Received: from nevyn.them.org ([66.93.61.169] ident=mail)
	by crack.them.org with asmtp (Exim 3.12 #1 (Debian))
	id 17wonU-0004gz-00; Wed, 02 Oct 2002 14:04:37 -0500
Received: from drow by nevyn.them.org with local (Exim 3.35 #1 (Debian))
	id 17wns3-0008B3-00; Wed, 02 Oct 2002 14:05:15 -0400
Date: Wed, 02 Oct 2002 11:04:00 -0000
From: Daniel Jacobowitz <drow@mvista.com>
To: Jim Blandy <jimb@redhat.com>
Cc: gdb-patches@sources.redhat.com
Subject: Re: RFA: Search for symbol names the same way they're hashed.
Message-ID: <20021002180515.GA8880@nevyn.them.org>
Mail-Followup-To: Jim Blandy <jimb@redhat.com>,
	gdb-patches@sources.redhat.com
References: <200210020329.g923TE702388@zenia.red-bean.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200210020329.g923TE702388@zenia.red-bean.com>
User-Agent: Mutt/1.5.1i
X-SW-Source: 2002-10/txt/msg00063.txt.bz2

On Tue, Oct 01, 2002 at 10:29:14PM -0500, Jim Blandy wrote:
> 
> This one is odd.  If the C++ folks (and Elena, if she has time) could
> review this carefully, that would be great.
> 
> We build hashed blocks in buildsym.c:finish_block as follows:
> 
>       for (next = *listhead; next; next = next->next)
> 	{
> 	  for (j = next->nsyms - 1; j >= 0; j--)
> 	    {
> 	      struct symbol *sym;
> 	      unsigned int hash_index;
> 	      const char *name = SYMBOL_DEMANGLED_NAME (next->symbol[j]);
> 	      if (name == NULL)
> 		name = SYMBOL_NAME (next->symbol[j]);
> 	      hash_index = msymbol_hash_iw (name);
> 	      hash_index = hash_index % BLOCK_BUCKETS (block);
> 	      sym = BLOCK_BUCKET (block, hash_index);
> 	      BLOCK_BUCKET (block, hash_index) = next->symbol[j];
> 	      next->symbol[j]->hash_next = sym;
> 	    }
> 	}
> 
> But then, we search for them in the hashed block in
> symtab.c:lookup_block_symbol like this:
> 
>   if (BLOCK_HASHTABLE (block))
>     {
>       unsigned int hash_index;
>       hash_index = msymbol_hash_iw (name);
>       hash_index = hash_index % BLOCK_BUCKETS (block);
>       for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
> 	{
> 	  if (SYMBOL_NAMESPACE (sym) == namespace 
> 	      && (mangled_name
> 		  ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
> 		  : SYMBOL_MATCHES_NAME (sym, name)))
> 	    return sym;
> 	}
>       return NULL;
>     }
> 
> Now, it's a basic fact of hash tables that you can only search using a
> matching criterion strictly more conservative than equality under your
> hash function.  If you hash one way, but then search another way, some
> of your matches might have been filed in other hash buckets.
> 
> So the above code probably isn't quite right.

OK.  So the fact is that if name was not the name which was hashed, we
can't assume we'll find it here.  So either we should be hashing both
names we want to check, or we should only check one.  That's the basic
principle.

> Let's assume that, if the caller has supplied a non-zero MANGLED_NAME,
> it is indeed the mangled form of NAME.  Otherwise, the caller is
> broken and deserves whatever it gets.  Mangled name matching is a more
> conservative matching criterion than demangled name matching (right?),
> so the `mangled_name ? strcmp ...' part is okay.

I like your assumption.  That's what ought to happen.  The whole
mangled_name thing is for handling multiple variants of constructors,
etc; I don't think it's the long-term right solution but it works for
now.

(What is the right solution?  Simple.  While lookup_block_symbol
presumably only returns one symbol, all of GDB needs to be aware that
lookup_symbol can find more than one.  We need a way to distinguish
them (for static functions in multiple files - often all with the same
filename! This happens in BFD) and a way to collapse them (for cloned
constructors, breaking on one should probably break on all of them).

The one-to-one mapping of symbols to addresses doesn't hold up under
pressure.

But I digress.

Right now, there are multiple functions with the same symbol demangled
name, but different mangled names.  If we start using DMGL_VERBOSE
(spelling?) then that won't happen but we need to be ready for the
extra information it provides; things like "X::X[in-charge](const X&)".
So we just take "X::X(const X&)" for now, and all the X::X's are hashed
to the same bucket, which we could take advantage of in overload
resolution once all symbol readers do that.  And the mangled name tells
us which one we want back.

> But when MANGLED_NAME is zero, then the SYMBOL_MATCHES_NAME part is
> questionable.  The definition of SYMBOL_MATCHES_NAME from symtab.h is
> as follows:
> 
> #define SYMBOL_MATCHES_NAME(symbol, name)				\
>   (STREQ (SYMBOL_NAME (symbol), (name))					\
>    || (SYMBOL_DEMANGLED_NAME (symbol) != NULL				\
>        && strcmp_iw (SYMBOL_DEMANGLED_NAME (symbol), (name)) == 0))
> 
> It returns true if NAME matches either SYMBOL_NAME or
> SYMBOL_DEMANGLED_NAME.
> 
> If the intention really was to use SYMBOL_MATCHES_NAME to recognize
> matches, then the code is broken.  If SYMBOL_NAME (sym) matches NAME,
> and sym has a demangled name which is different from NAME (which will
> usually be the case), SYMBOL_MATCHES_NAME (sym, name) will be true,
> but finish_block will have hashed on the demangled name, and probably
> will have filed sym in a different hash bucket than the one we'll
> search.
> 
> If finish_block's criteria are the correct ones, this liberal
> searching criteria can't cause us any problems.  Where finish_block
> would consider a name to match a symbol, SYMBOL_MATCHES_NAME would
> too: the latter checks both names.  And where finish_block would
> consider two names to differ, SYMBOL_MATCHES_NAME behaves the same way
> where there is no a demangled name, and since mangled matching is more
> conservative than demangled matching, it will also behave the same way
> where there is a demangled name.
> 
> Does that sound right?  :)

Yep!

> My first reaction was to say that finish_block is broken --- we want
> to recognize matches on either mangled or unmangled names, but
> finish_block's behavior means that only matches on unmangled names, or
> mangled names where there are no unmangled_names, work reliably.
> 
> But the block hashing code has been in since July, and we haven't had
> any problems due to this behavior.  (The bug I fixed in
> lookup_symbol_aux had to do with bad arguments being passed to
> lookup_block_symbol, so that doesn't count.)  The analysis above means
> that we've been living with finish_block's criteria since then.
> 
> Changing lookup_block_symbol to use the same criteria as finish block
> should have no effect, and will make all the subtlety above go away.

I agree.  Your patch is functionally the same, given our invariants;
it's a little different (two strcmps for the mangled_name case instead
of one) but much clearer.  It's fine with me, although if you and David
can agree on (or find a better name than) SYMBOL_BEST_NAME you might
want to use a macro or function to centralize this matching, and move
the comment there.  I can't think of anywhere else we'd want to match
in the hashtable, so a private function in symtab.c might be best.

> So here's the patch I'm offering.  The point is that it makes the
> match criteria clearly stricter than the hashing criteria, avoiding
> the confusion that I've had to wade through.
> 
> 2002-10-01  Jim Blandy  <jimb@redhat.com>
> 
> 	* symtab.c (lookup_block_symbol): Use the same matching criteria
> 	that buildsym.c:finish_block used when constructing the hash
> 	table.
> 
> Index: gdb/symtab.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/symtab.c,v
> retrieving revision 1.70
> diff -c -r1.70 symtab.c
> *** gdb/symtab.c	20 Sep 2002 14:58:58 -0000	1.70
> --- gdb/symtab.c	2 Oct 2002 03:40:28 -0000
> ***************
> *** 1347,1357 ****
>         hash_index = hash_index % BLOCK_BUCKETS (block);
>         for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
>   	{
> ! 	  if (SYMBOL_NAMESPACE (sym) == namespace 
> ! 	      && (mangled_name
> ! 		  ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
> ! 		  : SYMBOL_MATCHES_NAME (sym, name)))
> ! 	    return sym;
>   	}
>         return NULL;
>       }
> --- 1347,1372 ----
>         hash_index = hash_index % BLOCK_BUCKETS (block);
>         for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
>   	{
> !           /* Note that you can't just tweak these matching criteria
> !              arbitrarily.  They must be stricter than those assumed
> !              when we build the hash table in finish_block; otherwise,
> !              the code there will put symbols you'd like to match in
> !              different hash buckets.  */
> ! 	  if (SYMBOL_NAMESPACE (sym) == namespace)
> !             {
> !               /* We hash using a hash function that respects
> !                  strcmp_iw; strcmp is more conservative than
> !                  strcmp_iw, so it's fine to use that instead here if
> !                  we like.  */
> !               if (SYMBOL_DEMANGLED_NAME (sym)
> !                   ? strcmp_iw (SYMBOL_DEMANGLED_NAME (sym), name) == 0
> !                   : strcmp (SYMBOL_NAME (sym), name) == 0)
> !                 {
> !                   if (! mangled_name
> !                       || strcmp (SYMBOL_NAME (sym), mangled_name) == 0)
> !                     return sym;
> !                 }
> !             }
>   	}
>         return NULL;
>       }

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer