From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-19168-listarch-gdb-patches=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 4915 invoked by alias); 2 Oct 2002 03:45:21 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 4785 invoked from network); 2 Oct 2002 03:45:18 -0000
Received: from unknown (HELO zenia.red-bean.com) (66.244.67.22)
  by sources.redhat.com with SMTP; 2 Oct 2002 03:45:18 -0000
Received: (from jimb@localhost)
	by zenia.red-bean.com (8.11.6/8.11.6) id g923TE702388;
	Tue, 1 Oct 2002 22:29:14 -0500
Date: Tue, 01 Oct 2002 20:45:00 -0000
Message-Id: <200210020329.g923TE702388@zenia.red-bean.com>
From: Jim Blandy <jimb@redhat.com>
To: gdb-patches@sources.redhat.com
Subject: RFA: Search for symbol names the same way they're hashed.
X-SW-Source: 2002-10/txt/msg00040.txt.bz2


This one is odd.  If the C++ folks (and Elena, if she has time) could
review this carefully, that would be great.

We build hashed blocks in buildsym.c:finish_block as follows:

      for (next = *listhead; next; next = next->next)
	{
	  for (j = next->nsyms - 1; j >= 0; j--)
	    {
	      struct symbol *sym;
	      unsigned int hash_index;
	      const char *name = SYMBOL_DEMANGLED_NAME (next->symbol[j]);
	      if (name == NULL)
		name = SYMBOL_NAME (next->symbol[j]);
	      hash_index = msymbol_hash_iw (name);
	      hash_index = hash_index % BLOCK_BUCKETS (block);
	      sym = BLOCK_BUCKET (block, hash_index);
	      BLOCK_BUCKET (block, hash_index) = next->symbol[j];
	      next->symbol[j]->hash_next = sym;
	    }
	}

But then, we search for them in the hashed block in
symtab.c:lookup_block_symbol like this:

  if (BLOCK_HASHTABLE (block))
    {
      unsigned int hash_index;
      hash_index = msymbol_hash_iw (name);
      hash_index = hash_index % BLOCK_BUCKETS (block);
      for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
	{
	  if (SYMBOL_NAMESPACE (sym) == namespace 
	      && (mangled_name
		  ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
		  : SYMBOL_MATCHES_NAME (sym, name)))
	    return sym;
	}
      return NULL;
    }

Now, it's a basic fact of hash tables that you can only search using a
matching criterion strictly more conservative than equality under your
hash function.  If you hash one way, but then search another way, some
of your matches might have been filed in other hash buckets.

So the above code probably isn't quite right.

Let's assume that, if the caller has supplied a non-zero MANGLED_NAME,
it is indeed the mangled form of NAME.  Otherwise, the caller is
broken and deserves whatever it gets.  Mangled name matching is a more
conservative matching criterion than demangled name matching (right?),
so the `mangled_name ? strcmp ...' part is okay.

But when MANGLED_NAME is zero, then the SYMBOL_MATCHES_NAME part is
questionable.  The definition of SYMBOL_MATCHES_NAME from symtab.h is
as follows:

#define SYMBOL_MATCHES_NAME(symbol, name)				\
  (STREQ (SYMBOL_NAME (symbol), (name))					\
   || (SYMBOL_DEMANGLED_NAME (symbol) != NULL				\
       && strcmp_iw (SYMBOL_DEMANGLED_NAME (symbol), (name)) == 0))

It returns true if NAME matches either SYMBOL_NAME or
SYMBOL_DEMANGLED_NAME.

If the intention really was to use SYMBOL_MATCHES_NAME to recognize
matches, then the code is broken.  If SYMBOL_NAME (sym) matches NAME,
and sym has a demangled name which is different from NAME (which will
usually be the case), SYMBOL_MATCHES_NAME (sym, name) will be true,
but finish_block will have hashed on the demangled name, and probably
will have filed sym in a different hash bucket than the one we'll
search.

If finish_block's criteria are the correct ones, this liberal
searching criteria can't cause us any problems.  Where finish_block
would consider a name to match a symbol, SYMBOL_MATCHES_NAME would
too: the latter checks both names.  And where finish_block would
consider two names to differ, SYMBOL_MATCHES_NAME behaves the same way
where there is no a demangled name, and since mangled matching is more
conservative than demangled matching, it will also behave the same way
where there is a demangled name.

Does that sound right?  :)

My first reaction was to say that finish_block is broken --- we want
to recognize matches on either mangled or unmangled names, but
finish_block's behavior means that only matches on unmangled names, or
mangled names where there are no unmangled_names, work reliably.

But the block hashing code has been in since July, and we haven't had
any problems due to this behavior.  (The bug I fixed in
lookup_symbol_aux had to do with bad arguments being passed to
lookup_block_symbol, so that doesn't count.)  The analysis above means
that we've been living with finish_block's criteria since then.

Changing lookup_block_symbol to use the same criteria as finish block
should have no effect, and will make all the subtlety above go away.

So here's the patch I'm offering.  The point is that it makes the
match criteria clearly stricter than the hashing criteria, avoiding
the confusion that I've had to wade through.

2002-10-01  Jim Blandy  <jimb@redhat.com>

	* symtab.c (lookup_block_symbol): Use the same matching criteria
	that buildsym.c:finish_block used when constructing the hash
	table.

Index: gdb/symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.70
diff -c -r1.70 symtab.c
*** gdb/symtab.c	20 Sep 2002 14:58:58 -0000	1.70
--- gdb/symtab.c	2 Oct 2002 03:40:28 -0000
***************
*** 1347,1357 ****
        hash_index = hash_index % BLOCK_BUCKETS (block);
        for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
  	{
! 	  if (SYMBOL_NAMESPACE (sym) == namespace 
! 	      && (mangled_name
! 		  ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
! 		  : SYMBOL_MATCHES_NAME (sym, name)))
! 	    return sym;
  	}
        return NULL;
      }
--- 1347,1372 ----
        hash_index = hash_index % BLOCK_BUCKETS (block);
        for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
  	{
!           /* Note that you can't just tweak these matching criteria
!              arbitrarily.  They must be stricter than those assumed
!              when we build the hash table in finish_block; otherwise,
!              the code there will put symbols you'd like to match in
!              different hash buckets.  */
! 	  if (SYMBOL_NAMESPACE (sym) == namespace)
!             {
!               /* We hash using a hash function that respects
!                  strcmp_iw; strcmp is more conservative than
!                  strcmp_iw, so it's fine to use that instead here if
!                  we like.  */
!               if (SYMBOL_DEMANGLED_NAME (sym)
!                   ? strcmp_iw (SYMBOL_DEMANGLED_NAME (sym), name) == 0
!                   : strcmp (SYMBOL_NAME (sym), name) == 0)
!                 {
!                   if (! mangled_name
!                       || strcmp (SYMBOL_NAME (sym), mangled_name) == 0)
!                     return sym;
!                 }
!             }
  	}
        return NULL;
      }