From: Daniel Jacobowitz <drow@mvista.com>
To: Jim Blandy <jimb@redhat.com>
Cc: gdb-patches@sources.redhat.com
Subject: Re: RFA: Search for symbol names the same way they're hashed.
Date: Wed, 02 Oct 2002 11:04:00 -0000 [thread overview]
Message-ID: <20021002180515.GA8880@nevyn.them.org> (raw)
In-Reply-To: <200210020329.g923TE702388@zenia.red-bean.com>
On Tue, Oct 01, 2002 at 10:29:14PM -0500, Jim Blandy wrote:
>
> This one is odd. If the C++ folks (and Elena, if she has time) could
> review this carefully, that would be great.
>
> We build hashed blocks in buildsym.c:finish_block as follows:
>
> for (next = *listhead; next; next = next->next)
> {
> for (j = next->nsyms - 1; j >= 0; j--)
> {
> struct symbol *sym;
> unsigned int hash_index;
> const char *name = SYMBOL_DEMANGLED_NAME (next->symbol[j]);
> if (name == NULL)
> name = SYMBOL_NAME (next->symbol[j]);
> hash_index = msymbol_hash_iw (name);
> hash_index = hash_index % BLOCK_BUCKETS (block);
> sym = BLOCK_BUCKET (block, hash_index);
> BLOCK_BUCKET (block, hash_index) = next->symbol[j];
> next->symbol[j]->hash_next = sym;
> }
> }
>
> But then, we search for them in the hashed block in
> symtab.c:lookup_block_symbol like this:
>
> if (BLOCK_HASHTABLE (block))
> {
> unsigned int hash_index;
> hash_index = msymbol_hash_iw (name);
> hash_index = hash_index % BLOCK_BUCKETS (block);
> for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
> {
> if (SYMBOL_NAMESPACE (sym) == namespace
> && (mangled_name
> ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
> : SYMBOL_MATCHES_NAME (sym, name)))
> return sym;
> }
> return NULL;
> }
>
> Now, it's a basic fact of hash tables that you can only search using a
> matching criterion strictly more conservative than equality under your
> hash function. If you hash one way, but then search another way, some
> of your matches might have been filed in other hash buckets.
>
> So the above code probably isn't quite right.
OK. So the fact is that if name was not the name which was hashed, we
can't assume we'll find it here. So either we should be hashing both
names we want to check, or we should only check one. That's the basic
principle.
> Let's assume that, if the caller has supplied a non-zero MANGLED_NAME,
> it is indeed the mangled form of NAME. Otherwise, the caller is
> broken and deserves whatever it gets. Mangled name matching is a more
> conservative matching criterion than demangled name matching (right?),
> so the `mangled_name ? strcmp ...' part is okay.
I like your assumption. That's what ought to happen. The whole
mangled_name thing is for handling multiple variants of constructors,
etc; I don't think it's the long-term right solution but it works for
now.
(What is the right solution? Simple. While lookup_block_symbol
presumably only returns one symbol, all of GDB needs to be aware that
lookup_symbol can find more than one. We need a way to distinguish
them (for static functions in multiple files - often all with the same
filename! This happens in BFD) and a way to collapse them (for cloned
constructors, breaking on one should probably break on all of them).
The one-to-one mapping of symbols to addresses doesn't hold up under
pressure.
But I digress.
Right now, there are multiple functions with the same symbol demangled
name, but different mangled names. If we start using DMGL_VERBOSE
(spelling?) then that won't happen but we need to be ready for the
extra information it provides; things like "X::X[in-charge](const X&)".
So we just take "X::X(const X&)" for now, and all the X::X's are hashed
to the same bucket, which we could take advantage of in overload
resolution once all symbol readers do that. And the mangled name tells
us which one we want back.
> But when MANGLED_NAME is zero, then the SYMBOL_MATCHES_NAME part is
> questionable. The definition of SYMBOL_MATCHES_NAME from symtab.h is
> as follows:
>
> #define SYMBOL_MATCHES_NAME(symbol, name) \
> (STREQ (SYMBOL_NAME (symbol), (name)) \
> || (SYMBOL_DEMANGLED_NAME (symbol) != NULL \
> && strcmp_iw (SYMBOL_DEMANGLED_NAME (symbol), (name)) == 0))
>
> It returns true if NAME matches either SYMBOL_NAME or
> SYMBOL_DEMANGLED_NAME.
>
> If the intention really was to use SYMBOL_MATCHES_NAME to recognize
> matches, then the code is broken. If SYMBOL_NAME (sym) matches NAME,
> and sym has a demangled name which is different from NAME (which will
> usually be the case), SYMBOL_MATCHES_NAME (sym, name) will be true,
> but finish_block will have hashed on the demangled name, and probably
> will have filed sym in a different hash bucket than the one we'll
> search.
>
> If finish_block's criteria are the correct ones, this liberal
> searching criteria can't cause us any problems. Where finish_block
> would consider a name to match a symbol, SYMBOL_MATCHES_NAME would
> too: the latter checks both names. And where finish_block would
> consider two names to differ, SYMBOL_MATCHES_NAME behaves the same way
> where there is no a demangled name, and since mangled matching is more
> conservative than demangled matching, it will also behave the same way
> where there is a demangled name.
>
> Does that sound right? :)
Yep!
> My first reaction was to say that finish_block is broken --- we want
> to recognize matches on either mangled or unmangled names, but
> finish_block's behavior means that only matches on unmangled names, or
> mangled names where there are no unmangled_names, work reliably.
>
> But the block hashing code has been in since July, and we haven't had
> any problems due to this behavior. (The bug I fixed in
> lookup_symbol_aux had to do with bad arguments being passed to
> lookup_block_symbol, so that doesn't count.) The analysis above means
> that we've been living with finish_block's criteria since then.
>
> Changing lookup_block_symbol to use the same criteria as finish block
> should have no effect, and will make all the subtlety above go away.
I agree. Your patch is functionally the same, given our invariants;
it's a little different (two strcmps for the mangled_name case instead
of one) but much clearer. It's fine with me, although if you and David
can agree on (or find a better name than) SYMBOL_BEST_NAME you might
want to use a macro or function to centralize this matching, and move
the comment there. I can't think of anywhere else we'd want to match
in the hashtable, so a private function in symtab.c might be best.
> So here's the patch I'm offering. The point is that it makes the
> match criteria clearly stricter than the hashing criteria, avoiding
> the confusion that I've had to wade through.
>
> 2002-10-01 Jim Blandy <jimb@redhat.com>
>
> * symtab.c (lookup_block_symbol): Use the same matching criteria
> that buildsym.c:finish_block used when constructing the hash
> table.
>
> Index: gdb/symtab.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/symtab.c,v
> retrieving revision 1.70
> diff -c -r1.70 symtab.c
> *** gdb/symtab.c 20 Sep 2002 14:58:58 -0000 1.70
> --- gdb/symtab.c 2 Oct 2002 03:40:28 -0000
> ***************
> *** 1347,1357 ****
> hash_index = hash_index % BLOCK_BUCKETS (block);
> for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
> {
> ! if (SYMBOL_NAMESPACE (sym) == namespace
> ! && (mangled_name
> ! ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
> ! : SYMBOL_MATCHES_NAME (sym, name)))
> ! return sym;
> }
> return NULL;
> }
> --- 1347,1372 ----
> hash_index = hash_index % BLOCK_BUCKETS (block);
> for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next)
> {
> ! /* Note that you can't just tweak these matching criteria
> ! arbitrarily. They must be stricter than those assumed
> ! when we build the hash table in finish_block; otherwise,
> ! the code there will put symbols you'd like to match in
> ! different hash buckets. */
> ! if (SYMBOL_NAMESPACE (sym) == namespace)
> ! {
> ! /* We hash using a hash function that respects
> ! strcmp_iw; strcmp is more conservative than
> ! strcmp_iw, so it's fine to use that instead here if
> ! we like. */
> ! if (SYMBOL_DEMANGLED_NAME (sym)
> ! ? strcmp_iw (SYMBOL_DEMANGLED_NAME (sym), name) == 0
> ! : strcmp (SYMBOL_NAME (sym), name) == 0)
> ! {
> ! if (! mangled_name
> ! || strcmp (SYMBOL_NAME (sym), mangled_name) == 0)
> ! return sym;
> ! }
> ! }
> }
> return NULL;
> }
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
next prev parent reply other threads:[~2002-10-02 18:04 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2002-10-01 20:45 Jim Blandy
2002-10-02 10:45 ` David Carlton
2002-10-02 12:35 ` Jim Blandy
2002-10-02 12:41 ` David Carlton
2002-10-02 13:02 ` Daniel Jacobowitz
2002-10-02 13:20 ` David Carlton
2002-10-02 13:25 ` Daniel Jacobowitz
2002-10-02 13:43 ` Daniel Berlin
2002-10-02 14:03 ` Daniel Jacobowitz
2002-10-02 16:10 ` Daniel Berlin
2002-10-02 16:19 ` Setting the demangling style Daniel Jacobowitz
2002-10-02 11:04 ` Daniel Jacobowitz [this message]
2002-10-02 12:10 ` RFA: Search for symbol names the same way they're hashed David Carlton
2002-10-07 16:19 ` David Carlton
2002-10-07 16:38 ` Daniel Jacobowitz
[not found] <1033595444.9324.ezmlm@sources.redhat.com>
2002-10-02 17:49 ` Jim Ingham
2002-10-02 18:16 ` Daniel Jacobowitz
2002-10-02 18:25 ` Jim Ingham
2002-10-03 8:15 ` Daniel Berlin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20021002180515.GA8880@nevyn.them.org \
--to=drow@mvista.com \
--cc=gdb-patches@sources.redhat.com \
--cc=jimb@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox