From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4915 invoked by alias); 2 Oct 2002 03:45:21 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 4785 invoked from network); 2 Oct 2002 03:45:18 -0000 Received: from unknown (HELO zenia.red-bean.com) (66.244.67.22) by sources.redhat.com with SMTP; 2 Oct 2002 03:45:18 -0000 Received: (from jimb@localhost) by zenia.red-bean.com (8.11.6/8.11.6) id g923TE702388; Tue, 1 Oct 2002 22:29:14 -0500 Date: Tue, 01 Oct 2002 20:45:00 -0000 Message-Id: <200210020329.g923TE702388@zenia.red-bean.com> From: Jim Blandy To: gdb-patches@sources.redhat.com Subject: RFA: Search for symbol names the same way they're hashed. X-SW-Source: 2002-10/txt/msg00040.txt.bz2 This one is odd. If the C++ folks (and Elena, if she has time) could review this carefully, that would be great. We build hashed blocks in buildsym.c:finish_block as follows: for (next = *listhead; next; next = next->next) { for (j = next->nsyms - 1; j >= 0; j--) { struct symbol *sym; unsigned int hash_index; const char *name = SYMBOL_DEMANGLED_NAME (next->symbol[j]); if (name == NULL) name = SYMBOL_NAME (next->symbol[j]); hash_index = msymbol_hash_iw (name); hash_index = hash_index % BLOCK_BUCKETS (block); sym = BLOCK_BUCKET (block, hash_index); BLOCK_BUCKET (block, hash_index) = next->symbol[j]; next->symbol[j]->hash_next = sym; } } But then, we search for them in the hashed block in symtab.c:lookup_block_symbol like this: if (BLOCK_HASHTABLE (block)) { unsigned int hash_index; hash_index = msymbol_hash_iw (name); hash_index = hash_index % BLOCK_BUCKETS (block); for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next) { if (SYMBOL_NAMESPACE (sym) == namespace && (mangled_name ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0 : SYMBOL_MATCHES_NAME (sym, name))) return sym; } return NULL; } Now, it's a basic fact of hash tables that you can only search using a matching criterion strictly more conservative than equality under your hash function. If you hash one way, but then search another way, some of your matches might have been filed in other hash buckets. So the above code probably isn't quite right. Let's assume that, if the caller has supplied a non-zero MANGLED_NAME, it is indeed the mangled form of NAME. Otherwise, the caller is broken and deserves whatever it gets. Mangled name matching is a more conservative matching criterion than demangled name matching (right?), so the `mangled_name ? strcmp ...' part is okay. But when MANGLED_NAME is zero, then the SYMBOL_MATCHES_NAME part is questionable. The definition of SYMBOL_MATCHES_NAME from symtab.h is as follows: #define SYMBOL_MATCHES_NAME(symbol, name) \ (STREQ (SYMBOL_NAME (symbol), (name)) \ || (SYMBOL_DEMANGLED_NAME (symbol) != NULL \ && strcmp_iw (SYMBOL_DEMANGLED_NAME (symbol), (name)) == 0)) It returns true if NAME matches either SYMBOL_NAME or SYMBOL_DEMANGLED_NAME. If the intention really was to use SYMBOL_MATCHES_NAME to recognize matches, then the code is broken. If SYMBOL_NAME (sym) matches NAME, and sym has a demangled name which is different from NAME (which will usually be the case), SYMBOL_MATCHES_NAME (sym, name) will be true, but finish_block will have hashed on the demangled name, and probably will have filed sym in a different hash bucket than the one we'll search. If finish_block's criteria are the correct ones, this liberal searching criteria can't cause us any problems. Where finish_block would consider a name to match a symbol, SYMBOL_MATCHES_NAME would too: the latter checks both names. And where finish_block would consider two names to differ, SYMBOL_MATCHES_NAME behaves the same way where there is no a demangled name, and since mangled matching is more conservative than demangled matching, it will also behave the same way where there is a demangled name. Does that sound right? :) My first reaction was to say that finish_block is broken --- we want to recognize matches on either mangled or unmangled names, but finish_block's behavior means that only matches on unmangled names, or mangled names where there are no unmangled_names, work reliably. But the block hashing code has been in since July, and we haven't had any problems due to this behavior. (The bug I fixed in lookup_symbol_aux had to do with bad arguments being passed to lookup_block_symbol, so that doesn't count.) The analysis above means that we've been living with finish_block's criteria since then. Changing lookup_block_symbol to use the same criteria as finish block should have no effect, and will make all the subtlety above go away. So here's the patch I'm offering. The point is that it makes the match criteria clearly stricter than the hashing criteria, avoiding the confusion that I've had to wade through. 2002-10-01 Jim Blandy * symtab.c (lookup_block_symbol): Use the same matching criteria that buildsym.c:finish_block used when constructing the hash table. Index: gdb/symtab.c =================================================================== RCS file: /cvs/src/src/gdb/symtab.c,v retrieving revision 1.70 diff -c -r1.70 symtab.c *** gdb/symtab.c 20 Sep 2002 14:58:58 -0000 1.70 --- gdb/symtab.c 2 Oct 2002 03:40:28 -0000 *************** *** 1347,1357 **** hash_index = hash_index % BLOCK_BUCKETS (block); for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next) { ! if (SYMBOL_NAMESPACE (sym) == namespace ! && (mangled_name ! ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0 ! : SYMBOL_MATCHES_NAME (sym, name))) ! return sym; } return NULL; } --- 1347,1372 ---- hash_index = hash_index % BLOCK_BUCKETS (block); for (sym = BLOCK_BUCKET (block, hash_index); sym; sym = sym->hash_next) { ! /* Note that you can't just tweak these matching criteria ! arbitrarily. They must be stricter than those assumed ! when we build the hash table in finish_block; otherwise, ! the code there will put symbols you'd like to match in ! different hash buckets. */ ! if (SYMBOL_NAMESPACE (sym) == namespace) ! { ! /* We hash using a hash function that respects ! strcmp_iw; strcmp is more conservative than ! strcmp_iw, so it's fine to use that instead here if ! we like. */ ! if (SYMBOL_DEMANGLED_NAME (sym) ! ? strcmp_iw (SYMBOL_DEMANGLED_NAME (sym), name) == 0 ! : strcmp (SYMBOL_NAME (sym), name) == 0) ! { ! if (! mangled_name ! || strcmp (SYMBOL_NAME (sym), mangled_name) == 0) ! return sym; ! } ! } } return NULL; }