From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 23543 invoked by alias); 2 Oct 2002 17:45:39 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 23520 invoked from network); 2 Oct 2002 17:45:32 -0000 Received: from unknown (HELO jackfruit.Stanford.EDU) (171.64.38.136) by sources.redhat.com with SMTP; 2 Oct 2002 17:45:32 -0000 Received: (from carlton@localhost) by jackfruit.Stanford.EDU (8.11.6/8.11.6) id g92HjOx01172; Wed, 2 Oct 2002 10:45:24 -0700 X-Authentication-Warning: jackfruit.Stanford.EDU: carlton set sender to carlton@math.stanford.edu using -f To: Jim Blandy Cc: gdb-patches@sources.redhat.com Subject: Re: RFA: Search for symbol names the same way they're hashed. References: <200210020329.g923TE702388@zenia.red-bean.com> From: David Carlton Date: Wed, 02 Oct 2002 10:45:00 -0000 In-Reply-To: <200210020329.g923TE702388@zenia.red-bean.com> Message-ID: User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-SW-Source: 2002-10/txt/msg00060.txt.bz2 On Tue, 1 Oct 2002 22:29:14 -0500, Jim Blandy said: > But when MANGLED_NAME is zero, then the SYMBOL_MATCHES_NAME part is > questionable. The definition of SYMBOL_MATCHES_NAME from symtab.h is > as follows: > #define SYMBOL_MATCHES_NAME(symbol, name) \ > (STREQ (SYMBOL_NAME (symbol), (name)) \ > || (SYMBOL_DEMANGLED_NAME (symbol) != NULL \ > && strcmp_iw (SYMBOL_DEMANGLED_NAME (symbol), (name)) == 0)) > It returns true if NAME matches either SYMBOL_NAME or > SYMBOL_DEMANGLED_NAME. > If the intention really was to use SYMBOL_MATCHES_NAME to recognize > matches, then the code is broken. If SYMBOL_NAME (sym) matches > NAME, and sym has a demangled name which is different from NAME > (which will usually be the case), SYMBOL_MATCHES_NAME (sym, name) > will be true, but finish_block will have hashed on the demangled > name, and probably will have filed sym in a different hash bucket > than the one we'll search. I was thinking about this some last week, albeit from a slightly different angle, and I agree with you that this doesn't work well. My attitude is that, in lookup_block_symbol, 'name' should be the demangled form of the name. (After all, there's a separate 'mangled_name' argument, not a separate 'demangled_name' argument.) Assuming of course that we're in the C++ case where there's a difference between mangled and demangled names, then using the current definition of SYMBOL_MATCHES_NAME is wrong: it first does STREQ (SYMBOL_NAME (symbol), (name) which tests 'name' against the _mangled_ form of the name, which we never want to do, so this could lead to false positives. It's not very likely to run into false positives, of course, because a demangled name should never look like a mangled name, but why run the risk? (Though I hadn't thought of the interaction with hashing that you brought up: that certainly decreases the likelihood of false positives.) My current solution is perhaps a bit hard to read from the sources on my branch, but basically it boils down to this: 1) This concept of 'a name that is as demangled as possible' is a pretty important one and occurs in multiple places in GDB's sources, so let's formalize it: /* Macro that returns the demangled name of the symbol if if possible and the symbol name if not possible. This is like SYMBOL_SOURCE_NAME except that it doesn't depend on the value of 'demangle' (and is hence more suitable for internal usage). The result should never be NULL. */ /* FIXME: carlton/2002-09-26: Probably the situation with this and SYMBOL_SOURCE_NAME should be rethought. */ #define SYMBOL_BEST_NAME(symbol) \ (SYMBOL_DEMANGLED_NAME (symbol) != NULL \ ? SYMBOL_DEMANGLED_NAME (symbol) \ : SYMBOL_NAME (symbol)) (I'm not exactly wedded to the name 'SYMBOL_BEST_NAME'; it was the first acceptable name that I thought of, given that SYMBOL_SOURCE_NAME was already taken to mean something subtly different.) I then went and inserted this macro in some (but by no means all: I haven't yet had time to do an exhaustive search) of all the places in GDB where it would fit. E.g. the definition for SYMBOL_SOURCE_NAME now becomes /* Macro that returns the "natural source name" of a symbol. In C++ this is the "demangled" form of the name if demangle is on and the "mangled" form of the name if demangle is off. In other languages this is just the symbol name. The result should never be NULL. */ /* NOTE: carlton/2002-09-26: For external use only; in many situations, SYMBOL_BEST_NAME is more appropriate. */ #define SYMBOL_SOURCE_NAME(symbol) \ (demangle ? SYMBOL_BEST_NAME (symbol) : SYMBOL_NAME (symbol)) or, when I'm hashing symbol names to build the hash table, I do hash_index = msymbol_hash_iw (SYMBOL_BEST_NAME (sym)) % nbuckets; 2) So then what happens to lookup_block_symbol? When looking up names in a hash table, I find the index just like I do when building it. Then, to test whether or not I've found the right symbol, if 'mangled_name' is nonzero, I do strcmp (SYMBOL_NAME (sym), mangled_name) and otherwise, I do strcmp_iw (SYMBOL_BEST_NAME (sym), name) Some further thoughts: It seems entirely plausible to me that SYMBOL_MATCHES_NAME should really be defined as follows: #define SYMBOL_MATCHES_NAME(symbol, name) \ (strcmp_iw (SYMBOL_BEST_NAME (sym), name) == 0) But I haven't seriously looked at the other places where SYMBOL_MATCHES_NAME is being used. I suppose another plausible option would be #define SYMBOL_MATCHES_NAME(symbol, name) \ (SYMBOL_DEMANGLED_NAME (sym) != NULL ? strcmp_iw (SYMBOL_DEMANGLED_NAME (sym), name) == 0 : strcmp (SYMBOL_NAME (sym), name) == 0) (This is effectively what you're doing in your patch to lookup_block_symbol.) The difference here would be that we use strcmp in the non-C++ case instead of strcmp_iw. My guess is that, in practice, it wouldn't make a difference in the non-C++ case (since symbol names shouldn't contain spaces or parentheses) and, if it somehow did make a difference, then one could make a good case for strcmp_iw being the right thing. Given that I want to encourage usage of SYMBOL_BEST_NAME wherever possible, I prefer the first definition of SYMBOL_MATCHES_NAME. If people agree with me that introducing a macro like SYMBOL_BEST_NAME would clarify matters here and elsewhere, I could spend some more time looking at where it could be used and submit a patch that introduces it. David Carlton carlton@math.stanford.edu