From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14289 invoked by alias); 9 Feb 2003 22:03:25 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 14282 invoked from network); 9 Feb 2003 22:03:24 -0000 Received: from unknown (HELO crack.them.org) (65.125.64.184) by 172.16.49.205 with SMTP; 9 Feb 2003 22:03:24 -0000 Received: from nevyn.them.org ([66.93.61.169] ident=mail) by crack.them.org with asmtp (Exim 3.12 #1 (Debian)) id 18i1Qs-0006R1-00 for ; Sun, 09 Feb 2003 18:04:22 -0600 Received: from drow by nevyn.them.org with local (Exim 3.36 #1 (Debian)) id 18hzXm-00056a-00 for ; Sun, 09 Feb 2003 17:03:22 -0500 Date: Sun, 09 Feb 2003 22:03:00 -0000 From: Daniel Jacobowitz To: gdb-patches@sources.redhat.com Subject: RFA/symtab: (Almost) always hash blocks when searching them Message-ID: <20030209220321.GA19572@nevyn.them.org> Mail-Followup-To: gdb-patches@sources.redhat.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.1i X-SW-Source: 2003-02/txt/msg00253.txt.bz2 I'm working on modifying the symbol lookup functions to return multiple symbols when there are multiple possible matches, and I didn't want to have to modify all three kinds of binary search in lookup_block_symbol. First I tried fixing mdebugread.c to generate hashed blocks properly; it was too messy, and I couldn't build an mdebug toolchain to test with [mips-ecoff was my best guess, and it's been broken for months. Part of it was my fault and then GCC started segfaulting after I fixed that]. So instead, I added a new function to hash a block retroactively. Then, in lookup_block_symbol, where we would previously have done a binary search we instead hash the block and do a hash table search. Amortized cost is somewhat lower, complexity cost is much lower. I like it. I also updated the comments; the bit about not matching demangled names was out of date. Symbols are hashed by their demangled name, if any. Is this patch OK? -- Daniel Jacobowitz MontaVista Software Debian GNU/Linux Developer 2003-02-09 Daniel Jacobowitz * Makefile.in (symtab.o): Add $(buildsym_h) and $(gdb_assert_h). * buildsym.c (hash_block): New function. * buildsym.h: Add prototype for hash_block. * symtab.c: Include "buildsym.h" and "gdb_assert.h". (lookup_block_symbol): Use hash_block. Remove binary search. Update comments. Index: Makefile.in =================================================================== RCS file: /cvs/src/src/gdb/Makefile.in,v retrieving revision 1.328 diff -u -p -r1.328 Makefile.in --- Makefile.in 7 Feb 2003 05:33:44 -0000 1.328 +++ Makefile.in 9 Feb 2003 21:52:05 -0000 @@ -2216,7 +2216,7 @@ symtab.o: symtab.c $(defs_h) $(symtab_h) $(gdbcmd_h) $(call_cmds_h) $(gdb_regex_h) $(expression_h) \ $(language_h) $(demangle_h) $(inferior_h) $(linespec_h) \ $(filenames_h) $(gdb_obstack_h) $(gdb_string_h) $(gdb_stat_h) \ - $(cp_abi_h) $(source_h) + $(cp_abi_h) $(source_h) $(buildsym_h) $(gdb_assert_h) target.o: target.c $(defs_h) $(gdb_string_h) $(target_h) $(gdbcmd_h) \ $(symtab_h) $(inferior_h) $(bfd_h) $(symfile_h) $(objfiles_h) \ $(gdb_wait_h) $(dcache_h) $(regcache_h) Index: buildsym.c =================================================================== RCS file: /cvs/src/src/gdb/buildsym.c,v retrieving revision 1.27 diff -u -p -r1.27 buildsym.c --- buildsym.c 14 Jan 2003 00:15:05 -0000 1.27 +++ buildsym.c 9 Feb 2003 21:52:05 -0000 @@ -203,6 +203,56 @@ free_pending_blocks (void) pending_blocks = NULL; } +/* Convert a block BLOCK with unhashed symbols into a block with + hashed symbols. + + All blocks coming out of the debug info readers should either be + function blocks (which contain a list of arguments in addition to + locals, so we don't hash them [yet] in order to preserve the order + of the arguments) or already hashed. However, some older readers + don't use finish_block, so they don't create hashed blocks. The only + offender as of 2003-02-09 is mdebugread. */ + +void +hash_block (struct block *block) +{ + int i, nsyms; + struct symbol **syms; + + gdb_assert (BLOCK_FUNCTION (block) == NULL); + gdb_assert (BLOCK_HASHTABLE (block) == 0); + + nsyms = BLOCK_NSYMS (block); + + /* Save the current list of symbols. */ + syms = xmalloc (nsyms * sizeof (struct symbol *)); + memcpy (syms, block->sym, nsyms * sizeof (struct symbol *)); + memset (block->sym, 0, nsyms * sizeof (struct symbol *)); + + BLOCK_HASHTABLE (block) = 1; + + /* Just reuse the already-allocated symbol pointers as our hashtable. + Normally we'd use a smaller hash table than this, but reclaiming + the memory at this point is hard. */ + BLOCK_BUCKETS (block) = nsyms; + + for (i = 0; i < nsyms; i++) + { + struct symbol *sym; + unsigned int hash_index; + const char *name = SYMBOL_DEMANGLED_NAME (syms[i]); + if (name == NULL) + name = SYMBOL_NAME (syms[i]); + hash_index = msymbol_hash_iw (name); + hash_index = hash_index % BLOCK_BUCKETS (block); + sym = BLOCK_BUCKET (block, hash_index); + BLOCK_BUCKET (block, hash_index) = syms[i]; + syms[i]->hash_next = sym; + } + + xfree (syms); +} + /* Take one of the lists of symbols and make a block from it. Keep the order the symbols have in the list (reversed from the input file). Put the block on the list of pending blocks. */ Index: buildsym.h =================================================================== RCS file: /cvs/src/src/gdb/buildsym.h,v retrieving revision 1.10 diff -u -p -r1.10 buildsym.h --- buildsym.h 9 Jan 2003 18:30:32 -0000 1.10 +++ buildsym.h 9 Feb 2003 21:52:05 -0000 @@ -227,6 +227,8 @@ extern void add_symbol_to_list (struct s extern struct symbol *find_symbol_in_list (struct pending *list, char *name, int length); +extern void hash_block (struct block *block); + extern void finish_block (struct symbol *symbol, struct pending **listhead, struct pending_block *old_blocks, Index: symtab.c =================================================================== RCS file: /cvs/src/src/gdb/symtab.c,v retrieving revision 1.87 diff -u -p -r1.87 symtab.c --- symtab.c 4 Feb 2003 18:07:01 -0000 1.87 +++ symtab.c 9 Feb 2003 21:52:06 -0000 @@ -40,17 +40,17 @@ #include "linespec.h" #include "source.h" #include "filenames.h" /* for FILENAME_CMP */ +#include "cp-abi.h" +#include "buildsym.h" #include "hashtab.h" - #include "gdb_obstack.h" - #include #include #include "gdb_string.h" #include "gdb_stat.h" #include -#include "cp-abi.h" +#include "gdb_assert.h" /* Prototypes for local functions */ @@ -1572,30 +1572,23 @@ find_main_psymtab (void) return (NULL); } -/* Search BLOCK for symbol NAME in NAMESPACE. - - Note that if NAME is the demangled form of a C++ symbol, we will fail - to find a match during the binary search of the non-encoded names, but - for now we don't worry about the slight inefficiency of looking for - a match we'll never find, since it will go pretty quick. Once the - binary search terminates, we drop through and do a straight linear - search on the symbols. Each symbol which is marked as being a C++ - symbol (language_cplus set) has both the encoded and non-encoded names - tested for a match. - - If MANGLED_NAME is non-NULL, verify that any symbol we find has this - particular mangled name. -*/ +/* Search BLOCK for symbol NAME in NAMESPACE. If MANGLED_NAME is non-NULL, + verify that any symbol we find has this particular mangled name. */ struct symbol * lookup_block_symbol (register const struct block *block, const char *name, const char *mangled_name, const namespace_enum namespace) { - register int bot, top, inc; - register struct symbol *sym; - register struct symbol *sym_found = NULL; - register int do_linear_search = 1; + int bot, top; + struct symbol *sym; + struct symbol *sym_found = NULL; + + if (BLOCK_SHOULD_SORT (block)) + { + hash_block (block); + gdb_assert (BLOCK_HASHTABLE (block)); + } if (BLOCK_HASHTABLE (block)) { @@ -1613,88 +1606,8 @@ lookup_block_symbol (register const stru return NULL; } - /* If the blocks's symbols were sorted, start with a binary search. */ - - if (BLOCK_SHOULD_SORT (block)) - { - /* Reset the linear search flag so if the binary search fails, we - won't do the linear search once unless we find some reason to - do so */ - - do_linear_search = 0; - top = BLOCK_NSYMS (block); - bot = 0; - - /* Advance BOT to not far before the first symbol whose name is NAME. */ - - while (1) - { - inc = (top - bot + 1); - /* No need to keep binary searching for the last few bits worth. */ - if (inc < 4) - { - break; - } - inc = (inc >> 1) + bot; - sym = BLOCK_SYM (block, inc); - if (!do_linear_search && (SYMBOL_LANGUAGE (sym) == language_java)) - { - do_linear_search = 1; - } - if (SYMBOL_SOURCE_NAME (sym)[0] < name[0]) - { - bot = inc; - } - else if (SYMBOL_SOURCE_NAME (sym)[0] > name[0]) - { - top = inc; - } - else if (strcmp (SYMBOL_SOURCE_NAME (sym), name) < 0) - { - bot = inc; - } - else - { - top = inc; - } - } - - /* Now scan forward until we run out of symbols, find one whose - name is greater than NAME, or find one we want. If there is - more than one symbol with the right name and namespace, we - return the first one; I believe it is now impossible for us - to encounter two symbols with the same name and namespace - here, because blocks containing argument symbols are no - longer sorted. The exception is for C++, where multiple functions - (cloned constructors / destructors, in particular) can have - the same demangled name. So if we have a particular - mangled name to match, try to do so. */ - - top = BLOCK_NSYMS (block); - while (bot < top) - { - sym = BLOCK_SYM (block, bot); - if (SYMBOL_NAMESPACE (sym) == namespace - && (mangled_name - ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0 - : SYMBOL_MATCHES_NAME (sym, name))) - { - return sym; - } - if (SYMBOL_SOURCE_NAME (sym)[0] > name[0]) - { - break; - } - bot++; - } - } - - /* Here if block isn't sorted, or we fail to find a match during the - binary search above. If during the binary search above, we find a - symbol which is a Java symbol, then we have re-enabled the linear - search flag which was reset when starting the binary search. - - This loop is equivalent to the loop above, but hacked greatly for speed. + /* If a block isn't hashable, we do a linear search here. Right now + this means only for blocks associated with a function. Note that parameter symbols do not always show up last in the list; this loop makes sure to take anything else other than @@ -1702,55 +1615,53 @@ lookup_block_symbol (register const stru last resort. Note that this only takes up extra computation time on a match. */ - if (do_linear_search) - { - top = BLOCK_NSYMS (block); - bot = 0; - while (bot < top) + top = BLOCK_NSYMS (block); + bot = 0; + while (bot < top) + { + sym = BLOCK_SYM (block, bot); + if (SYMBOL_NAMESPACE (sym) == namespace + && (mangled_name + ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0 + : SYMBOL_MATCHES_NAME (sym, name))) { - sym = BLOCK_SYM (block, bot); - if (SYMBOL_NAMESPACE (sym) == namespace - && (mangled_name - ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0 - : SYMBOL_MATCHES_NAME (sym, name))) + /* If SYM has aliases, then use any alias that is active + at the current PC. If no alias is active at the current + PC, then use the main symbol. + + ?!? Is checking the current pc correct? Is this routine + ever called to look up a symbol from another context? + + FIXME: No, it's not correct. If someone sets a + conditional breakpoint at an address, then the + breakpoint's `struct expression' should refer to the + `struct symbol' appropriate for the breakpoint's + address, which may not be the PC. + + Even if it were never called from another context, + it's totally bizarre for lookup_symbol's behavior to + depend on the value of the inferior's current PC. We + should pass in the appropriate PC as well as the + block. The interface to lookup_symbol should change + to require the caller to provide a PC. */ + + if (SYMBOL_ALIASES (sym)) + sym = find_active_alias (sym, read_pc ()); + + sym_found = sym; + if (SYMBOL_CLASS (sym) != LOC_ARG && + SYMBOL_CLASS (sym) != LOC_LOCAL_ARG && + SYMBOL_CLASS (sym) != LOC_REF_ARG && + SYMBOL_CLASS (sym) != LOC_REGPARM && + SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR && + SYMBOL_CLASS (sym) != LOC_BASEREG_ARG) { - /* If SYM has aliases, then use any alias that is active - at the current PC. If no alias is active at the current - PC, then use the main symbol. - - ?!? Is checking the current pc correct? Is this routine - ever called to look up a symbol from another context? - - FIXME: No, it's not correct. If someone sets a - conditional breakpoint at an address, then the - breakpoint's `struct expression' should refer to the - `struct symbol' appropriate for the breakpoint's - address, which may not be the PC. - - Even if it were never called from another context, - it's totally bizarre for lookup_symbol's behavior to - depend on the value of the inferior's current PC. We - should pass in the appropriate PC as well as the - block. The interface to lookup_symbol should change - to require the caller to provide a PC. */ - - if (SYMBOL_ALIASES (sym)) - sym = find_active_alias (sym, read_pc ()); - - sym_found = sym; - if (SYMBOL_CLASS (sym) != LOC_ARG && - SYMBOL_CLASS (sym) != LOC_LOCAL_ARG && - SYMBOL_CLASS (sym) != LOC_REF_ARG && - SYMBOL_CLASS (sym) != LOC_REGPARM && - SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR && - SYMBOL_CLASS (sym) != LOC_BASEREG_ARG) - { - break; - } + break; } - bot++; } + bot++; } + return (sym_found); /* Will be NULL if not found. */ }