* RFA/symtab: (Almost) always hash blocks when searching them
@ 2003-02-09 22:03 Daniel Jacobowitz
2003-02-09 22:37 ` Daniel Jacobowitz
2003-02-10 19:14 ` David Carlton
0 siblings, 2 replies; 3+ messages in thread
From: Daniel Jacobowitz @ 2003-02-09 22:03 UTC (permalink / raw)
To: gdb-patches
I'm working on modifying the symbol lookup functions to return multiple
symbols when there are multiple possible matches, and I didn't want to have
to modify all three kinds of binary search in lookup_block_symbol. First I
tried fixing mdebugread.c to generate hashed blocks properly; it was too
messy, and I couldn't build an mdebug toolchain to test with [mips-ecoff was
my best guess, and it's been broken for months. Part of it was my fault and
then GCC started segfaulting after I fixed that].
So instead, I added a new function to hash a block retroactively. Then, in
lookup_block_symbol, where we would previously have done a binary search we
instead hash the block and do a hash table search. Amortized cost is
somewhat lower, complexity cost is much lower. I like it.
I also updated the comments; the bit about not matching demangled names was
out of date. Symbols are hashed by their demangled name, if any.
Is this patch OK?
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
2003-02-09 Daniel Jacobowitz <drow@mvista.com>
* Makefile.in (symtab.o): Add $(buildsym_h) and $(gdb_assert_h).
* buildsym.c (hash_block): New function.
* buildsym.h: Add prototype for hash_block.
* symtab.c: Include "buildsym.h" and "gdb_assert.h".
(lookup_block_symbol): Use hash_block. Remove binary search.
Update comments.
Index: Makefile.in
===================================================================
RCS file: /cvs/src/src/gdb/Makefile.in,v
retrieving revision 1.328
diff -u -p -r1.328 Makefile.in
--- Makefile.in 7 Feb 2003 05:33:44 -0000 1.328
+++ Makefile.in 9 Feb 2003 21:52:05 -0000
@@ -2216,7 +2216,7 @@ symtab.o: symtab.c $(defs_h) $(symtab_h)
$(gdbcmd_h) $(call_cmds_h) $(gdb_regex_h) $(expression_h) \
$(language_h) $(demangle_h) $(inferior_h) $(linespec_h) \
$(filenames_h) $(gdb_obstack_h) $(gdb_string_h) $(gdb_stat_h) \
- $(cp_abi_h) $(source_h)
+ $(cp_abi_h) $(source_h) $(buildsym_h) $(gdb_assert_h)
target.o: target.c $(defs_h) $(gdb_string_h) $(target_h) $(gdbcmd_h) \
$(symtab_h) $(inferior_h) $(bfd_h) $(symfile_h) $(objfiles_h) \
$(gdb_wait_h) $(dcache_h) $(regcache_h)
Index: buildsym.c
===================================================================
RCS file: /cvs/src/src/gdb/buildsym.c,v
retrieving revision 1.27
diff -u -p -r1.27 buildsym.c
--- buildsym.c 14 Jan 2003 00:15:05 -0000 1.27
+++ buildsym.c 9 Feb 2003 21:52:05 -0000
@@ -203,6 +203,56 @@ free_pending_blocks (void)
pending_blocks = NULL;
}
+/* Convert a block BLOCK with unhashed symbols into a block with
+ hashed symbols.
+
+ All blocks coming out of the debug info readers should either be
+ function blocks (which contain a list of arguments in addition to
+ locals, so we don't hash them [yet] in order to preserve the order
+ of the arguments) or already hashed. However, some older readers
+ don't use finish_block, so they don't create hashed blocks. The only
+ offender as of 2003-02-09 is mdebugread. */
+
+void
+hash_block (struct block *block)
+{
+ int i, nsyms;
+ struct symbol **syms;
+
+ gdb_assert (BLOCK_FUNCTION (block) == NULL);
+ gdb_assert (BLOCK_HASHTABLE (block) == 0);
+
+ nsyms = BLOCK_NSYMS (block);
+
+ /* Save the current list of symbols. */
+ syms = xmalloc (nsyms * sizeof (struct symbol *));
+ memcpy (syms, block->sym, nsyms * sizeof (struct symbol *));
+ memset (block->sym, 0, nsyms * sizeof (struct symbol *));
+
+ BLOCK_HASHTABLE (block) = 1;
+
+ /* Just reuse the already-allocated symbol pointers as our hashtable.
+ Normally we'd use a smaller hash table than this, but reclaiming
+ the memory at this point is hard. */
+ BLOCK_BUCKETS (block) = nsyms;
+
+ for (i = 0; i < nsyms; i++)
+ {
+ struct symbol *sym;
+ unsigned int hash_index;
+ const char *name = SYMBOL_DEMANGLED_NAME (syms[i]);
+ if (name == NULL)
+ name = SYMBOL_NAME (syms[i]);
+ hash_index = msymbol_hash_iw (name);
+ hash_index = hash_index % BLOCK_BUCKETS (block);
+ sym = BLOCK_BUCKET (block, hash_index);
+ BLOCK_BUCKET (block, hash_index) = syms[i];
+ syms[i]->hash_next = sym;
+ }
+
+ xfree (syms);
+}
+
/* Take one of the lists of symbols and make a block from it. Keep
the order the symbols have in the list (reversed from the input
file). Put the block on the list of pending blocks. */
Index: buildsym.h
===================================================================
RCS file: /cvs/src/src/gdb/buildsym.h,v
retrieving revision 1.10
diff -u -p -r1.10 buildsym.h
--- buildsym.h 9 Jan 2003 18:30:32 -0000 1.10
+++ buildsym.h 9 Feb 2003 21:52:05 -0000
@@ -227,6 +227,8 @@ extern void add_symbol_to_list (struct s
extern struct symbol *find_symbol_in_list (struct pending *list,
char *name, int length);
+extern void hash_block (struct block *block);
+
extern void finish_block (struct symbol *symbol,
struct pending **listhead,
struct pending_block *old_blocks,
Index: symtab.c
===================================================================
RCS file: /cvs/src/src/gdb/symtab.c,v
retrieving revision 1.87
diff -u -p -r1.87 symtab.c
--- symtab.c 4 Feb 2003 18:07:01 -0000 1.87
+++ symtab.c 9 Feb 2003 21:52:06 -0000
@@ -40,17 +40,17 @@
#include "linespec.h"
#include "source.h"
#include "filenames.h" /* for FILENAME_CMP */
+#include "cp-abi.h"
+#include "buildsym.h"
#include "hashtab.h"
-
#include "gdb_obstack.h"
-
#include <sys/types.h>
#include <fcntl.h>
#include "gdb_string.h"
#include "gdb_stat.h"
#include <ctype.h>
-#include "cp-abi.h"
+#include "gdb_assert.h"
/* Prototypes for local functions */
@@ -1572,30 +1572,23 @@ find_main_psymtab (void)
return (NULL);
}
-/* Search BLOCK for symbol NAME in NAMESPACE.
-
- Note that if NAME is the demangled form of a C++ symbol, we will fail
- to find a match during the binary search of the non-encoded names, but
- for now we don't worry about the slight inefficiency of looking for
- a match we'll never find, since it will go pretty quick. Once the
- binary search terminates, we drop through and do a straight linear
- search on the symbols. Each symbol which is marked as being a C++
- symbol (language_cplus set) has both the encoded and non-encoded names
- tested for a match.
-
- If MANGLED_NAME is non-NULL, verify that any symbol we find has this
- particular mangled name.
-*/
+/* Search BLOCK for symbol NAME in NAMESPACE. If MANGLED_NAME is non-NULL,
+ verify that any symbol we find has this particular mangled name. */
struct symbol *
lookup_block_symbol (register const struct block *block, const char *name,
const char *mangled_name,
const namespace_enum namespace)
{
- register int bot, top, inc;
- register struct symbol *sym;
- register struct symbol *sym_found = NULL;
- register int do_linear_search = 1;
+ int bot, top;
+ struct symbol *sym;
+ struct symbol *sym_found = NULL;
+
+ if (BLOCK_SHOULD_SORT (block))
+ {
+ hash_block (block);
+ gdb_assert (BLOCK_HASHTABLE (block));
+ }
if (BLOCK_HASHTABLE (block))
{
@@ -1613,88 +1606,8 @@ lookup_block_symbol (register const stru
return NULL;
}
- /* If the blocks's symbols were sorted, start with a binary search. */
-
- if (BLOCK_SHOULD_SORT (block))
- {
- /* Reset the linear search flag so if the binary search fails, we
- won't do the linear search once unless we find some reason to
- do so */
-
- do_linear_search = 0;
- top = BLOCK_NSYMS (block);
- bot = 0;
-
- /* Advance BOT to not far before the first symbol whose name is NAME. */
-
- while (1)
- {
- inc = (top - bot + 1);
- /* No need to keep binary searching for the last few bits worth. */
- if (inc < 4)
- {
- break;
- }
- inc = (inc >> 1) + bot;
- sym = BLOCK_SYM (block, inc);
- if (!do_linear_search && (SYMBOL_LANGUAGE (sym) == language_java))
- {
- do_linear_search = 1;
- }
- if (SYMBOL_SOURCE_NAME (sym)[0] < name[0])
- {
- bot = inc;
- }
- else if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
- {
- top = inc;
- }
- else if (strcmp (SYMBOL_SOURCE_NAME (sym), name) < 0)
- {
- bot = inc;
- }
- else
- {
- top = inc;
- }
- }
-
- /* Now scan forward until we run out of symbols, find one whose
- name is greater than NAME, or find one we want. If there is
- more than one symbol with the right name and namespace, we
- return the first one; I believe it is now impossible for us
- to encounter two symbols with the same name and namespace
- here, because blocks containing argument symbols are no
- longer sorted. The exception is for C++, where multiple functions
- (cloned constructors / destructors, in particular) can have
- the same demangled name. So if we have a particular
- mangled name to match, try to do so. */
-
- top = BLOCK_NSYMS (block);
- while (bot < top)
- {
- sym = BLOCK_SYM (block, bot);
- if (SYMBOL_NAMESPACE (sym) == namespace
- && (mangled_name
- ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
- : SYMBOL_MATCHES_NAME (sym, name)))
- {
- return sym;
- }
- if (SYMBOL_SOURCE_NAME (sym)[0] > name[0])
- {
- break;
- }
- bot++;
- }
- }
-
- /* Here if block isn't sorted, or we fail to find a match during the
- binary search above. If during the binary search above, we find a
- symbol which is a Java symbol, then we have re-enabled the linear
- search flag which was reset when starting the binary search.
-
- This loop is equivalent to the loop above, but hacked greatly for speed.
+ /* If a block isn't hashable, we do a linear search here. Right now
+ this means only for blocks associated with a function.
Note that parameter symbols do not always show up last in the
list; this loop makes sure to take anything else other than
@@ -1702,55 +1615,53 @@ lookup_block_symbol (register const stru
last resort. Note that this only takes up extra computation
time on a match. */
- if (do_linear_search)
- {
- top = BLOCK_NSYMS (block);
- bot = 0;
- while (bot < top)
+ top = BLOCK_NSYMS (block);
+ bot = 0;
+ while (bot < top)
+ {
+ sym = BLOCK_SYM (block, bot);
+ if (SYMBOL_NAMESPACE (sym) == namespace
+ && (mangled_name
+ ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
+ : SYMBOL_MATCHES_NAME (sym, name)))
{
- sym = BLOCK_SYM (block, bot);
- if (SYMBOL_NAMESPACE (sym) == namespace
- && (mangled_name
- ? strcmp (SYMBOL_NAME (sym), mangled_name) == 0
- : SYMBOL_MATCHES_NAME (sym, name)))
+ /* If SYM has aliases, then use any alias that is active
+ at the current PC. If no alias is active at the current
+ PC, then use the main symbol.
+
+ ?!? Is checking the current pc correct? Is this routine
+ ever called to look up a symbol from another context?
+
+ FIXME: No, it's not correct. If someone sets a
+ conditional breakpoint at an address, then the
+ breakpoint's `struct expression' should refer to the
+ `struct symbol' appropriate for the breakpoint's
+ address, which may not be the PC.
+
+ Even if it were never called from another context,
+ it's totally bizarre for lookup_symbol's behavior to
+ depend on the value of the inferior's current PC. We
+ should pass in the appropriate PC as well as the
+ block. The interface to lookup_symbol should change
+ to require the caller to provide a PC. */
+
+ if (SYMBOL_ALIASES (sym))
+ sym = find_active_alias (sym, read_pc ());
+
+ sym_found = sym;
+ if (SYMBOL_CLASS (sym) != LOC_ARG &&
+ SYMBOL_CLASS (sym) != LOC_LOCAL_ARG &&
+ SYMBOL_CLASS (sym) != LOC_REF_ARG &&
+ SYMBOL_CLASS (sym) != LOC_REGPARM &&
+ SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR &&
+ SYMBOL_CLASS (sym) != LOC_BASEREG_ARG)
{
- /* If SYM has aliases, then use any alias that is active
- at the current PC. If no alias is active at the current
- PC, then use the main symbol.
-
- ?!? Is checking the current pc correct? Is this routine
- ever called to look up a symbol from another context?
-
- FIXME: No, it's not correct. If someone sets a
- conditional breakpoint at an address, then the
- breakpoint's `struct expression' should refer to the
- `struct symbol' appropriate for the breakpoint's
- address, which may not be the PC.
-
- Even if it were never called from another context,
- it's totally bizarre for lookup_symbol's behavior to
- depend on the value of the inferior's current PC. We
- should pass in the appropriate PC as well as the
- block. The interface to lookup_symbol should change
- to require the caller to provide a PC. */
-
- if (SYMBOL_ALIASES (sym))
- sym = find_active_alias (sym, read_pc ());
-
- sym_found = sym;
- if (SYMBOL_CLASS (sym) != LOC_ARG &&
- SYMBOL_CLASS (sym) != LOC_LOCAL_ARG &&
- SYMBOL_CLASS (sym) != LOC_REF_ARG &&
- SYMBOL_CLASS (sym) != LOC_REGPARM &&
- SYMBOL_CLASS (sym) != LOC_REGPARM_ADDR &&
- SYMBOL_CLASS (sym) != LOC_BASEREG_ARG)
- {
- break;
- }
+ break;
}
- bot++;
}
+ bot++;
}
+
return (sym_found); /* Will be NULL if not found. */
}
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: RFA/symtab: (Almost) always hash blocks when searching them
2003-02-09 22:03 RFA/symtab: (Almost) always hash blocks when searching them Daniel Jacobowitz
@ 2003-02-09 22:37 ` Daniel Jacobowitz
2003-02-10 19:14 ` David Carlton
1 sibling, 0 replies; 3+ messages in thread
From: Daniel Jacobowitz @ 2003-02-09 22:37 UTC (permalink / raw)
To: gdb-patches
On Sun, Feb 09, 2003 at 05:03:21PM -0500, Daniel Jacobowitz wrote:
> I'm working on modifying the symbol lookup functions to return multiple
> symbols when there are multiple possible matches, and I didn't want to have
> to modify all three kinds of binary search in lookup_block_symbol. First I
> tried fixing mdebugread.c to generate hashed blocks properly; it was too
> messy, and I couldn't build an mdebug toolchain to test with [mips-ecoff was
> my best guess, and it's been broken for months. Part of it was my fault and
> then GCC started segfaulting after I fixed that].
>
> So instead, I added a new function to hash a block retroactively. Then, in
> lookup_block_symbol, where we would previously have done a binary search we
> instead hash the block and do a hash table search. Amortized cost is
> somewhat lower, complexity cost is much lower. I like it.
>
> I also updated the comments; the bit about not matching demangled names was
> out of date. Symbols are hashed by their demangled name, if any.
>
> Is this patch OK?
Never mind for now. I'm solving my problem in a different way, and I
didn't notice the constness issue with this patch before I posted it.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: RFA/symtab: (Almost) always hash blocks when searching them
2003-02-09 22:03 RFA/symtab: (Almost) always hash blocks when searching them Daniel Jacobowitz
2003-02-09 22:37 ` Daniel Jacobowitz
@ 2003-02-10 19:14 ` David Carlton
1 sibling, 0 replies; 3+ messages in thread
From: David Carlton @ 2003-02-10 19:14 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb-patches
On Sun, 9 Feb 2003 17:03:21 -0500, Daniel Jacobowitz <drow@mvista.com> said:
> I'm working on modifying the symbol lookup functions to return
> multiple symbols when there are multiple possible matches, and I
> didn't want to have to modify all three kinds of binary search in
> lookup_block_symbol. First I tried fixing mdebugread.c to generate
> hashed blocks properly; it was too messy, and I couldn't build an
> mdebug toolchain to test with [mips-ecoff was my best guess, and
> it's been broken for months. Part of it was my fault and then GCC
> started segfaulting after I fixed that].
For what it's worth (I do realize you've withdrawn your patch), I'm
hoping to submit an RFC by the end of February for a plan that will
include getting rid of sorted linear blocks. I'm not up for
buildsymifying mdebugread.c, but I'm up for doing that amount of
cleanup.
I haven't looked at your patch, but one piece of trivia that you might
want to be aware of if my RFC is rejected and if you have to deal with
this again is that Java code will create one block that is a
non-sorted linear block but that isn't associated to a function. It's
easy to screw that one up without realizing it; if you do, Tom Tromey
will get mad at you. :-)
David Carlton
carlton@math.stanford.edu
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2003-02-10 19:14 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-02-09 22:03 RFA/symtab: (Almost) always hash blocks when searching them Daniel Jacobowitz
2003-02-09 22:37 ` Daniel Jacobowitz
2003-02-10 19:14 ` David Carlton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox