the global symbol table

Mirror of the gdb mailing list

* the global symbol table
From: David Carlton @ 2002-09-25 12:59 UTC
  To: gdb

I'm trying to get a handle on the global symbol table, and on what
would be a good way to reorganize its data in order to make
lookup_symbol (and related functions, e.g. search_symbol) faster and
to provide some sort of data structure that could also be used in
namespace searches.

I'm not sure that I have any really coherent questions, so I'm going
to free-associate for a while, and see what thoughts that jars loose
from other people's brains.  I've listed the main issues that I'm
worried about below.

As a reminder, here's what currently happens if you call lookup_symbol
and it doesn't find a local match somewhere:

* It searches the global blocks of all the symtabs.  If it finds a
  match, it calls fixup_symbol_section (to get the bfd_section and
  section set correctly; this happens in the other cases below, too)
  and returns the symbol.
* Then, #ifndef HPUXHPPA, it sometimes looks in the minimal symbol
  table.  If it finds something, it looks for it in the global and
  static blocks of the corresponding symtab.  (But why wouldn't it
  have already found the global match?  Hmm.)  Otherwise, it sometimes
  does some sort of mangled name lookup.
* Next, it looks in all the psymtabs.  If it finds a match, it
  converts the psymtab to a symtab, and returns the corresponding
  symbol there.
* Next, it looks in the static symbols for all of the symtabs.  (To
  what extent does this overlap with the lookup_minimal_symbol stuff
  earlier?)
* Then, it looks in the static symbols for all the psymtabs.
* And finally, #ifdef HPUXHPPA, it does the minimal symbol stuff that
  it otherwise did earlier.
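
As a rough illustration of the fallback order listed above (not GDB
code): the stage names, fallback_order, first_matching_stage and the
example probe are all hypothetical, and the "sometimes" conditions and
the mangled-name lookup are left out.  The only point is that the
minimal symbol step moves from second place to last when HPUXHPPA is
defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stage identifiers; these are not GDB names.  */
enum lookup_stage
{
  STAGE_GLOBAL_SYMTABS,
  STAGE_MINIMAL_SYMBOLS,
  STAGE_GLOBAL_PSYMTABS,
  STAGE_STATIC_SYMTABS,
  STAGE_STATIC_PSYMTABS,
  STAGE_COUNT
};

/* Fill ORDER with the sequence of stages tried after the local
   lookup fails.  HPUXHPPA is nonzero to model the #ifdef case.  */
void
fallback_order (int hpuxhppa, enum lookup_stage order[STAGE_COUNT])
{
  size_t n = 0;

  order[n++] = STAGE_GLOBAL_SYMTABS;
  if (!hpuxhppa)
    order[n++] = STAGE_MINIMAL_SYMBOLS;
  order[n++] = STAGE_GLOBAL_PSYMTABS;
  order[n++] = STAGE_STATIC_SYMTABS;
  order[n++] = STAGE_STATIC_PSYMTABS;
  if (hpuxhppa)
    order[n++] = STAGE_MINIMAL_SYMBOLS;
  assert (n == STAGE_COUNT);
}

/* A probe returns nonzero if NAME is found at STAGE.  */
typedef int (*stage_probe) (enum lookup_stage stage, const char *name);

/* Return the first stage at which PROBE finds NAME, or STAGE_COUNT
   if no stage finds it.  */
enum lookup_stage
first_matching_stage (int hpuxhppa, const char *name, stage_probe probe)
{
  enum lookup_stage order[STAGE_COUNT];

  fallback_order (hpuxhppa, order);
  for (size_t i = 0; i < STAGE_COUNT; i++)
    if (probe (order[i], name))
      return order[i];
  return STAGE_COUNT;
}

/* Example probe: pretend "helper" exists only among the static
   symbols of some psymtab.  */
int
probe_static_psymtab_only (enum lookup_stage stage, const char *name)
{
  return stage == STAGE_STATIC_PSYMTABS && strcmp (name, "helper") == 0;
}
```

With this probe, a lookup of "helper" has to fall through every
earlier stage before it succeeds, which is the cost the issues below
are concerned with.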

This raises some issues:

* Do we _really_ want to search static symbols?  My guess is that the
  answer is 'yes', but it's not entirely clear to me that that's the
  case.  Certainly whenever we do a search that returns results that
  aren't strictly in line with what the language says, I get nervous.

* Even the simplest case of searching all the global blocks isn't
  implemented very well: surely we need some sort of fast lookup
  structure that won't require us to look at every single symtab.
  Assuming that we have some sort of expandable data structure in
  which lookups are fast, then exactly how should this work?  Should
  we replace the current global blocks with one big global table,
  should we have a global table that duplicates the information in the
  global blocks (so struct symbols corresponding to global symbols
  would be stored in two separate places), or should we have a global
  table that quickly maps names to symtabs, but have the actual
  symbols only stored in the global blocks of symtabs?  I'm currently
  leaning towards the second solution, with the possible longer-term
  goal of migrating towards the first solution, but it's not clear to
  me exactly what the consequences of these choices are.

* Which of the structs symbol, partial_symbol, and minimal_symbol can
  be unified?  I tend to think that partial_symbols should go away, to
  be replaced by special 'incomplete' sorts of struct symbols combined
  with code in lookup_symbol (or wherever) that says that, if you run
  into one of those kinds of incomplete symbols, read in the entire
  symbol table for that file and flesh out all of its partial symbols.
  (This could open a door to a general notion of incomplete symbols
  that could, depending on the debugging format, be completed in a
  more efficient manner than the current psymtab->symtab mechanisms.)
  I was hoping that minimal_symbols could also be turned into special
  kinds of symbols, but now I'm more dubious about that; if they have
  to remain separate, I should presumably tweak dictionaries to let
  them index any sort of struct general_symbol_info, and change the
  minimal symbol table in ways that correspond to the changes to the
  global symbol table.

* So just what's going on with the usage of minimal symbols in
  lookup_symbol?  Are there any situations where looking in the
  minimal symbol table is actually helpful?  If so, do those only have to do
  with ickiness related to mangled names, or is there a more serious
  issue that I'm missing?

* One issue that I didn't comment on above is that lookup_symbol has a
  'symtab' argument that stores the symtab in which the symbol is
  found.  If we have to support that, that obviously affects the data
  structures involved.  But it seems to me that we _don't_ have to
  support that: I did a cursory look through GDB's sources and, as far
  as I can tell, that argument is only used by linespec.c's
  decode_line_1.  So it seems to me that we should remove that
  argument from lookup_symbol and provide some sort of alternate
  functionality to satisfy decode_line_1's needs.

David Carlton
carlton@math.stanford.edu


* Re: the global symbol table
From: Elena Zannoni @ 2002-09-25 13:54 UTC
  To: David Carlton; +Cc: gdb


I wonder if rewriting the symbol table is really necessary in order to
provide namespace support. I don't want to discourage you though, from
thinking about how to improve things.  Besides, I am a bit behind in my
mail, and I still have to catch up on your progress. Sorry about that.

One thing I found useful when I was trying to get a 500ft view of the
symbol table was to look at how the symbol readers interface with
each other and the rest of gdb.

I looked at the functions that each module provided and noticed that a
few were a bit confused as to where they belonged, which file
exported what, etc.

I think that an audit of the interfaces could help here as well.
Just a suggestion. And I hope to have time on the weekend to catch up.

Some (not too deep) thoughts below.

David Carlton writes:
 > I'm trying to get a handle on the global symbol table, and on what
 > would be a good way to reorganize its data in order to make
 > lookup_symbol (and related functions, e.g. search_symbol) faster and
 > to provide some sort of data structure that could also be used in
 > namespace searches.
 > 
 > I'm not sure that I have any really coherent questions, so I'm going
 > to free-associate for a while, and see what thoughts that jars loose
 > from other people's brains.  I've listed the main issues that I'm
 > worried about below.
 > 
 > As a reminder, here's what currently happens if you call lookup_symbol
 > and it doesn't find a local match somewhere:
 > 
 > * It looks up the global blocks in all the symtabs, and looks for them
 >   there.  If it finds it, it calls fixup_symbol_section (to get the
 >   bfd_section and section set correctly; this happens in the other
 >   cases below, too) and returns it.
 > * Then, #ifndef HPUXHPPA, it sometimes looks in the minimal symbol
 >   table.  If it finds something, it looks for it in the global and
 >   static blocks of the corresponding symtab.  (But why wouldn't it
 >   have already found the global match?  Hmm.)  Otherwise, it sometimes
 >   does some sort of mangled name lookup.

The hppa stuff (I think, if my memory is not failing me) relates to
SOM, HP's proprietary format.  Maybe look in somread.c.

 > * Next, it looks in all the psymtabs.  If it finds a match, it
 >   converts the psymtab to a symtab, and returns the corresponding
 >   symbol there.
 > * Next, it looks in the static symbols for all of the symtabs.  (To
 >   what extent does this overlap with the lookup_minimal_symbol stuff
 >   earlier?)
 > * Then, it looks in the static symbols for all the psymtabs.
 > * And finally, #ifdef HPUXHPPA, it does the minimal symbol stuff that
 >   it otherwise did earlier.
 > 
 > This raises some issues:
 > 
 > * Do we _really_ want to search static symbols?  My guess is that the
 >   answer is 'yes', but it's not entirely clear to me that that's the
 >   case.  Certainly whenever we do a search that returns results that
 >   aren't strictly in line with what the language says, I get nervous.
 > 
 > * Even the simplest case of searching all the global blocks isn't
 >   implemented very well: surely we need some sort of fast lookup
 >   structure that won't require us to look at every single symtab.
 >   Assuming that we have some sort of expandable data structure in
 >   which lookups are fast, then exactly how should this work?  Should
 >   we replace the current global blocks with one big global table,
 >   should we have a global table that duplicates the information in the
 >   global blocks (so struct symbols corresponding to global symbols
 >   would be stored in two separate places), or should we have a global
 >   table that quickly maps names to symtabs, but have the actual
 >   symbols only stored in the global blocks of symtabs?  I'm currently
 >   leaning towards the second solution, with the possible longer-term
 >   goal of migrating towards the first solution, but it's not clear to
 >   me exactly what the consequences of these choices are.
 > 

I thought we were trying to make gdb footprint a bit smaller. So I
would be against duplicating information. But maybe I am not
understanding your proposal.


 > * Which of the structs symbol, partial_symbol, and minimal_symbol can
 >   be unified?  I tend to think that partial_symbols should go away, to
 >   be replaced by special 'incomplete' sorts of struct symbols combined
 >   with code in lookup_symbol (or wherever) that says that, if you run
 >   into one of those kinds of incomplete symbols, read in the entire
 >   symbol table for that file and flesh out all of its partial symbols.
 >   (This could open a door to a general notion of incomplete symbols
 >   that could, depending on the debugging format, be completed in a
 >   more efficient manner than the current psymtab->symtab mechanisms.)
 >   I was hoping that minimal_symbols could also be turned into special
 >   kinds of symbols, but now I'm more dubious about that; if they have
 >   to remain separate, I should presumably tweak dictionaries to let
 >   them index any sort of struct general_symbol_info, and change the
 >   minimal symbol table in ways that correspond to the changes to the
 >   global symbol table.
 > 

We could eliminate partial symtabs for dwarf2 (I think that was the
plan), but I am not sure you want to do that for other symbol readers.


 > * So just what's going on with the usage of minimal symbols in
 >   lookup_symbol?  Are there any situations where looking in the
 >   minimal symbol is actually helpful?  If so, do those only have to do
 >   with ickiness related to mangled names, or is there a more serious
 >   issue that I'm missing?
 > 

debugging something compiled w/o -g. 
Assembly only debugging can also be very tricky.


 > * One issue that I didn't comment on above is that lookup_symbol has a
 >   'symtab' argument that stores the symtab in which the symbol is
 >   found.  If we have to support that, that obviously affects the data
 >   structures involved.  But it seems to me that we _don't_ have to
 >   support that: I did a cursory look through GDB's sources and, as far
 >   as I can tell, that argument is only used by linespec.c's
 >   decode_line_1.  So it seems to me that we should remove that
 >   argument from lookup_symbol and provide some sort of alternate
 >   functionality to satisfy decode_line_1's needs.
 > 

Oh, our favorite function.  Not.  In theory maybe, but tweaking
decode_line_1 has always produced quite unexpected, interesting
consequences (almost as unpredictable as changing wfi).  The
default_* parameters allow some commands to work w/o arguments, like
you can say 'break' and gdb knows to put a breakpoint at the current
line.  I am not sure what you could do here; the parameter seems to be
used mostly by breakpoint code.

Elena


 > David Carlton
 > carlton@math.stanford.edu



* Re: the global symbol table
From: Andrew Cagney @ 2002-09-25 14:02 UTC
  To: Elena Zannoni; +Cc: David Carlton, gdb

>  > * So just what's going on with the usage of minimal symbols in
>  >   lookup_symbol?  Are there any situations where looking in the
>  >   minimal symbol is actually helpful?  If so, do those only have to do
>  >   with ickiness related to mangled names, or is there a more serious
>  >   issue that I'm missing?
>  > 
> 
> debugging something compiled w/o -g. 
> Assembly only debugging can also be very tricky.

Yes, see nodebug.exp.

Andrew




* Re: the global symbol table
From: David Carlton @ 2002-09-25 16:17 UTC
  To: Elena Zannoni; +Cc: gdb

On Wed, 25 Sep 2002 16:52:20 -0400, Elena Zannoni <ezannoni@redhat.com> said:

> I wonder if rewriting the symbol table is really necessary in order
> to provide namespace support.

I'm not sure if it's strictly necessary, but I think they share enough
in common to make it a good idea.  Namespaces, like global symbols,
have the property that they gather together symbols defined in many
different blocks (indeed, in many different files) and make them all
accessible in the same place.  To do either of them right, you (at
least currently) have to deal with not only symbols but also partial
symbols and perhaps minimal symbols.  It might even be the case that
unnamed namespaces are to named namespaces as static symbols are to
global symbols, though that analogy might be difficult to express in
code.

So, at the very least, I have to see how global symbol lookup
currently works, so I know what the possible pitfalls are.  But it
would be a shame if I couldn't find pieces of code that I could use in
both places.  I'd be pretty surprised if it ended up being the case that
_all_ of the global symbol code ended up being a special case of
namespace code - my guess is that one or both cases will have enough
quirks to require some special handling - but there should be chunks
of code that can be used in both cases.  Also, using the same code in
both places will help me a lot with testing: to the extent that
looking up global symbols is the same as looking up names within a
specified namespace, then the entire GDB testsuite will help test the
namespace code.

Of course, I don't want to change the global symbol code more than
necessary; and, in fact, one reasonable option would be to leave the
global symbol lookup code working almost exactly as it is, maybe
taking out some chunks of it and putting them in separate functions
but without changing the algorithms, and then build namespace support
on top of that, and only then to think about improving the algorithms.
One issue there is that the algorithm for global symbol lookup is
pretty bad (since it always examines each symtab), so it seems to me
that it's likely that that will get changed in the not too distant
future; if changing it now would make adding namespace support easier,
then changing it now would be a good idea.

> One thing I found useful doing when I was trying to get a 500ft view
> of the symbol table, was to look at how the symbol readers interface with
> each other and the rest of gdb.

> I looked at the functions that each module provided and noticed that a
> few were a bit confused, as to where they belonged, which file
> exported what, etc.

> I think that an audit of the interfaces could help here as well.
> Just a suggestion. And I hope to have time on the weekend to catch
> up.

Thanks for the suggestions.

> the hppa stuff (i think, if my memory is not failing me) relates to the
> SOM stuff that is HP's proprietary format. Maybe look in somread.c.

Thanks for the tip.

>> * Even the simplest case of searching all the global blocks isn't
>> implemented very well: surely we need some sort of fast lookup
>> structure that won't require us to look at every single symtab.
>> Assuming that we have some sort of expandable data structure in
>> which lookups are fast, then exactly how should this work?  Should
>> we replace the current global blocks with one big global table,
>> should we have a global table that duplicates the information in
>> the global blocks (so struct symbols corresponding to global
>> symbols would be stored in two separate places), or should we have
>> a global table that quickly maps names to symtabs, but have the
>> actual symbols only stored in the global blocks of symtabs?  I'm
>> currently leaning towards the second solution, with the possible
>> longer-term goal of migrating towards the first solution, but it's
>> not clear to me exactly what the consequences of these choices are.

> I thought we were trying to make gdb footprint a bit smaller. So I
> would be against duplicating information. But maybe I am not
> understanding your proposal.

I certainly wouldn't want to make GDB's footprint any bigger without a
reason to do so.  But having the same symbols accessible via two
different ways wouldn't make GDB's footprint all that much bigger (I'm
not proposing having two copies of each global symbol, just having
each global symbol appear in two tables), and if it turned out that
there was a compelling reason for it to be easy to quickly get at the
global symbols within a given symtab as well as to quickly get at all
the global symbols at once, then having different tables for those
roles is one easy way to do it.
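
A minimal sketch of that layout, assuming a simple chained hash table
(the struct layouts, global_index_add and global_index_lookup are
hypothetical stand-ins, not GDB's data structures): each symbol lives
in one place, the per-symtab table and the global index both hold
pointers to it, and the extra footprint is one small entry per global
symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pared-down, hypothetical stand-ins; GDB's real structs are richer.  */
struct symbol
{
  const char *name;
};

struct symtab
{
  struct symbol **globals;      /* per-file table of global symbols */
  size_t nglobals;
};

#define NBUCKETS 64

struct global_entry
{
  struct symbol *sym;           /* shared with the owning symtab */
  struct global_entry *next;
};

struct global_index
{
  struct global_entry *buckets[NBUCKETS];
};

static unsigned
hash_name (const char *s)
{
  unsigned h = 5381;

  while (*s)
    h = h * 33 + (unsigned char) *s++;
  return h % NBUCKETS;
}

/* Enter SYM into IDX using caller-provided storage ENTRY.  Only the
   pointer is stored; the symbol itself is not copied.  */
void
global_index_add (struct global_index *idx, struct global_entry *entry,
                  struct symbol *sym)
{
  unsigned b;

  assert (sym != NULL && sym->name != NULL);
  b = hash_name (sym->name);
  entry->sym = sym;
  entry->next = idx->buckets[b];
  idx->buckets[b] = entry;
}

/* Find NAME without visiting any symtab; return NULL if absent.  */
struct symbol *
global_index_lookup (const struct global_index *idx, const char *name)
{
  const struct global_entry *e;

  for (e = idx->buckets[hash_name (name)]; e != NULL; e = e->next)
    if (strcmp (e->sym->name, name) == 0)
      return e->sym;
  return NULL;
}
```

A lookup through the index returns the very same struct symbol that
the symtab's own table points to, so nothing is duplicated except the
pointer.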

>> * Which of the structs symbol, partial_symbol, and minimal_symbol can
>> be unified?  I tend to think that partial_symbols should go away,
>> to be replaced by special 'incomplete' sorts of struct symbols
>> combined with code in lookup_symbol (or wherever) that says that,
>> if you run into one of those kinds of incomplete symbols, read in
>> the entire symbol table for that file and flesh out all of its
>> partial symbols.  (This could open a door to a general notion of
>> incomplete symbols that could, depending on the debugging format,
>> be completed in a more efficient manner than the current
>> psymtab->symtab mechanisms.)  I was hoping that minimal_symbols
>> could also be turned into special kinds of symbols, but now I'm
>> more dubious about that; if they have to remain separate, I should
>> presumably tweak dictionaries to let them index any sort of struct
>> general_symbol_info, and change the minimal symbol table in ways
>> that correspond to the changes to the global symbol table.

> We could eliminate partial stabs for dwarf2, (I think that was the
> plan) but I am not sure you want to do that for other symbol
> readers.

At some point, I'll try to say a little more coherently what I mean to
do here.  Certainly I want to be able to work with the existing
psymtab_to_symtab code; it's just that I'd like to work with that in
such a way that I have to search in as few places as possible when
looking up a name.  So I want to bury the psymtab->symtab translation
down as deep as possible.
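
One way to picture burying the translation, as a sketch only: lookup
code sees a single kind of symbol, some of which are marked
incomplete, and a helper expands the whole file the first time one of
them is handed out.  The 'complete' flag, struct file_symbols,
expand_file and ensure_complete are hypothetical; expand_file merely
stands in for the existing psymtab-to-symtab reading.

```c
#include <assert.h>
#include <stddef.h>

struct file_symbols;

/* Hypothetical symbol with an 'incomplete' state.  */
struct symbol
{
  const char *name;
  int complete;                 /* 0 until full debug info is read */
  struct file_symbols *owner;   /* file this symbol came from */
};

struct file_symbols
{
  struct symbol *syms;
  size_t nsyms;
  int expansions;               /* times the full reader has run */
};

/* Stand-in for reading the full symbol table of one file: flesh
   out every symbol of that file at once.  */
static void
expand_file (struct file_symbols *f)
{
  for (size_t i = 0; i < f->nsyms; i++)
    f->syms[i].complete = 1;
  f->expansions++;
}

/* What a lookup would call just before returning SYM, so callers
   never see an incomplete symbol.  NULL passes through unchanged.  */
struct symbol *
ensure_complete (struct symbol *sym)
{
  assert (sym == NULL || sym->owner != NULL);
  if (sym != NULL && !sym->complete)
    expand_file (sym->owner);
  return sym;
}
```

Asking for a second symbol from the same file does not trigger
another expansion, since the first request already completed all of
them.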

To put it another way: right now, lookup_symbol might look up names in
global symtabs, global psymtabs, static symtabs, static psymtabs, or
minimal symbols.  A name found in a global psymtab is just as good as
a name found in a global symtab (so whether you search the psymtabs or
the symtabs first should be an optimization issue, not a semantic
issue); but a name found in a static psymtab or symtab is worse than a
name found in a global one.  And a name found in minimal symbols is
its own strange thing; I'm not sure where it fits into this
hierarchy.  So that suggests that merging searching in psymtabs with
searching symtabs might be a reasonable idea to consider.
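
The ordering described here can be sketched as a ranking function,
under the assumption that a lookup collects candidates and keeps the
best one (struct candidate, candidate_rank and best_candidate are
hypothetical names).  Global beats static, and whether the candidate
came from a symtab or a psymtab deliberately plays no role.  Minimal
symbols are left out, since their place in the hierarchy is the open
question.

```c
#include <assert.h>
#include <stddef.h>

enum visibility { VIS_GLOBAL, VIS_STATIC };
enum storage { FROM_SYMTAB, FROM_PSYMTAB };

/* Hypothetical record of one place where a name was found.  */
struct candidate
{
  const char *name;
  enum visibility vis;
  enum storage from;
};

/* Smaller rank is better.  The 'from' field is ignored on purpose:
   symtab versus psymtab is an optimization issue, not a semantic
   one.  */
static int
candidate_rank (const struct candidate *c)
{
  return c->vis == VIS_GLOBAL ? 0 : 1;
}

/* Return the best of N candidates, or NULL if N is zero.  Ties keep
   the earliest candidate.  */
const struct candidate *
best_candidate (const struct candidate *cands, size_t n)
{
  const struct candidate *best = NULL;

  assert (cands != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (best == NULL || candidate_rank (&cands[i]) < candidate_rank (best))
      best = &cands[i];
  return best;
}
```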

Eliminating the use of psymtabs in DWARF 2 is really a side issue, for
future consideration.  I don't want to make doing that any harder than
it would be now, and my guess is that any work that I do to merge data
structures that naturally belong together will probably help that
effort, but that can definitely wait.

>> * So just what's going on with the usage of minimal symbols in
>> lookup_symbol?  Are there any situations where looking in the
>> minimal symbol is actually helpful?  If so, do those only have to
>> do with ickiness related to mangled names, or is there a more
>> serious issue that I'm missing?

> debugging something compiled w/o -g. 

Right, I was wondering about that.  I know that, if I run GDB on a
program without debugging info and do 'break main' then it still
works.  But I can't see how lookup_symbol ever actually produces a
struct symbol from a situation where there isn't debugging info.
(Does lookup_symbol ever get called in that situation?  I'll have to
try it and see what happens.)

Obviously I have more code browsing to do.  But, because of debugging
w/o -g, it certainly seems to me that minimal symbols can't go away.

>> * One issue that I didn't comment on above is that lookup_symbol has a
>> 'symtab' argument that stores the symtab in which the symbol is
>> found.  If we have to support that, that obviously affects the data
>> structures involved.  But it seems to me that we _don't_ have to
>> support that: I did a cursory look through GDB's sources and, as far
>> as I can tell, that argument is only used by linespec.c's
>> decode_line_1.  So it seems to me that we should remove that
>> argument from lookup_symbol and provide some sort of alternate
>> functionality to satisfy decode_line_1's needs.
>> 

> Oh our favorite function. not. in theory maybe, but tweaking with
> decode_line_1 has always provided quite unexpected interesting
> consequences.

Here's what I was thinking might be reasonable: delete the SYMTAB
argument to lookup_symbol, and add a new function

extern struct symtab *lookup_symtab_symbol (struct symbol *sym, struct block *block);

that returns the symtab where SYM is found if you look for its name in
BLOCK.  (Or, I guess, NULL if the first argument is NULL.)  Then
existing calls like

sym = lookup_symbol (name, block, namespace, is_a_field_of_this, &symtab);

could be converted to the two calls

sym = lookup_symbol (name, block, namespace, is_a_field_of_this);
symtab = lookup_symtab_symbol (sym, block);

And, of course, existing calls like

sym = lookup_symbol (name, block, namespace, is_a_field_of_this, NULL);

could just be replaced by

sym = lookup_symbol (name, block, namespace, is_a_field_of_this);

That way, the semantics of decode_line_1 would remain unchanged, but
other uses of lookup_symbol would be faster if it turned out that it
were possible to speed up lookup_symbol as long as you didn't have to
identify the symtab in question.
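
A toy version of lookup_symtab_symbol, to show the shape of the
split.  The struct layouts are hypothetical stand-ins, and the sketch
assumes each block records its symtab and only walks the enclosing
scopes; a real version would also have to cover the global and static
fallbacks that lookup_symbol itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pared-down, hypothetical stand-ins for GDB's structures.  */
struct symtab
{
  const char *filename;
};

struct symbol
{
  const char *name;
};

struct block
{
  struct symbol **syms;
  size_t nsyms;
  struct symtab *symtab;        /* assumption: a block knows its symtab */
  struct block *superblock;     /* enclosing scope, or NULL */
};

/* Return the symtab where SYM is found if you look for its name
   starting in BLOCK, or NULL if SYM is NULL or nothing matches.  */
struct symtab *
lookup_symtab_symbol (struct symbol *sym, struct block *block)
{
  if (sym == NULL)
    return NULL;
  assert (sym->name != NULL);
  for (; block != NULL; block = block->superblock)
    for (size_t i = 0; i < block->nsyms; i++)
      if (strcmp (block->syms[i]->name, sym->name) == 0)
        return block->symtab;
  return NULL;
}
```

Only callers that actually need the symtab pay for this second walk;
every other caller of lookup_symbol skips it.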

Though the speed issue is, for me, not the only important issue: I
just get nervous when functions have arguments that are almost never
used.  So I don't like the IS_A_FIELD_OF_THIS argument to
lookup_symbol too much, either.

Anyways, thanks for the feedback, I appreciate it.

David Carlton
carlton@math.stanford.edu


* Re: the global symbol table
From: David Carlton @ 2002-09-30 13:08 UTC
  To: Elena Zannoni; +Cc: gdb

On Wed, 25 Sep 2002 16:52:20 -0400, Elena Zannoni <ezannoni@redhat.com> said:

> I wonder if rewriting the symbol table is really necessary in order
> to provide namespace support.

Now I'm starting to wonder if it's a good idea, too: I just noticed
'block_found', which throws a monkey wrench into some of my ideas.
Sigh.  Maybe it really would be wisest to get namespace support
working first.

While trying to understand lookup_symbol_aux, I've cleaned it up a bit
(in order to eliminate duplicate code, etc.); I'll probably submit
patches for those, since I think they'll make it easier to maintain
the function in the future.  But, for now, I'll hold off on any more
serious changes.

David Carlton
carlton@math.stanford.edu


* Re: the global symbol table
From: Daniel Jacobowitz @ 2002-09-30 14:49 UTC
  To: David Carlton; +Cc: Elena Zannoni, gdb

On Mon, Sep 30, 2002 at 01:02:47PM -0700, David Carlton wrote:
> On Wed, 25 Sep 2002 16:52:20 -0400, Elena Zannoni <ezannoni@redhat.com> said:
> 
> > I wonder if rewriting the symbol table is really necessary in order
> > to provide namespace support.
> 
> Now I'm starting to wonder if it's a good idea, too: I just noticed
> 'block_found', which throws a monkey wrench into some of my ideas.
> Sigh.  Maybe it really would be wisest to get namespace support
> working first.
> 
> While trying to understand lookup_symbol_aux, I've cleaned it up a bit
> (in order to eliminate duplicate code, etc.); I'll probably submit
> patches for those, since I think they'll make it easier to maintain
> the function in the future.  But, for now, I'll hold off on any more
> serious changes.

I have the feeling that rewriting the symbol table support might be
better done _after_ we have some namespace support working...

I'm making a little bit of progress but I haven't had much time lately. 
First I need to do a lot of output cleanup so that I can test changes
properly; that'll be dribbling in over the next month I hope.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

