Mirror of the gdb mailing list
 help / color / mirror / Atom feed
* Demangling and searches
@ 2003-01-07 23:56 Paul Hilfinger
  2003-01-08  0:13 ` David Carlton
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Paul Hilfinger @ 2003-01-07 23:56 UTC (permalink / raw)
  To: Elena Zannoni, Adam Fedor, GDB Patches, Daniel Jacobowitz


For some time, I've been meaning to ask a basic question about GDB
search strategy: for language implementations that mangle their
identifiers, the standard procedure in GDB at the moment is to search
for the demangled identifier among the demangled identifiers of the
symbol table, and to speed this search up by precomputing and storing
the demangled symbol names.  Why?

We used to do that for Ada mode in GDB, but subsequently changed our
approach entirely.  For Ada, we MANGLE the symbol we're searching for
and then search among the MANGLED (i.e., raw, unmodified, warm-from-
the-executable) names.  We do very little demangling as a result, and
do not devote any storage to demangled names.  Of course, we do have
to demangle during the 'info XXX' symbol searches, but that is not a
common operation (at least for our customers), and therefore we saw
little to be gained by storing the demangled names.

Is there some unfortunate feature of C++ and ObjC mangling that
completely prevents our approach for those languages?  What was the
rationale behind the current strategy?

Thanks for the information.

Paul Hilfinger


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Demangling and searches
  2003-01-07 23:56 Demangling and searches Paul Hilfinger
@ 2003-01-08  0:13 ` David Carlton
  2003-01-08  0:38   ` Daniel Berlin
  2003-01-09  2:38   ` Paul Hilfinger
  2003-01-08  1:04 ` Daniel Jacobowitz
  2003-01-08  1:39 ` Elena Zannoni
  2 siblings, 2 replies; 7+ messages in thread
From: David Carlton @ 2003-01-08  0:13 UTC (permalink / raw)
  To: Paul Hilfinger; +Cc: Elena Zannoni, Adam Fedor, GDB Patches, Daniel Jacobowitz

On Tue, 07 Jan 2003 15:54:36 -0800, Paul Hilfinger <hilfingr@CS.Berkeley.EDU> said:

> For some time, I've been meaning to ask a basic question about GDB
> search strategy: for language implementations that mangle their
> identifiers, the standard procedure in GDB at the moment is to
> search for the demangled identifier among the demangled identifiers
> of the symbol table, and to speed this search up by precomputing and
> storing the demangled symbol names.  Why?

> We used to do that for Ada mode in GDB, but subsequently changed our
> approach entirely.  For Ada, we MANGLE the symbol we're searching
> for and then search among the MANGLED (i.e., raw, unmodified,
> warm-from- the-executable) names.

I'm curious: in Ada, what does the mangling do?  In particular, how
much type info does it contain?  In C++, the mangled name contains
type info for the arguments for functions; I don't see how, using
GDB's current data structures, to allow us to allow users to, say,
break on a function without requiring them to specify the types of the
arguments, if we took your approach.  (Though it might be possible to
modify GDB's data structures to allow that.)

Also, C++ debugging really takes place in a mixed C/C++ environment
(which is really inherent to the nature of C++, given the possibility
of 'extern "C"') so it can be hard to tell if mangling is required in
the first place.  Come to think of it, anonymous namespaces would make
life pretty difficult, too.

It's amusing that you raise the issue right now, because several of us
are arguing about this off-list: I'd rather not deal with the fact
that sometimes we demangle and sometimes (partial symbols) we don't,
but other people say that demangling partial symbols would be too
expensive.  Daniel Jacobowitz might be coming up with a nice
compromise that allows us to share data when appropriate.  I don't see
how to reasonably avoid doing quite a lot of demangling (at the very
least demangling all minimal symbols when loading them in), but maybe
it's possible if you folks are doing it.

David Carlton
carlton@math.stanford.edu


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Demangling and searches
  2003-01-08  0:13 ` David Carlton
@ 2003-01-08  0:38   ` Daniel Berlin
  2003-01-09  2:38   ` Paul Hilfinger
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Berlin @ 2003-01-08  0:38 UTC (permalink / raw)
  To: David Carlton
  Cc: Paul Hilfinger, Elena Zannoni, Adam Fedor, GDB Patches,
	Daniel Jacobowitz


On Tuesday, January 7, 2003, at 07:13  PM, David Carlton wrote:

> On Tue, 07 Jan 2003 15:54:36 -0800, Paul Hilfinger 
> <hilfingr@CS.Berkeley.EDU> said:
>
>> For some time, I've been meaning to ask a basic question about GDB
>> search strategy: for language implementations that mangle their
>> identifiers, the standard procedure in GDB at the moment is to
>> search for the demangled identifier among the demangled identifiers
>> of the symbol table, and to speed this search up by precomputing and
>> storing the demangled symbol names.  Why?
>
>> We used to do that for Ada mode in GDB, but subsequently changed our
>> approach entirely.  For Ada, we MANGLE the symbol we're searching
>> for and then search among the MANGLED (i.e., raw, unmodified,
>> warm-from- the-executable) names.
>
> I'm curious: in Ada, what does the mangling do?  In particular, how
> much type info does it contain?  In C++, the mangled name contains
> type info for the arguments for functions; I don't see how, using
> GDB's current data structures, to allow us to allow users to, say,
> break on a function without requiring them to specify the types of the
> arguments, if we took your approach.  (Though it might be possible to
> modify GDB's data structures to allow that.)

Further, the mangling is ABI dependent.
Thus, you'd have to know what ABI the symbol you are looking up is 
going to use, to mangle it properly


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Demangling and searches
  2003-01-07 23:56 Demangling and searches Paul Hilfinger
  2003-01-08  0:13 ` David Carlton
@ 2003-01-08  1:04 ` Daniel Jacobowitz
  2003-01-08  1:39 ` Elena Zannoni
  2 siblings, 0 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2003-01-08  1:04 UTC (permalink / raw)
  To: Paul Hilfinger; +Cc: Elena Zannoni, Adam Fedor, GDB Patches

On Tue, Jan 07, 2003 at 03:54:36PM -0800, Paul Hilfinger wrote:
> 
> For some time, I've been meaning to ask a basic question about GDB
> search strategy: for language implementations that mangle their
> identifiers, the standard procedure in GDB at the moment is to search
> for the demangled identifier among the demangled identifiers of the
> symbol table, and to speed this search up by precomputing and storing
> the demangled symbol names.  Why?
> 
> We used to do that for Ada mode in GDB, but subsequently changed our
> approach entirely.  For Ada, we MANGLE the symbol we're searching for
> and then search among the MANGLED (i.e., raw, unmodified, warm-from-
> the-executable) names.  We do very little demangling as a result, and
> do not devote any storage to demangled names.  Of course, we do have
> to demangle during the 'info XXX' symbol searches, but that is not a
> common operation (at least for our customers), and therefore we saw
> little to be gained by storing the demangled names.
> 
> Is there some unfortunate feature of C++ and ObjC mangling that
> completely prevents our approach for those languages?  What was the

Bingo.  If you want to get a whiff of what the demangling looks like,
check libiberty/cp-demangle.c; that's just for the GNU v3 mangling
scheme.  cplus-dem.c supports some others.  It's hideously complicated
and includes type information, as part of the language support for
overloading.  If it were just namespace/enclosing class information
this would be practical, but for C++ that's too much.

Ada's "mangling" appears to be much more simplistic.

> rationale behind the current strategy?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Demangling and searches
  2003-01-07 23:56 Demangling and searches Paul Hilfinger
  2003-01-08  0:13 ` David Carlton
  2003-01-08  1:04 ` Daniel Jacobowitz
@ 2003-01-08  1:39 ` Elena Zannoni
  2 siblings, 0 replies; 7+ messages in thread
From: Elena Zannoni @ 2003-01-08  1:39 UTC (permalink / raw)
  To: Paul Hilfinger; +Cc: Elena Zannoni, Adam Fedor, GDB Patches, Daniel Jacobowitz

Paul Hilfinger writes:
 > 
 > For some time, I've been meaning to ask a basic question about GDB
 > search strategy: for language implementations that mangle their
 > identifiers, the standard procedure in GDB at the moment is to search
 > for the demangled identifier among the demangled identifiers of the
 > symbol table, and to speed this search up by precomputing and storing
 > the demangled symbol names.  Why?
 > 

Gdb did that orginally for c++. In October 2000 the behavior was
changed to do the search among the demangled names instead of the
mangled ones. This way it was able to do a binary search instead of a
linear one, given that the names are sorted on the demangled value.
I think that what prompted the change was this analysis:
http://sources.redhat.com/ml/gdb-patches/2000-06/msg00024.html

Unfortunately there is still some lack of clarity around the partial
symbol handling. As David Carlton mentions. Partial symbols don't
store the demangled names. 

Elena


 > We used to do that for Ada mode in GDB, but subsequently changed our
 > approach entirely.  For Ada, we MANGLE the symbol we're searching for
 > and then search among the MANGLED (i.e., raw, unmodified, warm-from-
 > the-executable) names.  We do very little demangling as a result, and
 > do not devote any storage to demangled names.  Of course, we do have
 > to demangle during the 'info XXX' symbol searches, but that is not a
 > common operation (at least for our customers), and therefore we saw
 > little to be gained by storing the demangled names.
 > 
 > Is there some unfortunate feature of C++ and ObjC mangling that
 > completely prevents our approach for those languages?  What was the
 > rationale behind the current strategy?
 > 
 > Thanks for the information.
 > 
 > Paul Hilfinger


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Demangling and searches
  2003-01-08  0:13 ` David Carlton
  2003-01-08  0:38   ` Daniel Berlin
@ 2003-01-09  2:38   ` Paul Hilfinger
  2003-01-09 21:51     ` David Carlton
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Hilfinger @ 2003-01-09  2:38 UTC (permalink / raw)
  To: David Carlton; +Cc: Elena Zannoni, Adam Fedor, GDB Patches, Daniel Jacobowitz


 > I'm curious: in Ada, what does the mangling do?  In particular, how
 > much type info does it contain?  In C++, the mangled name contains
 > type info for the arguments for functions; I don't see how, using
 > GDB's current data structures, to allow us to allow users to, say,
 > break on a function without requiring them to specify the types of the
 > arguments, if we took your approach.  (Though it might be possible to
 > modify GDB's data structures to allow that.)

In Ada, the mangled name does not contain type information, but we
actually solve an even harder problem.  The mangled name contains
certain information that the user doesn't necessarily know, so that the system
CANNOT reconstruct the full mangled name from the user's input a
priori.  Yet nevertheless, we look up quite successfully, thank you
very much.  The technique is simple: the mangled names have the form

     <NAME><OTHER INFO>

where the syntax of <OTHER INFO> is such that we can find its
beginning.  In Ada mode, GDB has to use a special lookup function
that ignores <OTHER INFO>, but otherwise the algorithm is the same as
for C or C++, etc.

Let's look at this a bit abstractly.  We're given a key K to
search for, among a set of symbols, S.  There is some predicate, P,
that defines success: you want s in S such that P(K, s).   The current
scheme achieves speed with two complementary strategies:

       1. Use K to find a subset, S', of S, limiting the number of 
          comparisons.
       2. Speed up P with preprocessing: P has (roughly) the form
                 K equals SYMBOL_SOURCE_NAME (s)
	  where SYMBOL_SOURCE_NAME (s) = f (SYMBOL_NAME (s), SYMBOL_LANGUAGE(s)),

I am asking why we can't change P to 

       K equals f (SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)),

or, more abstractly, to something like

       compare_demangled (K, SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)) == 0

saving considerable space in the process.  (Daniel Berlin points out
that ABI also figures into these, but here I'll just go by the
parameterization in symtab.h).  The answer is "because f is costly
when applied to lots of symbols."  But this answer really makes sense
only if strategy 1 above is ineffective.

If your symbol-search structure is a hash table, then all you have to
do is use SYMBOL_SOURCE_NAME (s) as the hash key; it is irrelevant
whether you actually store the SYMBOL_SOURCE_NAME in s.  Hash tables
have precisely the effect of drastically decreasing the size of S'.  
If you are relying on binary search, admittedly, things are not so
simple, since the sort itself requires several evaluations of
SYMBOL_SOURCE_NAME for each symbol (but then, I've always preferred
hash tables (:->)).


Paul Hilfinger


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Demangling and searches
  2003-01-09  2:38   ` Paul Hilfinger
@ 2003-01-09 21:51     ` David Carlton
  0 siblings, 0 replies; 7+ messages in thread
From: David Carlton @ 2003-01-09 21:51 UTC (permalink / raw)
  To: Paul Hilfinger; +Cc: Elena Zannoni, Adam Fedor, GDB Patches, Daniel Jacobowitz

On Wed, 08 Jan 2003 18:37:57 -0800, Paul Hilfinger <hilfingr@CS.Berkeley.EDU> said:

>> I'm curious: in Ada, what does the mangling do?  In particular, how
>> much type info does it contain?  In C++, the mangled name contains
>> type info for the arguments for functions; I don't see how, using
>> GDB's current data structures, to allow us to allow users to, say,
>> break on a function without requiring them to specify the types of the
>> arguments, if we took your approach.  (Though it might be possible to
>> modify GDB's data structures to allow that.)

> In Ada, the mangled name does not contain type information, but we
> actually solve an even harder problem.  The mangled name contains
> certain information that the user doesn't necessarily know, so that
> the system CANNOT reconstruct the full mangled name from the user's
> input a priori.

That's true for C++, too: the users don't know the types in question,
among other issues (anonymous namespaces!).  Which is why the hash
function that we apply to demangled names doesn't look at the entire
demangled name.

> I am asking why we can't change P to 

>        K equals f (SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)),

> or, more abstractly, to something like

>        compare_demangled (K, SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)) == 0

> saving considerable space in the process.  (Daniel Berlin points out
> that ABI also figures into these, but here I'll just go by the
> parameterization in symtab.h).  The answer is "because f is costly
> when applied to lots of symbols."  But this answer really makes sense
> only if strategy 1 above is ineffective.

> If your symbol-search structure is a hash table, then all you have
> to do is use SYMBOL_SOURCE_NAME (s) as the hash key; it is
> irrelevant whether you actually store the SYMBOL_SOURCE_NAME in s.

That could work.  Right now, it turns out that demangling C++ names is
probably more expensive in time than in memory, so the naive
implementation of what you propose (call the demangler to compute
SYMBOL_SOURCE_NAME, but then forget the demangled name) wouldn't be
much of a win.  But it should, I think, be possible to write a
function that computes our hash value directly from the mangled name,
and to compute it much more quickly than calling the demangler, and to
write an appropriate comparison function.

It seems like a pain to implement and a maintenance burden, and I
don't know exactly what the tradeoffs would be.  But you might well be
correct that it's possible.

David Carlton
carlton@math.stanford.edu


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-01-09 21:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-07 23:56 Demangling and searches Paul Hilfinger
2003-01-08  0:13 ` David Carlton
2003-01-08  0:38   ` Daniel Berlin
2003-01-09  2:38   ` Paul Hilfinger
2003-01-09 21:51     ` David Carlton
2003-01-08  1:04 ` Daniel Jacobowitz
2003-01-08  1:39 ` Elena Zannoni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox