From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <carlton@math.stanford.edu>
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Received: (qmail 20276 invoked from network); 9 Jan 2003 21:51:55 -0000
Received: from unknown (HELO jackfruit.Stanford.EDU) (171.64.38.136)
  by 209.249.29.67 with SMTP; 9 Jan 2003 21:51:55 -0000
Received: (from carlton@localhost)
	by jackfruit.Stanford.EDU (8.11.6/8.11.6) id h09LpbC17928;
	Thu, 9 Jan 2003 13:51:37 -0800
X-Authentication-Warning: jackfruit.Stanford.EDU: carlton set sender to carlton@math.stanford.edu using -f
To: Paul Hilfinger <hilfingr@CS.Berkeley.EDU>
Cc: Elena Zannoni <ezannoni@redhat.com>, Adam Fedor <fedor@doc.com>,
   GDB Patches <gdb@sources.redhat.com>, Daniel Jacobowitz <drow@mvista.com>
Subject: Re: Demangling and searches
References: <200301090237.SAA22983@tully.CS.Berkeley.EDU>
From: David Carlton <carlton@math.stanford.edu>
Date: Thu, 09 Jan 2003 21:51:00 -0000
In-Reply-To: <200301090237.SAA22983@tully.CS.Berkeley.EDU>
Message-ID: <ro1bs2qf0uv.fsf@jackfruit.Stanford.EDU>
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Common Lisp)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2003-01/txt/msg00151.txt.bz2

On Wed, 08 Jan 2003 18:37:57 -0800, Paul Hilfinger <hilfingr@CS.Berkeley.EDU> said:

>> I'm curious: in Ada, what does the mangling do?  In particular, how
>> much type info does it contain?  In C++, the mangled name contains
>> type info for the arguments for functions; I don't see how, using
>> GDB's current data structures, to allow us to allow users to, say,
>> break on a function without requiring them to specify the types of the
>> arguments, if we took your approach.  (Though it might be possible to
>> modify GDB's data structures to allow that.)

> In Ada, the mangled name does not contain type information, but we
> actually solve an even harder problem.  The mangled name contains
> certain information that the user doesn't necessarily know, so that
> the system CANNOT reconstruct the full mangled name from the user's
> input a priori.

That's true for C++, too: users don't know the types in question,
among other issues (anonymous namespaces!).  That is why the hash
function that we apply to demangled names doesn't look at the entire
demangled name.

> I am asking why we can't change P to 

>        K equals f (SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)),

> or, more abstractly, to something like

>        compare_demangled (K, SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)) == 0

> saving considerable space in the process.  (Daniel Berlin points out
> that ABI also figures into these, but here I'll just go by the
> parameterization in symtab.h).  The answer is "because f is costly
> when applied to lots of symbols."  But this answer really makes sense
> only if strategy 1 above is ineffective.

> If your symbol-search structure is a hash table, then all you have
> to do is use SYMBOL_SOURCE_NAME (s) as the hash key; it is
> irrelevant whether you actually store the SYMBOL_SOURCE_NAME in s.

That could work.  Right now, it turns out that demangling C++ names is
probably more expensive in time than in memory, so the naive
implementation of what you propose (call the demangler to compute
SYMBOL_SOURCE_NAME, but then forget the demangled name) wouldn't be
much of a win.  But it should, I think, be possible to write a
function that computes our hash value directly from the mangled name,
much more quickly than calling the demangler would, along with an
appropriate comparison function.
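
A rough sketch of what such a function might look like, under heavy
assumptions: it only understands the simplest Itanium-style manglings
(_Z<len><id>... and _ZN<len><id>...E...), and gives up on templates,
substitutions, operators, constructors, and qualifiers.  The names
mix, hash_demangled, and hash_mangled are hypothetical, and the hash
is a stand-in rather than the one GDB uses.  The point is only that
walking the length-prefixed identifiers yields the same value as
hashing the demangled name up to its parameter list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One mixing step, shared so that both hashes are comparable.  */
static unsigned int
mix (unsigned int h, unsigned char c)
{
  return h * 67 + c - 113;
}

/* Hash a demangled name up to its parameter list, skipping spaces.  */
unsigned int
hash_demangled (const char *name)
{
  unsigned int h = 0;

  assert (name != NULL);
  for (; *name != '\0' && *name != '('; name++)
    if (!isspace ((unsigned char) *name))
      h = mix (h, (unsigned char) *name);
  return h;
}

/* Hash a mangled name without demangling it.  Sets *OK to 1 on
   success; on any form this sketch does not handle, sets *OK to 0
   and returns 0, so the caller can fall back to the demangler.  */
unsigned int
hash_mangled (const char *m, int *ok)
{
  unsigned int h = 0;
  int nested, first = 1;

  assert (m != NULL && ok != NULL);
  *ok = 0;
  if (m[0] != '_' || m[1] != 'Z')
    return 0;
  m += 2;
  nested = (*m == 'N');
  if (nested)
    m++;
  do
    {
      size_t len = 0;

      if (!isdigit ((unsigned char) *m))
        return 0;               /* Unsupported or malformed.  */
      while (isdigit ((unsigned char) *m))
        len = len * 10 + (size_t) (*m++ - '0');
      if (!first)
        {
          /* Components are joined by "::" in the demangled form.  */
          h = mix (h, ':');
          h = mix (h, ':');
        }
      first = 0;
      for (; len > 0; len--, m++)
        {
          if (*m == '\0')
            return 0;           /* Truncated name.  */
          h = mix (h, (unsigned char) *m);
        }
    }
  while (nested && *m != 'E');
  *ok = 1;
  return h;
}
```

The fallback path is where the maintenance burden shows up: every
mangling construct the fast path skips still needs the demangler, and
the two have to keep agreeing on the hash.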

It seems like a pain to implement and a maintenance burden, and I
don't know exactly what the tradeoffs would be.  But you might well be
correct that it's possible.

David Carlton
carlton@math.stanford.edu