From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-11792-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 13009 invoked by alias); 9 Jan 2003 02:38:16 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 12967 invoked from network); 9 Jan 2003 02:38:14 -0000
Received: from unknown (HELO tully.CS.Berkeley.EDU) (128.32.153.227)
  by 209.249.29.67 with SMTP; 9 Jan 2003 02:38:14 -0000
Received: from tully.CS.Berkeley.EDU (hilfingr@localhost)
	by tully.CS.Berkeley.EDU (8.9.1a/8.9.1) with ESMTP id SAA22983;
	Wed, 8 Jan 2003 18:37:57 -0800 (PST)
Message-Id: <200301090237.SAA22983@tully.CS.Berkeley.EDU>
To: David Carlton <carlton@math.stanford.edu>
cc: Elena Zannoni <ezannoni@redhat.com>, Adam Fedor <fedor@doc.com>,
        GDB Patches <gdb@sources.redhat.com>,
        Daniel Jacobowitz <drow@mvista.com>
Subject: Re: Demangling and searches 
In-Reply-To: Message from David Carlton <carlton@math.stanford.edu> 
   of "Tue, 07 Jan 2003 16:13:38 PST." <ro1n0mcebwt.fsf@jackfruit.Stanford.EDU> 
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <22978.1042079877.1@tully.CS.Berkeley.EDU>
Date: Thu, 09 Jan 2003 02:38:00 -0000
From: Paul Hilfinger <hilfingr@CS.Berkeley.EDU>
X-SW-Source: 2003-01/txt/msg00107.txt.bz2


 > I'm curious: in Ada, what does the mangling do?  In particular, how
 > much type info does it contain?  In C++, the mangled name contains
 > type info for the arguments for functions; I don't see how, using
 > GDB's current data structures, to allow us to allow users to, say,
 > break on a function without requiring them to specify the types of the
 > arguments, if we took your approach.  (Though it might be possible to
 > modify GDB's data structures to allow that.)

In Ada, the mangled name does not contain type information, but we
actually solve an even harder problem.  The mangled name contains
certain information that the user doesn't necessarily know, so that the system
CANNOT reconstruct the full mangled name from the user's input a
priori.  Yet nevertheless, we look up quite successfully, thank you
very much.  The technique is simple: the mangled names have the form

     <NAME><OTHER INFO>

where the syntax of <OTHER INFO> is such that we can find its
beginning.  In Ada mode, GDB has to use a special lookup function
that ignores <OTHER INFO>, but otherwise the algorithm is the same as
for C or C++, etc.

Let's look at this a bit abstractly.  We're given a key K to
search for, among a set of symbols, S.  There is some predicate, P,
that defines success: you want s in S such that P(K, s).   The current
scheme achieves speed with two complementary strategies:

       1. Use K to find a subset, S', of S, limiting the number of 
          comparisons.
       2. Speed up P with preprocessing: P has (roughly) the form
                 K equals SYMBOL_SOURCE_NAME (s)
	  where SYMBOL_SOURCE_NAME (s) = f (SYMBOL_NAME (s), SYMBOL_LANGUAGE(s)),

I am asking why we can't change P to 

       K equals f (SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)),

or, more abstractly, to something like

       compare_demangled (K, SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)) == 0

saving considerable space in the process.  (Daniel Berlin points out
that ABI also figures into these, but here I'll just go by the
parameterization in symtab.h).  The answer is "because f is costly
when applied to lots of symbols."  But this answer really makes sense
only if strategy 1 above is ineffective.

If your symbol-search structure is a hash table, then all you have to
do is use SYMBOL_SOURCE_NAME (s) as the hash key; it is irrelevant
whether you actually store the SYMBOL_SOURCE_NAME in s.  Hash tables
have precisely the effect of drastically decreasing the size of S'.  
If you are relying on binary search, admittedly, things are not so
simple, since the sort itself requires several evaluations of
SYMBOL_SOURCE_NAME for each symbol (but then, I've always preferred
hash tables (:->)).


Paul Hilfinger