From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13009 invoked by alias); 9 Jan 2003 02:38:16 -0000 Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-owner@sources.redhat.com Received: (qmail 12967 invoked from network); 9 Jan 2003 02:38:14 -0000 Received: from unknown (HELO tully.CS.Berkeley.EDU) (128.32.153.227) by 209.249.29.67 with SMTP; 9 Jan 2003 02:38:14 -0000 Received: from tully.CS.Berkeley.EDU (hilfingr@localhost) by tully.CS.Berkeley.EDU (8.9.1a/8.9.1) with ESMTP id SAA22983; Wed, 8 Jan 2003 18:37:57 -0800 (PST) Message-Id: <200301090237.SAA22983@tully.CS.Berkeley.EDU> To: David Carlton cc: Elena Zannoni , Adam Fedor , GDB Patches , Daniel Jacobowitz Subject: Re: Demangling and searches In-Reply-To: Message from David Carlton of "Tue, 07 Jan 2003 16:13:38 PST." MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <22978.1042079877.1@tully.CS.Berkeley.EDU> Date: Thu, 09 Jan 2003 02:38:00 -0000 From: Paul Hilfinger X-SW-Source: 2003-01/txt/msg00107.txt.bz2 > I'm curious: in Ada, what does the mangling do? In particular, how > much type info does it contain? In C++, the mangled name contains > type info for the arguments for functions; I don't see how, using > GDB's current data structures, to allow us to allow users to, say, > break on a function without requiring them to specify the types of the > arguments, if we took your approach. (Though it might be possible to > modify GDB's data structures to allow that.) In Ada, the mangled name does not contain type information, but we actually solve an even harder problem. The mangled name contains certain information that the user doesn't necessarily know, so that the system CANNOT reconstruct the full mangled name from the user's input a priori. Yet nevertheless, we look up quite successfully, thank you very much. The technique is simple: the mangled names have the form where the syntax of is such that we can find its beginning. In Ada mode, GDB has to use a special lookup function that ignores , but otherwise the algorithm is the same as for C or C++, etc. Let's look at this a bit abstractly. We're given a key K to search for, among a set of symbols, S. There is some predicate, P, that defines success: you want s in S such that P(K, s). The current scheme achieves speed with two complementary strategies: 1. Use K to find a subset, S', of S, limiting the number of comparisons. 2. Speed up P with preprocessing: P has (roughly) the form K equals SYMBOL_SOURCE_NAME (s) where SYMBOL_SOURCE_NAME (s) = f (SYMBOL_NAME (s), SYMBOL_LANGUAGE(s)), I am asking why we can't change P to K equals f (SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)), or, more abstractly, to something like compare_demangled (K, SYMBOL_NAME (s), SYMBOL_LANGUAGE (s)) == 0 saving considerable space in the process. (Daniel Berlin points out that ABI also figures into these, but here I'll just go by the parameterization in symtab.h). The answer is "because f is costly when applied to lots of symbols." But this answer really makes sense only if strategy 1 above is ineffective. If your symbol-search structure is a hash table, then all you have to do is use SYMBOL_SOURCE_NAME (s) as the hash key; it is irrelevant whether you actually store the SYMBOL_SOURCE_NAME in s. Hash tables have precisely the effect of drastically decreasing the size of S'. If you are relying on binary search, admittedly, things are not so simple, since the sort itself requires several evaluations of SYMBOL_SOURCE_NAME for each symbol (but then, I've always preferred hash tables (:->)). Paul Hilfinger