From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-10503-listarch-gdb=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 15448 invoked by alias); 23 Sep 2002 03:11:02 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 15439 invoked from network); 23 Sep 2002 03:11:01 -0000
Received: from unknown (HELO crack.them.org) (65.125.64.184)
  by sources.redhat.com with SMTP; 23 Sep 2002 03:11:00 -0000
Received: from nevyn.them.org ([66.93.61.169] ident=mail)
	by crack.them.org with asmtp (Exim 3.12 #1 (Debian))
	id 17tKYZ-0000Dd-00; Sun, 22 Sep 2002 23:10:47 -0500
Received: from drow by nevyn.them.org with local (Exim 3.35 #1 (Debian))
	id 17tJce-0006v0-00; Sun, 22 Sep 2002 23:10:56 -0400
Date: Sun, 22 Sep 2002 20:11:00 -0000
From: Daniel Jacobowitz <drow@mvista.com>
To: Jim Blandy <jimb@redhat.com>
Cc: david carlton <carlton@math.stanford.edu>, gdb@sources.redhat.com
Subject: Re: suggestion for dictionary representation
Message-ID: <20020923031056.GA26307@nevyn.them.org>
Mail-Followup-To: Jim Blandy <jimb@redhat.com>,
	david carlton <carlton@math.stanford.edu>, gdb@sources.redhat.com
References: <200209230244.g8N2ieo21741@zenia.red-bean.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200209230244.g8N2ieo21741@zenia.red-bean.com>
User-Agent: Mutt/1.5.1i
X-SW-Source: 2002-09/txt/msg00346.txt.bz2

On Sun, Sep 22, 2002 at 09:44:40PM -0500, Jim Blandy wrote:
> 
> It seems to me that the `skip list' data structure simultaneously
> meets a lot of the criteria that we're currently meeting by having
> multiple representations.  Skip lists:
> - provide (probabilistic) log n access time
> - are low overhead ("They can easily be configured to require an
>   average of 1 1/3 pointers per element (or even less).")
> - are easy to build incrementally, and stay "balanced" automatically
> - are obstack-friendly, since they don't involve realloc (as hash
>   tables do)
> - are an ordered structure, which would support completion nicely (and,
>   by the way, make the `_Z' test for the C++ V3 ABI faster too)
> - have a trivial iterator (walking the finest level of links)
> - are pretty easy to understand
> 
> http://www.cs.umd.edu/~pugh points to a paper describing and analyzing
> them.
> 
> Using skip lists, there'd be no need to distinguish `expandable' from
> non-expandable blocks.  This one structure would scale to handle both
> local blocks and the global environment (depending on how we handle
> lazy symbol reading --- I'd like a more generic and descriptive term
> than "partial symbol tables").

Hmm.  Lots of simplicity/cleanliness benefits, but the real question as
far as I'm concerned is whether the benefit to completion (the _Z thing
is done as we read in symbols right now, so it's a complete non-issue)
outweighs going from O(1) to O(probabilistic log n) for symbol lookup.

I suspect it would; having faster completion [I can't really see how to
beat O(n) with the current hash tables, can anyone else?  But I think
it's slower than O(n) right now; I recall it being quadratic...] would
be nice.  O(~ log N) ought to be plenty fast, right?
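
To make the trade-off concrete, here is a minimal sketch in C of ordered
lookup and prefix completion on a skip list.  None of this is GDB code:
skip_list, skip_insert, skip_lower_bound, skip_complete, MAX_LEVEL and
the 1/4 promotion probability are all assumptions made for illustration.
Nodes are malloc'ed for brevity; each node is one allocation that is
never resized, so it could presumably come from an obstack instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEVEL 16            /* hypothetical cap on node height */

struct skip_node {
    const char *name;           /* symbol name, used as the sort key */
    struct skip_node *next[];   /* one forward link per level of this node */
};

struct skip_list {
    struct skip_node *head[MAX_LEVEL];  /* entry links, all NULL when empty */
};

/* Promote with probability 1/4 per level, which averages 4/3 links
   per node.  */
int random_level(void)
{
    int level = 1;
    while (level < MAX_LEVEL && (rand() & 3) == 0)
        level++;
    return level;
}

/* Insert NAME, keeping every level sorted by strcmp order.  */
void skip_insert(struct skip_list *list, const char *name)
{
    struct skip_node **update[MAX_LEVEL];
    struct skip_node **links = list->head;

    /* Walk from the sparsest level down, remembering at each level
       the link array that must point at the new node.  */
    for (int i = MAX_LEVEL - 1; i >= 0; i--) {
        while (links[i] != NULL && strcmp(links[i]->name, name) < 0)
            links = links[i]->next;
        update[i] = links;
    }

    int level = random_level();
    struct skip_node *node =
        malloc(sizeof *node + level * sizeof node->next[0]);
    assert(node != NULL);
    node->name = name;
    for (int i = 0; i < level; i++) {
        node->next[i] = update[i][i];
        update[i][i] = node;
    }
}

/* Return the first node whose name is >= KEY, or NULL if none.  */
const struct skip_node *skip_lower_bound(const struct skip_list *list,
                                         const char *key)
{
    struct skip_node *const *links = list->head;

    for (int i = MAX_LEVEL - 1; i >= 0; i--)
        while (links[i] != NULL && strcmp(links[i]->name, key) < 0)
            links = links[i]->next;
    return links[0];
}

/* Store up to MAX names starting with PREFIX in OUT, in sorted order,
   and return how many were stored.  */
size_t skip_complete(const struct skip_list *list, const char *prefix,
                     const char **out, size_t max)
{
    size_t n = 0, len = strlen(prefix);
    const struct skip_node *node = skip_lower_bound(list, prefix);

    /* Matches are contiguous on the finest level, so stop at the
       first name that no longer starts with PREFIX.  */
    while (node != NULL && n < max
           && strncmp(node->name, prefix, len) == 0) {
        out[n++] = node->name;
        node = node->next[0];
    }
    return n;
}
```

With this layout, completion is one expected-log-n descent to the first
candidate plus a walk over the k matches, rather than a scan of every
symbol.  An exact-name lookup is skip_lower_bound followed by a single
strcmp on the result.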

> The only remaining special case would be function blocks, in which
> parameter symbols must appear in the order they appear in the
> source.  I think it's pretty ugly to abuse the name array this way; it
> introduces special cases in dumb places.  This kludge could be removed
> by changing the `function' member of `struct block' to a pointer to
> something like this:
> 
>    struct function_info {
>        struct symbol *sym;
>        int num_params;
>        struct symbol **params;
>    };
> 
> This would require extra space only for function blocks; non-function
> blocks would remain the same size.  And this info would only be
> consulted when we actually wanted to iterate over the parameters.
> This would clean up a bunch of loops in GDB that currently have to
> iterate over all the symbols in a function's block and do a switch on
> each symbol's address class to find the arguments.  (And would this
> also allow us to remove the arg/other distinction in enum
> address_class?  Dunno.)
> 
> But if we were to remove function blocks as a special case, there
> would only need to be a single structure for representing levels of
> the namespace.

I'm tempted to whack the block special case for function arguments.  It
may make name lookup a little more complicated but I think it will make
everything clearer.  We could, of course, try this on the branch and
see if we like the results :)
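
As a rough sketch of what callers might look like with Jim's
function_info: struct function_info is copied from his message, but the
one-field struct symbol and the format_signature helper are hypothetical
stand-ins for illustration, not existing GDB code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for GDB's much larger struct symbol.  */
struct symbol {
    const char *name;
};

/* As proposed in the quoted message.  */
struct function_info {
    struct symbol *sym;         /* the function's own symbol */
    int num_params;
    struct symbol **params;     /* parameters, in source order */
};

/* Hypothetical helper: format "name (p1, p2, ...)" into BUF and return
   the number of characters the full text needs, as snprintf does.  */
int format_signature(const struct function_info *fn, char *buf, size_t size)
{
    assert(fn != NULL && fn->sym != NULL);

    int used = snprintf(buf, size, "%s (", fn->sym->name);

    /* The parameters come straight from the array, in order; there is
       no scan over the block's symbols and no switch on address class.  */
    for (int i = 0;
         i < fn->num_params && used >= 0 && (size_t) used < size;
         i++)
        used += snprintf(buf + used, size - used, "%s%s",
                         i > 0 ? ", " : "", fn->params[i]->name);

    if (used >= 0 && (size_t) used < size)
        used += snprintf(buf + used, size - used, ")");
    return used;
}
```

The block's dictionary then no longer needs to preserve insertion order
for function blocks; anything that cares about argument order reads
params instead.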

David, what do you think?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer