From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-10193-listarch-gdb=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 17863 invoked by alias); 6 Sep 2002 19:34:41 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 17855 invoked from network); 6 Sep 2002 19:34:40 -0000
Received: from unknown (HELO mail.cdt.org) (206.112.85.61)
  by sources.redhat.com with SMTP; 6 Sep 2002 19:34:40 -0000
Received: from www.dberlin.org (pool-151-200-239-242.res.east.verizon.net [151.200.239.242])
	by mail.cdt.org (Postfix) with ESMTP
	id 06F8749003F; Fri,  6 Sep 2002 15:13:00 -0400 (EDT)
Received: by www.dberlin.org (Postfix, from userid 503)
	id 189FB1851427; Fri,  6 Sep 2002 15:34:35 -0400 (EDT)
Received: from localhost (localhost.localdomain [127.0.0.1])
	by www.dberlin.org (Postfix) with ESMTP
	id 8C1681850002; Fri,  6 Sep 2002 15:34:34 -0400 (EDT)
Date: Fri, 06 Sep 2002 12:34:00 -0000
From: Daniel Berlin <dberlin@dberlin.org>
To: David Carlton <carlton@math.stanford.edu>
Cc: Daniel Jacobowitz <drow@mvista.com>, gdb <gdb@sources.redhat.com>
Subject: Re: struct environment
In-Reply-To: <ro14rd3c5gx.fsf@jackfruit.Stanford.EDU>
Message-ID: <Pine.LNX.4.44.0209061521120.20047-100000@dberlin.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Spam-Status: No, hits=-7.1 required=5.0
	tests=AWL,IN_REP_TO,QUOTED_EMAIL_TEXT,SPAM_PHRASE_02_03,
	      USER_AGENT_PINE
	version=2.40-cvs
X-Spam-Level: 
X-SW-Source: 2002-09/txt/msg00036.txt.bz2


> one which might not even be necessary (see below).
> 
> >> And there's another, even more important reason: the global
> >> environment (and, in the future, namespaces).  The global
> >> environment spans multiple files, uses lazy evaluation
> >> (i.e. partial_symbols), and there's even minimal_symbols to
> >> shoehorn in there somehow.  So it's going to need its own special
> >> implementation (which will be much more complicated than the
> >> implementations for blocks).
> 
> > This is why I don't like the environment == list-of-symbols thing.
> > An environment may HAVE a list of symbols, but it is not its list of
> > symbols.  You shouldn't grow the list of symbols in the global
> > environment when a new file is added.  Instead you should associate
> > a new list of symbols with it.

This makes lookup dependent on two things, instead of one (number of 
files, and number of names, vs number of names).
Bad idea when you can avoid it.
If you want to know what blocks go with which files, then store *that* 
information.
Removal is *not* the common case.
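
A minimal sketch of the cost difference, in Python for brevity. Every name 
here (PerFileTables, SingleIndex, probes) is hypothetical and none of it is 
GDB code. The first scheme keeps one symbol table per file and probes each 
in turn; the second keeps a single name-keyed table and stores the owning 
file next to each entry.

```python
# Hypothetical illustration, not GDB code.

class PerFileTables:
    """One symbol table per file: a global lookup probes every file."""

    def __init__(self):
        self.tables = {}   # filename -> {symbol name -> block id}
        self.probes = 0    # number of table probes performed so far

    def add_file(self, filename, symbols):
        self.tables[filename] = dict(symbols)

    def lookup(self, name):
        hits = []
        for filename, table in self.tables.items():
            self.probes += 1
            if name in table:
                hits.append((filename, table[name]))
        return hits


class SingleIndex:
    """One name-keyed table; each entry records which file it came from."""

    def __init__(self):
        self.index = {}    # symbol name -> [(filename, block id), ...]
        self.probes = 0

    def add_file(self, filename, symbols):
        for name, block in symbols.items():
            self.index.setdefault(name, []).append((filename, block))

    def lookup(self, name):
        self.probes += 1
        return list(self.index.get(name, []))

    def remove_file(self, filename):
        # The uncommon case: pay for the full scan here, not on every lookup.
        for name in list(self.index):
            kept = [e for e in self.index[name] if e[0] != filename]
            if kept:
                self.index[name] = kept
            else:
                del self.index[name]
```

In this toy model the per-file scheme does one probe per loaded file for 
every lookup, while the single index does one probe regardless of how many 
files are loaded. Removal is the operation that gets more expensive.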


> Files can be removed as well as added, remember.  We don't do that very well right now.
We don't do it at all, actually, because of obstacks.
We only do it when we blow everything (i.e. all files) away at once.
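
For readers unfamiliar with the constraint, a toy stack-style arena shows 
it. This assumes obstack-like semantics, where freeing an object also frees 
everything allocated after it. The Arena class and the interleaved 
allocation order are hypothetical and only meant as an illustration.

```python
# Toy arena in the spirit of an obstack; hypothetical, not GDB code.

class Arena:
    def __init__(self):
        self.objects = []          # objects in allocation order

    def alloc(self, obj):
        """Append obj and return its position, usable as a mark."""
        self.objects.append(obj)
        return len(self.objects) - 1

    def free_back_to(self, mark):
        """Free the object at mark and everything allocated after it."""
        del self.objects[mark:]


arena = Arena()
mark_a = arena.alloc("a.c: foo")
arena.alloc("b.c: bar")
arena.alloc("a.c: baz")

# Dropping a.c alone is not expressible: freeing back to its first
# object also discards b.c's symbol, which was allocated afterwards.
arena.free_back_to(mark_a)
```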

> 
> > Search the archives of this list for:
> >   Date: 11 Jun 2001 12:55:42 -0400
> >   From: Daniel Berlin <dan@cgsoftware.com>
> >   Subject: [RFC] Symbol table improvements
> 
> > for some other suggested reading on this topic.  I like this
> > architecture he describes, even if we're not quite ready for it yet.
> > We could be.
> 
> Thanks for the pointer.  Certainly I'm not at all thrilled with the
> way that the global symbol lookup functions work currently.  One
> thought that I had was to use a growable but quickly-searchable data
> structure for global lookups; but that has the problems that it makes
> removing individual files complicated (incidentally, I had been
> wondering if that could happen, but I guess from your response above
> that the answer is 'yes')

No, but we *should* be working towards this.
> and it might make psymtab->symtab conversion
> a bit trickier.
psymtabs are useless for things besides stabs and maybe one or two other 
minor debug formats. Almost every other format, including DWARF2, HP's, 
etc., has a fast way to access the full symbol given a name or address, 
without having to construct some auxiliary structure to do it.
psymtabs exist because STABS *can't* do this.
They should, in actuality, become specific to the STABS reader, as an 
internal mechanism *it* uses to do fast lookup.
For most other debug formats, they waste memory, but there is no defined 
mechanism to bypass them (i.e. the interface to lazy lookup is through 
psymtabs directly, rather than abstracted in a way that lets you use 
DWARF2 pubnames/aranges, or the various structures present in other debug 
formats).
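
A rough sketch of the kind of abstraction meant here, in Python for 
brevity. The class and method names (LazySymbolLookup, find_unit_for_name, 
StabsReader, Dwarf2Reader) are made up for illustration and are not GDB's; 
the dict passed to Dwarf2Reader is only a stand-in for a name table such as 
pubnames.

```python
# Hypothetical sketch; class and method names are not GDB's.
from abc import ABC, abstractmethod


class LazySymbolLookup(ABC):
    """What the core would ask of any debug-format reader."""

    @abstractmethod
    def find_unit_for_name(self, name):
        """Return the compilation unit to expand for name, or None."""


class StabsReader(LazySymbolLookup):
    """No native index: builds a psymtab-like table by scanning once."""

    def __init__(self, raw_entries):
        # raw_entries: iterable of (unit, symbol name) in file order
        self._partial = {}
        for unit, name in raw_entries:
            self._partial.setdefault(name, unit)

    def find_unit_for_name(self, name):
        return self._partial.get(name)


class Dwarf2Reader(LazySymbolLookup):
    """Uses a name table the format already provides (pubnames stand-in)."""

    def __init__(self, pubnames):
        self._pubnames = pubnames   # reused as-is, no second copy built

    def find_unit_for_name(self, name):
        return self._pubnames.get(name)


def lookup(readers, name):
    """Core lookup: format-agnostic, never touches psymtabs directly."""
    for reader in readers:
        unit = reader.find_unit_for_name(name)
        if unit is not None:
            return unit
    return None
```

With this shape, the auxiliary table lives only inside the reader that 
needs it, and the other readers answer from whatever the format supplies.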

> Skimming the message you're referring to, it seems like Daniel Berlin
> is proposing keeping the idea that the global environment is made out
> of a bunch of blocks from different files, but speeding up the process
> of going from a symbol name to the specific blocks to search.

This is what is slow in GDB.
As for whether it maps well to namespaces, the global environment really 
doesn't.
But in addition to being *much* faster, it's also easier to deal with 
namespaces in this kind of scheme, since you know all the blocks in which 
a name exists, and you can use a language dependent scheme to determine 
which is the name you want.
Before, you would have no way to figure out this information except by brute 
force, if you wanted to look at names outside of the current scopes as 
well (because you wouldn't know what other scopes they exist in).
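
As a sketch of that idea, assuming a hypothetical index from each name to 
every block that defines it, the language-dependent part reduces to a 
function that picks among the candidates. Nothing below is GDB code; the 
scope tuples and both chooser functions are invented for illustration.

```python
# Hypothetical sketch; none of these names come from GDB.

# name -> every block (identified here by a scope path) that defines it
BLOCKS_BY_NAME = {
    "size": [("global",), ("ns_a",), ("ns_a", "inner"), ("ns_b",)],
}


def choose_cxx_like(candidates, current_scope):
    """Pick the candidate in the innermost scope enclosing current_scope."""
    best = None
    for scope in candidates:
        # The global block encloses everything, so its path is empty.
        path = () if scope == ("global",) else scope
        if current_scope[:len(path)] == path:
            if best is None or len(path) > len(best[1]):
                best = (scope, path)
    return best[0] if best else None


def choose_c_like(candidates, current_scope):
    """A language with no namespaces only ever sees the global block."""
    return ("global",) if ("global",) in candidates else None


def lookup(name, current_scope, choose):
    """One index probe, then a language-specific choice among the hits."""
    candidates = BLOCKS_BY_NAME.get(name, [])
    return choose(candidates, current_scope)
```

The index hands back all candidate blocks in one probe, so the chooser 
never has to search scopes it does not already know about.
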
> That sounds like a good idea: it's still fast, but it still allows a 
> per-file granularity which seems like it would be useful.

I studied this problem extensively while trying to speed up GDB, and 
reduce its memory usage.
In fact, I had the dwarf2 reader able to drop and read files at will, but 
it required bypassing psymtabs.

Note that this wasn't the first time someone wanted to do this.  HP has a 
whole "FAT_FREE_PSYMTABS" thing in WDB.