From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-62476-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 28443 invoked by alias); 26 Mar 2009 01:16:20 -0000
Received: (qmail 28428 invoked by uid 22791); 26 Mar 2009 01:16:19 -0000
X-SWARE-Spam-Status: No, hits=-2.4 required=5.0 	tests=AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 26 Mar 2009 01:16:14 +0000
Received: from localhost (localhost.localdomain [127.0.0.1]) 	by filtered-rock.gnat.com (Postfix) with ESMTP id CE2342BAB7B; 	Wed, 25 Mar 2009 21:16:12 -0400 (EDT)
Received: from rock.gnat.com ([127.0.0.1]) 	by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) 	with LMTP id n6yNhLjD6AiI; Wed, 25 Mar 2009 21:16:12 -0400 (EDT)
Received: from joel.gnat.com (localhost.localdomain [127.0.0.1]) 	by rock.gnat.com (Postfix) with ESMTP id 8D9632BAB5F; 	Wed, 25 Mar 2009 21:16:12 -0400 (EDT)
Received: by joel.gnat.com (Postfix, from userid 1000) 	id 6BC385BD21; Wed, 25 Mar 2009 18:16:05 -0700 (PDT)
Date: Thu, 26 Mar 2009 01:24:00 -0000
From: Joel Brobecker <brobecker@adacore.com>
To: Tom Tromey <tromey@redhat.com>, Paul Hilfinger <hilfinger@adacore.com>
Cc: gdb-patches@sourceware.org
Subject: Re: RFC: lazy partial symbol table reading
Message-ID: <20090326011605.GJ9472@adacore.com>
References: <m3y6uv6a7b.fsf@fleche.redhat.com> <20090325223211.GH9472@adacore.com> <m33ad1yxpe.fsf@fleche.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <m33ad1yxpe.fsf@fleche.redhat.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-03/txt/msg00586.txt.bz2

> Joel> I think I need to discuss the principle of this patch internally with
> Joel> the other AdaCore GDB engineers, in particular with Paul Hilfinger.

I've started a discussion, so let's see what comes out of it.
But I took a look too, and I think it can definitely be a great
step forward.

Basically, what I'm worried about is creating a big pause once
the debugging session has started. For instance, if the user tries
to print the value of a variable that's not found in the local scope
(basically, a global variable), the next thing we do is search all
symbols in all symtabs/psymtabs. Or even more common: Inserting
a breakpoint on a function ("break function_name"). Again, we search
all symtabs/psymtabs.  This is necessary, because we need to handle
homonyms.

But:

If we have a way of having all symbol names available without creating
the symtabs (the .debug_pubnames section), then I think we can have our
cake and eat it too: We only have to create the psymtabs for the CUs that
we need. In fact, one could argue that we no longer need the partial
symtabs at all...
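
To make the idea a bit more concrete, here is a rough sketch in Python
(not GDB code; the class, the callback and the toy data are all made up
for illustration). It assumes we have a list of (name, CU) pairs playing
the role of .debug_pubnames, and only does the expensive read for the CUs
that define the name being looked up:

```python
# Hypothetical sketch, not GDB code: a name index standing in for
# .debug_pubnames, used to expand only the CUs that define a name.

class LazySymbolReader:
    def __init__(self, pubnames, read_cu):
        # pubnames: iterable of (symbol_name, cu_id) pairs.
        # read_cu: callable doing the expensive read of one CU; it
        # returns a dict mapping symbol names to symbol data.
        self.name_to_cus = {}
        for name, cu_id in pubnames:
            self.name_to_cus.setdefault(name, []).append(cu_id)
        self.read_cu = read_cu
        self.expanded = {}  # cu_id -> symbols, filled on demand

    def lookup(self, name):
        """Return every definition of NAME, homonyms included."""
        matches = []
        for cu_id in self.name_to_cus.get(name, []):
            if cu_id not in self.expanded:
                self.expanded[cu_id] = self.read_cu(cu_id)
            matches.append((cu_id, self.expanded[cu_id][name]))
        return matches


# Toy data: three CUs, two of which define the homonym "init".
FAKE_CUS = {
    "a.adb": {"init": "a.adb:10", "run": "a.adb:42"},
    "b.adb": {"init": "b.adb:7"},
    "c.adb": {"helper": "c.adb:3"},
}
reads = []  # records which CUs were actually read


def fake_read_cu(cu_id):
    reads.append(cu_id)
    return FAKE_CUS[cu_id]


pubnames = [(name, cu) for cu, syms in FAKE_CUS.items() for name in syms]
reader = LazySymbolReader(pubnames, fake_read_cu)
print(reader.lookup("init"))
print(reads)
```

Both homonyms are found, but the CU that does not define the name is
never read.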

Just one thing that we need to be careful of, for Ada: Make sure that
names of symbols for nested functions appear in the pubnames section.
Ada users expect us to be able to break on a nested function without
having to be in scope first.

As a side note, this is connected to another long-term
I-will-probably-never-get-to-do-it project of mine, which is to speed up
our lookup routines. Currently, for Ada, we only store the "linkage" name
in order to save memory space. This is different from C++, I believe,
where we store both the linkage name and the "natural" name.  The problem
with storing only the linkage name is that it makes symbol name matching
during lookups a lot more work than simple string matching. We pretty
much do symbol-name decoding on the fly for every symbol we try to match.
Very inefficient.
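
A toy illustration of the difference, in Python rather than GDB's C. The
decode function here is a grossly simplified stand-in that only maps the
"__" separator to "."; the real decoding handles many more cases, and all
the names below are invented for the example:

```python
# Hypothetical sketch: matching against linkage names only, versus
# matching against natural names stored alongside them.

decode_calls = 0


def decode(linkage_name):
    # Simplified stand-in for the real decoding: "pck__proc" -> "pck.proc".
    global decode_calls
    decode_calls += 1
    return linkage_name.replace("__", ".")


def lookup_linkage_only(linkage_names, wanted):
    # Every candidate has to be decoded before it can be compared.
    return [n for n in linkage_names if decode(n) == wanted]


def lookup_with_natural(symbols, wanted):
    # symbols holds (linkage, natural) pairs; matching is a plain compare.
    return [linkage for linkage, natural in symbols if natural == wanted]


linkage_names = ["pck__init", "pck__nested__run", "other__init"]
found_slow = lookup_linkage_only(linkage_names, "pck.nested.run")

# Natural names computed once, when the symbols are read.
symbols = [(n, n.replace("__", ".")) for n in linkage_names]
found_fast = lookup_with_natural(symbols, "pck.nested.run")

print(found_slow, found_fast, decode_calls)
```

The first lookup decodes every candidate on every search; the second one
pays for the decoding once and then only compares strings.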

The idea was: If I could somehow find the memory to store the natural
names, then the symbol name matching routine can be simplified
dramatically, and we would probably be as efficient as C (modulo the fact
that we search all symtabs instead of stopping on the first match).
In order to achieve that, the idea would be to use a string table,
where we would reuse the same string when 2 symbols have the same
name.  This happens all the time. For instance, types defined in
a common file that's included by several units get repeated in every
unit. To give myself a rough idea of how much we could save, I measured
the amount of memory used to allocate symbol names.  Then I computed
the amount of memory required if we only had unique names. The number
went from 85_036_058 bytes down to 12_780_400 bytes!
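
Roughly, the string table and the measurement could look like the
following Python sketch. This is not the code I used for the numbers
above; the class name, the toy symbol names and the one-byte terminator
per string are assumptions made for the example:

```python
# Hypothetical sketch of a string table that stores each distinct
# symbol name once and hands back the shared copy.

class StringTable:
    def __init__(self):
        self._strings = {}

    def intern(self, name):
        # Return the stored copy of NAME, adding it on first sight.
        return self._strings.setdefault(name, name)

    def unique_bytes(self):
        # +1 per string, assuming a C-style NUL terminator.
        return sum(len(s) + 1 for s in self._strings)


# Toy input: the same type names repeated in several units.
names = ["integer", "string", "integer", "buffer_type", "integer", "string"]

table = StringTable()
interned = [table.intern(n) for n in names]

total_bytes = sum(len(n) + 1 for n in names)  # one copy per symbol
unique_bytes = table.unique_bytes()           # one copy per distinct name

print(total_bytes, unique_bytes)
```

On this toy input the total goes from 50 bytes down to 27, and two
symbols with the same name end up sharing one string object.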

Going one step further, I would love to have a link from the string
back to all matching symbols/psymbols/msymbols. Currently, we store
our symbol information in a way that is easy to build, but not so friendly
to search. Perhaps one day we'll be able to improve that aspect.
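
As a sketch of what I have in mind (Python, purely illustrative; the
class, the "kind" labels and the locations are all made up), the link
from a name back to its symbols is essentially a reverse index:

```python
from collections import defaultdict

# Hypothetical sketch: a reverse index from a name to every
# symbol/psymbol/msymbol entry that carries it.


class NameIndex:
    def __init__(self):
        self._by_name = defaultdict(list)

    def add(self, name, kind, where):
        # kind: "symbol", "psymbol" or "msymbol"; where: any locator.
        self._by_name[name].append((kind, where))

    def find(self, name):
        # One dictionary probe instead of a walk over every table.
        return list(self._by_name.get(name, []))


index = NameIndex()
index.add("pck.init", "psymbol", "a.adb")
index.add("pck.init", "msymbol", "0x4005d0")
index.add("other.init", "symbol", "b.adb")

print(index.find("pck.init"))
print(index.find("missing"))
```

The cost moves to build time, which is the trade-off mentioned above:
the structure is a little more work to populate, but a lookup no longer
has to scan everything.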

I did create the equivalent of a PR in AdaCore's tracking system
in order to implement the string-table idea...

-- 
Joel