From: Joel Brobecker
To: Tom Tromey, Paul Hilfinger
Cc: gdb-patches@sourceware.org
Date: Thu, 26 Mar 2009 01:24:00 -0000
Subject: Re: RFC: lazy partial symbol table reading
Message-ID: <20090326011605.GJ9472@adacore.com>
References: <20090325223211.GH9472@adacore.com>

> Joel> I think I need to discuss the principle of this patch internally with
> Joel> the other AdaCore GDB engineers, in particular with Paul Hilfinger.

I've started a discussion, so let's see what comes out of it. But I took
a look too, and I think it can definitely be a great step forward.

Basically, what I'm worried about is creating a big pause once the
debugging session has started. For instance, if the user tries to print
the value of a variable that's not found in the local scope (basically,
a global variable), the next thing we do is search all symbols in all
symtabs/psymtabs. Or even more common: inserting a breakpoint on
a function ("break function_name"). Again, we search all
symtabs/psymtabs. This is necessary, because we need to handle homonyms.

But: if we have a way of having all symbol names available without
creating the symtabs (the .debug_pubnames section), then I think we can
have our cake and eat it too: we only have to create the psymtab for
the CUs that we need. In fact, one could argue that we no longer need
the partial symtab at all...

Just one thing that we need to be careful of, for Ada: make sure that
names of symbols for nested functions appear in the pubnames section.
Ada users expect us to be able to break on a nested function without
having to be in scope first.

As a side note, this is connected to another long-term
I-will-probably-never-get-to-do-it project of mine, which is to speed up
our lookup routines. Currently, for Ada, we only store the "linkage"
name in order to save memory space. This is different from C++,
I believe, where we store both the linkage name and the "natural" name.
The problem with storing only the linkage name is that it makes symbol
name matching during lookups a lot more work than simple string
matching. We pretty much do symbol-name decoding on the fly for every
symbol we try to match. Very inefficient.

The idea was: if I could somehow find the memory to store the natural
names, then the symbol name matching routine could be simplified
dramatically, and we would probably be as efficient as C (modulo the
fact that we search all symtabs instead of stopping on the first match).

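To make the difference concrete, here is a minimal sketch in Python
(not GDB's C). It assumes a drastically simplified decoding rule, where
"__" in a linkage name stands for "." in the natural name; real Ada name
decoding handles many more cases. All function names below are
hypothetical and do not correspond to GDB routines.

```python
def decode(linkage_name):
    """Simplified stand-in for Ada decoding: only maps '__' to '.'."""
    return linkage_name.replace("__", ".")


def lookup_decoding(linkage_names, wanted):
    """Decode every candidate during the search.

    The decoding cost is paid for each symbol on each lookup.
    """
    return [n for n in linkage_names if decode(n) == wanted]


def build_natural_index(linkage_names):
    """Decode once, up front, and keep the natural names in memory."""
    return [(decode(n), n) for n in linkage_names]


def lookup_stored(index, wanted):
    """Match by plain string comparison against stored natural names."""
    return [linkage for natural, linkage in index if natural == wanted]


if __name__ == "__main__":
    names = ["pack__proc", "pack__nested__proc", "main"]
    index = build_natural_index(names)
    print(lookup_decoding(names, "pack.proc"))
    print(lookup_stored(index, "pack.proc"))
```

Both lookups still walk every symbol, as described above; the second
one just replaces per-lookup decoding with a string comparison, at the
cost of keeping the decoded names in memory.
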
In order to achieve that, the idea would be to use a string table, where
we would reuse the same string when 2 symbols have the same name. This
happens all the time. For instance, types defined in a common file
that's included by several units get repeated in every unit. To give
myself a rough idea of how much we could save, I measured the amount of
memory used to allocate symbol names. Then I computed the amount of
memory required if we only had unique names. The number went from
85_036_058 bytes down to 12_780_400 bytes!

Going one step further, I would love to have a link from the string back
to all matching symbols/psymbols/msymbols. Currently, we store our
symbol information in a way that is easy to build, but not so friendly
to search. Perhaps one day we'll be able to improve that aspect. I did
create the equivalent of a PR in AdaCore's tracking system in order to
implement the string-table idea...

-- 
Joel
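
A minimal sketch of the string-table idea described above, in Python
rather than GDB's C. The StringTable class, its methods, and the
name_bytes helper are hypothetical names chosen for illustration; none
of them exist in GDB. The byte count assumes one trailing NUL per name,
which may not match how the figures above were measured. For scale,
those figures (85_036_058 down to 12_780_400 bytes) amount to roughly
an 85% reduction.

```python
from collections import defaultdict


class StringTable:
    """Stores one shared copy of each name, plus a link back to its users."""

    def __init__(self):
        self._strings = {}               # name -> the single shared copy
        self._users = defaultdict(list)  # name -> symbols carrying that name

    def intern(self, name, symbol=None):
        """Return the shared copy of name, optionally recording a symbol."""
        shared = self._strings.setdefault(name, name)
        if symbol is not None:
            self._users[shared].append(symbol)
        return shared

    def symbols_named(self, name):
        """Return all symbols recorded under name (empty list if none)."""
        return list(self._users.get(name, ()))


def name_bytes(names):
    """Bytes needed to store each name, assuming a trailing NUL as in C."""
    return sum(len(n.encode()) + 1 for n in names)


if __name__ == "__main__":
    table = StringTable()
    names = ["integer", "integer", "pack__t", "integer"]
    for i, n in enumerate(names):
        table.intern(n, symbol=f"sym{i}")
    print(name_bytes(names), name_bytes(set(names)))
    print(table.symbols_named("integer"))
```

The reverse map is what would let a lookup go straight from a name to
its symbols instead of scanning every table; how such a link would
cover symbols, psymbols, and msymbols in GDB is left open here.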