From: Joel Brobecker
To: Tom Tromey
Cc: gdb-patches@sources.redhat.com
Date: Tue, 29 Nov 2011 03:07:00 -0000
Subject: partial-symtab symbol sorting (was: "Re: GDB 7.4 branching status? (2011-11-23)")
Message-ID: <20111129030637.GM24943@adacore.com>
References: <20111123163917.GA13809@adacore.com> <20111123232406.GQ13809@adacore.com> <20111124105603.GA91879@adacore.com> <20111124163304.GR13809@adacore.com>

This is an issue that is only tangential to this patch, and should not be considered part of the series.

I just realized that partial symbols are sorted using strcmp_iw_ordered. This works great for C++, for instance, but only works OK for Ada. I think that this is related to the fact that we might be using the linkage name rather than the natural name (we compute the natural name only on demand, due to memory pressure in large apps). As a result, the strcmp_iw_ordered routine can return non-zero for two names that ada-lang.c:compare_names would consider equal. For instance: `pck__hello' and `pck__hello__2'.

So when doing a symbol lookup for pck__hello, for instance, we pass our own comparison routine, which is "compatible" with strcmp_iw_ordered, to the psymtab map_matching_symbols routine. This allows us to perform a binary search rather than a linear one.

I am wondering if we shouldn't be sorting the partial symbols using a language-specific sorting routine instead. As it turns out, there was a bug in ada-lang.c:compare_names, and that could have caused the two search orders to diverge.

The thing is, when I looked at it, it's not easy to tell, just by looking at the partial symtab, what language it is. The language seems to be embedded in the symbols themselves. And then, we'd still have to specify whether we want to perform a binary search or not anyway. That's because we permit "wild" matches:

    (gdb) break hello

In the case above, we must break on "pck__hello" and "pck__hello__2". In that case, binary searches based on string comparison cannot work because we're missing the start of the symbol linkage name.

As an aside: One of the ideas I had in the past was to store the natural name inverted, for instance "hello.pck" instead of "pck.hello". That way, searches for hello could be done using binary searches as well. It might actually allow us to reconcile "wild" vs "non-wild" searches, and even allow "semi-wild" lookups as in:

    (gdb) break subpackage.hello

... would now manage to find matches such as package.subpackage.hello. This is not the case today.
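
To make the inverted-name idea concrete, here is a rough sketch in Python. It is not GDB code: the helpers reverse_components, build_index and lookup are hypothetical names invented for this illustration, and it assumes natural names use "." as the component separator. It only shows why reversing the components turns a "wild" or "semi-wild" lookup into a prefix search that can start from a binary search.

```python
import bisect


def reverse_components(natural_name):
    """Reverse the dot-separated components: 'pck.hello' -> 'hello.pck'."""
    return ".".join(reversed(natural_name.split(".")))


def build_index(natural_names):
    """Return a list of (reversed_name, natural_name) sorted by reversed name."""
    return sorted((reverse_components(name), name) for name in natural_names)


def lookup(index, query):
    """Return the natural names whose trailing components equal `query`.

    'hello' matches 'pck.hello'; 'subpackage.hello' matches
    'package.subpackage.hello'; a fully qualified name matches itself.
    """
    key = reverse_components(query)
    # Binary search for the first entry whose reversed name is >= key.
    # The 1-tuple (key,) sorts before every (key, natural_name) pair.
    start = bisect.bisect_left(index, (key,))
    matches = []
    for reversed_name, natural_name in index[start:]:
        if reversed_name == key or reversed_name.startswith(key + "."):
            matches.append(natural_name)
        elif not reversed_name.startswith(key):
            # Entries sharing the prefix are contiguous, so we can stop here.
            break
        # Otherwise the entry shares the prefix but not a component boundary
        # (for example 'hello2.pck' when searching for 'hello'): skip it.
    return matches


index = build_index([
    "pck.hello",
    "pck.hello2",
    "package.subpackage.hello",
    "other.goodbye",
])
print(lookup(index, "hello"))
print(lookup(index, "subpackage.hello"))
print(lookup(index, "pck.hello"))
```

In this sketch the unqualified, partially qualified and fully qualified lookups all go through the same code path, which is the reconciliation of "wild" and "non-wild" searches suggested above.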
As things stand, you either fully qualify your symbol name, or you don't qualify it at all.

But, without even thinking about performance issues at startup, this approach suffers from the same problem as storing the natural name does: For certain large applications, we would exceed the maximum amount of memory a process can hold. This does not necessarily happen on GNU/Linux, but the problem is there. I think we'll have better luck if we are able to merge some of the massive duplication in the debug info.

-- 
Joel