From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-85619-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 7213 invoked by alias); 29 Nov 2011 03:07:06 -0000
Received: (qmail 7198 invoked by uid 22791); 29 Nov 2011 03:07:04 -0000
X-SWARE-Spam-Status: No, hits=0.7 required=5.0	tests=AWL,BAYES_00,KAM_STOCKTIP
X-Spam-Check-By: sourceware.org
Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 29 Nov 2011 03:06:51 +0000
Received: from localhost (localhost.localdomain [127.0.0.1])	by filtered-rock.gnat.com (Postfix) with ESMTP id A73ED2BB142;	Mon, 28 Nov 2011 22:06:50 -0500 (EST)
Received: from rock.gnat.com ([127.0.0.1])	by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024)	with LMTP id OcP2l9RIpcSd; Mon, 28 Nov 2011 22:06:50 -0500 (EST)
Received: from joel.gnat.com (localhost.localdomain [127.0.0.1])	by rock.gnat.com (Postfix) with ESMTP id 62DC92BB13E;	Mon, 28 Nov 2011 22:06:50 -0500 (EST)
Received: by joel.gnat.com (Postfix, from userid 1000)	id 884E9145615; Mon, 28 Nov 2011 22:06:37 -0500 (EST)
Date: Tue, 29 Nov 2011 03:07:00 -0000
From: Joel Brobecker <brobecker@adacore.com>
To: Tom Tromey <tromey@redhat.com>
Cc: gdb-patches@sources.redhat.com
Subject: partial-symtab symbol sorting (was: "Re: GDB 7.4 branching status? (2011-11-23)")
Message-ID: <20111129030637.GM24943@adacore.com>
References: <20111123163917.GA13809@adacore.com> <m38vn6y8v6.fsf@fleche.redhat.com> <20111123232406.GQ13809@adacore.com> <20111124105603.GA91879@adacore.com> <20111124163304.GR13809@adacore.com> <m31ussq6o2.fsf@fleche.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <m31ussq6o2.fsf@fleche.redhat.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2011-11/txt/msg00796.txt.bz2

This is an issue that is only tangential to this patch, and should
not be considered part of the series.

I just realized that partial symbols are sorted using strcmp_iw_ordered.
This works great for C++, for instance, but only works OK for Ada.
I think that this is related to the fact that we might be using
the linkage name, rather than the natural name (we compute the natural
name only on-demand, due to memory pressure in large apps). As a result,
the strcmp_iw_ordered routine can return non-zero for two names that
ada-lang.c:compare_names would consider equal.

For instance: `pck__hello' and `pck__hello__2'.
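
To make the mismatch concrete, here is a minimal sketch. It is not the real strcmp_iw_ordered or ada-lang.c:compare_names. Both functions below are hypothetical stand-ins, and the rule that a trailing "__<digits>" is an ignorable overload suffix is a simplifying assumption:

```python
import re

# Assumption for this sketch: a trailing "__<digits>" is an overload
# suffix that does not take part in name equality.
_OVERLOAD_SUFFIX = re.compile(r"__\d+$")

def plain_compare(a: str, b: str) -> int:
    """Ordered character-wise comparison, strcmp-style (-1, 0 or 1)."""
    return (a > b) - (a < b)

def ada_like_compare(a: str, b: str) -> int:
    """Same comparison, after dropping the assumed overload suffix."""
    return plain_compare(_OVERLOAD_SUFFIX.sub("", a),
                         _OVERLOAD_SUFFIX.sub("", b))

print(plain_compare("pck__hello", "pck__hello__2"))     # -1: names differ
print(ada_like_compare("pck__hello", "pck__hello__2"))  # 0: considered equal
```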

So when doing a symbol lookup for pck__hello, for instance, we pass
our own comparison routine, which is "compatible" with
strcmp_iw_ordered, to the psymtab map_matching_symbols routine.
This allows us to perform a binary search rather than a linear one.
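
A rough sketch of why a compatible ordering helps, under the same simplifying assumption about "__<digits>" suffixes. The function lookup_sorted is hypothetical and does not mirror map_matching_symbols. It relies only on the fact that all strings sharing a prefix sit next to each other in a plainly sorted list:

```python
from bisect import bisect_left
import re

_OVERLOAD_SUFFIX = re.compile(r"__\d+$")  # assumed overload-suffix form

def lookup_sorted(sorted_names, name):
    """Return entries equal to NAME once an assumed '__<N>' suffix is dropped."""
    matches = []
    # Binary search for the first entry >= NAME; every entry that starts
    # with NAME follows it contiguously in a sorted list.
    i = bisect_left(sorted_names, name)
    while i < len(sorted_names) and sorted_names[i].startswith(name):
        if _OVERLOAD_SUFFIX.sub("", sorted_names[i]) == name:
            matches.append(sorted_names[i])
        i += 1
    return matches

names = sorted(["pck__goodbye", "pck__hello", "pck__hello__2",
                "pck__hello2", "pck__hello_world", "other__hello"])
print(lookup_sorted(names, "pck__hello"))
```

Note that "pck__hello2" sorts between the two real matches here, so the scan has to filter rather than stop at the first non-match.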

I am wondering if we shouldn't be sorting the partial symbols
using a language-specific sorting routine instead.  As it turns
out, there was a bug in ada-lang.c:compare_names and that could
have caused the two search orders to diverge.  The thing is, when
I looked at it, it is not easy to tell which language a partial symtab
is for just by looking at it. The language seems to be embedded in the
symbols themselves. And then, we'd still have to specify whether
we'd want to perform a binary search or not anyway. That's because
we permit "wild" matches:

        (gdb) break hello

In the case above, we must break on "pck__hello" and "pck__hello__2".
In that case, binary searches based on string comparison cannot work
because we're missing the start of the symbol linkage name.
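
As an illustration only, a wild lookup reduces to examining every entry. The helpers wild_match and wild_lookup are hypothetical, and the treatment of "__" as a component separator plus a trailing "__<digits>" as an overload marker is an assumption of this sketch, not the actual matching rule:

```python
def wild_match(linkage_name: str, query: str) -> bool:
    """True if QUERY names the last components of LINKAGE_NAME.

    Assumes '__' separates components and a trailing all-digit
    component marks an overload; both are simplifications.
    """
    parts = linkage_name.split("__")
    if len(parts) > 1 and parts[-1].isdigit():
        parts = parts[:-1]
    wanted = query.split("__")
    return parts[-len(wanted):] == wanted

def wild_lookup(names, query):
    # The query may begin anywhere inside the linkage name, so the sort
    # order of NAMES gives no starting point: check each entry in turn.
    return [n for n in names if wild_match(n, query)]

print(wild_lookup(["pck__hello", "pck__hello__2", "pck__othello"], "hello"))
```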

As an aside: One of the ideas I had in the past was to store the natural
name inverted - for instance "hello.pck" instead of "pck.hello". That
way, searches for hello could be done using binary searches as well.
It might actually allow us to reconcile "wild" vs "non-wild" searches,
and even allow "semi-wild" lookups as in:

        (gdb) break subpackage.hello

... would now manage to find matches such as package.subpackage.hello.
This is not the case today. You either fully qualify your symbol name,
or you don't qualify it at all.
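
A small sketch of the inverted-name idea, purely to show the mechanics. The names inverted, build_index and lookup are made up for this example, and nothing like this exists in the sources today:

```python
from bisect import bisect_left

def inverted(natural_name: str) -> str:
    """Reverse the dotted components: 'pck.hello' -> 'hello.pck'."""
    return ".".join(reversed(natural_name.split(".")))

def build_index(natural_names):
    """Sorted list of inverted names, suitable for binary search."""
    return sorted(inverted(n) for n in natural_names)

def lookup(index, query):
    """Find names whose trailing components equal QUERY."""
    key = inverted(query)
    found = []
    i = bisect_left(index, key)
    while i < len(index) and index[i].startswith(key):
        rest = index[i][len(key):]
        # Accept only matches that end on a component boundary.
        if rest == "" or rest.startswith("."):
            found.append(inverted(index[i]))
        i += 1
    return found

index = build_index(["pck.hello", "package.subpackage.hello",
                     "pck.hello_world", "pck.goodbye"])
print(lookup(index, "hello"))
print(lookup(index, "subpackage.hello"))
```

With this layout the unqualified, partially qualified and fully qualified queries all become the same prefix search.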

But, without even thinking about performance issues at startup, this
approach suffers from the same problem as storing the natural name does:
For certain large applications, we would exceed the maximum amount of
memory a process can hold. This is not necessarily the case on
GNU/Linux, but the problem is there. I think we'll have better luck
once we are capable of merging some of the massive duplication in the
debug info.

-- 
Joel