From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <2caec2e4-6baf-45ca-875c-8a8a6b6bfe42@simark.ca>
Date: Mon, 7 Apr 2025 12:01:38 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Loading some symbols, when, and index-cache
To: =?UTF-8?Q?Llu=C3=ADs_Batlle_i_Rossell?= <viric@viric.name>,
 Guinevere Larsen <guinevere@redhat.com>
Cc: gdb@sourceware.org
References: <e3slt5yktlplzzxztvrg47bs5zb4ruz6rpnid2yewwpxin4mw5@7ittqxficilz>
 <6214910b-519f-420f-8999-b05e5410ad5a@redhat.com>
 <zevoy4ps2torxbkvcik6vahtx4pxkdypsd3hhne47xgtiaqbm7@a3nojsauzd5q>
Content-Language: en-US
In-Reply-To: <zevoy4ps2torxbkvcik6vahtx4pxkdypsd3hhne47xgtiaqbm7@a3nojsauzd5q>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-BeenThere: gdb@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gdb mailing list <gdb.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb>,
 <mailto:gdb-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb>,
 <mailto:gdb-request@sourceware.org?subject=subscribe>
From: Simon Marchi via Gdb <gdb@sourceware.org>
Reply-To: Simon Marchi <simark@simark.ca>
Errors-To: gdb-bounces~public-inbox=simark.ca@sourceware.org
Sender: "Gdb" <gdb-bounces~public-inbox=simark.ca@sourceware.org>


On 2025-04-07 08:21, Lluís Batlle i Rossell via Gdb wrote:
> On Mon, Apr 07, 2025 at 09:05:10AM -0300, Guinevere Larsen via Gdb wrote:
>> $ time ./gdb gdb --batch -ex "complete list def" #Basically the same as you
>> did
>> ./gdb gdb --batch -ex "complete list def"  42.40s user 1.19s system 120% cpu
>> 36.243 total
>> $ time ./gdb gdb --batch --readnow -ex "complete list def"
>> ./gdb gdb --batch --readnow -ex "complete list def"  58.32s user 1.79s
>> system 100% cpu 1:00.08 total
>>
>> So even with the slower expansion, GDB is still faster than if we had all
>> symbols being read at the start, and this isn't even taking into account the
>> memory usage.
> 
> There are two points important:
> 
> The cache should allow having a big file on disk that it's just read into
> memory with zero work and then all symbols are ready to search. But
> apparently this happens only with the "minimal" set of symbols, which is
> far from enough for a tab completion.
> 
> And 2nd, at gdb ELF file loading, thread workers are launched to read the
> symbols from the CUs. Again, these seem to load only the "minimal" set of
> symbols. In your example of 1 minute that load of CUs runs single thread.
> 
> I can't even tell that the code behaves correctly. One could say that the
> initial load of CUs multithread should load ALL symbols, and same about
> caching. Or at least that could be an option. Otherwise, the things they
> are meant to run faster become quite limited, while the really-slow
> usual completions go single-thread uncached, taking for you 1 minute.
> 
> Thanks,
> Lluís.


Hi Lluís,

I think you misunderstand the role of what's called the index-cache
(meaning it's perhaps not documented properly).  I will summarize the
process that GDB takes from start to being able to use a symbol.  In
this situation no index is present in the binary file nor in the index
cache.

 1. You do "gdb myprogram".
 2. GDB reads the ELF symbols into what it calls "minimal symbols"
    internally.
 3. In parallel, GDB demangles the Ada/C++/whatever symbol names.
 4. GDB notices there is some DWARF info, so it opens it and lists the
    compilation units by hopping from header to header.
 5. In parallel (the background workers you referred to), GDB scans each
    compilation unit to create an in-memory index.  What this index
    essentially consists of is a mapping of names to which compilation
    unit contains that name.  We only need the names of variables,
    functions and types that the user could possibly refer to in an
    expression.  While traversing the debug info to create that index,
    GDB skips most of the debug info, making it quite fast.
 6. You type "print foo" in the CLI, or another expression
    containing a symbol name.
 7. The expression parser asks the DWARF subsystem: expand all
    compilation units that contain a variable or function named "foo".
    The index is looked up to identify the candidate compilation units.
 8. Serially, the DWARF subsystem fully reads those compilation units
    to create some "symtab"s, which is a detailed internal representation
    of everything that could be found in the debug info.
 9. The core is then able to look up "foo" in the symtabs, find
    the relevant symbol, and continue the expression evaluation.
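
To make steps 5 to 9 more concrete, here is a rough sketch as a toy
Python model.  It is not GDB's code: the Symtabs class, the DEBUG_INFO
table and every other name are made up for illustration, and the
"debug info" is just a dict.  It only shows the shape of the idea: a
cheap name -> compilation unit index, and full expansion done lazily.

```python
# Toy model of steps 5 to 9. Not GDB code: every name here is hypothetical.
from collections import defaultdict

# Pretend debug info: compilation unit -> names it defines.
DEBUG_INFO = {
    "main.c": ["main", "foo"],
    "util.c": ["helper", "foo_count"],
    "net.c": ["connect_peer"],
}


def build_index(debug_info):
    """Step 5: map each name to the set of CUs that contain it."""
    index = defaultdict(set)
    for cu, names in debug_info.items():
        for name in names:
            index[name].add(cu)
    return index


class Symtabs:
    def __init__(self, debug_info):
        self.debug_info = debug_info
        self.index = build_index(debug_info)
        self.expanded = {}  # CU -> full "symtab" (here: name -> description)

    def expand(self, cu):
        """Step 8: fully read one CU, unless that was already done."""
        if cu not in self.expanded:
            self.expanded[cu] = {
                name: f"symbol {name} from {cu}"
                for name in self.debug_info[cu]
            }
        return self.expanded[cu]

    def lookup(self, name):
        """Steps 7 to 9: consult the index, expand candidates, find the symbol."""
        for cu in sorted(self.index.get(name, ())):
            symtab = self.expand(cu)
            if name in symtab:
                return symtab[name]
        return None
```

Looking up "foo" in this model expands only main.c; util.c and net.c
stay unread until some other lookup needs them.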

To avoid doing the work in step #5, it is possible to ask compilers to
generate a name index that resembles the name -> compilation unit index
that GDB would produce.  Alternatively, GDB is able to add that index to
a binary that doesn't have it (see the gdb-add-index command).  The
index cache is a third way to access that pre-computed index, which
requires no user intervention (other than toggling the index-cache on).
If the index-cache is enabled, and no pre-computed index is present
already, GDB will save the index it generates in ~/.cache/gdb, allowing
it to read it back later.  It's exactly the same data that gdb-add-index
would add to the binary.
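
The caching idea itself can be sketched in a few lines of Python.
Again, this is a toy and not how GDB implements it: the load_index and
fake_scan functions, the JSON file format, and the choice of keying the
cache on a hash of the file contents are all assumptions of the sketch.

```python
# Toy model of an index cache. Not GDB code; names and file layout are hypothetical.
import hashlib
import json
import os
import tempfile


def cache_key(binary_path):
    # Assumption of this sketch: identify the binary by a hash of its contents.
    with open(binary_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def load_index(binary_path, cache_dir, build_index):
    """Return (index, came_from_cache)."""
    path = os.path.join(cache_dir, cache_key(binary_path) + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), True
    index = build_index(binary_path)  # the expensive scan (step 5)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(index, f)
    return index, False


def fake_scan(binary_path):
    # Stand-in for scanning the debug info of the binary.
    return {"foo": ["main.c"], "helper": ["util.c"]}


with tempfile.TemporaryDirectory() as tmp:
    binary = os.path.join(tmp, "myprogram")
    with open(binary, "wb") as f:
        f.write(b"\x7fELF pretend binary")
    cache_dir = os.path.join(tmp, "cache")
    first = load_index(binary, cache_dir, fake_scan)   # scans, then saves
    second = load_index(binary, cache_dir, fake_scan)  # read back from disk
```

The first call pays for the scan and writes the result; the second
call only reads the file.  Note that what is cached is the index, not
the expanded symtabs.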

The index cache was added because historically, using an index was
_much_ faster than having GDB generate its in-memory index (it used
what's internally called "partial symbols", which it no longer uses for
DWARF).  So it was useful, because the second time you loaded a binary
was much faster than the first time.  Nowadays, with the new-ish
parallelized scanner I described in step #5, the time difference between
using a pre-computed index or generating it on the fly is not that big.

The -readnow option that Guinevere talked about skips the in-memory
index generation (or skips reading the pre-computed index if there is
one) and goes directly to expanding all compilation units into symtabs
right away.  I would typically not recommend using this day-to-day,
other than maybe if you need to work around an indexer bug.

When you type "list def<TAB>" and it's very slow, it's likely that a lot
of compilation units have a symbol that starts with "def", so a lot of
compilation units get expanded into symtabs.  Only after that happens
can the core of GDB generate the completion list by searching the
symtabs.  That's not optimal; there's certainly room for improvement
here (I too get frustrated by very slow tab-completion, to the point
where I often avoid it).
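
As a toy illustration of why a short prefix is so much more expensive
than an exact name (hypothetical names and data, not GDB code): an
exact lookup selects the compilation units for one index entry, while
a prefix selects the union of the compilation units of every entry
that matches, and each of those then has to be expanded serially.

```python
# Toy model of exact lookup vs. prefix completion. Not GDB code; all names hypothetical.
INDEX = {
    "default_value": {"a.c", "b.c"},
    "define_macro": {"c.c"},
    "deferred_work": {"d.c", "e.c"},
    "main": {"a.c"},
}


def cus_for_exact(index, name):
    """CUs that must be expanded to resolve one exact name."""
    return set(index.get(name, ()))


def cus_for_prefix(index, prefix):
    """CUs that must be expanded to complete a prefix."""
    cus = set()
    for name, where in index.items():
        if name.startswith(prefix):
            cus |= where
    return cus


exact = cus_for_exact(INDEX, "main")   # one CU to expand
prefix = cus_for_prefix(INDEX, "def")  # every CU with a "def..." name
```

In this made-up index, "main" touches one compilation unit while "def"
touches all five.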

I must point out that a pending patch series changes a bit how symbols
are looked up and symtabs expanded, which will probably make what I
described somewhat outdated:

  [PATCH v2 00/28] Search symbols via quick API
  https://inbox.sourceware.org/gdb-patches/20250402-search-in-psyms-v2-0-ea91704487cb@tromey.com/T/#m737dd42bf8767f5719ffac7eb147977c4b0f2829

Simon