Loading some symbols, when, and index-cache

Mirror of the gdb mailing list

* Loading some symbols, when, and index-cache
@ 2025-04-06 15:07 Lluís Batlle i Rossell via Gdb
  2025-04-07 12:05 ` Guinevere Larsen via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Lluís Batlle i Rossell via Gdb @ 2025-04-06 15:07 UTC
  To: gdb

Hello,

I'm trying to make sense of what symbols are loaded and when, in gdb.
For example, opening an elf file, it will build some index (usable as
index-cache) of some symbols, but not all.

If I do "gdb gdb" (for my compiled gdb), and then "maint print stat" I
get:

  Number of "minimal" symbols read: 33435
  Number of read CUs: 0
  Number of unread CUs: 682

Why haven't all CUs been read and incorporated into the index? I ask because,
at file loading, the CU symbol index is generated with worker threads (read.c),
potentially using all cores of the system, and the result would be cached in
the index-cache.

The next thing I notice is that when I type "list def<TAB>" to get symbol
completion, a very slow single-threaded symbol expansion causes many CUs to be
loaded. After that, "maint print stat" shows:

  Number of "minimal" symbols read: 33435
  Number of "full" symbols read: 2732412
  Number of "types" defined: 4292812
  Number of symbol tables: 57075
  Number of symbol tables with line tables: 9608
  Number of symbol tables with blockvectors: 351
  Number of read CUs: 351
  Number of unread CUs: 331

Is there any way this work can be done ahead of time? Why does the index-cache
only help with the "minimal" symbols? How can we use the cache for all symbols?
Why haven't all CUs been loaded yet?

Regards,
Lluís.

* Re: Loading some symbols, when, and index-cache
  2025-04-06 15:07 Loading some symbols, when, and index-cache Lluís Batlle i Rossell via Gdb
@ 2025-04-07 12:05 ` Guinevere Larsen via Gdb
  2025-04-07 12:21   ` Lluís Batlle i Rossell via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Guinevere Larsen via Gdb @ 2025-04-07 12:05 UTC
  To: Lluís Batlle i Rossell, gdb

On 4/6/25 12:07 PM, Lluís Batlle i Rossell via Gdb wrote:
> Hello,
>
> I'm trying to make sense of what symbols are loaded and when, in gdb.
> For example, opening an elf file, it will build some index (usable as
> index-cache) of some symbols, but not all.
>
> If I do "gdb gdb" (for my compiled gdb), and then "maint print stat" I
> get:
>
>    Number of "minimal" symbols read: 33435
>    Number of read CUs: 0
>    Number of unread CUs: 682
>
> Why haven't all CUs been read, and incorporated into the index? Because at
> file loading the CUs symbols index is generated with worker threads (read.c),
> therefore, potentially using all cores of the system. And the result would
> be cached in the index-cache.
>
> Because the next thing I notice is that when I type "list def<TAB>" to get
> symbol completion, a very slow single-thread symbol expansion causes many CUs
> to be loaded. After which:, "maint print stat":
>
>    Number of "minimal" symbols read: 33435
>    Number of "full" symbols read: 2732412
>    Number of "types" defined: 4292812
>    Number of symbol tables: 57075
>    Number of symbol tables with line tables: 9608
>    Number of symbol tables with blockvectors: 351
>    Number of read CUs: 351
>    Number of unread CUs: 331
>
> Is there any way this work can be done ahead? Why the index-cache only helps
> for the "minimal" symbols? How can we use the cache for all symbols? Why not
> all CUs have been loaded yet?

I can't answer all questions, but I can help with two of them:

  > is there any way this work can be done ahead?

Yes! You can use the --readnow option to force gdb to read all symbols
of the added inferior... however

  > Why haven't all CUs been loaded yet?

It is super slow. Your example completion managed to avoid reading half
of all symbols and you already felt the time. Doing twice that work
every time you start up your debug session would be very noticeable, for
little gain, considering that most debug sessions won't span the whole
codebase.

To get a sense of how much of a slowdown that is, I tested locally and:

$ time ./gdb gdb --batch -ex "complete list def"  # Basically the same as you did
./gdb gdb --batch -ex "complete list def"  42.40s user 1.19s system 120% cpu 36.243 total
$ time ./gdb gdb --batch --readnow -ex "complete list def"
./gdb gdb --batch --readnow -ex "complete list def"  58.32s user 1.79s system 100% cpu 1:00.08 total

So even with the slower expansion, GDB is still faster than if we had 
all symbols being read at the start, and this isn't even taking into 
account the memory usage.

I can't help with the cache stuff, though, never touched it.

-- 
Cheers,
Guinevere Larsen
She/Her/Hers




* Re: Loading some symbols, when, and index-cache
  2025-04-07 12:05 ` Guinevere Larsen via Gdb
@ 2025-04-07 12:21   ` Lluís Batlle i Rossell via Gdb
  2025-04-07 16:01     ` Simon Marchi via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Lluís Batlle i Rossell via Gdb @ 2025-04-07 12:21 UTC
  To: Guinevere Larsen; +Cc: gdb

On Mon, Apr 07, 2025 at 09:05:10AM -0300, Guinevere Larsen via Gdb wrote:
> $ time ./gdb gdb --batch -ex "complete list def" #Basically the same as you
> did
> ./gdb gdb --batch -ex "complete list def"  42.40s user 1.19s system 120% cpu
> 36.243 total
> $ time ./gdb gdb --batch --readnow -ex "complete list def"
> ./gdb gdb --batch --readnow -ex "complete list def"  58.32s user 1.79s
> system 100% cpu 1:00.08 total
> 
> So even with the slower expansion, GDB is still faster than if we had all
> symbols being read at the start, and this isn't even taking into account the
> memory usage.

There are two important points:

First, the cache should allow having a big file on disk that is just read
into memory with zero work, after which all symbols are ready to search. But
apparently this happens only with the "minimal" set of symbols, which is
far from enough for a tab completion.

Second, at gdb ELF file loading, worker threads are launched to read the
symbols from the CUs. Again, these seem to load only the "minimal" set of
symbols. In your 1-minute example, that load of CUs runs single-threaded.

I can't even tell whether the code behaves correctly. One could argue that
the initial multithreaded load of CUs should load ALL symbols, and likewise
for caching. Or at least that could be an option. Otherwise, the things that
are meant to run faster remain quite limited, while the really slow, common
completions run single-threaded and uncached, taking 1 minute in your case.

Thanks,
Lluís.


* Re: Loading some symbols, when, and index-cache
  2025-04-07 12:21   ` Lluís Batlle i Rossell via Gdb
@ 2025-04-07 16:01     ` Simon Marchi via Gdb
  2025-04-08  7:29       ` Lluís Batlle i Rossell via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Marchi via Gdb @ 2025-04-07 16:01 UTC
  To: Lluís Batlle i Rossell, Guinevere Larsen; +Cc: gdb

On 2025-04-07 08:21, Lluís Batlle i Rossell via Gdb wrote:
> On Mon, Apr 07, 2025 at 09:05:10AM -0300, Guinevere Larsen via Gdb wrote:
>> $ time ./gdb gdb --batch -ex "complete list def" #Basically the same as you
>> did
>> ./gdb gdb --batch -ex "complete list def"  42.40s user 1.19s system 120% cpu
>> 36.243 total
>> $ time ./gdb gdb --batch --readnow -ex "complete list def"
>> ./gdb gdb --batch --readnow -ex "complete list def"  58.32s user 1.79s
>> system 100% cpu 1:00.08 total
>>
>> So even with the slower expansion, GDB is still faster than if we had all
>> symbols being read at the start, and this isn't even taking into account the
>> memory usage.
> 
> There are two points important:
> 
> The cache should allow having a big file on disk that it's just read into
> memory with zero work and then all symbols are ready to search. But
> apparently this happens only with the "minimal" set of symbols, which is
> far from enough for a tab completion.
> 
> And 2nd, at gdb ELF file loading, thread workers are launched to read the
> symbols from the CUs. Again, these seem to load only the "minimal" set of
> symbols. In your example of 1 minute that load of CUs runs single thread.
> 
> I can't even tell that the code behaves correctly. One could say that the
> initial load of CUs multithread should load ALL symbols, and same about
> caching. Or at least that could be an option. Otherwise, the things they
> are meant to run faster become quite limited, while the really-slow
> usual completions go single-thread uncached, taking for you 1 minute.
> 
> Thanks,
> Lluís.

Hi Lluís,

I think you misunderstand the role of what's called the index-cache
(meaning it's perhaps not documented properly).  I will summarize the
process that GDB takes from start to being able to use a symbol.  In
this situation no index is present in the binary file nor in the index
cache.

 1. You do "gdb myprogram".
 2. GDB reads the ELF symbols into what it calls "minimal symbols"
    internally.
 3. In parallel, GDB demangles the Ada/C++/whatever symbol names.
 4. GDB notices there is some DWARF info, so it opens it and lists the
    compilation units by hopping from header to header
 5. In parallel (the background workers you referred to), GDB scans each
    compilation unit to create an in-memory index.  What this index
    essentially consists of is a mapping of names to which compilation
    unit contains that name.  We only need the names of variables,
    functions and types that the user could possibly refer to in an
    expression.  While traversing the debug info to create that index,
    GDB skips most of the debug info, making it quite fast.
 6. You type "print foo" in the CLI, or another expression
    containing a symbol name.
 7. The expression parser asks the DWARF subsystem: expand all
    compilation units that contain a variable or function named "foo".
    The index is looked up to identify the candidate compilation units.
 8. Serially, the DWARF subsystem fully reads those compilation units
    to create some "symtab"s, which is a detailed internal representation
    of everything that could be found in the debug info.
 9. The core is then able to look up "foo" in the symtabs, find
    the relevant symbol, and continue the expression evaluation.
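
As a rough illustration of steps 5 to 9, here is a toy Python model. It
is only a sketch: the CU names, the SymbolReader class and its methods
are all made up for the example and do not correspond to GDB's actual
data structures. It shows a cheap name -> CU index built up front, with
the expensive expansion into symtabs done lazily on lookup.

```python
# Toy model of steps 5-9. All names are hypothetical; this is not GDB code.

# Pretend debug info: CU name -> names defined in that CU.
DEBUG_INFO = {
    "main.c": ["main", "usage"],
    "parse.c": ["parse_expr", "foo"],
    "eval.c": ["eval_expr", "foo"],
}


def build_index(debug_info):
    """Step 5: map each name to the set of CUs that contain it."""
    index = {}
    for cu, names in debug_info.items():
        for name in names:
            index.setdefault(name, set()).add(cu)
    return index


class SymbolReader:
    def __init__(self, debug_info):
        self.debug_info = debug_info
        self.index = build_index(debug_info)
        self.symtabs = {}  # CU name -> expanded "symtab" (the read CUs)

    def expand(self, cu):
        """Step 8: fully read one CU (the slow, serial part)."""
        if cu not in self.symtabs:
            self.symtabs[cu] = {
                name: f"{name} (from {cu})" for name in self.debug_info[cu]
            }

    def lookup(self, name):
        """Steps 7-9: expand only the candidate CUs, then search them."""
        for cu in sorted(self.index.get(name, ())):
            self.expand(cu)
        return [
            tab[name] for cu, tab in sorted(self.symtabs.items()) if name in tab
        ]


reader = SymbolReader(DEBUG_INFO)
print(len(reader.symtabs))     # 0: the index exists, but no CU is read yet
print(reader.lookup("foo"))    # ['foo (from eval.c)', 'foo (from parse.c)']
print(sorted(reader.symtabs))  # ['eval.c', 'parse.c']: main.c stays unread
```

In this model, the "read CUs" / "unread CUs" counters from "maint print
stat" would correspond to len(reader.symtabs) and the remaining entries
of DEBUG_INFO.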

To avoid doing the work in step #5, it is possible to ask compilers to
generate a name index that resembles the name -> compilation unit index
that GDB would produce.  Alternatively, GDB is able to add that index to
a binary that doesn't have it (see the gdb-add-index command).  The
index cache is a third way to access that pre-computed index, which
requires no user intervention (other than toggling the index-cache on).
If the index-cache is enabled, and no pre-computed index is present
already, GDB will save it in ~/.cache/gdb, allowing it to read it back
later.  It's exactly the same data that gdb-add-index would add to the
binary.
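
A minimal sketch of that caching idea, again as a toy Python model and
not GDB's code: the JSON file format, the load_index() helper and the
use of a content hash as the cache key are assumptions made for the
example. The first load computes the index and saves it; the second
load reads it back instead of scanning again.

```python
# Toy model of an index cache. The file format, function names and cache
# key are hypothetical stand-ins, not what GDB actually stores.
import hashlib
import json
import os
import tempfile


def build_index(debug_info):
    """Stand-in for the scan of step #5: name -> list of CUs."""
    index = {}
    for cu, names in debug_info.items():
        for name in names:
            index.setdefault(name, []).append(cu)
    return index


def load_index(binary_id, debug_info, cache_dir):
    """Return (index, came_from_cache)."""
    path = os.path.join(cache_dir, binary_id + ".json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f), True   # cache hit: no scan needed
    index = build_index(debug_info)     # cache miss: scan, then save
    with open(path, "w") as f:
        json.dump(index, f)
    return index, False


debug_info = {"a.c": ["foo", "bar"], "b.c": ["foo"]}
# Hypothetical cache key: a hash identifying this particular binary.
binary_id = hashlib.sha256(
    json.dumps(debug_info, sort_keys=True).encode()
).hexdigest()

with tempfile.TemporaryDirectory() as cache_dir:
    first, hit1 = load_index(binary_id, debug_info, cache_dir)
    second, hit2 = load_index(binary_id, debug_info, cache_dir)

print(hit1, hit2)        # False True
print(first == second)   # True: the cached index is the same data
```

Note that what gets cached is only the name -> CU mapping. The symtabs
produced by expanding CUs are not part of it, in this model or in the
description above.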

The index cache was added because historically, using an index was
_much_ faster than having GDB generate its in-memory index (it used
what are internally called "partial symbols"; it no longer uses them for
DWARF).  So it was useful, because the second time you loaded a binary
was much faster than the first time.  Nowadays, with the new-ish
parallelized scanner I described in step #5, the time difference between
using a pre-computed index or generating it on the fly is not that big.

The -readnow option that Guinevere talked about skips the in-memory
index generation (or skips reading the pre-computed index if there is
one) and goes directly to expanding all compilation units into symtabs
right away.  I would typically not recommend using this day-to-day,
other than maybe if you need to work around an indexer bug.

When you type "list def<TAB>" and it's very slow, it's likely that a lot
of compilation units have a symbol that starts with "def", so a lot of
compilation units get expanded into symtabs.  Only after that happens
can the core of GDB generate the completion list by searching the
symtabs.  That's not optimal, there's certainly room for improvement
here (I too get frustrated by very slow tab-completion, to the point
where I often avoid it).
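
To illustrate why a short prefix is so costly, here is a toy Python
sketch under the same assumptions as the description above. The CU and
symbol names and the complete() helper are invented for the example.
Completing a prefix has to expand every CU that holds at least one
matching name, while an exact name usually touches far fewer.

```python
# Toy model of prefix completion. All names are hypothetical.

DEBUG_INFO = {
    "cu0.c": ["default_init", "main"],
    "cu1.c": ["define_symbol", "helper"],
    "cu2.c": ["defer_work"],
    "cu3.c": ["unrelated"],
}


def build_index(debug_info):
    """Map each name to the set of CUs that contain it."""
    index = {}
    for cu, names in debug_info.items():
        for name in names:
            index.setdefault(name, set()).add(cu)
    return index


def complete(prefix, index, expanded):
    """Return matching names, recording which CUs had to be expanded."""
    matches = []
    for name in sorted(index):
        if name.startswith(prefix):
            matches.append(name)
            expanded.update(index[name])  # each of these is a slow full read
    return matches


index = build_index(DEBUG_INFO)

expanded_for_prefix = set()
matches = complete("def", index, expanded_for_prefix)
print(matches)                      # ['default_init', 'defer_work', 'define_symbol']
print(len(expanded_for_prefix))     # 3 of the 4 CUs get expanded

expanded_for_exact = set()
complete("main", index, expanded_for_exact)
print(sorted(expanded_for_exact))   # ['cu0.c']
```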

I must point out that a pending patch series changes a bit how symbols
are looked up and symtabs expanded, which will probably make what I
described somewhat outdated:

  [PATCH v2 00/28] Search symbols via quick API
  https://inbox.sourceware.org/gdb-patches/20250402-search-in-psyms-v2-0-ea91704487cb@tromey.com/T/#m737dd42bf8767f5719ffac7eb147977c4b0f2829

Simon


* Re: Loading some symbols, when, and index-cache
  2025-04-07 16:01     ` Simon Marchi via Gdb
@ 2025-04-08  7:29       ` Lluís Batlle i Rossell via Gdb
  2025-04-08 13:43         ` Simon Marchi via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Lluís Batlle i Rossell via Gdb @ 2025-04-08  7:29 UTC
  To: Simon Marchi; +Cc: Guinevere Larsen, gdb

On Mon, Apr 07, 2025 at 12:01:38PM -0400, Simon Marchi wrote:
> I must point out that a pending patch series changes a bit how symbols
> are looked up and symtabs expanded, which will probably make what I
> described somewhat outdated:
> 
>   [PATCH v2 00/28] Search symbols via quick API
>   https://inbox.sourceware.org/gdb-patches/20250402-search-in-psyms-v2-0-ea91704487cb@tromey.com/T/#m737dd42bf8767f5719ffac7eb147977c4b0f2829

Thank you very much for your detailed explanation! It has been very helpful.
I definitely had no idea about the "partial symbols" piece. Now it all makes
sense. I will think about areas where that can be improved.

I will check those pending patches. Are they likely to be integrated?

Thank you,
Lluís.


* Re: Loading some symbols, when, and index-cache
  2025-04-08  7:29       ` Lluís Batlle i Rossell via Gdb
@ 2025-04-08 13:43         ` Simon Marchi via Gdb
  2025-04-09  7:42           ` Lluís Batlle i Rossell via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Marchi via Gdb @ 2025-04-08 13:43 UTC
  To: Lluís Batlle i Rossell; +Cc: Guinevere Larsen, gdb



On 2025-04-08 03:29, Lluís Batlle i Rossell wrote:
> I will check those pending patches. Are they likely to be integrated?

Yes, it's from Tom Tromey, so it's probably good :).  I hope to find the
time to look at them soon, to add my approval.

Simon


* Re: Loading some symbols, when, and index-cache
  2025-04-08 13:43         ` Simon Marchi via Gdb
@ 2025-04-09  7:42           ` Lluís Batlle i Rossell via Gdb
  2025-04-22 15:31             ` Simon Marchi via Gdb
  0 siblings, 1 reply; 8+ messages in thread
From: Lluís Batlle i Rossell via Gdb @ 2025-04-09  7:42 UTC
  To: Simon Marchi; +Cc: Guinevere Larsen, gdb

On Tue, Apr 08, 2025 at 09:43:45AM -0400, Simon Marchi wrote:
> On 2025-04-08 03:29, Lluís Batlle i Rossell wrote:
> > I will check those pending patches. Are they likely to be integrated?
> 
> Yes, it's from Tom Tromey, so it's probably good :).  I hope to find the
> time to look at them soon, to add my approval.

I thought that one interesting approach would be to restrict the
tab-completion to the "minimal" symbols, as a user config option. I wonder
if that set is more than enough for many cases.

What are the occasions where only "minimal" symbols are searched, and
occasions where "all" symbols are searched (by means of partial symbols +
CU loading)?

Regards,
Lluís.


* Re: Loading some symbols, when, and index-cache
  2025-04-09  7:42           ` Lluís Batlle i Rossell via Gdb
@ 2025-04-22 15:31             ` Simon Marchi via Gdb
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Marchi via Gdb @ 2025-04-22 15:31 UTC
  To: Lluís Batlle i Rossell; +Cc: Guinevere Larsen, gdb

On 4/9/25 3:42 AM, Lluís Batlle i Rossell wrote:
> On Tue, Apr 08, 2025 at 09:43:45AM -0400, Simon Marchi wrote:
>> On 2025-04-08 03:29, Lluís Batlle i Rossell wrote:
>>> I will check those pending patches. Are they likely to be integrated?
>>
>> Yes, it's from Tom Tromey, so it's probably good :).  I hope to find the
>> time to look at them soon, to add my approval.
> 
> I thought that one interesting approach would be to restrict the
> tab-completion to the "minimal" symbols, as a user config option. I wonder
> if that set is more than enough for many cases.

Not really, because you can tab-complete much more than minimal symbols.
In production apps, minimal symbols only contain exported functions and
variables.  They also don't know anything about types.

> What are the occasions where only "minimal" symbols are searched, and
> occasions where "all" symbols are searched (by means of partial symbols +
> CU loading)?

I don't know for sure.  I think that full symbols are always searched
first when trying to resolve a symbol name to a symbol.

Simon
