Re: [PATCHv3 5/7] gdb: create address map after parsing all DIE

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Andrew Burgess <aburgess@redhat.com>
To: Tom Tromey <tom@tromey.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCHv3 5/7] gdb: create address map after parsing all DIE
Date: Fri, 02 Jan 2026 16:36:43 +0000	[thread overview]
Message-ID: <87o6ncyns4.fsf@redhat.com> (raw)
In-Reply-To: <87ikg09cnl.fsf@tromey.com>

Hi Tom,

Thanks for taking a look at these patches.  I wanted to follow up on
some of your thoughts...

Tom Tromey <tom@tromey.com> writes:

>>>>>> "Andrew" == Andrew Burgess <aburgess@redhat.com> writes:
>
> Andrew> Continuing the work done in the last two commits, this commit defers
> Andrew> building the addrmap for a blockvector until after all the DIE have
> Andrew> been read, and the line table processed.
>
> Andrew> The benefit of this is that any changes to a block's ranges done
> Andrew> during line table processing (see the next commit) will be reflected
> Andrew> in the blockvector's addrmap.
>
> Andrew> The alternative to this is to build the addrmap as we initially see
> Andrew> each block, but then adjust the addrmap if we later decide to modify a
> Andrew> block.  I think defering the addrmap creation is cleaner, and is less
> Andrew> work overall.
>
> Andrew> The addrmap requires that we add the most inner blocks first.  I
> Andrew> achieve this by walking the blockvector backward, as we always add
> Andrew> parent blocks before their more inner child blocks.
>
> I wonder if this is guaranteed to be correct.  Like, does gdb cope
> properly if a compiler happens to emit multiple lexical scopes that each
> have non-contiguous ranges, and then where the ranges happen to overlap.

I had a play with some examples using DW_TAG_lexical_block to see how
things change.

If things are laid out in what I'd consider a "sane" way, with nested
blocks being nested within the DWARF, then you'll end up with the same
block structure before and after this patch, e.g.:

                DW_TAG_lexical_block {
                    ...
                } {
                    DW_TAG_variable {
                        DW_AT_name "var_a"
                        ...
                    }

		    DW_TAG_lexical_block {
                        ...
		    } {
			DW_TAG_variable {
			    DW_AT_name "var_b"
                            ...
			}
		    }
                }

In this example the block containing 'var_a' is added to the blockvector
before the block holding 'var_b', so when we walk the blockvector
backwards, we see the 'var_b' block first, and everything is fine.

If we change the test to be structured like this instead:

                DW_TAG_lexical_block {
                    ...
                } {
                    DW_TAG_variable {
                        DW_AT_name "var_a"
                        ....
                    }
                }

                DW_TAG_lexical_block {
                    ...
                } {
                    DW_TAG_variable {
                        DW_AT_name "var_b"
                        ...
                    }
                }

Now the two lexical blocks are siblings, but if we make the second block
overlap the first then this patch does cause problems.

Current GDB will see the 'var_a' block first and add this to the map,
then when GDB sees the 'var_b' block, only the addresses not claimed by
the 'var_a' block will be allocated to the second lexical block.

With this patch blocks are now processed in reverse order, so we see the
'var_b' block first and assign addresses to this, then any unassigned
addresses are assigned to the 'var_a' block.

This DOES change GDB's behaviour.

> However I think your patch probably does not make gdb worse in this
> regard.

This is what I think.  In the sibling block case it's not clear what the
expected behaviour would actually be?  In current GDB you'll only see
'var_b' at those addresses not covered by the 'var_a' block.  While in
patched GDB the situation is reveresed, it's 'var_a' that's missing in
some cases.

In both cases there are addresses where we can only see one of the
variables.  But it isn't 100% clear too me if this is really fixable at
all with our current block hierarchy approach.

Is this sibling case the type of case you were thinking of?  Or have I
got the wrong end of the stick here?

>
> Andrew> @@ -410,8 +410,6 @@ buildsym_compunit::record_block_range (struct block *block,
> Andrew>    if (start != block->start ()
> Andrew>        || end_inclusive + 1 != block->end ())
> Andrew>      m_pending_addrmap_interesting = true;
> Andrew> -
> Andrew> -  m_pending_addrmap.set_empty (start, end_inclusive, block);
> Andrew>  }
>
> I think the comment before this method and the comment in the method
> both need to be updated, as this doesn't actually record the range any
> more.

Yeah.  But reading this code again I think I can drop record_block_range
completely now.  Blocks are basically interesting if they are
non-contiguous, and buildsym_compunit::make_blockvector, which is the
only place that checks m_pending_addrmap_interesting already has this
code:

  /* Count the length of the list of blocks.  */

  for (next = m_pending_blocks, i = 0; next; next = next->next, i++)
    {
    }

Where we could check for any non-contiguous blocks.  This would mean GDB
starts using the map instead of the vector for the case where a block
uses DW_AT_ranges, but only has a single range.  This shouldn't be a
user visible change, but I guess would increase memory use a little.
Still, I expect this would not be a common case as most compilers seem
to optimise to low/high pc attributes for contiguous blocks.

> I wonder why buildsym even tries to see if the pending addrmap might be
> interesting.  It seems to me that a fixed addrmap is basically the same
> data structure as the blockvector: it maps addresses to blocks and uses
> a binary search to find the correct entry.

Sure, but I wonder if this is a memory use thing?  The addrmap must use
more memory than the vector, so maybe for large programs this would be a
noticable hit.

>
> So I'm wondering if, in the longer term -- you definitely don't need to
> to this here -- if we should just switch to a single representation.
>
> Furthermore I was wondering if it makes sense for blocks to even track
> their ranges.  That is, for a block with multiple ranges, could we just
> have multiple entries in the blockvector and get rid of block::m_ranges?

There are a few places where we use the block's ranges for useful work,
these would all need to be rewritten to get the same information from
the blockvector (really the addrmap within the blockvector).

>
> FWIW I've been looking into this area a bit while experimenting with
> lazy CU expansion.  There I think I want to make it so blockvectors are
> always expandable, so no fixed addrmap at least -- and mutable addrmaps
> have some issues with updating, so probably moving to just having a
> vector.

Wouldn't that cause problems for non-contiguous blocks?  Isn't that the
problem that the addrmap was added to solve?

I'm going to look at updating to remove record_block_range completely,
and test that.  But I'd love to hear back about the lexical block
testing I did.

thanks,
Andrew

>
> Anyway this seems fine with the comment cleanup.
> Approved-By: Tom Tromey <tom@tromey.com>
>
> Tom

next prev parent reply	other threads:[~2026-01-02 16:37 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-20 10:20 [PATCH 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-07-20 10:20 ` [PATCH 1/7] gdb: improve line number lookup around inline functions Andrew Burgess
2025-07-20 10:20 ` [PATCH 2/7] gdb: handle empty ranges for inline subroutines Andrew Burgess
2025-07-20 10:20 ` [PATCH 3/7] gdb: split dwarf line table parsing in two Andrew Burgess
2025-07-20 10:20 ` [PATCH 4/7] gdb: move block range recording into its own function Andrew Burgess
2025-07-20 10:20 ` [PATCH 5/7] gdb: create address map after parsing all DIE Andrew Burgess
2025-07-20 10:20 ` [PATCH 6/7] gdb: record block end addresses while parsing DIEs Andrew Burgess
2025-07-20 10:20 ` [PATCH 7/7] gdb: fix-up truncated inline function block ranges Andrew Burgess
2025-08-01  8:58 ` [PATCHv2 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 1/7] gdb: improve line number lookup around inline functions Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 2/7] gdb: handle empty ranges for inline subroutines Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 3/7] gdb: split dwarf line table parsing in two Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 4/7] gdb: move block range recording into its own function Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 5/7] gdb: create address map after parsing all DIE Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 6/7] gdb: record block end addresses while parsing DIEs Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 7/7] gdb: fix-up truncated inline function block ranges Andrew Burgess
2025-10-16 17:49   ` [PATCHv3 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-10-16 17:49     ` [PATCHv3 1/7] gdb: improve line number lookup around inline functions Andrew Burgess
2025-10-27 22:22       ` Tom Tromey
2025-12-17 14:32         ` Andrew Burgess
2025-12-17 14:48           ` Tom de Vries
2025-12-18 14:46             ` Andrew Burgess
2025-10-16 17:49     ` [PATCHv3 2/7] gdb: handle empty ranges for inline subroutines Andrew Burgess
2025-10-16 17:49     ` [PATCHv3 3/7] gdb: split dwarf line table parsing in two Andrew Burgess
2025-10-16 17:49     ` [PATCHv3 4/7] gdb: move block range recording into its own function Andrew Burgess
2025-10-27 22:45       ` Tom Tromey
2025-10-16 17:49     ` [PATCHv3 5/7] gdb: create address map after parsing all DIE Andrew Burgess
2025-10-27 22:56       ` Tom Tromey
2026-01-02 16:36         ` Andrew Burgess [this message]
2026-01-05 20:03           ` Tom Tromey
2026-01-05 21:37             ` Andrew Burgess
2026-01-06  0:53               ` Tom Tromey
2025-10-16 17:49     ` [PATCHv3 6/7] gdb: record block end addresses while parsing DIEs Andrew Burgess
2025-10-27 23:00       ` Tom Tromey
2025-10-16 17:49     ` [PATCHv3 7/7] gdb: fix-up truncated inline function block ranges Andrew Burgess
2026-02-04 10:43     ` [PATCHv3 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-08-01 15:41 ` [PATCH " Sam James

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o6ncyns4.fsf@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox