Mirror of the gdb-patches mailing list
From: "Jan Vraný" <Jan.Vrany@labware.com>
To: "gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Subject: [PING] Re: [PATCH] Revert "gdb: change blockvector::contains() to handle blockvectors with "holes""
Date: Mon, 15 Dec 2025 12:08:06 +0000
Message-ID: <6cadb0cf3862706f4d70bb07a5002d8c99ea6906.camel@labware.com>
In-Reply-To: <20251208180717.176567-1-jan.vrany@labware.com>

Polite ping.

Thanks,
Jan

On Mon, 2025-12-08 at 18:07 +0000, Jan Vrany wrote:
> This reverts commit cc1fc6af4150b19f9c4c70d0463ff498703fb637, since it
> causes a number of regressions that seem not to be easily fixable.
> 
> The problem lies in the existence of "freestanding" code, that is, code
> that is part of a CU but has no block associated with it. Consider the
> following program:
> 
>     __asm__(
>       ".type foo,@function   \n"
>       "foo:                  \n"
>       "  mov %rdi, %rax      \n"
>       "  ret                 \n"
>     );
> 
>     static int foo(int i);
> 
>     int main(int argc, char **argv) {
>       return foo(argc);
>     }
> 
> When compiled, the foo function has no block of its own:
> 
>     Blockvector:
> 
>     no map
> 
>     block #000, object at 0x55978957b510, 1 symbols in 0x1129..0x1148
>      int main(int, char **); block object 0x55978957b380, 0x112d..0x1148 section .text
>       block #001, object at 0x55978957b470 under 0x55978957b510, 2 symbols in 0x1129..0x1148
>        typedef int int;
>        typedef char char;
>         block #002, object at 0x55978957b380 under 0x55978957b470, 2 symbols in 0x112d..0x1148, function main
>          int argc; computed at runtime
>          char **argv; computed at runtime
> 
> In this case lookup(0x1129) returns the static block and, because of the
> change in cc1fc6af4, contains(0x1129) returns false, which is wrong.
> 
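
To make the failure concrete, here is a minimal sketch of the two
contains() variants run against the blockvector dumped above. It is a
simplified model, not GDB code: Block, BlockVectorModel and
make_freestanding_example are hypothetical names, and the linear scan
only approximates what blockvector::lookup does when there is no
address map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a GDB block: half-open range [start, end).
struct Block {
  std::uint64_t start;
  std::uint64_t end;
};

// Hypothetical model of a blockvector without an address map.
// blocks[0] is the global block, blocks[1] the static block,
// everything after that is nested under the static block.
struct BlockVectorModel {
  std::vector<Block> blocks;

  // Return the index of the innermost block containing addr, never
  // going above the static block, or -1 if no block contains addr.
  int lookup(std::uint64_t addr) const {
    for (std::size_t i = blocks.size(); i-- > 1;)
      if (blocks[i].start <= addr && addr < blocks[i].end)
        return static_cast<int>(i);
    return -1;
  }

  // The logic introduced by cc1fc6af4: a hit on the static block does
  // not count unless the blockvector has only global + static blocks.
  bool contains_with_heuristic(std::uint64_t addr) const {
    int b = lookup(addr);
    if (b < 0)
      return false;
    return b != 1 || blocks.size() == 2;
  }

  // The logic restored by the revert.
  bool contains_reverted(std::uint64_t addr) const {
    return lookup(addr) >= 0;
  }
};

// Blockvector from the first example: global, static, main.
BlockVectorModel make_freestanding_example() {
  BlockVectorModel bv;
  bv.blocks = {{0x1129, 0x1148}, {0x1129, 0x1148}, {0x112d, 0x1148}};
  return bv;
}
```

In this model 0x1129 (the start of the "freestanding" foo) resolves to
the static block, so contains_with_heuristic() rejects it while
contains_reverted() accepts it. Addresses inside main are accepted by
both.
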
> Such "freestanding" code is perhaps not common but it does exist,
> especially in system code. In fact the regressions were at least in part
> caused by such "freestanding" code in glibc (libc_sigaction.c).
> 
> The whole idea of commit cc1fc6af4 was to handle "holes" in CUs, the case
> where one CU spans multiple disjoint regions, possibly interleaved
> with other CUs. Consider a somewhat extreme case with two CUs:
> 
>     /* hole-1.c */
>     int give_me_zero ();
> 
>     int
>     main ()
>     {
>       return give_me_zero ();
>     }
> 
>     /* hole-2.c */
>     int __attribute__ ((section (".text_give_me_one"))) __attribute__((noinline))
>     baz () { return 42; }
> 
>     __asm__(
>       ".section  .text_give_me_one,\"ax\",@progbits\n"
>       ".type foo,@function         \n"
>       "foo:                        \n"
>             "  mov %rdi, %rax      \n"
>             "  ret                 \n"
>             "  nop                 \n"
>             "  nop                 \n"
>             "  nop                 \n"
>     );
>     int __attribute__ ((section (".text_give_me_one"))) __attribute__((noinline))
>     give_me_one ()
>     {
>       return 1;
>     }
> 
>     __asm__(
>       ".section  .text_give_me_zero,\"ax\",@progbits\n"
>       "bar:                        \n"
>             "  jmp give_me_one     \n"
>             "  nop                 \n"
>             "  nop                 \n"
>             "  nop                 \n"
>     );
>     int __attribute__ ((section (".text_give_me_zero")))
>     give_me_zero ()
>     {
>       extern int bar();
>       return give_me_one() - 1;
>     }
> 
> When compiled with a carefully crafted linker script that forces code
> to certain positions, this creates the following layout:
> 
>    0x080000..0x080007   # "freestanding" bar from hole-2.c
>    0x080008..0x080016   # give_me_zero() from hole-2.c
>    0x080109..0x080114   # main from hole-1.c
>    0xf00000..0xf0000b   # baz() from hole-2.c
>    0xf0000b..0xf00011   # "freestanding" foo from hole-2.c
>    0xf00012..0xf0001c   # give_me_one() from hole-2.c
> 
> The block vector for hole-1.c looks like this:
> 
>     Blockvector:
> 
>     no map
> 
>     block #000, object at 0x555a5d85fb90, 1 symbols in 0x80109..0x80114
>      int main(void); block object 0x555a5d85faa0, 0x80109..0x80114 section .text
>       block #001, object at 0x555a5d85faf0 under 0x555a5d85fb90, 1 symbols in 0x80109..0x80114
>        typedef int int;
>         block #002, object at 0x555a5d85faa0 under 0x555a5d85faf0, 0 symbols in 0x80109..0x80114, function main
> 
> And for hole-2.c:
> 
>     Blockvector:
> 
>     map
>       0x0 -> 0x0
>       0x80008 -> 0x555a5d85ff50
>       0x80016 -> 0x0
>       0xf00000 -> 0x555a5d860280
>       0xf0000b -> 0x0
>       0xf00012 -> 0x555a5d860110
>       0xf0001d -> 0x0
> 
>     block #000, object at 0x555a5d8603b0, 3 symbols in 0x80008..0xf0001d
>      int give_me_zero(void); block object 0x555a5d85ff50, 0x80008..0x80016 section .text
>      int give_me_one(void); block object 0x555a5d860110, 0xf00012..0xf0001d section .text
>      int baz(void); block object 0x555a5d860280, 0xf00000..0xf0000b section .text
>       block #001, object at 0x555a5d8602d0 under 0x555a5d8603b0, 1 symbols in 0x80008..0xf0001d
>        typedef int int;
>         block #002, object at 0x555a5d85ff50 under 0x555a5d8602d0, 0 symbols in 0x80008..0x80016, function give_me_zero
>         block #003, object at 0x555a5d860280 under 0x555a5d8602d0, 0 symbols in 0xf00000..0xf0000b, function baz
>         block #004, object at 0x555a5d860110 under 0x555a5d8602d0, 0 symbols in 0xf00012..0xf0001d, function give_me_one
> 
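
For reference, the address map dumped above can be read as a list of
transition points: each entry says which block applies from that
address up to the next entry. Below is a rough sketch of that lookup,
assuming a std::map stand-in. AddrMap, make_hole2_map and addrmap_find
are hypothetical names, block names replace the block object pointers,
and GDB's real addrmap implementation differs.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

// Transition point -> name of the block that applies from that address
// up to the next transition.  An empty string models the "-> 0x0"
// entries, that is, "no block here".
using AddrMap = std::map<std::uint64_t, std::string>;

// Transitions copied from the hole-2.c dump, with block names
// substituted for the block object addresses.
AddrMap make_hole2_map() {
  return {
      {0x0, ""},
      {0x80008, "give_me_zero"},
      {0x80016, ""},
      {0xf00000, "baz"},
      {0xf0000b, ""},
      {0xf00012, "give_me_one"},
      {0xf0001d, ""},
  };
}

// Return the name of the block covering addr, or an empty string if
// addr falls into a hole.
std::string addrmap_find(const AddrMap &map, std::uint64_t addr) {
  auto it = map.upper_bound(addr);  // first transition strictly above addr
  if (it == map.begin())
    return "";
  return std::prev(it)->second;     // last transition at or below addr
}
```

With this map, 0x80000 (the "freestanding" bar) and 0xf0000b (the
"freestanding" foo) both land in holes, and so does 0x80109, where main
from hole-1.c lives.
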
> Note that although the "freestanding" bar belongs to hole-2.c, the
> corresponding CU's global and static blocks start at 0x80008! Judging
> by the DWARF for the second program, it looks like the compiler (GCC 15)
> did not record the presence of "freestanding" code:
> 
>     <0><71>: Abbrev Number: 1 (DW_TAG_compile_unit)
>         <72>   DW_AT_producer    : (indirect string, offset: 0): GNU C23 15.2.0 -mtune=generic -march=x86-64 -g -fasynchronous-unwind-tables
>         <76>   DW_AT_language    : 29   (C11)
>         <77>   Unknown AT value: 90: 3
>         <78>   Unknown AT value: 91: 0x31647
>         <7c>   DW_AT_name        : (indirect line string, offset: 0x2d): hole-2.c
>         <80>   DW_AT_comp_dir    : (indirect line string, offset: 0): test_programs
>         <84>   DW_AT_ranges      : 0xc
>         <88>   DW_AT_low_pc      : 0
>         <90>   DW_AT_stmt_list   : 0x51
> 
> and the corresponding part of .debug_aranges:
> 
>       Length:                   76
>       Version:                  2
>       Offset into .debug_info:  0x65
>       Pointer Size:             8
>       Segment Size:             0
> 
>         Address            Length
>         0000000000f00000 000000000000000b
>         0000000000f00012 000000000000000b
>         0000000000080008 000000000000000e
>         0000000000000000 0000000000000000
> 
> Thiago suggested using minsymbols to tell whether a CU contains a
> given address. I do not think this would work reliably, as minsymbols do
> not know to which CU they belong. In the slightly more complicated case of
> interleaved CUs, it does not seem possible to tell for sure to which
> one a given minsymbol belongs.
> 
> Moreover, Tom suggested that the comment in find_compunit_symtab_for_pc_sect
> (which led to cc1fc6af4) may be outdated [2].
> 
> Given all that, I'm just reverting the change.
> 
> [1]: https://sourceware.org/bugzilla/show_bug.cgi?id=33679#c13
> [2]: https://inbox.sourceware.org/gdb-patches/87cy6xzd3j.fsf@tromey.com/
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33679
> ---
>  gdb/block.c | 29 +----------------------------
>  1 file changed, 1 insertion(+), 28 deletions(-)
> 
> diff --git a/gdb/block.c b/gdb/block.c
> index 3d2c51cc554..e21580bcf63 100644
> --- a/gdb/block.c
> +++ b/gdb/block.c
> @@ -864,34 +864,7 @@ blockvector::lookup (CORE_ADDR addr) const
>  bool
>  blockvector::contains (CORE_ADDR addr) const
>  {
> -  auto b = lookup (addr);
> -  if (b == nullptr)
> -    return false;
> -
> -  /* Handle the case that the blockvector has no address map but still has
> -     "holes".  For example, consider the following blockvector:
> -
> -	B0    0x1000 - 0x4000   (global block)
> -	B1    0x1000 - 0x4000   (static block)
> -	 B3   0x1000 - 0x2000
> -				(hole)
> -	 B4   0x3000 - 0x4000
> -
> -     In this case, the above blockvector does not contain address 0x2500 but
> -     lookup (0x2500) would return the blockvector's static block.
> -
> -     So here we check if the returned block is a static block and if yes, still
> -     return false.  However, if the blockvector contains no blocks other than
> -     the global and static blocks and ADDR falls into the static block,
> -     conservatively return true.
> -
> -     See comment in find_compunit_symtab_for_pc_sect, symtab.c.
> -
> -     Also, note that if the blockvector in the above example would contain
> -     an address map, then lookup (0x2500) would return NULL instead of
> -     the static block.
> -   */
> -  return b != static_block () || num_blocks () == 2;
> +  return lookup (addr) != nullptr;
>  }
>  
>  /* See block.h.  */


