[PATCHv3 3/7] gdb: split dwarf line table parsing in two

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Andrew Burgess <aburgess@redhat.com>
To: gdb-patches@sourceware.org
Cc: Andrew Burgess <aburgess@redhat.com>
Subject: [PATCHv3 3/7] gdb: split dwarf line table parsing in two
Date: Thu, 16 Oct 2025 18:49:49 +0100	[thread overview]
Message-ID: <abea982cb4672f80713396a10799c30d56749cdf.1760636870.git.aburgess@redhat.com> (raw)
In-Reply-To: <cover.1760636870.git.aburgess@redhat.com>

A later commit in this series, that improves GDB's ability to debug
optimised code, wants to use the line table information in order to
"fix" inline blocks with a truncated address range.  For the reasoning
behind wanting to do that, please read ahead in the series.

Assuming that we accept for now the need to use the line table
information to adjust the block ranges, then why is this commit
needed?

GDB splits the line table data info different symtabs, adding end of
sequence markers as we move between symtabs.  This seems to work fine
for GDB, but causes a problem for me in this case.

What I will want to do is this: scan the line table and spot line
table entries that corresponds to the end addresses of an inline
block's address range.  If the block meets certain requirements, then
the end address of the block is adjusted to be that of the next line
table entry.

The way that GDB currently splits the line table entries between
symtabs makes this harder.  I will have the set of blocks end
addresses which I know might be fixable, but to find the line table
entry corresponding to that address requires searching through all the
symtabs.  Having found the entry for the end address, I then need to
find the next line table entry.  For some blocks this is easy, it's
the next entry in the same symtab.  But for other blocks the next
entry might be in a different symtab, which requires yet another full
search.

I did try implementing this approach, but the number of full symtab
searches is significant, and it had a significant impact on GDB's
debug parsing performance.  The impact was such that an operation that
currently takes ~7seconds would take ~3minutes or more.  Now I could
possibly improve that 3 minutes figure by optimising the code some,
but I think that would add unnecessary complexity.

By deferring building the line table until after we have parsed the
DIEs it becomes simple to spot when a line table entry corresponds to
a block end address, and finding the next entry is always trivial, as,
at this point, the next entry is just the next entry which we will
process.  With this approach I see no noticable impact on DWARF
parsing performance.

This patch is just the refactoring.  There's no finding block end
addresses and "fixing" being done here.  This just sets things up for
the later commits.

The existing code has a single function handle_DW_AT_stmt_list which
loads the line table header and then calls dwarf_decode_lines to
decode the line table itself, splitting the line table entries between
symtabs.

After this commit handle_DW_AT_stmt_list is renamed to
decode_line_header_for_cu, this loads the line table header and
creates the symtabs based off the line table, but doesn't process the
line table.  The dwarf_decode_lines function is no longer called at
this point.

In read_file_scope is where dwarf_decode_lines is now called.  This
relies on the line table having been loaded earlier, and processes the
line table entries, splitting them between symtabs.

There should be no user visible changes after this commit.
---
 gdb/dwarf2/line-program.c | 39 +++---------------
 gdb/dwarf2/line-program.h | 25 ++----------
 gdb/dwarf2/read.c         | 83 +++++++++++++++++++++++++++------------
 3 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/gdb/dwarf2/line-program.c b/gdb/dwarf2/line-program.c
index c30f70dd11a..40b570b70c0 100644
--- a/gdb/dwarf2/line-program.c
+++ b/gdb/dwarf2/line-program.c
@@ -480,12 +480,10 @@ lnp_state_machine::check_line_address (struct dwarf2_cu *cu,
     }
 }
 
-/* Subroutine of dwarf_decode_lines to simplify it.
-   Process the line number information in LH.  */
+/* See dwarf2/line-program.h.  */
 
-static void
-dwarf_decode_lines_1 (struct line_header *lh, struct dwarf2_cu *cu,
-		      unrelocated_addr lowpc)
+void
+dwarf_decode_lines (struct dwarf2_cu *cu, unrelocated_addr lowpc)
 {
   const gdb_byte *line_ptr, *extended_end;
   const gdb_byte *line_end;
@@ -494,6 +492,9 @@ dwarf_decode_lines_1 (struct line_header *lh, struct dwarf2_cu *cu,
   struct objfile *objfile = cu->per_objfile->objfile;
   bfd *abfd = objfile->obfd.get ();
   struct gdbarch *gdbarch = objfile->arch ();
+  struct line_header *lh = cu->line_header;
+
+  gdb_assert (lh != nullptr);
 
   line_ptr = lh->statement_program_start;
   line_end = lh->statement_program_end;
@@ -690,31 +691,3 @@ dwarf_decode_lines_1 (struct line_header *lh, struct dwarf2_cu *cu,
       state_machine.record_line (true);
     }
 }
-
-/* See dwarf2/line-program.h.  */
-
-void
-dwarf_decode_lines (struct line_header *lh, struct dwarf2_cu *cu,
-		    unrelocated_addr lowpc, bool decode_mapping)
-{
-  if (decode_mapping)
-    dwarf_decode_lines_1 (lh, cu, lowpc);
-
-  /* Make sure a symtab is created for every file, even files
-     which contain only variables (i.e. no code with associated
-     line numbers).  */
-  buildsym_compunit *builder = cu->get_builder ();
-  struct compunit_symtab *cust = builder->get_compunit_symtab ();
-
-  for (auto &fe : lh->file_names ())
-    {
-      dwarf2_start_subfile (cu, fe, *lh);
-      subfile *sf = builder->get_current_subfile ();
-
-      if (sf->symtab == nullptr)
-	sf->symtab = allocate_symtab (cust, sf->name.c_str (),
-				      sf->name_for_id.c_str ());
-
-      fe.symtab = sf->symtab;
-    }
-}
diff --git a/gdb/dwarf2/line-program.h b/gdb/dwarf2/line-program.h
index 824f18f9613..dc5cf3faa44 100644
--- a/gdb/dwarf2/line-program.h
+++ b/gdb/dwarf2/line-program.h
@@ -20,28 +20,11 @@
 #ifndef GDB_DWARF2_LINE_PROGRAM_H
 #define GDB_DWARF2_LINE_PROGRAM_H
 
-/* Decode the Line Number Program (LNP) for the given line_header
-   structure and CU.  The actual information extracted and the type
-   of structures created from the LNP depends on the value of PST.
+/* Decode the Line Number Program (LNP) for the line_header structure in
+   CU.
 
-   FND holds the CU file name and directory, if known.
-   It is used for relative paths in the line table.
+   LOWPC is the lowest address in CU (or 0 if not known).  */
 
-   NOTE: It is important that psymtabs have the same file name (via
-   strcmp) as the corresponding symtab.  Since the directory is not
-   used in the name of the symtab we don't use it in the name of the
-   psymtabs we create.  E.g. expand_line_sal requires this when
-   finding psymtabs to expand.  A good testcase for this is
-   mb-inline.exp.
-
-   LOWPC is the lowest address in CU (or 0 if not known).
-
-   Boolean DECODE_MAPPING specifies we need to fully decode .debug_line
-   for its PC<->lines mapping information.  Otherwise only the filename
-   table is read in.  */
-
-extern void dwarf_decode_lines (struct line_header *lh,
-				struct dwarf2_cu *cu,
-				unrelocated_addr lowpc, bool decode_mapping);
+extern void dwarf_decode_lines (struct dwarf2_cu *cu, unrelocated_addr lowpc);
 
 #endif /* GDB_DWARF2_LINE_PROGRAM_H */
diff --git a/gdb/dwarf2/read.c b/gdb/dwarf2/read.c
index 49026c2dcf6..6c5157ea8d4 100644
--- a/gdb/dwarf2/read.c
+++ b/gdb/dwarf2/read.c
@@ -5960,29 +5960,55 @@ find_file_and_directory (struct die_info *die, struct dwarf2_cu *cu)
   return *cu->per_cu->fnd;
 }
 
-/* Handle DW_AT_stmt_list for a compilation unit.
-   DIE is the DW_TAG_compile_unit die for CU.
-   COMP_DIR is the compilation directory.  LOWPC is passed to
-   dwarf_decode_lines.  See dwarf_decode_lines comments about it.  */
+/* Ensure that every file_entry within the line_table of CU has a symtab
+   allocated for it. */
 
 static void
-handle_DW_AT_stmt_list (struct die_info *die, struct dwarf2_cu *cu,
-			const file_and_directory &fnd, unrelocated_addr lowpc,
-			bool have_code) /* ARI: editCase function */
+create_symtabs_from_cu_line_table (struct dwarf2_cu *cu)
 {
-  dwarf2_per_objfile *per_objfile = cu->per_objfile;
-  struct attribute *attr;
-  hashval_t line_header_local_hash;
-  void **slot;
-  int decode_mapping;
+  /* Make sure a symtab is created for every file, even files
+     which contain only variables (i.e. no code with associated
+     line numbers).  */
+  buildsym_compunit *builder = cu->get_builder ();
+  struct compunit_symtab *cust = builder->get_compunit_symtab ();
 
-  gdb_assert (! cu->per_cu->is_debug_types);
+  struct line_header *lh = cu->line_header;
+  gdb_assert (lh != nullptr);
 
-  attr = dwarf2_attr (die, DW_AT_stmt_list, cu);
+  for (auto &fe : lh->file_names ())
+    {
+      dwarf2_start_subfile (cu, fe, *lh);
+      subfile *sf = builder->get_current_subfile ();
+
+      if (sf->symtab == nullptr)
+	sf->symtab = allocate_symtab (cust, sf->name.c_str (),
+				      sf->name_for_id.c_str ());
+
+      fe.symtab = sf->symtab;
+    }
+}
+
+
+/* Handle DW_AT_stmt_list for a compilation unit. DIE is the
+   DW_TAG_compile_unit die for CU.  FND is used to access the compilation
+   directory.  This function will decode the line table header and create
+   symtab objects for the files referenced in the line table.  The line
+   table itself though is not processed by this function.  If there is no
+   line table, or there's a problem decoding the header, then CU will not
+   be updated.  */
+
+static void
+decode_line_header_for_cu (struct die_info *die, struct dwarf2_cu *cu,
+			   const file_and_directory &fnd)
+{
+  gdb_assert (!cu->per_cu->is_debug_types);
+
+  struct attribute *attr = dwarf2_attr (die, DW_AT_stmt_list, cu);
   if (attr == NULL || !attr->form_is_unsigned ())
     return;
 
   sect_offset line_offset = (sect_offset) attr->as_unsigned ();
+  dwarf2_per_objfile *per_objfile = cu->per_objfile;
 
   /* The line header hash table is only created if needed (it exists to
      prevent redundant reading of the line table for partial_units).
@@ -6000,8 +6026,9 @@ handle_DW_AT_stmt_list (struct die_info *die, struct dwarf2_cu *cu,
 				   xcalloc, xfree));
     }
 
+  void **slot;
   line_header line_header_local (line_offset, cu->per_cu->is_dwz);
-  line_header_local_hash = line_header_hash (&line_header_local);
+  hashval_t line_header_local_hash = line_header_hash (&line_header_local);
   if (per_objfile->line_header_hash != NULL)
     {
       slot = htab_find_slot_with_hash (per_objfile->line_header_hash.get (),
@@ -6054,12 +6081,8 @@ handle_DW_AT_stmt_list (struct die_info *die, struct dwarf2_cu *cu,
 	 then this is what we want as well.  */
       gdb_assert (die->tag != DW_TAG_partial_unit);
     }
-  decode_mapping = (die->tag != DW_TAG_partial_unit);
-  /* The have_code check is here because, if LOWPC and HIGHPC are both 0x0,
-     then there won't be any interesting code in the CU, but a check later on
-     (in lnp_state_machine::check_line_address) will fail to properly exclude
-     an entry that was removed via --gc-sections.  */
-  dwarf_decode_lines (cu->line_header, cu, lowpc, decode_mapping && have_code);
+
+  create_symtabs_from_cu_line_table (cu);
 }
 
 /* Process DW_TAG_compile_unit or DW_TAG_partial_unit.  */
@@ -6111,10 +6134,12 @@ read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
   scoped_restore restore_sym_cu
     = make_scoped_restore (&per_objfile->sym_cu, cu);
 
-  /* Decode line number information if present.  We do this before
-     processing child DIEs, so that the line header table is available
-     for DW_AT_decl_file.  */
-  handle_DW_AT_stmt_list (die, cu, fnd, unrel_low, unrel_low != unrel_high);
+  /* Decode the line header if present.  We do this before processing child
+     DIEs, so that information is available for DW_AT_decl_file.  We defer
+     parsing the actual line table until after processing the child DIEs,
+     this allows us to fix up some of the inline function blocks as the
+     line table is read.  */
+  decode_line_header_for_cu (die, cu, fnd);
 
   /* Process all dies in compilation unit.  */
   for (die_info *child_die : die->children ())
@@ -6122,6 +6147,12 @@ read_file_scope (struct die_info *die, struct dwarf2_cu *cu)
 
   per_objfile->sym_cu = nullptr;
 
+  /* If we actually have code, then read the line table now.  */
+  if (unrel_low != unrel_high
+      && die->tag != DW_TAG_partial_unit
+      && cu->line_header != nullptr)
+    dwarf_decode_lines (cu, unrel_low);
+
   /* Decode macro information, if present.  Dwarf 2 macro information
      refers to information in the line number info statement program
      header, so we can only read it if we've read the header
@@ -16144,7 +16175,7 @@ new_symbol (struct die_info *die, struct type *type, struct dwarf2_cu *cu,
 	      if (file_cu->line_header == nullptr)
 		{
 		  file_and_directory fnd (nullptr, nullptr);
-		  handle_DW_AT_stmt_list (file_cu->dies, file_cu, fnd, {}, false);
+		  decode_line_header_for_cu (file_cu->dies, file_cu, fnd);
 		}
 
 	      if (file_cu->line_header != nullptr)
-- 
2.47.1

next prev parent reply	other threads:[~2025-10-16 17:52 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-20 10:20 [PATCH 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-07-20 10:20 ` [PATCH 1/7] gdb: improve line number lookup around inline functions Andrew Burgess
2025-07-20 10:20 ` [PATCH 2/7] gdb: handle empty ranges for inline subroutines Andrew Burgess
2025-07-20 10:20 ` [PATCH 3/7] gdb: split dwarf line table parsing in two Andrew Burgess
2025-07-20 10:20 ` [PATCH 4/7] gdb: move block range recording into its own function Andrew Burgess
2025-07-20 10:20 ` [PATCH 5/7] gdb: create address map after parsing all DIE Andrew Burgess
2025-07-20 10:20 ` [PATCH 6/7] gdb: record block end addresses while parsing DIEs Andrew Burgess
2025-07-20 10:20 ` [PATCH 7/7] gdb: fix-up truncated inline function block ranges Andrew Burgess
2025-08-01  8:58 ` [PATCHv2 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 1/7] gdb: improve line number lookup around inline functions Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 2/7] gdb: handle empty ranges for inline subroutines Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 3/7] gdb: split dwarf line table parsing in two Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 4/7] gdb: move block range recording into its own function Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 5/7] gdb: create address map after parsing all DIE Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 6/7] gdb: record block end addresses while parsing DIEs Andrew Burgess
2025-08-01  8:58   ` [PATCHv2 7/7] gdb: fix-up truncated inline function block ranges Andrew Burgess
2025-10-16 17:49   ` [PATCHv3 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-10-16 17:49     ` [PATCHv3 1/7] gdb: improve line number lookup around inline functions Andrew Burgess
2025-10-27 22:22       ` Tom Tromey
2025-12-17 14:32         ` Andrew Burgess
2025-12-17 14:48           ` Tom de Vries
2025-12-18 14:46             ` Andrew Burgess
2025-10-16 17:49     ` [PATCHv3 2/7] gdb: handle empty ranges for inline subroutines Andrew Burgess
2025-10-16 17:49     ` Andrew Burgess [this message]
2025-10-16 17:49     ` [PATCHv3 4/7] gdb: move block range recording into its own function Andrew Burgess
2025-10-27 22:45       ` Tom Tromey
2025-10-16 17:49     ` [PATCHv3 5/7] gdb: create address map after parsing all DIE Andrew Burgess
2025-10-27 22:56       ` Tom Tromey
2026-01-02 16:36         ` Andrew Burgess
2026-01-05 20:03           ` Tom Tromey
2026-01-05 21:37             ` Andrew Burgess
2026-01-06  0:53               ` Tom Tromey
2025-10-16 17:49     ` [PATCHv3 6/7] gdb: record block end addresses while parsing DIEs Andrew Burgess
2025-10-27 23:00       ` Tom Tromey
2025-10-16 17:49     ` [PATCHv3 7/7] gdb: fix-up truncated inline function block ranges Andrew Burgess
2026-02-04 10:43     ` [PATCHv3 0/7] Inline Function Optimised Code Debug Improvements Andrew Burgess
2025-08-01 15:41 ` [PATCH " Sam James

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abea982cb4672f80713396a10799c30d56749cdf.1760636870.git.aburgess@redhat.com \
    --to=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox