Re: [PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Simon Marchi <simon.marchi@polymtl.ca>
To: Ali Tamur <tamur@google.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets.
Date: Wed, 20 Nov 2019 04:29:00 -0000	[thread overview]
Message-ID: <867ce990-e0c7-c2cd-5b5b-126339a32a44@polymtl.ca> (raw)
In-Reply-To: <20191119044628.178151-1-tamur@google.com>

On 2019-11-18 11:46 p.m., Ali Tamur via gdb-patches wrote:
> Reply:
> Sure, here's an example.
> $  cd /tmp/dw5 && echo "int calculate() { return 4; } int main(int argc, char** argv) { return calculate(); }" >>main.cc
> $  clang -gdwarf-5 -gsplit-dwarf main.cc
> $  ls
> a.out main.cc main.dwo 
> 
> $  llvm-dwarfdump --all a.out
> a.out:	file format ELF64-x86-64
> 
> .debug_abbrev contents:
> Abbrev table for offset: 0x00000000
> [1] DW_TAG_compile_unit	DW_CHILDREN_no
> 	DW_AT_stmt_list	DW_FORM_sec_offset
> 	DW_AT_str_offsets_base	DW_FORM_sec_offset
> 	DW_AT_comp_dir	DW_FORM_strx1
> 	DW_AT_GNU_pubnames	DW_FORM_flag_present
> 	DW_AT_GNU_dwo_name	DW_FORM_strx1
> 	DW_AT_low_pc	DW_FORM_addrx
> 	DW_AT_high_pc	DW_FORM_data4
> 	DW_AT_addr_base	DW_FORM_sec_offset
> 
> .debug_info contents:
> 0x00000000: Compile Unit: length = 0x00000024 version = 0x0005 unit_type = DW_UT_skeleton abbr_offset = 0x0000 addr_size = 0x08 DWO_id = 0xee1d4b42a2f0ca0b (next unit at 0x00000028)
> 
> 0x00000014: DW_TAG_compile_unit
>               DW_AT_stmt_list	(0x00000000)
>               DW_AT_str_offsets_base	(0x00000008)
>               DW_AT_comp_dir	("/tmp/dw5")
>               DW_AT_GNU_pubnames	(true)
>               DW_AT_GNU_dwo_name	("main.dwo")
>               DW_AT_low_pc	(0x0000000000401110)
>               DW_AT_high_pc	(0x0000000000401141)
>               DW_AT_addr_base	(0x00000008)
> ....
> .debug_str contents:
> 0x00000000: "/tmp/dw5"
> 0x00000009: "main.dwo"
> ....
> .debug_str_offsets contents:
> 0x00000000: Contribution size = 12, Format = DWARF32, Version = 5
> 0x00000008: 00000000 "/tmp/dw5"
> 0x0000000c: 00000009 "main.dwo"
> 
> Here is what happens. gdb starts to parse DW_TAG_compile_unit DIE. It comes
> to DW_AT_GNU_dwo_name. It is of form DW_FORM_strx1 and it has a value of 1.
> The actual value is somewhere in .debug_str section. To find it we need to
> process .debug_str_offsets (refer to 1st index somewhere within) and also know
> the value of DW_AT_str_offsets_base. However, we are in the middle of parsing
> the die and there is no guarantee that we have yet processed
> DW_AT_str_offsets_base, it may be parsed later. 
> 
> My solution is to temporarily write "1" as the value of the attribute
> DW_AT_GNU_dwo_name, and mark it as 'needs reprocessing'. After all the
> attributes of the die have been processed, DW_AT_str_offsets_base will hold the
> correct value if it exists. Then, we revisit the marked attributes. During
> reprocess, we don't need to read the binary file again, because we had already
> written the value we need (1) in the first pass. We calculate the correct
> address using that value, the contents of .debug_str* sections and
> DW_AT_str_offsets_base value.

Ok, so that reprocessing happens right after having read the attributes from the DIE.

Another approach that I would have expected would be to do it more lazily.  Only keep
the index in the attribute, and go read the actual string when we call dwarf2_string_attr
on it (and then maybe cache the value).  Did you consider doing something like that, and
perhaps it was a too involving change?

I see that this is kind of done in get_stub_string_attr, for attributes marked with
DW_STRING_IS_STR_INDEX, so I was wondering why not do it for all strx forms and avoid
the reprocessing.

> +/* Process the attributes that had to be skipped in the first round. These
> +   attributes are the ones that need str_offsets_base or addr_base attributes.
> +   They could not have been processed in the first round, because at the time
> +   the values of str_offsets_base or addr_base may not have been known.  */
> +void read_attribute_reprocess (const struct die_reader_specs *reader,
> +			       struct attribute *attr)
> +{
> +  struct dwarf2_cu *cu = reader->cu;
> +  switch (attr->form)
> +    {
> +      case DW_FORM_addrx:
> +      case DW_FORM_GNU_addr_index:
> +        DW_ADDR (attr) = read_addr_index (cu, DW_UNSND (attr));
> +        break;
> +      case DW_FORM_strx:
> +      case DW_FORM_strx1:
> +      case DW_FORM_strx2:
> +      case DW_FORM_strx3:
> +      case DW_FORM_strx4:
> +      case DW_FORM_GNU_str_index:
> +        unsigned int str_index = DW_UNSND (attr);
> +	if (reader->dwo_file != NULL)
> +	  {
> +	    DW_STRING (attr) = read_dwo_str_index (reader, str_index);
> +	    DW_STRING_IS_CANONICAL (attr) = 0;
> +	    DW_STRING_IS_STR_INDEX (attr) = false;
> +	  }
> +	else if (cu->str_offsets_base.has_value ())
> +	  {
> +	    DW_STRING (attr) = read_stub_str_index (cu, str_index);
> +	    DW_STRING_IS_CANONICAL (attr) = 0;
> +	    DW_STRING_IS_STR_INDEX (attr) = false;
> +	  }
> +	else
> +	  {
> +	    if (attr->name != DW_AT_comp_dir
> +		&& attr->name != DW_AT_GNU_dwo_name
> +		&& attr->name != DW_AT_dwo_name)
> +	      {
> +		error (_("Dwarf Error: %s/%s found in non-DWO CU"),
> +		       dwarf_form_name (attr->form),
> +		       dwarf_attr_name (attr->name));
> +	      }
> +	    DW_UNSND (attr) = str_index;
> +	    DW_STRING_IS_STR_INDEX (attr) = true;
> +	  }

I don't understand this part.  When read_attribute_reprocess is called,
if there's a DW_AT_str_offsets_base, we will have read it at this point.
So when is the else clause actually useful?  Can you give a practical
example of how to exercise this code?

Simon

next prev parent reply	other threads:[~2019-11-20  4:29 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-01 23:44 Ali Tamur via gdb-patches
2019-11-12 20:25 ` Ali Tamur via gdb-patches
2019-11-15  3:05 ` Simon Marchi
2019-11-19  4:46   ` Ali Tamur via gdb-patches
2019-11-19  5:27     ` Simon Marchi
2019-11-20  4:29     ` Simon Marchi [this message]
2019-12-27  0:41       ` Ali Tamur via gdb-patches
2019-12-27 23:20         ` Simon Marchi
2020-01-08  5:19           ` Ali Tamur via gdb-patches
     [not found] <f8620619-d480-6e9c-39f0-f3d540bc4d0d@polymtl.ca>
2020-01-09  0:57 ` Ali Tamur via gdb-patches
2020-01-10  3:46   ` Simon Marchi
2020-01-11  3:45     ` Ali Tamur via gdb-patches
2020-01-11 13:35       ` Simon Marchi
2020-01-14  3:46         ` Ali Tamur via gdb-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=867ce990-e0c7-c2cd-5b5b-126339a32a44@polymtl.ca \
    --to=simon.marchi@polymtl.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=tamur@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox