"

I have made most of your requested changes, which are in the attached
patch.  I disagree with one of your comments, and I had a question
about another:

First the disagreement.  You said:

>> @@ -18684,7 +18764,8 @@ partial_die_info::read (const struct die_reader_specs *reader,
>>           /* It would be nice to reuse dwarf2_get_pc_bounds here,
>>              but that requires a full DIE, so instead we just
>>              reimplement it.  */
>> -         int need_ranges_base = tag != DW_TAG_compile_unit;
>> +         int need_ranges_base = (tag != DW_TAG_compile_unit
>> +                                 && attr.form != DW_FORM_rnglistx);
>
> It looks like this fix is unrelated from the rest of the patch, which deals
> about reading rnglists from dwo files.  It's better to have on fix per patch.
> So I think this fix would deserve its own patch, with a commit message that
> clearly explains the issue and the fix.
>
This is actually a required fix for dealing with rnglists.  Without
it, when the DW_AT_ranges uses the form
DW_FORM_rnglistx, the wrong value is calculated and the entire rnglist
is ignored (if you turn on complaints this will happen).  So without
this fix, I can't test any of the other code for fixing rnglists, with
or without .dwo files.

> I'm also not sure I understand it: when the form is DW_FORM_rnglistx, don't
> we want to sometimes apply the base coming from DW_AT_rnglists_base?

It all depends on what form is being used. If the DW_AT_ranges is
using DW_FORM_sec_offset (which is what
GCC generates), then you do need to add the base value.  But if you
are using the DW_FORM_rnglisx (which is what clang generates), the
base value has already been considered/incorporated into the attribute
so adding it in (again) causes the value to be incorrect (again, turn
on complaints and see what happens if you try to add the base into
something with DW_FORM_rnglistx).  From the DWARF5 spec: "[A] rnglist
is either represented as: An index into the .debug_rnglists section
(DW_FORM_rnglistx). The ...operand identifies an offset location
relative to the base of that section...[or] An offset into the
.debug_rnglists section (DW_FORM_sec_offset).  The operand consists of
a byte offset from the beginning of the .debug_rnglists section."
Note the DW_FORM_rnglistx is already relative to the base.

Now, my question:  I'm not quite clear as to whether you really want
me to change the code in
cu_debug_rnglists_section or not.   I wrote the code mostly by copying
the code for loclists as closely as possible, and making whatever
changes to that were required to make it correct for rnglists.  Is it
better to have the code more consistent with the loclists code?  If
you really want me to make the changes there you suggest, then I will;
I just wanted confirmation that you really want that change (i.e.
adding the dwarf tag and checking vs. DW_TAG_compile_unit, & calling
it from dwarf2_rnglists_process).

The test of your requested changes are in the attached patch.

-- Caroline
cmtice@google.com

2020-07-03  Caroline Tice  <cmtice@google.com>

        * dwarf2/read.c (struct dwop_section_names): Add rnglists_dwo.
        (dwop_section_names): Add .debug_rnglists.dwo, .zdebug_rnglists.dwo.
        (struct dwo_sections): Add rnglists field.
        (dwarf2_ranges_read): Add tag parameter.
        (cu_debug_rnglist_section): New function (decl & definition).
        (dwarf2_locate_dwo_sections): Add code to read rnglists_dwo section.
        (dwarf2_rnglists_process): Add a dwarf_tag parameter, for the kind of
        die whose range is being checked; add code to read the rnglist section
        from the dwo file rather than from the main objfile, if appropriate.
        Add cases for DW_RLE_base_addressx,
        DW_RLE_startx_length, DW_RLE_startx_endx.  Also, update to only add
        the base address to DW_RLE_offset_pairs (not to all ranges).
        (dwarf2_ranges_process): Add dwarf tag parameter and pass it to
        dwarf2_rnglists_process.
        (dwarf2_ranges_read): Add dwarf tag parameter and pass it to
        dwarf2_ranges_process.
        (dwarf2_get_pc_bounds): Check for DW_FORM_rnglistx when setting
        need_ranges_base.  Also pass die tag to dwarf2_ranges_read.
        (dwarf2_record_block_ranges): Check for DW_FORM_rnglistx when setting
        need_ranges_base.  Also pass die tag to dwarf2_ranges_read.
        (read_full_die_1): Add code to read DW_AT_rnglists_base and assign to
        cu->ranges_base.
        (partial_die_info::read): Check for DW_FORM_rnglistx when setting
        need_ranges_base.  Also pass die tag to dwarf2_ranges_read.
        (read_rnglist_index): New function.
        (read_attribute_reprocess):  Add code for DW_FORM_rnglistx.
        (read_attribute_value): Mark DW_FORM_rnglistx with need_reprocess.

gdb/testsuite/ChangeLog entry:

2020-07-03  Caroline Tice  <cmtice@google.com>

        * gdb.dwarf2/dw5-rnglist-test.cc: New file.
        * gdb.dwarf2/dw5-rnglist-test.exp: New file.


On Wed, Jul 1, 2020 at 10:41 PM Simon Marchi <simark@simark.ca> wrote:
>
> On 2020-07-01 3:57 p.m., Caroline Tice via Gdb-patches wrote:
> > I created the attached patch with git format-patch; I've never used
> > that before, so I'm hoping I got it right. :-)
>
> Very nice, thanks.
>
> > Please let me know  if there's anything else you need/want for approval.
>
> Thanks for that.  I'd appreciate if the commit message contained a bit more
> of the rationale for the change (in addition to what you have already put
> in there).  Basically, what you explained in your first message.
>
> > @@ -1379,11 +1385,16 @@ static void read_call_site_scope (struct die_info *die, struct dwarf2_cu *cu);
> >  static void read_variable (struct die_info *die, struct dwarf2_cu *cu);
> >
> >  static int dwarf2_ranges_read (unsigned, CORE_ADDR *, CORE_ADDR *,
> > -                            struct dwarf2_cu *, dwarf2_psymtab *);
> > +                            struct dwarf2_cu *, dwarf2_psymtab *,
> > +                            dwarf_tag);
> >
> >  /* Return the .debug_loclists section to use for cu.  */
> >  static struct dwarf2_section_info *cu_debug_loc_section (struct dwarf2_cu *cu);
> >
> > +/* Return the .debug_rnglists section to use for cu.  */
> > +static struct dwarf2_section_info *cu_debug_rnglist_section
> > +  (struct dwarf2_cu *cu);
>
> Nit, this function should probably be called cu_debug_rnglists_section (with an `s`
> to rnglists).
>
> > @@ -13802,17 +13820,36 @@ dwarf2_rnglists_process (unsigned offset, struct dwarf2_cu *cu,
> >    const gdb_byte *buffer;
> >    CORE_ADDR baseaddr;
> >    bool overflow = false;
> > +  ULONGEST addr_index;
> > +  bool ignore_dwo_unit = false;
> > +  struct dwarf2_section_info *rnglists_section;
> >
> >    base = cu->base_address;
> >
> > -  per_objfile->per_bfd->rnglists.read (objfile);
> > -  if (offset >= per_objfile->per_bfd->rnglists.size)
> > +  /* If the DW_AT_ranges attribute was part of a DW_TAG_skeleton_unit that was
> > +     changed into a DW_TAG_compile_unit after calling read_cutu_die_from_dwo,
> > +     we want to use the rnglist in the objfile rather than the one in the
> > +     dwo_unit. */
>
> I think that can be explained more simply as: we want to use the .debug_rnglists
> section from the file the DW_AT_ranges attribute we're reading came from.  If it's
> on DW_TAG_compile_unit, it was actually on the DW_TAG_skeleton_unit DIE, in the
> objfile / linked program.  Otherwise, it was on some other DIE in the dwo.
>
> > +  if (tag == DW_TAG_compile_unit
> > +      && cu->cu_ranges_from_skeleton)
>
> I'm wondering if this `cu_ranges_from_skeleton` boolean is really necessary.  There
> cannot be a DW_AT_ranges attribute on a DW_TAG_compile_unit inside a dwo file (see
> DWARF 5, section 3.1.3).  So if we're reading a DW_AT_ranges on a DW_TAG_compile_unit,
> it's either because:
>
> - it's a standard non-split DW_TAG_compile_unit
> - it's a DW_TAG_skeleton_unit that was transformed into a DW_TAG_compile_unit by
>   read_cutu_die_from_dwo, as you said (I didn't know it did that)
>
> In both of these cases, the rnglists section to use should be the one from the objfile,
> not the one from the dwo.  The DW_TAG_skeleton_unit in the main objfile can have
> a DW_AT_ranges attribute, while the associated DW_TAG_compile_unit in the dwo can't.
>
> So the condition could probably just be:
>
>   if (cu->dwo_unit != nullptr && tag != DW_TAG_compile_unit)
>     rnglists_section = &cu->dwo_unit->dwo_file->sections.rnglists;
>   else
>     rnglists_section = &per_objfile->per_bfd->rnglists;
>
> If there's a dwo_unit, the DW_TAG_compile_unit (transformed from DW_TAG_skeleton_unit)
> will be the only DIE in the CU, so the only one to use the rnglists from the main objfile.
> All the conceptual children DIEs contained in the dwo will use the rnglists from the dwo.
>
> > +    ignore_dwo_unit = true;
> > +
> > +  /* Else, if there's a dwo_unit, we want the rnglist from the dwo file.  */
> > +  if (cu->dwo_unit
> > +      && cu->dwo_unit->dwo_file->sections.rnglists.size > 0
> > +      && !ignore_dwo_unit)
> > +      rnglists_section = &cu->dwo_unit->dwo_file->sections.rnglists;
> > +  else
> > +      rnglists_section = &per_objfile->per_bfd->rnglists;
>
> I'm not sure the `cu->dwo_unit->dwo_file->sections.rnglists.size > 0` part of the condition
> is right.  Let's say you have some DIE inside the dwo with a DW_AT_ranges attriute, but the
> rnglists section from the dwo is either missing or empty.  That code will end up reading the
> rnglists from the objfile, which is wrong: a DW_AT_ranges in a dwo always refers to the
> rnglists section of the dwo, right?
>
> It would be a good thing to test for fun, strip the .debug_rnglists.dwo section from the dwo
> and see what happens.
>
> > @@ -13889,7 +13944,20 @@ dwarf2_rnglists_process (unsigned offset, struct dwarf2_cu *cu,
> >         range_end = cu->header.read_address (obfd, buffer, &bytes_read);
> >         buffer += bytes_read;
> >         break;
> > -     default:
> > +     case DW_RLE_startx_endx:
> > +          addr_index = read_unsigned_leb128 (obfd, buffer, &bytes_read);
> > +          buffer += bytes_read;
> > +          range_beginning = read_addr_index (cu, addr_index);
> > +          if (buffer > buf_end)
> > +            {
> > +              overflow = true;
> > +              break;
> > +            }
> > +          addr_index = read_unsigned_leb128 (obfd, buffer, &bytes_read);
> > +          buffer += bytes_read;
> > +          range_end = read_addr_index (cu, addr_index);
> > +          break;
> > +     default:
>
> Remove the space at the beginning of the line (before the tab).
>
> >         complaint (_("Invalid .debug_rnglists data (no base address)"));
> >         return false;
> >       }
> > @@ -13917,8 +13985,12 @@ dwarf2_rnglists_process (unsigned offset, struct dwarf2_cu *cu,
> >        if (range_beginning == range_end)
> >       continue;
> >
> > -      range_beginning += *base;
> > -      range_end += *base;
> > +      /* Only DW_RLE_offset_pair needs the base address added.  */
> > +      if (rlet == DW_RLE_offset_pair)
> > +     {
> > +       range_beginning += *base;
> > +       range_end += *base;
> > +     }
>
> That makes sense.  The DWARF standard says, about DW_RLE_base_address:
>
>     ...that indicates the applicable base address used by following
>     DW_RLE_offset_pair entries.
>
> However, see this check a few lines above:
>
>       if (!base.has_value ())
>         {
>           /* We have no valid base address for the ranges
>              data.  */
>           complaint (_("Invalid .debug_rnglists data (no base address)"));
>           return false;
>         }
>
> Should that be moved inside the `if (rlet == DW_RLE_offset_pair)`?  It
> only matters that we don't have a base if we're going to use it, and that's
> if `rlet == DW_RLE_offset_pair`.
>
> > @@ -18684,7 +18764,8 @@ partial_die_info::read (const struct die_reader_specs *reader,
> >           /* It would be nice to reuse dwarf2_get_pc_bounds here,
> >              but that requires a full DIE, so instead we just
> >              reimplement it.  */
> > -         int need_ranges_base = tag != DW_TAG_compile_unit;
> > +         int need_ranges_base = (tag != DW_TAG_compile_unit
> > +                                 && attr.form != DW_FORM_rnglistx);
>
> It looks like this fix is unrelated from the rest of the patch, which deals
> about reading rnglists from dwo files.  It's better to have on fix per patch.
> So I think this fix would deserve its own patch, with a commit message that
> clearly explains the issue and the fix.
>
> I'm also not sure I understand it: when the form is DW_FORM_rnglistx, don't
> we want to sometimes apply the base coming from DW_AT_rnglists_base?
>
> > @@ -19094,6 +19175,49 @@ read_loclist_index (struct dwarf2_cu *cu, ULONGEST loclist_index)
> >      return bfd_get_64 (abfd, info_ptr) + loclist_base;
> >  }
> >
> > +/* Given a DW_FORM_rnglistx value rnglist_index, fetch the offset from the
> > +   array of offsets in the .debug_rnglists section.  */
> > +static CORE_ADDR
> > +read_rnglist_index (struct dwarf2_cu *cu, ULONGEST rnglist_index)
> > +{
> > +  struct dwarf2_per_objfile *dwarf2_per_objfile = cu->per_objfile;
> > +  struct objfile *objfile = dwarf2_per_objfile->objfile;
> > +  bfd *abfd = objfile->obfd;
> > +  ULONGEST rnglist_header_size =
> > +    (cu->header.initial_length_size == 4 ? LOCLIST_HEADER_SIZE32
> > +     : LOCLIST_HEADER_SIZE64);
> > +  ULONGEST rnglist_base =
> > +      (cu->dwo_unit) ? rnglist_header_size : cu->ranges_base;
> > +
> > +  struct dwarf2_section_info *section = cu_debug_rnglist_section (cu);
> > +  if (section == nullptr)
> > +    error (_("Cannot find .debug_rnglists section [in module %s]"),
> > +        objfile_name(objfile));
> > +  section->read (objfile);
> > +  if (section->buffer == NULL)
> > +    error (_("DW_FORM_rnglistx used without .debug_rnglists section "
> > +          "[in module %s]"),
> > +       objfile_name (objfile));
> > +  struct loclist_header header;
> > +  read_loclist_header (&header, section);
> > +  if (rnglist_index >= header.offset_entry_count)
> > +    error (_("DW_FORM_rnglistx index pointing outside of "
> > +          ".debug_rnglists offset array [in module %s]"),
> > +         objfile_name(objfile));
> > +  if (rnglist_base + rnglist_index * cu->header.offset_size
> > +        >= section->size)
> > +    error (_("DW_FORM_rnglistx pointing outside of "
> > +             ".debug_rnglists section [in module %s]"),
> > +          objfile_name(objfile));
> > +  const gdb_byte *info_ptr
> > +    = section->buffer + rnglist_base + rnglist_index * cu->header.offset_size;
>
> Factor out `rnglist_base + rnglist_index * cu->header.offset_size` into
> an `offset` variable.
>
> > +
> > +  if (cu->header.offset_size == 4)
> > +    return read_4_bytes (abfd, info_ptr) + rnglist_base;
> > +  else
> > +    return read_8_bytes (abfd, info_ptr) + rnglist_base;
>
> Please use some empty lines in the code to separate logical block.  I find
> it really hard to follow, packed like this.  At the very least, after the
> `if` statements, put an empty line.  Comments that indicate what each block
> does at high level are also helpful:
>
>  +  /* Get rnglists section.  */
>  +  struct dwarf2_section_info *section = cu_debug_rnglist_section (cu);
>  +  if (section == nullptr)
>  +    error (_("Cannot find .debug_rnglists section [in module %s]"),
>  +         objfile_name(objfile));
>  +
>  +  /* Read rnglists section content.  */
>  +  section->read (objfile);
>  +  if (section->buffer == NULL)
>  +    error (_("DW_FORM_rnglistx used without .debug_rnglists section "
>  +           "[in module %s]"),
>  +       objfile_name (objfile));
>  +
>  +  /* Validate that the index is with the offset list range. */
>  +  struct loclist_header header;
>  +  read_loclist_header (&header, section);
>  +  if (rnglist_index >= header.offset_entry_count)
>  +    error (_("DW_FORM_rnglistx index pointing outside of "
>  +           ".debug_rnglists offset array [in module %s]"),
>  +          objfile_name(objfile));
>  +
>  +  /* Validate that the offset is in the section's range.  */
>  +  if (rnglist_base + rnglist_index * cu->header.offset_size
>  +        >= section->size)
>  +    error (_("DW_FORM_rnglistx pointing outside of "
>  +             ".debug_rnglists section [in module %s]"),
>  +          objfile_name(objfile));
>  +
>  +  const gdb_byte *info_ptr
>  +    = section->buffer + rnglist_base + rnglist_index * cu->header.offset_size;
>
>
> > @@ -23382,6 +23511,25 @@ cu_debug_loc_section (struct dwarf2_cu *cu)
> >                                 : &per_objfile->per_bfd->loc);
> >  }
> >
> > +/* Return the .debug_rnglists section to use for CU.  */
> > +static struct dwarf2_section_info *
> > +cu_debug_rnglist_section (struct dwarf2_cu *cu)
> > +{
> > +  if (cu->header.version < 5)
> > +    error (_(".debug_rnglists section cannot be used in DWARF %d"),
> > +        cu->header.version);
> > +  struct dwarf2_per_objfile *dwarf2_per_objfile = cu->per_objfile;
> > +
> > +  if (cu->dwo_unit)
>
> When checking pointers, use an explicit ` == nullptr` / ` != nullptr`.
>
> > +    {
> > +      struct dwo_sections *sections = &cu->dwo_unit->dwo_file->sections;
> > +
> > +      if (sections->rnglists.size > 0)
> > +     return &sections->rnglists;
> > +    }
> > +  return &dwarf2_per_objfile->per_bfd->rnglists;
>
> This function is called to get the right rnglists section in the
> read_rnglist_index function.  I don't understand how this can work.
> When there is a dwo, the read_rnglist_index can be used to read a
> range list either for the main file (the DW_AT_ranges on the
> DW_TAG_skeleton_unit) or for the dwo file (the DW_AT_ranges on a
> DW_TAG_lexical_block).
>
> This function here always return the dwo section if a dwo is
> present... which would be wrong for when reading the DW_AT_ranges
> on the DW_TAG_skeleton_unit?
>
> Ok, I debugged GDB to see when this gets called.  It works because
> when reading the DW_TAG_skeleton_unit, `cu->dwo_unit` is still
> NULL... so the first time cu_debug_rnglist_section gets called, it
> returns the section in the main file, and the following times, it
> returns the section in the dwo file.  That seems a bit fragile to
> me.  For example, if the order of things change, like if we change
> GDB to read the ranges lazily later, and `cu->dwo_unit` is not NULL
> anymore when reading the DW_AT_ranges on DW_TAG_skeleton_unit, this
> will return the wrong section.
>
> Perhaps a more robust way to decide which section to return would
> be to use the same `cu->dwo_unit != nullptr && tag != DW_TAG_compile_unit`
> trick from earlier?  In fact, perhaps that other spot should call
> cu_debug_rnglist_section instead of replicating the logic?
>
> >  static void
> > diff --git a/gdb/testsuite/gdb.dwarf2/dw5-rnglist-test.cc b/gdb/testsuite/gdb.dwarf2/dw5-rnglist-test.cc
> > new file mode 100644
> > index 0000000000..17f78843d2
> > --- /dev/null
> > +++ b/gdb/testsuite/gdb.dwarf2/dw5-rnglist-test.cc
> > @@ -0,0 +1,88 @@
> > +/* This testcase is part of GDB, the GNU debugger.
> > +
> > +   Copyright 2020 Free Software Foundation, Inc.
> > +
> > +   This program is free software; you can redistribute it and/or modify
> > +   it under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3 of the License, or
> > +   (at your option) any later version.
> > +
> > +   This program is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +   GNU General Public License for more details.
> > +
> > +   You should have received a copy of the GNU General Public License
> > +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> > +
> > +#include <iostream>
> > +#include <vector>
> > +
> > +struct node {
> > +  int id;
> > +  node *left;
> > +  node *right;
> > +  bool visited;
> > +};
> > +
> > +node  node_array[50];
> > +unsigned int CUR_IDX = 0;
> > +
> > +node * make_node (int val)
> > +{
> > +  node *new_node = &(node_array[CUR_IDX++]);
> > +  new_node->left = NULL;
> > +  new_node->right = NULL;
> > +  new_node->id = val;
> > +  new_node->visited = false;
> > +
> > +  return new_node;
> > +}
> > +
> > +void tree_insert (node *root, int val)
> > +{
> > +  if (val < root->id)
> > +    {
> > +      if (root->left)
> > +        tree_insert (root->left, val);
> > +      else
> > +        root->left = make_node(val);
> > +    }
> > +  else if (val > root->id)
> > +    {
> > +      if (root->right)
> > +        tree_insert (root->right, val);
> > +      else
> > +        root->right = make_node(val);
> > +    }
> > +}
> > +
> > +void inorder(node *root) {
> > +  std::vector<node *> todo;
> > +  todo.push_back(root);
> > +  while (!todo.empty()){
> > +    node *curr = todo.back();
> > +    todo.pop_back(); /* break-here */
> > +    if (curr->visited) {
> > +      std::cout << curr->id << " ";
> > +    } else {
> > +      curr->visited = true;
> > +      if (curr->right) { todo.push_back(curr->right); }
> > +      todo.push_back(curr);
> > +      if (curr->left) { todo.push_back(curr->left); }
> > +    }
> > +  }
> > +}
> > +
> > +int main (int argc, char **argv)
> > +{
> > +  node *root = make_node (35);
> > +
> > +  tree_insert (root, 28);
> > +  tree_insert (root, 20);
> > +  tree_insert (root, 60);
> > +
> > +  inorder (root);
> > +
> > +  return 0;
> > +}
>
> We try to format the test cases using the same style as the rest of the
> source code.  Could you please update this to follow the same rules?
>
> Simon