From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4601 invoked by alias); 19 Nov 2019 05:27:54 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 4593 invoked by uid 89); 19 Nov 2019 05:27:53 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-8.2 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.1 spammy=revised, H*M:7012 X-HELO: smtp.polymtl.ca Received: from smtp.polymtl.ca (HELO smtp.polymtl.ca) (132.207.4.11) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 19 Nov 2019 05:27:43 +0000 Received: from simark.ca (simark.ca [158.69.221.121]) (authenticated bits=0) by smtp.polymtl.ca (8.14.7/8.14.7) with ESMTP id xAJ5RVP9012886 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 19 Nov 2019 00:27:36 -0500 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp.polymtl.ca xAJ5RVP9012886 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=polymtl.ca; s=default; t=1574141260; bh=tfq29yL7EwhD85DC6lzn1j/sUb7P1zfTcbrGpKFW4uM=; h=Subject:To:References:From:Date:In-Reply-To:From; b=ATL0idimy43riZSMc/klJvwY6FQCSuojNhmAvLaOSHmNqjiRcBRhMam3nibAo5bSG tt6EEzgVp7uVm9axvwnz6KpLva46BHXvu4wvYzSq0gce/7VT4HKyV9QX3hAV8c7HWm xyE+P/GFtm2CGTizoC6RISLeIA9tzLG05e+sByFA= Received: from [10.0.0.11] (unknown [192.222.164.54]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id 2D5F71E60A; Tue, 19 Nov 2019 00:27:31 -0500 (EST) Subject: Re: [PATCH] Dwarf 5: Handle debug_str_offsets and indexed attributes that have base offsets. To: Ali Tamur , gdb-patches@sourceware.org References: <7f64573b-adef-f991-cc6c-6174b1c24d39@polymtl.ca> <20191119044628.178151-1-tamur@google.com> From: Simon Marchi Message-ID: <317662be-7012-e1b0-d93f-0119749c2661@polymtl.ca> Date: Tue, 19 Nov 2019 05:27:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: <20191119044628.178151-1-tamur@google.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2019-11/txt/msg00584.txt.bz2 On 2019-11-18 11:46 p.m., Ali Tamur wrote: > Hi Simon, > Thanks for the review. Please see my replies below. > >> - ULONGEST ranges_base = 0; >> + gdb::optional ranges_base = 0; > *** I am probably missing something, but I don't really understand the logic > of making this field optional and initializing it to 0, or the need to make it > optional at all. If the CU does not have a DW_AT_rnglists_base attribute, > doesn't it implicitly mean that this CU's ranges offset is 0? So it would seem > fine to just leave it at 0 and keep the arithmetic as before. > > Ok, I changed the initial value to 'no value' and thank you, it's better. It > seems a good idea to differentiate between an unexisting value and an existing > value of 0 even if the math stays the same. For example, for str_offsets_base, > there are different code paths depending on whether there is a value (not > depending whether the value is 0). It seems to me all of [addr_base, > ranges_base, str_offsets_base] should have similar semantics. But I agree that > currently there is no behavioral difference in ranges_base case, do you prefer > it ULONGEST? I don't know enough about the situation to be able to tell. But in my mind the choice would be: if a missing information is semantically equivalent to it being zero, then we don't need the optional. The value 0 is just a no-op. But if a missing information has a different meaning than an information present with a value 0, then the optional is probably necessary because we'll need to do some branching based on whether the value is present or not. >> +#define DW_STRING_IS_STR_INDEX(attr) ((attr)->string_is_str_index) > *** I don't really see the advantage of those macros. > > I agree and I don't like these macros at all. But I wanted to be > consistent. (Maybe I should clear all of them as a separate refactoring cl > sometime) Being consistent with the code around is a good justification, so if you want to keep it like that I am fine with that. >> + struct attribute *attr = dwarf2_attr (comp_unit_die, >> + DW_AT_str_offsets_base, cu); >> + if (attr) > *** Use: if (attr != nullptr) > > Done and sent another patch to replace all "if (attr)" with Thanks, very appreciated. >> static void >> init_cutu_and_read_dies_no_follow (struct dwarf2_per_cu_data *this_cu, >> + struct dwarf2_cu *parent_cu, > *** Please document this new parameter. Since "parent" is passed only to get > the base offsets, I think it would be clearer to pass the offsets directly. > If I look at the signature of the function and see "dwarf2_cu *parent_cu", I > can't really tell what it's for (though a comment might help). But if I see > "ULONGEST str_offsets_base" and "ULONGEST addr_base", it's more self-explanatory. > > I wrote a comment, I hope it's good. I prefer not changing the parameters to > ULONGEST str_offsets_base and ULONGEST addr_base, because I am confused a little > too: there is a case where there is no parent_cu and in that case I believe any > existing value should not be overridden and it is difficult to do that on the > caller site. Ok, I had not noticed that. > *** I'm a bit confused with those different cases, could you please add a bit > of comments to explain the logic for each case? Why is it not possible to > obtain the string value at this point? If cu->str_offsets_base has no value, > doesn't it mean that the offset is 0? > > Maybe it would be more obvious if I saw the actual DWARF being parsed. Could > you suggest some gcc or clang commands to obtain some DWARF where we have a > DW_AT_dwo_name that is an indirect string? > > Reply: > Sure, here's an example. > $ cd /tmp/dw5 && echo "int calculate() { return 4; } int main(int argc, char** argv) { return calculate(); }" >>main.cc > $ clang -gdwarf-5 -gsplit-dwarf main.cc > $ ls > a.out main.cc main.dwo > > $ llvm-dwarfdump --all a.out > a.out: file format ELF64-x86-64 > > .debug_abbrev contents: > Abbrev table for offset: 0x00000000 > [1] DW_TAG_compile_unit DW_CHILDREN_no > DW_AT_stmt_list DW_FORM_sec_offset > DW_AT_str_offsets_base DW_FORM_sec_offset > DW_AT_comp_dir DW_FORM_strx1 > DW_AT_GNU_pubnames DW_FORM_flag_present > DW_AT_GNU_dwo_name DW_FORM_strx1 > DW_AT_low_pc DW_FORM_addrx > DW_AT_high_pc DW_FORM_data4 > DW_AT_addr_base DW_FORM_sec_offset > > .debug_info contents: > 0x00000000: Compile Unit: length = 0x00000024 version = 0x0005 unit_type = DW_UT_skeleton abbr_offset = 0x0000 addr_size = 0x08 DWO_id = 0xee1d4b42a2f0ca0b (next unit at 0x00000028) > > 0x00000014: DW_TAG_compile_unit > DW_AT_stmt_list (0x00000000) > DW_AT_str_offsets_base (0x00000008) > DW_AT_comp_dir ("/tmp/dw5") > DW_AT_GNU_pubnames (true) > DW_AT_GNU_dwo_name ("main.dwo") > DW_AT_low_pc (0x0000000000401110) > DW_AT_high_pc (0x0000000000401141) > DW_AT_addr_base (0x00000008) > .... > .debug_str contents: > 0x00000000: "/tmp/dw5" > 0x00000009: "main.dwo" > .... > .debug_str_offsets contents: > 0x00000000: Contribution size = 12, Format = DWARF32, Version = 5 > 0x00000008: 00000000 "/tmp/dw5" > 0x0000000c: 00000009 "main.dwo" > > Here is what happens. gdb starts to parse DW_TAG_compile_unit DIE. It comes > to DW_AT_GNU_dwo_name. It is of form DW_FORM_strx1 and it has a value of 1. > The actual value is somewhere in .debug_str section. To find it we need to > process .debug_str_offsets (refer to 1st index somewhere within) and also know > the value of DW_AT_str_offsets_base. However, we are in the middle of parsing > the die and there is no guarantee that we have yet processed > DW_AT_str_offsets_base, it may be parsed later. > > My solution is to temporarily write "1" as the value of the attribute > DW_AT_GNU_dwo_name, and mark it as 'needs reprocessing'. After all the > attributes of the die have been processed, DW_AT_str_offsets_base will hold the > correct value if it exists. Then, we revisit the marked attributes. During > reprocess, we don't need to read the binary file again, because we had already > written the value we need (1) in the first pass. We calculate the correct > address using that value, the contents of .debug_str* sections and > DW_AT_str_offsets_base value. > > I am not claiming this is the only or the best solution, but I think it is > intuitive and changes relatively few lines of code. Thanks for the details above. It is a bit late here for me to process all of this, but I'll try to give it a look (and at the revised patch) in the following days. Thanks, Simon