From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id AciqERUpa2K+NQMAWB0awg (envelope-from ) for ; Thu, 28 Apr 2022 19:53:57 -0400 Received: by simark.ca (Postfix, from userid 112) id 3E1BD1E058; Thu, 28 Apr 2022 19:53:57 -0400 (EDT) Authentication-Results: simark.ca; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=mRXg/lQW; dkim-atps=neutral X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on simark.ca X-Spam-Level: X-Spam-Status: No, score=-3.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id 174E31E01D for ; Thu, 28 Apr 2022 19:53:56 -0400 (EDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3AFBC3857372 for ; Thu, 28 Apr 2022 23:53:55 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3AFBC3857372 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1651190035; bh=yy+/qXCTB2KJ0uNG+t2GMG3rew1A4Y3PFQOnPzfBaE0=; h=Date:To:Subject:References:In-Reply-To:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=mRXg/lQWge7iDgH3Q/EEbNlTjspjUTvdKOxtCKL5uxp5Z+II1zW6NKKFeyKJN6MnV obi7TsMUz7kH6VucssK7kVhmC/NVg/LeurEMNhwariNMXS0i3zS+8M4gu2Po/E1lDu 5Oew2YPbVTC9iYJV/HxzIHS/nE2UMnRC7giqNgeo= Received: from lndn.lancelotsix.com (vps-42846194.vps.ovh.net [IPv6:2001:41d0:801:2000::2400]) by sourceware.org (Postfix) with ESMTPS id 592ED3857433 for ; Thu, 28 Apr 2022 23:53:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 592ED3857433 Received: from ubuntu.lan (unknown [IPv6:2a02:390:9086::635]) by lndn.lancelotsix.com (Postfix) with ESMTPSA id C9CD181D8F; Thu, 28 Apr 2022 23:53:32 +0000 (UTC) Date: Thu, 28 Apr 2022 23:53:24 +0000 To: Simon Marchi Subject: Re: [PATCH v3 5/7] gdb: add "id" fields to identify symtabs and subfiles Message-ID: <20220428235314.elsyab5pk5bufivr@ubuntu.lan> References: <20220428033542.1636284-1-simon.marchi@polymtl.ca> <20220428033542.1636284-6-simon.marchi@polymtl.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220428033542.1636284-6-simon.marchi@polymtl.ca> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.5.11 (lndn.lancelotsix.com [0.0.0.0]); Thu, 28 Apr 2022 23:53:32 +0000 (UTC) X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Lancelot SIX via Gdb-patches Reply-To: Lancelot SIX Cc: gdb-patches@sourceware.org Errors-To: gdb-patches-bounces+public-inbox=simark.ca@sourceware.org Sender: "Gdb-patches" Hi, Just some nits below. When applying the patch I have: Applying: gdb: add "id" fields to identify symtabs and subfiles .git/rebase-apply/patch:153: trailing whitespace. It is used to look up existing subfiles in calls to start_subfile. */ warning: 1 line adds whitespace errors. On Wed, Apr 27, 2022 at 11:35:40PM -0400, Simon Marchi via Gdb-patches wrote: > Printing macros defined in the main source file doesn't work reliably > using various toolchains, especially when DWARF 5 is used. For example, > using the binaries produced by either of these commands: > > $ gcc --version > gcc (GCC) 11.2.0 > $ ld --version > GNU ld (GNU Binutils) 2.38 > $ gcc test.c -g3 -gdwarf-5 > > $ clang --version > clang version 13.0.1 > $ clang test.c -gdwarf-5 -fdebug-macro > > I get: > > $ ./gdb -nx -q --data-directory=data-directory a.out > (gdb) start > Temporary breakpoint 1 at 0x111d: file test.c, line 6. > Starting program: /home/simark/build/binutils-gdb-one-target/gdb/a.out > > Temporary breakpoint 1, main () at test.c:6 > 6 return ZERO; > (gdb) p ZERO > No symbol "ZERO" in current context. > > When starting to investigate this (taking the gcc-compiled binary as an > example), we see that GDB fails to look up the appropriate macro scope > when evaluating the expression. While stopped in > macro_lookup_inclusion: > > (top-gdb) p name > $1 = 0x62100011a980 "test.c" > (top-gdb) p source.filename > $2 = 0x62100011a9a0 "/home/simark/build/binutils-gdb-one-target/gdb/test.c" > > `source` is the macro_source_file that we would expect GDB to find. > `name` comes from the symtab::filename field of the symtab we are > stopped in. GDB doesn't find the appropriate macro_source_file because > the name of the macro_source_file doesn't match exactly the name of the > symtab. > > The name of the main symtab comes from the compilation unit's > DW_AT_name, passed to the buildsym_compunit's constructor: > > https://gitlab.com/gnutools/binutils-gdb/-/blob/4815d6125ec580cc02a1094d61b8c9d1cc83c0a1/gdb/dwarf2/read.c#L10627-10630 > > The contents of DW_AT_name, in this case, is "test.c". It is typically > (what I witnessed all compilers do) the same string that was passed to > the compiler on the command-line. > > The name of the macro_source_file comes from the line number program > header's file table, from the call to the line_header::file_file_name > method: > > https://gitlab.com/gnutools/binutils-gdb/-/blob/4815d6125ec580cc02a1094d61b8c9d1cc83c0a1/gdb/dwarf2/macro.c#L54-65 > > line_header::file_file_name prepends the directory path that the file > entry refers to, in the file table (if the file name is not already > absolute). In this case, the file name is "test.c", appended to the > directory "/home/simark/build/binutils-gdb-one-target/gdb". > > Because the symtab's name is not created the same way as the > macro_source_file's name is created, we get this mismatch. GDB fails to > find the appropriate macro scope for the symtab, and we can't print > macros when stopped in that symtab. > > To make this work, we must ensure that paths produced in these two ways > end up identical. This can be tricky because of the different ways a > path can be passed to the compiler. > > Another this to consider is that while the main symtab's name (or s/this/thing/ ? > subfile, before it becomes a symtab) is created using DW_AT_name, the > main symtab is also referred to using its entry in the line table > header's file table, when processing the line table. We must therefore > ensure that the same name is produced in both cases, so that a call to > "start_subfile" for the main subfile will correctly find the > already-created subfile, created by buildsym_compunit's constructor. If > we fail to do that, things still often work, because of a fallback: the > watch_main_source_file_lossage method. This method determines that if > the main subfile has no symbols but there exists another subfile with > the same basename (e.g. "test.c") that does have symbols, it's probably > because there was some filename mismatch. So it replaces the main > subfile with that other subfile. I think that heuristic is useful as a > last effort to work around any bug or bad debug info, but I don't think > we should design things such as to rely on it. It's a heuristic, it can > get things wrong. So in my search for a fix, it is important that given > some good debug info, we don't end up relying on that for things to > work. > > A first attempt at fixing this was to try to prepend the compilation > directory here or not prepend it there. In practice, because of all the > possible combinations of debug info the compilers produce, it was not > possible to get something that would produce reliable, consistent paths. > > Another attempt at fixing this was to make both macro_source_file > objects and symtab objects use the most complete form of path possible. > That means to prepend directories at least until we get an absolute > path. In theory, we should end up with the same path in all cases. > This generally worked, but because it changed the symtab names, it > resulted in user-visible changes (for example, paths to source files in > Breakpoint hit messages becoming always absolute). I didn't find this > very good, first because there is a "set filename-display" setting that > lets the user control how they want the paths to be displayed, and that > would suddenly make this setting completely ineffective (although even > today, it is a bit dependent on the debug info). Second, it would > require a good amount of testsuite tweaks to make tests accept these > suddenly absolute paths. > > This new patch is a slight variation of that: it adds a new field called > "filename_for_id" in struct symtab and struct subfile, next to the > existing filename field. The goal is to separate the internal ids used > for finding objects from the names used for presentation. This field is > used for identifying subfiles, symtabs and macro_source_files > internally. For DWARF symtabs, this new field is meant to contain the > "most complete possible" path, as discussed above. So for a given file, > it must always be in the same form, everywhere. The existing > symtab::filename field remains the one used for printing to the user, so > there shouldn't be any change in how paths are printed. > > Changes in the core symtab files are: > > - Add "name_for_id" and "filename_for_id" fields to "struct subfile" > and "struct symta"b, next to existing "name" and "filename" fields. ^ Two chars are swapped here. > - Make buildsym_compunit::buildsym_compunit and > buildsym_compunit::start_subfile accept a "name_for_id" parameter > next to the existing "name" ones. > - Make buildsym_compunit::start_subfile used "name_for_id" for looking > up existing subfiles. This is the key thing for making calls > to start_subfile for the main source file look up the existing > subfile successfully, and avoid relying on > watch_main_source_file_lossage. > - Make sal_macro_scope pass "filename_for_id", rather than "filename", > to macro_lookup_inclusion. This is the key thing to making the > lookup work and macro printing work. > > Changes in the DWARF files are: > > - Make line_header::file_file_name return the "most complete possible" > name. The only pre-existing user of this method is the macro code, > to give the macro_source_file objects their name. And we now want > them to have this "most complete possible" name, which will match the > corresponding symtab's "filename_for_id". > - Make dwarf2_cu::start_compunit_symtab pass the "most complete > possible" name for the main symtab's "filename_for_id". In this > context, where the info comes from the compilation unit's DW_AT_name > / DW_AT_comp_dir, it means prepending DW_AT_comp_dir to DW_AT_name if > DW_AT_name is not already absolute. > - Change dwarf2_start_subfile to build a name_for_id for the subfile > being started. The simplest way is to re-use > line_header::file_file_name, since the callers always have a > file_entry handy. This ensures that it will get the exact same path > representation as the macro code does, for the same file (since it > also uses line_header::file_file_name). > - Update calls to allocate_symtab to pass the "name_for_id" from the > subfile. > > The rest of the changes are to update the other symtab users (jit, > ctfread, mdebugread, xcoffread). For those, the same value is passed > for the "id" as the for the filename, so they should keep the same > behavior they have today. I guess this is just personal preference, but I would have chosen to add overloads so those files do not have to change / bother about an argument which is meaningless for them. Something like: // gdb/buildsym.h struct buildsym_compunit { buildsym_compunit (struct objfile *objfile_, const char *name, const char *comp_dir_, enum language language_, CORE_ADDR last_addr) : buildsym_compunit (objfile_, name, comp_dir_, name, language_, last_addr); buildsym_compunit (struct objfile *objfile_, const char *name, const char *comp_dir_, const char *name_for_id, language language_, CORE_ADDR last_addr); ... void start_subfile (const char *name, const char *name_for_id); void start_subfile (const char *name) { start_subfile (name, name); } }; // gdb/symfile.h extern struct symtab *allocate_symtab (struct compunit_symtab *cust, const char *filename, const char *id)) ATTRIBUTE_NONNULL (1); static inline struct symtab * allocate_symtab (struct compunit_symtab *cust, const char *filename) { return allocate_sumtab (cust, filename, filename); } In the end it does not make much difference, your approach work also perfectly fine so feel free to keep it the way it is. Otherwise, FWIW I find the approach to keep one string representation of the filename for display and one for some sort of canonicalization perfectly appropriate. > > Tests exercising all this are added by the following patch. > > Of all the cases I tried, the only one I found that ends up relying on > watch_main_source_file_lossage is the following one: > > $ clang --version > clang version 13.0.1 > Target: x86_64-pc-linux-gnu > Thread model: posix > InstalledDir: /usr/bin > $ clang ./test.c -g3 -O0 -gdwarf-4 > $ ./gdb -nx --data-directory=data-directory -q -readnow -iex "set debug symtab-create 1" a.out > ... > [symtab-create] start_subfile: name = test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/test.c > [symtab-create] start_subfile: name = ./test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/./test.c > [symtab-create] start_subfile: name = ./test.c, name_for_id = /home/simark/build/binutils-gdb-one-target/gdb/./test.c > [symtab-create] start_subfile: found existing symtab with name_for_id /home/simark/build/binutils-gdb-one-target/gdb/./test.c (/home/simark/build/binutils-gdb-one-target/gdb/./test.c) > [symtab-create] watch_main_source_file_lossage: using subfile ./test.c as the main subfile > > As we can see, there are two forms used for "test.c", one with a "." and > one without. This comes from the fact that the compilation unit DIE > contains: > > DW_AT_name ("test.c") > DW_AT_comp_dir ("/home/simark/build/binutils-gdb-one-target/gdb") > > without a ".", and the line table for that file contains: > > include_directories[ 1] = "." > file_names[ 1]: > name: "test.c" > dir_index: 1 > > When assembling the filename from that entry, we get a ".". > > It is a bit unexpected that the main filename resulting from the line > table header does not match exactly the name in the compilation unit. > For instance, gcc uses "./test.c" for the DW_AT_name, which gives > identical paths in the compilation unit and in the line table header. > > Similary, with DWARF 5: s/Similary/Similarly/ Best, Lancelot. > > $ clang ./test.c -g3 -O0 -gdwarf-5 > > clang create two entries that refer to the same file but are of in a different > form. > > include_directories[ 0] = "/home/simark/build/binutils-gdb-one-target/gdb" > include_directories[ 1] = "." > file_names[ 0]: > name: "test.c" > dir_index: 0 > file_names[ 1]: > name: "test.c" > dir_index: 1 > > The first file name produces a path without a "." while the second does. > This is not caught by watch_main_source_file_lossage, because of > dwarf_decode_lines that creates a symtab for each file entry in the line > table. It therefore appears as "non-empty" to > watch_main_source_file_lossage. This results in two symtabs: > > (gdb) maintenance info symtabs > { objfile /home/simark/build/binutils-gdb-one-target/gdb/a.out ((struct objfile *) 0x613000005d00) > { ((struct compunit_symtab *) 0x62100011aca0) > debugformat DWARF 5 > producer clang version 13.0.1 > name test.c > dirname /home/simark/build/binutils-gdb-one-target/gdb > blockvector ((struct blockvector *) 0x621000129ec0) > user ((struct compunit_symtab *) (null)) > { symtab test.c ((struct symtab *) 0x62100011ad20) > fullname (null) > linetable ((struct linetable *) 0x0) > } > { symtab ./test.c ((struct symtab *) 0x62100011ad60) > fullname (null) > linetable ((struct linetable *) 0x621000129ef0) > } > } > } > > I am not sure what is the consequence of this, but this is also what > happens before my patch, so I think its acceptable to leave it as-is. > > To handle these two cases nicely, I think we will need a function that > removes the unnecessary "." from path names, something that can be done > later. > > Change-Id: I6fb4f10de040602779da07a0934b7078701a1856