From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 1336 invoked by alias); 23 Feb 2012 03:08:00 -0000 Received: (qmail 1322 invoked by uid 22791); 23 Feb 2012 03:07:58 -0000 X-SWARE-Spam-Status: No, hits=-0.1 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,KAM_STOCKTIP,RCVD_IN_DNSWL_LOW,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mail-vw0-f41.google.com (HELO mail-vw0-f41.google.com) (209.85.212.41) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 23 Feb 2012 03:07:35 +0000 Received: by vbip1 with SMTP id p1so691460vbi.0 for ; Wed, 22 Feb 2012 19:07:34 -0800 (PST) Received-SPF: pass (google.com: domain of dje@google.com designates 10.52.65.100 as permitted sender) client-ip=10.52.65.100; Authentication-Results: mr.google.com; spf=pass (google.com: domain of dje@google.com designates 10.52.65.100 as permitted sender) smtp.mail=dje@google.com; dkim=pass header.i=dje@google.com Received: from mr.google.com ([10.52.65.100]) by 10.52.65.100 with SMTP id w4mr19371vds.3.1329966454988 (num_hops = 1); Wed, 22 Feb 2012 19:07:34 -0800 (PST) Received: by 10.52.65.100 with SMTP id w4mr15056vds.3.1329966454927; Wed, 22 Feb 2012 19:07:34 -0800 (PST) MIME-Version: 1.0 Received: by 10.52.65.100 with SMTP id w4mr15047vds.3.1329966454687; Wed, 22 Feb 2012 19:07:34 -0800 (PST) Received: by 10.220.21.4 with HTTP; Wed, 22 Feb 2012 19:07:34 -0800 (PST) In-Reply-To: <4F454188.7000904@eagercon.com> References: <4F4439B2.70101@eagercon.com> <4F454188.7000904@eagercon.com> Date: Thu, 23 Feb 2012 03:08:00 -0000 Message-ID: Subject: Re: Regression handling linkage vs source name From: Doug Evans To: Michael Eager Cc: gdb@sourceware.org, Jan Kratochvil Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-System-Of-Record: true X-Gm-Message-State: ALoCoQmJOuRYZPGZbvFIFsXd7qvCLbYRhMYlajZ7T8PukCuowVEk5eRZR1DhDUDalfLnNuf/SH7SC8mwoFiOxm9xM5iBMIzCVceCBnKNETydbk4YIxgI8FVjn1jFkXY0gkwWU7omo1iZMQFtd5hcvH6a/TSTr+hHoA== X-IsSubscribed: yes Mailing-List: contact gdb-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-owner@sourceware.org X-SW-Source: 2012-02/txt/msg00065.txt.bz2 On Wed, Feb 22, 2012 at 11:27 AM, Michael Eager wrote: > On 02/22/2012 08:40 AM, Doug Evans wrote: >> >> On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager =A0wr= ote: >>> >>> There were changes to dwarf2_physname() around July 1, 2011, which >>> cause it to return a different value. =A0Arguably, the changes make >>> the function work better, returning the linkage name for a symbol where >>> it previously returned the source name. >> >> >> Tangential data point: Outside of dwarf2read.c, gdb generally uses >> "physname" to mean linkage name. > > > It doesn't appear to be the case for the value returned by > dwarf2_compute_name(). =A0I haven't looked at other places. I'm not sure what your point is here, so I'm going to set it aside for now. [My comment (tangential data point) still stands though.] > >> I'm not sure dwarf2_physname ever returned the linkage name (for c/c++). >> I know there's been some changes in how it computes the source name. > > > Before the patch, it didn't for C. =A0I didn't look at C++. Ah. Indeed. Fun fun fun. >>> But since the source name >>> is overwritten by the linkage name in the symbol entry, gdb's >>> behavior is different, and, IMO, incorrect. >> >> >> Can you elaborate on what you mean by overwritten? > > > In new_symbol_full(): > =A0 =A0 =A0/* Cache this symbol's name and the name's demangled form (if = any). =A0*/ > =A0 =A0 =A0SYMBOL_SET_LANGUAGE (sym, cu->language); > =A0 =A0 =A0linkagename =3D dwarf2_physname (name, die, cu); > =A0 =A0 =A0SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, o= bjfile); > > Clearly the comment is incorrect or misleading: =A0only one name is store= d in > the symbol. Well, yes and no. Only one name is passed to SYMBOL_SET_NAMES, but it does at least try to set both the mangled(/linkage) and demangled(/source-with-caveats) names. > > Before the patch, dwarf2_physname() would call dwarf2_compute_name() which > would return 'name' unmodified. > > After the patch, dwarf2_physname() returns the linkage name or a mangled > name. Right. This was done to address potential shortcomings in c++-land. IIUC, We totally missed gcc's asm(name) feature. Blech. > (I initially misread the code to save the source name in the symbol, > then overwrite it with the linkage name.) > > >> [It's not what I see.] >> >>> Here is the test case: >>> >>> $ cat t.c >>> extern int foo(void) asm ("xfoo"); >>> >>> int foo(void) {} >>> >>> int main (void) >>> { >>> =A0foo (); >>> } >>> >>> $ gdb-before a.out >>> ... >>> (gdb) b foo >>> Breakpoint 1 at ... >>> >>> $ gdb-after a.out >>> ... >>> (gdb) b foo >>> Make breakpoint pending on future shared library load? (y or [n])? n >>> (gdb) b xfoo >>> Breakpoint 1 at ... >>> >>> >>> The symbol "foo" is no longer present in gdb symbol table >>> so trying to use the name specified in the source fails. >>> Before the change, breakpoint, backtrace, and other commands >>> display the symbol name as in the source (and in the DWARF DIE). >>> After the change, the linkage name is displayed instead of the >>> source name. >> >> >> Even pre-dwarf2_physname gdb's have this problem. >> Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0, >> 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and >> didn't want to go to trouble of building them). > > > This works correctly in 7.2. =A0Maybe this was an "accident". =A0:-) I'm not sure, though I think it was at least partly accidental. > >> The problem (or at least one of the problems), as I see it, is that >> the API for creating symbols is broken. > > > I agree. > > >> gdb does (or at least can) record both the linkage and source names >> (it does this for "minsyms", e.g. ELF symbols). =A0But the way to set a >> symbol's name is via symbol_set_names and it only takes a linkage name >> (though the dwarf2read.c further compounds the problem by passing it >> the source name - or more accurately the "search" name). >> symbol_set_names then tries to demangle the name and will record that >> name as well if the demangling succeeds. > > > Well, the minisym only stores a single name, not both linkage and source > names. =A0Despite what the comments preceding symbol_set_names() says, > there's only one name saved. Sorry, this is another c-vs-c++ mixup. Minsyms don't come from the debug info so there's no way to obtain the source name for C. However, for c++ the linkage name can be demangled and that name is indeed stored for minsyms (in addition to the linkage name). [Setting aside the case where the program uses asm(name).] > The code in symbol_set_names() seems to be doing way more than it needs > to. =A0If both the source and linkage names are available, then there > is no need to mangle, demangle, add a prefix or do anything but save > the name. I think part of the intent here is to have the demangled name available even if there is no debug info (e.g. for c++). [As for how one goes about achieving that, I wouldn't disagree with the claim that there is room for improvement.] > > (symbol_set_demangled_name() does save a demangled name for C++.) > > >>> Before the change, dwarf2_physname() calls dwarf2_compute_name() >>> which returns the symbol name if the language is not C++, Java, >>> or Fortran, even if the DIE has a DW_AT_linkage_name attribute. >>> After the change, dwarf2_physname() returns the DW_AT_linkage_name. >>> >>> Since gdb doesn't keep both the source name and the linkage >>> name in the symbol table (a design flaw, IMO) what is the >>> right way to get the previous behavior, where gdb recognizes >>> the symbol names from the source file? >> >> >> We need to have dwarf2read.c store both names (linkage name and dwarf >> name). [More changes may be required beyond that, but I think that's a >> start.] =A0Your test program shows that we can't assume we can simply >> demangle the linkage name to get the source name. > > > Yes, this was my conclusion as well. > > When debug info is not available, mangling a source name or demangling > a linkage name may be necessary. =A0I don't know that this should be done > when reading the DWARF info. =A0When the debug info contains both names, > or the linkage name can be inferred from the DIE, doing anything > other than saving their values should not be necessary. In general I agree. I think part of the concern here is if there is a problem in what gcc (or whatever) provides, and how to best cope. > Understanding what the symbol routines are trying to do with all of > the mangling and demangling is a challenge. You're preaching to the choir here ... :-) > One thing which is not > clear is whether there is agreement whether the name stored for a symbol > is supposed to be the source name or the linkagename. This is part of what I was referring to in my tangential data point. dwarf2read.c stores the demangled name in symbol.ginfo.name which is defined to contain the mangled name (if it wasn't, why does symbol.ginfo.language_specific.cplus_specific.demangled_name exist :-)). > symbol_set_names() > and symbol_set_demangled_name() suggest that symtab.c thinks it is the > linkage name. =A0Commands which look up a symbol assume that it is the > source name. re: commands: minsym lookup will make two passes first trying the passed in name as the linkage name, and then trying it as the source(/search) name, whereas fullsym lookup will just look at one (SYMBOL_SEARCH_NAME). > Should there be two symbol entries, one for "foo" and one for "xfoo"? > A single entry which is searched for a match on either name or linkage > name? =A0Something else? I think we do need to store both. Otherwise lookup is going to be too pain= ful. Beyond that, the devil is in the details. Note for reference sake: What kind of demangled name to store is a bit more complex than simply storing the fully demangled name. gdb stores the "search" name which is not necessarily equivalent to the full source name (e.g. for c++ templates that encode the result type in the linkage name).