From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 18040 invoked by alias); 22 Oct 2007 16:56:47 -0000 Received: (qmail 18026 invoked by uid 22791); 22 Oct 2007 16:56:44 -0000 X-Spam-Check-By: sourceware.org Received: from NaN.false.org (HELO nan.false.org) (208.75.86.248) by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 22 Oct 2007 16:56:36 +0000 Received: from nan.false.org (localhost [127.0.0.1]) by nan.false.org (Postfix) with ESMTP id 6EF939833F; Mon, 22 Oct 2007 16:56:34 +0000 (GMT) Received: from caradoc.them.org (22.svnf5.xdsl.nauticom.net [209.195.183.55]) by nan.false.org (Postfix) with ESMTP id 13DFB9833D; Mon, 22 Oct 2007 16:56:34 +0000 (GMT) Received: from drow by caradoc.them.org with local (Exim 4.68) (envelope-from ) id 1Ik0Zs-00023P-Sw; Mon, 22 Oct 2007 12:56:32 -0400 Date: Mon, 22 Oct 2007 17:27:00 -0000 From: Daniel Jacobowitz To: gdb-patches@sourceware.org Cc: Eli Zaretskii Subject: [rfc] gdb/931 - Canonicalize type names on input / output Message-ID: <20071022165632.GA5202@caradoc.them.org> Mail-Followup-To: gdb-patches@sourceware.org, Eli Zaretskii MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.15 (2007-04-09) X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2007-10/txt/msg00511.txt.bz2 This patch fixes gdb/931 and most of gdb/1512. The original problem in gdb/931 was that "ptype C" did not work, although "ptype C" did, with no space before the star. There are two KFAILs in templates.exp that become passes with the patch applied. "So just ignore whitespace", you must be thinking. But it still won't work because that there's another important case: "ptype C" with char and const swapped. I've spent a long time thinking about this problem. I mean a really long time; I wrote the necessary parser over the winter of 2003-2004, nearly four years ago! And I still can't think of any useful lighter-weight solution than parsing the name into a tree and outputing it in canonical form. The attached patch implements this idea. The unfortunate part is that this adds a function call to almost every DW_AT_name attribute when loading C++ symbols. I've done everything I could think of to improve performance of that function and it is still a roughly 10% slowdown for large C++ applications, measured with a set of shared libraries from KOffice and Firefox. There is no measurable impact for files written in other languages. All other ad-hoc solutions that I looked at are faster but less reliable, leading to corner cases where it becomes difficult or impossible to look up some symbol that I hadn't accounted for. I have various other ideas for speeding up GDB startup. Most of them are independent of this patch, which means it's sort of cheating; I may get ten percent back, but it won't be this ten percent. There's another benefit to consider though: right now we rely on the demangler and DW_AT_MIPS_linkage_name to get a lot of things right. I want to support compilers which do not generate DW_AT_MIPS_linkage_name, which will: - let us support some non-GNU compilers - avoid the demangler during symbol reading - eventually let GCC stop generating DW_AT_MIPS_linkage_name, producing smaller binaries - Fix the namespace.exp failures, a recent Fortran PR, and hopefully some Ada failures I've been seeing lately. So I believe that this is a worthwhile tradeoff. I would like to add a NEWS entry for this. Eli, is this a reasonable one? * GDB now parses C++ symbol and type names more flexibly. And we have not been keeping the PROBLEMS file up to date much lately, but this will remove the gdb/931 entry from that file and I'll be back to remove the gdb/1512 entry too. Most of PROBLEMS seems to be obsolete now - great! Tested on x86_64-pc-linux-gnu with DWARF-2, and on i686-pc-linux-gnu with stabs for completeness. -- Daniel Jacobowitz CodeSourcery 2007-10-22 Daniel Jacobowitz PR gdb/931 * Makefile.in (dbxread.o): Update. * dbxread.c (read_dbx_symtab): Use cp_canonicalize_string. * dwarf2read.c (GDB_FORM_cached_string): New. (read_partial_die): Use dwarf2_canonicalize_name. (dwarf2_linkage_name): Use dwarf2_name. (dwarf2_canonicalize_name): New. (dwarf2_name): Use dwarf2_canonicalize_name. (dwarf_form_name, dump_die): Handle GDB_FORM_cached_string. * stabsread.c (define_symbol, read_type): Use cp_canonicalize_string. * symtab.c (lookup_symbol_in_language): Canonicalize input before searching. 2007-10-22 Daniel Jacobowitz PR gdb/931 * gdb.cp/gdb1355.exp (f_li, f_lui, f_si, f_sui): Allow canonical output. * gdb.cp/templates.exp: Allow canonical output. Remove KFAILs for gdb/931. Index: Makefile.in =================================================================== RCS file: /cvs/src/src/gdb/Makefile.in,v retrieving revision 1.946 diff -u -p -r1.946 Makefile.in --- Makefile.in 22 Oct 2007 14:37:36 -0000 1.946 +++ Makefile.in 22 Oct 2007 16:19:15 -0000 @@ -1955,6 +1955,7 @@ dbxread.o: dbxread.c $(defs_h) $(gdb_str $(gdb_stat_h) $(symtab_h) $(breakpoint_h) $(target_h) $(gdbcore_h) \ $(libaout_h) $(objfiles_h) $(buildsym_h) $(stabsread_h) \ $(gdb_stabs_h) $(demangle_h) $(complaints_h) $(cp_abi_h) \ + $(cp_support_h) \ $(gdb_assert_h) $(gdb_string_h) $(aout_aout64_h) $(aout_stab_gnu_h) dcache.o: dcache.c $(defs_h) $(dcache_h) $(gdbcmd_h) $(gdb_string_h) \ $(gdbcore_h) $(target_h) Index: dbxread.c =================================================================== RCS file: /cvs/src/src/gdb/dbxread.c,v retrieving revision 1.90 diff -u -p -r1.90 dbxread.c --- dbxread.c 19 Oct 2007 12:26:31 -0000 1.90 +++ dbxread.c 22 Oct 2007 16:19:15 -0000 @@ -54,6 +54,7 @@ #include "demangle.h" #include "complaints.h" #include "cp-abi.h" +#include "cp-support.h" #include "gdb_assert.h" #include "gdb_string.h" @@ -1178,6 +1179,8 @@ read_dbx_symtab (struct objfile *objfile struct internal_nlist nlist; CORE_ADDR text_addr; int text_size; + char *sym_name; + int sym_len; char *namestring; int nsl; @@ -1663,7 +1666,27 @@ read_dbx_symtab (struct objfile *objfile if (!p) continue; /* Not a debugging symbol. */ + sym_len = 0; + if (psymtab_language == language_cplus) + { + char *new_name, *name = alloca (p - namestring + 1); + memcpy (name, namestring, p - namestring); + name[p - namestring] = '\0'; + new_name = cp_canonicalize_string (name); + if (new_name != NULL) + { + sym_len = strlen (new_name); + sym_name = obsavestring (new_name, sym_len, + &objfile->objfile_obstack); + xfree (new_name); + } + } + if (sym_len == 0) + { + sym_name = namestring; + sym_len = p - namestring; + } /* Main processing section for debugging symbols which the initial read through the symbol tables needs to worry @@ -1681,7 +1704,7 @@ read_dbx_symtab (struct objfile *objfile namestring = gdbarch_static_transform_name (current_gdbarch, namestring); - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_STATIC, &objfile->static_psymbols, 0, nlist.n_value, @@ -1691,7 +1714,7 @@ read_dbx_symtab (struct objfile *objfile nlist.n_value += ANOFFSET (objfile->section_offsets, data_sect_index); /* The addresses in these entries are reported to be wrong. See the code that reads 'G's for symtabs. */ - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_STATIC, &objfile->global_psymbols, 0, nlist.n_value, @@ -1709,7 +1732,7 @@ read_dbx_symtab (struct objfile *objfile || (p == namestring + 1 && namestring[0] != ' ')) { - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, STRUCT_DOMAIN, LOC_TYPEDEF, &objfile->static_psymbols, nlist.n_value, 0, @@ -1717,7 +1740,7 @@ read_dbx_symtab (struct objfile *objfile if (p[2] == 't') { /* Also a typedef with the same name. */ - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_TYPEDEF, &objfile->static_psymbols, nlist.n_value, 0, @@ -1729,7 +1752,7 @@ read_dbx_symtab (struct objfile *objfile case 't': if (p != namestring) /* a name is there, not just :T... */ { - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_TYPEDEF, &objfile->static_psymbols, nlist.n_value, 0, @@ -1808,7 +1831,7 @@ read_dbx_symtab (struct objfile *objfile continue; case 'c': /* Constant, e.g. from "const" in Pascal. */ - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_CONST, &objfile->static_psymbols, nlist.n_value, 0, psymtab_language, objfile); @@ -1872,7 +1895,7 @@ read_dbx_symtab (struct objfile *objfile pst->textlow = nlist.n_value; textlow_not_set = 0; } - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_BLOCK, &objfile->static_psymbols, 0, nlist.n_value, @@ -1940,7 +1963,7 @@ read_dbx_symtab (struct objfile *objfile pst->textlow = nlist.n_value; textlow_not_set = 0; } - add_psymbol_to_list (namestring, p - namestring, + add_psymbol_to_list (sym_name, sym_len, VAR_DOMAIN, LOC_BLOCK, &objfile->global_psymbols, 0, nlist.n_value, Index: dwarf2read.c =================================================================== RCS file: /cvs/src/src/gdb/dwarf2read.c,v retrieving revision 1.234 diff -u -p -r1.234 dwarf2read.c --- dwarf2read.c 22 Oct 2007 03:01:00 -0000 1.234 +++ dwarf2read.c 22 Oct 2007 16:19:16 -0000 @@ -541,6 +541,15 @@ struct die_info struct type *type; /* Cached type information */ }; +/* Additional GDB-specific attribute forms. */ +enum + { + /* A string which has been updated to GDB's internal + representation (e.g. converted to canonical form) and does not + need to be updated again. */ + GDB_FORM_cached_string = 0xff + }; + /* Attributes have a name and a value */ struct attribute { @@ -978,6 +987,9 @@ static void process_die (struct die_info static char *dwarf2_linkage_name (struct die_info *, struct dwarf2_cu *); +static char *dwarf2_canonicalize_name (char *, struct dwarf2_cu *, + struct obstack *); + static char *dwarf2_name (struct die_info *die, struct dwarf2_cu *); static struct die_info *dwarf2_extension (struct die_info *die, @@ -5546,10 +5558,23 @@ read_partial_die (struct partial_die_inf switch (attr.name) { case DW_AT_name: - - /* Prefer DW_AT_MIPS_linkage_name over DW_AT_name. */ - if (part_die->name == NULL) - part_die->name = DW_STRING (&attr); + switch (part_die->tag) + { + case DW_TAG_compile_unit: + /* Compilation units have a DW_AT_name that is a filename, not + a source language identifier. */ + case DW_TAG_enumeration_type: + case DW_TAG_enumerator: + /* These tags always have simple identifiers already; no need + to canonicalize them. */ + part_die->name = DW_STRING (&attr); + break; + default: + part_die->name + = dwarf2_canonicalize_name (DW_STRING (&attr), cu, + &cu->comp_unit_obstack); + break; + } break; case DW_AT_comp_dir: if (part_die->dirname == NULL) @@ -7844,10 +7869,29 @@ dwarf2_linkage_name (struct die_info *di attr = dwarf2_attr (die, DW_AT_MIPS_linkage_name, cu); if (attr && DW_STRING (attr)) return DW_STRING (attr); - attr = dwarf2_attr (die, DW_AT_name, cu); - if (attr && DW_STRING (attr)) - return DW_STRING (attr); - return NULL; + return dwarf2_name (die, cu); +} + +/* Get name of a die, return NULL if not found. */ + +static char * +dwarf2_canonicalize_name (char *name, struct dwarf2_cu *cu, + struct obstack *obstack) +{ + if (cu->language == language_cplus) + { + char *canon_name = cp_canonicalize_string (name); + + if (canon_name != NULL) + { + if (strcmp (canon_name, name) != 0) + name = obsavestring (canon_name, strlen (canon_name), + obstack); + xfree (canon_name); + } + } + + return name; } /* Get name of a die, return NULL if not found. */ @@ -7858,9 +7902,29 @@ dwarf2_name (struct die_info *die, struc struct attribute *attr; attr = dwarf2_attr (die, DW_AT_name, cu); - if (attr && DW_STRING (attr)) - return DW_STRING (attr); - return NULL; + if (!attr || !DW_STRING (attr)) + return NULL; + + switch (die->tag) + { + case DW_TAG_compile_unit: + /* Compilation units have a DW_AT_name that is a filename, not + a source language identifier. */ + case DW_TAG_enumeration_type: + case DW_TAG_enumerator: + /* These tags always have simple identifiers already; no need + to canonicalize them. */ + return DW_STRING (attr); + default: + if (attr->form != GDB_FORM_cached_string) + { + DW_STRING (attr) + = dwarf2_canonicalize_name (DW_STRING (attr), cu, + &cu->objfile->objfile_obstack); + attr->form = GDB_FORM_cached_string; + } + return DW_STRING (attr); + } } /* Return the die that this die in an extension of, or NULL if there @@ -8352,6 +8416,8 @@ dwarf_form_name (unsigned form) return "DW_FORM_ref_udata"; case DW_FORM_indirect: return "DW_FORM_indirect"; + case GDB_FORM_cached_string: + return "GDB_FORM_cached_string"; default: return "DW_FORM_"; } @@ -8883,6 +8949,7 @@ dump_die (struct die_info *die) break; case DW_FORM_string: case DW_FORM_strp: + case GDB_FORM_cached_string: fprintf_unfiltered (gdb_stderr, "string: \"%s\"", DW_STRING (&die->attrs[i]) ? DW_STRING (&die->attrs[i]) : ""); Index: stabsread.c =================================================================== RCS file: /cvs/src/src/gdb/stabsread.c,v retrieving revision 1.102 diff -u -p -r1.102 stabsread.c --- stabsread.c 19 Oct 2007 12:23:20 -0000 1.102 +++ stabsread.c 22 Oct 2007 16:19:16 -0000 @@ -587,6 +587,7 @@ define_symbol (CORE_ADDR valu, char *str int deftype; int synonym = 0; int i; + char *new_name = NULL; /* We would like to eliminate nameless symbols, but keep their types. E.g. stab entry ":t10=*2" should produce a type 10, which is a pointer @@ -680,7 +681,20 @@ define_symbol (CORE_ADDR valu, char *str { normal: SYMBOL_LANGUAGE (sym) = current_subfile->language; - SYMBOL_SET_NAMES (sym, string, p - string, objfile); + if (current_subfile->language == language_cplus) + { + char *name = alloca (p - string + 1); + memcpy (name, string, p - string); + name[p - string] = '\0'; + new_name = cp_canonicalize_string (name); + } + if (new_name != NULL) + { + SYMBOL_SET_NAMES (sym, new_name, strlen (new_name), objfile); + xfree (new_name); + } + else + SYMBOL_SET_NAMES (sym, string, p - string, objfile); } p++; @@ -1546,18 +1560,35 @@ again: if (*p != ':') return error_type (pp, objfile); } - to = type_name = - (char *) obstack_alloc (&objfile->objfile_obstack, p - *pp + 1); + type_name = NULL; + if (current_subfile->language == language_cplus) + { + char *new_name, *name = alloca (p - *pp + 1); + memcpy (name, *pp, p - *pp); + name[p - *pp] = '\0'; + new_name = cp_canonicalize_string (name); + if (new_name != NULL) + { + type_name = obsavestring (new_name, strlen (new_name), + &objfile->objfile_obstack); + xfree (new_name); + } + } + if (type_name == NULL) + { + to = type_name = + (char *) obstack_alloc (&objfile->objfile_obstack, p - *pp + 1); - /* Copy the name. */ - from = *pp + 1; - while (from < p) - *to++ = *from++; - *to = '\0'; + /* Copy the name. */ + from = *pp + 1; + while (from < p) + *to++ = *from++; + *to = '\0'; + } /* Set the pointer ahead of the name which we just read, and the colon. */ - *pp = from + 1; + *pp = p + 1; } /* If this type has already been declared, then reuse the same Index: symtab.c =================================================================== RCS file: /cvs/src/src/gdb/symtab.c,v retrieving revision 1.166 diff -u -p -r1.166 symtab.c --- symtab.c 9 Oct 2007 06:59:27 -0000 1.166 +++ symtab.c 22 Oct 2007 16:19:17 -0000 @@ -41,6 +41,7 @@ #include "objc-lang.h" #include "ada-lang.h" #include "p-lang.h" +#include "cp-support.h" #include "hashtab.h" @@ -1093,6 +1094,17 @@ lookup_symbol_in_language (const char *n modified_name = demangled_name; needtofreename = 1; } + else + { + /* If we were given a non-mangled name, canonicalize it + according to the language (so far only for C++). */ + demangled_name = cp_canonicalize_string (name); + if (demangled_name) + { + modified_name = demangled_name; + needtofreename = 1; + } + } } else if (lang == language_java) { Index: testsuite/gdb.cp/gdb1355.exp =================================================================== RCS file: /cvs/src/src/gdb/testsuite/gdb.cp/gdb1355.exp,v retrieving revision 1.4 diff -u -p -r1.4 gdb1355.exp --- testsuite/gdb.cp/gdb1355.exp 23 Aug 2007 18:14:17 -0000 1.4 +++ testsuite/gdb.cp/gdb1355.exp 22 Oct 2007 16:19:17 -0000 @@ -68,11 +68,11 @@ set s_tail ".*" set f_i "${ws}int m_int;" set f_c "${ws}char m_char;" -set f_li "${ws}long int m_long_int;" +set f_li "${ws}long( int)? m_long_int;" set f_ui "${ws}unsigned int m_unsigned_int;" -set f_lui "${ws}long unsigned int m_long_unsigned_int;" -set f_si "${ws}short int m_short_int;" -set f_sui "${ws}short unsigned int m_short_unsigned_int;" +set f_lui "${ws}(long unsigned int|unsigned long) m_long_unsigned_int;" +set f_si "${ws}short( int)? m_short_int;" +set f_sui "${ws}(short unsigned int|unsigned short) m_short_unsigned_int;" set f_uc "${ws}unsigned char m_unsigned_char;" set f_f "${ws}float m_float;" set f_d "${ws}double m_double;" Index: testsuite/gdb.cp/templates.exp =================================================================== RCS file: /cvs/src/src/gdb/testsuite/gdb.cp/templates.exp,v retrieving revision 1.15 diff -u -p -r1.15 templates.exp --- testsuite/gdb.cp/templates.exp 23 Aug 2007 18:14:17 -0000 1.15 +++ testsuite/gdb.cp/templates.exp 22 Oct 2007 16:19:17 -0000 @@ -325,13 +325,11 @@ gdb_expect { send_gdb "print Foo::foo\n" gdb_expect { - -re "\\$\[0-9\]* = \\{.*char \\*\\((class |)Foo \\*(| const), int, .*char \\*\\)\\} $hex ::foo\\(int, .*char.*\\*\\)>\r\n$gdb_prompt $" { pass "print Foo::foo" } + -re "\\$\[0-9\]* = \\{.*char \\*\\((class |)Foo<(volatile char|char volatile) ?\\*> \\*(| const), int, .*char \\*\\)\\} $hex ::foo\\(int, .*char.*\\*\\)>\r\n$gdb_prompt $" { pass "print Foo::foo" } -re "No symbol \"Foo\" in current context.\r\n$gdb_prompt $" { - # This used to be a kfail gdb/33. That problem has been - # fixed, but now gdb/931 and gdb/1512 are rearing their ugly - # heads. - kfail "gdb/931" "print Foo::foo" + # This used to be a kfail gdb/33 and then kfail gdb/931. + fail "print Foo::foo" } -re "$gdb_prompt $" { fail "print Foo::foo" } timeout { fail "(timeout) print Foo::foo" } @@ -339,13 +337,11 @@ gdb_expect { send_gdb "print Foo::foo\n" gdb_expect { - -re "\\$\[0-9\]* = \\{.*char \\*\\((class |)Foo \\*(| const), int, .*char \\*\\)\\} $hex ::foo\\(int, .*char.*\\*\\)>\r\n$gdb_prompt $" { pass "print Foo::foo" } + -re "\\$\[0-9\]* = \\{.*char \\*\\((class |)Foo<(volatile char|char volatile) ?\\*> \\*(| const), int, .*char \\*\\)\\} $hex ::foo\\(int, .*char.*\\*\\)>\r\n$gdb_prompt $" { pass "print Foo::foo" } -re "No symbol \"Foo\" in current context.\r\n$gdb_prompt $" { - # This used to be a kfail gdb/33. That problem has been - # fixed, but now gdb/931 and gdb/1512 are rearing their ugly - # heads. - kfail "gdb/931" "print Foo::foo" + # This used to be a kfail gdb/33 and then kfail gdb/931. + fail "print Foo::foo" } -re "$gdb_prompt $" { fail "print Foo::foo" } timeout { fail "(timeout) print Foo::foo" } @@ -455,7 +451,7 @@ send_gdb "ptype quxint\n" gdb_expect { -re "type = class Qux \\{\r\n\[ \t\]*public:\r\n\[ \t\]*int x;\r\n\[ \t\]*int t;\r\n\r\n\[ \t\]*.*int qux\\(int, int\\);\r\n\\}\r\n$gdb_prompt $" { pass "ptype quxint" } -re "type = class Qux \\{\r\n\[ \t\]*public:\r\n\[ \t\]*int x;\r\n\[ \t\]*int t;\r\n\r\n\[ \t\]*int qux\\(int, int\\);.*\r\n\\}\r\n$gdb_prompt $" { pass "ptype quxint" } - -re "type = class Qux \\{\r\n\[ \t\]*public:\r\n\[ \t\]*int x;\r\n\[ \t\]*int t;\r\n\r\n\[ \t\]*int qux\\(int, int\\);.*\r\n\\}\r\n$gdb_prompt $" { pass "ptype quxint" } + -re "type = class Qux \\{\r\n\[ \t\]*public:\r\n\[ \t\]*int x;\r\n\[ \t\]*int t;\r\n\r\n\[ \t\]*int qux\\(int, int\\);.*\r\n\\}\r\n$gdb_prompt $" { pass "ptype quxint" } -re "type = class Qux \\{\r\n\[ \t\]*public:\r\n\[ \t\]*int x;\r\n\[ \t\]*int t;\r\n\r\n\[ \t\]*int qux\\(int, int\\);.*\r\n\\}\r\n$gdb_prompt $" { kfail "gdb/1512" "ptype quxint" }