From: Jim Blandy
To: Ton van Overbeek, Daniel Jacobowitz
Cc: gdb-patches@sources.redhat.com
Subject: Re: [PATCH] Fix coff symbol table reading problem for C code compiled by g++
Date: Thu, 16 Sep 2004 00:23:00 -0000

Ton van Overbeek writes:

> I found a bug/problem in the symbol reading code in symtab.c in gdb-6.2.
>
> The problem occurs when reading symbols from a coff object file produced
> from a C source file compiled by g++ (not by gcc).
> The particular compiler is m68k-palmos-gcc, which is still based on gcc-2.95.3.
> See http://prc-tools.sourceforge.net.
> I know this gcc version is very old, but I believe the problem may also exist
> for other compilers.
>
> When compiling normal C code by g++ it produces mangled function names.
> The coff symbol reader first reads the function name and inserts this in
> the minimal symbol table and in a demangled name hash table in
> symbol_set_names().
> When reading the '.bf' symbol the function name is inserted
> in the real symbol table. This time the symbol is already in the hash table
> in symbol_set_names() and symbol_find_demangled_name() is not called. A side
> effect of symbol_find_demangled_name() is that it changes/corrects the
> gsymbol->language field. In the case of 'C compiled by g++' it changes
> it from language_auto to language_cplus.
> Because symbol_find_demangled_name() is not called, the gsymbol->language field
> in the full symbol table stays set to language_auto.
> This causes all kinds of problems when looking up symbols later, since the
> stored name is the mangled name and the demangled name is empty: the symbol is
> not found in the full symbol table and the code falls back on the minimal
> symbol table for, e.g., function names.
> When trying to set a breakpoint on a function, the breakpoint is then set
> on the last line of the preceding function.
>
> I have applied the following fix to ensure that symbol_find_demangled_name()
> is also called in this case. It is working for me. I do not know
> if something else is needed for other languages/compilers/compiler
> versions.

So, let me make sure I understand this correctly: the essential problem
is that symbol_set_names sometimes has the side effect of setting
GSYMBOL->language, and sometimes it doesn't.  Whether it does depends
on whether that particular mangled name has been seen before in this
objfile, which shouldn't matter.

Here's the thread about introducing demangled_names_hash:

http://sources.redhat.com/ml/gdb-patches/2003-01/msg00726.html

The main motivation for introducing it was to be able to include
mangled names in the partial symbol tables; we were also hoping to
save time by avoiding calls to the demangler.  As it turns out, the
time saved by not calling the demangler was used up (to within 1%) by
the overhead of the patch, so there was no net performance win.
(Assuming you weren't paging...)
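To make the hit-versus-miss dependence concrete, here is a toy model in C.  It is a hypothetical sketch, not GDB code: toy_symbol, toy_set_names and toy_find_demangled_name are invented stand-ins for the general symbol info, symbol_set_names and symbol_find_demangled_name; a small linear array replaces the objfile's demangled_names_hash; and the "demangler" simply treats a "_Z" prefix as C++.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for GDB's types; these are
   NOT the real declarations from GDB's headers.  */
enum language { language_auto, language_c, language_cplus };

struct toy_symbol
{
  const char *linkage_name;
  const char *demangled_name;
  enum language language;
};

/* Toy demangler standing in for symbol_find_demangled_name: it treats
   any name starting with "_Z" as C++ and "demangles" it by dropping the
   prefix.  Like the real function, it sets the language as a side
   effect when it recognizes the name.  */
static const char *
toy_find_demangled_name (struct toy_symbol *sym, const char *mangled)
{
  if (strncmp (mangled, "_Z", 2) == 0)
    {
      sym->language = language_cplus;
      return mangled + 2;
    }
  return NULL;
}

/* Toy cache standing in for demangled_names_hash: a small array of
   (mangled, demangled) pairs.  It remembers the demangled name but not
   the language that produced it.  */
#define CACHE_SIZE 16
static struct
{
  const char *mangled;
  const char *demangled;
} cache[CACHE_SIZE];
static int cache_used;
static int demangler_calls;

/* Toy version of symbol_set_names.  */
static void
toy_set_names (struct toy_symbol *sym, const char *name)
{
  int i;

  sym->linkage_name = name;
  for (i = 0; i < cache_used; i++)
    if (strcmp (cache[i].mangled, name) == 0)
      {
        /* Hit: the demangled name is reused, but nothing sets
           sym->language, so it keeps whatever value it had.  */
        sym->demangled_name = cache[i].demangled;
        return;
      }

  /* Miss: the demangler runs and, as a side effect, sets the language.  */
  demangler_calls++;
  sym->demangled_name = toy_find_demangled_name (sym, name);
  assert (cache_used < CACHE_SIZE);
  cache[cache_used].mangled = name;
  cache[cache_used].demangled = sym->demangled_name;
  cache_used++;
}
```

Calling toy_set_names twice with the same name, once for the minimal symbol and once for the full symbol read at '.bf', gives both the same demangled name, but only the first ends up with language_cplus; the second keeps language_auto.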
The problem with your patch is that it brings back all the calls to
the demangler that the hash table allowed us to avoid: the demangler
gets called every time, whether or not we've demangled the symbol
before.

I think the fundamental problem is that the hash table retains only
partial information about the results of symbol_find_demangled_name:
it keeps the demangled name, but not the language whose demangler
produced it.  If we could retain that information, then
symbol_set_names could consistently provide the language.

I see two approaches.  Based on the discussion in the thread, space is
at a premium, so we're only considering things that won't
significantly increase memory usage.  Specific numbers are from the
test case discussed in the thread.

- Store the language in one more byte beyond the demangled name.  This
  makes the form of the hash table entries even less obvious, and it
  would add 200k of memory consumption.  On the other hand, depending
  on the granularity of obstack_alloc, many of those bytes might fall
  into the padding at the end of each entry.  The hair could be
  localized to symbol_set_names, though.

- Have a separate hash table for each language.  In 'struct objfile',
  we'd have:

      struct htab *demangled_names_hashes[nr_languages];

  They'd be allocated lazily.  One would need to probe all the hash
  tables before deciding that a symbol hadn't been seen yet (or only
  the hash tables that had actually been allocated, typically just one
  unless you're mixing languages).  Then the index of the hash table
  you found the name in would tell you the language.  This would
  entail a lot of changes elsewhere to properly initialize and free
  demangled_names_hashes.

Daniel, what do you think?  Have I at least got the problem right?
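The first approach can be sketched in the same toy setting.  Everything here is hypothetical: the names are invented, a linear array replaces the hash table, and malloc stands in for obstack_alloc.  It also assumes each entry is packed as the mangled name, a NUL, then the demangled name (an empty string meaning "no demangling"); one extra byte after the demangled name's terminator then records the language, so a cache hit can restore it without calling the demangler.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not GDB's real declarations.  */
enum language { language_auto, language_c, language_cplus };

struct toy_symbol
{
  const char *linkage_name;
  const char *demangled_name;
  enum language language;
};

/* Toy demangler: a "_Z" prefix means C++; sets the language on success.  */
static const char *
toy_find_demangled_name (struct toy_symbol *sym, const char *mangled)
{
  if (strncmp (mangled, "_Z", 2) == 0)
    {
      sym->language = language_cplus;
      return mangled + 2;
    }
  return NULL;
}

/* Build an entry laid out as  MANGLED '\0' DEMANGLED '\0' LANG.  */
static char *
make_entry (const char *mangled, const char *demangled, enum language lang)
{
  size_t mlen = strlen (mangled);
  size_t dlen = demangled != NULL ? strlen (demangled) : 0;
  char *entry = malloc (mlen + dlen + 3);

  assert (entry != NULL);
  memcpy (entry, mangled, mlen + 1);
  memcpy (entry + mlen + 1, demangled != NULL ? demangled : "", dlen + 1);
  entry[mlen + dlen + 2] = (char) lang;   /* the extra language byte */
  return entry;
}

/* Demangled part of an entry, or NULL if it is empty.  */
static const char *
entry_demangled (const char *entry)
{
  const char *d = entry + strlen (entry) + 1;
  return *d != '\0' ? d : NULL;
}

/* Language byte stored just past the demangled name's terminator.  */
static enum language
entry_language (const char *entry)
{
  const char *d = entry + strlen (entry) + 1;
  return (enum language) (unsigned char) d[strlen (d) + 1];
}

#define CACHE_SIZE 16
static char *cache[CACHE_SIZE];
static int cache_used;
static int demangler_calls;

/* Toy symbol_set_names: hits and misses now agree on the language.  */
static void
toy_set_names (struct toy_symbol *sym, const char *name)
{
  int i;

  sym->linkage_name = name;
  for (i = 0; i < cache_used; i++)
    if (strcmp (cache[i], name) == 0)
      {
        /* Hit: recover the demangled name and, if there is one, the
           language its demangler assigned.  */
        sym->demangled_name = entry_demangled (cache[i]);
        if (sym->demangled_name != NULL)
          sym->language = entry_language (cache[i]);
        return;
      }

  /* Miss: demangle once and record the resulting language.  */
  demangler_calls++;
  sym->demangled_name = toy_find_demangled_name (sym, name);
  assert (cache_used < CACHE_SIZE);
  cache[cache_used++] = make_entry (name, sym->demangled_name, sym->language);
}
```

Only the hit path changes: the demangler still runs once per distinct name, and the language byte is read back only when a demangled name exists, mirroring the way the demangler itself only changes the language when it recognizes the name.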