From: Jim Blandy
To: David Carlton
Cc: Daniel Jacobowitz, gdb@sources.redhat.com
Subject: Re: suggestion for dictionary representation
Date: Tue, 24 Sep 2002 19:52:00 -0000
References: <200209230244.g8N2ieo21741@zenia.red-bean.com> <20020923031056.GA26307@nevyn.them.org>

David Carlton writes:
> > I'm tempted to whack the block special case for function arguments.  It
> > may make name lookup a little more complicated but I think it will make
> > everything clearer.  We could, of course, try this on the branch and
> > see if we like the results :)
>
> Would it be reasonable to break up function blocks into two separate
> blocks: a linear block that only defines the parameters for the
> function and a non-linear block that contains the actual local
> variables?  Not that I think Jim's scheme is a bad one - I agree that
> it's better than the current scheme - but given the possibility of
> local variables shadowing function parameters, it seems to me to be
> conceptually cleaner to have two separate blocks appear anyways, and
> it also solves this problem.

The issue is a bit more tangled than you think, I think.
Splitting the function's body and its formals into two separate blocks
is a good idea, but it isn't going to get rid of all your duplicates.
A single formal parameter can have two symbols in a function's block
that describe it.

Try this out on a Pentium.  (The `-O2' and `-gstabs+' are required.)

    $ cat func.c
    #include <stdio.h>

    int
    main (int argc, char **argv)
    {
      static int local = 3;
      printf ("%d\n", argc * local);
    }
    $ gcc -O2 -gstabs+ func.c -o func

Then start up GDB on GDB on `func':

    (top-gdb) run
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: gdb -nw func
    GNU gdb 2002-09-16-cvs
    Copyright 2002 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "i686-pc-linux-gnu"...
    (gdb)

Set a breakpoint in main, just to get the symbols read:

    (gdb) break main
    Breakpoint 1 at 0x804834c: file func.c, line 7.
    (gdb)

Drop out to the enclosing GDB:

    (gdb) info
    (top-gdb)

It just so happens that `func.c' is the first compilation unit of the
first executable file in GDB's list:

    (top-gdb) print object_files->symtabs->filename
    $177 = 0x82fdbf8 "func.c"
    (top-gdb)

If that's not so for you, you'll need to walk `symtabs' to find the
right symtab.

Anyway, let's check out this symtab's blockvector.
I'm just using [0] as a postfix dereferencing operator here:

    (top-gdb) print object_files->symtabs->blockvector[0]
    $178 = {nblocks = 3, block = {0x82f8b74}}
    (top-gdb)

The first and second blocks are the global and static blocks, so the
third one is probably for `main':

    (top-gdb) print object_files->symtabs->blockvector->block[2]
    $179 = (struct block *) 0x82f8ab4
    (top-gdb) p *$179
    $180 = {startaddr = 134513472, endaddr = 134513513, function = 0x82f8988,
      superblock = 0x82f8ae4, gcc_compile_flag = 2 '\002', hashtable = 0 '\0',
      nsyms = 4, sym = {0x82f89c4}}
    (top-gdb) p *$179->function
    $181 = {ginfo = {name = 0x82f89bc "main", value = {ivalue = 137333428,
          block = 0x82f8ab4,
          bytes = 0x82f8ab4 "@\203\004\bi\203\004\b\210\211/\bä\212/\b\002",
          address = 137333428, chain = 0x82f8ab4}, language_specific = {
          cplus_specific = {demangled_name = 0x0}}, language = language_c,
        section = 11, bfd_section = 0x82d4fc0}, type = 0x82faaa8,
      namespace = VAR_NAMESPACE, aclass = LOC_BLOCK, line = 5, aux_value = {
        basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
    (top-gdb)

And it was!  Let's look at those four symbols:

    (top-gdb) p *$179->sym[0]
    $182 = {ginfo = {name = 0x82f89f8 "argc", value = {ivalue = 8, block = 0x8,
          bytes = 0x8, address = 8, chain = 0x8},
        language_specific = {cplus_specific = {demangled_name = 0x0}},
        language = language_c, section = 0, bfd_section = 0x0}, type = 0x82df828,
      namespace = VAR_NAMESPACE, aclass = LOC_ARG, line = 4, aux_value = {
        basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
    (top-gdb) p *$179->sym[1]
    $183 = {ginfo = {name = 0x82f8a34 "argv", value = {ivalue = 12, block = 0xc,
          bytes = 0xc, address = 12, chain = 0xc},
        language_specific = {cplus_specific = {demangled_name = 0x0}},
        language = language_c, section = 0, bfd_section = 0x0}, type = 0x82faaf4,
      namespace = VAR_NAMESPACE, aclass = LOC_ARG, line = 4, aux_value = {
        basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
    (top-gdb) p *$179->sym[2]
    $184 = {ginfo = {name = 0x82f8a70 "argc", value = {ivalue = 0, block = 0x0,
          bytes = 0x0, address = 0, chain = 0x0}, language_specific = {
          cplus_specific = {demangled_name = 0x0}}, language = language_c,
        section = 0, bfd_section = 0x0}, type = 0x82df828,
      namespace = VAR_NAMESPACE, aclass = LOC_REGISTER, line = 4, aux_value = {
        basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
    (top-gdb) p *$179->sym[3]
    $185 = {ginfo = {name = 0x82f8aac "local", value = {ivalue = 134517720,
          block = 0x80493d8, bytes = 0x80493d8 "É\f", address = 134517720,
          chain = 0x80493d8}, language_specific = {cplus_specific = {
            demangled_name = 0x0}}, language = language_c, section = 14,
        bfd_section = 0x0}, type = 0x82df828, namespace = VAR_NAMESPACE,
      aclass = LOC_STATIC, line = 6, aux_value = {basereg = 0}, aliases = 0x0,
      ranges = 0x0, hash_next = 0x0}
    (top-gdb)

Hey!  Why are there two entries for argc?  (This is the extra tangle I
was referring to.  If you know all about this, you can stop reading
now.)

The two `argc' symbols have different address classes: one has an
address class that indicates it's an argument, and the other doesn't.
The argument symbol describes where the variable is passed on the
stack (eight bytes after %ebp), whereas the non-argument symbol
describes where the variable lives in the block of the function:
register zero, or %eax.

As a sanity check, let's look at the IA-32 code for main:

    (top-gdb) c
    Continuing.
    (gdb) disass main
    Dump of assembler code for function main:
    0x8048340:      push   %ebp
    0x8048341:      mov    %esp,%ebp
    0x8048343:      sub    $0x8,%esp
    0x8048346:      mov    0x8(%ebp),%eax
    0x8048349:      and    $0xfffffff0,%esp
    0x804834c:      mov    0x80493d8,%edx
    0x8048352:      movl   $0x80483c8,(%esp,1)
    0x8048359:      imul   %edx,%eax
    0x804835c:      mov    %eax,0x4(%esp,1)
    0x8048360:      call   0x8048268
    0x8048365:      mov    %ebp,%esp
    0x8048367:      pop    %ebp
    0x8048368:      ret
    End of assembler dump.
    (gdb)

So, yes, the compiler did copy `argc' from the stack into %eax.
Check.

But *why* does GDB do this?  I have no idea.  It seems to me that,
with prologue skipping et al, simply having a single LOC_REGPARM would
be the Right Thing.  I don't really know when GDB will prefer the
argument entry, and when it'll prefer the non-argument entry.

I suspect it's historical.  If you look at the stabs spec, you'll see
that the compiler actually emits two stabs for arguments that are
passed in one place, but get moved somewhere else:

    $ objdump --stabs func
    ...
    329    FUN    0      5      08048340 12145  main:F(0,1)
    330    PSYM   0      4      00000008 12157  argc:p(0,1)
    331    PSYM   0      4      0000000c 12169  argv:p(1,1)=*(7,36)
    332    SLINE  0      5      00000000 0
    333    SLINE  0      7      0000000c 0
    334    SLINE  0      8      00000025 0
    335    RSYM   0      4      00000000 12189  argc:r(0,1)
    336    STSYM  0      6      080493d8 12201  local:V(0,1)
    337    LBRAC  0      0      0000000c 0
    338    RBRAC  0      0      00000029 0
    339    FUN    0      0      00000029 0
    ...
    $

The PSYM accounts for the argument symbol, and the RSYM accounts for
the internal symbol.  A lot of GDB's data structures very closely
match what's provided in STABS.  (The partial symbol tables are a good
example of this: they correspond exactly to the EXCL links.)

But anyway, all this could be handled much better nowadays using Dwarf
2 CFA and location lists.  I've been saying that for years, but it
hasn't happened yet.  Andrew has the CFI done now (I think?), and
Daniel B.
has submitted a patch for location expressions (but not location lists, tho they would be easy to add), but it's awaiting revision while he works on law school.