From: Jim Blandy
To: gdb@sources.redhat.com
Cc: Benjamin Kosnik, Daniel Berlin
Subject: C++ nested classes, namespaces, structs, and compound statements
Date: Fri, 05 Apr 2002 20:42:00 -0000
Message-Id: <20020406044204.245E45EA11@zwingli.cygnus.com>

At the moment, GDB doesn't handle C++ namespaces or nested classes
very well. I have a general idea of how we could address these
limitations, which I'd like to put up for shredding M-DEL discussion.

Let me admit up front that I don't really know C++, so I may be saying
stupid things. Please set me straight if you notice something.

In C, structs are essentially lists of member names, types, and
locations (offsets from the structure's base address):

    struct S { int x; char y; struct T t; }

(Unions are just the same, except that the offsets are all zero. That
relationship carries through the entire discussion here, so I'm not
going to talk about unions any more.)

If you think about it just right (or just wrong), this is really very
similar to the set of local variables associated with a compound
statement:

    { int x; char y; struct T t; ... }

As far as scoping is concerned, this compound statement is also just a
list of names, types, and locations.
The locations here are a bit less restricted: whereas a struct's
members' locations are all offsets from the start of the struct, a
compound statement's variables' locations can be registers, regions of
the stack frame, fixed addresses (i.e., static variables), and so on.
But just as a struct type divides up a block of storage into
individual members with types, a compound statement's local variables
divide up a function invocation's stack frame and registers into
individual variables with types.

The analogy isn't perfect, of course. Structs don't enclose blocks of
code. And a compound statement is less restricted: it can also contain
typedefs, definitions of struct and enum tags, and so on:

    {
      int x;
      char y;
      struct T t;
      struct L { int j, k; };
      typedef struct L L_t;
      ...
    }

Here the definitions of `struct L' and L_t are local to the compound
statement. In structs, however, things behave differently: struct tags
defined within another struct have the same scope as the containing
struct; and you can't put typedefs in a struct at all. So structs are
really very restricted with regard to what they can contain.

However, C++ loosens a lot of these restrictions, generalizing structs
and classes until they really begin to look very much like compound
statements. (The only difference between structs and classes in C++ is
whether members are public by default. So I'm not going to talk about
classes any more.)

For example, in C++, you can declare typedefs inside structs:

    $ cat local-typedef.C
    struct S
    {
      typedef int smootz;

      smootz a, b;
    };

    smootz c;
    $ $GccB/g++ -c local-typedef.C
    local-typedef.C:8: 'smootz' is used as a type, but is not defined as a type.

The compiler accepts the definition of the typedef `smootz' and its
use within `struct S', but outside of S the typedef isn't visible.
Struct tags behave similarly.

You can also declare "static" struct members --- you can access them
with the `->' and `.'
operators, just like ordinary members, but they're actually variables
at fixed addresses in the .data segment --- much like a "static"
variable in a C compound statement. But this means that a simple
offset from a base address is no longer sufficient to describe a
struct's member's location --- you actually start needing something
like GDB's enum address_class. Multiple inheritance and virtual base
classes introduce further complexity here.

Another difference between compound statements and structs goes away,
too. In C, you can only reference a struct's members using the `.' and
`->' operators, whereas you refer to a compound statement's variables
by simply naming them. But in C++, a struct's member functions can
refer to the struct's members by simply naming them. The struct's
bindings become another rib in the search path for identifier
bindings.

In summary, the data structure GDB needs to represent C++ structs
(classes, unions, whatever) has a lot of similarities to the structure
GDB needs to represent the local variables of a compound statement.
They both need to carry bindings for several namespaces (ordinary
identifiers and structure tags). The names can refer to any manner of
things: variables, functions, namespaces, base classes, and so on. For
variables, there are a variety of locations they might occupy.

So I would like to introduce to GDB a new type, `struct environment'
(or is `struct env' better?) which does about the same thing that the
`nsyms' and `sym' members of `struct block', and the `nfields' and
`fields' members of `struct type', do now: it's just a bunch of
bindings for names.
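
To make the "bunch of bindings" idea concrete, here is a minimal
sketch in C of what such a type might look like. Everything in it is
an assumption made for illustration: the `env_' names, the fixed-size
array, and the location kinds are made up, and none of it is existing
GDB code. It only shows one table holding bindings for two name spaces
(ordinary identifiers and tags), with each variable binding carrying a
location kind rather than a bare offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; none of these names exist in GDB.  */

/* Which name space a binding lives in.  */
enum env_domain { ENV_ORDINARY, ENV_TAG };

/* Where a bound variable lives.  Loosely inspired by the idea behind
   GDB's enum address_class, but the members here are invented.  */
enum env_loc_kind
{
  ENV_LOC_NONE,      /* not a variable (e.g. a struct tag) */
  ENV_LOC_OFFSET,    /* offset from an object's base address */
  ENV_LOC_STATIC,    /* fixed address */
  ENV_LOC_REGISTER   /* register number */
};

struct env_binding
{
  const char *name;
  enum env_domain domain;
  enum env_loc_kind kind;
  long value;        /* offset, address, or register, per KIND */
};

#define ENV_MAX_BINDINGS 16

struct environment
{
  size_t nbindings;
  struct env_binding bindings[ENV_MAX_BINDINGS];
};

/* Add a binding for NAME in DOMAIN to ENV.  */
void
env_bind (struct environment *env, const char *name,
          enum env_domain domain, enum env_loc_kind kind, long value)
{
  assert (env->nbindings < ENV_MAX_BINDINGS);
  struct env_binding *b = &env->bindings[env->nbindings++];
  b->name = name;
  b->domain = domain;
  b->kind = kind;
  b->value = value;
}

/* Return the binding for NAME in DOMAIN, or NULL if ENV has none.
   A plain linear search; hashing could be added behind this call.  */
const struct env_binding *
env_lookup (const struct environment *env, const char *name,
            enum env_domain domain)
{
  for (size_t i = 0; i < env->nbindings; i++)
    if (env->bindings[i].domain == domain
        && strcmp (env->bindings[i].name, name) == 0)
      return &env->bindings[i];
  return NULL;
}
```

The same `env_lookup' call would serve both a block's locals (bindings
with register or static locations) and a struct's members (bindings
with offsets), which is the point of sharing the structure.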

We would use `struct environment':

- in `struct block', to represent the block's local variables,
  replacing `nsyms' and `sym';

- in `struct type', to represent a struct's members, instead of
  `struct fields'; and

- in our representation for C++ namespaces, which seem pretty much
  like structs that can only contain static members and member
  functions (i.e., you can't ever create an instance of one).

There'd be a single set of functions for building `struct environment'
objects, and looking up bindings in them; you'd use it for variable
lookup, and in the `.' and `->' operators. It could handle hashing,
when appropriate.

Basically, we would take two distinct areas of GDB (and a third,
namespaces, which we haven't implemented yet but will need to), and
support them all with a single structure and a single bunch of support
functions. GDB would become easier to read.

As a half-baked idea, perhaps a `struct environment' object would have
a list of other `struct environment' objects you should search, too,
if you didn't find what you were looking for. We could use this to
search a compound statement's enclosing scopes to find bindings
further out, find members inherited from base classes, and resolve C++
`using' declarations.

How does this strike people?

Open issues:

- This "list of other places to search" thing may be ill-formed. I
  mean, sure, there is a set of similar behaviors going on there, but
  are they similar enough? For example:

  - You need a frame to find a variable's value, but you need an
    object address to find a member's value.

  - If you find a member in a base class, then you will often need to
    adjust the object's base address in some way.

  - And what about ambiguous member names?

  Maybe these questions mean this `list of other places to search'
  can't be handled in a uniform way.

- What really happens when you start using `struct symbol' objects for
  structure members? Do we need new address classes now for `offset
  from object base address'?
  Does the LOC_COMPUTED idea I've been pushing still work?

- How do member functions work in this arrangement? Virtual member
  functions? Virtual base classes?

- How would we introduce this incrementally?
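
For the half-baked "list of other places to search" idea and the
ambiguity question raised under the open issues, a rough sketch in C
might look like the following. The `scope_' names and the fixed-size
arrays are hypothetical, and the ambiguity rule is a deliberate
simplification (a name reached in two different parents is reported as
ambiguous); it does not claim to implement the real C++ lookup rules,
and it ignores the frame versus object-address question entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; none of these names exist in GDB.  */

#define SCOPE_MAX_NAMES 8
#define SCOPE_MAX_PARENTS 4

enum scope_result { SCOPE_NOT_FOUND, SCOPE_FOUND, SCOPE_AMBIGUOUS };

struct scope
{
  /* Names bound directly in this scope; NULL marks the end.  */
  const char *names[SCOPE_MAX_NAMES];
  /* Other scopes to search next: an enclosing block, or the base
     classes of a struct.  NULL marks the end.  */
  const struct scope *parents[SCOPE_MAX_PARENTS];
};

/* Return nonzero if S itself binds NAME.  */
static int
scope_binds (const struct scope *s, const char *name)
{
  for (size_t i = 0; i < SCOPE_MAX_NAMES && s->names[i] != NULL; i++)
    if (strcmp (s->names[i], name) == 0)
      return 1;
  return 0;
}

/* Look NAME up in S, then in S's parents.  A binding in S hides any
   binding further out.  If two different parent scopes supply NAME,
   report SCOPE_AMBIGUOUS.  *WHERE is meaningful only when the result
   is SCOPE_FOUND.  */
enum scope_result
scope_lookup (const struct scope *s, const char *name,
              const struct scope **where)
{
  if (scope_binds (s, name))
    {
      *where = s;
      return SCOPE_FOUND;
    }

  enum scope_result result = SCOPE_NOT_FOUND;
  for (size_t i = 0; i < SCOPE_MAX_PARENTS && s->parents[i] != NULL; i++)
    {
      const struct scope *hit = NULL;
      enum scope_result r = scope_lookup (s->parents[i], name, &hit);

      if (r == SCOPE_AMBIGUOUS)
        return SCOPE_AMBIGUOUS;
      if (r == SCOPE_FOUND)
        {
          /* Reaching the same scope by two paths is not a conflict.  */
          if (result == SCOPE_FOUND && hit != *where)
            return SCOPE_AMBIGUOUS;
          result = SCOPE_FOUND;
          *where = hit;
        }
    }
  return result;
}
```

With a single parent this behaves like nested compound statements
(inner bindings shadow outer ones); with several parents it behaves
roughly like base classes, which is where the uniform treatment starts
to strain.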