From: Jim Blandy <jimb@redhat.com>
To: gdb@sources.redhat.com
Cc: Benjamin Kosnik <bkoz@redhat.com>,
	Daniel Berlin <dan@dberlin.org>
Subject: C++ nested classes, namespaces, structs, and compound statements 
Message-Id: <20020406044204.245E45EA11@zwingli.cygnus.com>
Date: Fri, 05 Apr 2002 20:42:00 -0000


At the moment, GDB doesn't handle C++ namespaces or nested classes
very well.  I have a general idea of how we could address these
limitations, which I'd like to put up for shredding M-DEL discussion.

Let me admit up front that I don't really know C++, so I may be saying
stupid things.  Please set me straight if you notice something.

In C, structs are essentially lists of member names, types, and
locations (offsets from the structure's base address):

  struct S { int x; char y; struct T t; };

(Unions are just the same, except that the offsets are all zero.  That
relationship carries through the entire discussion here, so I'm not
going to talk about unions any more.)
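
To make the "names, types, and offsets" view concrete, here is a small
sketch that is not part of the original proposal.  The definition of
`struct T', the `MemberInfo' record, and `describe_S' are all invented
for illustration; only `offsetof' is standard.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <string>
#include <vector>

struct T { int p, q; };              // assumed; the message never defines T
struct S { int x; char y; T t; };
union  U { int x; char y; T t; };    // same members, as a union

// One entry per member: the (name, type, location) triple.
struct MemberInfo {
    std::string name;
    std::string type;
    std::size_t offset;              // bytes from the start of the struct
};

// Hypothetical helper: a hand-written description of S's layout.
std::vector<MemberInfo> describe_S() {
    return {
        {"x", "int",      offsetof(S, x)},
        {"y", "char",     offsetof(S, y)},
        {"t", "struct T", offsetof(S, t)},
    };
}

// Every union member starts at the union's base address.
void check_union_offsets() {
    assert(offsetof(U, x) == 0);
    assert(offsetof(U, y) == 0);
    assert(offsetof(U, t) == 0);
}
```

The exact offsets in `S' depend on the ABI's padding rules, which is why
the sketch computes them instead of writing them down.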

If you think about it just right (or just wrong), this is really very
similar to the set of local variables associated with a compound
statement:

  {
    int x;
    char y;
    struct T t;

    ...
  }

As far as scoping is concerned, this compound statement is also just a
list of names, types, and locations.  The locations here are a bit
less restricted: whereas a struct's members' locations are all offsets
from the start of the struct, a compound statement's variables'
locations can be registers, regions of the stack frame, fixed
addresses (i.e., static variables), and so on.  But just as a struct
type divides up a block of storage into individual members with types,
a compound statement's local variables divide up a function
invocation's stack frame and registers into individual variables with
types.

The analogy isn't perfect, of course.  Structs don't enclose blocks of
code.  And a compound statement is less restricted: it can also
contain typedefs, definitions of struct and enum tags, and so on:

  {
    int x;
    char y;
    struct T t;
    struct L { int j, k; };
    typedef struct L L_t;

    ...
  }

Here the definitions of `struct L' and L_t are local to the compound
statement.  In structs, however, things behave differently: struct
tags defined within another struct have the same scope as the
containing struct; and you can't put typedefs in a struct at all.  So
structs are really very restricted with regard to what they can
contain.

However, C++ loosens a lot of these restrictions, generalizing structs
and classes until they really begin to look very much like compound
statements.  (The only difference between structs and classes in C++
is whether members and base classes are public by default.  So I'm not
going to talk about classes any more.)

For example, in C++, you can declare typedefs inside structs:

  $ cat local-typedef.C
  struct S
  {
    typedef int smootz;

    smootz a, b;
  };

  smootz c;
  $ $GccB/g++ -c local-typedef.C 
  local-typedef.C:8: 'smootz' is used as a type, but is not defined as a type.

The compiler accepts the definition of the typedef `smootz' and its
use within `struct S', but outside of S the typedef isn't visible.
Struct tags behave similarly.
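
For contrast, a sketch (not from the original message) of how the same
names can be reached from outside `S': they have to be qualified with
`S::'.  The nested `Inner' struct and the function
`use_qualified_names' are invented for this example.

```cpp
#include <cassert>
#include <type_traits>

struct S
{
  typedef int smootz;            // visible unqualified only inside S
  struct Inner { smootz v; };    // a nested struct tag, scoped the same way

  smootz a, b;
};

// Outside S, the unqualified name fails, but the qualified one works.
S::smootz c = 3;

static_assert(std::is_same<S::smootz, int>::value,
              "S::smootz is just another name for int");

int use_qualified_names() {
  S::Inner d = { 4 };            // nested tag, also reached through S::
  assert(sizeof(S::smootz) == sizeof(int));
  return c + d.v;
}
```

So the binding for `smootz' lives in S's own scope, and a debugger has
to search that scope whenever it evaluates a qualified name.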

You can also declare "static" struct members --- you can access them
with the `->' and `.' operators, just like ordinary members, but
they're actually variables at fixed addresses in the .data segment ---
much like a "static" variable in a C compound statement.  But this
means that a simple offset from a base address is no longer sufficient
to describe a struct's member's location --- you actually start
needing something like GDB's enum address_class.  Multiple inheritance
and virtual base classes introduce further complexity here.
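
A short sketch of the static-member case, added for illustration (the
`Counter' type and `static_is_shared' are made-up names): both objects
reach the same variable, so its location cannot be an offset from
either object's base address.

```cpp
#include <cassert>

struct Counter {
    int value;          // ordinary member: an offset inside each object
    static int total;   // static member: one variable at a fixed address
};

int Counter::total = 0; // the single definition, outside any object

// Returns true if both objects reach the very same static variable.
bool static_is_shared() {
    Counter a{1}, b{2};
    Counter *pb = &b;

    a.total = 10;       // '.' works on a static member
    pb->total += 5;     // and so does '->'

    assert(&a.total == &Counter::total);
    return &a.total == &pb->total   // same address through either object
        && &a.value != &pb->value   // ordinary members differ per object
        && Counter::total == 15;
}
```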

Another difference between compound statements and structs goes away,
too.  In C, you can only reference a struct's members using the
`.' and `->' operators, whereas you refer to a compound statement's
variables by simply naming them.  But in C++, a struct's member
functions can refer to the struct's members by simply naming them.
The struct's bindings become another rib in the search path for
identifier bindings.
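
As an illustration (my own sketch; the type `P' and its members are
invented), here is the search order seen from inside a member
function: the enclosing block first, then the struct's members, then
the surrounding namespace scope.

```cpp
#include <cassert>

int x = 100;                  // namespace-scope binding

struct P {
    int x;                    // member binding; hides ::x in member functions
    int y;

    int sum() const {
        return x + y;         // plain names, meaning this->x + this->y
    }
    int shadowed() const {
        int y = 7;            // block-scope binding, searched before members
        return x + y;         // this->x + the local y
    }
    int global_x() const {
        return ::x;           // explicit qualification skips the member
    }
};

void check_name_lookup() {
    P p{1, 2};
    assert(p.sum() == 3);
    assert(p.shadowed() == 8);
    assert(p.global_x() == 100);
}
```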

In summary, the data structure GDB needs to represent C++ structs
(classes, unions, whatever) has a lot of similarities to the structure
GDB needs to represent the local variables of a compound statement.
They both need to carry bindings for several namespaces (ordinary
identifiers and structure tags).  The names can refer to any manner of
things: variables, functions, namespaces, base classes, and so on.
For variables, there are a variety of locations they might occupy.


So I would like to introduce to GDB a new type, `struct environment'
(or is `struct env' better?) which does about the same thing that the
`nsyms' and `sym' members of `struct block', and the `nfields' and
`fields' members of `struct type', do now: it's just a bunch of
bindings for names.  We would use `struct environment':

- in `struct block', to represent the block's local variables, replacing
  `nsyms' and `sym';
- in `struct type', to represent a struct's members, replacing
  `nfields' and `fields'; and
- in our representation for C++ namespaces, which seem pretty much
  like structs that can only contain static members and member
  functions (i.e., you can't ever create an instance of one).

There'd be a single set of functions for building `struct environment'
objects, and looking up bindings in them; you'd use it for variable
lookup, and in the `.' and `->' operators.  It could handle hashing,
when appropriate.
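
To give the idea some shape, here is a rough sketch in C++.  It is not
GDB code: `Environment', `Binding', `LocKind', and `NameSpace' are
names invented for this example, and the location kinds are only
loosely inspired by GDB's enum address_class.  A real version would
presumably be C and would hold `struct symbol' objects.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Which identifier namespace a name lives in.
enum class NameSpace { Ordinary, Tag };

// Invented location kinds, covering both block locals and struct members.
enum class LocKind { Register, FrameOffset, StaticAddress, ObjectOffset, TypeName };

struct Binding {
    std::string type;   // type name, kept as text to keep the sketch small
    LocKind kind;
    long value;         // register number, offset, or address, per `kind`
};

class Environment {
public:
    void bind(NameSpace ns, const std::string &name, Binding b) {
        table_[key(ns, name)] = std::move(b);
    }
    // Returns nullptr when the name has no binding in this environment.
    const Binding *lookup(NameSpace ns, const std::string &name) const {
        auto it = table_.find(key(ns, name));
        return it == table_.end() ? nullptr : &it->second;
    }
private:
    // Ordinary identifiers and tags never collide: the key carries both.
    static std::string key(NameSpace ns, const std::string &name) {
        return (ns == NameSpace::Tag ? "tag:" : "id:") + name;
    }
    std::unordered_map<std::string, Binding> table_;   // hashed lookup
};

// The same type serves a block's locals and a struct's members.
void check_environment() {
    Environment block_env, struct_env;
    block_env.bind(NameSpace::Ordinary, "x", {"int", LocKind::Register, 3});
    struct_env.bind(NameSpace::Ordinary, "x", {"int", LocKind::ObjectOffset, 0});
    assert(block_env.lookup(NameSpace::Ordinary, "x")->kind == LocKind::Register);
    assert(struct_env.lookup(NameSpace::Ordinary, "x")->kind == LocKind::ObjectOffset);
}
```

The point of the sketch is only that variable lookup and the `.' and
`->' operators could share `bind' and `lookup'; what a binding's
location means still depends on who is asking.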

Basically, we would take two distinct areas of GDB (and a third,
namespaces, which we haven't implemented yet but will need to), and
support them all with a single structure and a single bunch of
support functions.  GDB would become easier to read.

As a half-baked idea, perhaps a `struct environment' object would have
a list of other `struct environment' objects you should search, too,
if you didn't find what you were looking for.  We could use this to
search a compound statement's enclosing scopes to find bindings
further out, find members inherited from base classes, and resolve C++
`using' declarations.
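
A sketch of that half-baked idea, with invented names (`ChainedEnv',
`also_search') and bindings reduced to strings.  It assumes the
simplest possible policy: search locally, then try each listed
environment in order, and stop at the first hit.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

class ChainedEnv {
public:
    void bind(const std::string &name, const std::string &what) {
        local_[name] = what;
    }
    // Add another environment to consult when a local lookup fails.
    void also_search(const ChainedEnv *other) {
        others_.push_back(other);
    }
    // Depth-first: local bindings, then each listed environment in order.
    // Returns nullptr if the name is not found anywhere.
    const std::string *lookup(const std::string &name) const {
        auto it = local_.find(name);
        if (it != local_.end())
            return &it->second;
        for (const ChainedEnv *e : others_)
            if (const std::string *hit = e->lookup(name))
                return hit;
        return nullptr;
    }
private:
    std::unordered_map<std::string, std::string> local_;
    std::vector<const ChainedEnv *> others_;
};

// Nested scopes: the inner block hides x but still sees g further out.
void check_chained_lookup() {
    ChainedEnv file_scope, outer, inner;
    file_scope.bind("g", "file-scope g");
    outer.bind("x", "outer x");
    outer.also_search(&file_scope);
    inner.bind("x", "inner x");
    inner.also_search(&outer);
    assert(*inner.lookup("x") == "inner x");
    assert(*inner.lookup("g") == "file-scope g");
}
```

First-match-wins suits nested blocks, but it would silently pick one
of two base classes that declare the same member, so inheritance would
need a stricter policy than this sketch has.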

How does this strike people?

Open issues:

- This "list of other places to search" thing may be ill-formed.  I
  mean, sure, there are a set of similar behaviors going on there, but
  are they similar enough?  For example:
  - You need a frame to find a variable's value, but you need an
    object address to find a member's value.
  - If you find a member in a base class, then you will often
    need to adjust the object's base address in some way.
  - And what about ambiguous member names?
  Maybe these questions mean this `list of other places to search'
  can't be handled in a uniform way.

- What really happens when you start using `struct symbol' objects for
  structure members?  Do we need new address classes now for `offset
  from object base address'?  Does the LOC_COMPUTED idea I've been
  pushing still work?

- How do member functions work in this arrangement?  Virtual member
  functions?  Virtual base classes?

- How would we introduce this incrementally?
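
To illustrate two of the concerns in the first open issue (base
address adjustment and ambiguous member names), here is a small sketch
with invented types `A', `B', and `D'.  The size of the adjustment is
ABI-dependent, so the code measures it and claims no particular value.

```cpp
#include <cassert>
#include <cstddef>   // std::ptrdiff_t

struct A { int id; int a; };
struct B { int id; int b; };
struct D : A, B { int d; };

// Byte distance from the start of a D object to its B subobject.
std::ptrdiff_t b_subobject_offset(const D &obj) {
    const B *as_b = &obj;    // the implicit conversion adjusts the address
    return reinterpret_cast<const char *>(as_b)
         - reinterpret_cast<const char *>(&obj);
}

// `obj.id' alone would not compile: it is ambiguous between A::id and
// B::id, so the base class has to be named explicitly.
int pick_ids(D &obj) {
    obj.A::id = 1;
    obj.B::id = 2;
    assert(&obj.A::id != &obj.B::id);   // two distinct members
    return obj.A::id * 10 + obj.B::id;
}
```

A lookup that finds `b' through the B environment would therefore have
to return the adjustment along with the binding, and a lookup of `id'
would have to report both candidates.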