* Re: C++ nested classes, namespaces, structs, and compound statements
@ 2002-04-05 22:02 Michael Elizabeth Chastain
2002-04-05 22:13 ` Daniel Berlin
0 siblings, 1 reply; 37+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-05 22:02 UTC (permalink / raw)
To: gdb, jimb; +Cc: bkoz, dan
I'll bite.
> In summary, the data structure GDB needs to represent C++ structs
> (classes, unions, whatever) has a lot of similarities to the structure
> GDB needs to represent the local variables of a compound statement.
Sounds reasonable to me.
It also sounds dangerous. It's true that namespaces, structs,
and compound statements are all identifier binding contexts.
But if you start treating a struct as a type of compound statement
you could get into a maze of twisty forced meanings. You have to
reach down and create a new paradigm and then port both structs
and compound statements to it.
Think about how much context information an identifier-binding-object
needs to do its job. I think it would be difficult to come up with a
universal context object that both structs and compound statements
can use. Each identifier-binding-object has its own specialized
context requirements.
> - And what about ambiguous member names?
The C++ language spec says: if class A inherits from both class B and
class C, and both B and C have a member "foo_", then an unqualified
reference to a.foo_ is illegal. The programmer has to say a.B::foo_
or a.C::foo_.
The last time I checked, gdb just grabs one of the a.foo_ values and
uses it. I think it would be a lot better for gdb to enforce the
ambiguity rule.
What happens right now if 10 C source files have a static variable
named "i" and I say "print i" and I am not in any of those source
files at the moment? What *should* happen?
It's late ... I'm rambling.
Michael C
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 22:02 C++ nested classes, namespaces, structs, and compound statements Michael Elizabeth Chastain
@ 2002-04-05 22:13 ` Daniel Berlin
2002-04-05 22:30 ` Daniel Berlin
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Berlin @ 2002-04-05 22:13 UTC (permalink / raw)
To: Michael Elizabeth Chastain; +Cc: gdb, jimb, bkoz
On Sat, 6 Apr 2002, Michael Elizabeth Chastain wrote:
> I'll bite.
>
> > In summary, the data structure GDB needs to represent C++ structs
> > (classes, unions, whatever) has a lot of similarities to the structure
> > GDB needs to represent the local variables of a compound statement.
>
> Sounds reasonable to me.
>
> It also sounds dangerous. It's true that namespaces, structs,
> and compound statements are all identifier binding contexts.
> But if you start treating a struct as a type of compound statement
> you could get into a maze of twisty forced meanings. You have to
> reach down and create a new paradigm and then port both structs
> and compound statements to it.
>
> Think about how much context information an identifier-binding-object
> needs to do its job. I think it would be difficult to come up with a
> universal context object that both structs and compound statements
> can use. Each identifier-binding-object has its own specialized
> context requirements.
>
> > - And what about ambiguous member names?
>
> The C++ language spec says: if class A inherits from both class B and
> class C, and both B and C have a member "foo_", then an unqualified
> reference to a.foo_ is illegal. The programmer has to say a.B::foo_
> or a.C::foo_.
Not quite.
Watch this cuteness, copied from the C++ draft standard:
struct U { static int i; };
struct V : U { };
struct W : U { using U::i; };
struct X : V, W { void foo(); };
void X::foo()
{
    i; // Finds U::i in two ways: as W::i and U::i
       // but no ambiguity because U::i is static
}
"A static member, a nested type or an enumerator defined in a base class T
can unambiguously be found even if an object has more than one base class
subobject of type T. Two base class subobjects share the non-static
member subobjects of their common virtual base classes"
In other words, it's not just statics.
Observe:
class V { public: int v; };
class A {
public:
    int a;
    static int s;
    enum { e };
};
class B : public A, public virtual V {};
class C : public A, public virtual V {};
class D : public B, public C { };
void f(D* pd)
{
    pd->v++;        // ok: only one `v' (virtual)
    pd->s++;        // ok: only one `s' (static)
    int i = pd->e;  // ok: only one `e' (enumerator)
    pd->a++;        // error, ambiguous: two `a's in `D'
}
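A debugger could apply the same rule when it resolves a member name. The following is only a minimal sketch under stated assumptions: ClassInfo, BaseInfo, MemberInfo and count_candidates are hypothetical names, not GDB structures, and the "a declaration in a class hides its bases" step is a simplification of the real C++ lookup rules. Non-static members are keyed by the subobject that holds them, virtual bases collapse to one shared subobject, and statics, nested types and enumerators are keyed by their defining class.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical debugger-side model of a class; these are not GDB's types.
struct ClassInfo;

struct BaseInfo {
    const ClassInfo *cls;
    bool is_virtual;
};

struct MemberInfo {
    std::string name;
    bool per_object;  // true: non-static data member; false: static, type, enumerator
};

struct ClassInfo {
    std::string name;
    std::vector<MemberInfo> members;
    std::vector<BaseInfo> bases;
};

// Collect one key per distinct entity that `name` can denote.
// `path` identifies the subobject currently being searched.
static void collect(const ClassInfo &c, const std::string &path,
                    const std::string &name, std::set<std::string> &found)
{
    for (const MemberInfo &m : c.members) {
        if (m.name == name) {
            // Per-object members are keyed by subobject, everything else by class.
            found.insert(m.per_object ? path + "::" + name
                                      : "static " + c.name + "::" + name);
            return;  // simplification: a declaration here hides the bases
        }
    }
    for (const BaseInfo &b : c.bases) {
        // Every path to a virtual base reaches the same shared subobject.
        std::string sub = b.is_virtual ? "virtual " + b.cls->name
                                       : path + "/" + b.cls->name;
        collect(*b.cls, sub, name, found);
    }
}

// 0 = not found, 1 = unambiguous, more than 1 = ambiguous.
int count_candidates(const ClassInfo &c, const std::string &name)
{
    std::set<std::string> found;
    collect(c, c.name, name, found);
    return static_cast<int>(found.size());
}

// Builds the V/A/B/C/D hierarchy from the message above and counts `name` in D.
int count_in_example_D(const std::string &name)
{
    ClassInfo V{"V", {{"v", true}}, {}};
    ClassInfo A{"A", {{"a", true}, {"s", false}, {"e", false}}, {}};
    ClassInfo B{"B", {}, {{&A, false}, {&V, true}}};
    ClassInfo C{"C", {}, {{&A, false}, {&V, true}}};
    ClassInfo D{"D", {}, {{&B, false}, {&C, false}}};
    return count_candidates(D, name);
}
```

With this model, a result greater than one is the case where a debugger would report an ambiguity instead of silently picking one of the candidates.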
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 22:13 ` Daniel Berlin
@ 2002-04-05 22:30 ` Daniel Berlin
0 siblings, 0 replies; 37+ messages in thread
From: Daniel Berlin @ 2002-04-05 22:30 UTC (permalink / raw)
To: Daniel Berlin; +Cc: Michael Elizabeth Chastain, gdb, jimb, bkoz
> > > - And what about ambiguous member names?
> >
> > The C++ language spec says: if class A inherits from both class B and
> > class C, and both B and C have a member "foo_", then an unqualified
> > reference to a.foo_ is illegal. The programmer has to say a.B::foo_
> > or a.C::foo_.
>
> Not quite.
>
> Watch this cuteness, copied from the C++ draft standard:
>
> struct U { static int i; };
> struct V : U { };
> struct W : U { using U::i; };
> struct X : V, W { void foo(); };
> void X::foo()
> {
> i; //Finds U::i in two ways: as W::i and U::i
> // but no ambiguity because U::i is static
>
> }
>
>
> "A static member, a nested type or an enumerator defined in a base class T
> can unambiguously be found even if an object has more than one base class
> subobject of type T. Two base class subobjects share the non-static
> member subobjects of their common virtual base classes"
>
> In other words, it's not just statics.
> Observe:
>
> class V { public: int v; };
> class A {
> public:
> int a;
> static int s;
> enum { e };
> };
> class B : public A, public virtual V {};
> class C : public A, public virtual V {};
> class D : public B, public C { };
>
> void f(D* pd)
> {
> pd->v++; // ok: only one `v' (virtual)
> pd->s++; // ok: only one `s' (static)
> int i = pd->e; // ok: only one `e' (enumerator)
> pd->a++; // error, ambiguous: two `a's in `D'
> }
>
>
>
>
I forgot to say, in general, you can come up with enough crazy lookup
requirements for each language we support/want to support that it just
makes sense to have the lookup function (by lookup function I mean
whatever you call when you see the "." or "->" to try to get a symbol
out of the environment) be a function pointer, filled in by whatever
created the environment.
For GDB internal symbols (e.g. $a = 5), we have a simple lookup.
For the "global, all enclosing" environment, you probably want a lookup
function that figures out the environment of the symbol you are trying
to access, and then just looks there.
None of these should have to care about what language that environment
is; they should just get the "right" answer.
I'm saying whatever thing creates the environment and puts symbols in it
knows what language these symbols represent, and thus, can best set the
lookup function.
For the hypothetical mixed-language environments (i.e., you find some way
to embed Java and C++ in the *same* frame or something, such that you
need to install symbols from multiple languages in a single environment
because they really are in the exact same scope), you'd just have an
environment with two sub-environments, and a lookup function that looked
at the current language setting (or something else) to determine which
one to look in first.
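To make the shape of this concrete, here is a rough sketch, not a proposal for actual GDB code. Environment, Entry, LookupFn and all function names are made up for illustration, and the case-insensitive rule is just an assumed example of a language-specific lookup. The composite lookup tries its sub-environments in a fixed order; consulting the current language setting is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical types; not GDB's actual structures.
struct Environment;

struct Entry {
    const char *name;
    int value;
};

// The lookup routine is chosen by whoever builds the environment.
typedef const Entry *(*LookupFn)(const Environment *env, const char *name);

struct Environment {
    const Entry *entries;
    int count;
    LookupFn lookup;
    const Environment *first;   // sub-environments, used only by lookup_both
    const Environment *second;
};

// Exact-match lookup, e.g. for a C++-like environment.
const Entry *lookup_exact(const Environment *env, const char *name)
{
    for (int i = 0; i < env->count; ++i)
        if (std::strcmp(env->entries[i].name, name) == 0)
            return &env->entries[i];
    return nullptr;
}

static bool equal_nocase(const char *a, const char *b)
{
    for (; *a && *b; ++a, ++b)
        if (std::tolower((unsigned char)*a) != std::tolower((unsigned char)*b))
            return false;
    return *a == *b;  // both strings must end together
}

// Case-insensitive lookup, as an assumed example of a language-specific rule.
const Entry *lookup_nocase(const Environment *env, const char *name)
{
    for (int i = 0; i < env->count; ++i)
        if (equal_nocase(env->entries[i].name, name))
            return &env->entries[i];
    return nullptr;
}

// Composite lookup for an environment made of two sub-environments.
const Entry *lookup_both(const Environment *env, const char *name)
{
    const Entry *e = env->first->lookup(env->first, name);
    return e ? e : env->second->lookup(env->second, name);
}

// Callers never need to know which language rules apply.
const Entry *env_lookup(const Environment *env, const char *name)
{
    return env->lookup(env, name);
}

// Small demonstration: returns the entry's value, or -1 if nothing matched.
int demo_lookup(const char *name)
{
    static const Entry exact_entries[] = {{"Foo", 1}};
    static const Entry nocase_entries[] = {{"BAR", 2}};
    Environment a = {exact_entries, 1, lookup_exact, nullptr, nullptr};
    Environment b = {nocase_entries, 1, lookup_nocase, nullptr, nullptr};
    Environment both = {nullptr, 0, lookup_both, &a, &b};
    const Entry *e = env_lookup(&both, name);
    return e ? e->value : -1;
}
```

The caller only ever sees env_lookup; which matching rule runs is decided by the code that filled in the environment.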
--Dan
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-16 14:52 ` Jim Blandy
@ 2002-04-16 14:58 ` Daniel Jacobowitz
0 siblings, 0 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-16 14:58 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb
On Tue, Apr 16, 2002 at 04:46:27PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > > Okay, I think I see. You're preserving the distinctions implicit in
> > > the existing structures (fields and symbols are separate),
> > > distinguishing types from symbols (i.e. an entry for a typedef would
> > > be an environment_entry whose kind == type_kind, instead of a symbol
> > > with an address class of LOC_TYPEDEF), and positing that namespaces
> > > would be a fourth kind of thing. The `data' field would point to a
> > > `struct type' or a `struct field', or whatever.
> >
> > Yes, that's right. There's also transparent scopes (which might be a
> > special kind of namespace... or not). By that I mean {} enclosed
> > regions with their own local variables. A function belongs to a
> > namespace, a namespace does not enclose a particular range of PCs - but
> > a scope does enclose a particular PC range. Hopefully but not
> > necessarily a single contiguous range. Optimization or explicit
> > .section directives could break it up.
>
> At the moment GDB assumes they're contiguous. (Of course.) Dwarf 3
> allows one to describe lexical blocks that occupy discontinuous
> address ranges, but we don't read that. (Of course.)
Of course :)
> But why would lexical blocks occur in an environment? They don't
> generally have names. Functions do, but I would say a function "has
> a" lexical block, rather than saying it "is a" lexical block.
A function can have local types (in GNU C, and possibly in standard C++,
etc.). It also contains lexical blocks with no names.
int
foo()
{
    typedef int x;
    return (x) 1;
}
The type 'x' is local to foo() in this example. The DWARF-2
information supports this interpretation.
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-12 16:56 ` Daniel Jacobowitz
2002-04-16 12:08 ` Jim Blandy
@ 2002-04-16 14:52 ` Jim Blandy
2002-04-16 14:58 ` Daniel Jacobowitz
1 sibling, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-16 14:52 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb
Daniel Jacobowitz <drow@mvista.com> writes:
> > Okay, I think I see. You're preserving the distinctions implicit in
> > the existing structures (fields and symbols are separate),
> > distinguishing types from symbols (i.e. an entry for a typedef would
> > be an environment_entry whose kind == type_kind, instead of a symbol
> > with an address class of LOC_TYPEDEF), and positing that namespaces
> > would be a fourth kind of thing. The `data' field would point to a
> > `struct type' or a `struct field', or whatever.
>
> Yes, that's right. There's also transparent scopes (which might be a
> special kind of namespace... or not). By that I mean {} enclosed
> regions with their own local variables. A function belongs to a
> namespace, a namespace does not enclose a particular range of PCs - but
> a scope does enclose a particular PC range. Hopefully but not
> necessarily a single contiguous range. Optimization or explicit
> .section directives could break it up.
At the moment GDB assumes they're contiguous. (Of course.) DWARF 3
allows one to describe lexical blocks that occupy discontiguous
address ranges, but we don't read that. (Of course.)
But why would lexical blocks occur in an environment? They don't
generally have names. Functions do, but I would say a function "has
a" lexical block, rather than saying it "is a" lexical block.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-16 12:08 ` Jim Blandy
@ 2002-04-16 14:01 ` Daniel Jacobowitz
0 siblings, 0 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-16 14:01 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb
On Tue, Apr 16, 2002 at 02:07:22PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > > > Doing it for struct symbol would be a good idea, I think, but a better
> > > > approach would be:
> > > > - start the environments properly, using a new enum.
> > > > - Separate out those things which need to be "different kinds of
> > > > struct symbol", and keep the factoring at the environment level.
> > > > - Look up environment entries, not struct symbol's. That way we can
> > > > have a hope of keeping the right names attached to types, for
> > > > instance.
> > >
> > > By the last point here, are you suggesting that everyone hand around
> > > pointers to `struct environment_entry' objects, rather than pointers
> > > to `struct type', `struct field', etc.? That would lose some
> > > typechecking, and some clarity. If space is the concern, I think I'd
> > > rather see both the environment entry and the symbol/field/etc. have
> > > `name' fields, that perhaps point to the same string.
> >
> > There's a question of correctness, though. Suppose a type is imported
> > into a namespace - we don't want to create a new type for it, but we do
> > want to create a new name for it. I'm not sure what to do.
>
> You mean, imported via `using A::t', or via `using namespace A', where
> `A' binds `t' to a type? I guess I don't see the problem; could you
> be more explicit?
Well, neither of those exactly - in those cases the type is still named
't'. The only interesting problem with those examples is whether the
type is still named A::t (as opposed to new_namespace::t) in that case;
I'm not sure what C++ says about that.
I was talking about this example from Daniel Berlin:
#include <string>
namespace bob = std;
bob::string a;
It doesn't seem to be the problem I thought it was, since it only
happens to namespaces and not to namespace members like types. So this
is not a real problem.
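A small check of that conclusion, as a sketch: the alias gives the namespace a second name, but the types inside it are unchanged. The helper length_via_real_name is made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

namespace bob = std;  // namespace alias: a second name for the namespace itself

// The alias creates no new type: both spellings name the same type.
static_assert(std::is_same<bob::string, std::string>::value,
              "bob::string and std::string are one type");

bob::string a = "hello";

// Declared with the real name, callable with an object declared via the alias.
std::size_t length_via_real_name(const std::string &s)
{
    return s.size();
}
```

So a debugger only needs a second name for the namespace, not a second entry for each of its members.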
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-12 16:56 ` Daniel Jacobowitz
@ 2002-04-16 12:08 ` Jim Blandy
2002-04-16 14:01 ` Daniel Jacobowitz
2002-04-16 14:52 ` Jim Blandy
1 sibling, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-16 12:08 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb
Daniel Jacobowitz <drow@mvista.com> writes:
> > > Doing it for struct symbol would be a good idea, I think, but a better
> > > approach would be:
> > > - start the environments properly, using a new enum.
> > > - Separate out those things which need to be "different kinds of
> > > struct symbol", and keep the factoring at the environment level.
> > > - Look up environment entries, not struct symbol's. That way we can
> > > have a hope of keeping the right names attached to types, for
> > > instance.
> >
> > By the last point here, are you suggesting that everyone hand around
> > pointers to `struct environment_entry' objects, rather than pointers
> > to `struct type', `struct field', etc.? That would lose some
> > typechecking, and some clarity. If space is the concern, I think I'd
> > rather see both the environment entry and the symbol/field/etc. have
> > `name' fields, that perhaps point to the same string.
>
> There's a question of correctness, though. Suppose a type is imported
> into a namespace - we don't want to create a new type for it, but we do
> want to create a new name for it. I'm not sure what to do.
You mean, imported via `using A::t', or via `using namespace A', where
`A' binds `t' to a type? I guess I don't see the problem; could you
be more explicit?
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-12 13:58 ` Jim Blandy
@ 2002-04-12 16:56 ` Daniel Jacobowitz
2002-04-16 12:08 ` Jim Blandy
2002-04-16 14:52 ` Jim Blandy
0 siblings, 2 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-12 16:56 UTC (permalink / raw)
To: gdb
On Fri, Apr 12, 2002 at 03:58:21PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > On Wed, Apr 10, 2002 at 12:31:27PM -0500, Jim Blandy wrote:
> > > Daniel Jacobowitz <drow@mvista.com> writes:
> > > > Sure. But I think this is a chance (if we want one) to move in a
> > > > different direction. We'd have to work out the details, but I envision
> > > > something like this (names made up as I go along):
> > > >
> > > > struct environment_entry {
> > > >   const char *name;
> > > >   enum name_type kind;
> > > >   void *data;
> > > > };
> > > >
> > > > enum name_type {
> > > >   type_kind,
> > > >   field_kind,
> > > >   symbol_kind,
> > > >   namespace_kind,
> > > > };
> > >
> > > In other words, replace the sloppy union with a properly discriminated
> > > union? I'm for it.
> > >
> > > But granted that it's important to clearly distinguish between the
> > > expanding set of uses we're putting `struct symbol' to, and that
> > > extending enum address_class isn't the best idea, how is it better to
> > > make this change concurrently with the enclosing environment changes?
> > > We could do this change right now. Isn't it basically independent?
> >
> > Well, no. I was suggesting this for things that are not currently in
> > symbols (well, types generally are...). But namespaces are not
> > represented at all and fields are in a different structure entirely.
>
> Okay, I think I see. You're preserving the distinctions implicit in
> the existing structures (fields and symbols are separate),
> distinguishing types from symbols (i.e. an entry for a typedef would
> be an environment_entry whose kind == type_kind, instead of a symbol
> with an address class of LOC_TYPEDEF), and positing that namespaces
> would be a fourth kind of thing. The `data' field would point to a
> `struct type' or a `struct field', or whatever.
Yes, that's right. There's also transparent scopes (which might be a
special kind of namespace... or not). By that I mean {} enclosed
regions with their own local variables. A function belongs to a
namespace, a namespace does not enclose a particular range of PCs - but
a scope does enclose a particular PC range. Hopefully but not
necessarily a single contiguous range. Optimization or explicit
.section directives could break it up.
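As an illustration of a scope that covers several PC ranges, here is a minimal sketch. PcRange, LexicalScope and the helper functions are hypothetical names, and the addresses are invented. Each scope keeps a list of half-open ranges, and the innermost scope containing a PC is found by descending into children.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct PcRange {
    std::uint64_t low;   // inclusive
    std::uint64_t high;  // exclusive
};

// Hypothetical scope object: a {} region that may occupy several PC ranges.
struct LexicalScope {
    const char *label;
    std::vector<PcRange> ranges;
    std::vector<const LexicalScope *> children;
};

bool covers(const LexicalScope &s, std::uint64_t pc)
{
    for (const PcRange &r : s.ranges)
        if (pc >= r.low && pc < r.high)
            return true;
    return false;
}

// Returns the innermost scope containing pc, or nullptr if s does not cover it.
const LexicalScope *innermost_scope(const LexicalScope &s, std::uint64_t pc)
{
    if (!covers(s, pc))
        return nullptr;
    for (const LexicalScope *child : s.children)
        if (const LexicalScope *hit = innermost_scope(*child, pc))
            return hit;
    return &s;
}

// Demonstration: a function with one inner block split into two pieces.
std::string demo_scope_label(std::uint64_t pc)
{
    LexicalScope block{"block", {{0x1010, 0x1020}, {0x1080, 0x1090}}, {}};
    LexicalScope func{"foo", {{0x1000, 0x1100}}, {&block}};
    const LexicalScope *s = innermost_scope(func, pc);
    return s ? s->label : "none";
}
```

A namespace would carry no ranges at all in such a scheme, which matches the distinction drawn above between namespaces and scopes.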
>
> > Doing it for struct symbol would be a good idea, I think, but a better
> > approach would be:
> > - start the environments properly, using a new enum.
> > - Separate out those things which need to be "different kinds of
> > struct symbol", and keep the factoring at the environment level.
> > - Look up environment entries, not struct symbol's. That way we can
> > have a hope of keeping the right names attached to types, for
> > instance.
>
> By the last point here, are you suggesting that everyone hand around
> pointers to `struct environment_entry' objects, rather than pointers
> to `struct type', `struct field', etc.? That would lose some
> typechecking, and some clarity. If space is the concern, I think I'd
> rather see both the environment entry and the symbol/field/etc. have
> `name' fields, that perhaps point to the same string.
There's a question of correctness, though. Suppose a type is imported
into a namespace - we don't want to create a new type for it, but we do
want to create a new name for it. I'm not sure what to do.
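One way to picture "a new name but not a new type", purely as a sketch: two environment entries with different names whose data pointers refer to the same type object. The entry layout follows the struct sketched earlier in the thread; type_desc, entry_type and the demo are hypothetical stand-ins, not GDB code.

```cpp
#include <cassert>
#include <cstring>

enum name_type { type_kind, field_kind, symbol_kind, namespace_kind };

// Hypothetical stand-in for the debugger's type object.
struct type_desc {
    const char *tag;
    unsigned length;
};

struct environment_entry {
    const char *name;
    enum name_type kind;
    void *data;
};

// Checked accessor: only entries tagged type_kind yield a type.
type_desc *entry_type(const environment_entry *e)
{
    return e->kind == type_kind ? static_cast<type_desc *>(e->data) : nullptr;
}

// Two names in two environments, one shared type object.
bool demo_two_names_one_type()
{
    type_desc t = {"struct t", 8};
    environment_entry in_A = {"t", type_kind, &t};  // original name
    environment_entry in_B = {"u", type_kind, &t};  // second name, same type
    environment_entry var = {"x", symbol_kind, nullptr};
    return entry_type(&in_A) != nullptr
        && entry_type(&in_A) == entry_type(&in_B)
        && std::strcmp(in_A.name, in_B.name) != 0
        && entry_type(&var) == nullptr;
}
```

Because the name lives in the entry rather than in the type, adding a name never requires copying the type.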
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-12 15:08 ` Jim Blandy
@ 2002-04-12 16:32 ` Daniel Jacobowitz
0 siblings, 0 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-12 16:32 UTC (permalink / raw)
To: gdb
On Fri, Apr 12, 2002 at 05:08:00PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > As much of a pain as they are, I recommend a CVS branch for this. Then
> > we can see how it comes together with some history and a little less
> > destabilization. We still need to know where we're going first, of
> > course.
>
> So you're suggesting that we do all the work first on a branch, and
> then once we've got that the way we want, we merge it piece-wise into
> the trunk?
Probably wisest.
> > > a) The symbol table stores names either way: with an explicit
> > > namespace tree, or with qualified names sitting directly in the
> > > symbol table. (When I say "namespace", please understand that to
> > > also include classes, etc.) Any given symbol is stored only one
> > > way or the other, but any given symbol table can hold a mix of
> > > symbols in each form. Symbols stored in the explicit tree would
> > > have a `fully_qualified_name' field, so symtab clients expecting
> > > to see fully qualified names would still get them.
> >
> > OK so far... we might want to take the path of least resistance, leave
> > the name fully qualified, and add an unqualified_name.
>
> Sure.
I think this is the way to go, then.
> > I like it. Who wants to start? :) We probably want to start with
> > interfaces, and then see where we need to go from there.
>
> If I write up a concrete proposal for this, I think my keepers will
> let me spend time on it. Sure, let's draft some interfaces.
Great!
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-09 20:56 ` Daniel Jacobowitz
@ 2002-04-12 15:08 ` Jim Blandy
2002-04-12 16:32 ` Daniel Jacobowitz
0 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-12 15:08 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb
Daniel Jacobowitz <drow@mvista.com> writes:
> As much of a pain as they are, I recommend a CVS branch for this. Then
> we can see how it comes together with some history and a little less
> destabilization. We still need to know where we're going first, of
> course.
So you're suggesting that we do all the work first on a branch, and
then once we've got that the way we want, we merge it piece-wise into
the trunk?
> > a) The symbol table stores names either way: with an explicit
> > namespace tree, or with qualified names sitting directly in the
> > symbol table. (When I say "namespace", please understand that to
> > also include classes, etc.) Any given symbol is stored only one
> > way or the other, but any given symbol table can hold a mix of
> > symbols in each form. Symbols stored in the explicit tree would
> > have a `fully_qualified_name' field, so symtab clients expecting
> > to see fully qualified names would still get them.
>
> OK so far... we might want to take the path of least resistance, leave
> the name fully qualified, and add an unqualified_name.
Sure.
> > b) The object representing a namespace keeps around the prefix it
> > corresponds to (`std::' or `A::B::' or whatever), so that lookups of
> > single name components relative to that namespace can find entries
> > stored in either form.
> >
> > c) For backwards compatibility, the symbol lookup function would check
> > for `::' in symbol names, and do a component-by-component lookup.
>
> We might also want to check for '.', as per Java (in existing gcj
> versions, at least).
Yes, that's true.
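The first step of such a component-by-component lookup might look roughly like this. It is a sketch only: split_qualified_name is a made-up helper, and it deliberately ignores cases such as template arguments that themselves contain `::`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a qualified name on "::" (C++) and "." (Java-style) separators.
std::vector<std::string> split_qualified_name(const std::string &name)
{
    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (name.compare(i, 2, "::") == 0) {
            parts.push_back(current);
            current.clear();
            ++i;  // skip the second ':'
        } else if (name[i] == '.') {
            parts.push_back(current);
            current.clear();
        } else {
            current += name[i];
        }
    }
    parts.push_back(current);  // last component
    return parts;
}
```

Each component would then be looked up relative to the namespace found for the previous one, so the same entry can be reached whether it was stored in the tree or under its fully qualified name.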
> > - Once the producers are all creating data in the new style, remove
> > support for it. Now you've got your new data structure, used as an
> > opaque datatype.
>
> And hopefully we'd reach this step, rather than being left with the
> mess in the middle.
Hopefully! :)
> I like it. Who wants to start? :) We probably want to start with
> interfaces, and then see where we need to go from there.
If I write up a concrete proposal for this, I think my keepers will
let me spend time on it. Sure, let's draft some interfaces.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-10 12:08 ` Daniel Jacobowitz
@ 2002-04-12 13:58 ` Jim Blandy
2002-04-12 16:56 ` Daniel Jacobowitz
0 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-12 13:58 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb
Daniel Jacobowitz <drow@mvista.com> writes:
> On Wed, Apr 10, 2002 at 12:31:27PM -0500, Jim Blandy wrote:
> > Daniel Jacobowitz <drow@mvista.com> writes:
> > > Sure. But I think this is a chance (if we want one) to move in a
> > > different direction. We'd have to work out the details, but I envision
> > > something like this (names made up as I go along):
> > >
> > > struct environment_entry {
> > >   const char *name;
> > >   enum name_type kind;
> > >   void *data;
> > > };
> > >
> > > enum name_type {
> > >   type_kind,
> > >   field_kind,
> > >   symbol_kind,
> > >   namespace_kind,
> > > };
> >
> > In other words, replace the sloppy union with a properly discriminated
> > union? I'm for it.
> >
> > But granted that it's important to clearly distinguish between the
> > expanding set of uses we're putting `struct symbol' to, and that
> > extending enum address_class isn't the best idea, how is it better to
> > make this change concurrently with the enclosing environment changes?
> > We could do this change right now. Isn't it basically independent?
>
> Well, no. I was suggesting this for things that are not currently in
> symbols (well, types generally are...). But namespaces are not
> represented at all and fields are in a different structure entirely.
Okay, I think I see. You're preserving the distinctions implicit in
the existing structures (fields and symbols are separate),
distinguishing types from symbols (i.e. an entry for a typedef would
be an environment_entry whose kind == type_kind, instead of a symbol
with an address class of LOC_TYPEDEF), and positing that namespaces
would be a fourth kind of thing. The `data' field would point to a
`struct type' or a `struct field', or whatever.
> Doing it for struct symbol would be a good idea, I think, but a better
> approach would be:
> - start the environments properly, using a new enum.
> - Separate out those things which need to be "different kinds of
> struct symbol", and keep the factoring at the environment level.
> - Look up environment entries, not struct symbol's. That way we can
> have a hope of keeping the right names attached to types, for
> instance.
By the last point here, are you suggesting that everyone hand around
pointers to `struct environment_entry' objects, rather than pointers
to `struct type', `struct field', etc.? That would lose some
typechecking, and some clarity. If space is the concern, I think I'd
rather see both the environment entry and the symbol/field/etc. have
`name' fields, that perhaps point to the same string.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-10 12:31 ` Daniel Berlin
@ 2002-04-10 12:53 ` Petr Sorfa
0 siblings, 0 replies; 37+ messages in thread
From: Petr Sorfa @ 2002-04-10 12:53 UTC (permalink / raw)
To: Daniel Berlin; +Cc: Jim Blandy, gdb, Benjamin Kosnik, Daniel Berlin
Hi Daniel,
> > Petr Sorfa <petrs@caldera.com> writes:
> > > I've implemented FORTRAN95 MODULE support which is essentially
> > > equivalent to namespaces (except you cannot have nested MODULEs.) I
> > > treat it internally as a static class. For scoping issues I simply add
> > > (in DWARF) the current local symbols to the MODULE to the local symbols
> > > of the PROGRAM, CONTAINS, SUBROUTINE and FUNCTION scopes. A similar kind
> > > of approach will allow nested C++ namespaces (flame bait comment.)
> >
> > I'm not sure I understand your implementation. (And I'm sure I don't
> > understand FORTRAN...) So, when some program construct imports a
> > module, you actually repeat the declarations for the imported module's
> > contents in the debug info for the importing construct?
> >
>
> And if so, isn't the memory usage absurd for large programs?
No. The compiler generates a DW_TAG_imported_declaration for module
contents which basically consists of DW_AT_import attributes that
provide dwarf refs to the actual module contents (and internal
structure.) I've extended support for these tags and attributes. It also
provides support for DW_FORM_ref_addr references outside of the current
compilation unit with external modules.
I guess I didn't explain it too well: it really adds the scope of the
module (as defined by DW_TAG_imported_declaration) to the current scope
hierarchy (rather than recreating the information as my initial
statement implies.)
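Read that way, the importing scope just holds a reference to the module's scope. A minimal sketch, assuming hypothetical names throughout (ScopeNode, is_visible, and the example symbols are invented, and no DWARF reading is shown):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical scope node: imported modules are referenced, never copied.
struct ScopeNode {
    std::string name;
    std::vector<std::string> symbols;
    std::vector<const ScopeNode *> imports;
    const ScopeNode *parent;
};

static bool declares(const ScopeNode &s, const std::string &sym)
{
    return std::find(s.symbols.begin(), s.symbols.end(), sym) != s.symbols.end();
}

// Walk outward through enclosing scopes, consulting each scope's imports.
bool is_visible(const ScopeNode *scope, const std::string &sym)
{
    for (; scope != nullptr; scope = scope->parent) {
        if (declares(*scope, sym))
            return true;
        for (const ScopeNode *m : scope->imports)
            if (declares(*m, sym))
                return true;
    }
    return false;
}

// Demonstration: a subroutine inside a program, importing one module.
bool demo_visible(const std::string &sym)
{
    ScopeNode mod{"constants", {"pi", "e"}, {}, nullptr};
    ScopeNode program{"main", {"n"}, {}, nullptr};
    ScopeNode sub{"area", {"r"}, {&mod}, &program};
    return is_visible(&sub, sym);
}
```

Since only a pointer is stored per import, memory use grows with the number of import statements rather than with the size of the imported modules.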
Petr
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-10 10:34 ` Jim Blandy
@ 2002-04-10 12:31 ` Daniel Berlin
2002-04-10 12:53 ` Petr Sorfa
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Berlin @ 2002-04-10 12:31 UTC (permalink / raw)
To: Jim Blandy; +Cc: Petr Sorfa, gdb, Benjamin Kosnik, Daniel Berlin
On 10 Apr 2002, Jim Blandy wrote:
>
> Petr Sorfa <petrs@caldera.com> writes:
> > I've implemented FORTRAN95 MODULE support which is essentially
> > equivalent to namespaces (except you cannot have nested MODULEs.) I
> > treat it internally as a static class. For scoping issues I simply add
> > (in DWARF) the current local symbols to the MODULE to the local symbols
> > of the PROGRAM, CONTAINS, SUBROUTINE and FUNCTION scopes. A similar kind
> > of approach will allow nested C++ namespaces (flame bait comment.)
>
> I'm not sure I understand your implementation. (And I'm sure I don't
> understand FORTRAN...) So, when some program construct imports a
> module, you actually repeat the declarations for the imported module's
> contents in the debug info for the importing construct?
>
And if so, isn't the memory usage absurd for large programs?
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-10 10:31 ` Jim Blandy
@ 2002-04-10 12:08 ` Daniel Jacobowitz
2002-04-12 13:58 ` Jim Blandy
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-10 12:08 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb
On Wed, Apr 10, 2002 at 12:31:27PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > Sure. But I think this is a chance (if we want one) to move in a
> > different direction. We'd have to work out the details, but I envision
> > something like this (names made up as I go along):
> >
> > struct environment_entry {
> >   const char *name;
> >   enum name_type kind;
> >   void *data;
> > };
> >
> > enum name_type {
> >   type_kind,
> >   field_kind,
> >   symbol_kind,
> >   namespace_kind,
> > };
>
> In other words, replace the sloppy union with a properly discriminated
> union? I'm for it.
>
> But granted that it's important to clearly distinguish between the
> expanding set of uses we're putting `struct symbol' to, and that
> extending enum address_class isn't the best idea, how is it better to
> make this change concurrently with the enclosing environment changes?
> We could do this change right now. Isn't it basically independent?
Well, no. I was suggesting this for things that are not currently in
symbols (well, types generally are...). But namespaces are not
represented at all and fields are in a different structure entirely.
Doing it for struct symbol would be a good idea, I think, but a better
approach would be:
- start the environments properly, using a new enum.
- Separate out those things which need to be "different kinds of
struct symbol", and keep the factoring at the environment level.
- Look up environment entries, not struct symbol's. That way we can
have a hope of keeping the right names attached to types, for
instance.
> Getting too technical for this point in the discussion: I like doing
> subclassing of structs in C like this:
>
> struct environment_entry {
>   const char *name;
>   enum name_type kind;
> };
>
> struct field_entry {
>   struct environment_entry env;
>   enum field_visibility visibility;
>   struct type *type;
>   ...
> };
>
> Since C guarantees that a pointer to a struct can be safely converted
> to a pointer to its first member and back, this is okay. And while
> going from superclass to subclass still isn't typesafe, going from
> subclass to superclass is. (The down-casting should be hidden in a
> function which also checks the tag.)
>
> But this is just bikeshedding. I like your basic idea, however one
> implements it.
I actually have a significant gripe with this technique. If we're
going to do it, we should use accessor functions (inline or macroized,
please...) in both directions. It's very confusing when you see such
a thing to have to go check the definition - "is that the first member?
Is this reversible?"
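A minimal sketch of what accessors in both directions might look like, assuming the struct layout quoted above. The function names and the `bit_offset' member are hypothetical, made up for illustration; nothing here is existing GDB code.

```c
#include <assert.h>
#include <stddef.h>

enum name_type { type_kind, field_kind, symbol_kind, namespace_kind };

/* "Superclass": the common header shared by every kind of entry.  */
struct environment_entry {
  const char *name;
  enum name_type kind;
};

/* "Subclass": must keep the superclass as its first member.  */
struct field_entry {
  struct environment_entry env;
  int bit_offset;               /* hypothetical payload, illustration only */
};

/* Up-cast.  Always safe, and independent of where `env' sits.  */
static inline struct environment_entry *
field_entry_as_env (struct field_entry *f)
{
  return &f->env;
}

/* Down-cast.  Checks the tag and returns NULL on a mismatch, so callers
   never have to look at the struct definition to know the cast is valid.  */
static inline struct field_entry *
env_as_field_entry (struct environment_entry *e)
{
  /* The cast below is only valid while `env' is the first member.  */
  assert (offsetof (struct field_entry, env) == 0);

  if (e == NULL || e->kind != field_kind)
    return NULL;
  return (struct field_entry *) e;
}
```

With both directions wrapped, the "is that the first member?" question is answered in one place rather than at every cast.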
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-09 6:55 ` Petr Sorfa
@ 2002-04-10 10:34 ` Jim Blandy
2002-04-10 12:31 ` Daniel Berlin
0 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-10 10:34 UTC (permalink / raw)
To: Petr Sorfa; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Petr Sorfa <petrs@caldera.com> writes:
> I've implemented FORTRAN95 MODULE support which is essentially
> equivalent to namespaces (except you cannot have nested MODULEs.) I
> treat it internally as a static class. For scoping issues I simply add
> (in DWARF) the symbols currently local to the MODULE to the local symbols
> of the PROGRAM, CONTAINS, SUBROUTINE and FUNCTION scopes. A similar kind
> of approach will allow nested C++ namespaces (flame bait comment.)
I'm not sure I understand your implementation. (And I'm sure I don't
understand FORTRAN...) So, when some program construct imports a
module, you actually repeat the declarations for the imported module's
contents in the debug info for the importing construct?
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-08 18:49 ` Daniel Jacobowitz
@ 2002-04-10 10:31 ` Jim Blandy
2002-04-10 12:08 ` Daniel Jacobowitz
0 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-10 10:31 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Daniel Jacobowitz <drow@mvista.com> writes:
> Sure. But I think this is a chance (if we want one) to move in a
> different direction. We'd have to work out the details, but I envision
> something like this (names made up as I go along):
>
>   struct environment_entry {
>     const char *name;
>     enum name_type kind;
>     void *data;
>   };
>
>   enum name_type {
>     type_kind,
>     field_kind,
>     symbol_kind,
>     namespace_kind,
>   };
In other words, replace the sloppy union with a properly discriminated
union? I'm for it.
But granted that it's important to clearly distinguish between the
expanding set of uses we're putting `struct symbol' to, and that
extending enum address_class isn't the best idea, how is it better to
make this change concurrently with the enclosing environment changes?
We could do this change right now. Isn't it basically independent?
Getting too technical for this point in the discussion: I like doing
subclassing of structs in C like this:
    struct environment_entry {
      const char *name;
      enum name_type kind;
    };
    struct field_entry {
      struct environment_entry env;
      enum field_visibility visibility;
      struct type *type;
      ...
    };
Since C guarantees that a pointer to a struct can be safely converted
to a pointer to its first member and back, this is okay. And while
going from superclass to subclass still isn't typesafe, going from
subclass to superclass is. (The down-casting should be hidden in a
function which also checks the tag.)
But this is just bikeshedding. I like your basic idea, however one
implements it.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-09 18:35 ` Jim Blandy
@ 2002-04-09 20:56 ` Daniel Jacobowitz
2002-04-12 15:08 ` Jim Blandy
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-09 20:56 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb
I'm going to answer out-of-order a little bit, so pardon the
cut-n-paste butchery.
> I feel like we're planning a construction project like Boston's Big
> Dig or something: everything's got to keep running while we do the
> work, and things will get pretty ugly in there for a while, but we
> hope (if our funding doesn't run out) that in the end it'll all be
> beautiful.
>
As much of a pain as they are, I recommend a CVS branch for this. Then
we can see how it comes together with some history and a little less
destabilization. We still need to know where we're going first, of
course.
On Tue, Apr 09, 2002 at 08:35:52PM -0500, Jim Blandy wrote:
>
> Ah, this is exactly the kind of debate I was looking for. :)
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > But let me put my cynic's cap on for a moment and point out some
> > problems. I'd love to see us just decide to overcome them all, and I
> > think it's viable, but we need to make sure we consider them first.
> >
> > The "incremental change" Problem
>
> This I especially suck at. So folks, please insist on getting an
> answer you're really comfortable with.
>
> Here's one way we could approach it:
>
> - First, we could simply replace the `nsyms' and `syms' members of
> `struct block' with a reference to our opaque `environment' object.
> There isn't that much code which works on those directly, so this
> wouldn't be too bad.
And most of it will be very easy to find, because I went through all of
it quite recently. Remember my tries last October or so to change syms
from a list to a hash table? I've still got all that code, and while
none of it went in, at least I cleaned up every access to those members
I could find. Everything that iterates over them (with perhaps one
exception in the COFF code, IIRC) uses ALL_BLOCK_SYMBOLS. All accesses
go through the proper macros.
> - Next, we could replace the static and global blocks with `environment'
> objects, too.
>
> At this point, our environment object would be known to work. *ahem*
>
> - Make accessing a symbol's name go through an accessor function. It
> goes through a macro already, but we'd have to make sure it *always*
> goes through the macro. (Renaming the structure member and tweaking
> the macro accordingly would help us find code which doesn't go
> through the macro.)
>
> - Then we could go through an intermediate phase where things worked
> like this:
>
> a) The symbol table stores names either way: with an explicit
> namespace tree, or with qualified names sitting directly in the
> symbol table. (When I say "namespace", please understand that to
> also include classes, etc.) Any given symbol is stored only one
> way or the other, but any given symbol table can hold a mix of
> symbols in each form. Symbols stored in the explicit tree would
> have a `fully_qualified_name' field, so symtab clients expecting
> to see fully qualified names would still get them.
OK so far... we might want to take the path of least resistance, leave
the name fully qualified, and add an unqualified_name.
> b) The object representing a namespace keeps around the prefix it
> corresponds to (`std::' or `A::B::' or whatever), so that lookups of
> single name components relative to that namespace can find entries
> stored in either form.
>
> c) For backwards compatibility, the symbol lookup function would check
> for `::' in symbol names, and do a component-by-component lookup.
We might also want to check for '.', as per Java (in existing gcj
versions, at least).
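As a rough sketch of the component splitting that step c) would need, here is one way to peel the first component off a qualified name while accepting both `::' and `.' as separators. The function names are hypothetical, and the code deliberately ignores template arguments, operator names, and anything else where those characters appear inside a single component.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the first component of NAME and set *REST to the
   start of the next component, or to NULL if this was the last one.
   Both "::" (C++) and "." (Java) are treated as separators.  */
static size_t
first_component (const char *name, const char **rest)
{
  size_t i;

  assert (name != NULL && rest != NULL);

  for (i = 0; name[i] != '\0'; i++)
    {
      if (name[i] == ':' && name[i + 1] == ':')
        {
          *rest = name + i + 2;
          return i;
        }
      if (name[i] == '.')
        {
          *rest = name + i + 1;
          return i;
        }
    }

  *rest = NULL;
  return i;
}

/* Count the components in a qualified name by repeated splitting.  */
static int
count_components (const char *name)
{
  int n = 0;

  while (name != NULL)
    {
      first_component (name, &name);
      n++;
    }
  return n;
}
```

A lookup function would call first_component in a loop, resolving each component in the environment found by the previous step.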
> Then, we could gradually do the following (some of these are
> interdependent, some not):
>
> - Change symbol table clients to call a function to print a symbol's
> qualified name relative to the current scope, rather than expecting
> to see a fully qualified name in the symbol structure itself. This
> would make b) unnecessary.
>
> - Change symbol table clients to do lookups one component at a time,
> making c) unnecessary.
>
> - Change symbol table readers to build explicit namespace trees,
> rather than dumping qualified names into the symbol table. This
> would make a) unnecessary.
>
> Now we've got symbol lookups switched over. Given the new
> representation, we can implement namespaces in a straightforward way.
>
> But what about structs? I don't have enough of a grasp on how data
> members, member functions, static members, etc. really work now to say
> how we'd switch struct types over to the new representation, but it
> seems like the same general approach should work:
>
> - Gradually replace code which manipulates the type structures
> directly with simple accessor functions, until the type can be made
> opaque.
They mostly are, already.
> - Switch to an intermediate representation which allows both the old
> and the new representations, mixed.
Not for a given type, certainly. So not quite the same as with
symbols; a type would be one or the other.
> - Migrate clients and producers over to the newer interfaces. This is
> now a set of independent changes, that can be done in any order.
>
> - Once the producers are all creating data in the new style, remove
> support for the old style. Now you've got your new data structure, used as an
> opaque datatype.
And hopefully we'd reach this step, rather than being left with the
mess in the middle.
I like it. Who wants to start? :) We probably want to start with
interfaces, and then see where we need to go from there.
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-08 18:59 ` Daniel Jacobowitz
@ 2002-04-09 18:35 ` Jim Blandy
2002-04-09 20:56 ` Daniel Jacobowitz
0 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-09 18:35 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Ah, this is exactly the kind of debate I was looking for. :)
Daniel Jacobowitz <drow@mvista.com> writes:
> But let me put my cynic's cap on for a moment and point out some
> problems. I'd love to see us just decide to overcome them all, and I
> think it's viable, but we need to make sure we consider them first.
>
> The "incremental change" Problem
This I especially suck at. So folks, please insist on getting an
answer you're really comfortable with.
Here's one way we could approach it:
- First, we could simply replace the `nsyms' and `syms' members of
`struct block' with a reference to our opaque `environment' object.
There isn't that much code which works on those directly, so this
wouldn't be too bad.
- Next, we could replace the static and global blocks with `environment'
objects, too.
At this point, our environment object would be known to work. *ahem*
- Make accessing a symbol's name go through an accessor function. It
goes through a macro already, but we'd have to make sure it *always*
goes through the macro. (Renaming the structure member and tweaking
the macro accordingly would help us find code which doesn't go
through the macro.)
- Then we could go through an intermediate phase where things worked
like this:
a) The symbol table stores names either way: with an explicit
namespace tree, or with qualified names sitting directly in the
symbol table. (When I say "namespace", please understand that to
also include classes, etc.) Any given symbol is stored only one
way or the other, but any given symbol table can hold a mix of
symbols in each form. Symbols stored in the explicit tree would
have a `fully_qualified_name' field, so symtab clients expecting
to see fully qualified names would still get them.
b) The object representing a namespace keeps around the prefix it
corresponds to (`std::' or `A::B::' or whatever), so that lookups of
single name components relative to that namespace can find entries
stored in either form.
c) For backwards compatibility, the symbol lookup function would check
for `::' in symbol names, and do a component-by-component lookup.
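To make a) and b) concrete, here is a minimal sketch of a single-component lookup that finds a symbol whichever way it is stored. It assumes a namespace object that holds its prefix, a table of new-style unqualified members, and a reference to the old-style flat table; all the type and function names are hypothetical, and linear search stands in for whatever indexing the real table would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sym { const char *name; int value; };

/* Hypothetical namespace object for the transition period.  */
struct ns_env {
  const char *prefix;           /* e.g. "A::B::"; "" for the global scope */
  const struct sym *members;    /* new style: unqualified names */
  size_t n_members;
  const struct sym *flat;       /* old style: fully qualified names */
  size_t n_flat;
};

static const struct sym *
find_by_name (const struct sym *tab, size_t n, const char *name)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return &tab[i];
  return NULL;
}

/* Look up one name component relative to NS.  Try the explicit tree
   first, then fall back to PREFIX + COMPONENT in the flat table.  */
static const struct sym *
ns_lookup (const struct ns_env *ns, const char *component)
{
  char qualified[256];
  const struct sym *s;
  int len;

  assert (strstr (component, "::") == NULL);    /* one component only */

  s = find_by_name (ns->members, ns->n_members, component);
  if (s != NULL)
    return s;

  len = snprintf (qualified, sizeof qualified, "%s%s", ns->prefix, component);
  if (len < 0 || (size_t) len >= sizeof qualified)
    return NULL;                /* name too long for this sketch */
  return find_by_name (ns->flat, ns->n_flat, qualified);
}
```

Once every reader builds explicit trees, the fallback branch and the prefix member can simply be deleted.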
Then, we could gradually do the following (some of these are
interdependent, some not):
- Change symbol table clients to call a function to print a symbol's
qualified name relative to the current scope, rather than expecting
to see a fully qualified name in the symbol structure itself. This
would make b) unnecessary.
- Change symbol table clients to do lookups one component at a time,
making c) unnecessary.
- Change symbol table readers to build explicit namespace trees,
rather than dumping qualified names into the symbol table. This
would make a) unnecessary.
Now we've got symbol lookups switched over. Given the new
representation, we can implement namespaces in a straightforward way.
But what about structs? I don't have enough of a grasp on how data
members, member functions, static members, etc. really work now to say
how we'd switch struct types over to the new representation, but it
seems like the same general approach should work:
- Gradually replace code which manipulates the type structures
directly with simple accessor functions, until the type can be made
opaque.
- Switch to an intermediate representation which allows both the old
and the new representations, mixed.
- Migrate clients and producers over to the newer interfaces. This is
now a set of independent changes, that can be done in any order.
- Once the producers are all creating data in the new style, remove
support for the old style. Now you've got your new data structure, used as an
opaque datatype.
I feel like we're planning a construction project like Boston's Big
Dig or something: everything's got to keep running while we do the
work, and things will get pretty ugly in there for a while, but we
hope (if our funding doesn't run out) that in the end it'll all be
beautiful.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-08 16:29 ` Jim Blandy
2002-04-08 16:48 ` Daniel Jacobowitz
@ 2002-04-09 6:55 ` Petr Sorfa
2002-04-10 10:34 ` Jim Blandy
1 sibling, 1 reply; 37+ messages in thread
From: Petr Sorfa @ 2002-04-09 6:55 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Hi,
I've implemented FORTRAN95 MODULE support which is essentially
equivalent to namespaces (except you cannot have nested MODULEs.) I
treat it internally as a static class. For scoping issues I simply add
(in DWARF) the symbols currently local to the MODULE to the local symbols
of the PROGRAM, CONTAINS, SUBROUTINE and FUNCTION scopes. A similar kind
of approach will allow nested C++ namespaces (flame bait comment.)
Petr
> Jim Blandy <jimb@redhat.com> writes:
> > As a half-baked idea, perhaps a `struct environment' object would have
> > a list of other `struct environment' objects you should search, too,
> > if you didn't find what you were looking for. We could use this to
> > search a compound statement's enclosing scopes to find bindings
> > further out, find members inherited from base classes, and resolve C++
> > `using' declarations.
>
> From the discussion, it's pretty clear that this idea is, indeed,
> half-baked. While the general idea of "stuff from over there is
> visible here, too" does recur in the different contexts, there are so
> many subtle differences in exactly what it means that I'm
> uncomfortable having generic code try to handle it. I have the
> feeling that it would become populated with "if we're doing C++
> inheritance, do this; but if we're stepping out to an enclosing
> compound statement, do this; ..." garbage. It's better to let the
> context implement the right semantics itself.
>
> However, it would be possible, at least, to provide generic code to do
> lookups within a single environment. We could conceal symbol table
> indexing techniques behind this interface (linear search for
> environments binding few identifiers, as compound statements often
> are; hash tables for big environments; and so on), which would allow
> us to change the representation without breaking the consumers
> (... but maybe skip lists would be fine for all the above).
>
> We could then use that to write code for more specific cases:
>
> - The code that looks up member names in a struct type (for example)
> would call this generic code to search the immediate struct's
> members, and then recurse on the struct's base classes, making the
> appropriate adjustments (qualifying names, adjusting the base
> address, and so on).
>
> - The code that searches compound statement scopes, from the innermost
> enclosing statement out (eventually) to the global scope, would know
> that inner declarations simply shadow outer declarations, rather
> than introducing ambiguities (as inheritance does). If GDB were to
> support nested functions, some steps outward might note that a
> static link needs to be traversed.
>
> And so on. The generic code would only search one level; deeper
> searches would be left to code that knows how they're supposed to
> behave.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-08 17:03 ` Jim Blandy
@ 2002-04-08 18:59 ` Daniel Jacobowitz
2002-04-09 18:35 ` Jim Blandy
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-08 18:59 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
On Mon, Apr 08, 2002 at 07:02:58PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > GDB already does a great deal of this by the very simple method of
> > using fully qualified names. It's served us remarkably well, although
> > of course we're hitting its limits now. But let's not be too quick to
> > discard that approach, for the present at least.
>
> Oh dear. I think that's exactly what I'm proposing we replace. As
> Per said:
> In that situation, we can look up things like `B2::A2::x' in a
> straightforward way: look up B2, look up A2 there, and look up x
> there. With a symbol table full of fully qualified names, how do we
> do this?
>
> Can you talk more about why we shouldn't evolve away from placing
> fully qualified names in the symbol table directly, and towards
> representing them more explicitly?
Let me be a little clearer on what I wanted to say there: I agree
completely that your approach is cleaner, more efficient, more
adaptable, more useful, and generally better than what we have now.
But let me put my cynic's cap on for a moment and point out some
problems. I'd love to see us just decide to overcome them all, and I
think it's viable, but we need to make sure we consider them first.
The "incremental change" Problem
Could we write the necessary wrapper functions to simulate lookup of
a fully qualified name and use them everywhere we haven't changed yet?
Probably a non-issue. How much will have to be changed at first to
make this functional? For instance, all the debug readers would be
substantially affected.
The "big change" Problem
There's plenty of places in GDB that assume the current name lookup
behavior. We'd have to do a lot of digging to make sure we did scoped
lookups everywhere that needed them. On the upside, we could use this
as an excuse for some really rocking test coverage improvements.
The "debug info" Problem
DWARF-2 gives us enough information for this. Sun's stabs extensions
(well, recent definitions; it being their format and all) also give us
enough information. HP's stuff probably does, but I don't know if
we've got anyone that knows it well enough to update. For most other
formats, we'd just have to fake it as well as we could and hope for the
best. Look at the debugging output GCC 3.0 / -gstabs+ puts out for
namespaces or even nested classes sometime.
I'm sure I had another in mind, but it slips my mind at the moment.
For debug info, we could probably add an `unknown_scope_kind' to the
enum name_type I described in my other message, and change all
periods/colons etc. into those. For free, this hierarchy would let us
associate out-of-line member function definitions directly with the
types, which would make my life in C++-land much nicer!
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-08 17:19 ` Jim Blandy
@ 2002-04-08 18:49 ` Daniel Jacobowitz
2002-04-10 10:31 ` Jim Blandy
0 siblings, 1 reply; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-08 18:49 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
On Mon, Apr 08, 2002 at 07:19:14PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz <drow@mvista.com> writes:
> > How about -containing- `struct fields', instead of replacing? i.e. let
> > the name search happen in the `struct environment', as before, but the
> > data items would be fields (could be indicated in a flag in the
> > environment, with a pointer to the type or symbol for the enclosing
> > structure). I don't think turning members into symbols is a good idea.
>
> I admit the idea of using `struct symbol' for fields as well as
> variables is pretty weird. Here's the rationale:
>
> First, keep in mind that `struct symbol' is sort of a `messy union':
> it's used for a lot of distinct purposes, and it contains all the
> members any of those purposes might need. The `struct symbol'
> representing a declaration like `struct A' doesn't need its
> ginfo.value field. The `struct symbol' representing a local variable
> doesn't need its `bfd_section' field. (I'm not saying this is a great
> way to do things; but it is the way it's done now.)
Sure. But I think this is a chance (if we want one) to move in a
different direction. We'd have to work out the details, but I envision
something like this (names made up as I go along):
    struct environment_entry {
      const char *name;
      enum name_type kind;
      void *data;
    };
    enum name_type {
      type_kind,
      field_kind,
      symbol_kind,
      namespace_kind,
    };
There's some duplication between symbol_kind and field_kind that we'd
need to sort out, and there are probably other useful kinds. There could
even be a subtype in there if we found it convenient. For instance:
    /* For classes. */
    enum field_kind {
      normal_field,
      static_field,
      base_class, /* Probably not this one, actually. Better kept in the type? */
      member_function,
      static_member_fn, /* Might or might not be necessary. */
    };
Then we don't need to have a dummy symbol for every type, etc. It
would keep searching a little more straightforward. IMVHO.
There's plenty of details I haven't thought about how to fit in. Like,
the protections on a member type. Perhaps the environment really
should dictate a structure for every member of the environment,
opaquely. We have some wiggle room here.
There's also a question of how many "kinds" of lookup situations we
have - implicit vs. explicit members, for instance. Some inherit one
way, some another, so we'd need to have separate lookup functions.
Categorizing by an enum is all good. Reusing address classes, though,
I'm more suspicious of.
Does that make sense?
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-06 9:26 ` Gianni Mariani
2002-04-06 11:57 ` Daniel Berlin
@ 2002-04-08 17:24 ` Jim Blandy
1 sibling, 0 replies; 37+ messages in thread
From: Jim Blandy @ 2002-04-08 17:24 UTC (permalink / raw)
To: Gianni Mariani; +Cc: Daniel Berlin, Daniel Jacobowitz, gdb, Benjamin Kosnik
Gianni Mariani <gianni@mariani.ws> writes:
> Much of what is discussed here is language and compiler specific. My
> generic approach to solving this kind of problem is to provide an
> abstraction layer where all the facilities are provided for in a API
> (abstract base interface class); the mapping is then language and
> compiler specific. The burden is then on the compiler writer to
> provide the symbol binding mechanism/implementation which is where it
> belongs.
Well, the rules for identifier lookup are part of the language.
They're not compiler-specific, or else the meaning of your programs
would be, too. (That does happen, but it's not generally regarded as
desirable.)
The rules for the correspondence between machine-level objects (bits,
bytes, registers) and source-level objects (variables, functions)
aren't really compiler-specific either: they're given by the ABI.
Many different compilers can (try to) share the same ABI.
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 22:34 ` Daniel Jacobowitz
2002-04-05 23:49 ` Daniel Berlin
2002-04-08 17:03 ` Jim Blandy
@ 2002-04-08 17:19 ` Jim Blandy
2002-04-08 18:49 ` Daniel Jacobowitz
2 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-08 17:19 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Daniel Jacobowitz <drow@mvista.com> writes:
> How about -containing- `struct fields', instead of replacing? i.e. let
> the name search happen in the `struct environment', as before, but the
> data items would be fields (could be indicated in a flag in the
> environment, with a pointer to the type or symbol for the enclosing
> structure). I don't think turning members into symbols is a good idea.
I admit the idea of using `struct symbol' for fields as well as
variables is pretty weird. Here's the rationale:
First, keep in mind that `struct symbol' is sort of a `messy union':
it's used for a lot of distinct purposes, and it contains all the
members any of those purposes might need. The `struct symbol'
representing a declaration like `struct A' doesn't need its
ginfo.value field. The `struct symbol' representing a local variable
doesn't need its `bfd_section' field. (I'm not saying this is a great
way to do things; but it is the way it's done now.)
Now, when we're debugging a C++ program, if we have a class A, think
about what sorts of objects A::x could represent:
- It could be a member.
- It could be a static member, which is really a global variable
with a qualified name.
- It could be a typedef.
- It could be a nested class.
When the user says `ptype A::x', we should be able to just look up A,
then look up x in A's environment, and see what it is.
It needs to have an `enum address_class' to distinguish members from
typedefs.
If it's a static member, it'll need to have a bfd_section.
`struct field' is slowly acquiring the equivalent of `enum
address_class', but badly: here's the comment for the `bitsize'
member:
    /* Size of this field, in bits, or zero if not packed.
       For an unpacked field, the field's type's length
       says how many bytes the field occupies.
       A value of -1 or -2 indicates a static field; -1 means the location
       is specified by the label loc.physname; -2 means that loc.physaddr
       specifies the actual address. */
    int bitsize;
How would you suggest we represent nested typedefs and classes?
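For illustration, the `bitsize' convention described in that comment could be decoded into an explicit tag along these lines. The enum and function names are hypothetical and only the four cases named in the comment are handled; values below -2 are assumed never to occur.

```c
#include <assert.h>

/* Explicit names for the cases that the `bitsize' member encodes.  */
enum field_location_kind {
  FIELD_UNPACKED,               /* bitsize == 0: size comes from the type */
  FIELD_PACKED,                 /* bitsize > 0: size in bits */
  FIELD_STATIC_PHYSNAME,        /* bitsize == -1: located via loc.physname */
  FIELD_STATIC_PHYSADDR         /* bitsize == -2: located via loc.physaddr */
};

static enum field_location_kind
classify_bitsize (int bitsize)
{
  /* The comment defines no meaning for values below -2.  */
  assert (bitsize >= -2);

  if (bitsize == -1)
    return FIELD_STATIC_PHYSNAME;
  if (bitsize == -2)
    return FIELD_STATIC_PHYSADDR;
  return bitsize == 0 ? FIELD_UNPACKED : FIELD_PACKED;
}

/* Nonzero if BITSIZE marks a static member rather than a size.  */
static int
field_is_static (int bitsize)
{
  enum field_location_kind k = classify_bitsize (bitsize);

  return k == FIELD_STATIC_PHYSNAME || k == FIELD_STATIC_PHYSADDR;
}
```

Having to write such a decoder at all is the point: the size member is doing double duty as a kind tag, which is what an explicit `enum address_class' (or similar) would avoid.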
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 22:34 ` Daniel Jacobowitz
2002-04-05 23:49 ` Daniel Berlin
@ 2002-04-08 17:03 ` Jim Blandy
2002-04-08 18:59 ` Daniel Jacobowitz
2002-04-08 17:19 ` Jim Blandy
2 siblings, 1 reply; 37+ messages in thread
From: Jim Blandy @ 2002-04-08 17:03 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Daniel Jacobowitz <drow@mvista.com> writes:
> GDB already does a great deal of this by the very simple method of
> using fully qualified names. It's served us remarkably well, although
> of course we're hitting its limits now. But let's not be too quick to
> discard that approach, for the present at least.
Oh dear. I think that's exactly what I'm proposing we replace. As
Per said:
Per Bothner <per@bothner.com> writes:
> Nothing much to add, except that namespace support is even
> more critical for Java, in which *all* code uses namespaces
> (aka "packages"). We kludge around it, by treating a compound
> name like 'java.lang.Object' as a single name, but this doesn't
> work all that well, especially with mixed Java/C++ code.
It just seems natural to represent something like:
    namespace A
    {
      int x;
    };
as a binding in the top-level environment for the symbol A, saying
it's a namespace. That binding would have another environment object
representing the contents of A; there we'd find a binding for 'x',
saying it's a global variable of type `int'. Presented with something
like `A::x', we'd first look up `A', check that it is a namespace or
struct or something else that can qualify names, and then look in A's
environment for `x'.
Think about how namespace aliases need to work:
    namespace A
    {
      int x;
    };
    int y;
    namespace B
    {
      namespace A2 = A;
    }
    int
    foo (int y)
    {
      namespace B2 = B;
      return B2::A2::x + y;
    }
The `namespace A2 = A' declaration in `namespace B' really should
create a new entry in B's environment which binds A2 to the same
namespace object A is bound to. The `namespace B2 = B' declaration
should modify foo's local variable environment to bind B2 to the same
namespace object B is bound to.
In that situation, we can look up things like `B2::A2::x' in a
straightforward way: look up B2, look up A2 there, and look up x
there. With a symbol table full of fully qualified names, how do we
do this?
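The component-by-component lookup described above can be sketched in C roughly as follows. This assumes each namespace binding points at an environment object, so that an alias is just a second binding to the same object. All names are hypothetical, and finding the first component in enclosing scopes is left out: the search starts in whichever environment the caller passes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum binding_kind { variable_kind, namespace_kind };

struct environment;

struct binding {
  const char *name;
  enum binding_kind kind;
  struct environment *ns;       /* namespace_kind: the namespace's contents */
  int *var;                     /* variable_kind: the variable's storage */
};

struct environment {
  struct binding *bindings;
  size_t n;
};

/* Search a single environment for a name of length LEN.  */
static struct binding *
env_lookup (struct environment *env, const char *name, size_t len)
{
  size_t i;

  for (i = 0; i < env->n; i++)
    if (strlen (env->bindings[i].name) == len
        && strncmp (env->bindings[i].name, name, len) == 0)
      return &env->bindings[i];
  return NULL;
}

/* Resolve a "::"-qualified name, one component at a time, from ENV.  */
static struct binding *
lookup_qualified (struct environment *env, const char *name)
{
  for (;;)
    {
      const char *sep = strstr (name, "::");
      size_t len = sep ? (size_t) (sep - name) : strlen (name);
      struct binding *b = env_lookup (env, name, len);

      if (b == NULL || sep == NULL)
        return b;               /* not found, or last component reached */
      if (b->kind != namespace_kind)
        return NULL;            /* only namespaces qualify names here */
      assert (b->ns != NULL);
      env = b->ns;
      name = sep + 2;
    }
}
```

Because `A2' and `A' would both point at the same environment object, `B2::A2::x' resolves to the very same binding as `A::x' without any name rewriting.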
Can you talk more about why we shouldn't evolve away from placing
fully qualified names in the symbol table directly, and towards
representing them more explicitly?
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-08 16:29 ` Jim Blandy
@ 2002-04-08 16:48 ` Daniel Jacobowitz
2002-04-09 6:55 ` Petr Sorfa
1 sibling, 0 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-08 16:48 UTC (permalink / raw)
To: gdb
On Mon, Apr 08, 2002 at 06:29:37PM -0500, Jim Blandy wrote:
>
> Jim Blandy <jimb@redhat.com> writes:
> > As a half-baked idea, perhaps a `struct environment' object would have
> > a list of other `struct environment' objects you should search, too,
> > if you didn't find what you were looking for. We could use this to
> > search a compound statement's enclosing scopes to find bindings
> > further out, find members inherited from base classes, and resolve C++
> > `using' declarations.
>
> From the discussion, it's pretty clear that this idea is, indeed,
> half-baked. While the general idea of "stuff from over there is
> visible here, too" does recur in the different contexts, there are so
> many subtle differences in exactly what it means that I'm
> uncomfortable having generic code try to handle it. I have the
> feeling that it would become populated with "if we're doing C++
> inheritance, do this; but if we're stepping out to an enclosing
> compound statement, do this; ..." garbage. It's better to let the
> context implement the right semantics itself.
>
> However, it would be possible, at least, to provide generic code to do
> lookups within a single environment. We could conceal symbol table
> indexing techniques behind this interface (linear search for
> environments binding few identifiers, as compound statements often
> are; hash tables for big environments; and so on), which would allow
> us to change the representation without breaking the consumers
> (... but maybe skip lists would be fine for all the above).
>
> We could then use that to write code for more specific cases:
Completely agree! I like the look of this. The specific code to
search a given level could call the general code "search me" and then
recurse on its parents/outer wrappers/base classes/whatever.
> - The code that looks up member names in a struct type (for example)
> would call this generic code to search the immediate struct's
> members, and then recurse on the struct's base classes, making the
> appropriate adjustments (qualifying names, adjusting the base
> address, and so on).
>
> - The code that searches compound statement scopes, from the innermost
> enclosing statement out (eventually) to the global scope, would know
> that inner declarations simply shadow outer declarations, rather
> than introducing ambiguities (as inheritance does). If GDB were to
> support nested functions, some steps outward might note that a
> static link needs to be traversed.
>
> And so on. The generic code would only search one level; deeper
> searches would be left to code that knows how they're supposed to
> behave.
>
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 20:42 Jim Blandy
` (3 preceding siblings ...)
2002-04-06 8:49 ` Per Bothner
@ 2002-04-08 16:29 ` Jim Blandy
2002-04-08 16:48 ` Daniel Jacobowitz
2002-04-09 6:55 ` Petr Sorfa
4 siblings, 2 replies; 37+ messages in thread
From: Jim Blandy @ 2002-04-08 16:29 UTC (permalink / raw)
To: gdb; +Cc: Benjamin Kosnik, Daniel Berlin
Jim Blandy <jimb@redhat.com> writes:
> As a half-baked idea, perhaps a `struct environment' object would have
> a list of other `struct environment' objects you should search, too,
> if you didn't find what you were looking for. We could use this to
> search a compound statement's enclosing scopes to find bindings
> further out, find members inherited from base classes, and resolve C++
> `using' declarations.
From the discussion, it's pretty clear that this idea is, indeed,
half-baked. While the general idea of "stuff from over there is
visible here, too" does recur in the different contexts, there are so
many subtle differences in exactly what it means that I'm
uncomfortable having generic code try to handle it. I have the
feeling that it would become populated with "if we're doing C++
inheritance, do this; but if we're stepping out to an enclosing
compound statement, do this; ..." garbage. It's better to let the
context implement the right semantics itself.
However, it would be possible, at least, to provide generic code to do
lookups within a single environment. We could conceal symbol table
indexing techniques behind this interface (linear search for
environments binding few identifiers, as compound statements often
are; hash tables for big environments; and so on), which would allow
us to change the representation without breaking the consumers
(... but maybe skip lists would be fine for all the above).
We could then use that to write code for more specific cases:
- The code that looks up member names in a struct type (for example)
would call this generic code to search the immediate struct's
members, and then recurse on the struct's base classes, making the
appropriate adjustments (qualifying names, adjusting the base
address, and so on).
- The code that searches compound statement scopes, from the innermost
enclosing statement out (eventually) to the global scope, would know
that inner declarations simply shadow outer declarations, rather
than introducing ambiguities (as inheritance does). If GDB were to
support nested functions, some steps outward might note that a
static link needs to be traversed.
And so on. The generic code would only search one level; deeper
searches would be left to code that knows how they're supposed to
behave.
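
A minimal sketch of that split, assuming made-up types throughout: `Environment`, `StructType`, `Scope`, `lookup_here`, `lookup_member`, and `lookup_variable` are hypothetical names for illustration, not existing GDB code. The generic routine searches exactly one level; the struct-specific and scope-specific walkers decide what a hit at a deeper level means.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one binding context; not an actual GDB type.
struct Environment {
    std::map<std::string, int> bindings;  // name -> payload (an int, for brevity)

    // Generic code: search this single level and nothing else.
    const int *lookup_here(const std::string &name) const {
        std::map<std::string, int>::const_iterator it = bindings.find(name);
        return it == bindings.end() ? nullptr : &it->second;
    }
};

// Struct-specific walker: own members first, then base classes.
struct StructType {
    Environment members;
    std::vector<const StructType *> bases;
};

const int *lookup_member(const StructType &type, const std::string &name) {
    if (const int *hit = type.members.lookup_here(name))
        return hit;  // a member of the derived struct hides base members
    const int *found = nullptr;
    for (const StructType *base : type.bases) {
        const int *hit = lookup_member(*base, name);
        if (hit == nullptr)
            continue;
        // Simplification: any second hit is treated as ambiguous
        // (virtual base classes are ignored in this sketch).
        if (found != nullptr)
            throw std::runtime_error("ambiguous member: " + name);
        found = hit;
    }
    return found;
}

// Scope-specific walker: inner declarations simply shadow outer ones.
struct Scope {
    Environment locals;
    const Scope *enclosing;  // nullptr at the outermost (global) scope
};

const int *lookup_variable(const Scope &innermost, const std::string &name) {
    for (const Scope *s = &innermost; s != nullptr; s = s->enclosing)
        if (const int *hit = s->locals.lookup_here(name))
            return hit;  // first hit wins; no ambiguity check
    return nullptr;
}
```

Both walkers share `lookup_here`, so the indexing behind it (a `std::map` here) could change without touching either of them.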
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-06 6:31 ` Andrew Cagney
2002-04-06 7:58 ` Daniel Berlin
@ 2002-04-08 0:59 ` Joel Brobecker
1 sibling, 0 replies; 37+ messages in thread
From: Joel Brobecker @ 2002-04-08 0:59 UTC (permalink / raw)
To: Andrew Cagney; +Cc: Jim Blandy, gdb, Benjamin Kosnik, Daniel Berlin
> I'm very interested in hearing about what ACT did for Ada. As far as I
> know Ada, with its packages et al., has a very similar problem and,
> potentially, working code.
One thing we do is to use a precisely defined encoding that allows us to
retrieve this information from the name. For instance, for type T in
package Pck, the type name will be encoded into pck__t. The
specifications are available in exp_dbug.ads, which is part of the GNAT
sources.
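
As a rough illustration of the simplest case only (type T in package Pck becoming pck__t), the sketch below lower-cases the name and replaces each `.` with `__`. The function names `encode_ada_name` and `split_encoded` are invented for this example, and the real rules in exp_dbug.ads cover far more than this.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Encode a dotted Ada name such as "Pck.T" as "pck__t".
// Assumption: only the simplest package-qualified case is handled.
std::string encode_ada_name(const std::string &qualified) {
    std::string out;
    for (unsigned char ch : qualified) {
        if (ch == '.')
            out += "__";
        else
            out += static_cast<char>(std::tolower(ch));
    }
    return out;
}

// Recover the components of an encoded name by splitting at each "__".
std::vector<std::string> split_encoded(const std::string &encoded) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type sep = encoded.find("__", start);
        if (sep == std::string::npos) {
            parts.push_back(encoded.substr(start));
            return parts;
        }
        parts.push_back(encoded.substr(start, sep - start));
        start = sep + 2;
    }
}
```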
--
Joel
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-06 9:26 ` Gianni Mariani
@ 2002-04-06 11:57 ` Daniel Berlin
2002-04-08 17:24 ` Jim Blandy
1 sibling, 0 replies; 37+ messages in thread
From: Daniel Berlin @ 2002-04-06 11:57 UTC (permalink / raw)
To: Gianni Mariani; +Cc: Daniel Jacobowitz, Jim Blandy, gdb, Benjamin Kosnik
On Sat, 6 Apr 2002, Gianni Mariani wrote:
>
> As a complete digression:
>
> Although this problem might need to be solved in the fashion presented
> here, I can't help but think we're trying to solve the wrong problem.
>
> Much of what is discussed here is language and compiler specific.
No, it's not.
It's language specific.
It's compiler independent.
--Dan
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 23:49 ` Daniel Berlin
2002-04-06 7:18 ` Dan Kegel
@ 2002-04-06 9:26 ` Gianni Mariani
2002-04-06 11:57 ` Daniel Berlin
2002-04-08 17:24 ` Jim Blandy
1 sibling, 2 replies; 37+ messages in thread
From: Gianni Mariani @ 2002-04-06 9:26 UTC (permalink / raw)
To: Daniel Berlin; +Cc: Daniel Jacobowitz, Jim Blandy, gdb, Benjamin Kosnik
As a complete digression:
Although this problem might need to be solved in the fashion presented
here, I can't help but think we're trying to solve the wrong problem.
Much of what is discussed here is language and compiler specific. My
generic approach to solving this kind of problem is to provide an
abstraction layer where all the facilities are provided for in an API
(abstract base interface class); the mapping is then language and
compiler specific. The burden is then on the compiler writer to provide
the symbol binding mechanism/implementation which is where it belongs.
This then buys all kinds of nice things like unit testability. If the
compiler writer makes a change, the compiler tests fail way before it
even gets to running gdb.
In other words, "Place the responsibility with the knowledge".
The question is - how do you provide an abstraction that makes sense for
all languages ? (or at least *most* languages).
BTW- "Amazingly tedious refactoring" is something that an editor macro can
help with. I constantly have people in my development team who want to
make global changes and I end up doing them because I've taken the time
to learn how to use my editor's keyboard macros. The reason you
*should* use them is that you can guarantee a consistent change.
So how about this silly challenge:
a) Someone define a new interface for symbol look up.
b) Someone implement the interface so it is in an independent module.
c) I'll mod the code to use the new interface
I'll bet a keg of beer I can do c) in less time than it takes someone to
do a) and b).
.... I feel I'm going to regret this .... :)
Daniel Berlin wrote:
>>>- How would we introduce this incrementally?
>>>
>>Do we want to?
>>
>>No, I'm serious. Incremental solutions are more practical to
>>implement, but they will come with more baggage. Baggage will haunt us
>>for a very long time. If we can completely handle non-minimal-symbol
>>lookup in this way, that's a big win.
>>
>
>You might be able to pull something off like i did on the
>new-typesystem-branch (which is unfinished, but quite far along. It was
>left in a non-compiling state because i was in the midst of stabs fixes
>when i stopped working on it).
>
>I modified a single type class at a time, replacing it with a compatible
>structure with the added members, then changed the functions gradually to
>fill in the extra members, then use the extra members, then not use the
>old members, then removed the old members. Somewhere in there, I
>created new type creation functions (one for each type class), and changed
>the symbol readers to use them when appropriate.
>
>Adding a struct environment is probably comparable in the amount of
>work/places to touch.
>
>I can tell you that while I did succeed in keeping a working gdb at
>all times, even with a mix of new type structures and old (which are
>completely different beasts), it was *amazingly* tedious to do it this
>way.
>
>It's not just a matter of global search and replace, the rewriting
>required is mundane and repetitive, but a step above what simple global
>search and replace would do, so you end up doing it by hand (you'd need
>to write a pass for a source-source translator or something to do it
>automatically).
>
>It was at least 2x the work it would have been to not do it incrementally.
>But it's also less disheartening than dealing with 8 million compile
>errors at once, and trying to hunt down logic bugs after making a million
>changes.
>
>--Dan
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 20:42 Jim Blandy
` (2 preceding siblings ...)
2002-04-06 6:31 ` Andrew Cagney
@ 2002-04-06 8:49 ` Per Bothner
2002-04-08 16:29 ` Jim Blandy
4 siblings, 0 replies; 37+ messages in thread
From: Per Bothner @ 2002-04-06 8:49 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb
Nothing much to add, except that namespace support is even
more critical for Java, in which *all* code uses namespaces
(aka "packages"). We kludge around it, by treating a compound
name like 'java.lang.Object' as a single name, but this doesn't
work all that well, especially with mixed Java/C++ code. And
the primitive methods in Gcj are written in C++.
Fixing this mess is perhaps the most critical issue in terms
of improving Java support in gdb.
Java also has nested ("inner") classes.
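
To make the contrast with the single-name kludge concrete, here is a small sketch that splits a compound name such as `java.lang.Object` into components and resolves it one level at a time through nested contexts. `Package`, `declare`, `resolve`, and `split_qualified` are hypothetical names, and nothing here reflects how gdb stores Java symbols.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Split "java.lang.Object" into {"java", "lang", "Object"}.
std::vector<std::string> split_qualified(const std::string &name) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type dot = name.find('.', start);
        if (dot == std::string::npos) {
            parts.push_back(name.substr(start));
            return parts;
        }
        parts.push_back(name.substr(start, dot - start));
        start = dot + 1;
    }
}

// Hypothetical nested context: a package or class that can contain others.
struct Package {
    std::map<std::string, std::unique_ptr<Package>> children;
};

// Create every level named in `dotted`, reusing levels that already exist.
Package &declare(Package &root, const std::string &dotted) {
    Package *cur = &root;
    for (const std::string &part : split_qualified(dotted)) {
        std::unique_ptr<Package> &slot = cur->children[part];
        if (!slot)
            slot.reset(new Package());
        cur = slot.get();
    }
    return *cur;
}

// Resolve one component per level; nullptr if any level is missing.
const Package *resolve(const Package &root, const std::string &dotted) {
    const Package *cur = &root;
    for (const std::string &part : split_qualified(dotted)) {
        auto it = cur->children.find(part);
        if (it == cur->children.end())
            return nullptr;
        cur = it->second.get();
    }
    return cur;
}
```

With per-level contexts, `java.lang` is itself something a lookup can return, which the single flat name cannot express.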
--
--Per Bothner
per@bothner.com http://www.bothner.com/per/
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-06 6:31 ` Andrew Cagney
@ 2002-04-06 7:58 ` Daniel Berlin
2002-04-08 0:59 ` Joel Brobecker
1 sibling, 0 replies; 37+ messages in thread
From: Daniel Berlin @ 2002-04-06 7:58 UTC (permalink / raw)
To: Andrew Cagney; +Cc: Jim Blandy, gdb, Benjamin Kosnik
On Sat, 6 Apr 2002, Andrew Cagney wrote:
> > At the moment, GDB doesn't handle C++ namespaces or nested classes
> > very well. I have a general idea of how we could address these
> > limitations, which I'd like to put up for shredding M-DEL discussion.
> >
> > Let me admit up front that I don't really know C++, so I may be saying
> > stupid things. Please set me straight if you notice something.
> >
> > In C, structs are essentially lists of member names, types, and
> > locations (offsets from the structure's base address):
> >
> > struct S { int x; char y; struct T t; }
> >
> > (Unions are just the same, except that the offsets are all zero. That
> > relationship carries through the entire discussion here, so I'm not
> > going to talk about unions any more.)
> >
> > If you think about it just right (or just wrong), this is really very
> > similar to the set of local variables associated with a compound
> > statement:
>
> I'm very interested in hearing about what ACT did for Ada. As far as I
> know Ada, with its packages et al., has a very similar problem and,
> potentially, working code.
The last time I scanned the Ada changes (a few days ago), they hadn't
handled this problem at all.
Probably because gcc doesn't produce module/packages/etc debug info for
Ada.
--Dan
>
> Andrew
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 23:49 ` Daniel Berlin
@ 2002-04-06 7:18 ` Dan Kegel
2002-04-06 9:26 ` Gianni Mariani
1 sibling, 0 replies; 37+ messages in thread
From: Dan Kegel @ 2002-04-06 7:18 UTC (permalink / raw)
To: Daniel Berlin; +Cc: Daniel Jacobowitz, Jim Blandy, gdb, Benjamin Kosnik
Daniel Berlin wrote:
> I can tell you that while I did succeed in keeping a working gdb at
> all times, even with a mix of new type structures and old (which are
> completely different beasts), it was *amazingly* tedious to do it this
> way.
>
> It's not just a matter of global search and replace, the rewriting
> required is mundane and repetitive, but a step above what simple global
> search and replace would do, so you end up doing it by hand (you'd need
> to write a pass for a source-source translator or something to do it
> automatically).
The 'refactoring' weenies are trying to produce editors that
automate this sort of global search-and-replace. I have not tried
any of them yet. It'd be fairly amazing if they worked well for C++,
let alone C. (See e.g.
http://www.refactoring.com/
http://www.xref-tech.com/speller/
http://www-106.ibm.com/developerworks/linux/library/l-eclipse.html )
> It was at least 2x the work it would have been to not do it incrementally.
> But it's also less disheartening than dealing with 8 million compile
> errors at once, and trying to hunt down logic bugs after making a million
> changes.
Agreed. As time goes on, I find incremental refactoring increasingly
my favorite way to make big changes.
- Dan
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 20:42 Jim Blandy
2002-04-05 22:05 ` Daniel Berlin
2002-04-05 22:34 ` Daniel Jacobowitz
@ 2002-04-06 6:31 ` Andrew Cagney
2002-04-06 7:58 ` Daniel Berlin
2002-04-08 0:59 ` Joel Brobecker
2002-04-06 8:49 ` Per Bothner
2002-04-08 16:29 ` Jim Blandy
4 siblings, 2 replies; 37+ messages in thread
From: Andrew Cagney @ 2002-04-06 6:31 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
> At the moment, GDB doesn't handle C++ namespaces or nested classes
> very well. I have a general idea of how we could address these
> limitations, which I'd like to put up for shredding M-DEL discussion.
>
> Let me admit up front that I don't really know C++, so I may be saying
> stupid things. Please set me straight if you notice something.
>
> In C, structs are essentially lists of member names, types, and
> locations (offsets from the structure's base address):
>
> struct S { int x; char y; struct T t; }
>
> (Unions are just the same, except that the offsets are all zero. That
> relationship carries through the entire discussion here, so I'm not
> going to talk about unions any more.)
>
> If you think about it just right (or just wrong), this is really very
> similar to the set of local variables associated with a compound
> statement:
I'm very interested in hearing about what ACT did for Ada. As far as I
know Ada, with its packages et al., has a very similar problem and,
potentially, working code.
Andrew
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 22:34 ` Daniel Jacobowitz
@ 2002-04-05 23:49 ` Daniel Berlin
2002-04-06 7:18 ` Dan Kegel
2002-04-06 9:26 ` Gianni Mariani
2002-04-08 17:03 ` Jim Blandy
2002-04-08 17:19 ` Jim Blandy
2 siblings, 2 replies; 37+ messages in thread
From: Daniel Berlin @ 2002-04-05 23:49 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: Jim Blandy, gdb, Benjamin Kosnik
> > - How would we introduce this incrementally?
>
> Do we want to?
>
> No, I'm serious. Incremental solutions are more practical to
> implement, but they will come with more baggage. Baggage will haunt us
> for a very long time. If we can completely handle non-minimal-symbol
> lookup in this way, that's a big win.
>
You might be able to pull something off like i did on the
new-typesystem-branch (which is unfinished, but quite far along. It was
left in a non-compiling state because i was in the midst of stabs fixes
when i stopped working on it).
I modified a single type class at a time, replacing it with a compatible
structure with the added members, then changed the functions gradually to
fill in the extra members, then use the extra members, then not use the
old members, then removed the old members. Somewhere in there, I
created new type creation functions (one for each type class), and changed
the symbol readers to use them when appropriate.
Adding a struct environment is probably comparable in the amount of
work/places to touch.
I can tell you that while I did succeed in keeping a working gdb at
all times, even with a mix of new type structures and old (which are
completely different beasts), it was *amazingly* tedious to do it this
way.
It's not just a matter of global search and replace, the rewriting
required is mundane and repetitive, but a step above what simple global
search and replace would do, so you end up doing it by hand (you'd need
to write a pass for a source-source translator or something to do it
automatically).
It was at least 2x the work it would have been to not do it incrementally.
But it's also less disheartening than dealing with 8 million compile
errors at once, and trying to hunt down logic bugs after making a million
changes.
--Dan
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 20:42 Jim Blandy
2002-04-05 22:05 ` Daniel Berlin
@ 2002-04-05 22:34 ` Daniel Jacobowitz
2002-04-05 23:49 ` Daniel Berlin
` (2 more replies)
2002-04-06 6:31 ` Andrew Cagney
` (2 subsequent siblings)
4 siblings, 3 replies; 37+ messages in thread
From: Daniel Jacobowitz @ 2002-04-05 22:34 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb, Benjamin Kosnik, Daniel Berlin
Much to say, much to say...
On Fri, Apr 05, 2002 at 11:42:04PM -0500, Jim Blandy wrote:
>
> At the moment, GDB doesn't handle C++ namespaces or nested classes
> very well. I have a general idea of how we could address these
> limitations, which I'd like to put up for shredding M-DEL discussion.
>
> Let me admit up front that I don't really know C++, so I may be saying
> stupid things. Please set me straight if you notice something.
I know C++ fairly well, but my grasp of the technical terminology of
the language is lacking. So don't go looking up any phrases I use in
here; I'm probably making them up :) I'm sure Daniel Berlin can
correct any egregious errors.
> You can also declare "static" struct members --- you can access them
> with the `->' and `.' operators, just like ordinary members, but
> they're actually variables at fixed addresses in the .data segment ---
> much like a "static" variable in a C compound statement. But this
> means that a simple offset from a base address is no longer sufficient
> to describe a struct's member's location --- you actually start
> needing something like GDB's enum address_class. Multiple inheritance
> and virtual base classes introduce further complexity here.
I believe you're generalizing too much here. Statics are a special
case; they're essentially global variables, whose name is given in the
local class scope. You've also got constant data members, which are
not necessarily backed by real symbols in C++ (I believe they always
are, in C...). Everything else are members.
The complexity from multiple inheritance and virtual base classes is
essentially orthogonal. It just affects the scopes you search.
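
A short C++ illustration of the three kinds of member, using a made-up `Counter` struct: an ordinary member lives at an offset inside each object, a static member is a single variable at a fixed address however it is spelled, and an in-class constant may need no storage at all when only its value is used.

```cpp
struct Counter {
    int value;                   // ordinary member: one per object, at an offset
    static int instances;        // static member: one variable for the whole class
    static const int limit = 8;  // constant member: usable as a plain value
};

// The static member is defined once, like a global variable.
int Counter::instances = 0;

// Returns true when both objects see the same static variable
// but have distinct ordinary members.
bool static_member_is_shared() {
    Counter a = {1};
    Counter b = {2};
    a.instances = 5;  // '.' syntax, yet this writes the one shared variable
    return &a.instances == &b.instances  // same fixed address
        && b.instances == 5              // visible through the other object
        && &a.value != &b.value;         // ordinary members are per-object
}
```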
> Another difference between compound statements and structs
> goes away. In C, you can only reference a struct's members using the
> `.' and `->' operators, whereas you refer to a compound statement's
> variables by simply naming them. But in C++, a struct's member
> functions can refer to the struct's members by simply naming them.
> The struct's bindings become another rib in the search path for
> identifier bindings.
>
> In summary, the data structure GDB needs to represent C++ structs
> (classes, unions, whatever) has a lot of similarities to the structure
> GDB needs to represent the local variables of a compound statement.
> They both need to carry bindings for several namespaces (ordinary
> identifiers and structure tags). The names can refer to any manner of
> things: variables, functions, namespaces, base classes, and so on.
> For variables, there are a variety of locations they might occupy.
GDB already does a great deal of this by the very simple method of
using fully qualified names. It's served us remarkably well, although
of course we're hitting its limits now. But let's not be too quick to
discard that approach, for the present at least.
Also, while they're often both searched, don't confuse the structure
inheritance search path with the enclosing structure/namespace search
path. For instance, foo->x() searches only the inheritance paths.
foo->A::x is even worse (and gdb handles it badly or not at all at
present, as Michael mentioned in his mail a moment ago).
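
For reference, this is how the two forms behave in the language itself; the types `A` and `B` and the free function `x` are invented for the example. Member access through `->` looks only at the class and its bases, and a qualified name picks a specific base's member.

```cpp
// A free function with the same name, to show it is never a candidate
// for member access through '->'.
int x() { return 0; }

struct A {
    int x() { return 1; }
};

struct B : A {
    int x() { return 2; }  // hides A::x
};

// foo->x() searches B, then its bases; the free function x() is not considered.
int call_unqualified(B *foo) { return foo->x(); }

// foo->A::x() names the base-class member explicitly, bypassing B::x.
int call_qualified(B *foo) { return foo->A::x(); }
```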
> So I would like to introduce to GDB a new type, `struct environment'
> (or is `struct env' better?) which does about the same thing that the
> `nsyms' and `sym' members of `struct block', and the `nfields' and
> `fields' members of `struct type', do now: it's just a bunch of
> bindings for names. We would use `struct environment':
>
> - in `struct block', to represent the block's local variables, replacing
> `nsyms' and `sym';
> - in `struct type', to represent a struct's members, instead of
> `struct fields'; and
> - in our representation for C++ namespaces, which seem pretty much
> like structs that can only contain static members and member
> functions (i.e., you can't ever create an instance of one).
>
> There'd be a single set of functions for building `struct environment'
> objects, and looking up bindings in them; you'd use it for variable
> lookup, and in the `.' and `->' operators. It could handle hashing,
> when appropriate.
>
> Basically, we would take two distinct areas of GDB (and a third,
> namespaces, which we haven't implemented yet but will need to), and
> support them all with a single structure and a single bunch of
> support functions. GDB would become easier to read.
How about -containing- `struct fields', instead of replacing? i.e. let
the name search happen in the `struct environment', as before, but the
data items would be fields (could be indicated in a flag in the
environment, with a pointer to the type or symbol for the enclosing
structure). I don't think turning members into symbols is a good idea.
As a side note, at the same time we should generalize our overloading
support to functions in addition to methods. This would give the
framework to make that painless. The environment could describe an
overloaded name...
> As a half-baked idea, perhaps a `struct environment' object would have
> a list of other `struct environment' objects you should search, too,
> if you didn't find what you were looking for. We could use this to
> search a compound statement's enclosing scopes to find bindings
> further out, find members inherited from base classes, and resolve C++
> `using' declarations.
As I said above, I think that going this route is a bad idea. It
should have a pointer to the enclosing object and to that object's
environment, probably, but that's the extent of it.
> How does this strike people?
>
> Open issues:
>
> - This "list of other places to search" thing may be ill-formed. I
> mean, sure, there are a set of similar behaviors going on there, but
> are they similar enough? For example:
We're thinking along the same paths here... I suspect that it is in
fact ill-formed.
> - What really happens when you start using `struct symbol' objects for
> structure members? Do we need new address classes now for `offset
> from object base address'? Does the LOC_COMPUTED idea I've been
> pushing still work?
Why do we want members to be symbols? A `struct field' expresses all
the properties of members; symbols have other properties. I think we
use symbols in too many places already.
> - How do member functions work in this arrangement? Virtual member
> functions? Virtual base classes?
If we leave searching out of it, we're fine on this front.
> - How would we introduce this incrementally?
Do we want to?
No, I'm serious. Incremental solutions are more practical to
implement, but they will come with more baggage. Baggage will haunt us
for a very long time. If we can completely handle non-minimal-symbol
lookup in this way, that's a big win.
Some other thoughts:
This is all a question of scope. As I said, right now we handle this
mostly by searching a small set of 'namespaces' (type, struct,
variable...) for a given term. We try to look up fully qualified
names, and qualify them as necessary. Scales fairly badly. We want to
search for names in the appropriate scope, breaking up qualification as
necessary. This is as opposed to searching for fully qualified names.
What we want is essentially:
- The concept of an enclosing scope
- Language-dependent hooks to specify which scopes to search for a
name.
We should be able to get a very nice behavior here, which we sort of
have now but not cleanly, and which well illustrates why the order to
search can be language dependent. Consider:
struct X {
int foo();
int bar();
};
int baz();
int X::foo()
{
...
int z = bar();
}
We're debugging in X::foo() and the user says 'list'. They see
'bar()'. They decide to ask 'print bar()'. We want in this case
to find X::bar(). We do even (much) worse if X::foo is a static member
function. On the other hand, this->bar() should work and this->baz()
should not.
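
For comparison, here is a compilable version of the example showing what the C++ compiler itself accepts inside X::foo(). The function bodies and return values are filled in purely for illustration.

```cpp
struct X {
    int foo();
    int bar();
};

int baz() { return 2; }

int X::bar() { return 1; }

int X::foo() {
    int z = bar();        // unqualified: class scope is searched, finds X::bar
    z += this->bar();     // member access: searches X and its bases only
    z += baz();           // unqualified: not in X, so the enclosing scope is searched
    // z += this->baz();  // would not compile: baz is not a member of X
    return z;
}
```

A debugger stopped in X::foo() that wants to match the source would need the same order: class scope before enclosing scope for bare names, and class scope alone for `this->` access.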
My benchmark for whether a solution is even adequate enough to consider
is whether it obsoletes DW_AT_MIPS_linkage_name. This should. I'd
really like to make us independent of that, since it will make
constructor handling much less of a gross special case. Daniel pointed
out at one point how much space this could save in binaries.
There's probably more I meant to say about this, but it's 1:30AM
here...
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: C++ nested classes, namespaces, structs, and compound statements
2002-04-05 20:42 Jim Blandy
@ 2002-04-05 22:05 ` Daniel Berlin
2002-04-05 22:34 ` Daniel Jacobowitz
` (3 subsequent siblings)
4 siblings, 0 replies; 37+ messages in thread
From: Daniel Berlin @ 2002-04-05 22:05 UTC (permalink / raw)
To: Jim Blandy; +Cc: gdb, Benjamin Kosnik
On Fri, 5 Apr 2002, Jim Blandy wrote:
>
> At the moment, GDB doesn't handle C++ namespaces or nested classes
> very well. I have a general idea of how we could address these
> limitations, which I'd like to put up for shredding M-DEL discussion.
>
> Let me admit up front that I don't really know C++, so I may be saying
> stupid things. Please set me straight if you notice something.
>
> In C, structs are essentially lists of member names, types, and
> locations (offsets from the structure's base address):
>
> struct S { int x; char y; struct T t; }
>
> (Unions are just the same, except that the offsets are all zero. That
> relationship carries through the entire discussion here, so I'm not
> going to talk about unions any more.)
>
> If you think about it just right (or just wrong), this is really very
> similar to the set of local variables associated with a compound
> statement:
>
> {
> int x;
> char y;
> struct T t;
>
> ...
> }
>
> As far as scoping is concerned, this compound statement is also just a
> list of names, types, and locations. The locations here are a bit
> less restricted: whereas a struct's members' locations are all offsets
> from the start of the struct, a compound statement's variables'
> locations can be registers, regions of the stack frame, fixed
> addresses (i.e., static variables), and so on. But just as a struct
> type divides up a block of storage into individual members with types,
> a compound statement's local variables divide up a function
> invocation's stack frame and registers into individual variables with
> types.
>
> The analogy isn't perfect, of course. Structs don't enclose blocks of
> code. And a compound statement is less restricted: it can also
> contain typedefs, definitions of struct and enum tags, and so on:
>
> {
> int x;
> char y;
> struct T t;
> struct L { int j, k; };
> typedef struct L L_t;
>
> ...
> }
>
> Here the definitions of `struct L' and L_t are local to the compound
> statement. In structs, however, things behave differently: struct
> tags defined within another struct have the same scope as the
> containing struct; and you can't put typedefs in a struct at all. So
> structs are really very restricted with regards to what they can
> contain.
>
> However, C++ loosens a lot of these restrictions, generalizing structs
> and classes until they really begin to look very much like compound
> statements. (The only difference between structs and classes in C++
> is whether members are public by default. So I'm not going to talk
> about classes any more.)
>
> For example, in C++, you can declare typedefs inside structs:
>
> $ cat local-typedef.C
> struct S
> {
> typedef int smootz;
>
> smootz a, b;
> };
>
> smootz c;
> $ $GccB/g++ -c local-typedef.C
> local-typedef.C:8: 'smootz' is used as a type, but is not defined as a type.
>
> The compiler accepts the definition of the typedef `smootz' and its
> use within `struct S', but outside of S the typedef isn't visible.
> Struct tags behave similarly.
>
> You can also declare "static" struct members --- you can access them
> with the `->' and `.' operators, just like ordinary members, but
> they're actually variables at fixed addresses in the .data segment ---
> much like a "static" variable in a C compound statement. But this
> means that a simple offset from a base address is no longer sufficient
> to describe a struct's member's location --- you actually start
> needing something like GDB's enum address_class. Multiple inheritance
> and virtual base classes introduce further complexity here.
>
> Another difference between compound statements and structs
> goes away. In C, you can only reference a struct's members using the
> `.' and `->' operators, whereas you refer to a compound statement's
> variables by simply naming them. But in C++, a struct's member
> functions can refer to the struct's members by simply naming them.
> The struct's bindings become another rib in the search path for
> identifier bindings.
>
> In summary, the data structure GDB needs to represent C++ structs
> (classes, unions, whatever) has a lot of similarities to the structure
> GDB needs to represent the local variables of a compound statement.
> They both need to carry bindings for several namespaces (ordinary
> identifiers and structure tags). The names can refer to any manner of
> things: variables, functions, namespaces, base classes, and so on.
> For variables, there are a variety of locations they might occupy.
>
>
> So I would like to introduce to GDB a new type, `struct environment'
> (or is `struct env' better?) which does about the same thing that the
> `nsyms' and `sym' members of `struct block', and the `nfields' and
> `fields' members of `struct type', do now: it's just a bunch of
> bindings for names. We would use `struct environment':
>
> - in `struct block', to represent the block's local variables, replacing
> `nsyms' and `sym';
> - in `struct type', to represent a struct's members, instead of
> `struct fields'; and
> - in our representation for C++ namespaces, which seem pretty much
> like structs that can only contain static members and member
> functions (i.e., you can't ever create an instance of one).
Except.
You can alias them:
#include <string>
namespace bob = std;
bob::string a;
Would work;
You also have issues with anonymous namespaces, unions, and structs.
yes, you can do
namespace {
int a;
}
It's not as easy to handle as you think: you can't just have a
simple pointer; you need lists, and you have to order them right.
It gets more fun:

    namespace { int i; }    // unique::i
    void f() { i++; }       // unique::i++

    namespace A {
      namespace {
        int i;              // A::unique::i
        int j;              // A::unique::j
      }
      void g() { i++; }     // A::unique::i++
    }

    using namespace A;
    void h()
    {
      i++;                  // ambiguity error (unique::i or A::unique::i)
      A::i++;               // A::unique::i++
      j++;                  // A::unique::j++
    }

You can get some real hairy stuff.
Think of the memory cost of representing this.
It needs to be as shared as possible, because I can do:

    void f();

    namespace A
    {
      void g();
    }

    namespace X {
      using ::f;
      using A::g;
    }

    void h()
    {
      X::f();   // calls ::f()
      X::g();   // calls A::g()
    }

Obviously, you don't want to do massive namespace injection to support
this, you ideally want to directly have A::g's symbol (named "g") inside
X.
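
A rough sketch of what "as shared as possible" could look like. Everything here is hypothetical (`Symbol`, `Environment`, and `import_from` are illustration names, not existing GDB code): a using-declaration is modelled as one extra table entry pointing at the very same symbol object, rather than a copy of it.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical types for illustration only; not GDB's data structures.
struct Symbol {
    std::string name;        // unqualified name, e.g. "g"
    std::string qualified;   // fully qualified name, e.g. "A::g"
};

struct Environment {
    // Bindings share ownership of symbols, so importing a name costs one
    // map entry and no symbol is duplicated.
    std::map<std::string, std::shared_ptr<const Symbol>> bindings;

    // Bind a symbol under its own unqualified name.
    void define(std::shared_ptr<const Symbol> sym) {
        std::string key = sym->name;
        bindings[key] = std::move(sym);
    }

    // Model `using other::name;` by binding the same symbol object here.
    bool import_from(const Environment& other, const std::string& name) {
        auto it = other.bindings.find(name);
        if (it == other.bindings.end()) return false;
        bindings[name] = it->second;
        return true;
    }

    // Returns the bound symbol, or nullptr if the name is not bound here.
    const Symbol* lookup(const std::string& name) const {
        auto it = bindings.find(name);
        return it == bindings.end() ? nullptr : it->second.get();
    }
};
```

With this shape, looking up "g" in X and in A yields the same pointer, so nothing about A::g has to be re-created inside X.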
To support resolution of names properly, struct environments probably want
a set of lookup function pointers.
That way, the lookup works properly, regardless of what language the
current frame is, and what language the symbol you are asking for is
in.
The actual structure storing the symbols and whatnot inside the
environment should be opaque, so that we don't have to do major
work to replace the indexing structures used, etc.
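
One way the two ideas above (lookup function pointers plus opaque storage) might fit together, as a sketch only. All names (`environment_ops`, `env_lookup`, `flat_table`, and so on) are made up for illustration; a case-insensitive lookup stands in for "a language with different name rules".

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// All names below are hypothetical; this is not GDB's actual interface.
struct symbol { const char* name; };
struct environment;

// Per-environment lookup hooks, so each language can supply its own rules.
struct environment_ops {
    const symbol* (*lookup)(const environment* env, const char* name);
};

struct environment {
    const environment_ops* ops;
    const void* storage;   // opaque: only the ops functions know the layout
};

// The single entry point callers use; it never inspects `storage`.
inline const symbol* env_lookup(const environment* env, const char* name) {
    return env->ops->lookup(env, name);
}

// One possible backend: a flat array searched linearly.
struct flat_table { const symbol* syms; std::size_t count; };

static bool same_ignoring_case(const char* a, const char* b) {
    for (; *a && *b; ++a, ++b) {
        if (std::tolower(static_cast<unsigned char>(*a)) !=
            std::tolower(static_cast<unsigned char>(*b)))
            return false;
    }
    return *a == *b;   // equal only if both strings ended together
}

static const symbol* flat_lookup_exact(const environment* env, const char* name) {
    const flat_table* t = static_cast<const flat_table*>(env->storage);
    for (std::size_t i = 0; i < t->count; ++i)
        if (std::strcmp(t->syms[i].name, name) == 0) return &t->syms[i];
    return nullptr;
}

static const symbol* flat_lookup_nocase(const environment* env, const char* name) {
    const flat_table* t = static_cast<const flat_table*>(env->storage);
    for (std::size_t i = 0; i < t->count; ++i)
        if (same_ignoring_case(t->syms[i].name, name)) return &t->syms[i];
    return nullptr;
}

static const environment_ops case_sensitive_ops = { flat_lookup_exact };
static const environment_ops case_insensitive_ops = { flat_lookup_nocase };
```

Swapping `flat_table` for a hash table would only touch the two lookup functions; callers of `env_lookup` would not change.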
--Dan
* C++ nested classes, namespaces, structs, and compound statements
@ 2002-04-05 20:42 Jim Blandy
2002-04-05 22:05 ` Daniel Berlin
` (4 more replies)
0 siblings, 5 replies; 37+ messages in thread
From: Jim Blandy @ 2002-04-05 20:42 UTC (permalink / raw)
To: gdb; +Cc: Benjamin Kosnik, Daniel Berlin
At the moment, GDB doesn't handle C++ namespaces or nested classes
very well. I have a general idea of how we could address these
limitations, which I'd like to put up for shredding M-DEL discussion.
Let me admit up front that I don't really know C++, so I may be saying
stupid things. Please set me straight if you notice something.
In C, structs are essentially lists of member names, types, and
locations (offsets from the structure's base address):
    struct S { int x; char y; struct T t; };
(Unions are just the same, except that the offsets are all zero. That
relationship carries through the entire discussion here, so I'm not
going to talk about unions any more.)
If you think about it just right (or just wrong), this is really very
similar to the set of local variables associated with a compound
statement:
    {
      int x;
      char y;
      struct T t;
      ...
    }
As far as scoping is concerned, this compound statement is also just a
list of names, types, and locations. The locations here are a bit
less restricted: whereas a struct's members' locations are all offsets
from the start of the struct, a compound statement's variables'
locations can be registers, regions of the stack frame, fixed
addresses (i.e., static variables), and so on. But just as a struct
type divides up a block of storage into individual members with types,
a compound statement's local variables divide up a function
invocation's stack frame and registers into individual variables with
types.
The analogy isn't perfect, of course. Structs don't enclose blocks of
code. And a compound statement is less restricted: it can also
contain typedefs, definitions of struct and enum tags, and so on:
    {
      int x;
      char y;
      struct T t;
      struct L { int j, k; };
      typedef struct L L_t;
      ...
    }
Here the definitions of `struct L' and L_t are local to the compound
statement. In structs, however, things behave differently: struct
tags defined within another struct have the same scope as the
containing struct; and you can't put typedefs in a struct at all. So
structs are really very restricted with regard to what they can
contain.
However, C++ loosens a lot of these restrictions, generalizing structs
and classes until they really begin to look very much like compound
statements. (The only difference between structs and classes in C++
is whether members and base classes are public by default. So I'm
not going to talk about classes any more.)
For example, in C++, you can declare typedefs inside structs:
    $ cat local-typedef.C
    struct S
    {
      typedef int smootz;
      smootz a, b;
    };
    smootz c;
    $ $GccB/g++ -c local-typedef.C
    local-typedef.C:8: 'smootz' is used as a type, but is not defined as a type.
The compiler accepts the definition of the typedef `smootz' and its
use within `struct S', but outside of S the typedef isn't visible.
Struct tags behave similarly.
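
For comparison, a small variant that does compile: the typedef is still reachable from outside through qualification, which is the kind of binding a debugger would have to record in S's own scope. The initial value 42 is arbitrary.

```cpp
#include <type_traits>

struct S
{
  typedef int smootz;   // member typedef: bound in S's scope only
  smootz a, b;
};

// The bare name `smootz` is not visible here, but the qualified name is.
S::smootz c = 42;

static_assert(std::is_same<S::smootz, int>::value,
              "S::smootz names the same type as int");
```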
You can also declare "static" struct members --- you can access them
with the `->' and `.' operators, just like ordinary members, but
they're actually variables at fixed addresses in the .data segment ---
much like a "static" variable in a C compound statement. But this
means that a simple offset from a base address is no longer sufficient
to describe a struct's member's location --- you actually start
needing something like GDB's enum address_class. Multiple inheritance
and virtual base classes introduce further complexity here.
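
To make the point concrete, here is a minimal sketch of a binding whose location carries a kind tag. The names (`LocKind`, `Binding`, `Context`, `address_of`) are invented for illustration and are not GDB's `enum address_class`; offsets are kept unsigned for brevity, although real frame offsets can be negative.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for an address class; not GDB's definition.
enum class LocKind { ObjectOffset, FrameOffset, Static, Register };

struct Location {
    LocKind kind;
    std::uint64_t value;   // offset, absolute address, or register number
};

struct Binding {
    std::string name;
    Location loc;
};

// What must already be known before a binding can be evaluated.
struct Context {
    bool has_frame;
    std::uint64_t frame_base;
    bool has_object;
    std::uint64_t object_base;
};

// Returns true and sets `addr` when the binding has a memory address in `ctx`.
bool address_of(const Binding& b, const Context& ctx, std::uint64_t& addr) {
    switch (b.loc.kind) {
    case LocKind::ObjectOffset:        // ordinary struct member
        if (!ctx.has_object) return false;
        addr = ctx.object_base + b.loc.value;
        return true;
    case LocKind::FrameOffset:         // local variable on the stack
        if (!ctx.has_frame) return false;
        addr = ctx.frame_base + b.loc.value;
        return true;
    case LocKind::Static:              // static member or static local
        addr = b.loc.value;
        return true;
    case LocKind::Register:            // lives in a register: no address
        return false;
    }
    return false;
}
```

A static member resolves without any object, while an ordinary member needs the object's base address; that difference is exactly what a bare offset cannot express.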
Another difference between compound statements and structs also
goes away. In C, you can only reference a struct's members using the
`.' and `->' operators, whereas you refer to a compound statement's
variables by simply naming them. But in C++, a struct's member
functions can refer to the struct's members by simply naming them.
The struct's bindings become another rib in the search path for
identifier bindings.
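
A small illustration of that search order, with names chosen only for the example: inside a member function, an unqualified name is looked up in the local scope first, then in the struct, then in the enclosing namespace.

```cpp
namespace demo {

int x = 1;                       // namespace scope

struct S {
    int x = 2;                   // member: hides demo::x inside member functions

    // Unqualified `x` finds the member, i.e. this->x.
    int from_member() const { return x; }

    // A local declaration hides the member in turn.
    int from_local() const { int x = 3; return x; }

    // Qualification skips both the local and the struct scope.
    int from_namespace() const { return demo::x; }
};

}  // namespace demo
```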
In summary, the data structure GDB needs to represent C++ structs
(classes, unions, whatever) has a lot of similarities to the structure
GDB needs to represent the local variables of a compound statement.
They both need to carry bindings for several namespaces (ordinary
identifiers and structure tags). The names can refer to any manner of
things: variables, functions, namespaces, base classes, and so on.
For variables, there are a variety of locations they might occupy.
So I would like to introduce to GDB a new type, `struct environment'
(or is `struct env' better?) which does about the same thing that the
`nsyms' and `sym' members of `struct block', and the `nfields' and
`fields' members of `struct type', do now: it's just a bunch of
bindings for names. We would use `struct environment':
- in `struct block', to represent the block's local variables, replacing
  `nsyms' and `sym';
- in `struct type', to represent a struct's members, instead of
  `struct fields'; and
- in our representation for C++ namespaces, which seem pretty much
  like structs that can only contain static members and member
  functions (i.e., you can't ever create an instance of one).
There'd be a single set of functions for building `struct environment'
objects, and looking up bindings in them; you'd use it for variable
lookup, and in the `.' and `->' operators. It could handle hashing,
when appropriate.
Basically, we would take two distinct areas of GDB (and a third,
namespaces, which we haven't implemented yet but will need to), and
support them all with a single structure and a single bunch of
support functions. GDB would become easier to read.
As a half-baked idea, perhaps a `struct environment' object would have
a list of other `struct environment' objects you should search, too,
if you didn't find what you were looking for. We could use this to
search a compound statement's enclosing scopes to find bindings
further out, find members inherited from base classes, and resolve C++
`using' declarations.
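
The half-baked idea, sketched under heavy assumptions: `Environment` and `search_next` are hypothetical names, an `int` stands in for whatever a binding really refers to, and the chain is assumed to be acyclic. This version returns the first match, which suits enclosing scopes; it does not try to detect ambiguity.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch; `int` stands in for whatever a binding refers to.
struct Environment {
    std::map<std::string, int> bindings;
    std::vector<const Environment*> search_next;  // enclosing scopes, bases, ...

    // Depth-first, first-match lookup: the nearest binding wins.
    // Assumes the search_next links contain no cycles.
    const int* lookup(const std::string& name) const {
        auto it = bindings.find(name);
        if (it != bindings.end()) return &it->second;
        for (const Environment* next : search_next) {
            if (const int* found = next->lookup(name)) return found;
        }
        return nullptr;
    }
};
```

For nested blocks, each environment would list only its enclosing scope, so an inner `i` hides an outer one simply by being found first.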
How does this strike people?
Open issues:

- This "list of other places to search" thing may be ill-formed. I
  mean, sure, there is a set of similar behaviors going on there, but
  are they similar enough? For example:

  - You need a frame to find a variable's value, but you need an
    object address to find a member's value.
  - If you find a member in a base class, then you will often
    need to adjust the object's base address in some way.
  - And what about ambiguous member names?

  Maybe these questions mean this `list of other places to search'
  can't be handled in a uniform way.

- What really happens when you start using `struct symbol' objects for
  structure members? Do we need new address classes now for `offset
  from object base address'? Does the LOC_COMPUTED idea I've been
  pushing still work?

- How do member functions work in this arrangement? Virtual member
  functions? Virtual base classes?

- How would we introduce this incrementally?
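
On the ambiguous-member question, one possible shape for a lookup that reports ambiguity instead of silently taking the first hit. This is a deliberately simplified sketch with made-up names (`Scope`, `lookup_member`, `Status`), not the full C++ rule: a name found in a scope hides its bases, and two paths reaching the same symbol object (as with a virtual base or a static member) are assumed to count as one result.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical types; a simplified model of member lookup through bases.
struct Symbol { std::string name; };

struct Scope {
    std::map<std::string, const Symbol*> own;   // members declared here
    std::vector<const Scope*> bases;            // base classes to search next
};

enum class Status { NotFound, Found, Ambiguous };

struct Result {
    Status status;
    const Symbol* symbol;   // set only when status == Status::Found
};

// Gather every distinct symbol that `name` could denote, starting at `s`.
static void collect(const Scope& s, const std::string& name,
                    std::set<const Symbol*>& out) {
    auto it = s.own.find(name);
    if (it != s.own.end()) {
        out.insert(it->second);   // a local declaration hides the bases
        return;
    }
    for (const Scope* base : s.bases) collect(*base, name, out);
}

Result lookup_member(const Scope& s, const std::string& name) {
    std::set<const Symbol*> found;
    collect(s, name, found);
    if (found.empty()) return {Status::NotFound, nullptr};
    if (found.size() > 1) return {Status::Ambiguous, nullptr};
    return {Status::Found, *found.begin()};
}
```

A first-match search and this collect-then-decide search need different return types, which may be one sign that the "list of other places to search" is not quite uniform.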
end of thread, other threads: [~2002-04-16 21:58 UTC]
Thread overview: 37+ messages
2002-04-05 22:02 C++ nested classes, namespaces, structs, and compound statements Michael Elizabeth Chastain
2002-04-05 22:13 ` Daniel Berlin
2002-04-05 22:30 ` Daniel Berlin
-- strict thread matches above, loose matches on Subject: below --
2002-04-05 20:42 Jim Blandy
2002-04-05 22:05 ` Daniel Berlin
2002-04-05 22:34 ` Daniel Jacobowitz
2002-04-05 23:49 ` Daniel Berlin
2002-04-06 7:18 ` Dan Kegel
2002-04-06 9:26 ` Gianni Mariani
2002-04-06 11:57 ` Daniel Berlin
2002-04-08 17:24 ` Jim Blandy
2002-04-08 17:03 ` Jim Blandy
2002-04-08 18:59 ` Daniel Jacobowitz
2002-04-09 18:35 ` Jim Blandy
2002-04-09 20:56 ` Daniel Jacobowitz
2002-04-12 15:08 ` Jim Blandy
2002-04-12 16:32 ` Daniel Jacobowitz
2002-04-08 17:19 ` Jim Blandy
2002-04-08 18:49 ` Daniel Jacobowitz
2002-04-10 10:31 ` Jim Blandy
2002-04-10 12:08 ` Daniel Jacobowitz
2002-04-12 13:58 ` Jim Blandy
2002-04-12 16:56 ` Daniel Jacobowitz
2002-04-16 12:08 ` Jim Blandy
2002-04-16 14:01 ` Daniel Jacobowitz
2002-04-16 14:52 ` Jim Blandy
2002-04-16 14:58 ` Daniel Jacobowitz
2002-04-06 6:31 ` Andrew Cagney
2002-04-06 7:58 ` Daniel Berlin
2002-04-08 0:59 ` Joel Brobecker
2002-04-06 8:49 ` Per Bothner
2002-04-08 16:29 ` Jim Blandy
2002-04-08 16:48 ` Daniel Jacobowitz
2002-04-09 6:55 ` Petr Sorfa
2002-04-10 10:34 ` Jim Blandy
2002-04-10 12:31 ` Daniel Berlin
2002-04-10 12:53 ` Petr Sorfa