To: gdb
Subject: Re: adding namespace support to GDB
From: David Carlton
Date: Fri, 23 Aug 2002 17:19:00 -0000

In article , David Carlton writes:

> For the time being, I'm going to reread that thread more closely,
> look at Petr Sorfa's module patch, look at the DWARF-3 standard,
> look at existing GDB code, and think about this for a while.

I haven't really done much of that yet, but I did look over some of
the messages in that thread and take some notes based on that,
combined with some thoughts that I have on the issue.  The notes
don't contain much (any?) information along the lines of "exactly
what in GDB do we have to change to make this work", since I'm still
new enough to GDB's internals that I don't have enough of a global
feel for them to be confident about this, and I didn't have time
today to poke into GDB's code all that much.  But at least it's a
concrete list of issues to keep in mind.

So here are the notes; any comments would be greatly appreciated.
This is, of course, a rough draft: I'm sending it off now more because
it's 5:00 and I want to head home for the weekend than because I think
it's particularly complete or anything.

Some notes on namespace-related issues:

* The goal is to provide a general framework for associating data
  (types, locations, etc.) to names (variable names, type names,
  etc.)  Let's tentatively call this an 'environment'.

* For some discussion of this, see the thread starting from .

* Language contexts where this happens:

  * Compound statements.  In C, function bodies, bodies of loops,
    etc.

  * Compound data structures.  E.g. classes in C++ or Java.
    (Warning: I'm learning C++ reasonably quickly, I hope, but
    there's a lot to learn.  And I read a book about Java once...)
    Not to mention simpler examples: C structures, unions.

  * Compound name structures.  C++ namespaces, Fortran modules
    (warning: I know zero about Fortran), Java packages.  Are there
    any other such structures in languages that GDB supports?
    Probably files combined with static global variables in C go in
    here.  (For that matter, extern global variables also go in
    here.)

  This trichotomy is not a hard-and-fast distinction, needless to
  say.  For example, in C++, you often have the choice about whether
  to use a namespace or a class with static members in a given
  situation, and similarly a Java programmer would use a class with
  static members in many situations where a C++ programmer would use
  a namespace.

  I'll typically stick to C++ examples.

* According to Jim Blandy (in the message referenced above), existing
  GDB constructs that these environments could replace are:

  * In 'struct block', to represent local variables, replacing 'nsym'
    and 'sym'.

  * In 'struct type', to represent fields, replacing 'struct fields'.

  But here I think he's only referring to local environments: there's
  also the global environment.

* Here are some issues surrounding environments:

  * How does GDB initialize the environment structures?
  * How does GDB figure out in what environments to search for a name
    that a user types in?

  * How should we implement environments internally?

* Right now, I'm not worried about the first problem so much.  Having
  said that, here are some issues that are relevant to it:

  * If the compiler generates rich enough debugging information, then
    we don't have to worry too much about how to initialize the
    environment structures: we have everything we need given to us.

  * If the compiler doesn't generate rich enough debugging
    information, then we can still do a decent approximation to the
    correct information by, say, looking at mangled linkage names for
    symbols.  It's not perfect, but it'll do fine.

  * I'm not sure _exactly_ how we'll detect whether or not we've got
    enough debugging information when reading the info for a given
    file, but we can figure out something.  (E.g. for C++, use
    mangled linkage names until we first see a DW_TAG_namespace.)

  * For some environments, we can count on being able to easily
    figure out a complete picture of what they look like: this should
    be true for compound data structures and compound statements.
    But it's not true for many sorts of compound name structures:
    stuff can get added to the global environment or to namespaces in
    somewhat unpredictable ways.  Still, I don't think this is _too_
    serious: having an analogue to the partial symbol table around
    plus reading in detailed debugging information for an entire file
    at a time should mean that we never miss information that we
    need.  (How does the minimal symbol table come into play?)

* The second problem seems to me to be considerably more subtle; even
  with perfect debugging information, it's not clear to me that, at
  least initially, we'd implement C++'s name lookup rules completely.

  * Different languages vary considerably in exactly what information
    is accessible at any given point.
  * Environments usually form a "tree" in some vague sense, but
    exactly what that tree means (and its implications in terms of
    environment search rules) varies considerably based on the type
    of environment.  For example, if you don't find a symbol in the
    current compound statement, you can always go up to the enclosing
    compound statement.  Whereas, if you have a C++ namespace B
    nested inside a C++ namespace A, then even if you make symbols
    inside A::B accessible via a 'using' declaration, symbols inside
    A aren't necessarily accessible.

  * Often, language constructs for making compound name structures
    accessible (C++ 'using' declarations, etc.) permit some amount of
    renaming.

  * Some compound name structures don't have names.  One example is
    files + static global variables in C; another example is
    anonymous namespaces in C++.  (Note that the second example is a
    superset of the first example: the first example is basically
    like the special case of the second example in which the parent
    of the anonymous namespace is the global namespace.)

  * If the compiler doesn't generate rich enough debugging
    information, we simply won't be able to do a perfect job here.
    (Though I think we'll be able to do a good enough job that users
    will forgive us.)

  * When doing a lookup, the user may provide part of the name prefix
    in addition to the variable name.

  * Functions can be overloaded, so sometimes you need types as well
    as the name.

  * Do virtual member functions and/or virtual base classes pose
    problems?  I don't think they do, but I'll list them just in
    case.

  * Anything else?  I think I'm probably leaving out stuff here.  I'm
    not too familiar with what GDB's current data structures are for
    representing what names are accessible at a given point.

* Then there's the issue of implementing environments internally.

  * One issue to keep in mind is that different environments can have
    dramatically different numbers of names.  E.g.
    the global environment is potentially extremely large, as is
    C++'s 'std' namespace; but a struct is typically quite small, as
    are the namespaces in code that I write myself.  So we need a
    data structure that can deal with these extremes.  In particular,
    linear lists of names and fixed-size hash tables both sound like
    bad ideas to me.  Does GDB or libiberty or whatever have tools
    for dealing with heaps or hash tables that grow as necessary?
    (Is it even a good idea to grow your hash tables as you add
    entries to them?  My theoretical background in algorithms and
    data structures is weak.)

  * If we look up a name in an environment, what data do we want that
    name lookup to return to us?  If what we're looking up is a
    variable, then candidates are type information and location
    information.  But of course we might want to look up other things
    (structures/classes/unions, typedefs, enums, namespaces, etc.)
    Should we only search based on names, or search based on names +
    what kind of object we want to associate to that name?  (Probably
    the latter.)

* Are we going to try to implement this incrementally or not?  On a
  separate branch or on the mainline branch?  What recent patches to
  GDB (whether proposed or approved) help with this effort?

David Carlton
carlton@math.stanford.edu
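
Appended illustration (not part of the original notes): a minimal
sketch of what an 'environment' along the lines above might look like,
written in Python for brevity rather than in GDB's own C.  Everything
in it is an assumption made for illustration: the class name
Environment, the kind tags VAR/TYPE/NAMESPACE, and the method names are
hypothetical and do not correspond to anything in GDB.  It shows three
of the points raised: a hash table that grows as entries are added (a
Python dict), lookup keyed on name + kind, and the difference between
walking up to enclosing scopes and looking up a prefixed name such as
A::B::y.  It makes no attempt at C++'s real lookup rules ('using',
overloading, anonymous namespaces).

```python
# Hypothetical kind tags; these are not GDB's own symbol domains.
VAR, TYPE, NAMESPACE = "var", "type", "namespace"


class Environment:
    """Maps (name, kind) pairs to arbitrary data, with an optional parent."""

    def __init__(self, parent=None):
        self.parent = parent
        # A dict is a hash table that resizes itself as it fills, so the
        # same structure serves a two-field struct and a huge global scope.
        self.entries = {}

    def define(self, name, kind, data):
        self.entries[(name, kind)] = data

    def lookup_local(self, name, kind):
        """Search this environment only."""
        return self.entries.get((name, kind))

    def lookup(self, name, kind):
        """Search this environment, then each enclosing one in turn."""
        env = self
        while env is not None:
            found = env.lookup_local(name, kind)
            if found is not None:
                return found
            env = env.parent
        return None

    def lookup_qualified(self, qualified_name, kind):
        """Resolve a prefixed name such as 'A::B::y'."""
        *prefix, last = qualified_name.split("::")
        if not prefix:
            return self.lookup(last, kind)
        # Only the first component may be found in an enclosing scope;
        # every later component must be a direct member of the previous one.
        env = self.lookup(prefix[0], NAMESPACE)
        for part in prefix[1:]:
            if env is None:
                return None
            env = env.lookup_local(part, NAMESPACE)
        return None if env is None else env.lookup_local(last, kind)


# Example: namespace A containing x and namespace B; B contains y;
# 'block' stands for a function body inside A::B with its own local x.
global_env = Environment()
ns_a = Environment(parent=global_env)
ns_b = Environment(parent=ns_a)
global_env.define("A", NAMESPACE, ns_a)
ns_a.define("B", NAMESPACE, ns_b)
ns_a.define("x", VAR, "int at A::x")
ns_b.define("y", VAR, "int at A::B::y")
block = Environment(parent=ns_b)
block.define("x", VAR, "local int x")
```

Here block.lookup("x", VAR) finds the local x first, while
block.lookup_qualified("A::x", VAR) reaches the one in A, and
block.lookup_qualified("A::B::x", VAR) finds nothing because a
prefixed lookup does not fall back to the parent of B.  Whether a
real implementation should share one lookup routine across
compound statements, classes, and namespaces is exactly the open
question in the notes.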