To: Neil Booth
Cc: gdb-patches@sources.redhat.com, gcc@gcc.gnu.org
Subject: Re: RFC: C/C++ preprocessor macro support for GDB
References: <20020317062306.CC96D5E9DE@zwingli.cygnus.com> <20020317101938.GA2636@daikokuya.demon.co.uk>
From: Jim Blandy
Date: Sun, 17 Mar 2002 20:35:00 -0000
In-Reply-To: <20020317101938.GA2636@daikokuya.demon.co.uk>

Neil Booth writes:
> What are the issues with using libcpp?  It would be a good test of its
> viability as an independent library to have it used somewhere else.

I think there are two issues.  Both might simply be my
misunderstanding of the libcpp header files and code I read; I'd love
to be set straight.

- GDB has commands like this:

      (gdb) break *ADDRESS if CONDITION

  This sets a conditional breakpoint at the address computed by
  evaluating the expression ADDRESS, whose condition is CONDITION.
  ADDRESS needs to be evaluated in the current scope --- the currently
  selected frame and its PC --- but CONDITION needs to be evaluated in
  the scope in force at the *breakpoint's* address.  So you can't just
  take the whole command and smoosh it through an expander all at
  once: ADDRESS and CONDITION might have totally different contexts,
  as far as the preprocessor is concerned.
  This means you've got to decide if there's an `if' in the command
  before you can macro-expand things.  Obviously, an `if' in a string,
  or as part of a larger identifier, doesn't count --- you really need
  to work in terms of tokens.  (There's a similar situation involving
  commas: sometimes the parser is supposed to stop when it finds its
  first comma outside of any parens.)

  So my macro expander has the following function in its public
  interface:

    /* If the null-terminated string pointed to by *LEXPTR begins with a
       macro invocation, return the result of expanding that invocation
       as a null-terminated string, and set *LEXPTR to the next
       character after the invocation.  The result is completely
       expanded; it contains no further macro invocations.

       Otherwise, if *LEXPTR does not start with a macro invocation,
       return zero, and leave *LEXPTR unchanged.

       Use LOOKUP_FUNC and LOOKUP_BATON to find macro definitions.

       If this function returns a string, the caller is responsible for
       freeing it, using xfree.

       We need this expand-one-token-at-a-time interface in order to
       accommodate GDB's C expression parser, which may not consume the
       entire string.  When the user enters a command like

          (gdb) break *func+20 if x == 5

       the parser is expected to consume `func+20', and then stop when
       it sees the "if".  But of course, "if" appearing in a character
       string or as part of a larger identifier doesn't count.  So you
       pretty much have to do tokenization to find the end of the string
       that needs to be macro-expanded.  Our C/C++ tokenizer isn't
       really designed to be called by anything but the yacc parser
       engine.  */
    char *macro_expand_next (char **lexptr,
                             macro_lookup_ftype *lookup_func,
                             void *lookup_baton);

  I changed GDB's lexer to call macro_expand_next before carving out
  each token.  This means we don't have to worry about commas or `if's
  inside macro invocations being confused with the terminating comma
  or `if': the expander consumes them before we ever see them.
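To illustrate why finding the terminating `if' needs token-level scanning, here is a minimal sketch in C. The function find_if_token is a hypothetical helper, not part of GDB or libcpp. It assumes a simplified token model (identifiers, numbers, string and character literals) and does no macro expansion at all.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GDB code: return a pointer to the first
   `if' token in CMD, or NULL if there is none.  An "if" inside a
   string or character literal, or inside a longer identifier, does
   not count.  */
const char *
find_if_token (const char *cmd)
{
  const char *p = cmd;

  while (*p != '\0')
    {
      if (*p == '"' || *p == '\'')
        {
          /* Skip a string or character literal, honoring backslash
             escapes, so its contents are never seen as tokens.  */
          char quote = *p++;

          while (*p != '\0' && *p != quote)
            {
              if (*p == '\\' && p[1] != '\0')
                p++;
              p++;
            }
          if (*p == quote)
            p++;
        }
      else if (isalpha ((unsigned char) *p) || *p == '_')
        {
          /* Carve out the whole identifier before comparing it.  */
          const char *start = p;

          while (isalnum ((unsigned char) *p) || *p == '_')
            p++;
          if (p - start == 2 && strncmp (start, "if", 2) == 0)
            return start;
        }
      else if (isdigit ((unsigned char) *p))
        {
          /* Skip a number as one unit, including any suffix letters.  */
          while (isalnum ((unsigned char) *p) || *p == '_' || *p == '.')
            p++;
        }
      else
        p++;   /* Whitespace, operators, and punctuation.  */
    }

  return NULL;
}
```

Given the command text `break *func+20 if x == 5`, this returns a pointer to `if x == 5`. The sketch would still be fooled by an `if' or comma inside a macro invocation's arguments, which is the case the expand-before-each-token approach described above is meant to handle.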
  As far as I can tell, libcpp doesn't provide an analogous
  token-by-token entry point.

  Another way to deal with this would be to lex the command string
  twice: once to find the `if' or comma, and then again to do the real
  parsing, after macro-expanding each of the various expressions
  properly.  The only difficulty here is that GDB's lexer expects to
  be called by a yacc-style parsing engine; it deposits tokens'
  semantic values in yylval, etc.  To work around this, we'd need to
  make the lexer independent of yacc --- give it some other way to
  return semantic values, mostly --- and hook that into both yacc and
  the code looking for `if's and commas.  But that approach wouldn't
  require any change to libcpp's interface.

  There's nothing too hard there.  But I wanted to put together a
  patch which actually worked, while disturbing the existing GDB code
  as little as possible.

  And I think there's something unsatisfying about the two-pass
  approach; parsers ought to be able to leave input unconsumed if they
  want.  It's a common enough idiom.  Shouldn't libcpp support it?

- GDB's macro data structures record all the macros that were ever
  #defined in a compilation unit, and the line numbers at which they
  were in force.  Given a name and an #inclusion and a line number (or
  in libcpp's terminology, a logical line number?), it can find the
  #definition in scope at that point.

  This is a bit different from libcpp's data structures, which only
  record the macros currently in force as libcpp makes a pass through
  the file's text.  (At least, that's the impression I got.)

  My macro expander is completely ignorant of the lookup table's
  structure; you pass it a function and a data pointer that it uses
  blindly for lookups.  Here's the relevant typedef, and one of the
  prototypes, from the expander's public interface:

    /* A function for looking up preprocessor macro definitions.  Return
       the preprocessor definition of NAME in scope according to BATON,
       or zero if NAME is not defined as a preprocessor macro.

       The caller must not free or modify the definition returned.  It
       is probably unwise for the caller to hold pointers to it for very
       long; it probably lives in some objfile's obstacks.  */
    typedef struct macro_definition *(macro_lookup_ftype)
         (const char *name, void *baton);

    /* Expand any preprocessor macros in SOURCE, and return the expanded
       text.  Use LOOKUP_FUNC and LOOKUP_FUNC_BATON to find identifiers'
       preprocessor definitions.  SOURCE is a null-terminated string.
       The result is a null-terminated string, allocated using xmalloc;
       it is the caller's responsibility to free it.  */
    char *macro_expand (const char *source,
                        macro_lookup_ftype *lookup_func,
                        void *lookup_func_baton);

  When expanding an expression, GDB packages up the #inclusion and
  line number in the baton argument, and provides a lookup_func that
  takes those together with the macro name to search the macro table.
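The baton-based lookup described above can be sketched roughly as follows. Everything here is an assumption made for illustration: struct toy_macro_def, struct toy_scope, and toy_lookup are hypothetical names, and a flat array with line ranges stands in for GDB's real macro table, whose layout this message does not show. The lookup function keeps the (name, baton) shape of macro_lookup_ftype but returns the toy definition type.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a macro definition plus the range of
   lines over which it is in force.  */
struct toy_macro_def
{
  const char *name;
  const char *replacement;
  const char *file;   /* File containing the #define.  */
  int first_line;     /* Line of the #define.  */
  int last_line;      /* Line of the #undef or redefinition, or 0 if
                         the definition holds to the end of the file.  */
};

/* Hypothetical baton: the table to search, plus the file and line
   whose scope the caller is asking about.  */
struct toy_scope
{
  const struct toy_macro_def *table;
  size_t count;
  const char *file;
  int line;
};

/* Return the definition of NAME in force at the file and line that
   BATON describes, or NULL if NAME is not defined there.  */
const struct toy_macro_def *
toy_lookup (const char *name, void *baton)
{
  const struct toy_scope *scope = baton;
  size_t i;

  for (i = 0; i < scope->count; i++)
    {
      const struct toy_macro_def *d = &scope->table[i];

      if (strcmp (d->name, name) == 0
          && strcmp (d->file, scope->file) == 0
          && d->first_line < scope->line
          && (d->last_line == 0 || scope->line < d->last_line))
        return d;
    }

  return NULL;
}
```

Because the scope travels in the baton, an expander that calls toy_lookup never needs to know how definitions are stored. The same name can resolve to different definitions, or to none, depending on the line the caller packages up.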