From: Bob Rossi
To: Michael Chastain
Cc: gdb@sources.redhat.com
Date: Thu, 26 Aug 2004 18:31:00 -0000
Subject: Re: GDB/MI Output Syntax
Message-ID: <20040826183134.GA20902@white>
In-Reply-To: <412DED43.nail3XH31S08T@mindspring.com>
References: <20040825154348.GA19533@white> <412CB6B6.nail1DX11BPYQ@mindspring.com> <20040825193659.GA19945@white> <412DED43.nail3XH31S08T@mindspring.com>

On Thu, Aug 26, 2004 at 10:01:39AM -0400, Michael Chastain wrote:
> Bob Rossi wrote:
> > so far, it seems to parse everything I throw at it. However, I haven't
> > tested it too much because I am building an intermediate representation.
> > This is what I'll use from the front end.
>
> How can we hook this up with the gdb test suite?
>
> I've got a corpus of gdb.log files.  Someone could write some Perl
> script to pick out pieces and invoke your parser as an external program.
>
> It might help to add a few more rules at the top:
>
>   session -> input_output_pair_list
>   input_output_pair_list -> epsilon | input_output_pair_list input output
>   input -> ...
>
> The sticky part is that dejagnu mixes its own output into this.
> Ick.

Hmmm, here are some ideas. What do you think?

   1. Run gdb through tee and pipe only GDB's stdout to a place where we
      can validate its output.
   2. Have GDB send its stdout to two places somehow, similar to the idea
      above (except maybe via a new GDB logging feature), so that the
      output can be parsed.
   3. Create a new process that invokes GDB, validates the output, and
      outputs exactly what GDB used to output.
   4. Somehow parse the output of GDB through tcl, like you are suggesting?

Here is one consideration: if we write a parser that validates the GDB/MI
Input Syntax in the same manner that we are doing for the Output Syntax,
then we could make a combined grammar that is capable of parsing an
entire GDB/MI session. We could also add an ad hoc console command parser,
which would parse the console commands if the user typed any.

This would be good for a few reasons.

   1. GDB could have its input parsed with the library. Then the MI
      could work off of the intermediate language, which would probably
      be nice.
   2. Entire GDB sessions could be validated.
   3. GDB could yell at clients that are sending invalid commands.

BTW, I think that currently each MI command does a lot of the parsing for
itself; this is probably a bad idea and prone to bugs.

I would be reasonably happy to write this Input parser since I am
already writing the Output parser. The in-memory representation is very
low level and could easily be used by GDB.

> Getting into the grammar itself:
>
> Comma separators and lists are kludgy.
> In these rules:
>
>   result_record -> opt_token "^" result_class result_list_prime
>   result_list_prime -> result_list | epsilon
>   result_list -> result_list "," result | "," result
>
> The actual gdb output for a result_record could be either:
>
>   105^done
>   103^done,BreakPointTable={...}
>
> It looks a little weird to me to parse the first comma as part
> of result_list_prime.  How about:
>
>   result_record -> opt_token "^" result_class
>   result_record -> opt_token "^" result_class "," result_list
>   result_list -> result | result_list "," result

Yes, yes, this makes a lot of sense.

> That simplifies tuple and list as well:
>
>   tuple -> "{}" | "{" result_list "}"
>   list -> "[]" | "[" value_list "]" | "[" result_list "]"

I like this very much.

> Style point: there is a lot of:
>
>   foo_list -> foo_list foo | epsilon
>   bar_list -> bar_list bar | bar
>
> I think this is more readable:
>
>   foo_list -> epsilon | foo_list foo
>   bar_list -> bar | bar_list bar

This is fine with me either way, so we'll do it your way.

> Another nit: how is the grammar even working with:
>
>   nl -> CR | CR_LF
>
> Doesn't this have to be:
>
>   nl -> LF | CR | CR LF

I don't use this rule in the grammar. I have the lexer return NEWLINE.

> Or is the lexer quietly defining CR_LF to include "\n"?
>
> For coding purposes it would be more efficient to make NL
> a single token and have the lexer recognize all three forms.
>
> For doco purposes it might be better to explicitly make nl
> a non-terminal and show the LF, CR, CR LF terminals.

Agreed, this is exactly what I have done.

OK, so here is the new grammar that I have. It parses the few examples
that I have, and is reasonably good. It combines many of your ideas with
some new ideas of my own. What do you think?

Any rule that starts with opt_ goes either to epsilon or to something else.
Any rule that ends in _list is a list of items.
I find that actually building the syntax tree also helps organize the
commands, and I haven't gotten that far with this version of the grammar.

opt_output_list -> epsilon | output_list
output_list -> output | output_list output
output -> opt_oob_record_list opt_result_record "(gdb)" nl
opt_oob_record_list -> epsilon | opt_oob_record_list oob_record nl
opt_result_record -> epsilon | result_record nl
result_record -> opt_token "^" result_class
result_record -> opt_token "^" result_class "," opt_result_list
oob_record -> async_record | stream_record
async_record -> opt_token async_record_class async_output
async_record_class -> "*" | "+" | "="
async_output -> async_class "," opt_result_list
result_class -> "done" | "running" | "connected" | "error" | "exit"
async_class -> "stopped"
opt_result_list -> epsilon | result_list
result_list -> result | result_list "," result
result -> variable "=" value
variable -> string
value_list -> value | value_list "," value
value -> c_string | tuple | list
tuple -> "{}" | "{" result_list "}"
list -> "[]" | "[" value_list "]" | "[" result_list "]"
stream_record -> stream_record_class c_string
stream_record_class -> "~" | "@" | "&"
nl -> CR | LF | CR LF
opt_token -> epsilon | token
token -> any sequence of digits.

Thanks,
Bob Rossi
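
For illustration only, here is a minimal sketch of how the result_record
part of the grammar might be turned into a hand-written recursive-descent
parser. It is in Python purely for brevity; nothing in this thread says
what language the real parser uses. The names ResultRecordParser,
MIParseError and parse_result_record are hypothetical. Two assumptions
are made: "variable" is taken to be an identifier of letters, digits, "_"
and "-" (the grammar only says "string"), and at least one result is
required after the first comma, as in the quoted suggestion, so a
trailing comma is rejected.

```python
import re

RESULT_CLASSES = {"done", "running", "connected", "error", "exit"}


class MIParseError(ValueError):
    """Raised when the input does not match the grammar (hypothetical name)."""


class ResultRecordParser:
    """One method per grammar rule; operates on a single line without nl."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # Next character, or "" at end of input.
        return self.text[self.pos:self.pos + 1]

    def expect(self, literal):
        if not self.text.startswith(literal, self.pos):
            raise MIParseError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def match(self, pattern, what):
        m = re.compile(pattern).match(self.text, self.pos)
        if m is None:
            raise MIParseError(f"expected {what} at offset {self.pos}")
        self.pos = m.end()
        return m.group(0)

    def c_string(self):
        # Escape sequences are kept as written, not decoded.
        return self.match(r'"(?:[^"\\]|\\.)*"', "c_string")[1:-1]

    def value(self):
        # value -> c_string | tuple | list
        ch = self.peek()
        if ch == '"':
            return self.c_string()
        if ch == "{":
            return self.tuple_()
        if ch == "[":
            return self.list_()
        raise MIParseError(f"expected value at offset {self.pos}")

    def result(self):
        # result -> variable "=" value  (variable pattern is an assumption)
        name = self.match(r"[A-Za-z_][A-Za-z0-9_-]*", "variable")
        self.expect("=")
        return (name, self.value())

    def result_list(self):
        # result_list -> result | result_list "," result
        items = [self.result()]
        while self.peek() == ",":
            self.pos += 1
            items.append(self.result())
        return items

    def tuple_(self):
        # tuple -> "{}" | "{" result_list "}"; duplicate names would collapse.
        self.expect("{")
        items = {} if self.peek() == "}" else dict(self.result_list())
        self.expect("}")
        return items

    def list_(self):
        # list -> "[]" | "[" value_list "]" | "[" result_list "]"
        self.expect("[")
        items = []
        if self.peek() in ('"', "{", "["):
            items.append(self.value())
            while self.peek() == ",":
                self.pos += 1
                items.append(self.value())
        elif self.peek() != "]":
            items = self.result_list()
        self.expect("]")
        return items

    def result_record(self):
        # result_record -> opt_token "^" result_class [ "," result_list ]
        token = self.match(r"[0-9]*", "token")
        self.expect("^")
        result_class = self.match(r"[a-z]+", "result_class")
        if result_class not in RESULT_CLASSES:
            raise MIParseError(f"unknown result class {result_class!r}")
        results = []
        if self.peek() == ",":
            self.pos += 1
            results = self.result_list()
        if self.pos != len(self.text):
            raise MIParseError(f"trailing input at offset {self.pos}")
        return (token or None, result_class, results)


def parse_result_record(line):
    return ResultRecordParser(line).result_record()
```

With this sketch, parse_result_record("105^done") gives the triple
("105", "done", []), and a record with results gives a list of
(variable, value) pairs in which tuples become dicts and lists become
Python lists. The oob_record and output rules could be added the same
way, one method per rule.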