From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-19247-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 18201 invoked by alias); 26 Aug 2004 18:31:42 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 18184 invoked from network); 26 Aug 2004 18:31:38 -0000
Received: from unknown (HELO lakermmtao07.cox.net) (68.230.240.32)
  by sourceware.org with SMTP; 26 Aug 2004 18:31:38 -0000
Received: from white ([68.9.64.121]) by lakermmtao07.cox.net
          (InterMail vM.6.01.03.02.01 201-2131-111-104-103-20040709)
          with ESMTP
          id <20040826183136.PZEE1237.lakermmtao07.cox.net@white>;
          Thu, 26 Aug 2004 14:31:36 -0400
Received: from bob by white with local (Exim 3.35 #1 (Debian))
	id 1C0P26-0005YG-00; Thu, 26 Aug 2004 14:31:34 -0400
Date: Thu, 26 Aug 2004 18:31:00 -0000
From: Bob Rossi <bob@brasko.net>
To: Michael Chastain <mec.gnu@mindspring.com>
Cc: gdb@sources.redhat.com
Subject: Re: GDB/MI Output Syntax
Message-ID: <20040826183134.GA20902@white>
Mail-Followup-To: Michael Chastain <mec.gnu@mindspring.com>,
	gdb@sources.redhat.com
References: <20040825154348.GA19533@white> <412CB6B6.nail1DX11BPYQ@mindspring.com> <20040825193659.GA19945@white> <412DED43.nail3XH31S08T@mindspring.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <412DED43.nail3XH31S08T@mindspring.com>
User-Agent: Mutt/1.3.28i
X-SW-Source: 2004-08/txt/msg00394.txt.bz2

On Thu, Aug 26, 2004 at 10:01:39AM -0400, Michael Chastain wrote:
> Bob Rossi <bob@brasko.net> wrote:
> > so far, it seems to parse everything I throw at it. However, I haven't
> > tested it too much because I am building an intermediate representation.
> > This is what I'll use from the front end.
> 
> How can we hook this up with the gdb test suite?
> 
> I've got a corpus of gdb.log files.  Someone could write some Perl
> script to pick out pieces and invoke your parser as an external program.
> It might help to add a few more rules at the top:
> 
>   session                 -> input_output_pair_list
>   input_output_pair_list  -> epsilon | input_output_pair_list input output
>   input                   -> ...
> 
> The sticky part is that dejagnu mixes its own output into this.
> Ick.

Hmmm, here are some ideas. What do you think?
   1. Run GDB through tee and pipe only GDB's stdout to a place where we
      can validate its output.
   2. Have GDB write its stdout to two places somehow, similar to the
      idea above (except perhaps through a new GDB logging feature), so
      that the output can be parsed.
   3. Create a new process that invokes GDB, validates the output, and
      outputs exactly what GDB used to output.
   4. Somehow parse the output of GDB through Tcl, as you are
      suggesting?
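A rough sketch of idea 3, in Python. The names passthrough and looks_like_mi_line are hypothetical, and the per-line check only looks at the record prefix (an optional digit token followed by ^, *, + or =; a bare ~, @ or &; or the (gdb) prompt), so it stands in for running the real grammar rather than being a validator:

```python
import io
import re

# Optional numeric token at the start of a record (token -> digits).
_TOKEN = re.compile(r"[0-9]*")

def looks_like_mi_line(line: str) -> bool:
    """Very rough prefix check for one line of GDB/MI output (hypothetical)."""
    text = line.rstrip("\r\n")
    if text.rstrip() == "(gdb)":         # prompt, possibly followed by a blank
        return True
    after_token = text[_TOKEN.match(text).end():]
    if after_token[:1] in ("^", "*", "+", "="):
        return True                      # result or async record
    return text[:1] in ("~", "@", "&")   # stream record, never has a token

def passthrough(src, dst):
    """Copy src to dst unchanged and return the lines that failed the check."""
    suspicious = []
    for line in src:
        dst.write(line)                  # downstream sees GDB's output as-is
        if not looks_like_mi_line(line):
            suspicious.append(line)
    return suspicious

if __name__ == "__main__":
    out = io.StringIO()
    print(passthrough(io.StringIO('105^done\n~"hi"\n(gdb) \nbogus\n'), out))
```

In the test-suite setting, src would presumably be the stdout pipe of the GDB child and dst whatever dejagnu reads from, so the suite keeps seeing the same text while the wrapper collects anything suspicious on the side.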

Here is one consideration: if we write a parser that validates the
GDB/MI Input Syntax in the same manner that we are doing for the Output
Syntax, then we could make a combined grammar that is capable of parsing
an entire GDB/MI session. We could also add an ad hoc console command
parser, which would parse the console commands if the user typed any.

This would be good for a few reasons.
   1. GDB could have its input parsed with the library. Then, the MI
      could work from the intermediate representation, which would
      probably be nice.
   2. Entire GDB sessions could be validated.
   3. GDB could yell at clients that are sending invalid commands.

   BTW, I think that currently each MI command does a lot of its own
   parsing; this is probably a bad idea and prone to bugs.

I would be reasonably happy to write this Input parser, since I am
already writing the Output parser. The in-memory representation is very
low-level and could easily be used by GDB.
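To make the input side concrete, here is a hedged Python sketch of splitting one input line into the pieces an intermediate representation might hold. It follows the shape of the MI input syntax in the GDB manual (an MI command is an optional token, then "-" and an operation; anything else is treated as a CLI command), but InputCommand and parse_input are made-up names, and arguments are split on blanks, so quoted c_string parameters and the "--" separator get no special treatment:

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InputCommand:
    """Hypothetical in-memory form of one line of GDB/MI input."""
    token: Optional[str]      # leading digits, if any
    is_mi: bool               # True for "-operation ...", False for CLI text
    operation: str            # MI operation name, or the whole CLI command
    args: List[str] = field(default_factory=list)

# [token] ["-"] rest-of-line
_PREFIX = re.compile(r"([0-9]*)(-?)(.*)")

def parse_input(line: str) -> InputCommand:
    m = _PREFIX.fullmatch(line.rstrip("\r\n"))
    token = m.group(1) or None
    if m.group(2) != "-":
        # Not an MI command: keep the CLI text for an ad hoc console parser.
        return InputCommand(token, False, m.group(3))
    # Simplification: split on blanks, so c_string parameters containing
    # spaces and the "--" separator get no special handling.
    words = m.group(3).split()
    if not words:
        raise ValueError("MI command without an operation: %r" % line)
    return InputCommand(token, True, words[0], words[1:])
```

Having the MI dispatch work from a structure like this, instead of each command re-parsing its own argument string, is the point made in the BTW above.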

> Getting into the grammar itself:
> 
> Comma separators and lists are kludgy.  In these rules:
> 
>   result_record      -> opt_token "^" result_class result_list_prime
>   result_list_prime  -> result_list | epsilon
>   result_list        -> result_list "," result | "," result
> 
> The actual gdb output for a result_record could be either:
> 
>   105^done
>   103^done,BreakPointTable={...}
> 
> It looks a little weird to me to parse the first comma as part
> of result_list_prime.  How about:
> 
>   result_record  -> opt_token "^" result_class
>   result_record  -> opt_token "^" result_class "," result_list
>   result_list    -> result | result_list "," result

Yes, yes, this makes a lot of sense.
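These three rules map directly onto a recursive-descent recognizer: the optional "," result_list becomes an if, and the left-recursive result_list becomes a loop. A minimal Python sketch, assuming every value is a plain c_string without escapes (tuples and lists are left out) and that the trailing nl has already been removed; parse_result_record is a hypothetical name:

```python
import re
from typing import List, Optional, Tuple

_RESULT_CLASSES = ("done", "running", "connected", "error", "exit")
_HEAD = re.compile(r"([0-9]*)\^([a-z]+)")                  # opt_token "^" result_class
_RESULT = re.compile(r'([A-Za-z_][\w-]*)="([^"\\]*)"')    # variable "=" c_string

def parse_result_record(line: str) -> Tuple[Optional[str], str, List[Tuple[str, str]]]:
    m = _HEAD.match(line)
    if not m or m.group(2) not in _RESULT_CLASSES:
        raise ValueError("not a result_record: %r" % line)
    token, result_class, pos = m.group(1) or None, m.group(2), m.end()
    results = []
    if pos < len(line):
        # result_record -> opt_token "^" result_class "," result_list
        if line[pos] != ",":
            raise ValueError("expected ',' at column %d" % pos)
        # result_list -> result | result_list "," result   (left recursion = loop)
        while True:
            r = _RESULT.match(line, pos + 1)
            if not r:
                raise ValueError("expected result at column %d" % (pos + 1))
            results.append((r.group(1), r.group(2)))
            pos = r.end()
            if pos == len(line):
                break
            if line[pos] != ",":
                raise ValueError("expected ',' at column %d" % pos)
    return token, result_class, results
```

With the rules written this way, a line such as "^done," is rejected, because a comma must be followed by at least one result.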

> That simplifies tuple and list as well:
> 
>   tuple  -> "{}" | "{" result_list "}"
>   list   -> "[]" | "[" value_list "]" | "[" result_list "]"

I like this very much.
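The tuple and list rules can be sketched the same way. Inside "[...]" one character of lookahead decides between value_list and result_list: a value starts with '"', '{' or '[', while a result starts with a variable name. This Python sketch (parse_value is a hypothetical name) assumes no whitespace between tokens, skips over c_string escapes but keeps them verbatim instead of decoding them, and turns tuples into dicts:

```python
from typing import Any, Tuple

def parse_value(s: str, i: int = 0) -> Tuple[Any, int]:
    """Parse one value starting at s[i]; return (python_value, next_index)."""
    c = s[i]
    if c == '"':                              # c_string
        j = i + 1
        while s[j] != '"':
            j += 2 if s[j] == "\\" else 1     # step over an escaped character
        return s[i + 1:j], j + 1
    if c in "{[":                             # tuple or list
        close = "}" if c == "{" else "]"
        items, i = [], i + 1
        if s[i] == close:                     # "{}" or "[]"
            return ({} if c == "{" else []), i + 1
        # A tuple always holds results; a list holds results only if the
        # first character cannot start a value.
        is_results = c == "{" or s[i] not in '"{['
        while True:
            if is_results:                    # result -> variable "=" value
                eq = s.index("=", i)
                val, next_i = parse_value(s, eq + 1)
                items.append((s[i:eq], val))
                i = next_i
            else:
                val, i = parse_value(s, i)
                items.append(val)
            if s[i] == close:
                break
            if s[i] != ",":
                raise ValueError("expected ',' or %r at %d" % (close, i))
            i += 1
        # Note: dict() keeps only the last value for a repeated name.
        return (dict(items) if c == "{" else items), i + 1
    raise ValueError("unexpected %r at %d" % (c, i))
```

A list of results comes back as a list of (name, value) pairs, since names such as frame may repeat there.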

> Style point: there is a lot of:
> 
>   foo_list -> foo_list foo | epsilon
>   bar_list -> bar_list bar | bar
> 
> I think this is more readable:
> 
>   foo_list -> epsilon | foo_list foo
>   bar_list -> bar | bar_list bar

This is fine with me either way, so we'll do it your way.

> Another nit: how is the grammar even working with:
> 
>   nl -> CR | CR_LF
> 
> Doesn't this have to be:
> 
>   nl -> LF | CR | CR LF

I don't use this rule in the grammar. I have the lexer return NEWLINE.

> 
> Or is the lexer quietly defining CR_LF to include "\n"?
> 
> For coding purposes it would be more efficient to make NL
> a single token and have the lexer recognize all three forms.
> 
> For doco purposes it might be better to explicitly make nl
> a non-terminal and show the LF, CR, CR LF terminals.
> 
Agreed, this is exactly what I have done.
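For the lexer side, one way to fold all three forms into a single NEWLINE is a regular expression that tries CR LF before CR and LF, so CR LF counts as one newline rather than two. A Python sketch, with a hypothetical split_records helper that works on a complete string:

```python
import re
from typing import List

# CR LF must be tried first; otherwise it would lex as CR followed by LF.
_NL = re.compile(r"\r\n|\r|\n")

def split_records(data: str) -> List[str]:
    """Split raw GDB/MI output into lines, dropping the nl tokens.

    A trailing fragment with no terminator is returned as the last element.
    """
    lines = _NL.split(data)
    if lines and lines[-1] == "":
        lines.pop()          # data ended with a newline
    return lines
```

When reading from a pipe in chunks, a CR at the very end of a chunk cannot be classified until the next chunk arrives, since it may be the first half of CR LF, so a streaming lexer would have to hold it back.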

OK, so here is the new grammar that I have. It parses the few examples
that I have and is reasonably good. It combines many of your ideas with
some new ideas of my own. What do you think?

Any rule that starts with opt_ either goes to epsilon or something else.
Any rule that ends in _list is a list of items.

I find that actually building the syntax tree also helps organize the
commands, but I haven't gotten that far with this version of the
grammar yet.

opt_output_list         -> epsilon | output_list
output_list             -> output | output_list output
output                  -> opt_oob_record_list opt_result_record "(gdb)" nl
opt_oob_record_list     -> epsilon | opt_oob_record_list oob_record nl
opt_result_record       -> epsilon | result_record nl
result_record           -> opt_token "^" result_class
result_record           -> opt_token "^" result_class "," result_list
oob_record              -> async_record | stream_record
async_record            -> opt_token async_record_class async_output
async_record_class      -> "*" | "+" | "="
async_output            -> async_class | async_class "," result_list
result_class            -> "done" | "running" | "connected" | "error" | "exit"
async_class             -> "stopped"
opt_result_list         -> epsilon | result_list
result_list             -> result | result_list "," result
result                  -> variable "=" value
variable                -> string
value_list              -> value | value_list "," value
value                   -> c_string | tuple | list
tuple                   -> "{}" | "{" result_list "}"
list                    -> "[]" | "[" value_list "]" | "[" result_list "]"
stream_record           -> stream_record_class c_string
stream_record_class     -> "~" | "@" | "&"
nl                      -> CR | LF | CR LF
opt_token               -> epsilon | token
token                   -> any sequence of digits.
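As a first check of the top-level rules, here is a Python sketch that only looks at the sequence of lines: any number of out-of-band records, at most one result record, then the (gdb) prompt. Each line is classified by its first character after the optional digit token; the contents after that (result_list, c_string and so on) are not checked. The function names are hypothetical, and tolerating a trailing blank after (gdb) is an assumption about what GDB actually prints:

```python
import re
from typing import List

# Optional token (digits), then the character that picks the record kind.
_LINE = re.compile(r"([0-9]*)(.)")

def classify(line: str) -> str:
    """Return 'prompt', 'result', 'async' or 'stream' for one line (no nl)."""
    if line.rstrip() == "(gdb)":        # assumption: allow a trailing blank
        return "prompt"
    m = _LINE.match(line)
    if m is None:
        raise ValueError("empty line")
    token, mark = m.groups()
    if mark == "^":
        return "result"                 # result_record
    if mark in "*+=":
        return "async"                  # async_record_class
    if mark in "~@&" and not token:
        return "stream"                 # stream_record takes no opt_token
    raise ValueError("unrecognised record: %r" % line)

def count_outputs(lines: List[str]) -> int:
    """Check lines against opt_output_list; return the number of outputs."""
    outputs, seen_result = 0, False
    for line in lines:
        kind = classify(line)
        if kind == "prompt":            # "(gdb)" closes one output
            outputs, seen_result = outputs + 1, False
        elif kind == "result":
            if seen_result:
                raise ValueError("two result records in one output")
            seen_result = True
        elif seen_result:               # oob records must precede the result
            raise ValueError("out-of-band record after the result record")
    if lines and classify(lines[-1]) != "prompt":
        raise ValueError("last output is not terminated by (gdb)")
    return outputs
```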


Thanks,
Bob Rossi