From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-37523-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 30745 invoked by alias); 16 Jun 2010 00:09:47 -0000
Received: (qmail 30734 invoked by uid 22791); 16 Jun 2010 00:09:45 -0000
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0	tests=AWL,BAYES_00,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 16 Jun 2010 00:09:40 +0000
Received: (qmail 4655 invoked from network); 16 Jun 2010 00:09:38 -0000
Received: from unknown (HELO macbook-2.local) (stan@127.0.0.2)  by mail.codesourcery.com with ESMTPA; 16 Jun 2010 00:09:38 -0000
Message-ID: <4C18163D.5090802@codesourcery.com>
Date: Wed, 16 Jun 2010 00:09:00 -0000
From: Stan Shebs <stan@codesourcery.com>
User-Agent: Thunderbird 2.0.0.24 (Macintosh/20100228)
MIME-Version: 1.0
To: Doug Evans <dje@google.com>
CC: Stan Shebs <stan@codesourcery.com>,  "gdb@sourceware.org" <gdb@sourceware.org>
Subject: Re: [RFC] Collecting strings at tracepoints
References: <4C0983C3.6000604@codesourcery.com> <AANLkTil3jtscoVbvlHzh5ypb5kIE_LquKJvA35xbpGti@mail.gmail.com>
In-Reply-To: <AANLkTil3jtscoVbvlHzh5ypb5kIE_LquKJvA35xbpGti@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2010-06/txt/msg00052.txt.bz2

Doug Evans wrote:
> On Fri, Jun 4, 2010 at 3:52 PM, Stan Shebs <stan@codesourcery.com> wrote:
>   
>> Collection of strings is a problem for tracepoint users, because the literal
>> interpretation of "collect str", where str is a char *, is to collect the
>> address of the string, but not any of its contents.  It is possible to use
>> the '@' syntax to get some contents, for instance "collect str@40" acquires
>> the first 40 characters, but it is a poor approximation; if the string is
>> shorter than that, you collect more than necessary, and possibly run into
>> access trouble if str+40 is outside the program's address space, or else the
>> string is longer, in which case you may miss the part you really wanted.
>>
>> For normal printing of strings GDB has a couple tricks it does.  First, it
>> explicitly recognizes types that are pointers to chars, and automatically
>> dereferences and prints the bytes it finds.  Second, the print elements
>> limit prevents excessive output in case the string is long.
>>
>> For tracepoint collection, I think the automatic heuristic is probably not a
>> good idea.  In interactive use, if you print too much string, or just wanted
>> to see the address, there's no harm in displaying extra data.  But for
>> tracing, the user needs a little more control, so that the buffer doesn't
>> inadvertently fill up too soon.  So I think that means that we should have
>> the user explicitly request collection of string contents.
>>
>> Looking at how '@' syntax works, we can extend it without disrupting
>> expression parsing much.  For instance, "str@@" could mean to dereference str,
>> and collect bytes until a 0 is seen, or the print elements limit is reached
>> (implication is that we would have to tell the target that number).  The
>> user could exercise even finer control by supplying the limit explicitly,
>> for instance "str@/80" to collect at most 80 chars of the string.
>>  ("str@@80" seems like it would cause ambiguity problems vs "str@@").
>>
>> This extended syntax could work for the print command too, in lieu of
>> tweaking the print element limit, and for types that GDB does not recognize
>> as a string type.
>>     
>
> Apologies for coming into this a bit late.
>   

I've been remiss in my replies, so I will try to wrap everything up here.

> I want to make sure I understand the proposed syntax.
>
> str@@ would collect up to the first \0 or print elements limit.
> str@/80 would collect up to the first \0 or 80 bytes.
>   

As Tom points out, it would actually be "*str@@" etc.
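
To pin down the semantics being discussed, here is a minimal sketch in C 
of "collect until a 0 is seen or the limit is reached".  It is not GDB or 
target code; collect_string and its parameters are hypothetical names, and 
storing the terminating 0 in the collected data is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper, not a GDB function.  Copy bytes from SRC into OUT
   until a terminating 0 has been copied or LIMIT bytes have been copied,
   whichever comes first.  Return the number of bytes stored in OUT.
   Assumption: the terminating 0 counts as collected data.  */
size_t
collect_string (const char *src, size_t limit, char *out)
{
  size_t n = 0;

  while (n < limit)
    {
      out[n] = src[n];
      if (src[n++] == '\0')
        break;          /* Never read past the terminator.  */
    }
  return n;
}
```

With a limit of 80, a 3-character string costs 4 bytes of buffer rather 
than 80, and nothing past the terminator is read, which is the advantage 
over a fixed "@40"-style count.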

> That feels too inconsistent: "@@" triggers the special "up until the
> first \0", *except* when it's @/.
> "up until the first \0" is one thing and specifying a limit is an
> add-on.  Each should have their own syntax (e.g. str@@/80; it's
> perhaps klunkier, but @@ is klunky to begin with. :-)]
>   

I just threw "@/" out there as something that was parseable.  @ is a 
totally general binary operator; the second argument doesn't have to be 
a constant (not even for tracing).  So any extension to it needs to be 
something that is not ambiguous with anything else.  "@@" for the common 
case seemed logical.  Allowing both "@@" and "@@<expr>" could get us 
into dangling-else style ambiguity; given that this is our arbitrary 
extension, why create parsing ambiguity if there is no language syntax 
forcing us to?
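
As a rough illustration of why "@@" and "@/" are cheap to parse, the 
sketch below tells the three forms apart with one character of lookahead. 
It is not taken from GDB's parser; the enum and function names are 
hypothetical, and it assumes no existing expression has "@" immediately 
followed by "@" or "/".

```c
/* Hypothetical token kinds for the '@' family; not GDB's actual lexer.  */
enum at_op
{
  AT_NONE,          /* Not an '@' operator at all.  */
  AT_REPEAT,        /* Plain "@": the existing binary operator.  */
  AT_STRING,        /* "@@": up to a 0 or the print elements limit.  */
  AT_STRING_LIMIT   /* "@/": up to a 0 or an explicit limit.  */
};

/* Classify the operator starting at P, which must point into a
   0-terminated string, so reading P[1] after seeing '@' is safe.  */
enum at_op
classify_at (const char *p)
{
  if (p[0] != '@')
    return AT_NONE;
  if (p[1] == '@')
    return AT_STRING;
  if (p[1] == '/')
    return AT_STRING_LIMIT;
  return AT_REPEAT;
}
```

Each form is settled by the character right after the first "@", so 
nothing depends on whether an expression happens to follow, which is the 
problem "@@<expr>" would introduce.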

> Michael mentioned collect /s as a possibility.
> That *feels* better, given that you mention the print command (if p/s
> doesn't print its arg as a string, what does p/s mean?).
> To add a max-length, "collect /80s" doesn't work, it's inconsistent
> with the "x" command; "x /80s" doesn't mean "max 80 chars".
> Maybe "collect /s@80"?  [At this point, I don't have a strong opinion
> on @ vs another character.]
> "x/s@80 foo" feels like a nice extension (print foo as a string up to 80 chars)
> Plus "x/20s@80 foo" also works (print 20 strings beginning at foo,
> each with a max length of 80).
>
>   

The /s idea is appealing, but it has a couple of downsides.  First, there 
is the default-collect variable, although I suppose "set default-collect 
/s str" could be made to have the right effect.  Second, it would apply 
to everything in the collection line, whether you realized it or not; I 
can see users getting burned because FUNKYTYPE is typedef'ed to char on 
some machines and not others, and so "collect /s str, funkytown" may 
fill the trace buffer unexpectedly quickly.  Having the string syntax 
available in expressions means that it can be used in more ways, although 
admittedly something like 
"collect $tsv = (*str@@[len-1] == (*str2@/80)[79])" is pretty freaky and 
not likely to be seen in real life.  We also need to do something for MI, 
since there are Eclipse users wanting to trace.

But the downsides aren't really bad, so I think /s is worth considering 
further.

Stan