From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24924-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 22290 invoked by alias); 14 Apr 2006 20:27:31 -0000
Received: (qmail 22279 invoked by uid 22791); 14 Apr 2006 20:27:31 -0000
X-Spam-Check-By: sourceware.org
Received: from nevyn.them.org (HELO nevyn.them.org) (66.93.172.17)     by sourceware.org (qpsmtpd/0.31.1) with ESMTP; Fri, 14 Apr 2006 20:27:28 +0000
Received: from drow by nevyn.them.org with local (Exim 4.54) 	id 1FUUsy-00067n-3E; Fri, 14 Apr 2006 16:27:20 -0400
Date: Fri, 14 Apr 2006 22:18:00 -0000
From: Daniel Jacobowitz <drow@false.org>
To: Jim Blandy <jimb@red-bean.com>
Cc: Eli Zaretskii <eliz@gnu.org>, ghost@cs.msu.su, gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Message-ID: <20060414202720.GA23182@nevyn.them.org>
Mail-Followup-To: Jim Blandy <jimb@red-bean.com>, 	Eli Zaretskii <eliz@gnu.org>, ghost@cs.msu.su, 	gdb@sources.redhat.com
References: <e1lsqg$aml$1@sea.gmane.org> <200604141257.41690.ghost@cs.msu.su> <uu08w1cnf.fsf@gnu.org> <200604141837.26618.ghost@cs.msu.su> <uirpc19u8.fsf@gnu.org> <8f2776cb0604141053v73e512e3o2d1c9086312316bd@mail.gmail.com> <ubqv4108c.fsf@gnu.org> <8f2776cb0604141216m216ba87ch529180cd079ce971@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <8f2776cb0604141216m216ba87ch529180cd079ce971@mail.gmail.com>
User-Agent: Mutt/1.5.8i
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00212.txt.bz2

On Fri, Apr 14, 2006 at 12:16:36PM -0700, Jim Blandy wrote:
> The command line and MI already use the ISO C syntax for conveying
> values to the user/consumer.  I'm just saying we should expand our use
> of the syntax we already use.

I don't agree.

Saying "we use ISO C syntax for conveying data" is fairly inaccurate. 
We are inconsistent.  Some things are escaped in a C-like fashion. 
Other things are escaped in other fashions, with their own quoting
rules.  This is true in both directions, for user input and for output.

Let's consider strings in particular.  Strings are printed using
LA_PRINT_STRING.  As the name implies, the quoting done is adjusted
to match the source language convention.  Asking an FE to grok that
is just impractical.  In data intended for CLI users, we can
prettyprint things any way we want; in data intended for anything
more machinelike, I recommend we define a syntax and stick with it.

Personally, I'd just use UTF-8.  If you want GDB's output, expect it to
be UTF-8.  The MI layer is a "transport", and can add its own necessary
escaping (of quote marks, mostly).  Alternatively, make GDB output in
the current locale's character set.
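
To illustrate the "transport" idea: a minimal sketch in Python, assuming the payload is already a Unicode string and that the transport only has to protect backslashes and double quotes. The names mi_quote and mi_unquote are hypothetical and are not GDB functions; real MI quoting handles more cases than this.

```python
def mi_quote(payload):
    """Wrap a payload in double quotes, escaping backslashes and quotes.

    Hypothetical helper for illustration only, not GDB's implementation.
    Non-ASCII characters pass through untouched; the transport does not
    care what the payload means.
    """
    escaped = payload.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def mi_unquote(quoted):
    """Inverse of mi_quote: strip the outer quotes and undo the escaping."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted string")
    body = quoted[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # An escaped character: keep the character after the backslash.
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    original = 'say "caf\u00e9"'
    wire = mi_quote(original)
    print(wire)
    print(mi_unquote(wire) == original)
```

The point of the split is that the consumer undoes exactly one layer of escaping and is left with plain text, rather than with a language-specific string literal it has to interpret.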

So, if we print a wchar_t string as a string, and the user has conveyed
to us that their wchar_t strings are Unicode code points, then we can
convert that to the appropriate multibyte string on output using the
host character set.
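
A rough sketch of that conversion, in Python, assuming each target wchar_t value has already been read out as an integer and really is a Unicode code point. The function name and the example values are made up for illustration.

```python
def wchars_to_host_bytes(code_points, host_charset="utf-8"):
    """Convert target wchar_t values to a multibyte string for the host.

    Assumes every element of code_points is a Unicode code point, which
    is what the user is supposed to have told us about the target.
    """
    text = "".join(chr(cp) for cp in code_points)
    return text.encode(host_charset)


if __name__ == "__main__":
    target_wchars = [0x63, 0x61, 0x66, 0xE9]  # "cafe" with e-acute
    print(wchars_to_host_bytes(target_wchars, "utf-8"))    # b'caf\xc3\xa9'
    print(wchars_to_host_bytes(target_wchars, "latin-1"))  # b'caf\xe9'
```

The same target data produces different bytes depending on the host character set, which is why the choice belongs on the output side and not in the value itself.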

Picked a host character set that can't represent some target characters?
The CLI should fall back to pretty escape sequences.  I don't know what
the MI should do, but the answer is probably unimportant.
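
One possible shape for the CLI fallback, sketched in Python under the assumption that backslash escapes are an acceptable "pretty" form. It relies on Python's standard "backslashreplace" error handler; encode_for_cli is a hypothetical name, and the escape style GDB would actually choose is not settled here.

```python
def encode_for_cli(text, host_charset):
    """Encode text for the host, escaping whatever the charset cannot hold.

    Characters the host character set can represent are emitted as-is;
    the rest become backslash escapes instead of causing an error.
    """
    return text.encode(host_charset, errors="backslashreplace")


if __name__ == "__main__":
    sample = "caf\u00e9 \u03a9"  # e-acute fits in Latin-1, capital omega does not
    print(encode_for_cli(sample, "latin-1"))  # b'caf\xe9 \\u03a9'
    print(encode_for_cli(sample, "ascii"))    # b'caf\\xe9 \\u03a9'
```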

> My point is, MI consumers are already parsing ISO C strings.  They
> just need to parse more of them.

IMO, we need to make them parse less of them.

Everywhere the MI consumer needs to parse something which originated
as GDB CLI output, things go bad.  For instance, MI consumers may get
confused by the limit imposed by "set print elements", which
truncates strings.

After "set print elements 2":

(gdb) interpreter-exec mi "-var-create - * \"(char *)&__libc_version\""
^done,name="var1",numchild="1",type="char *"
(gdb) 
(gdb) interpreter-exec mi "-var-evaluate-expression var1"
^done,value="0x102a80 \"2.\"..."
(gdb) 

Not very nice of us, was that?
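
To make the cost concrete, here is a sketch in Python of what a front end ends up doing with that value once the MI quoting has been removed. The regular expression is a guess at the CLI-style layout (address, quoted text, optional trailing dots), not a documented grammar, and parse_char_pointer_value is a hypothetical name.

```python
import re

# Guessed layout: hex address, a space, a quoted string, and an optional
# "..." that the CLI printer appends when it cut the string short.
VALUE_RE = re.compile(r'^(0x[0-9a-fA-F]+) "(.*)"(\.\.\.)?$', re.DOTALL)


def parse_char_pointer_value(value):
    """Split a CLI-style char* value into (address, text, truncated).

    Returns None when the value does not look like the guessed layout.
    The text is returned still escaped; a real front end would also have
    to undo the source-language quoting inside it.
    """
    match = VALUE_RE.match(value)
    if match is None:
        return None
    address, text, ellipsis = match.groups()
    return int(address, 16), text, ellipsis is not None


if __name__ == "__main__":
    # The value from the session above, after removing the MI quoting.
    print(parse_char_pointer_value('0x102a80 "2."...'))
```

Everything in that pattern is inferred from output meant for humans, so it has to change whenever the pretty-printer does.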

> There is no provision in ISO C for variable-size wchar_t encodings. 
> The portion of the standard I referred to says that wchar_t "...is an
> integer type whose range of values can represent distinct codes for
> all members of the largest extended character set specified among the
> supported locales".

(A) GDB supports languages other than C.

(B) While I am inclined to agree with you about the language of ISO C,
we don't get to ignore the reality of platforms with a 16-bit wchar_t
which store UTF-16 in it.
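
What that means in practice, as a sketch in Python: on such a platform one character may occupy two wchar_t elements, so whoever prints the string has to combine surrogate pairs first. The function name is hypothetical, and unpaired surrogates are simply passed through here, which is one policy among several.

```python
def utf16_units_to_code_points(units):
    """Combine 16-bit wchar_t values holding UTF-16 into code points.

    A high surrogate (0xD800-0xDBFF) followed by a low surrogate
    (0xDC00-0xDFFF) encodes one code point above 0xFFFF.  Anything else,
    including an unpaired surrogate, is passed through unchanged.
    """
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            combined = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            code_points.append(combined)
            i += 2
        else:
            code_points.append(unit)
            i += 1
    return code_points


if __name__ == "__main__":
    # 'A' followed by U+1D11E, which needs two 16-bit units.
    units = [0x0041, 0xD834, 0xDD1E]
    print([hex(cp) for cp in utf16_units_to_code_points(units)])  # ['0x41', '0x1d11e']
```

Three array elements, two characters: the "one wchar_t, one character" assumption does not hold on these targets.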

-- 
Daniel Jacobowitz
CodeSourcery