From: Daniel Jacobowitz
To: Jim Blandy
Cc: Eli Zaretskii, ghost@cs.msu.su, gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Date: Fri, 14 Apr 2006 22:18:00 -0000
Message-ID: <20060414202720.GA23182@nevyn.them.org>
In-Reply-To: <8f2776cb0604141216m216ba87ch529180cd079ce971@mail.gmail.com>
References: <200604141257.41690.ghost@cs.msu.su> <200604141837.26618.ghost@cs.msu.su> <8f2776cb0604141053v73e512e3o2d1c9086312316bd@mail.gmail.com> <8f2776cb0604141216m216ba87ch529180cd079ce971@mail.gmail.com>

On Fri, Apr 14, 2006 at 12:16:36PM -0700, Jim Blandy wrote:
> The command line and MI already use the ISO C syntax for conveying
> values to the user/consumer.  I'm just saying we should expand our use
> of the syntax we already use.

I don't agree.  Saying "we use ISO C syntax for conveying data" is
fairly inaccurate.  We are inconsistent.  Some things are escaped in a
C-like fashion.  Other things are escaped in other fashions, with their
own quoting rules.  This is true in both directions, for user input and
for output.

Let's consider strings in particular.

Strings are printed using LA_PRINT_STRING.  As the name implies, the
quoting done is adjusted to match the source language convention.
Asking a front end to grok that is just impractical.

In data intended for CLI users, we can prettyprint things any way we
want; in data intended for anything more machinelike, I recommend we
define a syntax and stick with it.

Personally, I'd just use UTF-8.  If you want GDB's output, expect it to
be UTF-8.  The MI layer is a "transport", and can add its own necessary
escaping (of quote marks, mostly).  Alternatively, make GDB output in
the current locale's character set.

So, if we print a wchar_t string as a string, and the user has conveyed
to us that their wchar_t strings are Unicode code points, then we can
convert that to the appropriate multibyte string on output using the
host character set.

Picked a host character set that can't represent some target
characters?  The CLI should fall back to pretty escape sequences.  I
don't know what the MI should do, but probably the answer is
unimportant.

> My point is, MI consumers are already parsing ISO C strings.  They
> just need to parse more of them.

IMO, we need to make them parse fewer of them.  Everywhere the MI
consumer needs to parse something which originated as GDB CLI output,
things go bad.  For instance, MI consumers may get confused by the
automatic limit imposed by "set print elements", which truncates
strings.  After "set print elements 2":

(gdb) interpreter-exec mi "-var-create - * \"(char *)&__libc_version\""
^done,name="var1",numchild="1",type="char *"
(gdb)
(gdb) interpreter-exec mi "-var-evaluate-expression var1"
^done,value="0x102a80 \"2.\"..."
(gdb)

Not very nice of us, was that?

> There is no provision in ISO C for variable-size wchar_t encodings.
> The portion of the standard I referred to says that wchar_t "...is an
> integer type whose range of values can represent distinct codes for
> all members of the largest extended character set specified among the
> supported locales".

(A) GDB supports languages other than C.

(B) While I am inclined to agree with you about the language of ISO C,
we don't get to ignore the reality of platforms with a 16-bit wchar_t
which store UTF-16 in it.

-- 
Daniel Jacobowitz
CodeSourcery
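
To illustrate point (B) and the host-character-set conversion discussed
earlier, here is a minimal sketch in Python (not GDB's C, and not GDB
code).  It assumes the target buffer holds 16-bit wchar_t units encoding
UTF-16 and that the units arrive in pairs of bytes.  The function names
decode_wchar16 and to_host are hypothetical, and the \uXXXX / \UXXXXXXXX
fallback format is only one possible choice of "pretty escape sequence".

```python
def decode_wchar16(raw: bytes, byteorder: str = "little") -> str:
    """Decode 16-bit wchar_t units holding UTF-16, stopping at the first NUL.

    Hypothetical helper: assumes len(raw) is even.
    """
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    # surrogatepass keeps unpaired surrogates so they can be escaped later
    # instead of aborting the whole conversion.
    text = raw.decode(codec, errors="surrogatepass")
    return text.split("\x00", 1)[0]


def to_host(text: str, host_charset: str) -> str:
    """Keep characters the host charset can represent; escape the rest."""
    out = []
    for ch in text:
        try:
            ch.encode(host_charset)
            out.append(ch)
        except UnicodeEncodeError:
            cp = ord(ch)
            # Assumed escape format, chosen for illustration only.
            out.append(f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}")
    return "".join(out)


if __name__ == "__main__":
    # U+1F600 needs a surrogate pair, so this buffer has more 16-bit
    # units than characters.
    target = "h\u00e9 \U0001F600".encode("utf-16-le") + b"\x00\x00"
    s = decode_wchar16(target)
    print(to_host(s, "utf-8"))
    print(to_host(s, "latin-1"))
    print(to_host(s, "ascii"))
```

With a UTF-8 host nothing is escaped; with a Latin-1 or ASCII host only
the characters that cannot be represented are replaced, which is the
fallback behaviour suggested for the CLI.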