From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24907-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 27140 invoked by alias); 14 Apr 2006 14:50:49 -0000
Received: (qmail 27131 invoked by uid 22791); 14 Apr 2006 14:50:49 -0000
X-Spam-Check-By: sourceware.org
Received: from zigzag.lvk.cs.msu.su (HELO zigzag.lvk.cs.msu.su) (158.250.17.23)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 14 Apr 2006 14:50:48 +0000
Received: from Debian-exim by zigzag.lvk.cs.msu.su with spam-scanned (Exim 4.50) 	id 1FUPd9-0002tA-0R 	for gdb@sources.redhat.com; Fri, 14 Apr 2006 18:50:44 +0400
Received: from zigzag.lvk.cs.msu.su ([158.250.17.23]) 	by zigzag.lvk.cs.msu.su with esmtp (Exim 4.50) 	id 1FUPce-0002m5-VI; Fri, 14 Apr 2006 18:50:08 +0400
From: Vladimir Prus <ghost@cs.msu.su>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: printing wchar_t*
Date: Fri, 14 Apr 2006 15:00:00 -0000
User-Agent: KMail/1.7.2
Cc: Paul Koning <pkoning@equallogic.com>,  gdb@sources.redhat.com
References: <e1lsqg$aml$1@sea.gmane.org> <17471.42725.651176.368871@gargle.gargle.HOWL> <uodz41b8l.fsf@gnu.org>
In-Reply-To: <uodz41b8l.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain;   charset="koi8-r"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200604141850.08495.ghost@cs.msu.su>
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00195.txt.bz2

On Friday 14 April 2006 18:29, Eli Zaretskii wrote:
> > Date: Fri, 14 Apr 2006 09:43:01 -0400
> > From: Paul Koning <pkoning@equallogic.com>
> > Cc: ghost@cs.msu.su, gdb@sources.redhat.com
> >
> > If you have 16 bit wide chars, it seems possible that those might
> > contain UTF-16 encoding of full (beyond BMP) Unicode characters.
>
> You could use wchar_t arrays for that, but then not every array
> element will be a full character, and you will not be able to access
> individual characters by their positional index.

So what? Even if wchar_t is 32 bits wide, the element at position 'i' can be
a combining character that modifies another character, and be of little use
by itself.
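
To illustrate, here is a rough sketch (not GDB code; the function names are
made up, and it assumes for simplicity that only U+0300..U+036F are combining
characters, which is far from the full set). It counts the base characters in
a 32-bit string and shows that the count can be smaller than the array length:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplifying assumption: only the Combining Diacritical Marks block
   (U+0300..U+036F) is treated as combining.  Real code would need the
   full Unicode character database.  */
static int
is_combining (uint32_t cp)
{
  return cp >= 0x0300 && cp <= 0x036F;
}

/* Hypothetical helper: count the elements of S that are not combining
   characters, i.e. the ones that start a new displayed character.  */
size_t
count_base_characters (const uint32_t *s, size_t len)
{
  size_t i, n = 0;

  assert (s != NULL || len == 0);
  for (i = 0; i < len; i++)
    if (!is_combining (s[i]))
      n++;
  return n;
}
```

For the three-element array { 'e', U+0301, 'x' } this counts two base
characters, so element 1 is not a character the user would see by itself.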

> In other words, in this case each element of the wchar_t array is no
> longer a ``wide character'', but one of the few shorts that encode a
> character.
>
> If we want to support wchar_t arrays that store UTF-16, we will need
> to add a feature to GDB to convert UTF-16 to the full UCS-4
> codepoints, and output those.  

That's what I mentioned in a reply to Jim -- since the current string printing
code operates "one wchar_t at a time", it's not suitable for outputting
UTF-16 encoded wchar_t values to the user.
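
For reference, the conversion itself might look roughly like the sketch below.
This is not existing GDB code: the function name and interface are
hypothetical, and passing unpaired surrogates through unchanged is just one
possible policy. The point is that the decoder has to look at two array
elements at once, which a "one wchar_t at a time" loop cannot do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode IN_LEN UTF-16 code units from IN into
   UCS-4 code points in OUT, which must have room for IN_LEN elements.
   Returns the number of code points written.  Unpaired surrogates are
   copied through unchanged (an assumption of this sketch).  */
size_t
utf16_to_ucs4 (const uint16_t *in, size_t in_len, uint32_t *out)
{
  size_t i = 0, n = 0;

  assert (in != NULL && out != NULL);
  while (i < in_len)
    {
      uint32_t u = in[i++];

      /* A high surrogate followed by a low surrogate encodes one code
         point beyond the BMP.  */
      if (u >= 0xD800 && u <= 0xDBFF && i < in_len
          && in[i] >= 0xDC00 && in[i] <= 0xDFFF)
        {
          u = 0x10000 + ((u - 0xD800) << 10) + (in[i] - 0xDC00);
          i++;
        }
      out[n++] = u;
    }
  return n;
}
```

With the input { 0x0041, 0xD834, 0xDD1E, 0x20AC } this produces three code
points: U+0041, U+1D11E and U+20AC.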

> Alternatively, the FE will have to 
> support display of UTF-16 encoded characters.

As for the FE, handling UTF-16 there is trivial, so having gdb print the raw
wchar_t values will be sufficient. Some work is needed only if we want to
show UTF-16 strings properly to a user of console gdb.

- Volodya