From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24937-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 2996 invoked by alias); 17 Apr 2006 06:18:09 -0000
Received: (qmail 2986 invoked by uid 22791); 17 Apr 2006 06:18:08 -0000
X-Spam-Check-By: sourceware.org
Received: from zigzag.lvk.cs.msu.su (HELO zigzag.lvk.cs.msu.su) (158.250.17.23)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 17 Apr 2006 06:18:07 +0000
Received: from Debian-exim by zigzag.lvk.cs.msu.su with spam-scanned (Exim 4.50) 	id 1FVN3f-0004zF-IW 	for gdb@sources.redhat.com; Mon, 17 Apr 2006 10:18:04 +0400
Received: from zigzag.lvk.cs.msu.su ([158.250.17.23]) 	by zigzag.lvk.cs.msu.su with esmtp (Exim 4.50) 	id 1FVN3O-0004vx-86; Mon, 17 Apr 2006 10:17:42 +0400
From: Vladimir Prus <ghost@cs.msu.su>
To: Eli Zaretskii <eliz@gnu.org>
Subject: Re: printing wchar_t*
Date: Mon, 17 Apr 2006 07:05:00 -0000
User-Agent: KMail/1.7.2
Cc: pkoning@equallogic.com,  gdb@sources.redhat.com
References: <e1lsqg$aml$1@sea.gmane.org> <200604141850.08495.ghost@cs.msu.su> <uek0013sq.fsf@gnu.org>
In-Reply-To: <uek0013sq.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain;   charset="koi8-r"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200604171017.41504.ghost@cs.msu.su>
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00225.txt.bz2

On Friday 14 April 2006 21:10, Eli Zaretskii wrote:

> > > If we want to support wchar_t arrays that store UTF-16, we will need
> > > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > > codepoints, and output those.
> >
> > That's what I mentioned in a reply to Jim -- since the current string
> > printing code operates "one wchar_t at a time", it's not suitable for
> > outputting UTF-16 encoded wchar_t values to the user.
>
> I don't understand: if the wchar_t array holds a UTF-16 encoding, then
> when you receive the entire string, you have a UTF-16 encoding of what
> you want to display, and you yourself said that displaying a UTF-16
> encoded string is easy for you.  So where is the problem?  Is it only
> that you cannot know the length of the UTF-16 encoded string?  Or is
> there something else missing?

For my frontend there's no problem: I can handle UTF-16 myself. However, if
gdb is ever to produce UTF-8 output that the console can read, then it has to
handle surrogate pairs itself. Taking the first and second elements of a
surrogate pair and converting each to UTF-8 individually won't work: neither
half is a valid code point on its own, so the result is not valid UTF-8. The
two halves have to be combined into a single code point first, and that code
point is what gets encoded.
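
To make the surrogate pair point concrete, here is a minimal sketch in Python
of what "handling surrogate pairs" amounts to. It is not gdb code: the
function names are made up for illustration, and substituting U+FFFD for an
unpaired surrogate is just one possible policy that I picked for the example.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single code point.

    Hypothetical helper for illustration; not part of gdb.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each half carries 10 bits; the pair encodes (code point - 0x10000).
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units_to_utf8(units):
    """Convert a sequence of 16-bit code units (ints) to UTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Look at two code units at once: this is the step that a
            # "one wchar_t at a time" printer cannot do.
            code_point = combine_surrogates(unit, units[i + 1])
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # Unpaired surrogate: emit U+FFFD (an assumption of this sketch).
            code_point = 0xFFFD
            i += 1
        else:
            code_point = unit
            i += 1
        out += chr(code_point).encode("utf-8")
    return bytes(out)
```

For example, U+1D11E is stored in UTF-16 as the pair 0xD834 0xDD1E, so
utf16_units_to_utf8([0x41, 0xD834, 0xDD1E]) gives the bytes 41 F0 9D 84 9E.
Encoding either half on its own, e.g. chr(0xD834).encode("utf-8"), raises
UnicodeEncodeError in Python 3, because a lone surrogate has no UTF-8 form.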

- Volodya