From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24941-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 3680 invoked by alias); 17 Apr 2006 07:35:33 -0000
Received: (qmail 3672 invoked by uid 22791); 17 Apr 2006 07:35:32 -0000
X-Spam-Check-By: sourceware.org
Received: from nitzan.inter.net.il (HELO nitzan.inter.net.il) (192.114.186.20)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 17 Apr 2006 07:35:30 +0000
Received: from HOME-C4E4A596F7 (IGLD-80-230-11-227.inter.net.il [80.230.11.227]) 	by nitzan.inter.net.il (MOS 3.7.3-GA) 	with ESMTP id DDQ31359 (AUTH halo1); 	Mon, 17 Apr 2006 10:35:25 +0300 (IDT)
Date: Mon, 17 Apr 2006 08:35:00 -0000
Message-Id: <u8xq4zmbj.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Vladimir Prus <ghost@cs.msu.su>
CC: pkoning@equallogic.com, gdb@sources.redhat.com
In-reply-to: <200604171017.41504.ghost@cs.msu.su> (message from Vladimir Prus 	on Mon, 17 Apr 2006 10:17:40 +0400)
Subject: Re: printing wchar_t*
Reply-to: Eli Zaretskii <eliz@gnu.org>
References: <e1lsqg$aml$1@sea.gmane.org> <200604141850.08495.ghost@cs.msu.su> <uek0013sq.fsf@gnu.org> <200604171017.41504.ghost@cs.msu.su>
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00229.txt.bz2

> From: Vladimir Prus <ghost@cs.msu.su>
> Date: Mon, 17 Apr 2006 10:17:40 +0400
> Cc: pkoning@equallogic.com,
>  gdb@sources.redhat.com
> 
> On Friday 14 April 2006 21:10, Eli Zaretskii wrote:
> 
> > > > If we want to support wchar_t arrays that store UTF-16, we will need
> > > > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > > > codepoints, and output those.
> > >
> > > That's what I mentioned in a reply to Jim -- since the current string
> > > printing code operates "one wchar_t at a time", it's not suitable for
> > > outputting UTF-16 encoded wchar_t values to the user.
> >
> > I don't understand: if the wchar_t array holds a UTF-16 encoding, then
> > when you receive the entire string, you have a UTF-16 encoding of what
> > you want to display, and you yourself said that displaying a UTF-16
> > encoded string is easy for you.  So where is the problem?  Is it only
> > that you cannot know the length of the UTF-16 encoded string?  Or is
> > there something else missing?
> 
> For my frontend -- there's no problem, I can handle UTF-16 myself. However, if
> gdb is to ever produce output in UTF-8

We were talking about wchar_t and wide character strings, which UTF-8
isn't.  Let's not confuse ourselves more than we already have.  Adding
support to GDB for converting arbitrarily encoded text into UTF-8 would
be a giant job.

> then it should handle surrogate pairs itself. Taking the first and
> second elements of a surrogate pair and converting each to UTF-8
> individually won't work, for obvious reasons.

I don't think it's quite as ``obvious'' as you imply.  Handling
surrogates is generally a job for a display engine, so a UTF-8 enabled
terminal could very well do it itself.  I don't know if they actually
do that, though.  But anyway, this is a different issue.
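
To make the surrogate-pair point in this exchange concrete, here is a
minimal sketch in Python.  It is not GDB code: the function names
decode_utf16_units and encode_utf8 are hypothetical, and the input is
assumed to be a list of 16-bit code units such as a UTF-16 wchar_t
array would hold.  It shows that combining a surrogate pair into one
code point before encoding gives valid UTF-8, while encoding each half
separately gives bytes that a strict UTF-8 decoder rejects.

```python
def decode_utf16_units(units):
    """Combine 16-bit UTF-16 code units into full code points."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            low = units[i + 1]
            # A high/low pair carries 10 bits each, offset by 0x10000.
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            code_points.append(u)
            i += 1
    return code_points


def encode_utf8(cp):
    """Encode one value as UTF-8 bytes mechanically (no surrogate check)."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# U+1D11E (MUSICAL SYMBOL G CLEF) is stored in UTF-16 as D834 DD1E.
units = [0xD834, 0xDD1E]

# Pair first, then encode: one valid 4-byte UTF-8 sequence.
combined = b"".join(encode_utf8(cp) for cp in decode_utf16_units(units))

# Encode each half on its own: two 3-byte sequences, not valid UTF-8.
per_half = b"".join(encode_utf8(u) for u in units)
```

Whether a given terminal tolerates the per-half form is a separate
question; the sketch only shows that the two byte sequences differ and
that a strict decoder accepts only the first.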