From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24910-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 29269 invoked by alias); 14 Apr 2006 17:10:36 -0000
Received: (qmail 29261 invoked by uid 22791); 14 Apr 2006 17:10:35 -0000
X-Spam-Check-By: sourceware.org
Received: from nitzan.inter.net.il (HELO nitzan.inter.net.il) (192.114.186.20)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 14 Apr 2006 17:10:34 +0000
Received: from HOME-C4E4A596F7 (IGLD-83-130-214-179.inter.net.il [83.130.214.179]) 	by nitzan.inter.net.il (MOS 3.7.3-GA) 	with ESMTP id DDG50855 (AUTH halo1); 	Fri, 14 Apr 2006 20:10:29 +0300 (IDT)
Date: Fri, 14 Apr 2006 17:53:00 -0000
Message-Id: <uek0013sq.fsf@gnu.org>
From: Eli Zaretskii <eliz@gnu.org>
To: Vladimir Prus <ghost@cs.msu.su>
CC: pkoning@equallogic.com, gdb@sources.redhat.com
In-reply-to: <200604141850.08495.ghost@cs.msu.su> (message from Vladimir Prus 	on Fri, 14 Apr 2006 18:50:07 +0400)
Subject: Re: printing wchar_t*
Reply-to: Eli Zaretskii <eliz@gnu.org>
References: <e1lsqg$aml$1@sea.gmane.org> <17471.42725.651176.368871@gargle.gargle.HOWL> <uodz41b8l.fsf@gnu.org> <200604141850.08495.ghost@cs.msu.su>
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00198.txt.bz2

> From: Vladimir Prus <ghost@cs.msu.su>
> Date: Fri, 14 Apr 2006 18:50:07 +0400
> Cc: Paul Koning <pkoning@equallogic.com>,  gdb@sources.redhat.com
> 
> > You could use wchar_t arrays for that, but then not every array
> > element will be a full character, and you will not be able to access
> > individual characters by their positional index.
> 
> So what? Even if wchar_t is 32 bits, the element at position 'i' can be a 
> combining character modifying another character, and be of little use by itself.

You are introducing into the argument yet another face of a character:
how it is displayed.  It's true that some characters, when they are
adjacent to each other, are displayed in some special way (the ff
ligature is one simple example of that), but that is something for the
rendering engine to take care of; it has nothing to do with the
string's content.  As far as any software, except the rendering
engine, is concerned, the combining character is, in fact, part of the
string.  For example, if the user wants to search for such a
character, the program must find it.
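
To illustrate the point about searching, here is a minimal Python
sketch (the sample string and variable names are mine, chosen only for
illustration): a combining accent occupies its own position in the
string and can be found like any other character, regardless of how
the pair is rendered.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: displayed as one glyph,
# but stored as two separate characters in the string.
text = "e\u0301"

# The combining character is part of the string's content ...
length = len(text)
# ... so an ordinary search locates it at its own index.
position = text.find("\u0301")
# A non-zero combining class marks it as a combining character.
is_combining = unicodedata.combining(text[position]) != 0

print(length, position, is_combining)
```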

So, for the purposes of processing the wchar_t strings, it is very
important to know whether they are fixed-size wide characters or
variable-size encoding.  If you just copy the string verbatim to and
fro, then it doesn't matter, but for anything more complex the
difference is very large.
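
A small Python sketch of that difference, assuming one array holds
UTF-16 code units and the other holds full 32-bit codepoints (the
helper names and the sample text are hypothetical, not anything in
GDB):

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def ucs4_units(text):
    """Return the codepoints of text, one fixed-size element per character."""
    return [ord(ch) for ch in text]

# 'a', then U+1D11E (a character outside the BMP), then 'b'.
sample = "a\U0001D11Eb"

wide16 = utf16_units(sample)   # variable-size: U+1D11E takes two elements
wide32 = ucs4_units(sample)    # fixed-size: one element per character

# Index 2 is 'b' in the fixed-size array, but only the second half of a
# surrogate pair in the UTF-16 array.
print([hex(u) for u in wide16])
print([hex(u) for u in wide32])
```

With the fixed-size array, element i is the i-th character; with the
UTF-16 array, finding the i-th character means scanning from the
start.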

> > If we want to support wchar_t arrays that store UTF-16, we will need
> > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > codepoints, and output those.  
> 
> That's what I mentioned in a reply to Jim -- since the current string printing 
> code operates "one wchar_t at a time", it's not suitable for outputting UTF-16 
> encoded wchar_t values to the user.

I don't understand: if the wchar_t array holds a UTF-16 encoding, then
when you receive the entire string, you have a UTF-16 encoding of what
you want to display, and you yourself said that displaying a UTF-16
encoded string is easy for you.  So where is the problem?  Is it only
that you cannot know the length of the UTF-16 encoded string?  Or is
there something else missing?
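
For reference, the conversion from UTF-16 code units to UCS-4
codepoints discussed above could be sketched as follows.  This is
only an illustration in Python, not GDB code: the function name is
made up, and I am assuming the array is terminated by a zero element
and that an unpaired surrogate should be passed through unchanged.

```python
def utf16_to_ucs4(units):
    """Decode a list of 16-bit UTF-16 code units into UCS-4 codepoints.

    Assumptions (mine, for illustration): decoding stops at the first
    zero element, and unpaired surrogates are copied through as-is.
    """
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if u == 0:
            break  # assumed terminator
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Combine the surrogate pair into one codepoint above U+FFFF.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP character or unpaired surrogate
            i += 1
    return out

# 'a', the surrogate pair for U+1D11E, 'b', terminator.
codepoints = utf16_to_ucs4([0x61, 0xD834, 0xDD1E, 0x62, 0])
print([hex(c) for c in codepoints])
```

Note that the decoder has to look one element ahead, which is exactly
what code that works "one wchar_t at a time" cannot do.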

> > Alternatively, the FE will have to 
> > support display of UTF-16 encoded characters.
> 
> Speaking about FE, handling UTF-16 is trivial

Maybe in your environment and windowing system, but not in all cases,
AFAIK.