From: Vladimir Prus
To: Eli Zaretskii
Cc: Paul Koning, gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Date: Fri, 14 Apr 2006 15:00:00 -0000
Message-Id: <200604141850.08495.ghost@cs.msu.su>

On Friday 14 April 2006 18:29, Eli Zaretskii wrote:
> > Date: Fri, 14 Apr 2006 09:43:01 -0400
> > From: Paul Koning
> > Cc: ghost@cs.msu.su, gdb@sources.redhat.com
> >
> > If you have 16 bit wide chars, it seems possible that those might
> > contain UTF-16 encoding of full (beyond BMP) Unicode characters.
>
> You could use wchar_t arrays for that, but then not every array
> element will be a full character, and you will not be able to access
> individual characters by their positional index.

So what? Even if wchar_t is 32 bits wide, the element at position 'i' can be a combining character that modifies another character and is of little use by itself.

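
To make both points concrete, here is a small sketch (not GDB code; the helper name utf16_units is made up for illustration). It shows a character beyond the BMP taking two 16-bit elements, and a combining character taking its own element even when every element is a full code point.

```python
import unicodedata


def utf16_units(text):
    """Return the 16-bit code units of text, as a 16-bit wchar_t array would hold them."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# U+1D11E (MUSICAL SYMBOL G CLEF) is beyond the BMP, so it needs a
# surrogate pair: neither 16-bit element is a character by itself.
clef_units = utf16_units("\U0001D11E")
print([hex(u) for u in clef_units])

# Even with one element per code point, "e" followed by U+0301
# (COMBINING ACUTE ACCENT) is two elements for one visible character.
accented = "e\u0301"
print(len(accented), unicodedata.combining(accented[1]) != 0)
```

So positional indexing gives a "character" only for some strings, whatever the width of wchar_t.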
> In other words, in this case each element of the wchar_t array is no
> longer a ``wide character'', but one of the few shorts that encode a
> character.
>
> If we want to support wchar_t arrays that store UTF-16, we will need
> to add a feature to GDB to convert UTF-16 to the full UCS-4
> codepoints, and output those.

That's what I mentioned in a reply to Jim: since the current string printing code operates "one wchar_t at a time", it's not suitable for outputting UTF-16 encoded wchar_t values to the user.

> Alternatively, the FE will have to
> support display of UTF-16 encoded characters.

Speaking of the FE, handling UTF-16 there is trivial, so printing just the raw wchar_t values will be sufficient. Some work may be necessary only if we want to show UTF-16 strings properly to a user of console gdb.

- Volodya
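
For reference, the UTF-16 to UCS-4 conversion mentioned above could look roughly like the sketch below. This is not existing GDB code. The function name utf16_to_code_points is hypothetical, and passing unpaired surrogates through unchanged is just one possible policy, chosen here as an assumption.

```python
def utf16_to_code_points(units):
    """Combine 16-bit UTF-16 code units into full code points (UCS-4 values)."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # A surrogate pair: each half carries 10 bits of the code point
            # offset above 0x10000.
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # A BMP character, or an unpaired surrogate passed through as-is
            # (an assumption; a real printer might flag it instead).
            code_points.append(unit)
            i += 1
    return code_points


# "A" followed by U+1D11E stored as three 16-bit elements.
print([hex(cp) for cp in utf16_to_code_points([0x0041, 0xD834, 0xDD1E])])
```

The decoder has to look at two elements at once, which is exactly what printing "one wchar_t at a time" cannot do.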