Date: Mon, 17 Apr 2006 08:35:00 -0000
From: Eli Zaretskii
To: Vladimir Prus
CC: pkoning@equallogic.com, gdb@sources.redhat.com
In-reply-to: <200604171017.41504.ghost@cs.msu.su> (message from Vladimir Prus on Mon, 17 Apr 2006 10:17:40 +0400)
Subject: Re: printing wchar_t*
References: <200604141850.08495.ghost@cs.msu.su> <200604171017.41504.ghost@cs.msu.su>

> From: Vladimir Prus
> Date: Mon, 17 Apr 2006 10:17:40 +0400
> Cc: pkoning@equallogic.com,
>  gdb@sources.redhat.com
>
> On Friday 14 April 2006 21:10, Eli Zaretskii wrote:
>
> > > > If we want to support wchar_t arrays that store UTF-16, we will need
> > > > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > > > codepoints, and output those.
> > >
> > > That's what I mentioned in a reply to Jim -- since the current string
> > > printing code operated "one wchar_t at a time", it's not suitable for
> > > outputting UTF-16 encoded wchar_t values to the user.
> >
> > I don't understand: if the wchar_t array holds a UTF-16 encoding, then
> > when you receive the entire string, you have a UTF-16 encoding of what
> > you want to display, and you yourself said that displaying a UTF-16
> > encoded string is easy for you. So where is the problem? is that only
> > that you cannot know the length of the UTF-16 encoded string? or is
> > there something else missing?
>
> For my frontend -- there's no problem, I can handle UTF-16 myself. However, if
> gdb is to ever produce output in UTF-8

We were talking about wchar_t and wide character strings, which UTF-8
isn't. Let's not confuse ourselves more than we already have.

Adding support to GDB for converting arbitrarily encoded text into UTF-8
would be a giant job.

> then it should handle surrogate pairs itself. Taking first and
> second element of surrogate pair and converting both to UTF-8, individually,
> won't work, for obvious reasons.

I don't think it's quite as ``obvious'' as you imply. Handling
surrogates is generally a job for a display engine, so a UTF-8 enabled
terminal could very well do it itself. I don't know whether such
terminals actually do that, though.

But anyway, this is a different issue.
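
To make the surrogate-pair point in this exchange concrete, here is a
minimal sketch in Python. It is not GDB code, and the function names
utf16_to_code_points and encode_utf8 are invented for this illustration.
It first combines UTF-16 code units into full code points (the UCS-4
values mentioned above) and then applies the generic UTF-8 bit layout.
The encoder deliberately does not reject surrogates, so the result of
converting each half of a pair on its own can be compared with the
result of converting the combined code point.

```python
def utf16_to_code_points(units):
    """Combine UTF-16 code units into full code points (UCS-4 values)."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # code point above U+FFFF.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            code_points.append(
                0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            code_points.append(unit)
            i += 1
    return code_points


def encode_utf8(cp):
    """Encode one value with the generic UTF-8 bit layout.

    Illustration only: a conforming encoder would refuse values in the
    surrogate range 0xD800..0xDFFF instead of encoding them.
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

For the code units 0xD834 0xDD1E (U+1D11E), combining first gives the
four bytes F0 9D 84 9E. Encoding each unit separately gives two
three-byte sequences, ED A0 B4 and ED B4 9E, which a strict UTF-8
decoder rejects. Whether a particular terminal accepts such bytes is a
separate question that this sketch does not answer.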