From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 29269 invoked by alias); 14 Apr 2006 17:10:36 -0000
Received: (qmail 29261 invoked by uid 22791); 14 Apr 2006 17:10:35 -0000
X-Spam-Check-By: sourceware.org
Received: from nitzan.inter.net.il (HELO nitzan.inter.net.il) (192.114.186.20)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 14 Apr 2006 17:10:34 +0000
Received: from HOME-C4E4A596F7 (IGLD-83-130-214-179.inter.net.il [83.130.214.179])
    by nitzan.inter.net.il (MOS 3.7.3-GA) with ESMTP id DDG50855 (AUTH halo1);
    Fri, 14 Apr 2006 20:10:29 +0300 (IDT)
Date: Fri, 14 Apr 2006 17:53:00 -0000
From: Eli Zaretskii
To: Vladimir Prus
CC: pkoning@equallogic.com, gdb@sources.redhat.com
In-reply-to: <200604141850.08495.ghost@cs.msu.su> (message from Vladimir Prus
    on Fri, 14 Apr 2006 18:50:07 +0400)
Subject: Re: printing wchar_t*
Reply-to: Eli Zaretskii
References: <17471.42725.651176.368871@gargle.gargle.HOWL> <200604141850.08495.ghost@cs.msu.su>
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00198.txt.bz2

> From: Vladimir Prus
> Date: Fri, 14 Apr 2006 18:50:07 +0400
> Cc: Paul Koning, gdb@sources.redhat.com
>
> > You could use wchar_t arrays for that, but then not every array
> > element will be a full character, and you will not be able to access
> > individual characters by their positional index.
>
> And what? Even if wchar_t is 32 bit then element at position 'i' can be
> combining character modifying another character, and be of little use itself.

You are introducing into the argument yet another face of a character:
how it is displayed.
It's true that some characters, when they are adjacent to each other,
are displayed in some special way (the ff ligature is one simple
example of that), but that is something for the rendering engine to
take care of; it has nothing to do with the string's content.  As far
as any software, except the rendering engine, is concerned, the
combining character is, in fact, part of the string.  For example, if
the user wants to search for such a character, the program must find
it.

So, for the purposes of processing the wchar_t strings, it is very
important to know whether they are fixed-size wide characters or a
variable-size encoding.  If you just copy the string verbatim to and
fro, then it doesn't matter, but for anything more complex the
difference is very large.

> > If we want to support wchar_t arrays that store UTF-16, we will need
> > to add a feature to GDB to convert UTF-16 to the full UCS-4
> > codepoints, and output those.
>
> That's what I mentioned in a reply to Jim -- since the current string printing
> code operated "one wchar_t at a time", it's not suitable for outputting UTF-16
> encoded wchar_t values to the user.

I don't understand: if the wchar_t array holds a UTF-16 encoding, then
when you receive the entire string, you have a UTF-16 encoding of what
you want to display, and you yourself said that displaying a UTF-16
encoded string is easy for you.  So where is the problem?  Is it only
that you cannot know the length of the UTF-16 encoded string?  Or is
there something else missing?

> > Alternatively, the FE will have to
> > support display of UTF-16 encoded characters.
>
> Speaking about FE, handling UTF-16 is trivial

Maybe in your environment and windowing system, but not in all cases,
AFAIK.
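
To make the fixed-size versus variable-size distinction concrete, here
is a minimal sketch of the UTF-16 to UCS-4 conversion discussed above.
It is written in Python purely for illustration; it is not GDB code,
the name utf16_to_codepoints is made up, and passing unpaired
surrogates through unchanged is an assumption of this sketch rather
than anything decided in this thread.

```python
def utf16_to_codepoints(units):
    """Convert a sequence of 16-bit UTF-16 code units to code points.

    Hypothetical helper for illustration only; not part of GDB.
    Unpaired surrogates are passed through unchanged (an assumption).
    """
    codepoints = []
    i = 0
    while i < len(units):
        unit = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # character outside the Basic Multilingual Plane.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            codepoints.append(0x10000 + (high << 10) + low)
            i += 2
        else:
            codepoints.append(unit)
            i += 1
    return codepoints


# 'a', U+1D11E (stored as a surrogate pair), 'e', U+0301 (combining acute)
units = [0x0061, 0xD834, 0xDD1E, 0x0065, 0x0301]
codepoints = utf16_to_codepoints(units)
```

In the UTF-16 array, units[1] is only half of a character, so
positional indexing does not yield characters.  After conversion every
element of codepoints is one full code point.  The combining character
U+0301 is still a separate element of the string, so a search for it
succeeds; how it is displayed remains the rendering engine's business.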