From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 22501 invoked by alias); 14 Apr 2006 13:38:31 -0000
Received: (qmail 22493 invoked by uid 22791); 14 Apr 2006 13:38:31 -0000
X-Spam-Check-By: sourceware.org
Received: from nile.gnat.com (HELO nile.gnat.com) (205.232.38.5)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 14 Apr 2006 13:38:30 +0000
Received: from localhost (localhost [127.0.0.1])
    by filtered-nile.gnat.com (Postfix) with ESMTP id 6E8D948CC51;
    Fri, 14 Apr 2006 09:38:28 -0400 (EDT)
Received: from nile.gnat.com ([127.0.0.1])
    by localhost (nile.gnat.com [127.0.0.1]) (amavisd-new, port 10024)
    with LMTP id 28140-01-3; Fri, 14 Apr 2006 09:38:28 -0400 (EDT)
Received: from [127.0.0.1] (dhcp6.gnat.com [205.232.38.246])
    by nile.gnat.com (Postfix) with ESMTP id 2CEC248CBE0;
    Fri, 14 Apr 2006 09:38:28 -0400 (EDT)
Message-ID: <443FA5D4.7040901@adacore.com>
Date: Fri, 14 Apr 2006 13:59:00 -0000
From: Robert Dewar
User-Agent: Thunderbird 1.5 (Windows/20051201)
MIME-Version: 1.0
To: Eli Zaretskii, Vladimir Prus, gdb@sources.redhat.com
Subject: Re: printing wchar_t*
References: <200604141246.58094.ghost@cs.msu.su> <20060414130729.GB12955@nevyn.them.org>
In-Reply-To: <20060414130729.GB12955@nevyn.them.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-04/txt/msg00186.txt.bz2

Daniel Jacobowitz wrote:
> On Fri, Apr 14, 2006 at 03:55:49PM +0300, Eli Zaretskii wrote:
>> Anyway, UTF-16 is a variable-length encoding, so wchar_t is not it.
>
> There's a rant about this in the glibc manual I was just reading...
>
> In fact, on many platforms, wchar_t is only 16-bit.  How exactly you
> handle UTF-8 or UCS-4 input in this case, I don't really understand.

Seems clear: you can only represent a limited range of codes if you
only have 16 bits! UTF-8 is a variable-length encoding that can
represent every Unicode code point, including those well beyond the
16-bit range. Obviously, if you have to construct 16-bit wchar_t values
from UTF-8 input, you will not be able to represent characters whose
codes exceed 65535 (0xFFFF). The same holds for UCS-4 input.
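
To make the limitation concrete, here is a minimal sketch in C. It assumes a simplified decoder that does not reject overlong forms or surrogate code points; the names utf8_decode and fits_in_16_bits are hypothetical helpers invented for this illustration, not part of gdb or any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (at most len bytes available).
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 on malformed or truncated input.  Simplified for illustration:
   overlong forms and surrogate code points are not rejected. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c;

    if (len == 0)
        return 0;

    /* The lead byte tells us how long the sequence is. */
    if (s[0] < 0x80)                { c = s[0];        n = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; n = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; n = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; n = 4; }
    else
        return 0;

    if (n > len)
        return 0;

    /* Each continuation byte (10xxxxxx) contributes six more bits. */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }

    *cp = c;
    return n;
}

/* Nonzero if the code point can be stored in a single 16-bit unit
   without losing information. */
int fits_in_16_bits(uint32_t cp)
{
    return cp <= 0xFFFF;
}
```

For example, the bytes E2 82 AC decode to U+20AC, which fits in 16 bits, while F0 9D 84 9E decode to U+1D11E, which does not: storing it in a 16-bit unit would keep only 0xD11E.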