From: Vladimir Prus
To: gdb@sources.redhat.com
Subject: Re: printing wchar_t*
Date: Fri, 14 Apr 2006 07:58:00 -0000
References: <8f2776cb0604131031g370d6fa9p9361421bd21d178@mail.gmail.com>

Jim Blandy wrote:

> On 4/13/06, Vladimir Prus wrote:
>> I have a user-defined command that can produce the output I want, but is
>> defining a custom command the right approach?
>
> Well, you'd like wide strings to be printed properly when they appear
> in structures, as arguments to functions, and so on, right? So a
> user-defined command isn't ideal.

I think I'll still need to do some processing of wchar_t* on the frontend side. The problem is that I don't see any way for gdb to print wchar_t values in a form that does not require post-processing.
gdb could print wide strings as UTF-8, but char* strings should then be printed in the local 8-bit encoding, which is likely *not* UTF-8. gdb could use extra markers to tell the two apart, for example:

  "foo"   for a string in the local 8-bit encoding
  L"foo"  for a string in UTF-8 encoding

It's also possible to use "\u" escapes. But then there are open questions:

 - Do we assume that wchar_t is always UTF-16 or UTF-32?
 - If not:
   - how can the user select the encoding?
   - how will a user-specified encoding be handled?

> The best approach would be to extend charset.[ch] to handle wide
> character sets as well, and then add code to the language-specific
> printing routines to use the charset functions. (This is fortunately
> much simpler than adding support for multibyte characters.)

So, for each wchar_t element, the language-specific code would call 'target_wchar_t_to_host', which would output the host representation of that wchar_t.

Hmm, the interface there seems to assume a 1<->1 mapping between target and host characters. That would make both the L"UTF8" format and the format of an ASCII string with \u escapes impossible, it seems.

- Volodya
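For illustration only, here is a minimal C sketch of the "\u"-escape option, under loudly stated assumptions: wchar_t values are taken to be Unicode code points (UTF-32), as with glibc, where wchar_t is 32 bits wide; UTF-16 surrogate pairs (e.g. a 16-bit wchar_t on Windows) are not combined. The functions format_wide_string, wide_char_to_ascii and append_piece are hypothetical and are not part of gdb or its charset.[ch] interface. The sketch also shows the mapping problem raised above: one target wide character can expand to as many as ten host bytes (\UXXXXXXXX), which a strictly one-character-in, one-character-out conversion interface cannot express.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not gdb API): write the ASCII representation of one
   target wide character C into PIECE.  ASSUMPTION: C is a Unicode code
   point (UTF-32); UTF-16 surrogate pairs are not combined.  */
static void
wide_char_to_ascii (uint32_t c, char *piece, size_t piecesize)
{
  if (c == '"' || c == '\\')
    snprintf (piece, piecesize, "\\%c", (int) c);       /* \" or \\ */
  else if (c >= 0x20 && c < 0x7f)
    snprintf (piece, piecesize, "%c", (int) c);         /* printable ASCII */
  else if (c <= 0xffff)
    snprintf (piece, piecesize, "\\u%04lx", (unsigned long) c);
  else
    snprintf (piece, piecesize, "\\U%08lx", (unsigned long) c);
}

/* Append PIECE to OUT, tracking the current length in *LEN.
   Returns -1 without writing anything if OUTSIZE is too small.  */
static int
append_piece (char *out, size_t outsize, size_t *len, const char *piece)
{
  size_t n = strlen (piece);

  if (*len + n + 1 > outsize)
    return -1;
  memcpy (out + *len, piece, n + 1);   /* copies the terminating NUL too */
  *len += n;
  return 0;
}

/* Hypothetical formatter (not gdb API): print the NUL-terminated wide
   string WS as L"..." using only ASCII, escaping all other characters
   with \u or \U.  Returns 0 on success, -1 if OUT is too small.  */
static int
format_wide_string (const wchar_t *ws, char *out, size_t outsize)
{
  size_t len = 0;
  char piece[16];   /* longest piece is "\UXXXXXXXX": 10 chars + NUL */

  if (outsize == 0)
    return -1;
  out[0] = '\0';
  if (append_piece (out, outsize, &len, "L\"") != 0)
    return -1;
  for (; *ws != L'\0'; ws++)
    {
      /* One target character becomes 1, 2, 6 or 10 host bytes.  */
      wide_char_to_ascii ((uint32_t) *ws, piece, sizeof piece);
      if (append_piece (out, outsize, &len, piece) != 0)
        return -1;
    }
  return append_piece (out, outsize, &len, "\"");
}
```

For a wide string holding "foo", U+00E9, U+1F600 and a double quote, this produces L"foo\u00e9\U0001f600\"". The output is pure ASCII, so it can be printed safely whatever the host's 8-bit encoding is, at the cost of the frontend still having to decode the escapes.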