From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 23226 invoked by alias); 16 Jan 2009 09:36:55 -0000
Received: (qmail 23218 invoked by uid 22791); 16 Jan 2009 09:36:54 -0000
X-SWARE-Spam-Status: No, hits=-0.7 required=5.0 tests=AWL,BAYES_00,RCVD_IN_SORBS_WEB,SPF_SOFTFAIL
X-Spam-Check-By: sourceware.org
Received: from mtaout2.012.net.il (HELO mtaout2.012.net.il) (84.95.2.4)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 16 Jan 2009 09:36:13 +0000
Received: from conversion-daemon.i_mtaout2.012.net.il
    by i_mtaout2.012.net.il (HyperSendmail v2004.12)
    id <0KDK0060052M6H00@i_mtaout2.012.net.il>
    for gdb-patches@sourceware.org; Fri, 16 Jan 2009 11:36:21 +0200 (IST)
Received: from HOME-C4E4A596F7 ([77.127.202.36])
    by i_mtaout2.012.net.il (HyperSendmail v2004.12)
    with ESMTPA id <0KDK00H3Y5CK0ZS1@i_mtaout2.012.net.il>;
    Fri, 16 Jan 2009 11:36:21 +0200 (IST)
Date: Fri, 16 Jan 2009 09:36:00 -0000
From: Eli Zaretskii
Subject: Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
In-reply-to: <20090115202411.5f154657@rex.config>
To: Julian Brown
Cc: gdb-patches@sourceware.org, tromey@redhat.com
Reply-to: Eli Zaretskii
References: <20090115202411.5f154657@rex.config>
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-01/txt/msg00376.txt.bz2

> Date: Thu, 15 Jan 2009 20:24:11 +0000
> From: Julian Brown
> Cc: tromey@redhat.com
>
> This patch contains (at least the start of) support for printing
> wchar_t strings from a debugged program within GDB. This is the subject
> for GDB bugs 9103 (and its duplicates 9369, 9268) and maybe 7821.

Thank you!

> OK to apply

Not without documentation, sorry.  Such an important feature should
not go in undocumented.

> or any comments?

A few:

> (gdb) show host-charset
> The host character set is "UTF-8" (auto).

Elsewhere in GDB, we show such settings in a slightly different form:

  (gdb) show language
  The current source language is "auto; currently c".

I like this latter form better: it first says that the setting is
"auto", then what the detected state is.

> + #ifndef GDB_DEFAULT_TARGET_WIDE_CHARSET
> + #define GDB_DEFAULT_TARGET_WIDE_CHARSET "UTF-32"
> + #endif
> +
> + #ifndef GDB_INTERNAL_CODESET
> + #define GDB_INTERNAL_CODESET "UCS-4LE"
> + #endif

Why are these the defaults?  Because of what GNU/Linux (i.e. glibc)
does, or for some other reason?  If the former, shouldn't this be
autoconfigured?

> + static const char *target_wide_charset_enum[] =
> + {
> +   "UCS-2",
> +   "UCS-2LE",
> +   "UCS-2BE",
> +   "UCS-4",
> +   "UCS-4LE",
> +   "UCS-4BE",
> +   "UTF-16",
> +   "UTF-16LE",
> +   "UTF-16BE",
> +   "UTF-32",
> +   "UTF-32LE",
> +   "UTF-32BE",
> +   0
> + };

Why do we need the UCS-2 charsets?  That's just confusing; are there
important platforms that support UCS-2 instead of UTF-16?

I'd also suggest considering the removal of UTF-32 and its endian
variants, since they are identical to UCS-4.  (Unless someone wants
to support the Emacs 23 internal representation, but that one should
be called by its own name anyway.)
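
To make the UCS-2 vs. UTF-16 point concrete, here is a minimal sketch
of how one code point maps to UTF-16 code units.  It is not part of
the patch, and the function name utf16_encode is hypothetical.  For
code points up to U+FFFF the result is the single 16-bit unit that
UCS-2 would also use; anything above that needs a surrogate pair,
which UCS-2 cannot express.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the patch: encode the Unicode
   code point CP as UTF-16 code units in OUT.  Returns the number of
   units written (1 or 2), or 0 if CP is not a valid scalar value.  */
size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  /* Reject values beyond U+10FFFF and the surrogate range itself.  */
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  /* BMP code point: one unit, the same value UCS-2 would use.  */
  if (cp < 0x10000)
    {
      out[0] = (uint16_t) cp;
      return 1;
    }

  /* Supplementary code point: split the 20-bit offset into a high
     and a low surrogate.  UCS-2 has no equivalent of this.  */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 | (cp >> 10));
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
  return 2;
}
```

By contrast, a UCS-4 or UTF-32 code unit is simply the code point
value itself, so no comparable function is needed to tell the two
names apart.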