Date: Thu, 15 Jan 2009 22:16:00 -0000
From: Julian Brown
To: Tom Tromey
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
Message-ID: <20090115221523.28c15971@rex.config>
References: <20090115202411.5f154657@rex.config>

Thanks for the quick reply!

On Thu, 15 Jan 2009 13:59:51 -0700 Tom Tromey wrote:

> >>>>> "Julian" == Julian Brown writes:
>
> Julian> 3. Types which are literally called "wchar_t" are assumed to
> Julian> be wide characters.
>
> I did something similar -- my patch looks at TYPE_NAME to see if it is
> "wchar_t". In C, this is a typedef, and so I needed the appended patch
> to make it work. Without this patch, lookup_typename will find a
> "wchar_t" symbol whose type has a TYPE_NAME which is not "wchar_t".
> That seemed odd. The patch changes the dwarf reader so that the
> wchar_t symbol points to a type whose name is "wchar_t".
>
> I think the failing case here was "p L'a'", so I suppose it would not
> necessarily show up with your patch.

I don't think I'd run across that problem, no...

> Julian> $3 = (wchar_t *) 0x85c4 "Sch\x00f6ne Gr\x00fc\x00dfe"
>
> It should probably print L"..." :-)

Yeah, true.

> Yeah. Mine:
>
> * Handles input and output of wide characters and strings, and also
>   the new C0X u"" and U"" syntax.
> * Adds "%ls" and "%lc" to the gdb printf.

Sounds good.

> * Handles all target character sets, in particular variable length
>   encodings are handled.

My patch is supposed to handle variable-length encodings for the target
wide character set -- but that's not tested, so it is probably broken :-)

> * Auto-selects the appropriate endianness for wide characters on the
>   target.

Cool.

> * Getting the list of character sets supported by iconv is a pain.
>   Right now I just have a list of dubious provenance (read: iconv -l
>   | sed).
>
> Perhaps we can invoke "iconv -l" at startup... eww.

I ran into this problem too. An earlier version of my patch had this, in
register_iconv_charsets():

  FILE *fh;
  /* Fixed buffers never caused anyone problems did they? */
  char charset[200];
  int seen_a_charset = 0;
  struct charset *cs;

  fh = popen ("iconv -l", "r");
  if (!fh)
    return 1;

  while (! feof (fh))
    {
      int n = fscanf (fh, " %s/%*s/", &charset[0]);
      if (n != 1)
        break;
      seen_a_charset = 1;
      register_charset (simple_charset (xstrdup (charset), 1, NULL, NULL,
                                        NULL, NULL));
    }

  pclose (fh);

  return !seen_a_charset;

...which isn't quite right, but can maybe be adapted into something which
is.

> Another difference is that I have the intermediate step go through the
> host wchar_t rather than UCS-4. This is nice because it means we can
> use iswprint to decide if something is printable. But, it may have
> limitations, I suppose, on a host where wchar_t is less capable.

I think that might break for recent win32, where wchar_t is UTF-16
(i.e. more than one wide character may be needed for a given code point).
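To illustrate the win32 concern: in UTF-16, a code point above U+FFFF does not fit in one 16-bit unit and is stored as a surrogate pair. The sketch below is a plain UTF-16 encoder written for illustration (`utf16_encode` is a hypothetical name, not a gdb or Windows API). It shows that a character such as U+1F600 occupies two 16-bit units on a host whose wchar_t is UTF-16, so a per-unit check like iswprint would be handed a lone surrogate rather than the whole character.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: encode Unicode code point CP as UTF-16 into OUT,
   which must have room for two 16-bit units.  Returns the number of
   units written (1 or 2), or 0 if CP is not a Unicode scalar value
   (above U+10FFFF, or inside the surrogate range U+D800..U+DFFF).  */
size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  if (cp < 0x10000)
    {
      /* Basic Multilingual Plane: a single unit with the same value.  */
      out[0] = (uint16_t) cp;
      return 1;
    }

  /* Supplementary planes: subtract 0x10000 to get a 20-bit value, then
     put its high 10 bits in a high surrogate (0xD800..0xDBFF) and its
     low 10 bits in a low surrogate (0xDC00..0xDFFF).  */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 | (cp >> 10));
  out[1] = (uint16_t) (0xDC00 | (cp & 0x3FF));
  return 2;
}
```

For U+00F6 (the "ö" in the `$3` output above) it writes the single unit 0x00F6; for U+1F600 it writes 0xD83D followed by 0xDE00.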
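The `popen ("iconv -l", "r")` loop quoted earlier in this message has a couple of the rough edges its author alludes to: `%s` has no field width, so a long name can overrun `charset[200]`; and `%s` stops only at whitespace, so with glibc's `iconv`, which (when writing to a pipe) prints one name per line followed by `//`, each name is read with that `//` still attached. Below is a minimal, self-contained sketch of a more defensive parser. It assumes names are separated by whitespace and/or commas, that trailing `/` characters are noise, and that no prose header is printed when output goes to a pipe (true of glibc, not guaranteed elsewhere). `charset_fn`, `parse_charset_list` and `register_iconv_charsets_sketch` are hypothetical names; the callback stands in for gdb's `register_charset (simple_charset (...))` call.

```c
#define _POSIX_C_SOURCE 200809L /* popen, pclose, strtok_r */

#include <stdio.h>
#include <string.h>

/* Hypothetical callback type, standing in for gdb's
   register_charset (simple_charset (...)).  */
typedef void (*charset_fn) (const char *name, void *closure);

/* Read charset names from FH and pass each one to FN.  Names are
   assumed to be separated by whitespace and/or commas; trailing '/'
   characters (glibc's "//" suffix) are stripped.  Returns the number
   of names passed to FN.  */
int
parse_charset_list (FILE *fh, charset_fn fn, void *closure)
{
  static const char seps[] = ", \t\r\n";
  /* fgets bounds every read; a line longer than this buffer would be
     split in two, which this sketch does not handle.  */
  char line[1024];
  int count = 0;

  while (fgets (line, sizeof line, fh) != NULL)
    {
      char *save = NULL;

      for (char *tok = strtok_r (line, seps, &save); tok != NULL;
           tok = strtok_r (NULL, seps, &save))
        {
          size_t len = strlen (tok);

          while (len > 0 && tok[len - 1] == '/')
            tok[--len] = '\0';
          if (len == 0)
            continue; /* The token was nothing but slashes.  */

          fn (tok, closure);
          ++count;
        }
    }
  return count;
}

/* Run "iconv -l" and register what it prints.  Like the original
   snippet, returns 0 on success and nonzero if popen failed or no
   charset was seen.  */
int
register_iconv_charsets_sketch (charset_fn fn, void *closure)
{
  FILE *fh = popen ("iconv -l", "r");
  int seen;

  if (fh == NULL)
    return 1;
  seen = parse_charset_list (fh, fn, closure);
  pclose (fh); /* Exit status ignored in this sketch.  */
  return seen == 0;
}
```

With glibc, piped lines such as `UTF-8//` and `ISO-8859-1//` reach the callback as `UTF-8` and `ISO-8859-1`. Splitting parsing from `popen` also makes the parser testable against a canned file instead of a live `iconv`.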
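One way to picture what auto-selecting the endianness involves: the bytes of each target wide character have to be combined in the target's byte order, whatever the host's is. The minimal sketch below assumes the raw bytes have already been read from the target and that the character width (the target's `sizeof (wchar_t)`, commonly 2 or 4) and byte order are known. `extract_target_wchar` is a hypothetical helper, not gdb's code, and it does not address variable-length encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble one wide character of SIZE bytes
   (1 to 4) from BUF, which holds raw target memory.  BIG_ENDIAN says
   whether the target stores the most significant byte first; the
   host's own byte order plays no part.  */
uint32_t
extract_target_wchar (const unsigned char *buf, size_t size, int big_endian)
{
  uint32_t value = 0;

  for (size_t i = 0; i < size; ++i)
    {
      /* Visit the bytes from most to least significant.  */
      size_t idx = big_endian ? i : size - 1 - i;
      value = (value << 8) | buf[idx];
    }
  return value;
}
```

The bytes F6 00 00 00 decode to 0xF6 (U+00F6) on a little-endian target but to 0xF6000000 on a big-endian one.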

> Julian> OK to apply, or any comments?
>
> If you wouldn't mind holding off, my patch is nearing completion. It
> is feature complete, and at the moment I am writing test cases.

Sure, I don't mind holding off.

> I'm happy to send what I have now, if you want to see it. Or it is
> all in the archer git repository on the tromey-archer-charset branch.
>
> I've lifted stuff -- ideas and code -- from your patch, but the result
> is pretty different. Perhaps we could discuss the areas where we made
> different decisions and try to plot the best route forward.

OK, I'll have a look, but I'm not sure if I'll have anything sensible to
say :-)

Cheers,

Julian