From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 23260 invoked by alias); 20 Apr 2011 07:59:58 -0000
Received: (qmail 23251 invoked by uid 22791); 20 Apr 2011 07:59:56 -0000
X-SWARE-Spam-Status: No, hits=-1.5 required=5.0 tests=AWL,BAYES_00,MSGID_MULTIPLE_AT
X-Spam-Check-By: sourceware.org
Received: from mailhost.u-strasbg.fr (HELO mailhost.u-strasbg.fr) (130.79.200.153)
    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 20 Apr 2011 07:59:41 +0000
Received: from md2.u-strasbg.fr (md2.u-strasbg.fr [IPv6:2001:660:2402::187])
    by mailhost.u-strasbg.fr (8.14.3/jtpda-5.5pre1) with ESMTP id p3K7xXVW078545 ;
    Wed, 20 Apr 2011 09:59:33 +0200 (CEST)
    (envelope-from pierre.muller@ics-cnrs.unistra.fr)
Received: from mailserver.u-strasbg.fr (ms1.u-strasbg.fr [130.79.204.10])
    by md2.u-strasbg.fr (8.14.4/jtpda-5.5pre1) with ESMTP id p3K7xWhY023716 ;
    Wed, 20 Apr 2011 09:59:32 +0200 (CEST)
    (envelope-from pierre.muller@ics-cnrs.unistra.fr)
Received: from E6510Muller (gw-ics.u-strasbg.fr [130.79.210.225])
    (user=mullerp mech=LOGIN)
    by mailserver.u-strasbg.fr (8.14.4/jtpda-5.5pre1) with ESMTP id p3K7xVho070814
    (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=NO) ;
    Wed, 20 Apr 2011 09:59:32 +0200 (CEST)
    (envelope-from pierre.muller@ics-cnrs.unistra.fr)
From: "Pierre Muller"
To: "'Tom Tromey'"
Cc:
References: <5928.31498147479$1302882967@news.gmane.org>
    <005101cbfc50$193136b0$4b93a410$%muller@ics-cnrs.unistra.fr>
    <20110416162455.GA5599@host1.jankratochvil.net>
    <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>
    <83zknpoacd.fsf@gnu.org>
    <21014.6501930014$1303139687@news.gmane.org>
    <34716.7311156683$1303204711@news.gmane.org>
    <16656.7281041809$1303221408@news.gmane.org>
In-Reply-To:
Subject: RE: [RFC-v5] Handle cygwin wchar_t specifics
Date: Wed, 20 Apr 2011 07:59:00 -0000
Message-ID: <000201cbff30$e2266030$a6732090$@muller@ics-cnrs.unistra.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Mailing-List:
    contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2011-04/txt/msg00346.txt.bz2

> Tom> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.
>
> Pierre> I assumed you meant: not necessary if PHONY_ICONV is defined,
> Pierre> and this is what I changed below.
> Pierre> (I would personally have favored to completely remove
> Pierre> INTERMEDIATE_ENCODING macro and call the function directly.)
>
> Sorry, that isn't what I meant.

  Hopefully I got it right this time...

> All this new code is needed only in the __STDC_ISO_10646__ case.
> All other cases are already handled ok.
> So, I think it is best to only introduce new code along the
> __STDC_ISO_10646__ branches.  Thus far your patches have touched all the
> other branches -- but there is no reason to do that, and I think it just
> makes it more complicated without an associated benefit.
>
> Pierre> +#ifdef __STDC_ISO_10646__
> Pierre> +  if (sizeof (gdb_wchar_t) == 2)
>
> You might as well unify the 2 and 4 byte cases like I said earlier, and
> just die for any other value.

  Done below.

> You can use a static assert trick to make
> it die during compilation, which I think is better than dying at
> runtime.  E.g.:
>   extern char your_platform_is_bogus[(sizeof (gdb_wchar_t) == 2
>                                       || sizeof (gdb_wchar_t) == 4)
>                                      ? 1 : -1];

  Used below (renamed your_gdb_wchar_t_is_bogus).

> Pierre> +      /* Check that the name is in the list of handled charsets.  */
> Pierre> +      for (i = 0; charset_enum[i]; i++)
>
> I don't think this is really needed either.
> Or, if you really want to do the check, do it by calling iconv_open at
> initialization, and then just make gdb die early -- whatever platform
> does this is really messed up.

  Also done below.

> Pierre> +  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
> Pierre> +     not known, use DEFAULT_INTERMEDIATE_ENCODING macro.  */
> Pierre> +  return DEFAULT_INTERMEDIATE_ENCODING;
>
> I don't think this will generally do the right thing.
> For example, your patch defines DEFAULT_INTERMEDIATE_ENCODING to
> "UCS-4LE" in the !WORDS_BIGENDIAN case.  But we already know that
> gdb_wchar_t has 2 bytes.  So I think this will just result in the same
> bug as today.

  I hope I have now understood what you wanted:
the new code makes fewer changes to gdb_wchar_t.
It only uses the intermediate_encoding function
in the case where UCS-4LE/BE was set before.

  To avoid compiling this code in the other cases,
I defined a new macro called USE_INTERMEDIATE_ENCODING_FUNCTION,
and the charset.c changes are limited to this conditional.

  I used iconv_open to check for working charset names
and added a call to error if none is found.

  Comments?

Pierre

2011-04-20  Pierre Muller

        * gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro.
        (INTERMEDIATE_ENCODING): Change value to intermediate_encoding
        function call if __STDC_ISO_10646__ macro is defined.
        (intermediate_encoding): New prototype.
        * charset.c (your_gdb_wchar_t_is_bogus): New test variable to
        generate compile time error for unsupported gdb_wchar_t size.
        (ENDIAN_SUFFIX): New macro.
        (intermediate_encoding): New function.

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c   11 Jan 2011 15:10:01 -0000      1.43
+++ charset.c   20 Apr 2011 07:48:21 -0000
@@ -922,6 +922,70 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* The code below serves to generate a compile time error if
+   gdb_wchar_t type is not of size 2 nor 4, despite the fact that
+   macro __STDC_ISO_10646__ is defined.
+   This is better than a gdb_assert call, because GDB cannot handle
+   strings correctly if this size is different.  */
+
+static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
+                                       || sizeof (gdb_wchar_t) == 4)
+                                      ? 1 : -1];
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings.  As the test above
+   compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes.
+   UTF-16/32 is tested first, UCS-2/4 is tested as a second option,
+   otherwise an error is generated.  */
+
+const char *
+intermediate_encoding (void)
+{
+  iconv_t desc;
+  static const char *stored_result = NULL;
+  const char *result;
+  int i;
+
+  if (stored_result)
+    return stored_result;
+  result = xstrprintf ("UTF-%d%s", sizeof (gdb_wchar_t) * 8, ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree ((void *) result);
+  /* Second try, with UCS-2 type.  */
+  result = xstrprintf ("UCS-%d%s", sizeof (gdb_wchar_t), ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree ((void *) result);
+  /* No valid charset found, generate error here.  */
+  error ("Unable to find a valid charset for string conversions");
+}
+
+#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */
+
 
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h 1 Jan 2011 15:33:05 -0000       1.6
+++ gdb_wchar.h 20 Apr 2011 07:48:21 -0000
@@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t;
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+#define USE_INTERMEDIATE_ENCODING_FUNCTION
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else
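
For anyone who wants to try the probing logic outside of GDB, the
following is a minimal standalone sketch of the same idea. It is not
part of the patch and makes several assumptions: it uses plain wchar_t
instead of gdb_wchar_t, detects endianness at run time instead of
relying on WORDS_BIGENDIAN, formats into a static buffer instead of
calling xstrprintf, and returns NULL instead of calling error. The
names endian_suffix, iconv_supports and pick_intermediate_encoding are
hypothetical and do not exist in GDB.

```c
#include <assert.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Compile-time check: the array size becomes -1, and compilation
   fails, if wchar_t is neither 2 nor 4 bytes wide.  */
typedef char wchar_t_size_is_supported[(sizeof (wchar_t) == 2
                                        || sizeof (wchar_t) == 4)
                                       ? 1 : -1];

/* Hypothetical helper: return "LE" or "BE" for the host byte order.
   (The patch uses the configure-time WORDS_BIGENDIAN macro instead.)  */
static const char *
endian_suffix (void)
{
  const unsigned short probe = 1;

  return *(const unsigned char *) &probe == 1 ? "LE" : "BE";
}

/* Hypothetical helper: return nonzero if iconv can convert FROM to TO.  */
static int
iconv_supports (const char *to, const char *from)
{
  iconv_t desc = iconv_open (to, from);

  if (desc == (iconv_t) -1)
    return 0;
  iconv_close (desc);
  return 1;
}

/* Hypothetical stand-in for intermediate_encoding: try "UTF-<bits>XE"
   first, then "UCS-<bytes>XE", and remember the first name that iconv
   accepts.  Later calls return the remembered name whatever
   HOST_CHARSET is.  Returns NULL if neither name works.  */
static const char *
pick_intermediate_encoding (const char *host_charset)
{
  static char cached[16];
  int bytes = (int) sizeof (wchar_t);
  int n;

  if (cached[0] != '\0')
    return cached;

  n = snprintf (cached, sizeof cached, "UTF-%d%s", bytes * 8,
                endian_suffix ());
  assert (n > 0 && (size_t) n < sizeof cached);
  if (iconv_supports (cached, host_charset))
    return cached;

  n = snprintf (cached, sizeof cached, "UCS-%d%s", bytes,
                endian_suffix ());
  assert (n > 0 && (size_t) n < sizeof cached);
  if (iconv_supports (cached, host_charset))
    return cached;

  cached[0] = '\0';   /* Nothing usable; leave the cache empty.  */
  return NULL;
}
```

Calling pick_intermediate_encoding ("UTF-8") on a host whose iconv
knows the UTF-16/32 or UCS-2/4 names should return one of them with
the host's endianness suffix. On systems where iconv lives in a
separate library, the program may need to be linked with -liconv.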