From: "Pierre Muller"
To: "'Tom Tromey'"
References: <5928.31498147479$1302882967@news.gmane.org> <005101cbfc50$193136b0$4b93a410$%muller@ics-cnrs.unistra.fr> <20110416162455.GA5599@host1.jankratochvil.net> <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr> <83zknpoacd.fsf@gnu.org> <21014.6501930014$1303139687@news.gmane.org>
Subject: [RFC-v4] Handle cygwin wchar_t specifics
Date: Tue, 19 Apr 2011 09:18:00 -0000
Message-ID: <004f01cbfe72$adddeb40$0999c1c0$@muller@ics-cnrs.unistra.fr>
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Monday, April 18, 2011 19:18
> To: Pierre Muller
> Cc: 'Eli Zaretskii'; jan.kratochvil@redhat.com; gdb-patches@sourceware.org
> Subject: Re: [RFA-v3] Handle cygwin wchar_t specifics
>
> >>>>> "Pierre" == Pierre Muller writes:
>
> Pierre> This patch also changes the intermediate_encoding for mingw hosts,
> Pierre> from "wchar_t" to "UTF-16LE", but this seems to work nicely
> Pierre> for both mingw32 and mingw64 (and only if iconv is found,
> Pierre> otherwise gdb_wchar_t is simply char and phony functions are used).
>
> Pierre> -#define INTERMEDIATE_ENCODING host_charset ()
> Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
>
> This changes the behavior if the gdb user changes the host encoding.
> This is an unusual situation, admittedly, but it seems to me that it is
> just as easy to only introduce the `intermediate_encoding' global in the
> UTF-{16,32} case.
>
> Pierre> +  intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
> Pierre> +# if defined (USE_WIN32API) || defined (__CYGWIN__)
> Pierre> +  if (sizeof (gdb_wchar_t) == 2)
> Pierre> +    intermediate_encoding = "UTF-16LE";
> Pierre> +# endif
>
> Here, instead of a special case for __CYGWIN__, and instead of
> hard-coding the endian-ness, just use the same code for all
> __STDC_ISO_10646__ platforms.  Maybe something like:
>
>   intermediate_encoding = xstrprintf ("UTF-%d%s", 8 * sizeof (wchar_t),
>                                       WORDS_BIGENDIAN ? "BE" : "LE");

Three problems here:
1) We should really use the "gdb_wchar_t" type, not "wchar_t".
2) If sizeof (gdb_wchar_t) == 1, I don't think that UTF-8LE and UTF-8BE
   exist, do they? At least they are not in the iconv -l list for
   current Cygwin.
3) WORDS_BIGENDIAN is not defined at all on Cygwin (autoconf's
   AC_C_BIGENDIAN only defines it on big-endian hosts), so your code
   would probably not compile there.

A further question is whether UTF-32 is always supported...

Below is yet another proposal: it turns the INTERMEDIATE_ENCODING macro
into a call to a new intermediate_encoding function. This function
specifically handles the case where gdb_wchar_t is 2 bytes long: it first
tries UTF-16XE (with X being L or B), and if that name is not in the list
of supported charsets, it tries UCS-2XE instead.
As there is apparently no advantage to using UTF-32 over UCS-4
(according to Eli), I did not extend the change to the 4-byte case.

Comments welcome,

Pierre Muller

2011-04-19  Pierre Muller

	* gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call.
	(intermediate_encoding): New prototype.
	* charset.c (intermediate_encoding): New function.

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	19 Apr 2011 09:05:43 -0000
@@ -922,6 +922,50 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+#ifdef WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+const char *
+intermediate_encoding (void)
+{
+  if (sizeof (gdb_wchar_t) == 2)
+    {
+      static const char *stored_result = NULL;
+      const char *result;
+      int i;
+
+      if (stored_result)
+        return stored_result;
+      result = "UTF-16" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+        {
+          if (strcmp (result, charset_enum[i]) == 0)
+            {
+              stored_result = result;
+              return result;
+            }
+        }
+      /* Second try, with UCS-2 type.  */
+      result = "UCS-2" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+        {
+          if (strcmp (result, charset_enum[i]) == 0)
+            {
+              stored_result = result;
+              return result;
+            }
+        }
+    }
+  /* If gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
+     not known, use the DEFAULT_INTERMEDIATE_ENCODING macro.  */
+  return DEFAULT_INTERMEDIATE_ENCODING;
+}
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	19 Apr 2011 09:05:43 -0000
@@ -79,12 +79,12 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */
@@ -115,11 +115,14 @@ typedef int gdb_wint_t;
    also providing a phony iconv, we might as well just stick with
    "wchar_t".  */
 #ifdef PHONY_ICONV
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
 #endif
 
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #endif /* GDB_WCHAR_H */
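For anyone who wants to try the proposed fallback order outside GDB, here is a minimal standalone sketch of the same selection logic. It is not the patch itself: charset_known, pick_intermediate_encoding and SKETCH_DEFAULT_ENCODING are hypothetical names, the list of supported charsets is passed in as a parameter instead of being read from charset_enum, and the stored_result caching is left out. It also shows one possible way to fold the two identical lookup loops into a single helper plus a table of candidates.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GDB's DEFAULT_INTERMEDIATE_ENCODING.  */
#define SKETCH_DEFAULT_ENCODING "wchar_t"

/* Same endianness suffix trick as in the patch; WORDS_BIGENDIAN is
   assumed to be defined only on big-endian hosts.  */
#ifdef WORDS_BIGENDIAN
#define ENDIAN_SUFFIX "BE"
#else
#define ENDIAN_SUFFIX "LE"
#endif

/* Return nonzero if NAME appears in the NULL-terminated array NAMES.  */
static int
charset_known (const char *name, const char *const *names)
{
  size_t i;

  for (i = 0; names[i] != NULL; i++)
    if (strcmp (name, names[i]) == 0)
      return 1;
  return 0;
}

/* Hypothetical helper: pick an intermediate encoding for a wide
   character type of WCHAR_SIZE bytes, given the NULL-terminated list
   NAMES of charsets the converter is assumed to support.  */
static const char *
pick_intermediate_encoding (size_t wchar_size, const char *const *names)
{
  if (wchar_size == 2)
    {
      /* Candidates in order of preference: UTF-16 first, then UCS-2.  */
      static const char *const candidates[] =
        { "UTF-16" ENDIAN_SUFFIX, "UCS-2" ENDIAN_SUFFIX, NULL };
      size_t i;

      for (i = 0; candidates[i] != NULL; i++)
        if (charset_known (candidates[i], names))
          return candidates[i];
    }
  /* Other sizes, or no 2-byte candidate supported: use the default.  */
  return SKETCH_DEFAULT_ENCODING;
}
```

On a little-endian host, a list containing both UTF-16LE and UCS-2LE yields UTF-16LE for the 2-byte case, a list with only UCS-2LE yields UCS-2LE, and any other size falls through to the default.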