* Re: [RFA] Handle cygwin wchar_t specifics
From: Tom Tromey @ 2011-04-15 18:16 UTC
To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> because of this, GDB uses "UCS-4LE"
Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
Pierre> (while it uses "wchar_t" for mingw32, which works well).

Ok, I see the problem. I thought this:

    /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.

But this is not true! For some values of __STDC_ISO_10646__, a 2 byte
wide character type suffices. In particular, Cygwin's value of 200305
means that it corresponds to Unicode 4.0.0:

    http://www.unicode.org/versions/components-4.0.0.html

I think this might be a Cygwin bug, but it is pretty hard to wade
through the ISO / Unicode differences and other assorted standardese to
see. (The reason I think it might be a bug is that Unicode 4.0.0
defines some characters > 0xFFFF.)

Anyway, it doesn't matter if this is a Cygwin bug, since GDB's
assumption here is wrong anyway.

Pierre> The patch below fixes this by
Pierre> explicitly setting the UCS size to two for Windows targets.

I think in the __STDC_ISO_10646__ case, we should just explicitly use
sizeof (wchar_t) somewhere to choose the intermediate encoding. I think
this will be more robust than testing some host define.

Pierre> +#define wchar_size (&(((wchar_t) (0)) + 1) - &((char *) 0))

This doesn't seem to be used.

Tom
* RE: [RFA] Handle cygwin wchar_t specifics
From: Pierre Muller @ 2011-04-16 16:05 UTC
To: 'Tom Tromey'; +Cc: gdb-patches

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Friday, April 15, 2011 20:15
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFA] Handle cygwin wchar_t specifics
>
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
>
> Pierre> because of this, GDB uses "UCS-4LE"
> Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
> Pierre> (while it uses "wchar_t" for mingw32, which works well).
>
> Ok, I see the problem. I thought this:
>
> /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
>
> But this is not true! For some values of __STDC_ISO_10646__, a 2 byte
> wide character type suffices. In particular, Cygwin's value of 200305
> means that it corresponds to Unicode 4.0.0:
>
> http://www.unicode.org/versions/components-4.0.0.html
>
> I think this might be a Cygwin bug, but it is pretty hard to wade
> through the ISO / Unicode differences and other assorted standardese to
> see. (The reason I think it might be a bug is that Unicode 4.0.0
> defines some characters > 0xFFFF.)
>
> Anyway, it doesn't matter if this is a Cygwin bug, since GDB's
> assumption here is wrong anyway.

OK.

> Pierre> The patch below fixes this by
> Pierre> explicitly setting the UCS size to two for Windows targets.
>
> I think in the __STDC_ISO_10646__ case, we should just explicitly use
> sizeof (wchar_t) somewhere to choose the intermediate encoding. I think
> this will be more robust than testing some host define.

Yes, but the problem is that it is not possible to use sizeof inside
an #if condition :(

> Pierre> +#define wchar_size (&(((wchar_t) (0)) + 1) - &((char *) 0))
>
> This doesn't seem to be used.

I googled around to see if there is a workaround for this limitation
of not being able to use sizeof inside conditionals, and then forgot
to remove it...

Do you know of any way to get the size of wchar_t?
I suspect we will need to add this to the configure scripts...
But I am still very bad at that part.

Help most welcome,

Pierre
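For reference, one preprocessor-only approach that the thread does not pursue: C99 requires the limit macro WCHAR_MAX (from <wchar.h> or <stdint.h>) to be usable in #if, so the width of wchar_t can be tested without sizeof. The sketch below assumes 8-bit bytes and a wchar_t that is either 2 or 4 bytes wide; WCHAR_T_BYTES_GUESS and guessed_ucs_name are hypothetical names, not GDB identifiers.

```c
#include <stdint.h>
#include <wchar.h>

/* WCHAR_MAX is a constant expression suitable for #if (C99 7.18.3),
   unlike sizeof (wchar_t), which the preprocessor cannot evaluate.  */
#if WCHAR_MAX <= 0xFFFF
# define WCHAR_T_BYTES_GUESS 2   /* hypothetical macro name */
#else
# define WCHAR_T_BYTES_GUESS 4
#endif

/* Hypothetical helper: the encoding family that matches the guess.  */
static const char *
guessed_ucs_name (void)
{
#if WCHAR_T_BYTES_GUESS == 2
  return "UCS-2";
#else
  return "UCS-4";
#endif
}
```

A configure-time check remains more direct, since it measures the type instead of inferring its size from a limit.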
* Re: [RFA] Handle cygwin wchar_t specifics
From: Jan Kratochvil @ 2011-04-16 16:25 UTC
To: Pierre Muller; +Cc: 'Tom Tromey', gdb-patches

On Sat, 16 Apr 2011 18:05:19 +0200, Pierre Muller wrote:
> Do you know of any way to get the size of wchar_t?
> I suspect we will need to add this to the configure scripts...
> But I am still very bad on that part.

I do not follow the platform specifics of the problem, but a patch for
this specific technical task is attached.

On GNU/Linux I get in config.h:

/* The size of `wchar_t', as computed by sizeof. */
#define SIZEOF_WCHAR_T 4

info '(autoconf)AC_CHECK_SIZEOF'

For cross-compilation the default is 4, for some unknown error it is 0.

HTH,
Jan

--- a/gdb/config.in
+++ b/gdb/config.in
@@ -804,6 +804,9 @@
 /* The size of `long', as computed by sizeof. */
 #undef SIZEOF_LONG
 
+/* The size of `wchar_t', as computed by sizeof. */
+#undef SIZEOF_WCHAR_T
+
 /* Define to l, ll, u, ul, ull, etc., as suitable for constants of type
    'size_t'. */
 #undef SIZE_T_SUFFIX
--- a/gdb/configure
+++ b/gdb/configure
@@ -11637,6 +11637,44 @@ _ACEOF
 
 fi
 
+# The cast to long int works around a bug in the HP C Compiler
+# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
+# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
+# This bug is HP SR number 8606223364.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of wchar_t" >&5
+$as_echo_n "checking size of wchar_t... " >&6; }
+if test "${ac_cv_sizeof_wchar_t+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (wchar_t))" "ac_cv_sizeof_wchar_t" "
+#include <wchar.h>
+#include <wctype.h>
+
+"; then :
+
+else
+  if test "$ac_cv_type_wchar_t" = yes; then
+     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+{ as_fn_set_status 77
+as_fn_error "cannot compute sizeof (wchar_t)
+See \`config.log' for more details." "$LINENO" 5; }; }
+   else
+     ac_cv_sizeof_wchar_t=0
+   fi
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_wchar_t" >&5
+$as_echo "$ac_cv_sizeof_wchar_t" >&6; }
+
+
+
+cat >>confdefs.h <<_ACEOF
+#define SIZEOF_WCHAR_T $ac_cv_sizeof_wchar_t
+_ACEOF
+
+
 
 # ------------------------------------- #
 # Checks for compiler characteristics. #
--- a/gdb/configure.ac
+++ b/gdb/configure.ac
@@ -976,6 +976,10 @@ AC_CHECK_TYPES(socklen_t, [], [],
 [#include <sys/types.h>
 #include <sys/socket.h>
 ])
+AC_CHECK_SIZEOF([wchar_t], 4, [
+#include <wchar.h>
+#include <wctype.h>
+])
 
 # ------------------------------------- #
 # Checks for compiler characteristics. #
* [RFA-v2] Handle cygwin wchar_t specifics
From: Pierre Muller @ 2011-04-16 21:29 UTC
To: 'Jan Kratochvil'; +Cc: 'Tom Tromey', gdb-patches

Thanks Jan,

Thanks to your code, I was able to generate a patch that seems to
work for me.

The goal of the patch is to generate for INTERMEDIATE_ENCODING a name
of the form "UCS-XYY", where YY is LE or BE (which was already handled
before) and X is either 4 or 2, depending on the size of the wchar_t
type.

I don't know if the configure change is completely generated by the
small configure.ac change. If this is true, the ChangeLog entry should
probably just say Regenerate for configure.

Is this patch OK?
Should it be included in the 7.3 branch?

Pierre

2011-04-16 Pierre Muller <muller@ics.u-strasbg.fr>

    Correct INTERMEDIATE_ENCODING macro setup for systems
    using 2 byte "wchar_t" type.
    * gdb_wchar.h: Use new SIZEOF_WCHAR_T macro to set
    INTERMEDIATE_ENCODING macro value.
    * config.in: Add SIZEOF_WCHAR_T macro.
    * configure.ac: Add rule for SIZEOF_WCHAR_T.
    * configure: Likewise.

Index: config.in
===================================================================
RCS file: /cvs/src/src/gdb/config.in,v
retrieving revision 1.125
diff -u -p -r1.125 config.in
--- config.in 17 Mar 2011 13:19:09 -0000 1.125
+++ config.in 16 Apr 2011 21:19:47 -0000
@@ -804,6 +804,9 @@
 /* The size of `long', as computed by sizeof. */
 #undef SIZEOF_LONG
 
+/* The size of `wchar_t', as computed by sizeof. */
+#undef SIZEOF_WCHAR_T
+
 /* Define to l, ll, u, ul, ull, etc., as suitable for constants of type
    'size_t'. */
 #undef SIZE_T_SUFFIX
Index: configure
===================================================================
RCS file: /cvs/src/src/gdb/configure,v
retrieving revision 1.329
diff -u -p -r1.329 configure
--- configure 17 Mar 2011 13:19:09 -0000 1.329
+++ configure 16 Apr 2011 21:19:52 -0000
@@ -11637,6 +11637,44 @@ _ACEOF
 
 fi
 
+# The cast to long int works around a bug in the HP C Compiler
+# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
+# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
+# This bug is HP SR number 8606223364.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of wchar_t" >&5
+$as_echo_n "checking size of wchar_t... " >&6; }
+if test "${ac_cv_sizeof_wchar_t+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (wchar_t))" "ac_cv_sizeof_wchar_t" "
+#include <wchar.h>
+#include <wctype.h>
+
+"; then :
+
+else
+  if test "$ac_cv_type_wchar_t" = yes; then
+     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+{ as_fn_set_status 77
+as_fn_error "cannot compute sizeof (wchar_t)
+See \`config.log' for more details." "$LINENO" 5; }; }
+   else
+     ac_cv_sizeof_wchar_t=0
+   fi
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_wchar_t" >&5
+$as_echo "$ac_cv_sizeof_wchar_t" >&6; }
+
+
+
+cat >>confdefs.h <<_ACEOF
+#define SIZEOF_WCHAR_T $ac_cv_sizeof_wchar_t
+_ACEOF
+
+
 
 # ------------------------------------- #
 # Checks for compiler characteristics. #
Index: configure.ac
===================================================================
RCS file: /cvs/src/src/gdb/configure.ac,v
retrieving revision 1.144
diff -u -p -r1.144 configure.ac
--- configure.ac 17 Mar 2011 13:19:10 -0000 1.144
+++ configure.ac 16 Apr 2011 21:19:52 -0000
@@ -976,6 +976,10 @@ AC_CHECK_TYPES(socklen_t, [], [],
 [#include <sys/types.h>
 #include <sys/socket.h>
 ])
+AC_CHECK_SIZEOF([wchar_t], 4, [
+#include <wchar.h>
+#include <wctype.h>
+])
 
 # ------------------------------------- #
 # Checks for compiler characteristics. #
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6
+++ gdb_wchar.h 16 Apr 2011 21:19:57 -0000
@@ -60,6 +60,7 @@
 
 #include <wchar.h>
 #include <wctype.h>
+#include "config.h"
 
 typedef wchar_t gdb_wchar_t;
 typedef wint_t gdb_wint_t;
@@ -71,20 +72,26 @@ typedef wint_t gdb_wint_t;
 #define gdb_WEOF WEOF
 
 #define LCST(X) L ## X
+/* Transform SIZEOF_WCHAR_T into a string. This requires a two-level
+   macro. This macro is used to generate INTERMEDIATE_ENCODING below. */
+#define STR_VAL1(X) #X
+#define STR_VAL(X) STR_VAL1(X)
+#define SIZEOF_WCHAR_T_STR STR_VAL(SIZEOF_WCHAR_T)
 
-/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
+/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or UCS-2.
+   We use the version having the same size as "wchar_t" type.
    We exploit this fact in the hope that there are hosts that define
    this but which do not support "wchar_t" as an encoding argument to
    iconv_open. We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used. */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+# if WORDS_BIGENDIAN
+# define INTERMEDIATE_ENCODING "UCS-" SIZEOF_WCHAR_T_STR "BE"
+# else
+# define INTERMEDIATE_ENCODING "UCS-" SIZEOF_WCHAR_T_STR "LE"
+# endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+# define INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case. */
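The two-level macro in the gdb_wchar.h hunk matters because the # operator stringifies its argument exactly as written, without expanding it first. A minimal standalone sketch of the difference follows; the value 2 for SIZEOF_WCHAR_T is an assumption standing in for what config.h would provide, and STR_ONE_LEVEL is a hypothetical name added only for contrast.

```c
#include <string.h>

#define SIZEOF_WCHAR_T 2         /* assumed value, normally from config.h */

#define STR_ONE_LEVEL(X) #X      /* stringifies the argument as written */
#define STR_VAL1(X) #X
#define STR_VAL(X) STR_VAL1 (X)  /* expands X first, then stringifies */

/* Yields "SIZEOF_WCHAR_T": the macro name itself, not its value.  */
static const char one_level[] = STR_ONE_LEVEL (SIZEOF_WCHAR_T);

/* Yields "UCS-2LE": adjacent string literals are concatenated.  */
static const char two_level[] = "UCS-" STR_VAL (SIZEOF_WCHAR_T) "LE";
```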
* Re: [RFA-v2] Handle cygwin wchar_t specifics
From: Jan Kratochvil @ 2011-04-16 22:35 UTC
To: Pierre Muller; +Cc: 'Tom Tromey', gdb-patches

On Sat, 16 Apr 2011 23:28:35 +0200, Pierre Muller wrote:
> --- config.in 17 Mar 2011 13:19:09 -0000 1.125
> +++ config.in 16 Apr 2011 21:19:47 -0000
> --- configure 17 Mar 2011 13:19:09 -0000 1.329
> +++ configure 16 Apr 2011 21:19:52 -0000

You do not have to post autogenerated files here; they were posted
before only at your request, for your convenience.

> --- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6
> +++ gdb_wchar.h 16 Apr 2011 21:19:57 -0000
> @@ -60,6 +60,7 @@
>
> #include <wchar.h>
> #include <wctype.h>
> +#include "config.h"

I would rather say #include "defs.h", and as the first #include line.

> +#define STR_VAL1(X) #X
> +#define STR_VAL(X) STR_VAL1(X)

This is called XSTRING in include/symcat.h.

> +#define SIZEOF_WCHAR_T_STR STR_VAL(SIZEOF_WCHAR_T)
                                    ^^ space - coding style

> -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
> UCS-2.

The diff is corrupted by your mailer's word wrapping.

These are just technical notes, no real review/approval.

Thanks,
Jan
* Re: [RFA-v2] Handle cygwin wchar_t specifics
From: Eli Zaretskii @ 2011-04-17 2:55 UTC
To: Pierre Muller; +Cc: jan.kratochvil, tromey, gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: "'Tom Tromey'" <tromey@redhat.com>, <gdb-patches@sourceware.org>
> Date: Sat, 16 Apr 2011 23:28:35 +0200
>
> -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or UCS-2.

Please use UTF-16, not UCS-2. What Windows uses is the former. The
latter is the old name from the days when Unicode covered only the
BMP; it was superseded by UTF-16, which can also encode characters
outside the BMP.
* RE: [RFA-v2] Handle cygwin wchar_t specifics
From: Pierre Muller @ 2011-04-18 10:36 UTC
To: 'Eli Zaretskii'; +Cc: jan.kratochvil, tromey, gdb-patches

Hi Eli,

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Sunday, April 17, 2011 04:56
> To: Pierre Muller
> Cc: jan.kratochvil@redhat.com; tromey@redhat.com; gdb-patches@sourceware.org
> Subject: Re: [RFA-v2] Handle cygwin wchar_t specifics
>
> > From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> > Cc: "'Tom Tromey'" <tromey@redhat.com>, <gdb-patches@sourceware.org>
> > Date: Sat, 16 Apr 2011 23:28:35 +0200
> >
> > -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> > +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
> UCS-2.
>
> Please use UTF-16, not UCS-2. What Windows uses is the former. The
> latter is the old name from the days when Unicode covered only the
> BMP; it was superseded by UTF-16 that covers more than that.

Are you sure this is correct?
I tried what you said, but "UTF-16" seems to mean "UTF-16BE",
while "UTF-16LE" seems to do a better job.

But if UTF-16 is better than UCS-2,
shouldn't we also favor UTF-32 over UCS-4?

I will send a new RFA using UTF-16LE for Windows shortly.

Pierre
* Re: [RFA-v2] Handle cygwin wchar_t specifics
From: Eli Zaretskii @ 2011-04-18 10:57 UTC
To: Pierre Muller; +Cc: jan.kratochvil, tromey, gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <jan.kratochvil@redhat.com>, <tromey@redhat.com>, <gdb-patches@sourceware.org>
> Date: Mon, 18 Apr 2011 12:35:26 +0200
>
> > > -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> > > +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
> > UCS-2.
> >
> > Please use UTF-16, not UCS-2. What Windows uses is the former. The
> > latter is the old name from the days when Unicode covered only the
> > BMP; it was superseded by UTF-16 that covers more than that.
>
> Are you sure this is correct?
> I tried what you said, but "UTF-16" seems to mean "UTF-16BE"
> while "UTF-16LE" seems to do a better job.

UTF-16 means both LE and BE varieties. I meant to use UTF-16 in the
comment, instead of UCS-2. In the code, you need to use the variety
that suits the endianness of the host platform.

> But if UTF-16 is better than UCS-2,
> shouldn't we also favor UTF-32 over UCS-4?

IMO, there's no need, since Unicode has still not exceeded 32 bits.
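To make the UCS-2 versus UTF-16 distinction concrete: both encode a BMP code point as a single 16-bit unit, but only UTF-16 can represent code points above 0xFFFF, by splitting them into a surrogate pair. The following is a minimal sketch of that encoding step; utf16_encode is a hypothetical helper written for illustration, not a GDB or iconv function.

```c
#include <stdint.h>

/* Encode code point CP as UTF-16 into OUT (room for two units).
   Returns the number of 16-bit units written, or 0 if CP is not
   a valid Unicode scalar value.  */
static int
utf16_encode (uint32_t cp, uint16_t out[2])
{
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* surrogates are not code points to encode */
  if (cp <= 0xFFFF)
    {
      out[0] = (uint16_t) cp;   /* BMP: same unit in UCS-2 and UTF-16 */
      return 1;
    }
  if (cp > 0x10FFFF)
    return 0;                   /* beyond the Unicode code space */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 + (cp >> 10));    /* high surrogate */
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));  /* low surrogate */
  return 2;
}
```

For example, U+1D11E, a character above 0xFFFF, comes out as the pair 0xD834 0xDD1E, which UCS-2 has no way to express.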
* [RFA-v3] Handle cygwin wchar_t specifics
From: Pierre Muller @ 2011-04-18 15:14 UTC
To: 'Eli Zaretskii'; +Cc: jan.kratochvil, tromey, gdb-patches

Here is a new version of my patch that should only change behavior
on Windows hosts.

This patch also changes the intermediate_encoding for mingw hosts,
from "wchar_t" to "UTF-16LE", but this seems to work nicely
for both mingw32 and mingw64 (and only if iconv is found,
otherwise gdb_wchar_t is simply char and phony functions are used).

The change could nevertheless be restricted to __CYGWIN__ only,
if you think that is a better option.

Comments?

Pierre

2011-04-16 Pierre Muller <muller@ics.u-strasbg.fr>

    Correct INTERMEDIATE_ENCODING macro setup for Windows OS
    using 2 byte "wchar_t" type.
    * gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
    (INTERMEDIATE_ENCODING): Change macro value to...
    (intermediate_encoding): New external.
    * charset.c (intermediate_encoding): New variable.
    (_initialize_charset): Assign default value of intermediate_encoding
    using DEFAULT_INTERMEDIATE_ENCODING. Override this for Windows OS
    system if size of "gdb_wchar_t" type is two.

Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6
+++ gdb_wchar.h 18 Apr 2011 15:07:03 -0000
@@ -79,12 +79,12 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used. */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case. */
@@ -115,11 +115,14 @@ typedef int gdb_wint_t;
    also providing a phony iconv, we might as well just stick with
    "wchar_t". */
 #ifdef PHONY_ICONV
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
 #endif
 
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding
+extern const char *intermediate_encoding;
+
 #endif /* GDB_WCHAR_H */
Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c 11 Jan 2011 15:10:01 -0000 1.43
+++ charset.c 18 Apr 2011 15:07:03 -0000
@@ -206,6 +206,7 @@ phony_iconv (iconv_t utf_flag, const cha
 #define GDB_DEFAULT_TARGET_WIDE_CHARSET "UTF-32"
 #endif
 
+const char *intermediate_encoding = NULL;
 static const char *auto_host_charset_name = GDB_DEFAULT_HOST_CHARSET;
 static const char *host_charset_name = "auto";
 static void
@@ -935,7 +936,7 @@ _initialize_charset (void)
   charset_enum = default_charset_names;
 
 #ifndef PHONY_ICONV
-#ifdef HAVE_LANGINFO_CODESET
+# ifdef HAVE_LANGINFO_CODESET
   /* The result of nl_langinfo may be overwritten later. This may
      leak a little memory, if the user later changes the host charset,
      but that doesn't matter much. */
@@ -946,7 +947,7 @@ _initialize_charset (void)
   if (!strcmp (auto_host_charset_name, "646") || !*auto_host_charset_name)
     auto_host_charset_name = "ASCII";
   auto_target_charset_name = auto_host_charset_name;
-#elif defined (USE_WIN32API)
+# elif defined (USE_WIN32API)
   {
     /* "CP" + x<=5 digits + paranoia. */
     static char w32_host_default_charset[16];
@@ -956,8 +957,14 @@ _initialize_charset (void)
     auto_host_charset_name = w32_host_default_charset;
     auto_target_charset_name = auto_host_charset_name;
   }
+# endif
 #endif
-#endif
+
+  intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
+# if defined (USE_WIN32API) || defined (__CYGWIN__)
+  if (sizeof (gdb_wchar_t) == 2)
+    intermediate_encoding = "UTF-16LE";
+# endif
 
   add_setshow_enum_cmd ("charset", class_support,
                         charset_enum, &host_charset_name, _("\
* Re: [RFA-v3] Handle cygwin wchar_t specifics
From: Tom Tromey @ 2011-04-18 17:18 UTC
To: Pierre Muller; +Cc: 'Eli Zaretskii', jan.kratochvil, gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> This patch also changes the intermediate_encoding for mingw hosts,
Pierre> from "wchar_t" to "UTF-16LE", but this seems to work nicely
Pierre> for both mingw32 and mingw64 (and only if iconv is found,
Pierre> otherwise gdb_wchar_t is simply char and phony functions are used).

Pierre> -#define INTERMEDIATE_ENCODING host_charset ()
Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()

This changes the behavior if the gdb user changes the host encoding.
This is an unusual situation, admittedly, but it seems to me that it is
just as easy to only introduce the `intermediate_encoding' global in the
UTF-{16,32} case.

Pierre> + intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
Pierre> +# if defined (USE_WIN32API) || defined (__CYGWIN__)
Pierre> + if (sizeof (gdb_wchar_t) == 2)
Pierre> + intermediate_encoding = "UTF-16LE";
Pierre> +# endif

Here, instead of a special case for __CYGWIN__, and instead of
hard-coding the endian-ness, just use the same code for all
__STDC_ISO_10646__ platforms. Maybe something like:

    intermediate_encoding = xstrprintf ("UTF-%d%s", 8 * sizeof (wchar_t),
                                        WORDS_BIGENDIAN ? "BE" : "LE");

Tom
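The suggestion above can be tried outside GDB with standard snprintf in place of xstrprintf. This is only a sketch under stated assumptions: format_intermediate_encoding is a hypothetical helper, and the character size and endianness are passed in as parameters so that no configure macro is needed. A caller inside GDB would presumably pass sizeof (gdb_wchar_t) and a flag derived from WORDS_BIGENDIAN. The cast to int is there because passing a size_t for %d is not portable.

```c
#include <stddef.h>
#include <stdio.h>

/* Write "UTF-16LE", "UTF-32BE", etc. into BUF.  Returns 0 on success,
   -1 if WCHAR_BYTES is not 2 or 4, or if BUF is too small.  */
static int
format_intermediate_encoding (char *buf, size_t buflen,
                              size_t wchar_bytes, int big_endian)
{
  int n;

  if (wchar_bytes != 2 && wchar_bytes != 4)
    return -1;
  n = snprintf (buf, buflen, "UTF-%d%s", (int) (8 * wchar_bytes),
                big_endian ? "BE" : "LE");
  return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```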
* [RFC-v4] Handle cygwin wchar_t specifics
From: Pierre Muller @ 2011-04-19 9:18 UTC
To: 'Tom Tromey'; +Cc: gdb-patches

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Monday, April 18, 2011 19:18
> To: Pierre Muller
> Cc: 'Eli Zaretskii'; jan.kratochvil@redhat.com; gdb-patches@sourceware.org
> Subject: Re: [RFA-v3] Handle cygwin wchar_t specifics
>
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
>
> Pierre> This patch also changes the intermediate_encoding for mingw hosts,
> Pierre> from "wchar_t" to "UTF-16LE", but this seems to work nicely
> Pierre> for both mingw32 and mingw64 (and only if iconv is found,
> Pierre> otherwise gdb_wchar_t is simply char and phony functions are used).
>
> Pierre> -#define INTERMEDIATE_ENCODING host_charset ()
> Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
>
> This changes the behavior if the gdb user changes the host encoding.
> This is an unusual situation, admittedly, but it seems to me that it is
> just as easy to only introduce the `intermediate_encoding' global in the
> UTF-{16,32} case.
>
> Pierre> + intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
> Pierre> +# if defined (USE_WIN32API) || defined (__CYGWIN__)
> Pierre> + if (sizeof (gdb_wchar_t) == 2)
> Pierre> + intermediate_encoding = "UTF-16LE";
> Pierre> +# endif
>
> Here, instead of a special case for __CYGWIN__, and instead of
> hard-coding the endian-ness, just use the same code for all
> __STDC_ISO_10646__ platforms. Maybe something like:
>
> intermediate_encoding = xstrprintf ("UTF-%d%s", 8 * sizeof (wchar_t),
> WORDS_BIGENDIAN ? "BE" : "LE");

Three problems here:
1) we should really use "gdb_wchar_t" type, not "wchar_t"
2) If sizeof(gdb_wchar_t) == 1
I don't think that UTF-8LE and UTF-8BE exist, do they?
At least they are not in the iconv -l list for current cygwin.
3) WORDS_BIGENDIAN is not defined at all on Cygwin,
so that your code would probably not compile.

A further question is whether UTF-32 is always supported...

Below is yet another proposal:
it transforms the INTERMEDIATE_ENCODING macro into a call to an
intermediate_encoding function.
This function specifically handles the case where gdb_wchar_t is
2 bytes long, by trying UTF-16XE (with X equal to L or B) and, if that
one is not in the list of supported charsets, trying UCS-2XE.

As there is apparently no advantage to using UTF-32 over UCS-4
(according to Eli), I did not extend the change to the 4 byte case.

Comments welcome,

Pierre Muller

2011-04-19 Pierre Muller <muller@ics.u-strasbg.fr>

    * gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
    (INTERMEDIATE_ENCODING): Change value to intermediate_encoding
    function call.
    (intermediate_encoding): New prototype.
    * charset.c (intermediate_encoding): New function.

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c 11 Jan 2011 15:10:01 -0000 1.43
+++ charset.c 19 Apr 2011 09:05:43 -0000
@@ -922,6 +922,50 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+#ifdef WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+const char *
+intermediate_encoding (void)
+{
+  if (sizeof (gdb_wchar_t) == 2)
+    {
+      static const char *stored_result = NULL;
+      const char *result;
+      int i;
+
+      if (stored_result)
+        return stored_result;
+      result = "UTF-16" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets. */
+      for (i = 0; charset_enum[i]; i++)
+        {
+          if (strcmp (result, charset_enum[i]) == 0)
+            {
+              stored_result = result;
+              return result;
+            }
+        }
+      /* Second try, with UCS-2 type. */
+      result = "UCS-2" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets. */
+      for (i = 0; charset_enum[i]; i++)
+        {
+          if (strcmp (result, charset_enum[i]) == 0)
+            {
+              stored_result = result;
+              return result;
+            }
+        }
+    }
+  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
+     not known, use DEFAULT_INTERMEDIATE_ENCODING macro. */
+  return DEFAULT_INTERMEDIATE_ENCODING;
+}
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6
+++ gdb_wchar.h 19 Apr 2011 09:05:43 -0000
@@ -79,12 +79,12 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used. */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case. */
@@ -115,11 +115,14 @@ typedef int gdb_wint_t;
    also providing a phony iconv, we might as well just stick with
    "wchar_t". */
 #ifdef PHONY_ICONV
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
 #endif
 
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #endif /* GDB_WCHAR_H */
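The two identical lookup loops in the proposed intermediate_encoding function could be folded into one helper that walks a preference list. A minimal standalone sketch of that idea follows; first_supported_charset is a hypothetical name, and the NULL-terminated string arrays stand in for GDB's charset_enum list.

```c
#include <stddef.h>
#include <string.h>

/* Return the first name in CANDIDATES that also appears in SUPPORTED,
   or FALLBACK if none does.  Both arrays are NULL-terminated.  */
static const char *
first_supported_charset (const char *const *candidates,
                         const char *const *supported,
                         const char *fallback)
{
  size_t i, j;

  for (i = 0; candidates[i] != NULL; i++)
    for (j = 0; supported[j] != NULL; j++)
      if (strcmp (candidates[i], supported[j]) == 0)
        return candidates[i];
  return fallback;
}
```

With candidates { "UTF-16LE", "UCS-2LE", NULL }, the helper prefers UTF-16LE whenever it is supported and falls back to UCS-2LE, then to the given default, which mirrors the order of attempts in the patch.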
* Re: [RFC-v4] Handle cygwin wchar_t specifics
From: Eli Zaretskii @ 2011-04-19 9:34 UTC
To: Pierre Muller; +Cc: tromey, gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <gdb-patches@sourceware.org>
> Date: Tue, 19 Apr 2011 11:17:59 +0200
>
> 2) If sizeof(gdb_wchar_t) == 1
> I don't think that UTF-8LE and UTF-8BE exist, do they?

No. Single-byte encodings are by definition endian-less (because
endianness is about byte order in multibyte words).
* Re: [RFC-v4] Handle cygwin wchar_t specifics
From: Tom Tromey @ 2011-04-19 13:19 UTC
To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> 1) we should really use "gdb_wchar_t" type, not "wchar_t"

Yeah.

Pierre> 2) If sizeof(gdb_wchar_t) == 1
Pierre> I don't think that UTF-8LE and UTF-8BE exist, do they?
Pierre> At least they are not in the iconv -l list for current cygwin.

A platform where this is true should not define __STDC_ISO_10646__.
You might as well just assert that the size is 2 or 4.

Pierre> 3) WORDS_BIGENDIAN is not defined at all on Cygwin,
Pierre> so that your code would probably not compile.

Yeah, I forgot, you need #if. See config.in.

Pierre> A further question is whether UTF-32 is always supported...

If someone can find a platform where wchar_t is 4 bytes, where
__STDC_ISO_10646__ is defined, and where UTF-32 is not understood, then
we can complain bitterly and change the code again.

Pierre> Below is yet another proposal:
Pierre> it transforms INTERMEDIATE_ENCODING macro into a call to
Pierre> intermediate_encoding function.

I'd prefer it if the new code is only used in the __STDC_ISO_10646__
case.

Pierre> +#ifdef WORDS_BIGENDIAN

#if

Pierre> +const char *
Pierre> +intermediate_encoding (void)

New functions require an introductory comment.

Pierre> #ifdef PHONY_ICONV
Pierre> -#define INTERMEDIATE_ENCODING "wchar_t"
Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"

I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.

Tom
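On the WORDS_BIGENDIAN point: inside #if, an identifier that is not defined as a macro evaluates to 0, so the preprocessor test is well-formed even on hosts where configure leaves the macro undefined, whereas the bare name in a C expression such as WORDS_BIGENDIAN ? "BE" : "LE" is not. The small sketch below uses the hypothetical names DEMO_BIGENDIAN and DEMO_FLAG, chosen so as not to clash with a real config.h, and also shows the case where #if and #ifdef disagree.

```c
/* DEMO_BIGENDIAN is deliberately left undefined here.  */
#if DEMO_BIGENDIAN              /* undefined identifier evaluates to 0 */
# define DEMO_SUFFIX "BE"
#else
# define DEMO_SUFFIX "LE"
#endif

/* A macro defined to 0 is "defined" for #ifdef but false for #if.  */
#define DEMO_FLAG 0

#ifdef DEMO_FLAG
# define DEMO_IFDEF_TAKEN 1
#else
# define DEMO_IFDEF_TAKEN 0
#endif

#if DEMO_FLAG
# define DEMO_IF_TAKEN 1
#else
# define DEMO_IF_TAKEN 0
#endif

static const char demo_suffix[] = DEMO_SUFFIX;
```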
* [RFC-v5] Handle cygwin wchar_t specifics 2011-04-19 13:19 ` Tom Tromey @ 2011-04-19 13:56 ` Pierre Muller [not found] ` <16656.7281041809$1303221408@news.gmane.org> 1 sibling, 0 replies; 30+ messages in thread From: Pierre Muller @ 2011-04-19 13:56 UTC (permalink / raw) To: 'Tom Tromey'; +Cc: gdb-patches > -----Original Message----- > From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey > Sent: Tuesday, April 19, 2011 15:19 > To: Pierre Muller > Cc: gdb-patches@sourceware.org > Subject: Re: [RFC-v4] Handle cygwin wchar_t specifics > > >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: > > Pierre> 1) we should really use "gdb_wchar_t" type, not "wchar_t" > > Yeah. > > Pierre> 2) If sizeof(gdb_wchar_t) == 1 > Pierre> I don't think that UTF-8LE and UTF-8BE exist, do they? > Pierre> At least they are not in the iconv -l list for current cygwin. > > A platform where this is true should not define __STDC_ISO_10646__. > You might as well just assert that the size is 2 or 4. > > Pierre> 3) WORD_BIGENDIAN is not defined at all on Cygwin, > Pierre> so that your code would probably not compile. > > Yeah, I forgot, you need #if. See config.in. > > Pierre> A further question is whether UTF-32 is always supported... > > If someone can find a platform where wchar_t is 4 bytes, where > __STDC_ISO_10646__ is defined, and where UTF-32 is not understood, then > we can complain bitterly and change the code again. > > Pierre> Below is yet another proposal: > Pierre> it transforms INTERMEDIATE_ENCODING macro into a call to > Pierre> intermediate_encoding function. > > I'd prefer it if the new code is only used in the __STDC_ISO_10646__ > case. Done below. > Pierre> +#ifdef WORDS_BIGENDIAN > > #if OK, corrected below. > Pierre> +const char * > Pierre> +intermediate_encoding (void) > > New functions require an introductory comment. I wrote a minimal description, feel free to improve it. 
> Pierre> #ifdef PHONY_ICONV > Pierre> -#define INTERMEDIATE_ENCODING "wchar_t" > Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t" > > I don't think DEFAULT_INTERMEDIATE_ENCODING is needed. I assumed you meant: not necessary if PHONY_ICONV is defined, and this is what I changed below. (Personally, I would have preferred to remove the INTERMEDIATE_ENCODING macro completely and call the function directly.) > Tom Thanks for your comments; I tried to take them all into account in the new version below. Checked on cygwin (where __STDC_ISO_10646__ is defined), mingw32 (not defined) and mingw64 (no iconv at all, and consequently no intermediate_encoding function). All three at least print the version correctly. More comments? Pierre 2011-04-19 Pierre Muller <muller@ics.u-strasbg.fr> * gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro. (INTERMEDIATE_ENCODING): Change value to intermediate_encoding function call. (intermediate_encoding): New prototype. * charset.c (ENDIAN_SUFFIX): New macro. (intermediate_encoding): New function. Index: charset.c =================================================================== RCS file: /cvs/src/src/gdb/charset.c,v retrieving revision 1.43 diff -u -p -r1.43 charset.c --- charset.c 11 Jan 2011 15:10:01 -0000 1.43 +++ charset.c 19 Apr 2011 13:42:54 -0000 @@ -922,6 +922,59 @@ default_auto_wide_charset (void) return GDB_DEFAULT_TARGET_WIDE_CHARSET; } + +#ifndef PHONY_ICONV +/* Macro used for UTF or UCS endianness suffix. */ +#if WORDS_BIGENDIAN +#define ENDIAN_SUFFIX "BE" +#else +#define ENDIAN_SUFFIX "LE" +#endif + +/* intermediate_encoding returns the charset unsed internally by + GDB to convert between target and host encodings. 
*/ + +const char * +intermediate_encoding (void) +{ +#ifdef __STDC_ISO_10646__ + if (sizeof (gdb_wchar_t) == 2) + { + static const char *stored_result = NULL; + const char *result; + int i; + + if (stored_result) + return stored_result; + result = "UTF-16" ENDIAN_SUFFIX; + /* Check that the name is in the list of handled charsets. */ + for (i = 0; charset_enum[i]; i++) + { + if (strcmp (result, charset_enum[i]) == 0) + { + stored_result = result; + return result; + } + } + /* Second try, with UCS-2 type. */ + result = "UCS-2" ENDIAN_SUFFIX; + /* Check that the name is in the list of handled charsets. */ + for (i = 0; charset_enum[i]; i++) + { + if (strcmp (result, charset_enum[i]) == 0) + { + stored_result = result; + return result; + } + } + } +#endif /* __STDC_ISO_10646__ */ + /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are + not known, use DEFAULT_INTERMEDIATE_ENCODING macro. */ + return DEFAULT_INTERMEDIATE_ENCODING; +} +#endif /* not PHONY_ICONV */ + void _initialize_charset (void) { Index: gdb_wchar.h =================================================================== RCS file: /cvs/src/src/gdb/gdb_wchar.h,v retrieving revision 1.6 diff -u -p -r1.6 gdb_wchar.h --- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6 +++ gdb_wchar.h 19 Apr 2011 13:42:54 -0000 @@ -79,18 +79,20 @@ typedef wint_t gdb_wint_t; hosts that emit a BOM when the unadorned name is used. */ #if defined (__STDC_ISO_10646__) #if WORDS_BIGENDIAN -#define INTERMEDIATE_ENCODING "UCS-4BE" +#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE" #else -#define INTERMEDIATE_ENCODING "UCS-4LE" +#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE" #endif #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108 -#define INTERMEDIATE_ENCODING "wchar_t" +#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t" #else /* This shouldn't happen, because the earlier #if should have filtered out this case. 
*/ #error "Neither __STDC_ISO_10646__ nor _LIBICONV_VERSION defined" #endif +#define INTERMEDIATE_ENCODING intermediate_encoding () + #else /* If we got here and have wchar_t support, we might be on a system @@ -117,9 +119,13 @@ typedef int gdb_wint_t; #ifdef PHONY_ICONV #define INTERMEDIATE_ENCODING "wchar_t" #else -#define INTERMEDIATE_ENCODING host_charset () +#define DEFAULT_INTERMEDIATE_ENCODING host_charset () +#endif + #endif +#ifndef PHONY_ICONV +const char *intermediate_encoding (void); #endif #endif /* GDB_WCHAR_H */ ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <16656.7281041809$1303221408@news.gmane.org>]
* Re: [RFC-v5] Handle cygwin wchar_t specifics [not found] ` <16656.7281041809$1303221408@news.gmane.org> @ 2011-04-19 17:50 ` Tom Tromey 2011-04-20 7:59 ` Pierre Muller [not found] ` <420.768399681215$1303286406@news.gmane.org> 0 siblings, 2 replies; 30+ messages in thread From: Tom Tromey @ 2011-04-19 17:50 UTC (permalink / raw) To: Pierre Muller; +Cc: gdb-patches >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: Tom> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed. Pierre> I assumed you ment: not necessary if PHONY_ICONV is defined, Pierre> and this is what I changed below. Pierre> (I would personally have favored to completely remove Pierre> INTERMEDIATE_ENCODING macro and call the function directly.) Sorry, that isn't what I meant. All this new code is needed only in the __STDC_ISO_10646__ case. All other cases are already handled ok. So, I think it is best to only introduce new code along the __STDC_ISO_10646__ branches. Thus far your patches have touched all the other branches -- but there is no reason to do that, and I think it just makes it more complicated without an associated benefit. Pierre> +#ifdef __STDC_ISO_10646__ Pierre> + if (sizeof (gdb_wchar_t) == 2) You might as well unify the 2 and 4 byte cases like I said earlier, and just die for any other value. You can use a static assert trick to make it die during compilation, which I think is better than dying at runtime. E.g.: extern char your_platform_is_bogus[(sizeof (gdb_wchar_t) == 2 || sizeof (gdb_wchar_t) == 4) ? 1 : -1]; Pierre> + /* Check that the name is in the list of handled charsets. */ Pierre> + for (i = 0; charset_enum[i]; i++) I don't think this is really needed either. Or, if you really want to do the check, do it by calling iconv_open at initialization, and then just make gdb die early -- whatever platform does this is really messed up. 
Pierre> + /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are Pierre> + not known, use DEFAULT_INTERMEDIATE_ENCODING macro. */ Pierre> + return DEFAULT_INTERMEDIATE_ENCODING; I don't think this will generally do the right thing. For example, your patch defines DEFAULT_INTERMEDIATE_ENCODING to "UCS-4LE" in the !WORDS_BIGENDIAN case. But we already know that gdb_wchar_t has 2 bytes. So I think this will just result in the same bug as today. Tom ^ permalink raw reply [flat|nested] 30+ messages in thread
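The static-assert trick Tom describes in this message can be sketched as follows. This is only an illustration: sketch_wchar_t stands in for GDB's gdb_wchar_t, and sketch_wchar_bits is a hypothetical helper added so the snippet has something to call.

```c
#include <stddef.h>
#include <wchar.h>

/* Assumption for this sketch: use the host wchar_t in place of
   GDB's gdb_wchar_t.  */
typedef wchar_t sketch_wchar_t;

/* The array size is 1 when the condition holds and -1 otherwise.
   A negative array size is rejected by the compiler, so an
   unsupported width stops the build.  The declaration is extern and
   the array is never defined or referenced, so it needs no storage
   and causes no link error.  */
extern char sketch_wchar_t_is_bogus[(sizeof (sketch_wchar_t) == 2
                                     || sizeof (sketch_wchar_t) == 4)
                                    ? 1 : -1];

/* Hypothetical helper: width of the wide character type in bits.  */
static int
sketch_wchar_bits (void)
{
  return (int) (sizeof (sketch_wchar_t) * 8);
}
```

With a C11 compiler, _Static_assert expresses the same check directly; the array form also works with older compilers.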
* RE: [RFC-v5] Handle cygwin wchar_t specifics 2011-04-19 17:50 ` Tom Tromey @ 2011-04-20 7:59 ` Pierre Muller 2011-04-20 21:08 ` Pedro Alves [not found] ` <420.768399681215$1303286406@news.gmane.org> 1 sibling, 1 reply; 30+ messages in thread From: Pierre Muller @ 2011-04-20 7:59 UTC (permalink / raw) To: 'Tom Tromey'; +Cc: gdb-patches > Tom> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed. > > Pierre> I assumed you ment: not necessary if PHONY_ICONV is defined, > Pierre> and this is what I changed below. > Pierre> (I would personally have favored to completely remove > Pierre> INTERMEDIATE_ENCODING macro and call the function directly.) > > Sorry, that isn't what I meant. Hopefully I got it right this time... > All this new code is needed only in the __STDC_ISO_10646__ case. > All other cases are already handled ok. > So, I think it is best to only introduce new code along the > __STDC_ISO_10646__ branches. Thus far your patches have touched all the > other branches -- but there is no reason to do that, and I think it just > makes it more complicated without an associated benefit. > > Pierre> +#ifdef __STDC_ISO_10646__ > Pierre> + if (sizeof (gdb_wchar_t) == 2) > > You might as well unify the 2 and 4 byte cases like I said earlier, and > just die for any other value. Done below. > You can use a static assert trick to make > it die during compilation, which I think is better than dying at > runtime. E.g.: > extern char your_platform_is_bogus[(sizeof (gdb_wchar_t) == 2 > || sizeof (gdb_wchar_t) == 4) > ? 1 : -1]; Used below (renamed your_gdb_wchar_t_is_bogus). > Pierre> + /* Check that the name is in the list of handled charsets. > */ > Pierre> + for (i = 0; charset_enum[i]; i++) > > I don't think this is really needed either. > Or, if you really want to do the check, do it by calling iconv_open at > initialization, and then just make gdb die early -- whatever platform > does this is really messed up. Also done below. 
> Pierre> + /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are > Pierre> + not known, use DEFAULT_INTERMEDIATE_ENCODING macro. */ > Pierre> + return DEFAULT_INTERMEDIATE_ENCODING; > > I don't think this will generally do the right thing. > For example, your patch defines DEFAULT_INTERMEDIATE_ENCODING to > "UCS-4LE" in the !WORDS_BIGENDIAN case. But we already know that > gdb_wchar_t has 2 bytes. So I think this will just result in the same > bug as today. I hope I have now understood what you wanted: the new code makes fewer changes to gdb_wchar.h. It only uses the intermediate_encoding function in the case where UCS-4LE/BE was set before. To avoid having this code compiled in other cases, I defined a new macro called USE_INTERMEDIATE_ENCODING_FUNCTION, and the charset.c code changes are limited to this conditional. I used iconv_open to check for working charset names and added a call to error if none is found. Comments? Pierre 2011-04-20 Pierre Muller <muller@ics.u-strasbg.fr> * gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro. (INTERMEDIATE_ENCODING): Change value to intermediate_encoding function call if __STDC_ISO_10646__ macro is defined. (intermediate_encoding): New prototype. * charset.c (your_gdb_wchar_t_is_bogus): New test variable to generate compile time error for unsupported gdb_wchar_t size. (ENDIAN_SUFFIX): New macro. (intermediate_encoding): New function. Index: charset.c =================================================================== RCS file: /cvs/src/src/gdb/charset.c,v retrieving revision 1.43 diff -u -p -r1.43 charset.c --- charset.c 11 Jan 2011 15:10:01 -0000 1.43 +++ charset.c 20 Apr 2011 07:48:21 -0000 @@ -922,6 +922,70 @@ default_auto_wide_charset (void) return GDB_DEFAULT_TARGET_WIDE_CHARSET; } + +#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION +/* Macro used for UTF or UCS endianness suffix. 
*/ +#if WORDS_BIGENDIAN +#define ENDIAN_SUFFIX "BE" +#else +#define ENDIAN_SUFFIX "LE" +#endif + +/* The code below serves to generate a compile time error if + gdb_wchar_t type is not of size 2 nor 4, despite the fact that + macro __STDC_ISO_10646__ is defined. + This is better than a gdb_assert call, because GDB cannot handle + strings correctly if this size is different. */ + +static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2 + || sizeof (gdb_wchar_t) == 4) + ? 1 : -1]; + +/* intermediate_encoding returns the charset unsed internally by + GDB to convert between target and host encodings. As the test above + compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes. + UTF-16/32 is tested first, UCS-2/4 is tested as a second option, + otherwise an error is generated. */ + +const char * +intermediate_encoding (void) +{ + iconv_t desc; + static const char *stored_result = NULL; + const char *result; + int i; + + if (stored_result) + return stored_result; + result = xstrprintf ("UTF-%d%s", sizeof (gdb_wchar_t) * 8, ENDIAN_SUFFIX); + /* Check that the name is supported by iconv_open. */ + desc = iconv_open (result, host_charset ()); + if (desc != (iconv_t) -1) + { + iconv_close (desc); + stored_result = result; + return result; + } + /* Not valid, free the allocated memory. */ + xfree ((void *) result); + /* Second try, with UCS-2 type. */ + result = xstrprintf ("UCS-%d%s", sizeof (gdb_wchar_t), ENDIAN_SUFFIX); + /* Check that the name is supported by iconv_open. */ + desc = iconv_open (result, host_charset ()); + if (desc != (iconv_t) -1) + { + iconv_close (desc); + stored_result = result; + return result; + } + /* Not valid, free the allocated memory. */ + xfree ((void *) result); + /* No valid charset found, generate error here. 
*/ + error ("Unable to find a vaild charset for string conversions"); +} + +#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */ + void _initialize_charset (void) { Index: gdb_wchar.h =================================================================== RCS file: /cvs/src/src/gdb/gdb_wchar.h,v retrieving revision 1.6 diff -u -p -r1.6 gdb_wchar.h --- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6 +++ gdb_wchar.h 20 Apr 2011 07:48:21 -0000 @@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t; iconv_open. We put the endianness into the encoding name to avoid hosts that emit a BOM when the unadorned name is used. */ #if defined (__STDC_ISO_10646__) -#if WORDS_BIGENDIAN -#define INTERMEDIATE_ENCODING "UCS-4BE" -#else -#define INTERMEDIATE_ENCODING "UCS-4LE" -#endif +#define USE_INTERMEDIATE_ENCODING_FUNCTION +#define INTERMEDIATE_ENCODING intermediate_encoding () +const char *intermediate_encoding (void); + #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108 #define INTERMEDIATE_ENCODING "wchar_t" #else ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [RFC-v5] Handle cygwin wchar_t specifics 2011-04-20 7:59 ` Pierre Muller @ 2011-04-20 21:08 ` Pedro Alves 2011-04-21 6:57 ` Pierre Muller [not found] ` <15550.7422438406$1303369059@news.gmane.org> 0 siblings, 2 replies; 30+ messages in thread From: Pedro Alves @ 2011-04-20 21:08 UTC (permalink / raw) To: gdb-patches; +Cc: Pierre Muller, 'Tom Tromey' On Wednesday 20 April 2011 08:59:31, Pierre Muller wrote: > +static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2 > + || sizeof (gdb_wchar_t) == 4) > + ? 1 : -1]; > + Didn't "extern" work? -- Pedro Alves ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [RFC-v5] Handle cygwin wchar_t specifics 2011-04-20 21:08 ` Pedro Alves @ 2011-04-21 6:57 ` Pierre Muller 2011-04-21 7:17 ` [RFA-v6] " Pierre Muller [not found] ` <15550.7422438406$1303369059@news.gmane.org> 1 sibling, 1 reply; 30+ messages in thread From: Pierre Muller @ 2011-04-21 6:57 UTC (permalink / raw) To: 'Pedro Alves', gdb-patches; +Cc: 'Tom Tromey' > -----Original Message----- > From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Pedro Alves > Sent: Wednesday, April 20, 2011 23:08 > To: gdb-patches@sourceware.org > Cc: Pierre Muller; 'Tom Tromey' > Subject: Re: [RFC-v5] Handle cygwin wchar_t specifics > > On Wednesday 20 April 2011 08:59:31, Pierre Muller wrote: > > +static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2 > > + || sizeof (gdb_wchar_t) == 4) > > + ? 1 : -1]; > > + > > Didn't "extern" work? I had not tested it before: it does work on my system, but is it certain that it will compile and link correctly with all C compilers used for GDB? There is apparently no trace of your_gdb_wchar_t_is_bogus in my charset.o object file once it is made external. Could someone confirm that this will also compile with the other C compilers in use, without creating link failures? I am perfectly willing to change this, but it seemed to me to rely on some "obscure C compiler feature" (don't forget that I learned C to be able to support the Pascal language in GDB...). Pierre ^ permalink raw reply [flat|nested] 30+ messages in thread
* [RFA-v6] Handle cygwin wchar_t specifics 2011-04-21 6:57 ` Pierre Muller @ 2011-04-21 7:17 ` Pierre Muller 2011-04-21 9:02 ` Pierre Muller [not found] ` <24274.3825926029$1303376558@news.gmane.org> 0 siblings, 2 replies; 30+ messages in thread From: Pierre Muller @ 2011-04-21 7:17 UTC (permalink / raw) To: 'Pedro Alves', 'Tom Tromey'; +Cc: gdb-patches > > Didn't "extern" work? > > I didn't test it out before: > it does work on my system, but is it > sure it will compile and link correctly on > all C compilers used for GDB? > > There is apparently no trace of your_gdb_wchar_t_is_bogus > in my charset.o object file once it is made external. > > Could someone confirm that this will also compile > on other C compiler used without creating link failures? > > I am perfectly willing to change this but it seemed > to me to rely on some "obscure C compiler feature" > (don't forget that I learned C to be able to support pascal > language in GDB...). I am stupid... of course this works; otherwise each object would contain the thousands of externals declared in all the included headers... Here is a new version, with both the switch from static to extern for your_gdb_wchar_t_is_bogus and Tom's latest comments. As it is not exactly what Tom asked me to change, I am resubmitting it anyhow. Pierre PS: Should this be included in the 7.3 branch? 2011-04-21 Pierre Muller <muller@ics.u-strasbg.fr> * gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro. (INTERMEDIATE_ENCODING): Change value to intermediate_encoding function call if __STDC_ISO_10646__ macro is defined. (intermediate_encoding): New prototype. * charset.c (your_gdb_wchar_t_is_bogus): New extern test variable to generate compile time error for unsupported gdb_wchar_t size. (ENDIAN_SUFFIX): New macro. (intermediate_encoding): New function. 
Index: gdb_wchar.h =================================================================== RCS file: /cvs/src/src/gdb/gdb_wchar.h,v retrieving revision 1.6 diff -u -p -r1.6 gdb_wchar.h --- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6 +++ gdb_wchar.h 21 Apr 2011 07:09:52 -0000 @@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t; iconv_open. We put the endianness into the encoding name to avoid hosts that emit a BOM when the unadorned name is used. */ #if defined (__STDC_ISO_10646__) -#if WORDS_BIGENDIAN -#define INTERMEDIATE_ENCODING "UCS-4BE" -#else -#define INTERMEDIATE_ENCODING "UCS-4LE" -#endif +#define USE_INTERMEDIATE_ENCODING_FUNCTION +#define INTERMEDIATE_ENCODING intermediate_encoding () +const char *intermediate_encoding (void); + #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108 #define INTERMEDIATE_ENCODING "wchar_t" #else Index: charset.c =================================================================== RCS file: /cvs/src/src/gdb/charset.c,v retrieving revision 1.43 diff -u -p -r1.43 charset.c --- charset.c 11 Jan 2011 15:10:01 -0000 1.43 +++ charset.c 21 Apr 2011 07:09:52 -0000 @@ -922,6 +922,70 @@ default_auto_wide_charset (void) return GDB_DEFAULT_TARGET_WIDE_CHARSET; } + +#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION +/* Macro used for UTF or UCS endianness suffix. */ +#if WORDS_BIGENDIAN +#define ENDIAN_SUFFIX "BE" +#else +#define ENDIAN_SUFFIX "LE" +#endif + +/* The code below serves to generate a compile time error if + gdb_wchar_t type is not of size 2 nor 4, despite the fact that + macro __STDC_ISO_10646__ is defined. + This is better than a gdb_assert call, because GDB cannot handle + strings correctly if this size is different. */ + +extern char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2 + || sizeof (gdb_wchar_t) == 4) + ? 1 : -1]; + +/* intermediate_encoding returns the charset unsed internally by + GDB to convert between target and host encodings. As the test above + compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes. 
+ UTF-16/32 is tested first, UCS-2/4 is tested as a second option, + otherwise an error is generated. */ + +const char * +intermediate_encoding (void) +{ + iconv_t desc; + static const char *stored_result = NULL; + char *result; + int i; + + if (stored_result) + return stored_result; + result = xstrprintf ("UTF-%d%s", sizeof (gdb_wchar_t) * 8, ENDIAN_SUFFIX); + /* Check that the name is supported by iconv_open. */ + desc = iconv_open (result, host_charset ()); + if (desc != (iconv_t) -1) + { + iconv_close (desc); + stored_result = result; + return result; + } + /* Not valid, free the allocated memory. */ + xfree (result); + /* Second try, with UCS-2 type. */ + result = xstrprintf ("UCS-%d%s", sizeof (gdb_wchar_t), ENDIAN_SUFFIX); + /* Check that the name is supported by iconv_open. */ + desc = iconv_open (result, host_charset ()); + if (desc != (iconv_t) -1) + { + iconv_close (desc); + stored_result = result; + return result; + } + /* Not valid, free the allocated memory. */ + xfree (result); + /* No valid charset found, generate error here. */ + error (_("Unable to find a vaild charset for string conversions")); +} + +#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */ + void _initialize_charset (void) { ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [RFA-v6] Handle cygwin wchar_t specifics 2011-04-21 7:17 ` [RFA-v6] " Pierre Muller @ 2011-04-21 9:02 ` Pierre Muller [not found] ` <24274.3825926029$1303376558@news.gmane.org> 1 sibling, 0 replies; 30+ messages in thread From: Pierre Muller @ 2011-04-21 9:02 UTC (permalink / raw) To: gdb-patches; +Cc: 'Pedro Alves', 'Tom Tromey' Whoops, I just ran a test on a compile farm machine (x86_64-unknown-linux-gnu) and there was one more problem: the type of a sizeof expression seems to be "long unsigned int", at least on x86_64 Linux. Thus we do need two typecasts, around sizeof (gdb_wchar_t) * 8 and sizeof (gdb_wchar_t), in the xstrprintf parameters. After that change, no char-related testsuite changes appear. Below is the modified patch, which also includes the needed typecasts. Pierre 2011-04-21 Pierre Muller <muller@ics.u-strasbg.fr> * gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro. (INTERMEDIATE_ENCODING): Change value to intermediate_encoding function call if __STDC_ISO_10646__ macro is defined. (intermediate_encoding): New prototype. * charset.c (your_gdb_wchar_t_is_bogus): New extern test variable to generate compile time error for unsupported gdb_wchar_t size. (ENDIAN_SUFFIX): New macro. (intermediate_encoding): New function. Index: src/gdb/gdb_wchar.h =================================================================== RCS file: /cvs/src/src/gdb/gdb_wchar.h,v retrieving revision 1.6 diff -u -p -r1.6 gdb_wchar.h --- src/gdb/gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6 +++ src/gdb/gdb_wchar.h 21 Apr 2011 07:35:52 -0000 @@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t; iconv_open. We put the endianness into the encoding name to avoid hosts that emit a BOM when the unadorned name is used. 
*/ #if defined (__STDC_ISO_10646__) -#if WORDS_BIGENDIAN -#define INTERMEDIATE_ENCODING "UCS-4BE" -#else -#define INTERMEDIATE_ENCODING "UCS-4LE" -#endif +#define USE_INTERMEDIATE_ENCODING_FUNCTION +#define INTERMEDIATE_ENCODING intermediate_encoding () +const char *intermediate_encoding (void); + #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108 #define INTERMEDIATE_ENCODING "wchar_t" #else Index: src/gdb/charset.c =================================================================== RCS file: /cvs/src/src/gdb/charset.c,v retrieving revision 1.43 diff -u -p -r1.43 charset.c --- src/gdb/charset.c 11 Jan 2011 15:10:01 -0000 1.43 +++ src/gdb/charset.c 21 Apr 2011 07:35:52 -0000 @@ -922,6 +922,72 @@ default_auto_wide_charset (void) return GDB_DEFAULT_TARGET_WIDE_CHARSET; } + +#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION +/* Macro used for UTF or UCS endianness suffix. */ +#if WORDS_BIGENDIAN +#define ENDIAN_SUFFIX "BE" +#else +#define ENDIAN_SUFFIX "LE" +#endif + +/* The code below serves to generate a compile time error if + gdb_wchar_t type is not of size 2 nor 4, despite the fact that + macro __STDC_ISO_10646__ is defined. + This is better than a gdb_assert call, because GDB cannot handle + strings correctly if this size is different. */ + +extern char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2 + || sizeof (gdb_wchar_t) == 4) + ? 1 : -1]; + +/* intermediate_encoding returns the charset unsed internally by + GDB to convert between target and host encodings. As the test above + compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes. + UTF-16/32 is tested first, UCS-2/4 is tested as a second option, + otherwise an error is generated. */ + +const char * +intermediate_encoding (void) +{ + iconv_t desc; + static const char *stored_result = NULL; + char *result; + int i; + + if (stored_result) + return stored_result; + result = xstrprintf ("UTF-%d%s", (int) (sizeof (gdb_wchar_t) * 8), + ENDIAN_SUFFIX); + /* Check that the name is supported by iconv_open. 
*/ + desc = iconv_open (result, host_charset ()); + if (desc != (iconv_t) -1) + { + iconv_close (desc); + stored_result = result; + return result; + } + /* Not valid, free the allocated memory. */ + xfree (result); + /* Second try, with UCS-2 type. */ + result = xstrprintf ("UCS-%d%s", (int) sizeof (gdb_wchar_t), + ENDIAN_SUFFIX); + /* Check that the name is supported by iconv_open. */ + desc = iconv_open (result, host_charset ()); + if (desc != (iconv_t) -1) + { + iconv_close (desc); + stored_result = result; + return result; + } + /* Not valid, free the allocated memory. */ + xfree (result); + /* No valid charset found, generate error here. */ + error (_("Unable to find a vaild charset for string conversions")); +} + +#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */ + void _initialize_charset (void) { ^ permalink raw reply [flat|nested] 30+ messages in thread
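The typecast issue that motivated this revision can be shown in isolation with a minimal sketch. The helper format_encoding_name is hypothetical and uses snprintf in place of GDB's xstrprintf; only the "%d" and size_t mismatch is taken from the message above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only.  Writes a name such as
   "UTF-16LE" or "UCS-2LE" into BUF and returns BUF.  IN_BITS selects
   whether the number in the name is the width in bits (UTF-16,
   UTF-32) or in bytes (UCS-2, UCS-4).  */
static char *
format_encoding_name (char *buf, size_t buflen, const char *family,
                      size_t wchar_size, int in_bits, const char *suffix)
{
  /* sizeof yields a size_t, which is wider than int on x86_64 Linux.
     "%d" expects an int, so convert explicitly before formatting.  */
  int number = (int) (in_bits ? wchar_size * 8 : wchar_size);

  snprintf (buf, buflen, "%s-%d%s", family, number, suffix);
  return buf;
}
```

C99 offers "%zu" for size_t arguments; the cast is the portable choice where C99 format support cannot be relied on.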
[parent not found: <24274.3825926029$1303376558@news.gmane.org>]
* Re: [RFA-v6] Handle cygwin wchar_t specifics [not found] ` <24274.3825926029$1303376558@news.gmane.org> @ 2011-04-21 14:14 ` Tom Tromey 2011-04-21 14:27 ` Pierre Muller [not found] ` <4691.37052209607$1303396084@news.gmane.org> 0 siblings, 2 replies; 30+ messages in thread From: Tom Tromey @ 2011-04-21 14:14 UTC (permalink / raw) To: Pierre Muller; +Cc: gdb-patches, 'Pedro Alves' >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: Pierre> Below is the modified patch that also included the needed Pierre> typecasts. This is ok. Thanks. Tom ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [RFA-v6] Handle cygwin wchar_t specifics 2011-04-21 14:14 ` Tom Tromey @ 2011-04-21 14:27 ` Pierre Muller [not found] ` <4691.37052209607$1303396084@news.gmane.org> 1 sibling, 0 replies; 30+ messages in thread From: Pierre Muller @ 2011-04-21 14:27 UTC (permalink / raw) To: 'Tom Tromey'; +Cc: gdb-patches, 'Pedro Alves' > -----Original Message----- > From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey > Sent: Thursday, April 21, 2011 16:14 > To: Pierre Muller > Cc: gdb-patches@sourceware.org; 'Pedro Alves' > Subject: Re: [RFA-v6] Handle cygwin wchar_t specifics > > >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: > > Pierre> Below is the modified patch that also included the needed > Pierre> typecasts. > > This is ok. Thanks. Thanks to both of you for the help. Patch committed. What about the 7.3 branch? Pierre ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <4691.37052209607$1303396084@news.gmane.org>]
* Re: [RFA-v6] Handle cygwin wchar_t specifics [not found] ` <4691.37052209607$1303396084@news.gmane.org> @ 2011-04-21 15:06 ` Tom Tromey 2011-04-21 16:39 ` Pierre Muller [not found] ` <25400.1310132027$1303403986@news.gmane.org> 0 siblings, 2 replies; 30+ messages in thread From: Tom Tromey @ 2011-04-21 15:06 UTC (permalink / raw) To: Pierre Muller; +Cc: gdb-patches, 'Pedro Alves' >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: Pierre> What about 7.3 branch? Sounds good. Tom ^ permalink raw reply [flat|nested] 30+ messages in thread
* RE: [RFA-v6] Handle cygwin wchar_t specifics 2011-04-21 15:06 ` Tom Tromey @ 2011-04-21 16:39 ` Pierre Muller [not found] ` <25400.1310132027$1303403986@news.gmane.org> 1 sibling, 0 replies; 30+ messages in thread From: Pierre Muller @ 2011-04-21 16:39 UTC (permalink / raw) To: 'Tom Tromey'; +Cc: gdb-patches, 'Pedro Alves' > -----Original Message----- > From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey > Sent: Thursday, April 21, 2011 17:06 > To: Pierre Muller > Cc: gdb-patches@sourceware.org; 'Pedro Alves' > Subject: Re: [RFA-v6] Handle cygwin wchar_t specifics > > >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: > > Pierre> What about 7.3 branch? > > Sounds good. The only thing that worries me is that it ends up changing the default, when gdb_wchar_t is of size 4, from UCS-4XE to UTF-32XE. Is this OK for everyone? Pierre ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <25400.1310132027$1303403986@news.gmane.org>]
* Re: [RFA-v6] Handle cygwin wchar_t specifics [not found] ` <25400.1310132027$1303403986@news.gmane.org> @ 2011-04-21 20:25 ` Tom Tromey 2011-04-21 21:18 ` 7.3 commit " Pierre Muller 0 siblings, 1 reply; 30+ messages in thread From: Tom Tromey @ 2011-04-21 20:25 UTC (permalink / raw) To: Pierre Muller; +Cc: gdb-patches, 'Pedro Alves' >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: Pierre> The only thing that worries me, Pierre> is that it finally changes the default when Pierre> gdb_wchar_t is of size 4 from UCS-4XE to UTF-32XE, Pierre> is this OK for everyone? I don't think it will be a problem. Tom ^ permalink raw reply [flat|nested] 30+ messages in thread
* 7.3 commit [RFA-v6] Handle cygwin wchar_t specifics 2011-04-21 20:25 ` Tom Tromey @ 2011-04-21 21:18 ` Pierre Muller 0 siblings, 0 replies; 30+ messages in thread From: Pierre Muller @ 2011-04-21 21:18 UTC (permalink / raw) To: 'Tom Tromey'; +Cc: gdb-patches, 'Pedro Alves' > -----Original Message----- > From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey > Sent: Thursday, April 21, 2011 22:25 > To: Pierre Muller > Cc: gdb-patches@sourceware.org; 'Pedro Alves' > Subject: Re: [RFA-v6] Handle cygwin wchar_t specifics > > >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes: > > Pierre> The only thing that worries me, > Pierre> is that it finally changes the default when > Pierre> gdb_wchar_t is of size 4 from UCS-4XE to UTF-32XE, > Pierre> is this OK for everyone? > > I don't think it will be a problem. > > Tom With Tom's approval, I committed the patch to support gdb_wchar_t of size 2 to the 7.3 branch. Thanks Tom, Pierre ^ permalink raw reply [flat|nested] 30+ messages in thread
[parent not found: <15550.7422438406$1303369059@news.gmane.org>]
* Re: [RFC-v5] Handle cygwin wchar_t specifics
  [not found] ` <15550.7422438406$1303369059@news.gmane.org>
@ 2011-04-21 14:10 ` Tom Tromey
  0 siblings, 0 replies; 30+ messages in thread
From: Tom Tromey @ 2011-04-21 14:10 UTC
To: Pierre Muller; +Cc: 'Pedro Alves', gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pedro> Didn't "extern" work?

Pierre> I didn't test it out before:
Pierre> it does work on my system, but is it
Pierre> sure it will compile and link correctly on
Pierre> all C compilers used for GDB?

Yes.

Pierre> I am perfectly willing to change this but it seemed
Pierre> to me to rely on some "obscure C compiler feature"
Pierre> (don't forget that I learned C to be able to support pascal
Pierre> language in GDB...).

:-)
This is a reasonably standard idiom for "static assert".

Tom
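The "static assert" idiom mentioned above can be sketched as follows. The patch under review is not quoted in this message, so the macro name, the declared array name, and the condition being checked are all assumptions made for illustration. The idea is that an extern array declaration with a negative size is rejected at compile time, while a valid one allocates no storage and needs nothing at link time as long as the array is never referenced.

```c
#include <wchar.h>

/* Hypothetical macro name, used only in this sketch.  The array size is
   1 when COND holds and -1 otherwise; a negative array size is a
   constraint violation, so the compiler rejects the declaration.  */
#define SKETCH_STATIC_ASSERT(cond, name) \
  extern char name[(cond) ? 1 : -1]

/* Assumed condition: the host wchar_t is either 2 or 4 bytes wide.
   This is only a declaration, so no object is defined and the linker
   never looks for one unless the array is actually used.  */
SKETCH_STATIC_ASSERT (sizeof (wchar_t) == 2 || sizeof (wchar_t) == 4,
                      sketch_wchar_size_is_2_or_4);

/* Runtime mirror of the same condition, so the check can also be
   exercised from ordinary code.  */
static int
sketch_wchar_size_ok (void)
{
  return sizeof (wchar_t) == 2 || sizeof (wchar_t) == 4;
}
```

On a host where the condition is false, the translation unit simply fails to compile, which is the point of the idiom.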
[parent not found: <420.768399681215$1303286406@news.gmane.org>]
* Re: [RFC-v5] Handle cygwin wchar_t specifics
  [not found] ` <420.768399681215$1303286406@news.gmane.org>
@ 2011-04-20 20:21 ` Tom Tromey
  0 siblings, 0 replies; 30+ messages in thread
From: Tom Tromey @ 2011-04-20 20:21 UTC
To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> Hopefully I got it right this time...

Just a tiny nit left :)
Thanks for persevering.

Pierre> + const char *result;

You can make this one just a "char *".
That will let you avoid casts in the xfree calls.

Pierre> + error ("Unable to find a vaild charset for string conversions");

Needs _().

Ok with those changes.

Tom
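The two review nits above can be illustrated with a small standalone sketch. It is not the code from the patch: free stands in for GDB's xfree, _() is defined here as an identity macro in place of the gettext marker, and the helper names and the "UTF-32LE" string are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the gettext marker so that the sketch builds alone.  */
#define _(msgid) (msgid)

/* Hypothetical helper: return a heap-allocated copy of NAME, or NULL
   if the allocation fails.  */
static char *
sketch_dup_name (const char *name)
{
  char *copy = malloc (strlen (name) + 1);

  if (copy != NULL)
    strcpy (copy, name);
  return copy;
}

/* Return NULL on success, or a translatable error message on failure.  */
static const char *
sketch_probe (void)
{
  /* Declared as plain "char *": with "const char *" the call to free
     below would need a cast to discard the qualifier.  */
  char *result = sketch_dup_name ("UTF-32LE");

  if (result == NULL)
    return _("Unable to find a valid charset for string conversions");

  free (result);
  return NULL;
}
```

Wrapping the message in _() is what makes it visible to the translation tooling; the string itself is unchanged at run time in this sketch.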
* Re: [RFA] Handle cygwin wchar_t specifics
  2011-04-16 16:05 ` Pierre Muller
  2011-04-16 16:25 ` Jan Kratochvil
@ 2011-04-16 21:24 ` Tom Tromey
  1 sibling, 0 replies; 30+ messages in thread
From: Tom Tromey @ 2011-04-16 21:24 UTC
To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> Yes, but the problem is that it is not possible to use sizeof
Pierre> inside a #if conditions :(
Pierre> Do you know of any way to get the size of wchar_t?
Pierre> I suspect we will need to add this to the configure scripts...
Pierre> But I am still very bad on that part.

In this case you don't need to know the size during preprocessing.
You can do something like:

    extern const char *intermediate_encoding;
    #define INTERMEDIATE_ENCODING intermediate_encoding

... and then initialize the string in _initialize_charset, under the
appropriate conditions.

The only caveat is to check the case where the size is neither 2 nor 4.

Tom
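A minimal sketch of the run-time approach suggested above, assuming the encoding is chosen from sizeof (wchar_t) and host byte order. The names intermediate_encoding and INTERMEDIATE_ENCODING come from the message; every function prefixed with sketch_ is hypothetical, and the "UTF-16"/"UTF-32" strings are an assumption about which iconv names would be picked, not a quote from the patch.

```c
#include <stddef.h>
#include <wchar.h>

/* Set once at start-up; NULL means no suitable encoding was found.  */
const char *intermediate_encoding;
#define INTERMEDIATE_ENCODING intermediate_encoding

/* Hypothetical helper: map a wide character size and byte order to an
   encoding name.  Returns NULL when SIZE is neither 2 nor 4, which is
   the caveat raised in the message.  */
static const char *
sketch_encoding_for_size (size_t size, int big_endian)
{
  switch (size)
    {
    case 2:
      return big_endian ? "UTF-16BE" : "UTF-16LE";
    case 4:
      return big_endian ? "UTF-32BE" : "UTF-32LE";
    default:
      return NULL;
    }
}

/* Hypothetical helper: detect the host byte order at run time by
   looking at the first byte of a known two-byte value.  */
static int
sketch_host_is_big_endian (void)
{
  const unsigned short probe = 1;

  return *(const unsigned char *) &probe == 0;
}

/* Stand-in for the initialization that the message places in
   _initialize_charset.  sizeof is evaluated by the compiler proper,
   so no preprocessor test on the size of wchar_t is needed.  */
void
sketch_initialize_charset (void)
{
  intermediate_encoding
    = sketch_encoding_for_size (sizeof (wchar_t),
                                sketch_host_is_big_endian ());
}
```

A caller would still have to decide what to do when INTERMEDIATE_ENCODING ends up NULL, for instance by reporting an error before any string conversion is attempted.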
* Re: [RFA] Handle cygwin wchar_t specifics
  2011-04-15 18:16 ` [RFA] Handle cygwin wchar_t specifics Tom Tromey
  2011-04-16 16:05 ` Pierre Muller
@ 2011-04-18 20:07 ` Corinna Vinschen
  1 sibling, 0 replies; 30+ messages in thread
From: Corinna Vinschen @ 2011-04-18 20:07 UTC
To: gdb-patches

Hi Tom,

On Apr 15 12:15, Tom Tromey wrote:
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
>
> Pierre> because of this, GDB uses "UCS-4LE"
> Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
> Pierre> (while "wchar_t" it uses for mingw32, which works well).
>
> Ok, I see the problem. I thought this:
>
> /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
>
> But this is not true! For some values of __STDC_ISO_10646__, a 2 byte
> wide character type suffices. In particular, Cygwin's value of 200305
> means that it corresponds to Unicode 4.0.0:
>
> http://www.unicode.org/versions/components-4.0.0.html
>
> I think this might be a Cygwin bug, but it is pretty hard to wade
> through the ISO / Unicode differences and other assorted standardese to
> see. (The reason I think it might be a bug is that Unicode 4.0.0
> defines some characters > 0xFFFF.)

I see there's another solution in the works, but just to let you know
that Bruno Haible and I discussed the definition of __STDC_ISO_10646__
on the Cygwin list back in January. We didn't come to a conclusion
since we both interpret the standards differently, but this gives you
some insight why __STDC_ISO_10646__ is defined on Cygwin, see
http://cygwin.com/ml/cygwin/2011-01/msg00410.html, line 70ff.

Corinna

--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat
Thread overview: 30+ messages
[not found] <5928.31498147479$1302882967@news.gmane.org>
2011-04-15 18:16 ` [RFA] Handle cygwin wchar_t specifics Tom Tromey
2011-04-16 16:05 ` Pierre Muller
2011-04-16 16:25 ` Jan Kratochvil
2011-04-16 21:29 ` [RFA-v2] " Pierre Muller
2011-04-16 22:35 ` Jan Kratochvil
[not found] ` <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>
2011-04-17 2:55 ` Eli Zaretskii
2011-04-18 10:36 ` Pierre Muller
[not found] ` <00a801cbfdb4$551214a0$ff363de0$%muller@ics-cnrs.unistra.fr>
2011-04-18 10:57 ` Eli Zaretskii
2011-04-18 15:14 ` [RFA-v3] " Pierre Muller
[not found] ` <21014.6501930014$1303139687@news.gmane.org>
2011-04-18 17:18 ` Tom Tromey
2011-04-19 9:18 ` [RFC-v4] " Pierre Muller
[not found] ` <004f01cbfe72$adddeb40$0999c1c0$%muller@ics-cnrs.unistra.fr>
2011-04-19 9:34 ` Eli Zaretskii
[not found] ` <34716.7311156683$1303204711@news.gmane.org>
2011-04-19 13:19 ` Tom Tromey
2011-04-19 13:56 ` [RFC-v5] " Pierre Muller
[not found] ` <16656.7281041809$1303221408@news.gmane.org>
2011-04-19 17:50 ` Tom Tromey
2011-04-20 7:59 ` Pierre Muller
2011-04-20 21:08 ` Pedro Alves
2011-04-21 6:57 ` Pierre Muller
2011-04-21 7:17 ` [RFA-v6] " Pierre Muller
2011-04-21 9:02 ` Pierre Muller
[not found] ` <24274.3825926029$1303376558@news.gmane.org>
2011-04-21 14:14 ` Tom Tromey
2011-04-21 14:27 ` Pierre Muller
[not found] ` <4691.37052209607$1303396084@news.gmane.org>
2011-04-21 15:06 ` Tom Tromey
2011-04-21 16:39 ` Pierre Muller
[not found] ` <25400.1310132027$1303403986@news.gmane.org>
2011-04-21 20:25 ` Tom Tromey
2011-04-21 21:18 ` 7.3 commit " Pierre Muller
[not found] ` <15550.7422438406$1303369059@news.gmane.org>
2011-04-21 14:10 ` [RFC-v5] " Tom Tromey
[not found] ` <420.768399681215$1303286406@news.gmane.org>
2011-04-20 20:21 ` Tom Tromey
2011-04-16 21:24 ` [RFA] " Tom Tromey
2011-04-18 20:07 ` Corinna Vinschen