Re: [RFA] Handle cygwin wchar

Mirror of the gdb-patches mailing list

* Re: [RFA] Handle cygwin wchar_t specifics
       [not found] <5928.31498147479$1302882967@news.gmane.org>
@ 2011-04-15 18:16 ` Tom Tromey
  2011-04-16 16:05   ` Pierre Muller
  2011-04-18 20:07   ` Corinna Vinschen
  0 siblings, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-15 18:16 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> because of this, GDB uses "UCS-4LE" 
Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin 
Pierre> (while "wchar_t" it uses for mingw32, which works well).

Ok, I see the problem.  I thought this:

    /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.

But this is not true!  For some values of __STDC_ISO_10646__, a 2 byte
wide character type suffices.  In particular, Cygwin's value of 200305
means that it corresponds to Unicode 4.0.0:

    http://www.unicode.org/versions/components-4.0.0.html

I think this might be a Cygwin bug, but it is pretty hard to wade
through the ISO / Unicode differences and other assorted standardese to
see.  (The reason I think it might be a bug is that Unicode 4.0.0
defines some characters > 0xFFFF.)

Anyway, it doesn't matter if this is a Cygwin bug, since GDB's
assumption here is wrong anyway.

Pierre> The patch below fixes this by
Pierre> explicitly setting the UCS size to two for Windows targets.

I think in the __STDC_ISO_10646__ case, we should just explicitly use
sizeof (wchar_t) somewhere to choose the intermediate encoding.  I think
this will be more robust than testing some host define.

Pierre> +#define wchar_size  (&(((wchar_t) (0)) + 1) - &((char *) 0))

This doesn't seem to be used.

Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFA] Handle cygwin wchar_t specifics
  2011-04-15 18:16 ` [RFA] Handle cygwin wchar_t specifics Tom Tromey
@ 2011-04-16 16:05   ` Pierre Muller
  2011-04-16 16:25     ` Jan Kratochvil
  2011-04-16 21:24     ` [RFA] " Tom Tromey
  2011-04-18 20:07   ` Corinna Vinschen
  1 sibling, 2 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-16 16:05 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches



> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Friday, 15 April 2011 20:15
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFA] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre> because of this, GDB uses "UCS-4LE"
> Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
> Pierre> (while "wchar_t" it uses for mingw32, which works well).
> 
> Ok, I see the problem.  I thought this:
> 
>     /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> 
> But this is not true!  For some values of __STDC_ISO_10646__, a 2 byte
> wide character type suffices.  In particular, Cygwin's value of 200305
> means that it corresponds to Unicode 4.0.0:
> 
>     http://www.unicode.org/versions/components-4.0.0.html
> 
> I think this might be a Cygwin bug, but it is pretty hard to wade
> through the ISO / Unicode differences and other assorted standardese to
> see.  (The reason I think it might be a bug is that Unicode 4.0.0
> defines some characters > 0xFFFF.)
> 
> Anyway, it doesn't matter if this is a Cygwin bug, since GDB's
> assumption here is wrong anyway.
OK. 
> Pierre> The patch below fixes this by
> Pierre> explicitly setting the UCS size to two for Windows targets.
> 
> I think in the __STDC_ISO_10646__ case, we should just explicitly use
> sizeof (wchar_t) somewhere to choose the intermediate encoding.  I think
> this will be more robust than testing some host define.

  Yes, but the problem is that it is not possible to use sizeof
inside an #if condition :(
 
> Pierre> +#define wchar_size  (&(((wchar_t) (0)) + 1) - &((char *) 0))
> 
> This doesn't seem to be used.

  I googled around to see if there is a workaround to this
limitation of not being able to use sizeof inside conditionals
and then forgot to remove it...

  Do you know of any way to get the size of wchar_t?
  I suspect we will need to add this to the configure scripts...
But I am still very bad on that part.

  Help most welcome,

Pierre
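
A minimal sketch of one way around the "no sizeof in #if" limitation,
without touching configure: the standard WCHAR_MAX macro from <wchar.h>
is an integer constant that the preprocessor can compare. This is not
what the thread ends up doing. WCHAR_BITS_GUESS and the helper function
are hypothetical names used only here, and the sketch assumes 8-bit
bytes and a wchar_t without padding bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* WCHAR_MAX can be tested in #if, unlike sizeof (wchar_t).
   WCHAR_BITS_GUESS is a hypothetical name for this sketch only.  */
#if WCHAR_MAX <= 0xFF
# define WCHAR_BITS_GUESS 8
#elif WCHAR_MAX <= 0xFFFF
# define WCHAR_BITS_GUESS 16
#else
# define WCHAR_BITS_GUESS 32
#endif

/* Return nonzero if the preprocessor guess agrees with sizeof.
   Assumes CHAR_BIT == 8.  */
static int
wchar_bits_guess_is_consistent (void)
{
  return WCHAR_BITS_GUESS == 8 * sizeof (wchar_t);
}
```

The range test covers both signed and unsigned wchar_t, since a 16-bit
type has a maximum of 0x7FFF or 0xFFFF. A configure check remains the
more conventional route in an autoconf-based tree.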



* Re: [RFA] Handle cygwin wchar_t specifics
  2011-04-16 16:05   ` Pierre Muller
@ 2011-04-16 16:25     ` Jan Kratochvil
  2011-04-16 21:29       ` [RFA-v2] " Pierre Muller
       [not found]       ` <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>
  2011-04-16 21:24     ` [RFA] " Tom Tromey
  1 sibling, 2 replies; 31+ messages in thread
From: Jan Kratochvil @ 2011-04-16 16:25 UTC (permalink / raw)
  To: Pierre Muller; +Cc: 'Tom Tromey', gdb-patches

On Sat, 16 Apr 2011 18:05:19 +0200, Pierre Muller wrote:
>   Do you know of any way to get the size of wchar_t?
>   I suspect we will need to add this to the configure scripts...
> But I am still very bad on that part.

I do not follow the platform specifics of the problem but this specific
technical task is attached.  On GNU/Linux I get in config.h:

/* The size of `wchar_t', as computed by sizeof. */
#define SIZEOF_WCHAR_T 4

info '(autoconf)AC_CHECK_SIZEOF'
For cross-compilation the default is 4; on some unknown error it is 0.


HTH,
Jan


--- a/gdb/config.in
+++ b/gdb/config.in
@@ -804,6 +804,9 @@
 /* The size of `long', as computed by sizeof. */
 #undef SIZEOF_LONG
 
+/* The size of `wchar_t', as computed by sizeof. */
+#undef SIZEOF_WCHAR_T
+
 /* Define to l, ll, u, ul, ull, etc., as suitable for constants of type
    'size_t'. */
 #undef SIZE_T_SUFFIX
--- a/gdb/configure
+++ b/gdb/configure
@@ -11637,6 +11637,44 @@ _ACEOF
 
 fi
 
+# The cast to long int works around a bug in the HP C Compiler
+# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
+# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
+# This bug is HP SR number 8606223364.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of wchar_t" >&5
+$as_echo_n "checking size of wchar_t... " >&6; }
+if test "${ac_cv_sizeof_wchar_t+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (wchar_t))" "ac_cv_sizeof_wchar_t"        "
+#include <wchar.h>
+#include <wctype.h>
+
+"; then :
+
+else
+  if test "$ac_cv_type_wchar_t" = yes; then
+     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+{ as_fn_set_status 77
+as_fn_error "cannot compute sizeof (wchar_t)
+See \`config.log' for more details." "$LINENO" 5; }; }
+   else
+     ac_cv_sizeof_wchar_t=0
+   fi
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_wchar_t" >&5
+$as_echo "$ac_cv_sizeof_wchar_t" >&6; }
+
+
+
+cat >>confdefs.h <<_ACEOF
+#define SIZEOF_WCHAR_T $ac_cv_sizeof_wchar_t
+_ACEOF
+
+
 
 # ------------------------------------- #
 # Checks for compiler characteristics.  #
--- a/gdb/configure.ac
+++ b/gdb/configure.ac
@@ -976,6 +976,10 @@ AC_CHECK_TYPES(socklen_t, [], [],
 [#include <sys/types.h>
 #include <sys/socket.h>
 ])
+AC_CHECK_SIZEOF([wchar_t], 4, [
+#include <wchar.h>
+#include <wctype.h>
+])
 
 # ------------------------------------- #
 # Checks for compiler characteristics.  #



* Re: [RFA] Handle cygwin wchar_t specifics
  2011-04-16 16:05   ` Pierre Muller
  2011-04-16 16:25     ` Jan Kratochvil
@ 2011-04-16 21:24     ` Tom Tromey
  1 sibling, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-16 21:24 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   Yes, but the problem is that it is not possible to use sizeof
Pierre> inside a #if conditions :(

Pierre>   Do you know of any way to get the size of wchar_t?
Pierre>   I suspect we will need to add this to the configure scripts...
Pierre> But I am still very bad on that part.

In this case you don't need to know the size during preprocessing.
You can do something like:

extern const char *intermediate_encoding;
#define INTERMEDIATE_ENCODING intermediate_encoding

... and then initialize the string in _initialize_charset, under the
appropriate conditions.  The only caveat is to check the case where the
size is neither 2 nor 4.

Tom
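
A minimal sketch of the runtime approach described above, with the
"neither 2 nor 4" caveat made explicit. The helper names are
hypothetical, the byte order is passed in as a parameter rather than
read from WORDS_BIGENDIAN, and the "wchar_t" fallback is an assumption
(not every iconv accepts that name).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Stand-in for the variable proposed above; in GDB it would be
   defined in charset.c and declared in gdb_wchar.h.  */
const char *intermediate_encoding;

/* Hypothetical helper: map a wide character size and byte order to an
   encoding name.  Returns NULL when the size is neither 2 nor 4, so
   the caller has to pick a fallback.  */
static const char *
encoding_for_wchar_size (size_t size, int big_endian)
{
  switch (size)
    {
    case 2:
      return big_endian ? "UCS-2BE" : "UCS-2LE";
    case 4:
      return big_endian ? "UCS-4BE" : "UCS-4LE";
    default:
      return NULL;
    }
}

/* Hypothetical initializer playing the role of _initialize_charset.  */
static void
init_intermediate_encoding (int big_endian)
{
  const char *name = encoding_for_wchar_size (sizeof (wchar_t), big_endian);

  /* Falling back to "wchar_t" is an assumption of this sketch.  */
  intermediate_encoding = name != NULL ? name : "wchar_t";
}
```

Because sizeof is evaluated by the compiler rather than the
preprocessor, no configure check is needed for this variant.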


* [RFA-v2] Handle cygwin wchar_t specifics
  2011-04-16 16:25     ` Jan Kratochvil
@ 2011-04-16 21:29       ` Pierre Muller
  2011-04-16 22:35         ` Jan Kratochvil
       [not found]       ` <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>
  1 sibling, 1 reply; 31+ messages in thread
From: Pierre Muller @ 2011-04-16 21:29 UTC (permalink / raw)
  To: 'Jan Kratochvil'; +Cc: 'Tom Tromey', gdb-patches

  Thanks Jan,

Thanks to your code, I was able to generate a patch that seems to work for
me.

  The goal of the patch is to generate for
INTERMEDIATE_ENCODING a name of the form
"UCS-XYY",
where YY is LE or BE (which was already handled before)
and where X is either 4 or 2, depending on the
size of the wchar_t type.

  I don't know if the configure change is completely generated
by the small configure.ac change. If this is true, the ChangeLog
entry should probably just say Regenerate for configure.

  Is this patch OK?
Should it be included in the 7.3 branch?

Pierre


2011-04-16  Pierre Muller  <muller@ics.u-strasbg.fr>

	Correct INTERMEDIATE_ENCODING macro setup for systems using
	2 byte "wchar_t" type.
	* gdb_wchar.h: Use new SIZEOF_WCHAR_T macro to set
	INTERMEDIATE_ENCODING macro value.
	* config.in: Add  SIZEOF_WCHAR_T macro.
	* configure.ac: Add rule for SIZEOF_WCHAR_T.
	* configure: Likewise.
	
	
Index: config.in
===================================================================
RCS file: /cvs/src/src/gdb/config.in,v
retrieving revision 1.125
diff -u -p -r1.125 config.in
--- config.in	17 Mar 2011 13:19:09 -0000	1.125
+++ config.in	16 Apr 2011 21:19:47 -0000
@@ -804,6 +804,9 @@
 /* The size of `long', as computed by sizeof. */
 #undef SIZEOF_LONG
 
+/* The size of `wchar_t', as computed by sizeof. */
+#undef SIZEOF_WCHAR_T
+
 /* Define to l, ll, u, ul, ull, etc., as suitable for constants of type
    'size_t'. */
 #undef SIZE_T_SUFFIX
Index: configure
===================================================================
RCS file: /cvs/src/src/gdb/configure,v
retrieving revision 1.329
diff -u -p -r1.329 configure
--- configure	17 Mar 2011 13:19:09 -0000	1.329
+++ configure	16 Apr 2011 21:19:52 -0000
@@ -11637,6 +11637,44 @@ _ACEOF
 
 fi
 
+# The cast to long int works around a bug in the HP C Compiler
+# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
+# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
+# This bug is HP SR number 8606223364.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of wchar_t" >&5
+$as_echo_n "checking size of wchar_t... " >&6; }
+if test "${ac_cv_sizeof_wchar_t+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (wchar_t))" "ac_cv_sizeof_wchar_t"        "
+#include <wchar.h>
+#include <wctype.h>
+
+"; then :
+
+else
+  if test "$ac_cv_type_wchar_t" = yes; then
+     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+{ as_fn_set_status 77
+as_fn_error "cannot compute sizeof (wchar_t)
+See \`config.log' for more details." "$LINENO" 5; }; }
+   else
+     ac_cv_sizeof_wchar_t=0
+   fi
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_wchar_t" >&5
+$as_echo "$ac_cv_sizeof_wchar_t" >&6; }
+
+
+
+cat >>confdefs.h <<_ACEOF
+#define SIZEOF_WCHAR_T $ac_cv_sizeof_wchar_t
+_ACEOF
+
+
 
 # ------------------------------------- #
 # Checks for compiler characteristics.  #
Index: configure.ac
===================================================================
RCS file: /cvs/src/src/gdb/configure.ac,v
retrieving revision 1.144
diff -u -p -r1.144 configure.ac
--- configure.ac	17 Mar 2011 13:19:10 -0000	1.144
+++ configure.ac	16 Apr 2011 21:19:52 -0000
@@ -976,6 +976,10 @@ AC_CHECK_TYPES(socklen_t, [], [],
 [#include <sys/types.h>
 #include <sys/socket.h>
 ])
+AC_CHECK_SIZEOF([wchar_t], 4, [
+#include <wchar.h>
+#include <wctype.h>
+])
 
 # ------------------------------------- #
 # Checks for compiler characteristics.  #
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	16 Apr 2011 21:19:57 -0000
@@ -60,6 +60,7 @@
 
 #include <wchar.h>
 #include <wctype.h>
+#include "config.h"
 
 typedef wchar_t gdb_wchar_t;
 typedef wint_t gdb_wint_t;
@@ -71,20 +72,26 @@ typedef wint_t gdb_wint_t;
 #define gdb_WEOF WEOF
 
 #define LCST(X) L ## X
+/* Transform SIZEOF_WCHAR_T into a string. This requires a two-level
+   macro.  This macro is used to generate INTERMEDIATE_ENCODING below.  */
+#define STR_VAL1(X) #X
+#define STR_VAL(X) STR_VAL1(X)
+#define SIZEOF_WCHAR_T_STR STR_VAL(SIZEOF_WCHAR_T)
 
-/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
+/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
UCS-2.
+   We use the version having the same size as "wchar_t" type.
    We exploit this fact in the hope that there are hosts that define
    this but which do not support "wchar_t" as an encoding argument to
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+#  if WORDS_BIGENDIAN
+#    define INTERMEDIATE_ENCODING "UCS-" SIZEOF_WCHAR_T_STR "BE"
+#  else
+#    define INTERMEDIATE_ENCODING "UCS-" SIZEOF_WCHAR_T_STR "LE"
+#  endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#  define INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */



* Re: [RFA-v2] Handle cygwin wchar_t specifics
  2011-04-16 21:29       ` [RFA-v2] " Pierre Muller
@ 2011-04-16 22:35         ` Jan Kratochvil
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Kratochvil @ 2011-04-16 22:35 UTC (permalink / raw)
  To: Pierre Muller; +Cc: 'Tom Tromey', gdb-patches

On Sat, 16 Apr 2011 23:28:35 +0200, Pierre Muller wrote:
> --- config.in	17 Mar 2011 13:19:09 -0000	1.125
> +++ config.in	16 Apr 2011 21:19:47 -0000
> --- configure	17 Mar 2011 13:19:09 -0000	1.329
> +++ configure	16 Apr 2011 21:19:52 -0000

You do not have to post autogenerated files here; I posted them before only
because you asked, for your convenience.


> --- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
> +++ gdb_wchar.h	16 Apr 2011 21:19:57 -0000
> @@ -60,6 +60,7 @@
>  
>  #include <wchar.h>
>  #include <wctype.h>
> +#include "config.h"

I would say rather #include "defs.h" and as the first #include line.


> +#define STR_VAL1(X) #X
> +#define STR_VAL(X) STR_VAL1(X)

This is called XSTRING in include/symcat.h.


> +#define SIZEOF_WCHAR_T_STR STR_VAL(SIZEOF_WCHAR_T)
                                    ^^ space - coding style


> -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
> UCS-2.

Corrupted diff by your mailer word wrapping.


These are just technical notes, no real review/approval.


Thanks,
Jan



* Re: [RFA-v2] Handle cygwin wchar_t specifics
       [not found]       ` <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>
@ 2011-04-17  2:55         ` Eli Zaretskii
  2011-04-18 10:36           ` Pierre Muller
                             ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Eli Zaretskii @ 2011-04-17  2:55 UTC (permalink / raw)
  To: Pierre Muller; +Cc: jan.kratochvil, tromey, gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: "'Tom Tromey'" <tromey@redhat.com>, <gdb-patches@sourceware.org>
> Date: Sat, 16 Apr 2011 23:28:35 +0200
> 
> -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or UCS-2.

Please use UTF-16, not UCS-2.  What Windows uses is the former.  The
latter is the old name from the days when Unicode covered only the
BMP; it was superseded by UTF-16 that covers more than that.
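
To illustrate the difference, here is a minimal sketch of how UTF-16
represents a code point outside the BMP with a surrogate pair, which
UCS-2 cannot express. The function name utf16_encode is hypothetical
and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16 code units.  Returns the
   number of units written (1 or 2), or 0 if CP is a surrogate value
   or lies above U+10FFFF.  Hypothetical helper for illustration.  */
static size_t
utf16_encode (uint32_t cp, uint16_t out[2])
{
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;
  if (cp <= 0xFFFF)
    {
      /* BMP: one unit, identical to what UCS-2 would store.  */
      out[0] = (uint16_t) cp;
      return 1;
    }
  if (cp > 0x10FFFF)
    return 0;

  /* Supplementary planes: split the 20-bit offset into two halves.  */
  cp -= 0x10000;
  out[0] = (uint16_t) (0xD800 + (cp >> 10));
  out[1] = (uint16_t) (0xDC00 + (cp & 0x3FF));
  return 2;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, whereas U+0041
stays a single unit 0x0041.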



* RE: [RFA-v2] Handle cygwin wchar_t specifics
  2011-04-17  2:55         ` Eli Zaretskii
@ 2011-04-18 10:36           ` Pierre Muller
       [not found]           ` <00a801cbfdb4$551214a0$ff363de0$%muller@ics-cnrs.unistra.fr>
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-18 10:36 UTC (permalink / raw)
  To: 'Eli Zaretskii'; +Cc: jan.kratochvil, tromey, gdb-patches

  Hi Eli,

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Eli Zaretskii
> Sent: Sunday, 17 April 2011 04:56
> To: Pierre Muller
> Cc: jan.kratochvil@redhat.com; tromey@redhat.com;
> gdb-patches@sourceware.org
> Subject: Re: [RFA-v2] Handle cygwin wchar_t specifics
> 
> > From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> > Cc: "'Tom Tromey'" <tromey@redhat.com>, <gdb-patches@sourceware.org>
> > Date: Sat, 16 Apr 2011 23:28:35 +0200
> >
> > -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> > +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
> UCS-2.
> 
> Please use UTF-16, not UCS-2.  What Windows uses is the former.  The
> latter is the old name from the days when Unicode covered only the
> BMP; it was superseded by UTF-16 that covers more than that.

  Are you sure this is correct?
I tried what you said, but "UTF-16" seems to mean "UTF-16BE",
while "UTF-16LE" seems to do a better job.


  But if UTF-16 is better than UCS-2,
shouldn't we also favor UTF-32 over UCS-4?
 
  I will send a new RFA using UTF-16LE for Windows shortly.

Pierre



* Re: [RFA-v2] Handle cygwin wchar_t specifics
       [not found]           ` <00a801cbfdb4$551214a0$ff363de0$%muller@ics-cnrs.unistra.fr>
@ 2011-04-18 10:57             ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2011-04-18 10:57 UTC (permalink / raw)
  To: Pierre Muller; +Cc: jan.kratochvil, tromey, gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <jan.kratochvil@redhat.com>, <tromey@redhat.com>,        <gdb-patches@sourceware.org>
> Date: Mon, 18 Apr 2011 12:35:26 +0200
> 
> > > -/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> > > +/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4 or
> > UCS-2.
> > 
> > Please use UTF-16, not UCS-2.  What Windows uses is the former.  The
> > latter is the old name from the days when Unicode covered only the
> > BMP; it was superseded by UTF-16 that covers more than that.
> 
>   Are you sure this is correct?
> I tried what you said, but "UTF-16" seems to mean "UTF-16BE"
> while "UTF-16LE" seems to do a better job.

UTF-16 means both LE and BE varieties.  I meant to use UTF-16 in the
comment, instead of UCS-2.  In the code, you need to use the variety
that suits the endianness of the host platform.

>   But if UTF-16 is better than UCS-2,
> shouldn't we also favor UTF-32 over UCS-4?

IMO, there's no need, since Unicode still does not exceed 32 bits.



* [RFA-v3] Handle cygwin wchar_t specifics
  2011-04-17  2:55         ` Eli Zaretskii
  2011-04-18 10:36           ` Pierre Muller
       [not found]           ` <00a801cbfdb4$551214a0$ff363de0$%muller@ics-cnrs.unistra.fr>
@ 2011-04-18 15:14           ` Pierre Muller
       [not found]           ` <21014.6501930014$1303139687@news.gmane.org>
  3 siblings, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-18 15:14 UTC (permalink / raw)
  To: 'Eli Zaretskii'; +Cc: jan.kratochvil, tromey, gdb-patches

Here is a new version of my patch that should only
change something for Windows-OS hosts.

  This patch also changes the intermediate_encoding for mingw hosts,
  from "wchar_t" to "UTF-16LE", but this seems to work nicely
for both mingw32 and mingw64 (and only if iconv is found,
otherwise gdb_wchar_t is simply char and phony functions are used).
 
  The change might nevertheless be restricted to __CYGWIN__ only
if you think that this is a better option.

  Comments?

Pierre
 

2011-04-16  Pierre Muller  <muller@ics.u-strasbg.fr>

	Correct INTERMEDIATE_ENCODING macro setup for Windows OS using
	2 byte "wchar_t" type.
	* gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
	(INTERMEDIATE_ENCODING): Change macro value to... 
	(intermediate_encoding): New external.
	* charset.c (intermediate_encoding): New variable.
	(_initialize_charset): Assign default value of intermediate_encoding
	using DEFAULT_INTERMEDIATE_ENCODING. Override this for
	Windows OS system if size of "gdb_wchar_t" type is two.
	
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	18 Apr 2011 15:07:03 -0000
@@ -79,12 +79,12 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */
@@ -115,11 +115,14 @@ typedef int gdb_wint_t;
    also providing a phony iconv, we might as well just stick with
    "wchar_t".  */
 #ifdef PHONY_ICONV
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
 #endif
 
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding
+extern const char *intermediate_encoding;
+
 #endif /* GDB_WCHAR_H */
Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	18 Apr 2011 15:07:03 -0000
@@ -206,6 +206,7 @@ phony_iconv (iconv_t utf_flag, const cha
 #define GDB_DEFAULT_TARGET_WIDE_CHARSET "UTF-32"
 #endif
 
+const char *intermediate_encoding = NULL;
 static const char *auto_host_charset_name = GDB_DEFAULT_HOST_CHARSET;
 static const char *host_charset_name = "auto";
 static void
@@ -935,7 +936,7 @@ _initialize_charset (void)
     charset_enum = default_charset_names;
 
 #ifndef PHONY_ICONV
-#ifdef HAVE_LANGINFO_CODESET
+# ifdef HAVE_LANGINFO_CODESET
   /* The result of nl_langinfo may be overwritten later.  This may
      leak a little memory, if the user later changes the host charset,
      but that doesn't matter much.  */
@@ -946,7 +947,7 @@ _initialize_charset (void)
   if (!strcmp (auto_host_charset_name, "646") || !*auto_host_charset_name)
     auto_host_charset_name = "ASCII";
   auto_target_charset_name = auto_host_charset_name;
-#elif defined (USE_WIN32API)
+# elif defined (USE_WIN32API)
   {
     /* "CP" + x<=5 digits + paranoia.  */
     static char w32_host_default_charset[16];
@@ -956,8 +957,14 @@ _initialize_charset (void)
     auto_host_charset_name = w32_host_default_charset;
     auto_target_charset_name = auto_host_charset_name;
   }
+# endif
 #endif
-#endif
+
+  intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
+# if defined (USE_WIN32API) || defined (__CYGWIN__)
+  if (sizeof (gdb_wchar_t) == 2)
+    intermediate_encoding = "UTF-16LE";
+# endif
 
   add_setshow_enum_cmd ("charset", class_support,
 			charset_enum, &host_charset_name, _("\



* Re: [RFA-v3] Handle cygwin wchar_t specifics
       [not found]           ` <21014.6501930014$1303139687@news.gmane.org>
@ 2011-04-18 17:18             ` Tom Tromey
  2011-04-19  9:18               ` [RFC-v4] " Pierre Muller
                                 ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-18 17:18 UTC (permalink / raw)
  To: Pierre Muller; +Cc: 'Eli Zaretskii', jan.kratochvil, gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   This patch also changes the intermediate_encoding for mingw hosts,
Pierre>   from "wchar_t" to "UTF-16LE", but this seems to work nicely
Pierre> for both mingw32 and mingw64 (and only if iconv is found,
Pierre> otherwise gdb_wchar_t is simply char and phony functions are used).

Pierre> -#define INTERMEDIATE_ENCODING host_charset ()
Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()

This changes the behavior if the gdb user changes the host encoding.
This is an unusual situation, admittedly, but it seems to me that it is
just as easy to only introduce the `intermediate_encoding' global in the
UTF-{16,32} case.

Pierre> +  intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
Pierre> +# if defined (USE_WIN32API) || defined (__CYGWIN__)
Pierre> +  if (sizeof (gdb_wchar_t) == 2)
Pierre> +    intermediate_encoding = "UTF-16LE";
Pierre> +# endif

Here, instead of a special case for __CYGWIN__, and instead of
hard-coding the endian-ness, just use the same code for all
__STDC_ISO_10646__ platforms.  Maybe something like:

intermediate_encoding = xstrprintf ("UTF-%d%s", 8 * sizeof (wchar_t),
                                    WORDS_BIGENDIAN ? "BE" : "LE");

Tom
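
A minimal sketch of the name construction suggested above, written
with standard snprintf as a stand-in for GDB's xstrprintf so that it
compiles on its own. The function names are hypothetical. The byte
order is probed at run time here, which is an assumption of this
sketch and not something the thread proposes; it sidesteps any
dependence on WORDS_BIGENDIAN being defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Detect the host byte order at run time by inspecting the first
   byte of a known integer value.  */
static int
host_is_big_endian (void)
{
  const unsigned int probe = 1;
  unsigned char first_byte;

  memcpy (&first_byte, &probe, 1);
  return first_byte == 0;
}

/* Write a name such as "UTF-16LE" or "UTF-32BE" into BUF.  Returns 0
   on success, or -1 if UNIT_SIZE is neither 2 nor 4 or BUF is too
   small.  Hypothetical helper for illustration.  */
static int
format_utf_name (char *buf, size_t buflen, size_t unit_size, int big_endian)
{
  int n;

  if (unit_size != 2 && unit_size != 4)
    return -1;

  /* The cast keeps the argument type in step with the %u conversion.  */
  n = snprintf (buf, buflen, "UTF-%u%s", (unsigned int) (8 * unit_size),
                big_endian ? "BE" : "LE");
  return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

A caller could pass sizeof (wchar_t) and host_is_big_endian () and fall
back to another name when the function returns -1, for instance when
the wide character type is a single byte.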


* Re: [RFA] Handle cygwin wchar_t specifics
  2011-04-15 18:16 ` [RFA] Handle cygwin wchar_t specifics Tom Tromey
  2011-04-16 16:05   ` Pierre Muller
@ 2011-04-18 20:07   ` Corinna Vinschen
  1 sibling, 0 replies; 31+ messages in thread
From: Corinna Vinschen @ 2011-04-18 20:07 UTC (permalink / raw)
  To: gdb-patches

Hi Tom,

On Apr 15 12:15, Tom Tromey wrote:
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre> because of this, GDB uses "UCS-4LE" 
> Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin 
> Pierre> (while "wchar_t" it uses for mingw32, which works well).
> 
> Ok, I see the problem.  I thought this:
> 
>     /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
> 
> But this is not true!  For some values of __STDC_ISO_10646__, a 2 byte
> wide character type suffices.  In particular, Cygwin's value of 200305
> means that it corresponds to Unicode 4.0.0:
> 
>     http://www.unicode.org/versions/components-4.0.0.html
> 
> I think this might be a Cygwin bug, but it is pretty hard to wade
> through the ISO / Unicode differences and other assorted standardese to
> see.  (The reason I think it might be a bug is that Unicode 4.0.0
> defines some characters > 0xFFFF.)

I see there's another solution in the works, but just to let you know
that Bruno Haible and I discussed the definition of __STDC_ISO_10646__
on the Cygwin list back in January.  We didn't come to a conclusion
since we both interpret the standards differently, but this gives
you some insight why __STDC_ISO_10646__ is defined on Cygwin, see
http://cygwin.com/ml/cygwin/2011-01/msg00410.html, line 70ff.


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat



* [RFC-v4] Handle cygwin wchar_t specifics
  2011-04-18 17:18             ` Tom Tromey
@ 2011-04-19  9:18               ` Pierre Muller
       [not found]               ` <004f01cbfe72$adddeb40$0999c1c0$%muller@ics-cnrs.unistra.fr>
       [not found]               ` <34716.7311156683$1303204711@news.gmane.org>
  2 siblings, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-19  9:18 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches



> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Monday, 18 April 2011 19:18
> To: Pierre Muller
> Cc: 'Eli Zaretskii'; jan.kratochvil@redhat.com; gdb-patches@sourceware.org
> Subject: Re: [RFA-v3] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre>   This patch also changes the intermediate_encoding for mingw hosts,
> Pierre>   from "wchar_t" to "UTF-16LE", but this seems to work nicely
> Pierre> for both mingw32 and mingw64 (and only if iconv is found,
> Pierre> otherwise gdb_wchar_t is simply char and phony functions are used).
> 
> Pierre> -#define INTERMEDIATE_ENCODING host_charset ()
> Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
> 
> This changes the behavior if the gdb user changes the host encoding.
> This is an unusual situation, admittedly, but it seems to me that it is
> just as easy to only introduce the `intermediate_encoding' global in the
> UTF-{16,32} case.
> 
> Pierre> +  intermediate_encoding = DEFAULT_INTERMEDIATE_ENCODING;
> Pierre> +# if defined (USE_WIN32API) || defined (__CYGWIN__)
> Pierre> +  if (sizeof (gdb_wchar_t) == 2)
> Pierre> +    intermediate_encoding = "UTF-16LE";
> Pierre> +# endif
> 
> Here, instead of a special case for __CYGWIN__, and instead of
> hard-coding the endian-ness, just use the same code for all
> __STDC_ISO_10646__ platforms.  Maybe something like:
> 
> intermediate_encoding = xstrprintf ("UTF-%d%s", 8 * sizeof (wchar_t),
>                                     WORDS_BIGENDIAN ? "BE" : "LE");

  Three problems here:

1) We should really use the "gdb_wchar_t" type, not "wchar_t".
2) If sizeof (gdb_wchar_t) == 1,
I don't think that UTF-8LE and UTF-8BE exist, do they?
At least they are not in the iconv -l list for current Cygwin.
3) WORDS_BIGENDIAN is not defined at all on Cygwin,
so your code would probably not compile.

A further question is whether UTF-32 is always supported...

Below is yet another proposal:
it turns the INTERMEDIATE_ENCODING macro into a call to an
intermediate_encoding function.
This function specifically handles the case where gdb_wchar_t is 2 bytes long
by trying UTF-16XE (with X equal to L or B) and, if that name is not
in the list of supported charsets, trying UCS-2XE.

  As there is apparently no advantage to using UTF-32 over UCS-4 (according
to Eli), I did not extend the change to the 4-byte case.

  Comments welcome,

Pierre Muller
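
The "try UTF-16XE, then UCS-2XE" lookup described above can be
sketched as a single ordered search over candidate names. This is only
an illustration of the idea, not the code in the patch below:
first_known_charset is a hypothetical helper, and the NULL-terminated
arrays stand in for GDB's charset_enum list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first entry of CANDIDATES that also appears in KNOWN, or
   NULL if none does.  Both arrays are NULL-terminated.  Hypothetical
   helper for illustration.  */
static const char *
first_known_charset (const char *const *known,
                     const char *const *candidates)
{
  size_t i, j;

  for (i = 0; candidates[i] != NULL; i++)
    for (j = 0; known[j] != NULL; j++)
      if (strcmp (candidates[i], known[j]) == 0)
        return candidates[i];

  return NULL;
}
```

With a known list containing "UCS-2LE" but not "UTF-16LE", and
candidates { "UTF-16LE", "UCS-2LE", NULL }, the function returns
"UCS-2LE". A NULL result would correspond to falling back to
DEFAULT_INTERMEDIATE_ENCODING.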


2011-04-19  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call.
	(intermediate_encoding): New prototype.
	* charset.c (intermediate_encoding): New function.

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	19 Apr 2011 09:05:43 -0000
@@ -922,6 +922,50 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+#ifdef WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+const char *
+intermediate_encoding (void)
+{
+  if (sizeof (gdb_wchar_t) == 2)
+    {
+      static const char *stored_result = NULL;
+      const char *result;
+      int i;
+
+      if (stored_result)
+	return stored_result;
+      result = "UTF-16" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+	{
+	  if (strcmp (result, charset_enum[i]) == 0)
+	    {
+	      stored_result = result;
+	      return result;
+	    }
+	}
+      /* Second try, with UCS-2 type.  */
+      result = "UCS-2" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+	{
+	  if (strcmp (result, charset_enum[i]) == 0)
+	    {
+	      stored_result = result;
+	      return result;
+	    }
+	}
+    }
+  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
+     not known, use DEFAULT_INTERMEDIATE_ENCODING macro.  */
+  return DEFAULT_INTERMEDIATE_ENCODING;
+}
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	19 Apr 2011 09:05:43 -0000
@@ -79,12 +79,12 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */
@@ -115,11 +115,14 @@ typedef int gdb_wint_t;
    also providing a phony iconv, we might as well just stick with
    "wchar_t".  */
 #ifdef PHONY_ICONV
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
 #endif
 
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #endif /* GDB_WCHAR_H */


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC-v4] Handle cygwin wchar_t specifics
       [not found]               ` <004f01cbfe72$adddeb40$0999c1c0$%muller@ics-cnrs.unistra.fr>
@ 2011-04-19  9:34                 ` Eli Zaretskii
  0 siblings, 0 replies; 31+ messages in thread
From: Eli Zaretskii @ 2011-04-19  9:34 UTC (permalink / raw)
  To: Pierre Muller; +Cc: tromey, gdb-patches

> From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
> Cc: <gdb-patches@sourceware.org>
> Date: Tue, 19 Apr 2011 11:17:59 +0200
> 
> 2) If sizeof(gdb_wchar_t) == 1
> I don't think that UTF-8LE and UTF-8BE exist, do they?

No.  Single-byte encodings are by definition endian-less (because
endianness is about byte order in multibyte words).
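
To make the point concrete, here is a small sketch (demo_put_utf16 is
a hypothetical name, not a GDB or iconv function) that serializes one
16-bit code unit in either byte order.  A one-byte unit has no such
choice to make, which is why only the 16- and 32-bit encoding names
carry an LE/BE suffix.

```c
#include <stdint.h>

/* Hypothetical helper: store the 16-bit code unit UNIT into OUT,
   most significant byte first if BIG_ENDIAN_ORDER is nonzero,
   least significant byte first otherwise.  */
void
demo_put_utf16 (uint16_t unit, int big_endian_order, unsigned char out[2])
{
  unsigned char hi = (unsigned char) (unit >> 8);
  unsigned char lo = (unsigned char) (unit & 0xff);

  out[0] = big_endian_order ? hi : lo;
  out[1] = big_endian_order ? lo : hi;
}
```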


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC-v4] Handle cygwin wchar_t specifics
       [not found]               ` <34716.7311156683$1303204711@news.gmane.org>
@ 2011-04-19 13:19                 ` Tom Tromey
  2011-04-19 13:56                   ` [RFC-v5] " Pierre Muller
       [not found]                   ` <16656.7281041809$1303221408@news.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-19 13:19 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> 1) we should really use "gdb_wchar_t" type, not "wchar_t"

Yeah.

Pierre> 2) If sizeof(gdb_wchar_t) == 1
Pierre> I don't think that UTF-8LE and UTF-8BE exist, do they?
Pierre> At least they are not in the iconv -l list for current cygwin.

A platform where this is true should not define __STDC_ISO_10646__.
You might as well just assert that the size is 2 or 4.

Pierre> 3) WORD_BIGENDIAN is not defined at all on Cygwin,
Pierre> so that your code would probably not compile.

Yeah, I forgot, you need #if.  See config.in.
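
A minimal sketch of the #if form, assuming WORDS_BIGENDIAN is either
defined to a nonzero value by the configure-generated header or left
undefined.  DEMO_ENDIAN_SUFFIX and demo_utf16_name are hypothetical
names used only for this illustration.  In a #if expression an
undefined identifier evaluates to 0, so the "LE" branch is taken when
the macro is absent.

```c
/* WORDS_BIGENDIAN would normally come from config.h; nothing defines
   it in this standalone sketch unless the compiler command line does.  */
#if WORDS_BIGENDIAN
#define DEMO_ENDIAN_SUFFIX "BE"
#else
#define DEMO_ENDIAN_SUFFIX "LE"
#endif

/* Hypothetical helper: the suffix is pasted onto the charset name at
   compile time through string literal concatenation.  */
const char *
demo_utf16_name (void)
{
  return "UTF-16" DEMO_ENDIAN_SUFFIX;
}
```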

Pierre> A further question is whether UTF-32 is always supported...

If someone can find a platform where wchar_t is 4 bytes, where
__STDC_ISO_10646__ is defined, and where UTF-32 is not understood, then
we can complain bitterly and change the code again.

Pierre> Below is yet another proposal:
Pierre> it transforms INTERMEDIATE_ENCODING macro into a call to
Pierre> intermediate_encoding function.

I'd prefer it if the new code is only used in the __STDC_ISO_10646__
case.

Pierre> +#ifdef WORDS_BIGENDIAN

#if

Pierre> +const char *
Pierre> +intermediate_encoding (void)

New functions require an introductory comment.

Pierre>  #ifdef PHONY_ICONV
Pierre> -#define INTERMEDIATE_ENCODING "wchar_t"
Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"

I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.

Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFC-v5] Handle cygwin wchar_t specifics
  2011-04-19 13:19                 ` Tom Tromey
@ 2011-04-19 13:56                   ` Pierre Muller
       [not found]                   ` <16656.7281041809$1303221408@news.gmane.org>
  1 sibling, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-19 13:56 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches



> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Tuesday, April 19, 2011 15:19
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFC-v4] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre> 1) we should really use "gdb_wchar_t" type, not "wchar_t"
> 
> Yeah.
> 
> Pierre> 2) If sizeof(gdb_wchar_t) == 1
> Pierre> I don't think that UTF-8LE and UTF-8BE exist, do they?
> Pierre> At least they are not in the iconv -l list for current cygwin.
> 
> A platform where this is true should not define __STDC_ISO_10646__.
> You might as well just assert that the size is 2 or 4.
> 
> Pierre> 3) WORD_BIGENDIAN is not defined at all on Cygwin,
> Pierre> so that your code would probably not compile.
> 
> Yeah, I forgot, you need #if.  See config.in.
> 
> Pierre> A further question is whether UTF-32 is always supported...
> 
> If someone can find a platform where wchar_t is 4 bytes, where
> __STDC_ISO_10646__ is defined, and where UTF-32 is not understood, then
> we can complain bitterly and change the code again.
> 
> Pierre> Below is yet another proposal:
> Pierre> it transforms INTERMEDIATE_ENCODING macro into a call to
> Pierre> intermediate_encoding function.
> 
> I'd prefer it if the new code is only used in the __STDC_ISO_10646__
> case.
 Done below. 
> Pierre> +#ifdef WORDS_BIGENDIAN
> 
> #if
OK, corrected below.
> Pierre> +const char *
> Pierre> +intermediate_encoding (void)
> 
> New functions require an introductory comment.
  I wrote a minimal description, feel free to improve it.
 
> Pierre>  #ifdef PHONY_ICONV
> Pierre> -#define INTERMEDIATE_ENCODING "wchar_t"
> Pierre> +#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
> 
> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.

  I assumed you meant: not necessary if PHONY_ICONV is defined,
and this is what I changed below.
(Personally, I would have preferred to remove the
INTERMEDIATE_ENCODING macro completely and call the function directly.)
 
> Tom

  Thanks for your comments,
I tried to take them all into account in the new version
below.

  Checked on cygwin (where __STDC_ISO_10646__ is defined),
mingw32 (not defined) and mingw64 (no iconv at all,
and consequently no intermediate_encoding function).
All three builds at least print their version correctly.

  More comments?

Pierre


2011-04-19  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (DEFAULT_INTERMEDIATE_ENCODING): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call.
	(intermediate_encoding): New prototype.
	* charset.c (ENDIAN_SUFFIX): New macro.
	(intermediate_encoding): New function.
	
Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	19 Apr 2011 13:42:54 -0000
@@ -922,6 +922,59 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifndef PHONY_ICONV
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings.  */
+
+const char *
+intermediate_encoding (void)
+{
+#ifdef __STDC_ISO_10646__
+  if (sizeof (gdb_wchar_t) == 2)
+    {
+      static const char *stored_result = NULL;
+      const char *result;
+      int i;
+
+      if (stored_result)
+	return stored_result;
+      result = "UTF-16" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+	{
+	  if (strcmp (result, charset_enum[i]) == 0)
+	    {
+	      stored_result = result;
+	      return result;
+	    }
+	}
+      /* Second try, with UCS-2 type.  */
+      result = "UCS-2" ENDIAN_SUFFIX;
+      /* Check that the name is in the list of handled charsets.  */
+      for (i = 0; charset_enum[i]; i++)
+	{
+	  if (strcmp (result, charset_enum[i]) == 0)
+	    {
+	      stored_result = result;
+	      return result;
+	    }
+	}
+    }
+#endif /* __STDC_ISO_10646__ */
+  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
+     not known, use DEFAULT_INTERMEDIATE_ENCODING macro.  */
+  return DEFAULT_INTERMEDIATE_ENCODING;
+}
+#endif /* not PHONY_ICONV */
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	19 Apr 2011 13:42:54 -0000
@@ -79,18 +79,20 @@ typedef wint_t gdb_wint_t;
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
 #if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4BE"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#define DEFAULT_INTERMEDIATE_ENCODING "UCS-4LE"
 #endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#define DEFAULT_INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */
 #error "Neither __STDC_ISO_10646__ nor _LIBICONV_VERSION defined"
 #endif
 
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+
 #else
 
 /* If we got here and have wchar_t support, we might be on a system
@@ -117,9 +119,13 @@ typedef int gdb_wint_t;
 #ifdef PHONY_ICONV
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else
-#define INTERMEDIATE_ENCODING host_charset ()
+#define DEFAULT_INTERMEDIATE_ENCODING host_charset ()
+#endif
+
 #endif
 
+#ifndef PHONY_ICONV
+const char *intermediate_encoding (void);
 #endif
 
 #endif /* GDB_WCHAR_H */


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC-v5] Handle cygwin wchar_t specifics
       [not found]                   ` <16656.7281041809$1303221408@news.gmane.org>
@ 2011-04-19 17:50                     ` Tom Tromey
  2011-04-20  7:59                       ` Pierre Muller
       [not found]                       ` <420.768399681215$1303286406@news.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-19 17:50 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Tom> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.

Pierre>   I assumed you ment: not necessary if PHONY_ICONV is defined,
Pierre> and this is what I changed below.
Pierre> (I would personally have favored to completely remove
Pierre> INTERMEDIATE_ENCODING macro and call the function directly.)

Sorry, that isn't what I meant.

All this new code is needed only in the __STDC_ISO_10646__ case.
All other cases are already handled ok.
So, I think it is best to only introduce new code along the
__STDC_ISO_10646__ branches.  Thus far your patches have touched all the
other branches -- but there is no reason to do that, and I think it just
makes it more complicated without an associated benefit.

Pierre> +#ifdef __STDC_ISO_10646__
Pierre> +  if (sizeof (gdb_wchar_t) == 2)

You might as well unify the 2 and 4 byte cases like I said earlier, and
just die for any other value.  You can use a static assert trick to make
it die during compilation, which I think is better than dying at
runtime.  E.g.:

extern char your_platform_is_bogus[(sizeof (gdb_wchar_t) == 2
                                    || sizeof (gdb_wchar_t) == 4)
                                    ? 1 : -1];
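
As a standalone sketch of that trick (demo_wchar_t,
demo_wchar_size_is_supported and demo_wchar_bits are hypothetical
names, with plain wchar_t standing in for gdb_wchar_t): the array
bound is 1 when the condition holds and -1 otherwise, and a negative
array size is rejected by the compiler.  Because the declaration is
extern and the array is never referenced, it should not require any
storage in the object file.

```c
#include <wchar.h>

/* Hypothetical stand-in for GDB's gdb_wchar_t.  */
typedef wchar_t demo_wchar_t;

/* Compile-time check: this declaration is only valid if the wide
   character type is 2 or 4 bytes wide.  */
extern char demo_wchar_size_is_supported[(sizeof (demo_wchar_t) == 2
                                          || sizeof (demo_wchar_t) == 4)
                                         ? 1 : -1];

/* Hypothetical helper: width of the wide character type in bits.
   Given the check above, this is either 16 or 32.  */
int
demo_wchar_bits (void)
{
  return (int) (sizeof (demo_wchar_t) * 8);
}
```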

Pierre> +      /* Check that the name is in the list of handled charsets.  */
Pierre> +      for (i = 0; charset_enum[i]; i++)

I don't think this is really needed either.
Or, if you really want to do the check, do it by calling iconv_open at
initialization, and then just make gdb die early -- whatever platform
does this is really messed up.
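
A minimal sketch of such a probe, using only the standard iconv_open
and iconv_close calls.  demo_charset_pair_supported is a hypothetical
name, and where the check would be called from is left open here.

```c
#include <iconv.h>

/* Hypothetical helper: return 1 if iconv can open a conversion
   from charset FROM to charset TO, 0 otherwise.  */
int
demo_charset_pair_supported (const char *to, const char *from)
{
  iconv_t desc = iconv_open (to, from);

  if (desc == (iconv_t) -1)
    return 0;
  iconv_close (desc);
  return 1;
}
```

On some hosts iconv lives in a separate library, so linking may need
an extra -liconv.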

Pierre> +  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
Pierre> +     not known, use DEFAULT_INTERMEDIATE_ENCODING macro.  */
Pierre> +  return DEFAULT_INTERMEDIATE_ENCODING;

I don't think this will generally do the right thing.
For example, your patch defines DEFAULT_INTERMEDIATE_ENCODING to
"UCS-4LE" in the !WORDS_BIGENDIAN case.  But we already know that
gdb_wchar_t has 2 bytes.  So I think this will just result in the same
bug as today.

Tom

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC-v5] Handle cygwin wchar_t specifics
  2011-04-19 17:50                     ` Tom Tromey
@ 2011-04-20  7:59                       ` Pierre Muller
  2011-04-20 21:08                         ` Pedro Alves
       [not found]                       ` <420.768399681215$1303286406@news.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Pierre Muller @ 2011-04-20  7:59 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches

> Tom> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.
> 
> Pierre>   I assumed you ment: not necessary if PHONY_ICONV is defined,
> Pierre> and this is what I changed below.
> Pierre> (I would personally have favored to completely remove
> Pierre> INTERMEDIATE_ENCODING macro and call the function directly.)
> 
> Sorry, that isn't what I meant.
  Hopefully I got it right this time...
 
> All this new code is needed only in the __STDC_ISO_10646__ case.
> All other cases are already handled ok.
> So, I think it is best to only introduce new code along the
> __STDC_ISO_10646__ branches.  Thus far your patches have touched all the
> other branches -- but there is no reason to do that, and I think it just
> makes it more complicated without an associated benefit.
> 
> Pierre> +#ifdef __STDC_ISO_10646__
> Pierre> +  if (sizeof (gdb_wchar_t) == 2)
> 
> You might as well unify the 2 and 4 byte cases like I said earlier, and
> just die for any other value.
Done below.
>  You can use a static assert trick to make
> it die during compilation, which I think is better than dying at
> runtime.  E.g.:
> extern char your_platform_is_bogus[(sizeof (gdb_wchar_t) == 2
>                                     || sizeof (gdb_wchar_t) == 4)
>                                     ? 1 : -1];
Used below (renamed your_gdb_wchar_t_is_bogus).
 
> Pierre> +      /* Check that the name is in the list of handled charsets.  */
> Pierre> +      for (i = 0; charset_enum[i]; i++)
> 
> I don't think this is really needed either.
> Or, if you really want to do the check, do it by calling iconv_open at
> initialization, and then just make gdb die early -- whatever platform
> does this is really messed up.
 Also done below. 
> Pierre> +  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
> Pierre> +     not known, use DEFAULT_INTERMEDIATE_ENCODING macro.  */
> Pierre> +  return DEFAULT_INTERMEDIATE_ENCODING;
> 
> I don't think this will generally do the right thing.
> For example, your patch defines DEFAULT_INTERMEDIATE_ENCODING to
> "UCS-4LE" in the !WORDS_BIGENDIAN case.  But we already know that
> gdb_wchar_t has 2 bytes.  So I think this will just result in the same
> bug as today.

  I hope I have now understood what you wanted:
the new code makes fewer changes to gdb_wchar.h.
It only uses the intermediate_encoding function in the case where
UCS-4LE/BE was set before.
  To avoid having this code compiled in other cases,
I defined a new macro called USE_INTERMEDIATE_ENCODING_FUNCTION,
and the charset.c changes are limited to this conditional.
  I used iconv_open to check for working charset names
and added a call to error if none is found.

  Comments?

Pierre 


2011-04-20  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call if __STDC_ISO_10646__ macro is defined.
	(intermediate_encoding): New prototype.
	* charset.c (your_gdb_wchar_t_is_bogus): New test variable
	to generate compile time error for unsupported gdb_wchar_t
	size.
	(ENDIAN_SUFFIX): New macro.
	(intermediate_encoding): New function.
	

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	20 Apr 2011 07:48:21 -0000
@@ -922,6 +922,70 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* The code below serves to generate a compile time error if
+   gdb_wchar_t type is not of size 2 nor 4, despite the fact that
+   macro __STDC_ISO_10646__ is defined.
+   This is better than a gdb_assert call, because GDB cannot handle
+   strings correctly if this size is different.  */
+
+static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
+				       || sizeof (gdb_wchar_t) == 4)
+				      ? 1 : -1];
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings. As the test above
+   compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes.
+   UTF-16/32 is tested first, UCS-2/4 is tested as a second option,
+   otherwise an error is generated.  */
+
+const char *
+intermediate_encoding (void)
+{
+  iconv_t desc;
+  static const char *stored_result = NULL;
+  const char *result;
+  int i;
+
+  if (stored_result)
+    return stored_result;
+  result = xstrprintf ("UTF-%d%s", sizeof (gdb_wchar_t) * 8, ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree ((void *) result);
+  /* Second try, with UCS-2 type.  */
+  result = xstrprintf ("UCS-%d%s", sizeof (gdb_wchar_t), ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree ((void *) result);
+  /* No valid charset found, generate error here.  */
+  error ("Unable to find a valid charset for string conversions");
+}
+
+#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	20 Apr 2011 07:48:21 -0000
@@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t;
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+#define USE_INTERMEDIATE_ENCODING_FUNCTION
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC-v5] Handle cygwin wchar_t specifics
       [not found]                       ` <420.768399681215$1303286406@news.gmane.org>
@ 2011-04-20 20:21                         ` Tom Tromey
  0 siblings, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-20 20:21 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   Hopefully I got it right this time...
 
Just a tiny nit left :)
Thanks for persevering.

Pierre> +  const char *result;

You can make this one just a "char *".
That will let you avoid casts in the xfree calls.

Pierre> +  error ("Unable to find a valid charset for string conversions");

Needs _().

Ok with those changes.

Tom


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [RFC-v5] Handle cygwin wchar_t specifics
  2011-04-20  7:59                       ` Pierre Muller
@ 2011-04-20 21:08                         ` Pedro Alves
  2011-04-21  6:57                           ` Pierre Muller
       [not found]                           ` <15550.7422438406$1303369059@news.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Pedro Alves @ 2011-04-20 21:08 UTC (permalink / raw)
  To: gdb-patches; +Cc: Pierre Muller, 'Tom Tromey'

On Wednesday 20 April 2011 08:59:31, Pierre Muller wrote:
> +static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
> +                                      || sizeof (gdb_wchar_t) == 4)
> +                                     ? 1 : -1];
> +

Didn't "extern" work?

-- 
Pedro Alves


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFC-v5] Handle cygwin wchar_t specifics
  2011-04-20 21:08                         ` Pedro Alves
@ 2011-04-21  6:57                           ` Pierre Muller
  2011-04-21  7:17                             ` [RFA-v6] " Pierre Muller
       [not found]                           ` <15550.7422438406$1303369059@news.gmane.org>
  1 sibling, 1 reply; 31+ messages in thread
From: Pierre Muller @ 2011-04-21  6:57 UTC (permalink / raw)
  To: 'Pedro Alves', gdb-patches; +Cc: 'Tom Tromey'

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Pedro Alves
> Sent: Wednesday, April 20, 2011 23:08
> To: gdb-patches@sourceware.org
> Cc: Pierre Muller; 'Tom Tromey'
> Subject: Re: [RFC-v5] Handle cygwin wchar_t specifics
> 
> On Wednesday 20 April 2011 08:59:31, Pierre Muller wrote:
> > +static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
> > +                                      || sizeof (gdb_wchar_t) == 4)
> > +                                     ? 1 : -1];
> > +
> 
> Didn't "extern" work?

  I had not tested it before:
it does work on my system, but is it
certain that it will compile and link correctly with
all C compilers used for GDB?

  There is apparently no trace of your_gdb_wchar_t_is_bogus
in my charset.o object file once it is made external.

  Could someone confirm that this will also compile
with the other C compilers in use, without creating link failures?

I am perfectly willing to change this, but it seemed
to me to rely on some "obscure C compiler feature"
(don't forget that I learned C to be able to support the Pascal
language in GDB...).

Pierre

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [RFA-v6] Handle cygwin wchar_t specifics
  2011-04-21  6:57                           ` Pierre Muller
@ 2011-04-21  7:17                             ` Pierre Muller
  2011-04-21  9:02                               ` Pierre Muller
       [not found]                               ` <24274.3825926029$1303376558@news.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-21  7:17 UTC (permalink / raw)
  To: 'Pedro Alves', 'Tom Tromey'; +Cc: gdb-patches

> > Didn't "extern" work?
> 
>   I didn't test it out before:
> it does work on my system, but is it
> sure it will compile and link correctly on
> all C compilers used for GDB?
> 
>   There is apparently no trace of your_gdb_wchar_t_is_bogus
> in my charset.o object file once it is made external.
> 
>   Could someone confirm that this will also compile
> on other C compiler used without creating link failures?
> 
> I am perfectly willing to change this but it seemed
> to me to rely on some "obscure C compiler feature"
> (don't forget that I learned C to be able to support pascal
> language in GDB...).

   I am stupid...
Of course this works; otherwise each object would contain the
thousands of externals declared in all the included headers...

  Here is a new version,
with both the static-to-extern switch for your_gdb_wchar_t_is_bogus
and Tom's latest comments.

  As it is not exactly what Tom asked me to change,
I am resubmitting it anyway.
 
 Pierre

PS: Should this be included in the 7.3 branch?


2011-04-21  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call if __STDC_ISO_10646__ macro is defined.
	(intermediate_encoding): New prototype.
	* charset.c (your_gdb_wchar_t_is_bogus): New extern test variable
	to generate compile time error for unsupported gdb_wchar_t size.
	(ENDIAN_SUFFIX): New macro.
	(intermediate_encoding): New function.
	

Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	21 Apr 2011 07:09:52 -0000
@@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t;
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+#define USE_INTERMEDIATE_ENCODING_FUNCTION
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else
Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	21 Apr 2011 07:09:52 -0000
@@ -922,6 +922,70 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* The code below serves to generate a compile time error if
+   gdb_wchar_t type is not of size 2 nor 4, despite the fact that
+   macro __STDC_ISO_10646__ is defined.
+   This is better than a gdb_assert call, because GDB cannot handle
+   strings correctly if this size is different.  */
+
+extern char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
+				       || sizeof (gdb_wchar_t) == 4)
+				      ? 1 : -1];
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings. As the test above
+   compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes.
+   UTF-16/32 is tested first, UCS-2/4 is tested as a second option,
+   otherwise an error is generated.  */
+
+const char *
+intermediate_encoding (void)
+{
+  iconv_t desc;
+  static const char *stored_result = NULL;
+  char *result;
+  int i;
+
+  if (stored_result)
+    return stored_result;
+  result = xstrprintf ("UTF-%d%s", sizeof (gdb_wchar_t) * 8, ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree (result);
+  /* Second try, with UCS-2 type.  */
+  result = xstrprintf ("UCS-%d%s", sizeof (gdb_wchar_t), ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree (result);
+  /* No valid charset found, generate error here.  */
+  error (_("Unable to find a valid charset for string conversions"));
+}
+
+#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */
+
 void
 _initialize_charset (void)
 {


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [RFA-v6] Handle cygwin wchar_t specifics
  2011-04-21  7:17                             ` [RFA-v6] " Pierre Muller
@ 2011-04-21  9:02                               ` Pierre Muller
       [not found]                               ` <24274.3825926029$1303376558@news.gmane.org>
  1 sibling, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-21  9:02 UTC (permalink / raw)
  To: gdb-patches; +Cc: 'Pedro Alves', 'Tom Tromey'

 Whoops,

  I just ran a test on a compile farm machine (x86_64-unknown-linux-gnu)
and found one more problem:
the type returned by sizeof seems to be "long unsigned int",
at least on x86_64 Linux.
  Thus we do need two typecasts, around
sizeof (gdb_wchar_t) * 8
and
sizeof (gdb_wchar_t)
in the xstrprintf parameters.
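
The cast can be illustrated outside of GDB with the standard snprintf
in place of xstrprintf.  demo_format_utf_name is a hypothetical name.
The point is only that a value derived from sizeof has type size_t,
so it has to be converted to int before being passed for a "%d"
conversion.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "UTF-<bits><suffix>" into BUF, where
   <bits> is UNIT_BYTES * 8.  Returns the value of snprintf.  */
int
demo_format_utf_name (char *buf, size_t buflen, size_t unit_bytes,
                      const char *suffix)
{
  /* UNIT_BYTES * 8 has type size_t; "%d" expects an int.  */
  return snprintf (buf, buflen, "UTF-%d%s", (int) (unit_bytes * 8), suffix);
}
```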

  After that change, there are no changes in the
char-related testsuite results.


Below is the modified patch, which also includes the needed
typecasts.

Pierre



2011-04-21  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call if __STDC_ISO_10646__ macro is defined.
	(intermediate_encoding): New prototype.
	* charset.c (your_gdb_wchar_t_is_bogus): New extern test variable
	to generate compile time error for unsupported gdb_wchar_t size.
	(ENDIAN_SUFFIX): New macro.
	(intermediate_encoding): New function.
	
Index: src/gdb/gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- src/gdb/gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ src/gdb/gdb_wchar.h	21 Apr 2011 07:35:52 -0000
@@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t;
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+#define USE_INTERMEDIATE_ENCODING_FUNCTION
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else
Index: src/gdb/charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- src/gdb/charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ src/gdb/charset.c	21 Apr 2011 07:35:52 -0000
@@ -922,6 +922,72 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* The code below serves to generate a compile time error if
+   gdb_wchar_t type is not of size 2 nor 4, despite the fact that
+   macro __STDC_ISO_10646__ is defined.
+   This is better than a gdb_assert call, because GDB cannot handle
+   strings correctly if this size is different.  */
+
+extern char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
+				       || sizeof (gdb_wchar_t) == 4)
+				      ? 1 : -1];
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings.  As the test above
+   compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes.
+   UTF-16/32 is tested first, UCS-2/4 is tested as a second option,
+   otherwise an error is generated.  */
+
+const char *
+intermediate_encoding (void)
+{
+  iconv_t desc;
+  static const char *stored_result = NULL;
+  char *result;
+  int i;
+
+  if (stored_result)
+    return stored_result;
+  result = xstrprintf ("UTF-%d%s", (int) (sizeof (gdb_wchar_t) * 8),
+		       ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree (result);
+  /* Second try, with the UCS-2/4 type.  */
+  result = xstrprintf ("UCS-%d%s", (int) sizeof (gdb_wchar_t),
+		       ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree (result);
+  /* No valid charset found, generate error here.  */
+  error (_("Unable to find a valid charset for string conversions"));
+}
+
+#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */
+
 void
 _initialize_charset (void)
 {



* Re: [RFC-v5] Handle cygwin wchar_t specifics
       [not found]                           ` <15550.7422438406$1303369059@news.gmane.org>
@ 2011-04-21 14:10                             ` Tom Tromey
  0 siblings, 0 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-21 14:10 UTC (permalink / raw)
  To: Pierre Muller; +Cc: 'Pedro Alves', gdb-patches

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pedro> Didn't "extern" work?

Pierre>   I didn't test it out before:
Pierre> it does work on my system, but is it
Pierre> sure it will compile and link correctly on
Pierre> all C compilers used for GDB?

Yes.

Pierre> I am perfectly willing to change this but it seemed
Pierre> to me to rely on some "obscure C compiler feature"
Pierre> (don't forget that I learned C to be able to support pascal
Pierre> language in GDB...).

:-)

This is a reasonably standard idiom for "static assert".
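
For readers unfamiliar with the idiom, here is a minimal sketch of how
it works.  An array type with a negative size is a constraint
violation, so the declaration only compiles when the condition holds;
being an extern declaration that is never defined or referenced, it
takes no storage.  The macro name COMPILE_TIME_CHECK and the helper
size_is_supported are made up for this example and do not come from
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical macro: declare an array whose size is 1 when COND is
   true and -1 (a compile-time error) when it is false.  */
#define COMPILE_TIME_CHECK(name, cond) \
  extern char name[(cond) ? 1 : -1]

/* Compilation stops here on a host whose wchar_t is not 2 or 4 bytes.  */
COMPILE_TIME_CHECK (wchar_t_size_is_2_or_4,
		    sizeof (wchar_t) == 2 || sizeof (wchar_t) == 4);

/* Runtime mirror of the same condition, for comparison only.  */
static int
size_is_supported (size_t size)
{
  return size == 2 || size == 4;
}
```

The runtime version can only report the problem once the program is
running, which is the drawback the compile-time form avoids.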

Tom


* Re: [RFA-v6] Handle cygwin wchar_t specifics
       [not found]                               ` <24274.3825926029$1303376558@news.gmane.org>
@ 2011-04-21 14:14                                 ` Tom Tromey
  2011-04-21 14:27                                   ` Pierre Muller
       [not found]                                   ` <4691.37052209607$1303396084@news.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-21 14:14 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches, 'Pedro Alves'

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre> Below is the modified patch that also included the needed
Pierre> typecasts.

This is ok.  Thanks.

Tom



* RE: [RFA-v6] Handle cygwin wchar_t specifics
  2011-04-21 14:14                                 ` Tom Tromey
@ 2011-04-21 14:27                                   ` Pierre Muller
       [not found]                                   ` <4691.37052209607$1303396084@news.gmane.org>
  1 sibling, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-21 14:27 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches, 'Pedro Alves'



> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Thursday, April 21, 2011 16:14
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org; 'Pedro Alves'
> Subject: Re: [RFA-v6] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre> Below is the modified patch that also included the needed
> Pierre> typecasts.
> 
> This is ok.  Thanks.

  Thanks to both of you for the help.
Patch committed.

  What about the 7.3 branch?

Pierre



* Re: [RFA-v6] Handle cygwin wchar_t specifics
       [not found]                                   ` <4691.37052209607$1303396084@news.gmane.org>
@ 2011-04-21 15:06                                     ` Tom Tromey
  2011-04-21 16:39                                       ` Pierre Muller
       [not found]                                       ` <25400.1310132027$1303403986@news.gmane.org>
  0 siblings, 2 replies; 31+ messages in thread
From: Tom Tromey @ 2011-04-21 15:06 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches, 'Pedro Alves'

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>   What about 7.3 branch?

Sounds good.

Tom



* RE: [RFA-v6] Handle cygwin wchar_t specifics
  2011-04-21 15:06                                     ` Tom Tromey
@ 2011-04-21 16:39                                       ` Pierre Muller
       [not found]                                       ` <25400.1310132027$1303403986@news.gmane.org>
  1 sibling, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-21 16:39 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches, 'Pedro Alves'



> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Thursday, April 21, 2011 17:06
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org; 'Pedro Alves'
> Subject: Re: [RFA-v6] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre>   What about 7.3 branch?
> 
> Sounds good.


 The only thing that worries me
is that it ends up changing the default when
gdb_wchar_t is of size 4 from UCS-4XE to UTF-32XE.
Is this OK for everyone?

Pierre



* Re: [RFA-v6] Handle cygwin wchar_t specifics
       [not found]                                       ` <25400.1310132027$1303403986@news.gmane.org>
@ 2011-04-21 20:25                                         ` Tom Tromey
  2011-04-21 21:18                                           ` 7.3 commit " Pierre Muller
  0 siblings, 1 reply; 31+ messages in thread
From: Tom Tromey @ 2011-04-21 20:25 UTC (permalink / raw)
  To: Pierre Muller; +Cc: gdb-patches, 'Pedro Alves'

>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:

Pierre>  The only thing that worries me,
Pierre> is that it finally changes the default when
Pierre> gdb_wchar_t is of size 4 from UCS-4XE to UTF-32XE,
Pierre> is this OK for everyone?

I don't think it will be a problem.
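
As background on why the switch is unlikely to matter: both encodings
store each code point as a single 32-bit unit, so for any valid
Unicode code point the bytes are the same.  UTF-32 additionally
restricts values to at most 0x10FFFF and excludes the surrogate range.
The sketch below only illustrates that relationship; the helper names
are hypothetical and it is neither GDB code nor a description of what
any particular iconv implementation does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if CP is representable in UTF-32,
   that is, at most 0x10FFFF and not a surrogate.  */
static int
valid_utf32_code_point (uint32_t cp)
{
  if (cp > 0x10FFFF)
    return 0;
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;
  return 1;
}

/* Hypothetical helper: store CP as one 32-bit little-endian unit.
   This is the byte layout shared by the "LE" variants of both
   encodings for a valid code point.  */
static void
encode_4byte_le (uint32_t cp, unsigned char out[4])
{
  out[0] = (unsigned char) (cp & 0xFF);
  out[1] = (unsigned char) ((cp >> 8) & 0xFF);
  out[2] = (unsigned char) ((cp >> 16) & 0xFF);
  out[3] = (unsigned char) ((cp >> 24) & 0xFF);
}
```

Only values outside the valid Unicode range could be treated
differently by the two names.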

Tom



* 7.3 commit [RFA-v6] Handle cygwin wchar_t specifics
  2011-04-21 20:25                                         ` Tom Tromey
@ 2011-04-21 21:18                                           ` Pierre Muller
  0 siblings, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-21 21:18 UTC (permalink / raw)
  To: 'Tom Tromey'; +Cc: gdb-patches, 'Pedro Alves'

> -----Original Message-----
> From: gdb-patches-owner@sourceware.org
> [mailto:gdb-patches-owner@sourceware.org] On Behalf Of Tom Tromey
> Sent: Thursday, April 21, 2011 22:25
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org; 'Pedro Alves'
> Subject: Re: [RFA-v6] Handle cygwin wchar_t specifics
> 
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
> 
> Pierre>  The only thing that worries me,
> Pierre> is that it finally changes the default when
> Pierre> gdb_wchar_t is of size 4 from UCS-4XE to UTF-32XE,
> Pierre> is this OK for everyone?
> 
> I don't think it will be a problem.
> 
> Tom

 With Tom's approval,
I committed the patch to support gdb_wchar_t of size 2
to the 7.3 branch.

  Thanks, Tom.

Pierre



* [RFA] Handle cygwin wchar_t specifics
@ 2011-04-15 15:55 Pierre Muller
  0 siblings, 0 replies; 31+ messages in thread
From: Pierre Muller @ 2011-04-15 15:55 UTC (permalink / raw)
  To: gdb-patches

See my email about a problem with GDB on Cygwin.

http://sourceware.org/ml/gdb/2011-04/msg00058.html

  I found out that the problem is related to the
fact that __STDC_ISO_10646__ is defined in:

$ grep -n ISO_10646  /usr/include/*/*
/usr/include/sys/features.h:185:#define __STDC_ISO_10646__ 200305L

Because of this, GDB uses "UCS-4LE"
for the macro INTERMEDIATE_ENCODING on Cygwin
(while it uses "wchar_t" for mingw32, which works well).

  This assumes that wchar_t is 4 bytes long, which is not true for
Cygwin at least...

The patch below fixes this by
explicitly setting the UCS size to two for Windows targets.
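
For comparison, the name could also be derived from the size of
wchar_t itself rather than from a host macro.  Since sizeof cannot be
evaluated in an #if directive, that has to happen in a function.  The
sketch below assumes only the four UCS names discussed here;
ucs_name_for_size is a hypothetical helper and is not what this patch
does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: map a wide character size in bytes and a
   byte-order flag to a UCS encoding name.  Returns NULL for a size
   that is neither 2 nor 4.  */
static const char *
ucs_name_for_size (size_t wchar_size, int big_endian)
{
  switch (wchar_size)
    {
    case 2:
      return big_endian ? "UCS-2BE" : "UCS-2LE";
    case 4:
      return big_endian ? "UCS-4BE" : "UCS-4LE";
    default:
      return NULL;
    }
}
```

A caller would pass sizeof (wchar_t) and the host byte order, so a
Cygwin build with a 2-byte wchar_t would get "UCS-2LE" without any
platform test.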

  It would probably be good to have this in branch too...

  

Pierre Muller
GDB pascal language maintainer


2011-04-15  Pierre Muller  <muller@ics.u-strasbg.fr>

	Correct INTERMEDIATE_ENCODING macro setup for Cygwin port.
	* gdb_wchar.h (UCS_SIZE): New macro.
	
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	15 Apr 2011 15:51:52 -0000
@@ -61,6 +61,7 @@
 #include <wchar.h>
 #include <wctype.h>
 
+#define wchar_size  (&(((wchar_t) (0)) + 1) - &((char *) 0))
 typedef wchar_t gdb_wchar_t;
 typedef wint_t gdb_wint_t;
 
@@ -73,18 +74,25 @@ typedef wint_t gdb_wint_t;
 #define LCST(X) L ## X
 
 /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
+   This is not true for Cygwin at least. Windows OS seems to require
+   16-bit wchar_t type, so we handle those especially.
    We exploit this fact in the hope that there are hosts that define
    this but which do not support "wchar_t" as an encoding argument to
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
-#if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#if defined (__CYGWIN__) || defined (__MINGW32__)
+#  define UCS_SIZE "2"
 #else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+#  define UCS_SIZE "4"
 #endif
+#if defined (__STDC_ISO_10646__)
+#  if WORDS_BIGENDIAN
+#    define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "BE"
+#  else
+#    define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "LE"
+#  endif
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+#  define INTERMEDIATE_ENCODING "wchar_t"
 #else
 /* This shouldn't happen, because the earlier #if should have filtered
    out this case.  */



Thread overview: 31+ messages
     [not found] <5928.31498147479$1302882967@news.gmane.org>
2011-04-15 18:16 ` [RFA] Handle cygwin wchar_t specifics Tom Tromey
2011-04-16 16:05   ` Pierre Muller
2011-04-16 16:25     ` Jan Kratochvil
2011-04-16 21:29       ` [RFA-v2] " Pierre Muller
2011-04-16 22:35         ` Jan Kratochvil
     [not found]       ` <000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>
2011-04-17  2:55         ` Eli Zaretskii
2011-04-18 10:36           ` Pierre Muller
     [not found]           ` <00a801cbfdb4$551214a0$ff363de0$%muller@ics-cnrs.unistra.fr>
2011-04-18 10:57             ` Eli Zaretskii
2011-04-18 15:14           ` [RFA-v3] " Pierre Muller
     [not found]           ` <21014.6501930014$1303139687@news.gmane.org>
2011-04-18 17:18             ` Tom Tromey
2011-04-19  9:18               ` [RFC-v4] " Pierre Muller
     [not found]               ` <004f01cbfe72$adddeb40$0999c1c0$%muller@ics-cnrs.unistra.fr>
2011-04-19  9:34                 ` Eli Zaretskii
     [not found]               ` <34716.7311156683$1303204711@news.gmane.org>
2011-04-19 13:19                 ` Tom Tromey
2011-04-19 13:56                   ` [RFC-v5] " Pierre Muller
     [not found]                   ` <16656.7281041809$1303221408@news.gmane.org>
2011-04-19 17:50                     ` Tom Tromey
2011-04-20  7:59                       ` Pierre Muller
2011-04-20 21:08                         ` Pedro Alves
2011-04-21  6:57                           ` Pierre Muller
2011-04-21  7:17                             ` [RFA-v6] " Pierre Muller
2011-04-21  9:02                               ` Pierre Muller
     [not found]                               ` <24274.3825926029$1303376558@news.gmane.org>
2011-04-21 14:14                                 ` Tom Tromey
2011-04-21 14:27                                   ` Pierre Muller
     [not found]                                   ` <4691.37052209607$1303396084@news.gmane.org>
2011-04-21 15:06                                     ` Tom Tromey
2011-04-21 16:39                                       ` Pierre Muller
     [not found]                                       ` <25400.1310132027$1303403986@news.gmane.org>
2011-04-21 20:25                                         ` Tom Tromey
2011-04-21 21:18                                           ` 7.3 commit " Pierre Muller
     [not found]                           ` <15550.7422438406$1303369059@news.gmane.org>
2011-04-21 14:10                             ` [RFC-v5] " Tom Tromey
     [not found]                       ` <420.768399681215$1303286406@news.gmane.org>
2011-04-20 20:21                         ` Tom Tromey
2011-04-16 21:24     ` [RFA] " Tom Tromey
2011-04-18 20:07   ` Corinna Vinschen
2011-04-15 15:55 Pierre Muller
