* [RFA] Handle cygwin wchar_t specifics
@ 2011-04-15 15:55 Pierre Muller
0 siblings, 0 replies; 6+ messages in thread
From: Pierre Muller @ 2011-04-15 15:55 UTC
To: gdb-patches
See my email about a problem with GDB on Cygwin.
http://sourceware.org/ml/gdb/2011-04/msg00058.html
I found out that the problem is related to the
fact that __STDC_ISO_10646__ is defined in:
$ grep -n ISO_10646 /usr/include/*/*
/usr/include/sys/features.h:185:#define __STDC_ISO_10646__ 200305L
Because of this, GDB uses "UCS-4LE"
for the INTERMEDIATE_ENCODING macro on Cygwin
(whereas it uses "wchar_t" on mingw32, which works well).
This assumes that wchar_t is 4 bytes long, which is not true
on Cygwin, at least.
The patch below fixes this by
explicitly setting the UCS size to two for Windows targets.
It would probably be good to have this on the branch too.
Pierre Muller
GDB pascal language maintainer
2011-04-15 Pierre Muller <muller@ics.u-strasbg.fr>
Correct INTERMEDIATE_ENCODING macro setup for Cygwin port.
* gdb_wchar.h (UCS_SIZE): New macro.
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h 1 Jan 2011 15:33:05 -0000 1.6
+++ gdb_wchar.h 15 Apr 2011 15:51:52 -0000
@@ -61,6 +61,7 @@
#include <wchar.h>
#include <wctype.h>
+#define wchar_size (&(((wchar_t) (0)) + 1) - &((char *) 0))
typedef wchar_t gdb_wchar_t;
typedef wint_t gdb_wint_t;
@@ -73,18 +74,25 @@ typedef wint_t gdb_wint_t;
#define LCST(X) L ## X
/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
+ This is not true for Cygwin at least. Windows OS seems to require
+ 16-bit wchar_t type, so we handle those especially.
We exploit this fact in the hope that there are hosts that define
this but which do not support "wchar_t" as an encoding argument to
iconv_open. We put the endianness into the encoding name to avoid
hosts that emit a BOM when the unadorned name is used. */
-#if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
+#if defined (__CYGWIN__) || defined (__MINGW32__)
+# define UCS_SIZE "2"
#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
+# define UCS_SIZE "4"
#endif
+#if defined (__STDC_ISO_10646__)
+# if WORDS_BIGENDIAN
+# define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "BE"
+# else
+# define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "LE"
+# endif
#elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
-#define INTERMEDIATE_ENCODING "wchar_t"
+# define INTERMEDIATE_ENCODING "wchar_t"
#else
/* This shouldn't happen, because the earlier #if should have filtered
out this case. */
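The patch relies on the C compiler concatenating adjacent string literals, so "UCS-" UCS_SIZE "LE" becomes a single literal. A minimal standalone sketch, assuming UCS_SIZE is "2" and using a hypothetical DEMO_WORDS_BIGENDIAN flag in place of GDB's configure-provided WORDS_BIGENDIAN; intermediate_encoding_demo and intermediate_encoding_is are illustrative names only:

```c
#include <string.h>

/* Stand-ins for values GDB's headers and configure would provide.
   UCS_SIZE follows the patch; DEMO_WORDS_BIGENDIAN is hypothetical. */
#define UCS_SIZE "2"
#define DEMO_WORDS_BIGENDIAN 0

#if DEMO_WORDS_BIGENDIAN
# define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "BE"
#else
# define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "LE"
#endif

/* Adjacent string literals are joined in translation phase 6, so this
   array holds the single string "UCS-2LE" (7 characters plus NUL). */
static const char intermediate_encoding_demo[] = INTERMEDIATE_ENCODING;

/* Return nonzero if the concatenated name equals EXPECTED. */
int intermediate_encoding_is (const char *expected)
{
  return strcmp (intermediate_encoding_demo, expected) == 0;
}
```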
* Re: [RFA] Handle cygwin wchar_t specifics
@ 2011-04-15 18:16 ` Tom Tromey
2011-04-16 16:05 ` Pierre Muller
2011-04-18 20:07 ` Corinna Vinschen
0 siblings, 2 replies; 6+ messages in thread
From: Tom Tromey @ 2011-04-15 18:16 UTC
To: Pierre Muller; +Cc: gdb-patches
>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
Pierre> because of this, GDB uses "UCS-4LE"
Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
Pierre> (while "wchar_t" it uses for mingw32, which works well).
Ok, I see the problem. I thought this:
/* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
But this is not true! For some values of __STDC_ISO_10646__, a 2 byte
wide character type suffices. In particular, Cygwin's value of 200305
means that it corresponds to Unicode 4.0.0:
http://www.unicode.org/versions/components-4.0.0.html
I think this might be a Cygwin bug, but it is pretty hard to wade
through the ISO / Unicode differences and other assorted standardese to
see. (The reason I think it might be a bug is that Unicode 4.0.0
defines some characters > 0xFFFF.)
Anyway, it doesn't matter if this is a Cygwin bug, since GDB's
assumption here is wrong anyway.
Pierre> The patch below fixes this by
Pierre> explicitly setting the UCS size to two for Windows targets.
I think in the __STDC_ISO_10646__ case, we should just explicitly use
sizeof (wchar_t) somewhere to choose the intermediate encoding. I think
this will be more robust than testing some host define.
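One way to read this suggestion, as a sketch rather than a tested GDB change: let INTERMEDIATE_ENCODING expand to an ordinary expression on sizeof (wchar_t), which the compiler folds to a constant. This assumes every use of INTERMEDIATE_ENCODING only needs a const char * value (not a literal for further concatenation). host_intermediate_encoding is a hypothetical helper, and the WORDS_BIGENDIAN fallback exists only so the sketch compiles on its own:

```c
#include <stddef.h>
#include <wchar.h>

/* GDB's configure defines WORDS_BIGENDIAN on big-endian hosts.  In an
   expression (unlike in #if) an undefined name is an error, so give it
   a default here; this fallback is an assumption of the sketch. */
#ifndef WORDS_BIGENDIAN
# define WORDS_BIGENDIAN 0
#endif

/* sizeof may not appear in #if, but it may appear in an ordinary
   expression.  NULL marks a wchar_t size this scheme cannot handle. */
#define INTERMEDIATE_ENCODING                                          \
  (sizeof (wchar_t) == 4 ? (WORDS_BIGENDIAN ? "UCS-4BE" : "UCS-4LE")   \
   : sizeof (wchar_t) == 2 ? (WORDS_BIGENDIAN ? "UCS-2BE" : "UCS-2LE") \
   : (const char *) NULL)

const char *host_intermediate_encoding (void)
{
  return INTERMEDIATE_ENCODING;
}
```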
Pierre> +#define wchar_size (&(((wchar_t) (0)) + 1) - &((char *) 0))
This doesn't seem to be used.
Tom
* RE: [RFA] Handle cygwin wchar_t specifics
2011-04-15 18:16 ` Tom Tromey
@ 2011-04-16 16:05 ` Pierre Muller
2011-04-16 16:25 ` Jan Kratochvil
2011-04-16 21:24 ` Tom Tromey
2011-04-18 20:07 ` Corinna Vinschen
1 sibling, 2 replies; 6+ messages in thread
From: Pierre Muller @ 2011-04-16 16:05 UTC
To: 'Tom Tromey'; +Cc: gdb-patches
> -----Original Message-----
> From: gdb-patches-owner@sourceware.org [mailto:gdb-patches-owner@sourceware.org] On behalf of Tom Tromey
> Sent: Friday, April 15, 2011 20:15
> To: Pierre Muller
> Cc: gdb-patches@sourceware.org
> Subject: Re: [RFA] Handle cygwin wchar_t specifics
>
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
>
> Pierre> because of this, GDB uses "UCS-4LE"
> Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
> Pierre> (while "wchar_t" it uses for mingw32, which works well).
>
> Ok, I see the problem. I thought this:
>
> /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
>
> But this is not true! For some values of __STDC_ISO_10646__, a 2 byte
> wide character type suffices. In particular, Cygwin's value of 200305
> means that it corresponds to Unicode 4.0.0:
>
> http://www.unicode.org/versions/components-4.0.0.html
>
> I think this might be a Cygwin bug, but it is pretty hard to wade
> through the ISO / Unicode differences and other assorted standardese to
> see. (The reason I think it might be a bug is that Unicode 4.0.0
> defines some characters > 0xFFFF.)
>
> Anyway, it doesn't matter if this is a Cygwin bug, since GDB's
> assumption here is wrong anyway.
OK.
> Pierre> The patch below fixes this by
> Pierre> explicitly setting the UCS size to two for Windows targets.
>
> I think in the __STDC_ISO_10646__ case, we should just explicitly use
> sizeof (wchar_t) somewhere to choose the intermediate encoding. I think
> this will be more robust than testing some host define.
Yes, but the problem is that it is not possible to use sizeof
inside an #if condition :(
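The preprocessor evaluates #if before any types exist, so sizeof cannot be used there. The compiler proper can still verify the size, though. A minimal sketch using the negative-array-size trick, which turns a wrong assumption into a compile error; gdb_wchar_size_check and checked_wchar_size are hypothetical names:

```c
#include <stddef.h>
#include <wchar.h>

/* "#if sizeof (wchar_t) == 2" is rejected by the preprocessor, but an
   array type with a negative size is a compile-time error, so this
   typedef only compiles when wchar_t is 2 or 4 bytes wide.
   (C11 code could use _Static_assert for the same purpose.) */
typedef char gdb_wchar_size_check
  [(sizeof (wchar_t) == 2 || sizeof (wchar_t) == 4) ? 1 : -1];

/* If this file compiled, the size is known to be 2 or 4, so callers
   can pick "UCS-2" or "UCS-4" from it at run time. */
size_t checked_wchar_size (void)
{
  return sizeof (wchar_t);
}
```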
> Pierre> +#define wchar_size (&(((wchar_t) (0)) + 1) - &((char *) 0))
>
> This doesn't seem to be used.
I googled around for a workaround to the limitation that sizeof
cannot be used in preprocessor conditionals, and then forgot to
remove it...
Do you know of any way to get the size of wchar_t?
I suspect we will need to add this to the configure scripts...
But I am still not very good at that part.
Help most welcome,
Pierre
* Re: [RFA] Handle cygwin wchar_t specifics
2011-04-16 16:05 ` Pierre Muller
@ 2011-04-16 16:25 ` Jan Kratochvil
2011-04-16 21:24 ` Tom Tromey
1 sibling, 0 replies; 6+ messages in thread
From: Jan Kratochvil @ 2011-04-16 16:25 UTC
To: Pierre Muller; +Cc: 'Tom Tromey', gdb-patches
On Sat, 16 Apr 2011 18:05:19 +0200, Pierre Muller wrote:
> Do you know of any way to get the size of wchar_t?
> I suspect we will need to add this to the configure scripts...
> But I am still very bad on that part.
I do not follow the platform specifics of the problem, but a patch for this
specific technical task is attached. On GNU/Linux I get this in config.h:
/* The size of `wchar_t', as computed by sizeof. */
#define SIZEOF_WCHAR_T 4
info '(autoconf)AC_CHECK_SIZEOF'
For cross-compilation the default is 4; on an unknown error it is 0.
HTH,
Jan
--- a/gdb/config.in
+++ b/gdb/config.in
@@ -804,6 +804,9 @@
/* The size of `long', as computed by sizeof. */
#undef SIZEOF_LONG
+/* The size of `wchar_t', as computed by sizeof. */
+#undef SIZEOF_WCHAR_T
+
/* Define to l, ll, u, ul, ull, etc., as suitable for constants of type
'size_t'. */
#undef SIZE_T_SUFFIX
--- a/gdb/configure
+++ b/gdb/configure
@@ -11637,6 +11637,44 @@ _ACEOF
fi
+# The cast to long int works around a bug in the HP C Compiler
+# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
+# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
+# This bug is HP SR number 8606223364.
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of wchar_t" >&5
+$as_echo_n "checking size of wchar_t... " >&6; }
+if test "${ac_cv_sizeof_wchar_t+set}" = set; then :
+ $as_echo_n "(cached) " >&6
+else
+ if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (wchar_t))" "ac_cv_sizeof_wchar_t" "
+#include <wchar.h>
+#include <wctype.h>
+
+"; then :
+
+else
+ if test "$ac_cv_type_wchar_t" = yes; then
+ { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
+$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
+{ as_fn_set_status 77
+as_fn_error "cannot compute sizeof (wchar_t)
+See \`config.log' for more details." "$LINENO" 5; }; }
+ else
+ ac_cv_sizeof_wchar_t=0
+ fi
+fi
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_wchar_t" >&5
+$as_echo "$ac_cv_sizeof_wchar_t" >&6; }
+
+
+
+cat >>confdefs.h <<_ACEOF
+#define SIZEOF_WCHAR_T $ac_cv_sizeof_wchar_t
+_ACEOF
+
+
# ------------------------------------- #
# Checks for compiler characteristics. #
--- a/gdb/configure.ac
+++ b/gdb/configure.ac
@@ -976,6 +976,10 @@ AC_CHECK_TYPES(socklen_t, [], [],
[#include <sys/types.h>
#include <sys/socket.h>
])
+AC_CHECK_SIZEOF([wchar_t], 4, [
+#include <wchar.h>
+#include <wctype.h>
+])
# ------------------------------------- #
# Checks for compiler characteristics. #
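With SIZEOF_WCHAR_T available from config.h, the selection can stay in the preprocessor. A minimal sketch, assuming the AC_CHECK_SIZEOF check above has been added; the fallback definition of SIZEOF_WCHAR_T is only there so the sketch compiles outside a configured tree, and configured_intermediate_encoding and sizeof_wchar_t_matches are hypothetical names:

```c
#include <stddef.h>
#include <wchar.h>

/* Normally provided by config.h.  This fallback is an assumption made
   only so the sketch builds standalone. */
#ifndef SIZEOF_WCHAR_T
# define SIZEOF_WCHAR_T 4
#endif

#if SIZEOF_WCHAR_T == 4
# define UCS_SIZE "4"
#elif SIZEOF_WCHAR_T == 2
# define UCS_SIZE "2"
#else
/* Includes 0, which configure records when it cannot get the size. */
# error "unsupported or unknown size of wchar_t"
#endif

/* As in the original header, an undefined WORDS_BIGENDIAN counts as 0. */
#if WORDS_BIGENDIAN
# define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "BE"
#else
# define INTERMEDIATE_ENCODING "UCS-" UCS_SIZE "LE"
#endif

/* Fails to compile if configure and the compiler disagree on the size. */
typedef char sizeof_wchar_t_matches
  [SIZEOF_WCHAR_T == sizeof (wchar_t) ? 1 : -1];

const char *configured_intermediate_encoding (void)
{
  return INTERMEDIATE_ENCODING;
}
```

The trailing typedef turns any disagreement between the configure result and the compiler's own sizeof (wchar_t) into a build error instead of a silently wrong encoding name.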
* Re: [RFA] Handle cygwin wchar_t specifics
2011-04-16 16:05 ` Pierre Muller
2011-04-16 16:25 ` Jan Kratochvil
@ 2011-04-16 21:24 ` Tom Tromey
1 sibling, 0 replies; 6+ messages in thread
From: Tom Tromey @ 2011-04-16 21:24 UTC
To: Pierre Muller; +Cc: gdb-patches
>>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
Pierre> Yes, but the problem is that it is not possible to use sizeof
Pierre> inside a #if conditions :(
Pierre> Do you know of any way to get the size of wchar_t?
Pierre> I suspect we will need to add this to the configure scripts...
Pierre> But I am still very bad on that part.
In this case you don't need to know the size during preprocessing.
You can do something like:
extern const char *intermediate_encoding;
#define INTERMEDIATE_ENCODING intermediate_encoding
... and then initialize the string in _initialize_charset, under the
appropriate conditions. The only caveat is to check the case where the
size is neither 2 nor 4.
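A minimal sketch of this run-time variant, assuming init_intermediate_encoding as a hypothetical stand-in for code that would run from _initialize_charset. Byte order is probed at run time here instead of using WORDS_BIGENDIAN, and falling back to "wchar_t" (the name the libiconv branch of gdb_wchar.h already uses) for other sizes is an assumption, not something settled in this thread:

```c
#include <wchar.h>

/* Declaration and macro as they might appear in gdb_wchar.h. */
extern const char *intermediate_encoding;
#define INTERMEDIATE_ENCODING intermediate_encoding

/* Definition, as it might appear in charset.c. */
const char *intermediate_encoding;

/* Set intermediate_encoding from sizeof (wchar_t).  Returns 1 for the
   handled sizes (2 and 4), 0 otherwise. */
int init_intermediate_encoding (void)
{
  /* The first byte of the integer 1 is 1 only on little-endian hosts. */
  const unsigned int probe = 1;
  int little_endian = *(const unsigned char *) &probe == 1;

  switch (sizeof (wchar_t))
    {
    case 2:
      intermediate_encoding = little_endian ? "UCS-2LE" : "UCS-2BE";
      return 1;
    case 4:
      intermediate_encoding = little_endian ? "UCS-4LE" : "UCS-4BE";
      return 1;
    default:
      /* The "neither 2 nor 4" caveat: an assumed fallback. */
      intermediate_encoding = "wchar_t";
      return 0;
    }
}
```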
Tom
* Re: [RFA] Handle cygwin wchar_t specifics
2011-04-15 18:16 ` Tom Tromey
2011-04-16 16:05 ` Pierre Muller
@ 2011-04-18 20:07 ` Corinna Vinschen
1 sibling, 0 replies; 6+ messages in thread
From: Corinna Vinschen @ 2011-04-18 20:07 UTC
To: gdb-patches
Hi Tom,
On Apr 15 12:15, Tom Tromey wrote:
> >>>>> "Pierre" == Pierre Muller <pierre.muller@ics-cnrs.unistra.fr> writes:
>
> Pierre> because of this, GDB uses "UCS-4LE"
> Pierre> for the macro INTERMEDIATE_ENCODING on Cygwin
> Pierre> (while "wchar_t" it uses for mingw32, which works well).
>
> Ok, I see the problem. I thought this:
>
> /* If __STDC_ISO_10646__ is defined, then the host wchar_t is UCS-4.
>
> But this is not true! For some values of __STDC_ISO_10646__, a 2 byte
> wide character type suffices. In particular, Cygwin's value of 200305
> means that it corresponds to Unicode 4.0.0:
>
> http://www.unicode.org/versions/components-4.0.0.html
>
> I think this might be a Cygwin bug, but it is pretty hard to wade
> through the ISO / Unicode differences and other assorted standardese to
> see. (The reason I think it might be a bug is that Unicode 4.0.0
> defines some characters > 0xFFFF.)
I see there's another solution in the works, but just to let you know
that Bruno Haible and I discussed the definition of __STDC_ISO_10646__
on the Cygwin list back in January. We didn't come to a conclusion
since we both interpret the standards differently, but this gives
you some insight into why __STDC_ISO_10646__ is defined on Cygwin; see
http://cygwin.com/ml/cygwin/2011-01/msg00410.html, line 70ff.
Corinna
--
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat