* Iconv / Solaris
From: Daniel Jacobowitz @ 2009-08-27 2:22 UTC (permalink / raw)
To: Tom Tromey, gdb-patches
Hi Tom,
Just confirming what I said on IRC: __STDC_ISO_10646__ is not defined
anywhere, but with __sun__ your patch from here:
http://sourceware.org/ml/gdb-patches/2009-07/msg00434.html
fixes an otherwise broken GDB on Solaris. I think UCS-4 is right, but
the Solaris documentation isn't clear; maybe it would be clear if I
understood wide characters better.
Actually, poking around the internet, it looks like wchar_t's contents
may be locale-dependent...
--
Daniel Jacobowitz
CodeSourcery
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Iconv / Solaris
From: Tom Tromey @ 2009-08-27 17:09 UTC (permalink / raw)
To: gdb-patches
>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:
Daniel> Just confirming what I said on IRC: __STDC_ISO_10646__ is not defined
Daniel> anywhere, but with __sun__ your patch from here:
Daniel> http://sourceware.org/ml/gdb-patches/2009-07/msg00434.html
Daniel> fixes an otherwise broken GDB on Solaris. I think UCS-4 is right, but
Daniel> the Solaris documentation isn't clear; maybe it would be clear if I
Daniel> understood wide characters better.
Daniel> Actually, poking around the internet, it looks like wchar_t's contents
Daniel> may be locale-dependent...
Yeah :-(
I did a little digging on various sun.com sites, and all I could find is
that wchar_t will be UCS-4 if the current locale is using UTF-8. That
is not a very strong guarantee -- it means that GDB might work for some
users but not others, if there are locales where wchar_t is not UCS-4.
The very safest thing to do is disable use of wchar_t on a platform like
this. E.g., the appended hack can be used to try it. This will provide
a user experience similar to GDB 6.8.
Alternatively, you could try using the __sun__ variant and running gdb
in a non-UTF-8 locale. If it works we could go with (a variant of) this
approach.
Finally, I looked at libiconv-1.13. It also supports converting to the
"wchar_t" encoding using mbrtowc, if that is available. I guess it must
perform an intermediate conversion. Maybe using libiconv on Solaris
would provide a better experience for your users.
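The mbrtowc step Tom mentions can be sketched as follows. This is a minimal illustration of converting locale-encoded narrow text to wchar_t with mbrtowc, not libiconv's actual code; the function name `locale_to_wide` is hypothetical. Note that the resulting wchar_t values are whatever the C library uses for the current locale, which is exactly the property that is not pinned down on Solaris.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the NUL-terminated multibyte string SRC,
   interpreted in the current LC_CTYPE locale, into at most CAP wide
   characters in DST.  Returns the number of wide characters stored, or
   (size_t) -1 if SRC contains an invalid or truncated byte sequence.  */
size_t
locale_to_wide (const char *src, wchar_t *dst, size_t cap)
{
  mbstate_t state;
  size_t len = strlen (src);
  size_t n = 0;

  memset (&state, 0, sizeof state);   /* Start in the initial shift state.  */
  while (len > 0 && n < cap)
    {
      wchar_t wc;
      size_t used = mbrtowc (&wc, src, len, &state);

      if (used == (size_t) -1 || used == (size_t) -2)
        return (size_t) -1;           /* Invalid or incomplete sequence.  */
      dst[n++] = wc;                  /* Value depends on the locale.  */
      src += used;
      len -= used;
    }
  return n;
}
```

In the default "C" locale, `locale_to_wide ("abc", buf, 8)` stores three wide characters.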
Tom
*** gdb_wchar.h.~1.2.~ 2009-04-15 15:59:05.000000000 -0600
--- gdb_wchar.h 2009-08-27 10:31:01.000000000 -0600
***************
*** 47,53 ****
/* We use "btowc" as a sentinel to detect functioning wchar_t
support. */
! #if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC)
#include <wchar.h>
#include <wctype.h>
--- 47,53 ----
/* We use "btowc" as a sentinel to detect functioning wchar_t
support. */
! #if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC) && !defined (__sun__)
#include <wchar.h>
#include <wctype.h>
* Re: Iconv / Solaris
From: Daniel Jacobowitz @ 2009-08-27 17:45 UTC (permalink / raw)
To: gdb-patches
On Thu, Aug 27, 2009 at 11:00:41AM -0600, Tom Tromey wrote:
> Alternatively, you could try using the __sun__ variant and running gdb
> in a non-UTF-8 locale. If it works we could go with (a variant of) this
> approach.
What do we look for? That is, how would I know if it was working or
not? I can easily try an ISO-8859-1 locale, but otherwise I'm a bit
out of my depth.
--
Daniel Jacobowitz
CodeSourcery
* Re: Iconv / Solaris
From: Tom Tromey @ 2009-08-27 20:36 UTC (permalink / raw)
To: gdb-patches
>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:
Daniel> On Thu, Aug 27, 2009 at 11:00:41AM -0600, Tom Tromey wrote:
>> Alternatively, you could try using the __sun__ variant and running gdb
>> in a non-UTF-8 locale. If it works we could go with (a variant of) this
>> approach.
Daniel> What do we look for? That is, how would I know if it was working or
Daniel> not? I can easily try an ISO-8859-1 locale, but otherwise I'm a bit
Daniel> out of my depth.
Hmm, good question.
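For ISO-8859-1, it is tricky: its 256 characters have the same values as
the first 256 UCS-4 code points, so btowc gives the same answer whether
or not wchar_t is UCS-4.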
I think you could do a test in other ISO-8859 locales: take a narrow
character not appearing in ISO-8859-1, convert it to a wchar_t using
btowc, and then print the value. If the value is the same as the UCS-4
value, you probably have UCS-4 wchar_t.
E.g., in ISO-8859-15, 0xA4 is the euro currency sign. In UCS-4 this is
0x20AC.
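The test described above might look like this. It is a sketch under stated assumptions: the function name `btowc_matches_ucs4` is hypothetical, and locale names differ between systems (an ISO-8859-15 locale may be spelled "en_US.ISO8859-15" on one host and something else on another), so the caller has to supply one that is actually installed.

```c
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical probe: switch LC_CTYPE to LOCALE_NAME, convert the single
   byte NARROW with btowc, and compare the result with the expected UCS-4
   code point.  Returns 1 on a match, 0 on a mismatch (or if btowc rejects
   the byte), and -1 if the locale is not available.  The caller's
   LC_CTYPE setting is restored before returning.  */
int
btowc_matches_ucs4 (const char *locale_name, unsigned char narrow,
                    unsigned long expected)
{
  char saved[256];
  const char *current = setlocale (LC_CTYPE, NULL);
  wint_t wc;

  /* Copy the current setting; later setlocale calls may overwrite it.  */
  snprintf (saved, sizeof saved, "%s", current ? current : "C");
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;

  wc = btowc (narrow);
  setlocale (LC_CTYPE, saved);

  return wc != WEOF && (unsigned long) wc == expected;
}
```

For the euro sign, the call would be `btowc_matches_ucs4 ("<an ISO-8859-15 locale>", 0xA4, 0x20AC)`. A result of 1 only shows that this one locale agrees with UCS-4; it says nothing about EUC or SJIS locales.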
The cases I was more concerned about were locales using encodings like
SJIS or EUC. I'm not sure what wchar_t encoding these might use.
So, I dug through the OpenSolaris source a little and I think UCS-4 is
not always used. In particular it looks like mbtowc can call:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libbc/libc/gen/common/euc.multibyte.c#_mbtowc_euc
... which looks like it uses an ad hoc flattened EUC encoding.
The initial problem here is that iconv will not accept "wchar_t" as an
encoding on this platform. I see we only have one AC_TRY_RUN in gdb
... am I right in assuming that these are not ok?
If they are ok, we can test this at configure time.
If they are not ok, I think we can just add a new setting to
configure.host. This is simpler to implement.
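A run test of this kind would boil down to asking the host iconv whether it knows an encoding name. A minimal sketch, with the hypothetical helper name `iconv_accepts`; keep in mind that any run-time test executes only on the build machine and cannot run at all when cross-compiling.

```c
#include <iconv.h>

/* Hypothetical helper: return 1 if the host iconv can open a converter
   from FROMCODE to TOCODE, 0 otherwise.  On a host like the one described
   above, iconv_accepts ("wchar_t", "UTF-8") would be expected to return 0.  */
int
iconv_accepts (const char *tocode, const char *fromcode)
{
  iconv_t cd = iconv_open (tocode, fromcode);

  if (cd == (iconv_t) -1)
    return 0;           /* Unknown encoding name or unsupported pair.  */
  iconv_close (cd);     /* We only wanted to know whether it opens.  */
  return 1;
}
```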
Tom
* Re: Iconv / Solaris
From: Daniel Jacobowitz @ 2009-08-27 20:38 UTC (permalink / raw)
To: gdb-patches
On Thu, Aug 27, 2009 at 02:30:30PM -0600, Tom Tromey wrote:
> The initial problem here is that iconv will not accept "wchar_t" as an
> encoding on this platform. I see we only have one AC_TRY_RUN in gdb
> ... am I right in assuming that these are not ok?
They are not OK. Please don't add another if you can avoid it. I
think the one that's there is for long long printf? I used to have
to override the cache variable... haven't checked lately.
> If they are not ok, I think we can just add a new setting to
> configure.host. This is simpler to implement.
I'm not sure how to do this without hardcoding it by platform. If the
user has external libiconv, do we still want a change?
--
Daniel Jacobowitz
CodeSourcery
* Re: Iconv / Solaris
From: Daniel Jacobowitz @ 2009-08-27 20:50 UTC (permalink / raw)
To: gdb-patches
On Thu, Aug 27, 2009 at 04:36:09PM -0400, Daniel Jacobowitz wrote:
> > If they are not ok, I think we can just add a new setting to
> > configure.host. This is simpler to implement.
>
> I'm not sure how to do this without hardcoding it by platform. If the
> user has external libiconv, do we still want a change?
Maybe I'm thinking about this wrong... can we determine the encoding
of wchar_t somehow that works on Solaris? Something like what we do
now with nl_langinfo? Or is it not guaranteed to have any known
encoding?
I'm lost in the configure maze, but if we don't define PHONY_ICONV,
then INTERMEDIATE_ENCODING ought to be host_charset anyway. So the
fact that your patch made a difference implies that PHONY_ICONV is
defined. So what's failing? Isn't it our *dummy* iconv_open?
--
Daniel Jacobowitz
CodeSourcery
* Re: Iconv / Solaris
From: Daniel Jacobowitz @ 2009-08-27 22:03 UTC (permalink / raw)
To: gdb-patches
On Thu, Aug 27, 2009 at 04:42:58PM -0400, Daniel Jacobowitz wrote:
> Maybe I'm thinking about this wrong... can we determine the encoding
> of wchar_t somehow that works on Solaris? Something like what we do
> now with nl_langinfo? Or is it not guaranteed to have any known
> encoding?
>
> I'm lost in the configure maze, but if we don't define PHONY_ICONV,
> then INTERMEDIATE_CHARSET ought to be host_charset anyway. So the
> fact that your patch made a difference implies that PHONY_ICONV is
> defined. So what's failing? Isn't it our *dummy* iconv_open?
No, HAVE_ICONV is defined. So how does changing the default
definition of INTERMEDIATE_ENCODING make a difference?
All of HAVE_ICONV, HAVE_WCHAR_H, HAVE_BTOWC are defined.
Oh. We use host_charset if gdb_wchar_t is char. I misread the
#if's... I don't see how you use iconv and wchar_t together,
otherwise. Are you just not supposed to?
--
Daniel Jacobowitz
CodeSourcery
* Re: Iconv / Solaris
From: Tom Tromey @ 2009-08-28 1:01 UTC (permalink / raw)
To: gdb-patches
>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:
Daniel> Maybe I'm thinking about this wrong... can we determine the
Daniel> encoding of wchar_t somehow that works on Solaris? Something
Daniel> like what we do now with nl_langinfo?
Nope.
For the "narrow" charset you can use nl_langinfo(CODESET).
There is no equivalent for the wide charset :-(
Many systems let you pass "wchar_t" to iconv_open, instead.
However, Solaris doesn't.
Daniel> Or is it not guaranteed to have any known encoding?
If the system defines __STDC_ISO_10646__, then you know it uses Unicode.
Otherwise, all bets are off.
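The asymmetry Tom describes can be sketched in a few lines. The function names are hypothetical; `nl_langinfo (CODESET)` and the `__STDC_ISO_10646__` macro are the real mechanisms mentioned above. Call `setlocale (LC_ALL, "")` first if the answer should reflect the user's environment rather than the default "C" locale.

```c
#include <langinfo.h>

/* The narrow (multibyte) charset of the current locale can always be
   queried by name.  */
const char *
narrow_charset (void)
{
  return nl_langinfo (CODESET);
}

/* There is no nl_langinfo item for the wide charset.  The only portable
   guarantee is __STDC_ISO_10646__: when the implementation defines it,
   wchar_t values are ISO 10646 (Unicode) code points.  Returns 1 in that
   case and 0 when nothing is known about the wchar_t encoding.  */
int
wchar_is_iso10646 (void)
{
#ifdef __STDC_ISO_10646__
  return 1;
#else
  return 0;
#endif
}
```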
Daniel> I'm lost in the configure maze
Yeah. The way it works:
* If you have a fully working iconv + wchar_t suite, then:
- All the gdb_* macros are defined as their wide counterparts,
e.g. gdb_iswprint == iswprint
The intermediate charset is wchar_t.
* You might have iconv but no working wchar_t support.
In this case we use the narrow forms for everything,
e.g., gdb_iswprint == isprint (and gdb_wchar_t == char).
However, we still use iconv for recoding.
The intermediate charset is host_charset.
This is the scenario I propose we make Solaris use, preferably by
using AC_TRY_RUN to test iconv_open.
The impact on the user is that if he tries to print a string with
non-host-charset characters, he will get escapes -- basically what GDB
6.8 does.
* You might have nothing at all. This is the PHONY_ICONV case.
In this scenario we use the narrow forms for everything and basically
just fail unless host_charset == target_charset.
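The three cases, plus the proposed Solaris override, amount to a small decision table. The following is a hypothetical model of that logic written as ordinary C functions; gdb itself makes this choice with preprocessor conditionals in gdb_wchar.h, and the parameter `wchar_iconv_ok` is an assumption standing for "iconv_open accepts wchar_t".

```c
#include <stdbool.h>

/* Hypothetical names for the three modes described above.  */
enum wchar_mode
{
  WCHAR_MODE_FULL = 1,      /* Wide functions; intermediate is "wchar_t".  */
  WCHAR_MODE_NARROW_ICONV,  /* Narrow functions, real iconv, host charset.  */
  WCHAR_MODE_PHONY_ICONV    /* Narrow functions, phony iconv, host charset.  */
};

/* The first three inputs stand for the HAVE_ICONV, HAVE_WCHAR_H and
   HAVE_BTOWC configure results.  WCHAR_ICONV_OK is false on a host whose
   iconv_open rejects "wchar_t", which forces the narrow mode.  */
enum wchar_mode
select_wchar_mode (bool have_iconv, bool have_wchar_h, bool have_btowc,
                   bool wchar_iconv_ok)
{
  if (!have_iconv)
    return WCHAR_MODE_PHONY_ICONV;
  if (have_wchar_h && have_btowc && wchar_iconv_ok)
    return WCHAR_MODE_FULL;
  return WCHAR_MODE_NARROW_ICONV;
}

/* Intermediate charset for MODE; HOST_CHARSET would come from
   nl_langinfo (CODESET).  */
const char *
intermediate_charset (enum wchar_mode mode, const char *host_charset)
{
  return mode == WCHAR_MODE_FULL ? "wchar_t" : host_charset;
}
```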
Tom
* Re: Iconv / Solaris
From: Tom Tromey @ 2009-08-28 1:24 UTC (permalink / raw)
To: gdb-patches
>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:
Daniel> On Thu, Aug 27, 2009 at 02:30:30PM -0600, Tom Tromey wrote:
>> The initial problem here is that iconv will not accept "wchar_t" as an
>> encoding on this platform. I see we only have one AC_TRY_RUN in gdb
>> ... am I right in assuming that these are not ok?
Daniel> They are not OK. Please don't add another if you can avoid it. I
Daniel> think the one that's there is for long long printf? I used to have
Daniel> to override the cache variable... haven't checked lately.
Oops, I read your notes out of order.
Ok, no new AC_TRY_RUN.
The existing one is some obscure old Linux thing:
dnl For Linux/i386, glibc 2.1.3 was released with a bogus
dnl prfpregset_t type (it's a typedef for the pointer to a struct
dnl instead of the struct itself). We detect this here, and work
dnl around it in gdb_proc_service.h.
>> If they are not ok, I think we can just add a new setting to
>> configure.host. This is simpler to implement.
Daniel> I'm not sure how to do this without hardcoding it by platform. If the
Daniel> user has external libiconv, do we still want a change?
What we want is to skip case #1 in gdb_wchar.h (the full support case)
and let configure choose between case #2 (iconv only) and case #3
(nothing) depending on whether iconv was found.
I can write a patch tomorrow.
Tom
* Re: Iconv / Solaris
From: Tom Tromey @ 2009-08-28 17:00 UTC (permalink / raw)
To: gdb-patches
>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:
Tom> I can write a patch tomorrow.
Please try the appended. You will have to re-run autoheader and
autoconf.
I took the opportunity to add more text to the comment in gdb_wchar.h,
in the hopes that this would make it more clear.
If this works for you, I will check it in.
Tom
2009-08-28 Tom Tromey <tromey@redhat.com>
* gdb_wchar.h: Update comments. Use DISABLE_ICONV.
* configure, config.in: Rebuild.
* configure.ac (DISABLE_ICONV): Define when needed.
* configure.host: Set gdb_host_wchar_iconv.
Index: configure.ac
===================================================================
RCS file: /cvs/src/src/gdb/configure.ac,v
retrieving revision 1.105
diff -u -r1.105 configure.ac
--- configure.ac 22 Aug 2009 17:08:09 -0000 1.105
+++ configure.ac 28 Aug 2009 16:15:22 -0000
@@ -788,6 +788,12 @@
ttrace wborder setlocale iconvlist libiconvlist btowc])
AM_LANGINFO_CODESET
+# An additional check for whether iconv is ok for us to use.
+if test "$gdb_host_wchar_iconv" = no; then
+ AC_DEFINE([DISABLE_ICONV], 1,
+ [Define if host iconv is unsuitable for use by gdb])
+fi
+
# Check the return and argument types of ptrace. No canned test for
# this, so roll our own.
gdb_ptrace_headers='
Index: configure.host
===================================================================
RCS file: /cvs/src/src/gdb/configure.host,v
retrieving revision 1.103
diff -u -r1.103 configure.host
--- configure.host 11 Jan 2009 13:15:56 -0000 1.103
+++ configure.host 28 Aug 2009 16:15:22 -0000
@@ -8,6 +8,8 @@
# gdb_host_double_format host's double floatformat, or 0
# gdb_host_long_double_format host's long double floatformat, or 0
# gdb_host_obs host-specific .o files to include
+# gdb_host_wchar_iconv "no" if iconv_open will not accept
+# "wchar_t" as an argument.
# Map host cpu into the config cpu subdirectory name.
# The default is $host_cpu.
@@ -207,3 +209,15 @@
gdb_host_long_double_format=0
;;
esac
+
+
+# Per-host iconv setting.
+
+# Default to "yes".
+gdb_host_wchar_iconv=yes
+
+case "${host}" in
+*-*-solaris*)
+ gdb_host_wchar_iconv=no
+ ;;
+esac
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.2
diff -u -r1.2 gdb_wchar.h
--- gdb_wchar.h 15 Apr 2009 22:20:31 -0000 1.2
+++ gdb_wchar.h 28 Aug 2009 16:15:22 -0000
@@ -21,18 +21,33 @@
/* We handle three different modes here.
- Capable systems have the full suite: wchar_t support and iconv
+ 1. Capable systems have the full suite: wchar_t support and iconv
(perhaps via GNU libiconv). On these machines, full functionality
- is available.
+ is available. In this case, the intermediate character set is
+ "wchar_t".
- DJGPP is known to have libiconv but not wchar_t support. On
+ 2. DJGPP is known to have libiconv but not wchar_t support. On
systems like this, we use the narrow character functions. The full
functionality is available to the user, but many characters (those
- outside the narrow range) will be displayed as escapes.
+ outside the narrow range) will be displayed as escapes. In this
+ case, the intermediate character set uses the host encoding.
- Finally, some systems do not have iconv. Here we provide a phony
- iconv which only handles a single character set, and we provide
- wrappers for the wchar_t functionality we use. */
+ We also end up in this scenario if the host iconv is not fully
+ suitable. Solaris falls into this category both because iconv_open
+ does not accept "wchar_t" as an argument and because wchar_t
+ does not have a fixed encoding.
+
+ 3. Finally, some systems do not have iconv. Here we provide a
+ phony iconv which only handles a single character set, and we
+ provide wrappers for the wchar_t functionality we use. This is
+ what the "PHONY_ICONV" define means, below. In this case, the
+ intermediate character set uses the host encoding, and limited
+ functionality is available to the user.
+
+ It is possible that you may run into a system that does not support
+ "wchar_t" as an argument to iconv_open, but where
+ __STDC_ISO_10646__ is defined. If you have such a system, we can
+ add a fourth case where we fix the intermediate encoding. */
#define INTERMEDIATE_ENCODING "wchar_t"
@@ -40,14 +55,17 @@
#if defined (HAVE_ICONV)
#include <iconv.h>
#else
-/* This define is used elsewhere so we don't need to duplicate the
- same checking logic in multiple places. */
+/* Case 3. This define is used elsewhere so we don't need to
+ duplicate the same checking logic in multiple places. */
#define PHONY_ICONV
#endif
/* We use "btowc" as a sentinel to detect functioning wchar_t
support. */
-#if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC)
+#if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC) \
+ && !defined (DISABLE_ICONV)
+
+/* Case 1. */
#include <wchar.h>
#include <wctype.h>
@@ -65,6 +83,8 @@
#else
+/* Case 2 and case 3, depending on whether PHONY_ICONV is defined. */
+
typedef char gdb_wchar_t;
typedef int gdb_wint_t;
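The "fourth case" mentioned in the new gdb_wchar.h comment (iconv_open rejects "wchar_t", but `__STDC_ISO_10646__` is defined) could fix the intermediate encoding to a byte-order-specific UCS-4 name. A minimal sketch, assuming a 4-byte wchar_t and an iconv that recognizes the names "UCS-4LE"/"UCS-4BE" (GNU libiconv and glibc do; other iconv implementations may not). The function name is hypothetical and this is not part of the patch.

```c
#include <wchar.h>

/* Hypothetical "case 4": when the compiler promises that wchar_t holds
   ISO 10646 code points, name a fixed encoding matching the in-memory
   layout of wchar_t.  Returns NULL when no such guarantee is available.  */
const char *
fixed_intermediate_encoding (void)
{
#ifdef __STDC_ISO_10646__
  const unsigned int probe = 1;

  if (sizeof (wchar_t) != 4)
    return NULL;        /* Only the 4-byte layout maps onto UCS-4.  */
  /* The lowest-addressed byte of PROBE is 1 only on a little-endian host.  */
  return *(const unsigned char *) &probe == 1 ? "UCS-4LE" : "UCS-4BE";
#else
  return NULL;
#endif
}
```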