Mirror of the gdb-patches mailing list
* Iconv / Solaris
@ 2009-08-27  2:22 Daniel Jacobowitz
  2009-08-27 17:09 ` Tom Tromey
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Jacobowitz @ 2009-08-27  2:22 UTC (permalink / raw)
  To: Tom Tromey, gdb-patches

Hi Tom,

Just confirming what I said on IRC: __STDC_ISO_10646__ is not defined
anywhere, but with __sun__ your patch from here:

http://sourceware.org/ml/gdb-patches/2009-07/msg00434.html

fixes an otherwise broken GDB on Solaris.  I think UCS-4 is right, but
the Solaris documentation isn't clear; maybe it would be clear if I
understood wide characters better.

Actually, poking around the internet, it looks like wchar_t's contents
may be locale-dependent...

-- 
Daniel Jacobowitz
CodeSourcery



* Re: Iconv / Solaris
  2009-08-27  2:22 Iconv / Solaris Daniel Jacobowitz
@ 2009-08-27 17:09 ` Tom Tromey
  2009-08-27 17:45   ` Daniel Jacobowitz
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2009-08-27 17:09 UTC (permalink / raw)
  To: gdb-patches

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> Just confirming what I said on IRC: __STDC_ISO_10646__ is not defined
Daniel> anywhere, but with __sun__ your patch from here:
Daniel> http://sourceware.org/ml/gdb-patches/2009-07/msg00434.html
Daniel> fixes an otherwise broken GDB on Solaris.  I think UCS-4 is right, but
Daniel> the Solaris documentation isn't clear; maybe it would be clear if I
Daniel> understood wide characters better.

Daniel> Actually, poking around the internet, it looks like wchar_t's contents
Daniel> may be locale-dependent...

Yeah :-(

I did a little digging on various sun.com sites, and all I could find is
that wchar_t will be UCS-4 if the current locale is using UTF-8.  That
is not a very strong guarantee -- it means that GDB might work for some
users but not others, if there are locales where wchar_t is not UCS-4.

The very safest thing to do is disable use of wchar_t on a platform like
this.  E.g., the appended hack can be used to try it.  This will provide
a user experience similar to GDB 6.8.

Alternatively, you could try using the __sun__ variant and running gdb
in a non-UTF-8 locale.  If it works we could go with (a variant of) this
approach.

Finally, I looked at libiconv-1.13.  It also supports converting to the
"wchar_t" encoding using mbrtowc, if that is available.  I guess it must
perform an intermediate conversion.  Maybe using libiconv on Solaris
would provide a better experience for your users.
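
For reference, a minimal sketch of the kind of mbrtowc loop such a conversion would need. This is not taken from libiconv; mb_to_wide is a hypothetical helper, and the values it stores are in whatever encoding the platform's wchar_t uses in the current locale, which is exactly the unknown discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode SRC, a string in the current locale's
   multibyte encoding, into at most CAP wide characters in DST.
   Returns the number of wide characters stored, or (size_t) -1 if
   SRC contains an invalid or incomplete sequence or DST is too
   small.  */
static size_t
mb_to_wide (const char *src, wchar_t *dst, size_t cap)
{
  mbstate_t state;
  size_t len = strlen (src);
  size_t used = 0;
  size_t count = 0;

  assert (dst != NULL);
  memset (&state, 0, sizeof state);

  while (used < len)
    {
      size_t n;

      if (count == cap)
        return (size_t) -1;

      n = mbrtowc (&dst[count], src + used, len - used, &state);
      if (n == (size_t) -1 || n == (size_t) -2)
        return (size_t) -1;
      if (n == 0)
        break;          /* Decoded a NUL; stop defensively.  */

      used += n;
      count++;
    }

  return count;
}
```

The caller is expected to have selected a locale with setlocale first; without that, the program runs in the "C" locale and only plain ASCII input is portable.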

Tom

*** gdb_wchar.h.~1.2.~	2009-04-15 15:59:05.000000000 -0600
--- gdb_wchar.h	2009-08-27 10:31:01.000000000 -0600
***************
*** 47,53 ****
  
  /* We use "btowc" as a sentinel to detect functioning wchar_t
     support.  */
! #if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC)
  
  #include <wchar.h>
  #include <wctype.h>
--- 47,53 ----
  
  /* We use "btowc" as a sentinel to detect functioning wchar_t
     support.  */
! #if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC) && !defined (__sun__)
  
  #include <wchar.h>
  #include <wctype.h>



* Re: Iconv / Solaris
  2009-08-27 17:09 ` Tom Tromey
@ 2009-08-27 17:45   ` Daniel Jacobowitz
  2009-08-27 20:36     ` Tom Tromey
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Jacobowitz @ 2009-08-27 17:45 UTC (permalink / raw)
  To: gdb-patches

On Thu, Aug 27, 2009 at 11:00:41AM -0600, Tom Tromey wrote:
> Alternatively, you could try using the __sun__ variant and running gdb
> in a non-UTF-8 locale.  If it works we could go with (a variant of) this
> approach.

What do we look for?  That is, how would I know if it was working or
not?  I can easily try an ISO-8859-1 locale, but otherwise I'm a bit
out of my depth.

-- 
Daniel Jacobowitz
CodeSourcery



* Re: Iconv / Solaris
  2009-08-27 17:45   ` Daniel Jacobowitz
@ 2009-08-27 20:36     ` Tom Tromey
  2009-08-27 20:38       ` Daniel Jacobowitz
  0 siblings, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2009-08-27 20:36 UTC (permalink / raw)
  To: gdb-patches

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> On Thu, Aug 27, 2009 at 11:00:41AM -0600, Tom Tromey wrote:
>> Alternatively, you could try using the __sun__ variant and running gdb
>> in a non-UTF-8 locale.  If it works we could go with (a variant of) this
>> approach.

Daniel> What do we look for?  That is, how would I know if it was working or
Daniel> not?  I can easily try an ISO-8859-1 locale, but otherwise I'm a bit
Daniel> out of my depth.

Hmm, good question.

For ISO-8859-1, it is tricky, because its 256 code points coincide with
the first 256 code points of UCS-4, so the result cannot tell the two
apart.

I think you could do a test in other ISO-8859 locales: take a narrow
character not appearing in ISO-8859-1, convert it to a wchar_t using
btowc, and then print the value.  If the value is the same as the UCS-4
value, you probably have UCS-4 wchar_t.

E.g., in ISO-8859-15, 0xA4 is the euro sign.  In UCS-4 this is
0x20AC.
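
A minimal sketch of that test, assuming an ASCII-compatible host.  The helper name wchar_matches_ucs4 is made up for illustration, and the locale name to pass (for example "en_US.ISO8859-15") is an assumption that varies between systems.  For the ISO-8859-15 case the expected value is 0x20AC, the Unicode euro sign.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to LOCALE_NAME, widen the
   single byte BYTE with btowc, print the result, and compare it with
   EXPECTED, the UCS-4 value of the same character.
   Returns 1 on a match, 0 on a mismatch, and -1 if the locale is not
   installed or BYTE is not a valid single-byte character in it.
   Note that this changes the process-wide locale.  */
static int
wchar_matches_ucs4 (const char *locale_name, unsigned char byte,
                    unsigned long expected)
{
  wint_t wc;

  assert (locale_name != NULL);

  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return -1;

  wc = btowc ((int) byte);
  if (wc == WEOF)
    return -1;

  printf ("%s: byte 0x%02X -> wchar_t 0x%lX, UCS-4 0x%lX\n",
          locale_name, (unsigned int) byte, (unsigned long) wc, expected);

  return (unsigned long) wc == expected;
}
```

Calling wchar_matches_ucs4 ("en_US.ISO8859-15", 0xA4, 0x20AC) from main would run the check described above.  A match only suggests UCS-4 for that one locale; it says nothing about SJIS or EUC locales.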

The cases I was more concerned about were locales using encodings like
SJIS or EUC.  I'm not sure what wchar_t encoding these might use.

So, I dug through the OpenSolaris source a little and I think UCS-4 is
not always used.  In particular it looks like mbtowc can call:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libbc/libc/gen/common/euc.multibyte.c#_mbtowc_euc

... which looks like it uses an ad hoc flattened EUC encoding.


The initial problem here is that iconv will not accept "wchar_t" as an
encoding on this platform.  I see we only have one AC_TRY_RUN in gdb
... am I right in assuming that these are not ok?

If they are ok, we can test this at configure time.

If they are not ok, I think we can just add a new setting to
configure.host.  This is simpler to implement.

Tom



* Re: Iconv / Solaris
  2009-08-27 20:36     ` Tom Tromey
@ 2009-08-27 20:38       ` Daniel Jacobowitz
  2009-08-27 20:50         ` Daniel Jacobowitz
  2009-08-28  1:24         ` Tom Tromey
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Jacobowitz @ 2009-08-27 20:38 UTC (permalink / raw)
  To: gdb-patches

On Thu, Aug 27, 2009 at 02:30:30PM -0600, Tom Tromey wrote:
> The initial problem here is that iconv will not accept "wchar_t" as an
> encoding on this platform.  I see we only have one AC_TRY_RUN in gdb
> ... am I right in assuming that these are not ok?

They are not OK.  Please don't add another if you can avoid it.  I
think the one that's there is for long long printf?  I used to have
to override the cache variable... haven't checked lately.

> If they are not ok, I think we can just add a new setting to
> configure.host.  This is simpler to implement.

I'm not sure how to do this without hardcoding it by platform.  If the
user has external libiconv, do we still want a change?

-- 
Daniel Jacobowitz
CodeSourcery



* Re: Iconv / Solaris
  2009-08-27 20:38       ` Daniel Jacobowitz
@ 2009-08-27 20:50         ` Daniel Jacobowitz
  2009-08-27 22:03           ` Daniel Jacobowitz
  2009-08-28  1:01           ` Tom Tromey
  2009-08-28  1:24         ` Tom Tromey
  1 sibling, 2 replies; 10+ messages in thread
From: Daniel Jacobowitz @ 2009-08-27 20:50 UTC (permalink / raw)
  To: gdb-patches

On Thu, Aug 27, 2009 at 04:36:09PM -0400, Daniel Jacobowitz wrote:
> > If they are not ok, I think we can just add a new setting to
> > configure.host.  This is simpler to implement.
> 
> I'm not sure how to do this without hardcoding it by platform.  If the
> user has external libiconv, do we still want a change?

Maybe I'm thinking about this wrong... can we determine the encoding
of wchar_t somehow that works on Solaris?  Something like what we do
now with nl_langinfo?  Or is it not guaranteed to have any known
encoding?

I'm lost in the configure maze, but if we don't define PHONY_ICONV,
then INTERMEDIATE_ENCODING ought to be host_charset anyway.  So the
fact that your patch made a difference implies that PHONY_ICONV is
defined.  So what's failing?  Isn't it our *dummy* iconv_open?

-- 
Daniel Jacobowitz
CodeSourcery



* Re: Iconv / Solaris
  2009-08-27 20:50         ` Daniel Jacobowitz
@ 2009-08-27 22:03           ` Daniel Jacobowitz
  2009-08-28  1:01           ` Tom Tromey
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Jacobowitz @ 2009-08-27 22:03 UTC (permalink / raw)
  To: gdb-patches

On Thu, Aug 27, 2009 at 04:42:58PM -0400, Daniel Jacobowitz wrote:
> Maybe I'm thinking about this wrong... can we determine the encoding
> of wchar_t somehow that works on Solaris?  Something like what we do
> now with nl_langinfo?  Or is it not guaranteed to have any known
> encoding?
> 
> I'm lost in the configure maze, but if we don't define PHONY_ICONV,
> then INTERMEDIATE_ENCODING ought to be host_charset anyway.  So the
> fact that your patch made a difference implies that PHONY_ICONV is
> defined.  So what's failing?  Isn't it our *dummy* iconv_open?

No, HAVE_ICONV is defined.  So how does changing the default
definition of INTERMEDIATE_ENCODING make a difference?

All of HAVE_ICONV, HAVE_WCHAR_H, HAVE_BTOWC are defined.

Oh.  We use host_charset if gdb_wchar_t is char.  I misread the
#if's... I don't see how you use iconv and wchar_t together,
otherwise.  Are you just not supposed to?

-- 
Daniel Jacobowitz
CodeSourcery



* Re: Iconv / Solaris
  2009-08-27 20:50         ` Daniel Jacobowitz
  2009-08-27 22:03           ` Daniel Jacobowitz
@ 2009-08-28  1:01           ` Tom Tromey
  1 sibling, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2009-08-28  1:01 UTC (permalink / raw)
  To: gdb-patches

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> Maybe I'm thinking about this wrong... can we determine the
Daniel> encoding of wchar_t somehow that works on Solaris?  Something
Daniel> like what we do now with nl_langinfo?

Nope.

For the "narrow" charset you can use nl_langinfo(CODESET).
There is no equivalent for the wide charset :-(

Many systems let you pass "wchar_t" to iconv_open, instead.
However, Solaris doesn't.

Daniel> Or is it not guaranteed to have any known encoding?

If the system defines __STDC_ISO_10646__, then you know it uses Unicode.

Otherwise, all bets are off.
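
To make the asymmetry concrete, here is a small sketch.  Both function names are made up; nl_langinfo (CODESET) and __STDC_ISO_10646__ are the only standard pieces.  The narrow side can be queried at run time, while the wide side only has a compile-time yes/no.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the name of the narrow charset of the
   user's locale.  This switches LC_CTYPE to the environment's
   setting first, which affects the whole process.  */
static const char *
narrow_codeset (void)
{
  const char *name;

  setlocale (LC_CTYPE, "");
  name = nl_langinfo (CODESET);
  assert (name != NULL);
  return name;
}

/* Hypothetical helper: return 1 if the compiler promises that
   wchar_t values are ISO 10646 (Unicode) code points, 0 if nothing
   is known about them.  */
static int
wchar_is_known_unicode (void)
{
#ifdef __STDC_ISO_10646__
  return 1;
#else
  return 0;
#endif
}
```

When wchar_is_known_unicode returns 0, nothing here says what the wide encoding is; it might still be UCS-4 in some locales.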

Daniel> I'm lost in the configure maze

Yeah.  The way it works:

* If you have a fully working iconv + wchar_t suite, then:
  - All the gdb_* macros are defined as their wide counterparts,
    e.g. gdb_iswprint == iswprint
  The intermediate charset is wchar_t.

* You might have iconv but no working wchar_t support.
  In this case we use the narrow forms for everything,
  e.g., gdb_iswprint == isprint (and gdb_wchar_t == char).
  However we still use iconv for recoding.
  The intermediate charset is host_charset.

  This is the scenario I propose we make Solaris use, preferably by
  using AC_TRY_RUN to test iconv_open.
  The impact on the user is that if he tries to print a string with
  non-host-charset characters, he will get escapes -- basically what GDB
  6.8 does.

* You might have nothing at all.  This is the PHONY_ICONV case.
  In this scenario we use the narrow forms for everything and basically
  just fail unless host_charset == target_charset.
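
Roughly, the selection between these three cases can be sketched as follows.  This is a simplified illustration, not the real gdb_wchar.h: the sketch_* names are invented, and HAVE_ICONV, HAVE_WCHAR_H and HAVE_BTOWC are assumed to come from config.h.

```c
#include <assert.h>

#if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC)
/* Case 1: iconv plus working wchar_t support.  */
#include <wchar.h>
typedef wchar_t sketch_wchar_t;
#define SKETCH_MODE 1
#elif defined (HAVE_ICONV)
/* Case 2: iconv only; narrow characters, iconv still recodes.  */
typedef char sketch_wchar_t;
#define SKETCH_MODE 2
#else
/* Case 3: nothing at all (the PHONY_ICONV case).  */
typedef char sketch_wchar_t;
#define SKETCH_MODE 3
#endif

/* Return the name of the intermediate charset for MODE: "wchar_t"
   in case 1, the host charset in cases 2 and 3.  */
static const char *
sketch_intermediate_charset (int mode, const char *host_charset)
{
  assert (mode >= 1 && mode <= 3);
  return mode == 1 ? "wchar_t" : host_charset;
}
```

Built without any of the HAVE_* macros, this lands in case 3.  Moving Solaris to case 2 amounts to making the first condition false on that host while leaving HAVE_ICONV defined.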

Tom



* Re: Iconv / Solaris
  2009-08-27 20:38       ` Daniel Jacobowitz
  2009-08-27 20:50         ` Daniel Jacobowitz
@ 2009-08-28  1:24         ` Tom Tromey
  2009-08-28 17:00           ` Tom Tromey
  1 sibling, 1 reply; 10+ messages in thread
From: Tom Tromey @ 2009-08-28  1:24 UTC (permalink / raw)
  To: gdb-patches

>>>>> "Daniel" == Daniel Jacobowitz <drow@false.org> writes:

Daniel> On Thu, Aug 27, 2009 at 02:30:30PM -0600, Tom Tromey wrote:
>> The initial problem here is that iconv will not accept "wchar_t" as an
>> encoding on this platform.  I see we only have one AC_TRY_RUN in gdb
>> ... am I right in assuming that these are not ok?

Daniel> They are not OK.  Please don't add another if you can avoid it.  I
Daniel> think the one that's there is for long long printf?  I used to have
Daniel> to override the cache variable... haven't checked lately.

Oops, I read your notes out of order.

Ok, no new AC_TRY_RUN.

The existing one is some obscure old Linux thing:

  dnl For Linux/i386, glibc 2.1.3 was released with a bogus
  dnl prfpregset_t type (it's a typedef for the pointer to a struct
  dnl instead of the struct itself).  We detect this here, and work
  dnl around it in gdb_proc_service.h.

>> If they are not ok, I think we can just add a new setting to
>> configure.host.  This is simpler to implement.

Daniel> I'm not sure how to do this without hardcoding it by platform.  If the
Daniel> user has external libiconv, do we still want a change?

What we want is to skip case #1 in gdb_wchar.h (the full support case)
and let configure choose between case #2 (iconv only) and case #3
(nothing) depending on whether iconv was found.

I can write a patch tomorrow.

Tom



* Re: Iconv / Solaris
  2009-08-28  1:24         ` Tom Tromey
@ 2009-08-28 17:00           ` Tom Tromey
  0 siblings, 0 replies; 10+ messages in thread
From: Tom Tromey @ 2009-08-28 17:00 UTC (permalink / raw)
  To: gdb-patches

>>>>> "Tom" == Tom Tromey <tromey@redhat.com> writes:

Tom> I can write a patch tomorrow.

Please try the appended.  You will have to re-run autoheader and
autoconf.

I took the opportunity to add more text to the comment in gdb_wchar.h,
in the hope of making it clearer.

If this works for you, I will check it in.

Tom

2009-08-28  Tom Tromey  <tromey@redhat.com>

	* gdb_wchar.h: Update comments.  Use DISABLE_ICONV.
	* configure, config.in: Rebuild.
	* configure.ac (DISABLE_ICONV): Define when needed.
	* configure.host: Set gdb_host_wchar_iconv.

Index: configure.ac
===================================================================
RCS file: /cvs/src/src/gdb/configure.ac,v
retrieving revision 1.105
diff -u -r1.105 configure.ac
--- configure.ac	22 Aug 2009 17:08:09 -0000	1.105
+++ configure.ac	28 Aug 2009 16:15:22 -0000
@@ -788,6 +788,12 @@
 		ttrace wborder setlocale iconvlist libiconvlist btowc])
 AM_LANGINFO_CODESET
 
+# An additional check for whether iconv is ok for us to use.
+if test "$gdb_host_wchar_iconv" = no; then
+  AC_DEFINE([DISABLE_ICONV], 1,
+            [Define if host iconv is unsuitable for use by gdb])
+fi
+
 # Check the return and argument types of ptrace.  No canned test for
 # this, so roll our own.
 gdb_ptrace_headers='
Index: configure.host
===================================================================
RCS file: /cvs/src/src/gdb/configure.host,v
retrieving revision 1.103
diff -u -r1.103 configure.host
--- configure.host	11 Jan 2009 13:15:56 -0000	1.103
+++ configure.host	28 Aug 2009 16:15:22 -0000
@@ -8,6 +8,8 @@
 #  gdb_host_double_format	host's double floatformat, or 0
 #  gdb_host_long_double_format	host's long double floatformat, or 0
 #  gdb_host_obs			host-specific .o files to include
+#  gdb_host_wchar_iconv		"no" if iconv_open will not accept
+#				"wchar_t" as an argument.
 
 # Map host cpu into the config cpu subdirectory name.
 # The default is $host_cpu.
@@ -207,3 +209,15 @@
 	gdb_host_long_double_format=0
 	;;
 esac
+
+
+# Per-host iconv setting.
+
+# Default to "yes".
+gdb_host_wchar_iconv=yes
+
+case "${host}" in
+*-*-solaris*)
+	gdb_host_wchar_iconv=no
+	;;
+esac
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.2
diff -u -r1.2 gdb_wchar.h
--- gdb_wchar.h	15 Apr 2009 22:20:31 -0000	1.2
+++ gdb_wchar.h	28 Aug 2009 16:15:22 -0000
@@ -21,18 +21,33 @@
 
 /* We handle three different modes here.
    
-   Capable systems have the full suite: wchar_t support and iconv
+   1. Capable systems have the full suite: wchar_t support and iconv
    (perhaps via GNU libiconv).  On these machines, full functionality
-   is available.
+   is available.  In this case, the intermediate character set is
+   "wchar_t".
    
-   DJGPP is known to have libiconv but not wchar_t support.  On
+   2. DJGPP is known to have libiconv but not wchar_t support.  On
    systems like this, we use the narrow character functions.  The full
    functionality is available to the user, but many characters (those
-   outside the narrow range) will be displayed as escapes.
+   outside the narrow range) will be displayed as escapes.  In this
+   case, the intermediate character set uses the host encoding.
    
-   Finally, some systems do not have iconv.  Here we provide a phony
-   iconv which only handles a single character set, and we provide
-   wrappers for the wchar_t functionality we use.  */
+   We also end up in this scenario if the host iconv is not fully
+   suitable.  Solaris falls into this category both because iconv_open
+   does not accept "wchar_t" as an argument, but also because wchar_t
+   does not have a fixed encoding.
+
+   3. Finally, some systems do not have iconv.  Here we provide a
+   phony iconv which only handles a single character set, and we
+   provide wrappers for the wchar_t functionality we use.  This is
+   what the "PHONY_ICONV" define means, below. In this case, the
+   intermediate character set uses the host encoding, and limited
+   functionality is available to the user.
+   
+   It is possible that you may run into a system that does not support
+   "wchar_t" as an argument to iconv_open, but where
+   __STDC_ISO_10646__ is defined.  If you have such a system, we can
+   add a fourth case where we fix the intermediate encoding.  */
 
 
 #define INTERMEDIATE_ENCODING "wchar_t"
@@ -40,14 +55,17 @@
 #if defined (HAVE_ICONV)
 #include <iconv.h>
 #else
-/* This define is used elsewhere so we don't need to duplicate the
-   same checking logic in multiple places.  */
+/* Case 3.  This define is used elsewhere so we don't need to
+   duplicate the same checking logic in multiple places.  */
 #define PHONY_ICONV
 #endif
 
 /* We use "btowc" as a sentinel to detect functioning wchar_t
    support.  */
-#if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC)
+#if defined (HAVE_ICONV) && defined (HAVE_WCHAR_H) && defined (HAVE_BTOWC) \
+  && !defined (DISABLE_ICONV)
+
+/* Case 1.  */
 
 #include <wchar.h>
 #include <wctype.h>
@@ -65,6 +83,8 @@
 
 #else
 
+/* Case 2 and case 3, depending on whether PHONY_ICONV is defined.  */
+
 typedef char gdb_wchar_t;
 typedef int gdb_wint_t;
 

