Mirror of the gdb-patches mailing list
* iconv returning byte order marks for Solaris 2.9
@ 2009-07-15 18:28 Andrew
  2009-07-15 18:57 ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew @ 2009-07-15 18:28 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 355 bytes --]

Hi

I found a problem printing strings for gdb 6.8 weekly snapshot 
(2009 07 07) on Solaris 2.9. 

I eventually found that changing INTERMEDIATE_ENCODING 
in gdb_wchar.h to "UCS-4" and applying the following
patch worked. Any comments?

I'm not sure how to handle the INTERMEDIATE_ENCODING 
change, since it's probably system dependent. 

Andrew



      

[-- Attachment #2: patch_charset.txt --]
[-- Type: text/plain, Size: 671 bytes --]

diff -rau src.original/gdb/charset.c src/gdb/charset.c
--- src.original/gdb/charset.c	2009-07-15 13:05:42.000896000 -0400
+++ src/gdb/charset.c	2009-07-15 13:09:23.000013000 -0400
@@ -646,6 +646,20 @@
       *out_chars = iter->out;
       *ptr = orig_inptr;
       *len = orig_in - iter->bytes;
+
+      if (num > 1) {
+	if ( (iter->out[0] == (gdb_wchar_t) 0xfffe) ||
+	     (iter->out[0] == (gdb_wchar_t) 0xfeff) ) {
+
+	  /* iconv returned byte order marks, skip those */
+	  int mov;
+	  for (mov = 0; mov < (num - 1); mov ++) 
+	    iter->out[mov] = iter->out[mov + 1];
+	  
+	  num -= 1;
+	}
+      }
+
       return num;
     }
 
Only in src/gdb: charset.c.~1.24.~


* Re: iconv returning byte order marks for Solaris 2.9
  2009-07-15 18:28 iconv returning byte order marks for Solaris 2.9 Andrew
@ 2009-07-15 18:57 ` Tom Tromey
  2009-07-16  2:29   ` Andrew
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2009-07-15 18:57 UTC (permalink / raw)
  To: ke; +Cc: gdb-patches

>>>>> "Andrew" == Andrew  <ke@alum.bu.edu> writes:

Andrew> I found a problem printing strings for gdb 6.8 weekly snapshot 
Andrew> (2009 07 07) on Solaris 2.9. 

Thanks for finding and diagnosing this.

Andrew> I eventually found that changing INTERMEDIATE_ENCODING 
Andrew> in gdb_wchar.h to "UCS-4" and applying the following
Andrew> patch worked. Any comments?

Andrew> I'm not sure how to handle the INTERMEDIATE_ENCODING 
Andrew> change, since it's probably system dependent. 

I don't have access to Solaris.  If I understand correctly, the
situation is:

* wchar_t on Solaris is encoded using UCS-4
* iconv_open accepts "wchar_t" as an encoding name
* in this case, iconv emits a BOM

First, this seems like it must be a Solaris bug, just because I can't
imagine how this would be useful.

I don't think we can use your patch as-is.  It does the BOM elimination
unconditionally, but really I think we can only do it on platforms where
we know that wchar_t is UCS-4 (or UCS-2 I suppose).

Does Solaris 9 support a full suite of conversions?  If not, one option
would be to use libiconv, and find a way to disable most of this code by
default on Solaris.

Failing that, the simplest fix would be if there is an encoding
(compatible with wchar_t) we can use on Solaris which does not insert
the BOM.  For example, maybe "UCS-4BE" or "UCS-4LE", depending on the
architecture.  I think a fix like this could be done entirely in
gdb_wchar.h.  Could you try that?

As far as the host dependency, we can probably just check
__STDC_ISO_10646__.

Tom
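
A minimal sketch of the conditional BOM removal discussed above, written
as a standalone C function rather than as a change to charset.c. The
names strip_leading_bom and looks_like_bom, the wchar_is_ucs flag, and
the use of uint32_t in place of gdb_wchar_t are assumptions made for
illustration. The 0xfffe0000 case (a byte-swapped 32-bit BOM) is also an
addition that the posted patch does not contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return nonzero if C looks like a byte order mark.  0xfeff and 0xfffe
   are the values tested in the posted patch; 0xfffe0000 is an assumed
   extra case for a byte-swapped 32-bit unit.  */
static int
looks_like_bom (uint32_t c)
{
  return c == 0xfeffu || c == 0xfffeu || c == 0xfffe0000u;
}

/* Drop one leading BOM from BUF, which holds NUM converted characters.
   WCHAR_IS_UCS is a hypothetical flag: nonzero only on hosts where the
   wide character encoding is known to be UCS-2 or UCS-4.  Returns the
   number of characters left in BUF.  */
static size_t
strip_leading_bom (uint32_t *buf, size_t num, int wchar_is_ucs)
{
  assert (buf != NULL || num == 0);

  if (!wchar_is_ucs || num == 0 || !looks_like_bom (buf[0]))
    return num;

  /* Shift the remaining characters down by one position.  */
  memmove (buf, buf + 1, (num - 1) * sizeof buf[0]);
  return num - 1;
}
```

Unlike the loop in the posted patch, which only runs when num > 1, this
version also handles the case where the BOM is the only character
returned.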


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: iconv returning byte order marks for Solaris 2.9
  2009-07-15 18:57 ` Tom Tromey
@ 2009-07-16  2:29   ` Andrew
  2009-07-17 19:19     ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew @ 2009-07-16  2:29 UTC (permalink / raw)
  To: gdb-patches


On the system I'm working on, iconv_open doesn't accept "wchar_t" as an encoding name; the call failed when INTERMEDIATE_ENCODING was set to that.

But setting INTERMEDIATE_ENCODING to "UCS-4BE" eliminated the BOM at the beginning of the output.

Andrew
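
One way to check this kind of behaviour on a given host is a small probe
like the sketch below. It converts a single character to the named
encoding and reports whether the output begins with a 4-byte BOM. The
function name probe_leading_bom is hypothetical, and the source encoding
name "ASCII" is an assumption: encoding names are implementation
specific, so another iconv may need a different spelling.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the single character 'A' to ENCODING and inspect the result.
   Returns -1 if iconv_open rejects ENCODING or the conversion fails,
   1 if the output starts with a 4-byte BOM in either byte order, and
   0 otherwise.  */
static int
probe_leading_bom (const char *encoding)
{
  static const unsigned char bom_be[4] = { 0x00, 0x00, 0xfe, 0xff };
  static const unsigned char bom_le[4] = { 0xff, 0xfe, 0x00, 0x00 };
  char in[] = "A";
  char out[16];
  char *inp = in;
  char *outp = out;
  size_t inleft = 1;
  size_t outleft = sizeof out;
  size_t produced;
  iconv_t cd;

  assert (encoding != NULL);

  /* "ASCII" is an assumed name for the source encoding.  */
  cd = iconv_open (encoding, "ASCII");
  if (cd == (iconv_t) -1)
    return -1;

  if (iconv (cd, &inp, &inleft, &outp, &outleft) == (size_t) -1)
    {
      iconv_close (cd);
      return -1;
    }
  iconv_close (cd);

  produced = sizeof out - outleft;
  if (produced >= 4
      && (memcmp (out, bom_be, 4) == 0 || memcmp (out, bom_le, 4) == 0))
    return 1;
  return 0;
}
```

Calling probe_leading_bom ("UCS-4") and probe_leading_bom ("UCS-4BE") on
the same host would show whether only the unqualified name produces a
BOM, and a result of -1 for "wchar_t" would match the iconv_open failure
described above.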


* Re: iconv returning byte order marks for Solaris 2.9
  2009-07-16  2:29   ` Andrew
@ 2009-07-17 19:19     ` Tom Tromey
  2009-07-21 20:18       ` Andrew
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2009-07-17 19:19 UTC (permalink / raw)
  To: ke; +Cc: gdb-patches

>>>>> "Andrew" == Andrew  <ke@alum.bu.edu> writes:

Andrew> On the system I'm working on, iconv_open doesn't accept "wchar_t"
Andrew> as an encoding name; the call failed when INTERMEDIATE_ENCODING
Andrew> was set to that.

Ah, thanks.

Andrew> But setting INTERMEDIATE_ENCODING to "UCS-4BE" eliminated the
Andrew> BOM at the beginning of the output.

Great.  Could you try the appended patch?
I'm testing it on Linux.

Tom

diff --git a/gdb/gdb_wchar.h b/gdb/gdb_wchar.h
index 07a6c87..241e051 100644
--- a/gdb/gdb_wchar.h
+++ b/gdb/gdb_wchar.h
@@ -35,8 +35,6 @@
    wrappers for the wchar_t functionality we use.  */
 
 
-#define INTERMEDIATE_ENCODING "wchar_t"
-
 #if defined (HAVE_ICONV)
 #include <iconv.h>
 #else
@@ -63,6 +61,20 @@ typedef wint_t gdb_wint_t;
 
 #define LCST(X) L ## X
 
+#ifdef __STDC_ISO_10646__
+/* On Solaris 9, iconv_open does not accept "wchar_t".  So, on this
+   platform, and other platforms where wchar_t is known to use
+   ISO-10646, choose an appropriate explicit charset name.  Also,
+   UCS-4 on Solaris will emit a BOM, which we don't want.  So, we
+   choose an explicit little- or big-endian variant, depending on the
+   host.  */
+#if WORDS_BIGENDIAN
+#define INTERMEDIATE_ENCODING "UCS-4BE"
+#else
+#define INTERMEDIATE_ENCODING "UCS-4LE"
+#endif
+#endif
+
 #else
 
 typedef char gdb_wchar_t;
@@ -87,4 +99,8 @@ typedef int gdb_wint_t;
 
 #endif
 
+#ifndef INTERMEDIATE_ENCODING
+#define INTERMEDIATE_ENCODING "wchar_t"
+#endif
+
 #endif /* GDB_WCHAR_H */
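
The patch above picks the BE or LE name according to the host byte
order, presumably so that the bytes iconv produces can be read directly
as host-order 32-bit values. The sketch below illustrates that property
with a runtime check instead of WORDS_BIGENDIAN. The helper names
host_is_big_endian, host_ucs4_name, and ucs4_encode are hypothetical and
are not part of gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return nonzero if the host stores the most significant byte first.  */
static int
host_is_big_endian (void)
{
  const uint32_t probe = 1;
  unsigned char first;

  memcpy (&first, &probe, 1);
  return first == 0;
}

/* Pick the explicit UCS-4 variant whose byte order matches the host.  */
static const char *
host_ucs4_name (void)
{
  return host_is_big_endian () ? "UCS-4BE" : "UCS-4LE";
}

/* Serialize code point CP into OUT as UCS-4 in the requested order.  */
static void
ucs4_encode (uint32_t cp, int big_endian, unsigned char out[4])
{
  int i;

  assert (out != NULL);
  for (i = 0; i < 4; i++)
    {
      int shift = big_endian ? 8 * (3 - i) : 8 * i;

      out[i] = (unsigned char) ((cp >> shift) & 0xff);
    }
}
```

When the byte order passed to ucs4_encode matches the host, the four
output bytes are identical to the in-memory representation of the
uint32_t code point, so no byte swapping is needed after conversion.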



* Re: iconv returning byte order marks for Solaris 2.9
  2009-07-17 19:19     ` Tom Tromey
@ 2009-07-21 20:18       ` Andrew
  2009-07-24 21:58         ` Tom Tromey
  2009-08-14 20:13         ` Tom Tromey
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew @ 2009-07-21 20:18 UTC (permalink / raw)
  To: gdb-patches


Thanks for the patch. I'm actually generating Solaris binaries
using a cross compiler (from a Linux box), and in my current
configuration it doesn't work: __STDC_ISO_10646__ is not defined.

I will try installing gcc locally and see how that works.

Andrew


* Re: iconv returning byte order marks for Solaris 2.9
  2009-07-21 20:18       ` Andrew
@ 2009-07-24 21:58         ` Tom Tromey
  2009-08-14 20:13         ` Tom Tromey
  1 sibling, 0 replies; 7+ messages in thread
From: Tom Tromey @ 2009-07-24 21:58 UTC (permalink / raw)
  To: ke; +Cc: gdb-patches

>>>>> "Andrew" == Andrew  <ke@alum.bu.edu> writes:

Andrew> Thanks for the patch. I'm actually generating Solaris binaries
Andrew> using a cross compiler (from a Linux box), and in my current
Andrew> configuration it doesn't work: __STDC_ISO_10646__ is not defined.

Ouch.

This seems like a QoI issue in the cross compiler.  But, it is hard to
call it a bug exactly.

I don't know the best thing to do in this case.  I suppose we could make
a new variable used by configure.host that would override the name of
the wchar_t encoding.

Tom
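
A rough sketch of how such an override could layer on top of the
__STDC_ISO_10646__ check from the earlier patch. The macro
HOST_WCHAR_ENCODING_OVERRIDE and the SKETCH_ names are hypothetical;
nothing in this thread settles on a name, and how configure.host would
define the value is left open here.

```c
#include <assert.h>
#include <string.h>

/* HOST_WCHAR_ENCODING_OVERRIDE is a hypothetical macro that a
   configure-time setting could define, for example as "UCS-4BE".  It
   is not an existing gdb or autoconf name.  */
#if defined (HOST_WCHAR_ENCODING_OVERRIDE)
# define SKETCH_INTERMEDIATE_ENCODING HOST_WCHAR_ENCODING_OVERRIDE
#elif defined (__STDC_ISO_10646__)
/* WORDS_BIGENDIAN normally comes from config.h; without it this
   sketch falls back to the little-endian name.  */
# if defined (WORDS_BIGENDIAN)
#  define SKETCH_INTERMEDIATE_ENCODING "UCS-4BE"
# else
#  define SKETCH_INTERMEDIATE_ENCODING "UCS-4LE"
# endif
#else
# define SKETCH_INTERMEDIATE_ENCODING "wchar_t"
#endif

/* Return the encoding name selected at compile time.  */
static const char *
sketch_intermediate_encoding (void)
{
  const char *name = SKETCH_INTERMEDIATE_ENCODING;

  /* Guard against an override that was defined as an empty string.  */
  assert (name[0] != '\0');
  return name;
}
```

With this ordering, a cross compiler that does not define
__STDC_ISO_10646__ could still be given an explicit encoding name at
configure time, and hosts without an override keep the behaviour of the
earlier patch.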



* Re: iconv returning byte order marks for Solaris 2.9
  2009-07-21 20:18       ` Andrew
  2009-07-24 21:58         ` Tom Tromey
@ 2009-08-14 20:13         ` Tom Tromey
  1 sibling, 0 replies; 7+ messages in thread
From: Tom Tromey @ 2009-08-14 20:13 UTC (permalink / raw)
  To: ke; +Cc: gdb-patches

>>>>> "Andrew" == Andrew  <ke@alum.bu.edu> writes:

Andrew> Thanks for the patch. I'm actually generating Solaris binaries
Andrew> using a cross compiler (from a Linux box), and in my current
Andrew> configuration it doesn't work: __STDC_ISO_10646__ is not defined.

Andrew> I will try installing gcc locally and see how that works.

Hi, what is the status of this?

I'm asking because if I need to make some changes to gdb's configury,
I'd like to have plenty of time before 7.0.

thanks,
Tom

