* iconv returning byte order marks for Solaris 2.9
@ 2009-07-15 18:28 Andrew
2009-07-15 18:57 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: Andrew @ 2009-07-15 18:28 UTC (permalink / raw)
To: gdb-patches
[-- Attachment #1: Type: text/plain, Size: 355 bytes --]
Hi
I found a problem printing strings for gdb 6.8 weekly snapshot
(2009 07 07) on Solaris 2.9.
I eventually found that changing INTERMEDIATE_ENCODING
in gdb_wchar.h to "UCS-4" and applying the following
patch worked. Any comments?
I'm not sure how to handle the INTERMEDIATE_ENCODING
change, since it's probably system dependent.
Andrew
[-- Attachment #2: patch_charset.txt --]
[-- Type: text/plain, Size: 671 bytes --]
diff -rau src.original/gdb/charset.c src/gdb/charset.c
--- src.original/gdb/charset.c 2009-07-15 13:05:42.000896000 -0400
+++ src/gdb/charset.c 2009-07-15 13:09:23.000013000 -0400
@@ -646,6 +646,20 @@
*out_chars = iter->out;
*ptr = orig_inptr;
*len = orig_in - iter->bytes;
+
+ if (num > 1) {
+ if ( (iter->out[0] == (gdb_wchar_t) 0xfffe) ||
+ (iter->out[0] == (gdb_wchar_t) 0xfeff) ) {
+
+ /* iconv returned byte order marks, skip those */
+ int mov;
+ for (mov = 0; mov < (num - 1); mov ++)
+ iter->out[mov] = iter->out[mov + 1];
+
+ num -= 1;
+ }
+ }
+
return num;
}
Only in src/gdb: charset.c.~1.24.~
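For reference, the BOM-skipping step in the patch above can be written as a standalone helper. This is only a sketch under assumptions: it uses plain wchar_t in place of gdb_wchar_t, and skip_leading_bom is a hypothetical name, not a gdb function. Like the patch, it leaves a buffer that holds only a BOM untouched. Note that a byte-swapped U+FEFF reads as 0xFFFE only in a 16-bit unit; in a 32-bit unit it would read as 0xFFFE0000, which neither the patch nor this sketch checks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Drop one leading byte order mark from buf[0..num) and return the
   new element count.  0xFEFF is the BOM itself; 0xFFFE is the same
   mark read with the opposite 16-bit byte order.  As in the patch,
   nothing is removed unless at least one element follows the mark.
   skip_leading_bom is a hypothetical helper, not part of gdb.  */
static size_t
skip_leading_bom (wchar_t *buf, size_t num)
{
  assert (buf != NULL || num == 0);

  if (num > 1
      && (buf[0] == (wchar_t) 0xfffe || buf[0] == (wchar_t) 0xfeff))
    {
      /* Source and destination overlap, so memmove is required.  */
      memmove (buf, buf + 1, (num - 1) * sizeof (wchar_t));
      num -= 1;
    }

  return num;
}
```

Called on a three-element buffer holding 0xfeff, 'h', 'i', the helper returns 2 and leaves 'h', 'i' at the front of the buffer.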
* Re: iconv returning byte order marks for Solaris 2.9
2009-07-15 18:28 iconv returning byte order marks for Solaris 2.9 Andrew
@ 2009-07-15 18:57 ` Tom Tromey
2009-07-16 2:29 ` Andrew
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2009-07-15 18:57 UTC (permalink / raw)
To: ke; +Cc: gdb-patches
>>>>> "Andrew" == Andrew <ke@alum.bu.edu> writes:

Andrew> I found a problem printing strings for gdb 6.8 weekly snapshot
Andrew> (2009 07 07) on Solaris 2.9.

Thanks for finding and diagnosing this.

Andrew> I eventually found that changing INTERMEDIATE_ENCODING
Andrew> in gdb_wchar.h to "UCS-4" and applying the following
Andrew> patch worked. Any comments?

Andrew> I'm not sure how to handle the INTERMEDIATE_ENCODING
Andrew> change, since it's probably system dependent.

I don't have access to Solaris. If I understand correctly, the
situation is:

* wchar_t on Solaris is encoded using UCS-4
* iconv_open accepts "wchar_t" as an encoding name
* in this case, iconv emits a BOM

First, this seems like it must be a Solaris bug, just because I can't
imagine how this would be useful.

I don't think we can use your patch as-is. It does the BOM elimination
unconditionally, but really I think we can only do it on platforms where
we know that wchar_t is UCS-4 (or UCS-2 I suppose).

Does Solaris 9 support a full suite of conversions? If not, one option
would be to use libiconv, and find a way to disable most of this code by
default on Solaris.

Failing that, the simplest fix would be if there is an encoding
(compatible with wchar_t) we can use on Solaris which does not insert
the BOM. For example, maybe "UCS-4BE" or "UCS-4LE", depending on the
architecture. I think a fix like this could be done entirely in
gdb_wchar.h. Could you try that?

As far as the host dependency, we can probably just check
__STDC_ISO_10646__.

Tom
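One way to check the points raised above on a given host is to ask iconv directly whether it accepts an encoding name and whether the converted output begins with a byte order mark. The following is a minimal sketch, not gdb code. It assumes the source encoding name "ASCII" is accepted (some systems may spell it differently, for example "646") and that iconv takes a char ** input pointer (some older systems declare const char ** instead). emits_ucs4_bom is a hypothetical name, and the check is only meaningful for 4-byte target encodings.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the single ASCII character 'A' to the encoding TO_NAME and
   inspect the first four output bytes.
   Returns -1 if iconv_open or iconv fails, or if fewer than four
              bytes were produced,
            1 if the output starts with a UCS-4 byte order mark,
            0 otherwise.
   emits_ucs4_bom is a hypothetical helper, not part of gdb.  */
static int
emits_ucs4_bom (const char *to_name)
{
  static const unsigned char bom_be[4] = { 0x00, 0x00, 0xfe, 0xff };
  static const unsigned char bom_le[4] = { 0xff, 0xfe, 0x00, 0x00 };
  char in[] = "A";
  char out[16];
  char *inp = in;
  char *outp = out;
  size_t inleft = 1;
  size_t outleft = sizeof out;
  iconv_t cd;
  size_t r;

  assert (to_name != NULL);

  /* "ASCII" as the source name is an assumption about the host.  */
  cd = iconv_open (to_name, "ASCII");
  if (cd == (iconv_t) -1)
    return -1;

  r = iconv (cd, &inp, &inleft, &outp, &outleft);
  iconv_close (cd);
  if (r == (size_t) -1 || sizeof out - outleft < 4)
    return -1;

  return (memcmp (out, bom_be, 4) == 0
          || memcmp (out, bom_le, 4) == 0);
}
```

Calling emits_ucs4_bom with "wchar_t", "UCS-4", "UCS-4BE" and "UCS-4LE" in turn would show which names a host accepts and which of them prepend a BOM. The results depend on the iconv implementation, so none are assumed here.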
* Re: iconv returning byte order marks for Solaris 2.9
2009-07-15 18:57 ` Tom Tromey
@ 2009-07-16 2:29 ` Andrew
2009-07-17 19:19 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: Andrew @ 2009-07-16 2:29 UTC (permalink / raw)
To: gdb-patches

In the system I'm working iconv_open doesn't accept "wchar_t"
as encoding name. It failed when INTERMEDIATE_ENCODING
was set to that.

But setting INTERMEDIATE_ENCODING to "UCS-4BE" eliminated
the BOM in the beginning.

Andrew
* Re: iconv returning byte order marks for Solaris 2.9
2009-07-16 2:29 ` Andrew
@ 2009-07-17 19:19 ` Tom Tromey
2009-07-21 20:18 ` Andrew
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2009-07-17 19:19 UTC (permalink / raw)
To: ke; +Cc: gdb-patches
>>>>> "Andrew" == Andrew <ke@alum.bu.edu> writes:

Andrew> In the system I'm working iconv_open doesn't accept "wchar_t" as
Andrew> encoding name. It failed when INTERMEDIATE_ENCODING was set to
Andrew> that.

Ah, thanks.

Andrew> But setting INTERMEDIATE_ENCODING to "UCS-4BE" eliminated the
Andrew> BOM in the beginning.

Great. Could you try the appended patch?
I'm testing it on Linux.

Tom

diff --git a/gdb/gdb_wchar.h b/gdb/gdb_wchar.h
index 07a6c87..241e051 100644
--- a/gdb/gdb_wchar.h
+++ b/gdb/gdb_wchar.h
@@ -35,8 +35,6 @@
    wrappers for the wchar_t functionality we use.  */
 
 
-#define INTERMEDIATE_ENCODING "wchar_t"
-
 #if defined (HAVE_ICONV)
 #include <iconv.h>
 #else
@@ -63,6 +61,20 @@ typedef wint_t gdb_wint_t;
 
 #define LCST(X) L ## X
 
+#ifdef __STDC_ISO_10646__
+/* On Solaris 9, iconv_open does not accept "wchar_t".  So, on this
+   platform, and other platforms where wchar_t is known to use
+   ISO-10646, choose an appropriate explicit charset name.  Also,
+   UCS-4 on Solaris will emit a BOM, which we don't want.  So, we
+   choose an explicit little- or big-endian variant, depending on the
+   host.  */
+#if WORDS_BIGENDIAN
+#define INTERMEDIATE_ENCODING "UCS-4BE"
+#else
+#define INTERMEDIATE_ENCODING "UCS-4LE"
+#endif
+#endif
+
 #else
 
 typedef char gdb_wchar_t;
@@ -87,4 +99,8 @@ typedef int gdb_wint_t;
 
 #endif
 
+#ifndef INTERMEDIATE_ENCODING
+#define INTERMEDIATE_ENCODING "wchar_t"
+#endif
+
 #endif /* GDB_WCHAR_H */
* Re: iconv returning byte order marks for Solaris 2.9
2009-07-17 19:19 ` Tom Tromey
@ 2009-07-21 20:18 ` Andrew
2009-07-24 21:58 ` Tom Tromey
2009-08-14 20:13 ` Tom Tromey
0 siblings, 2 replies; 7+ messages in thread
From: Andrew @ 2009-07-21 20:18 UTC (permalink / raw)
To: gdb-patches

Thanks for the patch. I'm actually generating solaris binaries
using a cross compiler (from a Linux box) and in my current
configuration it doesn't work. __STDC_ISO_10646__ is not defined.

I will try installing gcc locally and see how that works.

Andrew
* Re: iconv returning byte order marks for Solaris 2.9
2009-07-21 20:18 ` Andrew
@ 2009-07-24 21:58 ` Tom Tromey
2009-08-14 20:13 ` Tom Tromey
1 sibling, 0 replies; 7+ messages in thread
From: Tom Tromey @ 2009-07-24 21:58 UTC (permalink / raw)
To: ke; +Cc: gdb-patches
>>>>> "Andrew" == Andrew <ke@alum.bu.edu> writes:

Andrew> Thanks for the patch. I'm actually generating solaris binaries
Andrew> using a cross compiler (from a Linux box) and in my current
Andrew> configuration it doesn't work. __STDC_ISO_10646__ is not defined.

Ouch. This seems like a QoI issue in the cross compiler. But, it is
hard to call it a bug exactly.

I don't know the best thing to do in this case. I suppose we could
make a new variable used by configure.host that would override the name
of the wchar_t encoding.

Tom
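The override idea mentioned above could look roughly like the following at the preprocessor level. This is a sketch under stated assumptions, not what gdb does: GDB_WCHAR_ENCODING_OVERRIDE is a hypothetical macro that a build could define (for example from a configure-time setting), WORDS_BIGENDIAN is expected to come from an autoconf-generated config.h, and the __BYTE_ORDER__ test is a fallback that relies on macros predefined by GCC and Clang.

```c
#include <assert.h>

/* GDB_WCHAR_ENCODING_OVERRIDE is a hypothetical macro, not part of
   gdb.  A build could define it explicitly, e.g. with
   -DGDB_WCHAR_ENCODING_OVERRIDE='"UCS-4BE"', when the compiler does
   not define __STDC_ISO_10646__ for the host.  */
#if defined (GDB_WCHAR_ENCODING_OVERRIDE)
#define INTERMEDIATE_ENCODING GDB_WCHAR_ENCODING_OVERRIDE
#elif defined (__STDC_ISO_10646__)
/* wchar_t is known to hold ISO-10646 values, so pick an explicit
   endianness to avoid a BOM in the converted output.  */
#if defined (WORDS_BIGENDIAN) \
    || (defined (__BYTE_ORDER__) \
        && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define INTERMEDIATE_ENCODING "UCS-4BE"
#else
#define INTERMEDIATE_ENCODING "UCS-4LE"
#endif
#else
/* Nothing is known about wchar_t; fall back to the generic name.  */
#define INTERMEDIATE_ENCODING "wchar_t"
#endif

/* Return the encoding name chosen at compile time.  */
static const char *
intermediate_encoding (void)
{
  const char *name = INTERMEDIATE_ENCODING;

  assert (name[0] != '\0');
  return name;
}
```

With this ordering an explicit override always wins, so a cross build where __STDC_ISO_10646__ is missing can still select "UCS-4BE" without touching the header.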
* Re: iconv returning byte order marks for Solaris 2.9
2009-07-21 20:18 ` Andrew
2009-07-24 21:58 ` Tom Tromey
@ 2009-08-14 20:13 ` Tom Tromey
1 sibling, 0 replies; 7+ messages in thread
From: Tom Tromey @ 2009-08-14 20:13 UTC (permalink / raw)
To: ke; +Cc: gdb-patches
>>>>> "Andrew" == Andrew <ke@alum.bu.edu> writes:

Andrew> Thanks for the patch. I'm actually generating solaris binaries
Andrew> using a cross compiler (from a Linux box) and in my current
Andrew> configuration it doesn't work. __STDC_ISO_10646__ is not defined.

Andrew> I will try installing gcc locally and see how that works.

Hi, what is the status of this?

I'm asking because if I need to make some changes to gdb's configury,
I'd like to have plenty of time before 7.0.

thanks,
Tom