printing 0xbeef wchar_t on x86-windows...

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* printing 0xbeef wchar_t on x86-windows...
@ 2012-10-15 19:01 Joel Brobecker
  2012-10-15 19:46 ` Eli Zaretskii
  2012-10-16 20:43 ` Tom Tromey
  0 siblings, 2 replies; 12+ messages in thread
From: Joel Brobecker @ 2012-10-15 19:01 UTC (permalink / raw)
  To: gdb-patches; +Cc: Tom Tromey

[-- Attachment #1: Type: text/plain, Size: 3203 bytes --]

Hello,

I have a variable of type wchar_t whose value is 0xbeef, simply
defined as follow:

    wchar_t single = 0xbeef;

But with the current HEAD, I get:

    (gdb) print single
    $5 = 48879 L'\357'

In chronological order:

  * valprint.c:generic_emit_char calls wchar_iterate, and finds
    one valid character according to the intermediate encoding
    ("wchar_t"), even though the character isn't valid in the
    original/target charset ("CP1252").

  * valprint.c:print_wchar then checks whether the character is
    printable or not. If it wasn't, then print_wchar would have
    converted the multi-byte sequence into a hex string image.
    But unfortunately for us, Window's iswprint likes 0xbeef as
    printable, as so print_wchar puts it in the buffer as is to
    be printed.

  * Before actually printing the buffer, generic_emit_char converts
    the string from the intermediate encoding into the host encoding,
    which is "CP1252". The converstion routine now finds that,
    although the multi-bypte sequence is printable, it isn't valid
    in the target encoding (iconv returns EILSEQ), and thus
    replaces the wchar by a string with a sequence of octal numbers,
    one for each byte. For instance \357 is 0xef.

    But the problem is that convert_between_encodings was called
    with the width set to 1, instead of using the character type's
    size.

With the attached patch, we now get the following output...

    (gdb) print single
    $2 = 48879 L'\357\276'

... which is no longer missing half of the wide character value.

For completeness' sake, GDB 7.5 used to produce the following output:

    (gdb) print single
    $2 = 48879 L'\xbeef'

I prefer this output, as it provides the wide character as one number,
rather than two. The reason why GDB 7.5 presented the value this way
is because it took a different path during the initial iteration, thanks
to the fact that the intermediate encoding was "CP1252" instead of
"wchar_t", making the character invalid the whole way. This comes from
a change in defs.h which added an include of build-gnulib/config.h,
which itself caused HAVE_WCHAR_H to be defined, thus influencing
the intermediate encoding.

I have a feeling that going back to "CP1252" as the intermediate
encoding isn't something that we'd like to do. What I explored for
a while, was the idea of having convert_between_encodings transform
invalid sequences into one single number, the same way print_wchar
does. But I think that there is an endianness issue - not sure -
as we don't really know whether the buffer is following the target
or host endinaness. We need that piece of info in order to extract
the wide character's value.

Nonetheless, I think that this can be looked at separately if desired.
In the meantime, the following patch updates the calls to
convert_between_encodings to pass the correct width, and the new
output is already an improvement. So I think that the attached
patch is worth checking in on its own.

gdb/ChangeLog:

        * valprint.c (generic_emit_char): Pass correct width in call to
        convert_between_encodings.
        (generic_printstr): Likewise.

Tested on x86-linux. OK to commit?

Thanks,
-- 
Joel

[-- Attachment #2: wchar-0xbeef.diff --]
[-- Type: text/x-diff, Size: 934 bytes --]

diff --git a/gdb/valprint.c b/gdb/valprint.c
index 6e651f6..31cef54 100644
--- a/gdb/valprint.c
+++ b/gdb/valprint.c
@@ -2037,7 +2037,7 @@ generic_emit_char (int c, struct type *type, struct ui_file *stream,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
 			     obstack_base (&wchar_buf),
 			     obstack_object_size (&wchar_buf),
-			     1, &output, translit_char);
+			     TYPE_LENGTH (type), &output, translit_char);
   obstack_1grow (&output, '\0');

   fputs_filtered (obstack_base (&output), stream);
@@ -2278,7 +2278,7 @@ generic_printstr (struct ui_file *stream, struct type *type,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
 			     obstack_base (&wchar_buf),
 			     obstack_object_size (&wchar_buf),
-			     1, &output, translit_char);
+			     width, &output, translit_char);
   obstack_1grow (&output, '\0');

   fputs_filtered (obstack_base (&output), stream);

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-15 19:01 printing 0xbeef wchar_t on x86-windows Joel Brobecker
@ 2012-10-15 19:46 ` Eli Zaretskii
  2012-10-15 20:14   ` Joel Brobecker
  2012-10-16 20:43 ` Tom Tromey
  1 sibling, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2012-10-15 19:46 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches, tromey

> Date: Mon, 15 Oct 2012 12:00:52 -0700
> From: Joel Brobecker <brobecker@adacore.com>
> Cc: Tom Tromey <tromey@redhat.com>
> 
>     wchar_t single = 0xbeef;
> 
> But with the current HEAD, I get:
> 
>     (gdb) print single
>     $5 = 48879 L'\357'
> 
> In chronological order:
> 
>   * valprint.c:generic_emit_char calls wchar_iterate, and finds
>     one valid character according to the intermediate encoding
>     ("wchar_t"), even though the character isn't valid in the
>     original/target charset ("CP1252").

How would cp1252 enter the picture, when you are talking about a
wchar_t data type?

>     But unfortunately for us, Window's iswprint likes 0xbeef as
>     printable

This happens to be a Unicode codepoint of a Hangul word-constituent
character.  That's what you get for putting random values into wchar_t
data type ;-)

>     But the problem is that convert_between_encodings was called
>     with the width set to 1, instead of using the character type's
>     size.
> 
> With the attached patch, we now get the following output...
> 
>     (gdb) print single
>     $2 = 48879 L'\357\276'
> 
> ... which is no longer missing half of the wide character value.

I guess that's the right output, so long as your output charset does
not support that Hangul character.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-15 19:46 ` Eli Zaretskii
@ 2012-10-15 20:14   ` Joel Brobecker
  0 siblings, 0 replies; 12+ messages in thread
From: Joel Brobecker @ 2012-10-15 20:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches, tromey

> >   * valprint.c:generic_emit_char calls wchar_iterate, and finds
> >     one valid character according to the intermediate encoding
> >     ("wchar_t"), even though the character isn't valid in the
> >     original/target charset ("CP1252").
> 
> How would cp1252 enter the picture, when you are talking about a
> wchar_t data type?

It's the default target charset on this host (see "set/show
host-charset"). That's pretty much all I know.

-- 
Joel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-15 19:01 printing 0xbeef wchar_t on x86-windows Joel Brobecker
  2012-10-15 19:46 ` Eli Zaretskii
@ 2012-10-16 20:43 ` Tom Tromey
  2012-10-16 22:43   ` Joel Brobecker
  2012-10-16 23:31   ` Joel Brobecker
  1 sibling, 2 replies; 12+ messages in thread
From: Tom Tromey @ 2012-10-16 20:43 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Joel>   * valprint.c:generic_emit_char calls wchar_iterate, and finds
Joel>     one valid character according to the intermediate encoding
Joel>     ("wchar_t"), even though the character isn't valid in the
Joel>     original/target charset ("CP1252").

FWIW I think Eli's analysis here is correct.

generic_emit_char should be assuming that the character is in the target
wide charset, not in the target charset.  That is, "show
target-wide-charset".

If the 'encoding' argument to generic_emit_char is "CP1252" then I think
something went wrong earlier.

Joel>   * Before actually printing the buffer, generic_emit_char converts
Joel>     the string from the intermediate encoding into the host encoding,
Joel>     which is "CP1252". The converstion routine now finds that,
Joel>     although the multi-bypte sequence is printable, it isn't valid
Joel>     in the target encoding (iconv returns EILSEQ), and thus

Must be the host encoding here, not the target encoding?

Joel>     But the problem is that convert_between_encodings was called
Joel>     with the width set to 1, instead of using the character type's
Joel>     size.

This does seem wrong.  But, I don't think that using the type length
here is correct, either.

The width argument to convert_between_encodings is documented as:

   WIDTH is the width of a character from the FROM charset, in bytes.
   For a variable width encoding, WIDTH should be the size of a "base
   character".

(I didn't check whether this comment is accurate.)

And, this call to convert_between_encodings is converting from the
intermediate charset to the host charset.  So, I think this should be
sizeof (gdb_wchar_t).

Before putting something like that in, though, I would like to look at
Keith's pending patch that reworks this code.  Maybe he already fixed
the bug.

Also, I think this should have a regression test.

Joel> For completeness' sake, GDB 7.5 used to produce the following output:
Joel>     (gdb) print single
Joel>     $2 = 48879 L'\xbeef'
Joel> I prefer this output, as it provides the wide character as one number,
Joel> rather than two.

Offhandedly I agree, but my recollection is that all these little
decisions have some logic behind them (though sometimes just "that's how
it used to work"), and so you have to dig down to see what the change
would really imply.

Joel> The reason why GDB 7.5 presented the value this way
Joel> is because it took a different path during the initial iteration, thanks
Joel> to the fact that the intermediate encoding was "CP1252" instead of
Joel> "wchar_t", making the character invalid the whole way. This comes from
Joel> a change in defs.h which added an include of build-gnulib/config.h,
Joel> which itself caused HAVE_WCHAR_H to be defined, thus influencing
Joel> the intermediate encoding.

This area is quite fiddly unfortunately.

It sounds like the recent gnulib imports have invalidated some of the
logic in gdb_wchar.h.  It seems that we can now always rely on wchar.h
being available.  So maybe we could at least remove some configury and
#ifs.

Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-16 20:43 ` Tom Tromey
@ 2012-10-16 22:43   ` Joel Brobecker
  2012-10-17  1:37     ` Tom Tromey
  2012-10-16 23:31   ` Joel Brobecker
  1 sibling, 1 reply; 12+ messages in thread
From: Joel Brobecker @ 2012-10-16 22:43 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

Thanks for the review, Tom.

> generic_emit_char should be assuming that the character is in the target
> wide charset, not in the target charset.  That is, "show
> target-wide-charset".

> If the 'encoding' argument to generic_emit_char is "CP1252" then I think
> something went wrong earlier.

I will double-check what is exactly going on. I only very recently
discovered the target-wide-charset setting (and will post a patch
for AIX soon), so I probably still only have a partial picture.

> Joel>   * Before actually printing the buffer, generic_emit_char converts
> Joel>     the string from the intermediate encoding into the host encoding,
> Joel>     which is "CP1252". The converstion routine now finds that,
> Joel>     although the multi-bypte sequence is printable, it isn't valid
> Joel>     in the target encoding (iconv returns EILSEQ), and thus
> 
> Must be the host encoding here, not the target encoding?

Yes - poor (overloaded) choice of terms in this case. I shoud probably
have used "destination"...

> And, this call to convert_between_encodings is converting from the
> intermediate charset to the host charset.  So, I think this should be
> sizeof (gdb_wchar_t).

I keep getting confused over these...

> Before putting something like that in, though, I would like to look at
> Keith's pending patch that reworks this code.  Maybe he already fixed
> the bug.

OK. I couldn't find the patch in question, so I couldn't test it.

> Also, I think this should have a regression test.

We actually already have one (see wchar.exp).

-- 
Joel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-16 20:43 ` Tom Tromey
  2012-10-16 22:43   ` Joel Brobecker
@ 2012-10-16 23:31   ` Joel Brobecker
  2012-10-17  1:38     ` Tom Tromey
  1 sibling, 1 reply; 12+ messages in thread
From: Joel Brobecker @ 2012-10-16 23:31 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

> Joel>   * valprint.c:generic_emit_char calls wchar_iterate, and finds
> Joel>     one valid character according to the intermediate encoding
> Joel>     ("wchar_t"), even though the character isn't valid in the
> Joel>     original/target charset ("CP1252").

> generic_emit_char should be assuming that the character is in the target
> wide charset, not in the target charset.  That is, "show
> target-wide-charset".
> 
> If the 'encoding' argument to generic_emit_char is "CP1252" then I think
> something went wrong earlier.

OK, small correction: generic_emit_char was called with 'encoding'
set to "UTF-16LE", which makes sense, given that it is what the
windows (actually cygwin) -tdep file explicitly sets it to.

I probably got confused in my notes with what was happening with
GDB 7.5, or maybe just got confused period.

Other than that, I think that the rest remains accurate, so it seem
that...

> And, this call to convert_between_encodings is converting from the
> intermediate charset to the host charset.  So, I think this should be
> sizeof (gdb_wchar_t).

... would be the way to go, assuming that we're not waiting for Keith's
patches. A small request: If Keith's patch is still some ways off, I'd
love to have a fix put in while we wait. This bug, and a few other
charset-related ones, have been nagging at me for a while, and with
more important tasks, traveling and, hum, holidays, I haven't had
the time to follow up as quickly as I'd like...

Cheers!
-- 
Joel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-16 22:43   ` Joel Brobecker
@ 2012-10-17  1:37     ` Tom Tromey
  2012-10-17 14:58       ` Joel Brobecker
  0 siblings, 1 reply; 12+ messages in thread
From: Tom Tromey @ 2012-10-17  1:37 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Tom> Before putting something like that in, though, I would like to look at
Tom> Keith's pending patch that reworks this code.  Maybe he already fixed
Tom> the bug.

Joel> OK. I couldn't find the patch in question, so I couldn't test it.

It is here:

http://sourceware.org/ml/gdb-patches/2012-08/msg00780.html

Tom> Also, I think this should have a regression test.

Joel> We actually already have one (see wchar.exp).

Oh super, thanks.

Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-16 23:31   ` Joel Brobecker
@ 2012-10-17  1:38     ` Tom Tromey
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Tromey @ 2012-10-17  1:38 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Joel> A small request: If Keith's patch is still some ways off, I'd
Joel> love to have a fix put in while we wait. This bug, and a few other
Joel> charset-related ones, have been nagging at me for a while, and with
Joel> more important tasks, traveling and, hum, holidays, I haven't had
Joel> the time to follow up as quickly as I'd like...

I will try to look at it this week.
Just glancing at it now, it seems like it doesn't touch
generic_emit_char, so we'll at least need the fix there...

Tom

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-17  1:37     ` Tom Tromey
@ 2012-10-17 14:58       ` Joel Brobecker
  2012-10-17 18:28         ` Tom Tromey
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Brobecker @ 2012-10-17 14:58 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

> Tom> Before putting something like that in, though, I would like to look at
> Tom> Keith's pending patch that reworks this code.  Maybe he already fixed
> Tom> the bug.
> 
> http://sourceware.org/ml/gdb-patches/2012-08/msg00780.html

Thanks! I gave it a whirl, JIC, and as you suspected, it does not
fix the 0xdeadbeef issue on Windows. It looks like a nice improvement,
however - with maybe one comment which I will send by replying
to the email...

-- 
Joel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-17 14:58       ` Joel Brobecker
@ 2012-10-17 18:28         ` Tom Tromey
  2012-10-17 18:43           ` Joel Brobecker
  0 siblings, 1 reply; 12+ messages in thread
From: Tom Tromey @ 2012-10-17 18:28 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Joel> Thanks! I gave it a whirl, JIC, and as you suspected, it does not
Joel> fix the 0xdeadbeef issue on Windows. It looks like a nice improvement,
Joel> however - with maybe one comment which I will send by replying
Joel> to the email...

Thanks.

FWIW the appended builds and regtests ok; but really it needs testing in
your environment.

Tom

diff --git a/gdb/valprint.c b/gdb/valprint.c
index 6e651f6..583329d 100644
--- a/gdb/valprint.c
+++ b/gdb/valprint.c
@@ -2037,7 +2037,7 @@ generic_emit_char (int c, struct type *type, struct ui_file *stream,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
 			     obstack_base (&wchar_buf),
 			     obstack_object_size (&wchar_buf),
-			     1, &output, translit_char);
+			     sizeof (gdb_wchar_t), &output, translit_char);
   obstack_1grow (&output, '\0');
 
   fputs_filtered (obstack_base (&output), stream);
@@ -2278,7 +2278,7 @@ generic_printstr (struct ui_file *stream, struct type *type,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
 			     obstack_base (&wchar_buf),
 			     obstack_object_size (&wchar_buf),
-			     1, &output, translit_char);
+			     sizeof (gdb_wchar_t), &output, translit_char);
   obstack_1grow (&output, '\0');
 
   fputs_filtered (obstack_base (&output), stream);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-17 18:28         ` Tom Tromey
@ 2012-10-17 18:43           ` Joel Brobecker
  2012-10-17 19:20             ` Tom Tromey
  0 siblings, 1 reply; 12+ messages in thread
From: Joel Brobecker @ 2012-10-17 18:43 UTC (permalink / raw)
  To: Tom Tromey; +Cc: gdb-patches

> FWIW the appended builds and regtests ok; but really it needs testing in
> your environment.

Already done :-). But JIC, I applied your patch, and tested it again,
and same (good) results.

Would you like me to commit it for you?

Thanks, Tom.

> diff --git a/gdb/valprint.c b/gdb/valprint.c
> index 6e651f6..583329d 100644
> --- a/gdb/valprint.c
> +++ b/gdb/valprint.c
> @@ -2037,7 +2037,7 @@ generic_emit_char (int c, struct type *type, struct ui_file *stream,
>    convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
>  			     obstack_base (&wchar_buf),
>  			     obstack_object_size (&wchar_buf),
> -			     1, &output, translit_char);
> +			     sizeof (gdb_wchar_t), &output, translit_char);
>    obstack_1grow (&output, '\0');
>  
>    fputs_filtered (obstack_base (&output), stream);
> @@ -2278,7 +2278,7 @@ generic_printstr (struct ui_file *stream, struct type *type,
>    convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
>  			     obstack_base (&wchar_buf),
>  			     obstack_object_size (&wchar_buf),
> -			     1, &output, translit_char);
> +			     sizeof (gdb_wchar_t), &output, translit_char);
>    obstack_1grow (&output, '\0');
>  
>    fputs_filtered (obstack_base (&output), stream);

-- 
Joel


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: printing 0xbeef wchar_t on x86-windows...
  2012-10-17 18:43           ` Joel Brobecker
@ 2012-10-17 19:20             ` Tom Tromey
  0 siblings, 0 replies; 12+ messages in thread
From: Tom Tromey @ 2012-10-17 19:20 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

Joel> Would you like me to commit it for you?

I'll do it shortly.

Tom


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-10-17 19:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-15 19:01 printing 0xbeef wchar_t on x86-windows Joel Brobecker
2012-10-15 19:46 ` Eli Zaretskii
2012-10-15 20:14   ` Joel Brobecker
2012-10-16 20:43 ` Tom Tromey
2012-10-16 22:43   ` Joel Brobecker
2012-10-17  1:37     ` Tom Tromey
2012-10-17 14:58       ` Joel Brobecker
2012-10-17 18:28         ` Tom Tromey
2012-10-17 18:43           ` Joel Brobecker
2012-10-17 19:20             ` Tom Tromey
2012-10-16 23:31   ` Joel Brobecker
2012-10-17  1:38     ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox