From: Joel Brobecker <brobecker@adacore.com>
To: gdb-patches@sourceware.org
Cc: Tom Tromey <tromey@redhat.com>
Subject: printing 0xbeef wchar_t on x86-windows...
Date: Mon, 15 Oct 2012 19:01:00 -0000
Message-ID: <20121015190052.GH3034@adacore.com>
Hello,
I have a variable of type wchar_t whose value is 0xbeef, simply
defined as follows:
wchar_t single = 0xbeef;
But with the current HEAD, I get:
(gdb) print single
$5 = 48879 L'\357'
In chronological order:
* valprint.c:generic_emit_char calls wchar_iterate, and finds
one valid character according to the intermediate encoding
("wchar_t"), even though the character isn't valid in the
original/target charset ("CP1252").
* valprint.c:print_wchar then checks whether the character is
printable or not. If it weren't, print_wchar would have
converted the multi-byte sequence into a hex string image.
But unfortunately for us, Windows' iswprint considers 0xbeef
printable, and so print_wchar puts it in the buffer as is to
be printed.
* Before actually printing the buffer, generic_emit_char converts
the string from the intermediate encoding into the host encoding,
which is "CP1252". The conversion routine now finds that,
although the multi-byte sequence is printable, it isn't valid
in the target encoding (iconv returns EILSEQ), and thus
replaces the wchar by a string with a sequence of octal numbers,
one for each byte. For instance \357 is 0xef.
But the problem is that convert_between_encodings was called
with the width set to 1, instead of using the character type's
size.
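To illustrate why the width matters, here is a minimal sketch. It is
not GDB's actual code: escape_invalid_bytes is a hypothetical helper,
and its name and signature are assumptions made for this example only.
It writes one octal escape per byte for the first WIDTH bytes of an
invalid sequence, which is the fallback behavior described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GDB): write one octal escape per
   byte for the first WIDTH bytes of BUF into OUT, NUL-terminated.
   Returns the number of characters written (excluding the NUL), or 0
   if OUT is too small, in which case OUT may hold a partial result.  */
size_t
escape_invalid_bytes (const unsigned char *buf, size_t width,
                      char *out, size_t out_size)
{
  size_t used = 0;
  size_t i;

  assert (out != NULL && out_size > 0);
  out[0] = '\0';
  for (i = 0; i < width; i++)
    {
      char esc[5];   /* Backslash, three octal digits, NUL.  */

      if (out_size - used < sizeof esc)
        return 0;
      snprintf (esc, sizeof esc, "\\%03o", (unsigned int) (buf[i] & 0xff));
      memcpy (out + used, esc, sizeof esc);
      used += sizeof esc - 1;
    }
  return used;
}
```

Given the two bytes 0xef 0xbe (0xbeef as stored on a little-endian
target), a width of 1 yields only \357, whereas a width of 2 yields
\357\276.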
With the attached patch, we now get the following output...
(gdb) print single
$2 = 48879 L'\357\276'
... which is no longer missing half of the wide character value.
For completeness' sake, GDB 7.5 used to produce the following output:
(gdb) print single
$2 = 48879 L'\xbeef'
I prefer this output, as it provides the wide character as one number,
rather than two. GDB 7.5 presented the value this way because it
took a different path during the initial iteration: the intermediate
encoding was "CP1252" instead of "wchar_t", making the character
invalid the whole way. The switch to "wchar_t" comes from a change
in defs.h which added an include of build-gnulib/config.h,
which itself caused HAVE_WCHAR_H to be defined, thus influencing
the intermediate encoding.
I have a feeling that going back to "CP1252" as the intermediate
encoding isn't something that we'd like to do. What I explored for
a while, was the idea of having convert_between_encodings transform
invalid sequences into one single number, the same way print_wchar
does. But I think that there is an endianness issue - not sure -
as we don't really know whether the buffer follows the target
or host endianness. We need that piece of info in order to extract
the wide character's value.
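As a rough sketch of the endianness concern (again, not GDB code: the
byte_order enum and extract_wide_char are hypothetical names used only
for illustration), the same two bytes give different values depending
on the byte order assumed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte-order tag, for illustration only.  */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Hypothetical helper: assemble the WIDTH bytes at BUF into a single
   unsigned value, reading them according to ORDER.  */
unsigned long
extract_wide_char (const unsigned char *buf, size_t width,
                   enum byte_order order)
{
  unsigned long value = 0;
  size_t i;

  assert (width >= 1 && width <= sizeof (unsigned long));
  for (i = 0; i < width; i++)
    {
      /* Walk from the most significant byte to the least.  */
      size_t idx = (order == ORDER_BIG) ? i : width - 1 - i;

      value = (value << 8) | buf[idx];
    }
  return value;
}
```

With the bytes 0xef 0xbe, ORDER_LITTLE gives 0xbeef but ORDER_BIG
gives 0xefbe, so guessing the wrong order would print the wrong
number.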
Nonetheless, I think that this can be looked at separately if desired.
In the meantime, the following patch updates the calls to
convert_between_encodings to pass the correct width, and the new
output is already an improvement. So I think that the attached
patch is worth checking in on its own.
gdb/ChangeLog:
* valprint.c (generic_emit_char): Pass correct width in call to
convert_between_encodings.
(generic_printstr): Likewise.
Tested on x86-linux. OK to commit?
Thanks,
--
Joel
[-- Attachment #2: wchar-0xbeef.diff --]
[-- Type: text/x-diff, Size: 934 bytes --]
diff --git a/gdb/valprint.c b/gdb/valprint.c
index 6e651f6..31cef54 100644
--- a/gdb/valprint.c
+++ b/gdb/valprint.c
@@ -2037,7 +2037,7 @@ generic_emit_char (int c, struct type *type, struct ui_file *stream,
convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
obstack_base (&wchar_buf),
obstack_object_size (&wchar_buf),
- 1, &output, translit_char);
+ TYPE_LENGTH (type), &output, translit_char);
obstack_1grow (&output, '\0');
fputs_filtered (obstack_base (&output), stream);
@@ -2278,7 +2278,7 @@ generic_printstr (struct ui_file *stream, struct type *type,
convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
obstack_base (&wchar_buf),
obstack_object_size (&wchar_buf),
- 1, &output, translit_char);
+ width, &output, translit_char);
obstack_1grow (&output, '\0');
fputs_filtered (obstack_base (&output), stream);