From: Tom Tromey <tromey@redhat.com>
To: Joel Brobecker <brobecker@adacore.com>
Cc: gdb-patches@sourceware.org
Subject: Re: printing 0xbeef wchar_t on x86-windows...
Date: Tue, 16 Oct 2012 20:43:00 -0000
Message-ID: <87wqyq6tcl.fsf@fleche.redhat.com>
In-Reply-To: <20121015190052.GH3034@adacore.com> (Joel Brobecker's message of "Mon, 15 Oct 2012 12:00:52 -0700")
>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:
Joel> * valprint.c:generic_emit_char calls wchar_iterate, and finds
Joel> one valid character according to the intermediate encoding
Joel> ("wchar_t"), even though the character isn't valid in the
Joel> original/target charset ("CP1252").
FWIW I think Eli's analysis here is correct.
generic_emit_char should be assuming that the character is in the target
wide charset, not in the target charset. That is, "show
target-wide-charset".
If the 'encoding' argument to generic_emit_char is "CP1252" then I think
something went wrong earlier.
Joel> * Before actually printing the buffer, generic_emit_char converts
Joel> the string from the intermediate encoding into the host encoding,
Joel> which is "CP1252". The conversion routine now finds that,
Joel> although the multi-byte sequence is printable, it isn't valid
Joel> in the target encoding (iconv returns EILSEQ), and thus
Must be the host encoding here, not the target encoding?
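To make the EILSEQ step concrete, here is a minimal sketch using plain POSIX iconv. It is not GDB code: try_convert_wchar is a hypothetical helper, and it assumes the local iconv accepts "WCHAR_T" and "CP1252" as encoding names.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of GDB: try to convert the single
   wide character WC to the charset named TO.  Returns 0 on success,
   the errno value set by iconv on a conversion error, or -1 if this
   iconv does not support the requested conversion.  */
static int
try_convert_wchar (wchar_t wc, const char *to,
                   char *out, size_t out_size, size_t *out_len)
{
  assert (to != NULL && out != NULL && out_len != NULL);

  /* Assumption: "WCHAR_T" names the platform's wchar_t encoding.  */
  iconv_t cd = iconv_open (to, "WCHAR_T");
  if (cd == (iconv_t) -1)
    return -1;

  char *in = (char *) &wc;
  size_t in_left = sizeof wc;
  char *out_ptr = out;
  size_t out_left = out_size;
  int result = 0;

  if (iconv (cd, &in, &in_left, &out_ptr, &out_left) == (size_t) -1)
    result = errno;

  *out_len = out_size - out_left;
  iconv_close (cd);
  return result;
}
```

Calling try_convert_wchar (L'A', "CP1252", ...) should succeed and yield one byte. Calling it with (wchar_t) 0xbeef is the interesting case: on an iconv that rejects unconvertible characters I would expect EILSEQ, since CP1252 has no such character, but some implementations substitute a replacement character instead, so treat the exact result as platform-dependent.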
Joel> But the problem is that convert_between_encodings was called
Joel> with the width set to 1, instead of using the character type's
Joel> size.
This does seem wrong. But, I don't think that using the type length
here is correct, either.
The width argument to convert_between_encodings is documented as:
    WIDTH is the width of a character from the FROM charset, in bytes.
    For a variable width encoding, WIDTH should be the size of a "base
    character".
(I didn't check whether this comment is accurate.)
And, this call to convert_between_encodings is converting from the
intermediate charset to the host charset, so the FROM charset is the
intermediate one. So, I think the width should be
sizeof (gdb_wchar_t).
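To illustrate why the width matters, here is a small sketch. It is not GDB's code: format_escapes is a hypothetical helper that formats a byte buffer as hex escapes, treating every WIDTH bytes as one little-endian character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of GDB: format LEN bytes from BUF as
   hex escapes, treating every WIDTH bytes as one little-endian
   character.  Returns the number of escapes written to OUT, or -1 if
   the arguments are inconsistent or OUT is too small.  */
static int
format_escapes (const unsigned char *buf, size_t len, size_t width,
                char *out, size_t out_size)
{
  size_t used = 0;
  int count = 0;

  assert (buf != NULL && out != NULL);
  if (width == 0 || width > sizeof (unsigned long)
      || len % width != 0 || out_size == 0)
    return -1;
  out[0] = '\0';

  for (size_t i = 0; i < len; i += width)
    {
      unsigned long value = 0;

      /* Assemble one character, most significant byte first.  */
      for (size_t j = width; j-- > 0;)
        value = (value << 8) | buf[i + j];

      int n = snprintf (out + used, out_size - used, "\\x%0*lx",
                        (int) (2 * width), value);
      if (n < 0 || (size_t) n >= out_size - used)
        return -1;
      used += (size_t) n;
      count++;
    }
  return count;
}
```

Given the two bytes 0xef 0xbe (0xbeef in little-endian order), a width of 1 yields two escapes, \xef\xbe, while a width of 2 yields the single escape \xbeef. That is the same kind of split Joel is seeing when the width is passed as 1.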
Before putting something like that in, though, I would like to look at
Keith's pending patch that reworks this code. Maybe he already fixed
the bug.
Also, I think this should have a regression test.
Joel> For completeness' sake, GDB 7.5 used to produce the following output:
Joel> (gdb) print single
Joel> $2 = 48879 L'\xbeef'
Joel> I prefer this output, as it provides the wide character as one number,
Joel> rather than two.
Offhand I agree, but my recollection is that all these little
decisions have some logic behind them (though sometimes just "that's how
it used to work"), and so you have to dig down to see what the change
would really imply.
Joel> The reason why GDB 7.5 presented the value this way
Joel> is because it took a different path during the initial iteration, thanks
Joel> to the fact that the intermediate encoding was "CP1252" instead of
Joel> "wchar_t", making the character invalid the whole way. This comes from
Joel> a change in defs.h which added an include of build-gnulib/config.h,
Joel> which itself caused HAVE_WCHAR_H to be defined, thus influencing
Joel> the intermediate encoding.
This area is quite fiddly unfortunately.
It sounds like the recent gnulib imports have invalidated some of the
logic in gdb_wchar.h. It seems that we can now always rely on wchar.h
being available. So maybe we could at least remove some configury and
#ifs.
Tom
Thread overview: 12+ messages
2012-10-15 19:01 Joel Brobecker
2012-10-15 19:46 ` Eli Zaretskii
2012-10-15 20:14 ` Joel Brobecker
2012-10-16 20:43 ` Tom Tromey [this message]
2012-10-16 22:43 ` Joel Brobecker
2012-10-17 1:37 ` Tom Tromey
2012-10-17 14:58 ` Joel Brobecker
2012-10-17 18:28 ` Tom Tromey
2012-10-17 18:43 ` Joel Brobecker
2012-10-17 19:20 ` Tom Tromey
2012-10-16 23:31 ` Joel Brobecker
2012-10-17 1:38 ` Tom Tromey