From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-95251-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 4510 invoked by alias); 16 Oct 2012 20:43:43 -0000
Received: (qmail 4500 invoked by uid 22791); 16 Oct 2012 20:43:42 -0000
X-SWARE-Spam-Status: No, hits=-6.9 required=5.0	tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,RCVD_IN_DNSWL_HI,RCVD_IN_HOSTKARMA_W,RP_MATCHES_RCVD,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 16 Oct 2012 20:43:34 +0000
Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id q9GKhW6c014920	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Tue, 16 Oct 2012 16:43:33 -0400
Received: from barimba (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1])	by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id q9GKhMFi029570	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NO);	Tue, 16 Oct 2012 16:43:27 -0400
From: Tom Tromey <tromey@redhat.com>
To: Joel Brobecker <brobecker@adacore.com>
Cc: gdb-patches@sourceware.org
Subject: Re: printing 0xbeef wchar_t on x86-windows...
References: <20121015190052.GH3034@adacore.com>
Date: Tue, 16 Oct 2012 20:43:00 -0000
In-Reply-To: <20121015190052.GH3034@adacore.com> (Joel Brobecker's message of	"Mon, 15 Oct 2012 12:00:52 -0700")
Message-ID: <87wqyq6tcl.fsf@fleche.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2012-10/txt/msg00259.txt.bz2

>>>>> "Joel" == Joel Brobecker <brobecker@adacore.com> writes:

Joel>   * valprint.c:generic_emit_char calls wchar_iterate, and finds
Joel>     one valid character according to the intermediate encoding
Joel>     ("wchar_t"), even though the character isn't valid in the
Joel>     original/target charset ("CP1252").

FWIW I think Eli's analysis here is correct.

generic_emit_char should be assuming that the character is in the target
wide charset, not in the target charset.  That is, the charset reported
by "show target-wide-charset".

If the 'encoding' argument to generic_emit_char is "CP1252" then I think
something went wrong earlier.

Joel>   * Before actually printing the buffer, generic_emit_char converts
Joel>     the string from the intermediate encoding into the host encoding,
Joel>     which is "CP1252". The conversion routine now finds that,
Joel>     although the multi-byte sequence is printable, it isn't valid
Joel>     in the target encoding (iconv returns EILSEQ), and thus

Shouldn't this be the host encoding here, not the target encoding?

Joel>     But the problem is that convert_between_encodings was called
Joel>     with the width set to 1, instead of using the character type's
Joel>     size.

This does seem wrong.  But I don't think that using the type length
here is correct, either.

The width argument to convert_between_encodings is documented as:

   WIDTH is the width of a character from the FROM charset, in bytes.
   For a variable width encoding, WIDTH should be the size of a "base
   character".

(I didn't check whether this comment is accurate.)

And, this call to convert_between_encodings is converting from the
intermediate charset to the host charset.  So, I think this should be
sizeof (gdb_wchar_t).
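
To illustrate why the width matters, here is a minimal standalone
sketch.  It is not GDB code: the helper group_units is hypothetical and
only reads a byte buffer as little-endian units of WIDTH bytes, on the
assumption that WIDTH decides how many bytes make up one character.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GDB: read BUF (LEN bytes) as
   little-endian units of WIDTH bytes each and store them in OUT.
   Returns the number of whole units stored, at most OUT_MAX.
   WIDTH must be between 1 and 4; otherwise nothing is stored.  */
static size_t
group_units (const unsigned char *buf, size_t len, size_t width,
             uint32_t *out, size_t out_max)
{
  size_t n = 0;

  if (width == 0 || width > 4)
    return 0;

  while (len >= width && n < out_max)
    {
      uint32_t value = 0;
      size_t i;

      /* Assemble one unit, least significant byte first.  */
      for (i = 0; i < width; ++i)
        value |= (uint32_t) buf[i] << (8 * i);

      out[n++] = value;
      buf += width;
      len -= width;
    }

  return n;
}
```

Given the two bytes 0xef 0xbe (0xbeef stored little-endian, as on x86),
a width of 1 yields two units, 0xef and 0xbe, while a width of 2 yields
the single unit 0xbeef.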

Before putting something like that in, though, I would like to look at
Keith's pending patch that reworks this code.  Maybe he already fixed
the bug.

Also, I think this should have a regression test.

Joel> For completeness' sake, GDB 7.5 used to produce the following output:
Joel>     (gdb) print single
Joel>     $2 = 48879 L'\xbeef'
Joel> I prefer this output, as it provides the wide character as one number,
Joel> rather than two.

Offhand I agree, but my recollection is that all these little
decisions have some logic behind them (though sometimes just "that's how
it used to work"), and so you have to dig down to see what the change
would really imply.

Joel> The reason why GDB 7.5 presented the value this way
Joel> is because it took a different path during the initial iteration, thanks
Joel> to the fact that the intermediate encoding was "CP1252" instead of
Joel> "wchar_t", making the character invalid the whole way. This comes from
Joel> a change in defs.h which added an include of build-gnulib/config.h,
Joel> which itself caused HAVE_WCHAR_H to be defined, thus influencing
Joel> the intermediate encoding.

This area is quite fiddly, unfortunately.

It sounds like the recent gnulib imports have invalidated some of the
logic in gdb_wchar.h.  It seems that we can now always rely on wchar.h
being available.  So maybe we could at least remove some configury and
#ifs.
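
As a rough sketch of the kind of conditional involved (this is not the
actual contents of gdb_wchar.h), the intermediate encoding can be thought
of as being picked at compile time.  SKETCH_HAVE_WCHAR_H, sketch_wchar_t
and sketch_intermediate_encoding are made-up names for illustration, and
the fallback to the host charset is an assumption based on the behavior
described above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a configure result; defined here only so
   that the sketch builds on its own.  */
#ifndef SKETCH_HAVE_WCHAR_H
#define SKETCH_HAVE_WCHAR_H 1
#endif

#if SKETCH_HAVE_WCHAR_H
#include <wchar.h>
typedef wchar_t sketch_wchar_t;
#else
typedef char sketch_wchar_t;
#endif

/* Return the name of the intermediate encoding.  HOST_CHARSET is the
   assumed fallback when wide-character support is not available.  */
static const char *
sketch_intermediate_encoding (const char *host_charset)
{
#if SKETCH_HAVE_WCHAR_H
  (void) host_charset;
  return "wchar_t";
#else
  return host_charset;
#endif
}
```

If wchar.h can always be relied upon, only the first branch is ever
taken, and the conditional (together with its configure check) could go
away.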

Tom