From: Tom Tromey
To: Joel Brobecker
Cc: gdb-patches@sourceware.org
Subject: Re: printing 0xbeef wchar_t on x86-windows...
References: <20121015190052.GH3034@adacore.com>
Date: Tue, 16 Oct 2012 20:43:00 -0000
In-Reply-To: <20121015190052.GH3034@adacore.com> (Joel Brobecker's message of "Mon, 15 Oct 2012 12:00:52 -0700")
Message-ID: <87wqyq6tcl.fsf@fleche.redhat.com>

>>>>> "Joel" == Joel Brobecker writes:

Joel> * valprint.c:generic_emit_char calls wchar_iterate, and finds
Joel> one valid character according to the intermediate encoding
Joel> ("wchar_t"), even though the character isn't valid in the
Joel> original/target charset ("CP1252").

FWIW I think Eli's analysis here is correct.

generic_emit_char should be assuming that the character is in the
target wide charset, not in the target charset; that is, the charset
reported by "show target-wide-charset". If the 'encoding' argument to
generic_emit_char is "CP1252", then I think something went wrong
earlier.

Joel> * Before actually printing the buffer, generic_emit_char converts
Joel> the string from the intermediate encoding into the host encoding,
Joel> which is "CP1252". The conversion routine now finds that,
Joel> although the multi-byte sequence is printable, it isn't valid
Joel> in the target encoding (iconv returns EILSEQ), and thus

Shouldn't that be the host encoding here, not the target encoding?

Joel> But the problem is that convert_between_encodings was called
Joel> with the width set to 1, instead of using the character type's
Joel> size.

This does seem wrong.

But I don't think that using the type length here is correct, either.
The width argument to convert_between_encodings is documented as:

   WIDTH is the width of a character from the FROM charset, in bytes.
   For a variable width encoding, WIDTH should be the size of a "base
   character".

(I didn't check whether this comment is accurate.)

And this call to convert_between_encodings converts from the
intermediate charset to the host charset, so I think the width should
be sizeof (gdb_wchar_t).

Before putting something like that in, though, I would like to look at
Keith's pending patch that reworks this code. Maybe he already fixed
the bug.

Also, I think this should have a regression test.

Joel> For completeness' sake, GDB 7.5 used to produce the following output:
Joel> (gdb) print single
Joel> $2 = 48879 L'\xbeef'
Joel> I prefer this output, as it provides the wide character as one number,
Joel> rather than two.

Offhand I agree, but my recollection is that all these little
decisions have some logic behind them (though sometimes just "that's
how it used to work"), so you have to dig down to see what the change
would really imply.

Joel> The reason why GDB 7.5 presented the value this way
Joel> is because it took a different path during the initial iteration, thanks
Joel> to the fact that the intermediate encoding was "CP1252" instead of
Joel> "wchar_t", making the character invalid the whole way. This comes from
Joel> a change in defs.h which added an include of build-gnulib/config.h,
Joel> which itself caused HAVE_WCHAR_H to be defined, thus influencing
Joel> the intermediate encoding.

This area is quite fiddly, unfortunately.

It sounds like the recent gnulib imports have invalidated some of the
logic in gdb_wchar.h. It seems that we can now always rely on wchar.h
being available, so maybe we could at least remove some configury and
#ifs.

Tom