From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 10518 invoked by alias); 15 Oct 2012 19:01:01 -0000
Received: (qmail 10502 invoked by uid 22791); 15 Oct 2012 19:01:01 -0000
X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL,BAYES_00,RCVD_IN_HOSTKARMA_NO,TW_SW
X-Spam-Check-By: sourceware.org
Received: from rock.gnat.com (HELO rock.gnat.com) (205.232.38.15) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 15 Oct 2012 19:00:57 +0000
Received: from localhost (localhost.localdomain [127.0.0.1]) by filtered-rock.gnat.com (Postfix) with ESMTP id EF0821C7DD3; Mon, 15 Oct 2012 15:00:56 -0400 (EDT)
Received: from rock.gnat.com ([127.0.0.1]) by localhost (rock.gnat.com [127.0.0.1]) (amavisd-new, port 10024) with LMTP id ziUXKtTeY7Qk; Mon, 15 Oct 2012 15:00:56 -0400 (EDT)
Received: from joel.gnat.com (localhost.localdomain [127.0.0.1]) by rock.gnat.com (Postfix) with ESMTP id BB86C1C7DCF; Mon, 15 Oct 2012 15:00:56 -0400 (EDT)
Received: by joel.gnat.com (Postfix, from userid 1000) id 32EE9CB492; Mon, 15 Oct 2012 12:00:52 -0700 (PDT)
Date: Mon, 15 Oct 2012 19:01:00 -0000
From: Joel Brobecker
To: gdb-patches@sourceware.org
Cc: Tom Tromey
Subject: printing 0xbeef wchar_t on x86-windows...
Message-ID: <20121015190052.GH3034@adacore.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="UugvWAfsgieZRqgk"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2012-10/txt/msg00232.txt.bz2

--UugvWAfsgieZRqgk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-length: 3203

Hello,

I have a variable of type wchar_t whose value is 0xbeef, simply
defined as follows:

    wchar_t single = 0xbeef;

But with the current HEAD, I get:

    (gdb) print single
    $5 = 48879 L'\357'

In chronological order:

  * valprint.c:generic_emit_char calls wchar_iterate, and finds
    one valid character according to the intermediate encoding
    ("wchar_t"), even though the character isn't valid in the
    original/target charset ("CP1252").

  * valprint.c:print_wchar then checks whether the character is
    printable or not. Had it not been, print_wchar would have
    converted the multi-byte sequence into a hex string image.
    But unfortunately for us, Windows' iswprint treats 0xbeef as
    printable, and so print_wchar puts it in the buffer as is, to
    be printed.

  * Before actually printing the buffer, generic_emit_char converts
    the string from the intermediate encoding into the host encoding,
    which is "CP1252". The conversion routine now finds that,
    although the multi-byte sequence is printable, it isn't valid
    in the target encoding (iconv returns EILSEQ), and thus replaces
    the wchar by a string with a sequence of octal numbers, one for
    each byte. For instance \357 is 0xef.

But the problem is that convert_between_encodings was called with
the width set to 1, instead of using the character type's size.
With the attached patch, we now get the following output...

    (gdb) print single
    $2 = 48879 L'\357\276'

... which is no longer missing half of the wide character value.
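To make the effect of the width argument concrete, here is a minimal sketch in C. It is not the code from charset.c: escape_bytes_as_octal is a hypothetical helper, and it assumes the behaviour described above, namely that one octal escape is emitted per byte, for `width` bytes of the invalid sequence.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not GDB's code):
   write one "\ooo" octal escape into OUT for each of the first
   WIDTH bytes of BUF.  Returns the length of the resulting string,
   excluding the terminating NUL.  */
static size_t
escape_bytes_as_octal (const unsigned char *buf, size_t width,
                       char *out, size_t out_size)
{
  size_t used = 0;
  size_t i;

  /* Each byte needs 4 characters, plus 1 for the final NUL.  */
  assert (out_size >= width * 4 + 1);
  out[0] = '\0';

  for (i = 0; i < width; ++i)
    used += (size_t) snprintf (out + used, out_size - used,
                               "\\%.3o", (unsigned int) buf[i]);

  return strlen (out);
}
```

On a little-endian host, 0xbeef is stored as the bytes { 0xef, 0xbe }. Calling the helper on those bytes with a width of 1 yields \357 only, while a width of 2 yields \357\276, which matches the difference between the two outputs shown above.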

For completeness' sake, GDB 7.5 used to produce the following output:

    (gdb) print single
    $2 = 48879 L'\xbeef'

I prefer this output, as it provides the wide character as one number,
rather than two. GDB 7.5 presented the value this way because it took
a different path during the initial iteration: the intermediate
encoding was "CP1252" instead of "wchar_t", making the character
invalid the whole way. The change in behavior comes from a change in
defs.h which added an include of build-gnulib/config.h, which itself
caused HAVE_WCHAR_H to be defined, thus influencing the intermediate
encoding.

I have a feeling that going back to "CP1252" as the intermediate
encoding isn't something that we'd like to do.

What I explored for a while was the idea of having
convert_between_encodings transform invalid sequences into one single
number, the same way print_wchar does. But I think that there is
an endianness issue (not sure), as we don't really know whether the
buffer is following the target or host endianness. We need that piece
of info in order to extract the wide character's value.

Nonetheless, I think that this can be looked at separately if desired.
In the meantime, the following patch updates the calls to
convert_between_encodings to pass the correct width, and the new
output is already an improvement. So I think that the attached
patch is worth checking in on its own.

gdb/ChangeLog:

        * valprint.c (generic_emit_char): Pass correct width in call to
        convert_between_encodings.
        (generic_printstr): Likewise.

Tested on x86-linux. OK to commit?

Thanks,
--
Joel

--UugvWAfsgieZRqgk
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="wchar-0xbeef.diff"
Content-length: 934

diff --git a/gdb/valprint.c b/gdb/valprint.c
index 6e651f6..31cef54 100644
--- a/gdb/valprint.c
+++ b/gdb/valprint.c
@@ -2037,7 +2037,7 @@ generic_emit_char (int c, struct type *type, struct ui_file *stream,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
                              obstack_base (&wchar_buf),
                              obstack_object_size (&wchar_buf),
-                             1, &output, translit_char);
+                             TYPE_LENGTH (type), &output, translit_char);
   obstack_1grow (&output, '\0');
 
   fputs_filtered (obstack_base (&output), stream);
@@ -2278,7 +2278,7 @@ generic_printstr (struct ui_file *stream, struct type *type,
   convert_between_encodings (INTERMEDIATE_ENCODING, host_charset (),
                              obstack_base (&wchar_buf),
                              obstack_object_size (&wchar_buf),
-                             1, &output, translit_char);
+                             width, &output, translit_char);
   obstack_1grow (&output, '\0');
 
   fputs_filtered (obstack_base (&output), stream);

--UugvWAfsgieZRqgk--