From: Tom Tromey
Reply-To: tromey@redhat.com
To: "Joseph S. Myers"
Cc: Julian Brown, gdb-patches@sourceware.org
Subject: Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
Date: Fri, 16 Jan 2009 00:01:00 -0000
In-Reply-To: (Joseph S. Myers's message of "Thu, 15 Jan 2009 21:17:46 +0000 (UTC)")
References: <20090115202411.5f154657@rex.config>

>>>>> "Joseph" == Joseph S. Myers writes:

Joseph> (Of course, now C++0x has and C1x has accepted (not yet in a
Joseph> draft) a lot of further new string syntax that Jakub has
Joseph> implemented for GCC 4.5.)

Yeah, I haven't looked at that yet.

Joseph> If you handle input of the new string syntax, do you also
Joseph> handle the interesting concatenation issues?  "\xab" L"c" is a
Joseph> wide string with two characters, L'\xab' and L'c' (plus the
Joseph> trailing NUL); you do not interpret '\xab' as a member of the
Joseph> target narrow character set and convert to the target wide
Joseph> character set (nor do you interpret it as L"\xabc", with a
Joseph> single escape sequence), so you can't convert escape sequences
Joseph> to bytes of a string until after you know whether the final
Joseph> string is narrow or wide (or some other variant, in
Joseph> C++0x/C1x).

I think my patch handles this correctly, though I have not written any
tests for it yet.

The C parser constructs an OP_STRING in a new format.  This format
records the type of the resulting string, followed by each sub-string
kept separately.  The lexer does some escape processing, but not all
of it; in particular, it does not process \x escapes.

The C language then overrides the evaluation of OP_STRING, and that
step converts the sub-strings to the desired target format.

This could all be done in the parser, of course, but I chose to defer
part of it to expression evaluation for a reason.
This approach lets a single expression be used across multiple
inferiors, which may (in theory, though not yet in practice) have
different target-charset settings.

It also has another user-visible effect: a string in a breakpoint
condition will change when the target-charset is changed.  I tend to
think this is a feature.

Finally, my patch supports UCNs in strings and character literals,
though I suspect it does so incorrectly; I haven't dug into it.  In
any case, the differences are likely to show up only in fairly
unusual code.

Tom