To: Julian Brown
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
References: <20090115202411.5f154657@rex.config>
From: Tom Tromey
Date: Thu, 15 Jan 2009 21:02:00 -0000
In-Reply-To: <20090115202411.5f154657@rex.config> (Julian Brown's message of "Thu\, 15 Jan 2009 20\:24\:11 +0000")

>>>>> "Julian" == Julian Brown writes:

Julian> 3. Types which are literally called "wchar_t" are assumed to
Julian> be wide characters.

I did something similar -- my patch looks at TYPE_NAME to see if it is
"wchar_t".  In C, this is a typedef, and so I needed the appended patch
to make it work.

Without this patch, lookup_typename will find a "wchar_t" symbol whose
type has a TYPE_NAME which is not "wchar_t".  That seemed odd.  The
patch changes the dwarf reader so that the wchar_t symbol points to a
type whose name is "wchar_t".

I think the failing case here was "p L'a'", so I suppose it would not
necessarily show up with your patch.

Anyway I'd appreciate comments on the appended.

Julian> $3 = (wchar_t *) 0x85c4 "Sch\x00f6ne Gr\x00fc\x00dfe"

It should probably print L"..." :-)

Julian> 2. I've probably broken building with iconv disabled (actually I
Julian> couldn't figure out how to build without iconv() support -- even for
Julian> e.g. a mingw32 host which shouldn't support it).

FWIW, my patch goes even further -- I deleted all the existing
conversion code in charset.c.  I think it is reasonable to require a
working iconv; folks on hosts without iconv can use the capable GNU
libiconv.  This does make it a little harder to build gdb, but we can
write a script to download libiconv and drop it into the src build
infrastructure.

Julian> Tom Tromey is working on a patch related to this.

Yeah.  Mine:

* Handles input and output of wide characters and strings, and also
  the new C0X u"" and U"" syntax.
* Adds "%ls" and "%lc" to the gdb printf.
* Handles all target character sets, in particular variable length
  encodings are handled.
* Auto-selects the appropriate endianness for wide characters on the
  target.

Mine also has a few limitations:

* Like your patch, mine doesn't deal with non-C-family languages.
  I'll probably fix up Java at some point, but I just don't know the
  others.
* I got rid of the apparently undocumented gdb extension '\^c'.  The
  plain form could probably be restored, but the form '\^\242' is a
  real pain, and IMO not useful enough anyhow.
* Getting the list of character sets supported by iconv is a pain.
  Right now I just have a list of dubious provenance (read: iconv -l |
  sed).  Perhaps we can invoke "iconv -l" at startup... eww.  Also
  there is no good way, that I know of, to distinguish between
  character sets suitable for "target-wide-charset" and the others.

Another difference is that I have the intermediate step go through the
host wchar_t rather than UCS-4.  This is nice because it means we can
use iswprint to decide if something is printable.  But, it may have
limitations, I suppose, on a host where wchar_t is less capable.

Julian> OK to apply, or any comments?

If you wouldn't mind holding off, my patch is nearing completion.  It
is feature complete, and at the moment I am writing test cases.

I'm happy to send what I have now, if you want to see it.  Or it is
all in the archer git repository on the tromey-archer-charset branch.

I've lifted stuff -- ideas and code -- from your patch, but the result
is pretty different.  Perhaps we could discuss the areas where we made
different decisions and try to plot the best route forward.

Tom

diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c
index 4f2f7fb..0d30abc 100644
--- a/gdb/dwarf2read.c
+++ b/gdb/dwarf2read.c
@@ -2809,6 +2809,7 @@ process_die (struct die_info *die, struct dwarf2_cu *cu)
     case DW_TAG_base_type:
     case DW_TAG_subrange_type:
+    case DW_TAG_typedef:
       /* Add a typedef symbol for the type definition, if it has a
          DW_AT_name.  */
       new_symbol (die, read_type_die (die, cu), cu);
diff --git a/gdb/eval.c b/gdb/eval.c
index 78d03f5..804e9c4 100644
--- a/gdb/eval.c
+++ b/gdb/eval.c
@@ -2475,7 +2475,17 @@ evaluate_subexp_standard (struct type *expect_type,
       if (noside == EVAL_SKIP)
         goto nosideret;
       else if (noside == EVAL_AVOID_SIDE_EFFECTS)
-        return allocate_value (exp->elts[pc + 1].type);
+        {
+          struct type *type = exp->elts[pc + 1].type;
+          /* If this is a typedef, then find its immediate target.  We
+             use check_typedef to resolve stubs, but we ignore its
+             result because we do not want to dig past all
+             typedefs.  */
+          check_typedef (type);
+          if (TYPE_CODE (type) == TYPE_CODE_TYPEDEF)
+            type = TYPE_TARGET_TYPE (type);
+          return allocate_value (type);
+        }
       else
         error (_("Attempt to use a type name as an expression"));
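
For anyone reading the eval.c hunk without a gdb tree at hand, here is
a minimal standalone sketch of the distinction it relies on: peeling
exactly one typedef level versus resolving the whole chain.  Everything
in it (struct toy_type, the TOY_CODE_* values, and the function names)
is a hypothetical stand-in invented for illustration, not gdb's real
struct type, TYPE_CODE, TYPE_TARGET_TYPE, or check_typedef.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a debugger's type node; not gdb's layout.  */
enum toy_code { TOY_CODE_INT, TOY_CODE_TYPEDEF };

struct toy_type
{
  enum toy_code code;
  const char *name;
  struct toy_type *target;   /* Only meaningful for typedefs.  */
};

/* Strip at most one typedef level, mirroring the idea in the hunk:
   a typedef yields its immediate target, anything else is unchanged.  */
struct toy_type *
peel_one_typedef (struct toy_type *type)
{
  if (type->code == TOY_CODE_TYPEDEF && type->target != NULL)
    return type->target;
  return type;
}

/* Strip every typedef level.  This is the "dig past all typedefs"
   behaviour that the hunk deliberately avoids.  */
struct toy_type *
strip_all_typedefs (struct toy_type *type)
{
  while (type->code == TOY_CODE_TYPEDEF && type->target != NULL)
    type = type->target;
  return type;
}

/* Self-check on a three-level chain: myint2 -> myint -> int.  */
void
check_chain (void)
{
  struct toy_type t_int = { TOY_CODE_INT, "int", NULL };
  struct toy_type t_myint = { TOY_CODE_TYPEDEF, "myint", &t_int };
  struct toy_type t_myint2 = { TOY_CODE_TYPEDEF, "myint2", &t_myint };

  /* One level down from myint2 is myint, not int.  */
  assert (strcmp (peel_one_typedef (&t_myint2)->name, "myint") == 0);
  /* Full resolution reaches the underlying int.  */
  assert (strcmp (strip_all_typedefs (&t_myint2)->name, "int") == 0);
}
```

Calling check_chain () from a test harness exercises both helpers.  The
point of the one-level version is that a name defined as a typedef of
another typedef still reports the intermediate name rather than
collapsing straight to the base type.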