From: Tom Tromey
To: "Alexey Feldgendler"
Cc: gdb-patches@sourceware.org
Subject: Re: Default target wide character set
Reply-To: tromey@redhat.com
Date: Wed, 16 Sep 2009 18:56:00 -0000
In-Reply-To: (Alexey Feldgendler's message of "Tue, 15 Sep 2009 16:11:56 +0200")

>>>>> "Alexey" == Alexey Feldgendler writes:

Alexey> I got assigned part-time to contribute to gdb, mostly by fixing
Alexey> bugs that affect us, but also to implement new
Alexey> features.

Welcome to GDB.

I don't know your copyright assignment situation, but if you are
planning to submit patches, it doesn't hurt to get started on that
early. Send me email off-list if you want to do this.

Alexey> A. Have the default target wide character set depend on the size of
Alexey> the type named wchar_t.

Alexey> Side question: how does gdb figure out sizeof(wchar_t)? Does it come
Alexey> from the symbol table or from elsewhere?

Yeah, look in c-lang.c for a call to lookup_typename with an argument of
"wchar_t". The resulting type can be queried for its attributes.

Alexey> B. Have charset_for_string_type() check after calling
Alexey> target_wide_charset() whether the width of the returned character set
Alexey> matches the width of the actual string type, and use fallback similar
Alexey> to what's done for C_STRING_16 and C_STRING_32 if it doesn't.

Alexey> What do you think of options A and B? Or is there maybe another
Alexey> possibility that I'm overlooking?

Yeah, I think there is another solution. It is pretty similar to these,
though.

The general problem is that the relevant standards put very few
constraints on wchar_t. There is no guarantee that they use any form of
UCS -- and there are systems which in fact do not. Therefore, if the
user picks some random target wide charset, I think we ought to honor
his request.

Another wrinkle is that there are no good ways to determine any
characteristics of character sets. This simply isn't part of any
standardized API (we could of course implement our own database for
this... but I was not motivated to do so).

What this means is that we can also do very little error checking in
practice -- if the target uses UCS-4, but the user says "set
target-wide-charset SJIS", well, he will get nonsense in response, with
no warning from GDB.

What I would propose doing is adding a new charset named "UCS".
If this is selected as the target wide charset, then we would
automatically pick UCS-2 or UCS-4 depending on sizeof(target wchar_t).
This would probably mean having a few special cases in the code (like we
do for the -BE and -LE variants).

We would then make this the default target wide charset.

What do you think of that?

Tom
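
For illustration only, the "UCS" proposal above could be sketched roughly
as follows. This is not existing GDB code: resolve_wide_charset is a
hypothetical name, and the size of the target's wchar_t is assumed to be
passed in by the caller, in bytes (for example from the type found via
lookup_typename). Endianness handling (the -BE and -LE variants) is left
out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GDB: turn the placeholder charset
   name "UCS" into a concrete encoding name based on the size in bytes
   of the target's wchar_t.  Any other name is returned unchanged, so
   an explicit user choice is always honored.  */
const char *
resolve_wide_charset (const char *requested, size_t wchar_size)
{
  /* Only the special "UCS" name is resolved automatically.  */
  if (strcmp (requested, "UCS") != 0)
    return requested;

  switch (wchar_size)
    {
    case 2:
      return "UCS-2";
    case 4:
      return "UCS-4";
    default:
      /* No sensible mapping for this width; the caller has to decide
         what to do (assumption: NULL signals "unknown").  */
      return NULL;
    }
}
```

With this sketch, resolve_wide_charset ("UCS", 4) yields "UCS-4", while
resolve_wide_charset ("SJIS", 4) yields "SJIS" untouched, which matches
the idea that an explicit user request is honored without checking.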