From: "Paul Koning"
Subject: RE: [RFC] Make string printing work on NetBSD (iconv issue)
Date: Fri, 07 May 2010 17:26:00 -0000
References: <19424.30941.651367.946330@pkoning-laptop.equallogic.com>
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm

> >>>>> "Paul" == Paul Koning writes:
>
> Paul> The attached patch fixes this by having configure pick a suitable
> Paul> codeset name to use.  "wchar_t" is used if available, otherwise ucs-2
> Paul> or ucs-4 with the appropriate byte order suffix is used instead.
>
> This will yield incorrect results unless the chosen intermediate charset
> is actually the one used for wchar_t.

Tom, thanks for your feedback.

Yes, it clearly depends on picking the correct codeset.  If there were a foolproof way to determine what that codeset is, that would be the best answer.  I could not find one.
My reasoning is that UCS-n for n-byte wchar_t is a likely answer, so while it may be wrong for some platforms (at least in theory) it will also be right for some, hopefully for most.  It clearly can't make matters worse, because any platform that doesn't have the codeset name "wchar_t" currently doesn't work at all.

> Note that if this is the case for UCS-4, then your platform headers
> ought to define __STDC_ISO_10646__.  So, you could test that in
> gdb_wchar.h rather than do any configury.

NetBSD clearly is using UCS-4 for wchar_t, but it does not define that symbol.

> Alternatively, it is always safe to fall back to the code that uses
> narrow intermediate characters and host_charset for the intermediate
> encoding.

Yes, but doesn't that mean you end up not being able to accurately print a wide string if one occurs in your program -- because it gets mapped to the intermediate encoding first, and with narrow chars for the intermediate encoding you have a lossy translation?

> Perhaps this "wchar_t" thing is not the best way for us to go.  Maybe
> better would be to test __STDC_ISO_10646__ and fall back to narrow chars
> in all other cases.

That sounds attractive.  But given that __STDC_ISO_10646__ isn't defined in NetBSD even though it clearly supports wide chars and knows about ucs-4, it doesn't seem to be workable.

> Other approaches are available too, but they are generally more work
> than simply using GNU libiconv.

Right, if you use libiconv then the issue goes away, and the patch I wrote should handle that case cleanly.  I wanted to offer a solution to people who don't want to install libiconv because they have a functional iconv in libc, as is the case for NetBSD.

	paul
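
For illustration only, the selection order discussed in this thread (prefer "wchar_t", otherwise guess UCS-2 or UCS-4 with a byte order suffix, otherwise give up and use narrow chars) could be sketched in C roughly as below. This is not the patch itself, which does the probing in configure; here the probing happens at run time. All function names are hypothetical, and the spellings "UCS-2LE", "UCS-4BE" and "ASCII" are assumptions: codeset names differ between iconv implementations, so a real check would have to try whatever names the host actually accepts.

```c
#include <stddef.h>
#include <wchar.h>
#include <iconv.h>

/* Return 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const unsigned int probe = 1;
    return *(const unsigned char *) &probe == 1;
}

/* Return 1 if the platform headers promise that wchar_t holds
   ISO 10646 code points.  As noted above, NetBSD does not define
   this macro, so a 0 result does not prove the opposite. */
static int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: guess a codeset name for wchar_t from its
   size and the byte order.  The names are assumed spellings.
   Returns NULL when the size is neither 2 nor 4. */
static const char *guess_wchar_codeset(size_t wchar_size, int little_endian)
{
    if (wchar_size == 2)
        return little_endian ? "UCS-2LE" : "UCS-2BE";
    if (wchar_size == 4)
        return little_endian ? "UCS-4LE" : "UCS-4BE";
    return NULL;
}

/* Return 1 if iconv_open accepts FROM as a source codeset when
   converting to TO. */
static int iconv_knows(const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);

    if (cd == (iconv_t) -1)
        return 0;
    iconv_close(cd);
    return 1;
}

/* Pick the intermediate codeset: "wchar_t" if iconv knows it,
   otherwise the size/byte-order guess if iconv knows that.
   Returns NULL if neither works, in which case the caller would
   fall back to narrow intermediate characters. */
static const char *pick_intermediate_codeset(void)
{
    const char *guess;

    if (iconv_knows("wchar_t", "ASCII"))
        return "wchar_t";
    guess = guess_wchar_codeset(sizeof(wchar_t), host_is_little_endian());
    if (guess != NULL && iconv_knows(guess, "ASCII"))
        return guess;
    return NULL;
}
```

Note that the guess only establishes that iconv recognizes the name; it cannot confirm that the name really matches the wchar_t encoding, which is the caveat raised above. On hosts where iconv lives outside libc, the sketch would also need to link against the iconv library.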