Date: Fri, 13 Sep 2002 07:02:00 -0000
From: Daniel Jacobowitz
To: Jim Blandy
Cc: Kevin Buettner, gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support
Message-ID: <20020913140312.GA10942@nevyn.them.org>

On Thu, Sep 12, 2002 at 10:11:10PM -0500, Jim Blandy wrote:
>
> > Two comments:
> > There's a lot of passing integers around to refer to a character.
> > That doesn't make a lot of sense to me; we should either be passing
> > char *, so that we can decode multibyte sequences, or using wchar_t
> > explicitly and autoconfing for it.
> >
> > I see hardcoded support for a couple of simplistic charsets; would it
> > be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
> > available?  Gcj is natively UTF-8, and I have some open Debian bug
> > reports about this.
>
> Absolutely --- as I say in the comments to charset.c:
>
>   At the moment, GDB only supports single-byte, stateless character
>   sets.  This includes the ISO-8859 family (ASCII extended with
>   accented characters, and (I think) Cyrillic, for European
>   languages), and the EBCDIC family (used on IBM's mainframes).
>   Unfortunately, it excludes many Asian scripts, the fixed- and
>   variable-width Unicode encodings, and other desireable things.
>   Patches are welcome!  (For example, it would be nice if the Java
>   string support could simply get absorbed into some more general
>   multi-byte encoding support.)
>
> But it seemed to me that supporting stateless variable-width encodings
> was going to be a *lot* of work.  Specifically, how the printing code
> should change was a bit beyond me.

+ /* These all suggest that the input or output character sets
+    have multi-byte encodings of some characters, which means
+    it's unsuitable for use as a GDB character set.  We should
+    never have selected it.  */

Sigh - OK, I see that this can't even use iconv for UTF-8->ASCII.
That's a real shame.  I have some code which does this so if I get a
chance I can try to improve it in GDB; or someone who (unlike me)
actually groks iconv can try it...

> Regarding `int' vs. `wchar_t': the wchar_t we could detect with
> autoconf is a host type.  It has no necessary relationship to the
> `wchar_t' on the target.  LONGEST might be a better choice than `int',
> but `wchar_t' is worse.

The first part is accurate but not relevant.  I'm not suggesting
reading wchar_t's from the target; that's not a terribly useful thing
to do.  You _want_ the host wchar_t.  It is a host type capable of
holding a wide character; the type changes based on platform and on
whether or not the platform actually has wide character support.
There's not much you can do if it doesn't, is there?
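
To make the "pass char * and decode into a host wide character" idea
concrete, here is a rough sketch of what a minimal UTF-8 fallback for
hosts without iconv might look like.  This is only an illustration: it
is not the code mentioned above and nothing like it is in the posted
patch.  The name utf8_decode_one is hypothetical, and the sketch
assumes the host wchar_t is wide enough to hold the decoded code point.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of the posted patch: decode one UTF-8
   sequence starting at S (at most LEN bytes available) into *OUT.
   Returns the number of bytes consumed, or 0 if the input is truncated
   or is not valid UTF-8.  Assumes the host wchar_t can hold the
   decoded value.  */
size_t
utf8_decode_one (const char *s, size_t len, wchar_t *out)
{
  const unsigned char *p = (const unsigned char *) s;
  unsigned long c, min;
  size_t n, i;

  if (len == 0)
    return 0;

  /* The lead byte gives the sequence length and the smallest code
     point that legitimately needs that many bytes.  */
  if (p[0] < 0x80)
    { c = p[0]; n = 1; min = 0; }
  else if ((p[0] & 0xe0) == 0xc0)
    { c = p[0] & 0x1f; n = 2; min = 0x80; }
  else if ((p[0] & 0xf0) == 0xe0)
    { c = p[0] & 0x0f; n = 3; min = 0x800; }
  else if ((p[0] & 0xf8) == 0xf0)
    { c = p[0] & 0x07; n = 4; min = 0x10000; }
  else
    return 0;			/* Stray continuation or invalid lead byte.  */

  if (len < n)
    return 0;			/* Sequence is cut off.  */

  /* Each continuation byte must look like 10xxxxxx and contributes
     six more bits.  */
  for (i = 1; i < n; i++)
    {
      if ((p[i] & 0xc0) != 0x80)
	return 0;
      c = (c << 6) | (p[i] & 0x3f);
    }

  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10ffff || (c >= 0xd800 && c <= 0xdfff))
    return 0;

  *out = (wchar_t) c;
  return n;
}
```

A caller would presumably walk a buffer by calling this repeatedly and
advancing by the return value; what to do on a return of 0 (for
example, falling back to an octal escape for that byte) is left open
here.
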

Rather than using iconv, which is meant for converting strings of text,
it seemed to me when I wrote the above comments that we should be using
mbrtowc/wctomb functions.  However, unlike iconv, they appear to
operate based on the current locale rather than a specified charset.  I
suppose they are unsuitable and we'll have to figure out how to use
iconv appropriately.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer