From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 25533 invoked by alias); 13 Sep 2002 17:33:36 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help:
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 25526 invoked from network); 13 Sep 2002 17:33:36 -0000
Received: from unknown (HELO crack.them.org) (65.125.64.184)
  by sources.redhat.com with SMTP; 13 Sep 2002 17:33:36 -0000
Received: from nevyn.them.org ([66.93.61.169] ident=mail)
  by crack.them.org with asmtp (Exim 3.12 #1 (Debian))
  id 17pvG3-0002Ax-00; Fri, 13 Sep 2002 13:33:35 -0500
Received: from drow by nevyn.them.org with local (Exim 3.35 #1 (Debian))
  id 17puKF-0006hR-00; Fri, 13 Sep 2002 13:33:51 -0400
Date: Fri, 13 Sep 2002 10:33:00 -0000
From: Daniel Jacobowitz
To: Jim Blandy
Cc: Kevin Buettner , gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support
Message-ID: <20020913173351.GA25544@nevyn.them.org>
Mail-Followup-To: Jim Blandy , Kevin Buettner , gdb-patches@sources.redhat.com
References: <1020913003056.ZM15701@localhost.localdomain> <20020913004205.GB19479@nevyn.them.org> <20020913140312.GA10942@nevyn.them.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.1i
X-SW-Source: 2002-09/txt/msg00250.txt.bz2

On Fri, Sep 13, 2002 at 12:02:29PM -0500, Jim Blandy wrote:
>
> Daniel Jacobowitz writes:
> > I'm not suggesting reading wchar_t's from the target; that's not
> > terribly useful a thing to do. You _want_ the host wchar_t. It is
> > a host type capable of holding a wide character; the type changes
> > based on platform and on whether or not the platform actually has
> > wide character support.
>
> If you're suggesting using the host's wchar_t to hold characters after
> conversion from the target charset to the host charset, then I'm with
> you.
>
> If you're suggesting using the host's wchar_t to hold character values
> that have been read from the target, but not yet converted to the
> host's charset, then I really disagree. The target's wchar_t could be
> 32 bits, while the host's might be 16 bits.

Precisely. I was suggesting using host wchar_t after conversion to
host format.

> > There's not much you can do if it doesn't, is there?
>
> Print things in hex? When the target->host conversion fails, you
> can't just drop the character, but you still need some way to
> represent the pre-conversion value in GDB.
>
> > Rather than using iconv, which is meant for converting strings of
> > text, it seemed to me when I wrote the above comments that we should
> > be using mbrtowc/wctomb functions. However, unlike iconv, they
> > appear to operate based on the current locale rather than a
> > specified charset. I suppose they are unsuitable and we'll have to
> > figure out how to use iconv appropriately.
>
> Yes, exactly. If mbrtowc were parameterized with the charset, it
> would work pretty well.

Yeah. Maybe there's some way we can get the same effect. It would be
really nice to be able to handle target-wchar_t variables and figure
out what character they're supposed to be!

The only standard way to do this seems to be:
  The behaviour of mbsrtowcs depends on the LC_CTYPE category of the
  current locale.

and that would be too ridiculously slow. I guess iconv is the way to
go, then.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer
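
For illustration only, here is a rough sketch of what "mbrtowc
parameterized with the charset" might look like on top of iconv. It
is not GDB code: the name decode_target_char and its signature are
made up for this example, and it assumes the host iconv accepts the
"WCHAR_T" encoding name (glibc and GNU libiconv do; POSIX does not
promise it).

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of GDB: convert one character,
   encoded in TARGET_CHARSET and occupying LEN bytes at BYTES, into a
   host wchar_t stored in *OUT.  Returns 0 on success and -1 if the
   charset is unknown or the bytes do not convert to exactly one wide
   character.  On failure the caller still has the raw bytes and can
   fall back to printing them in hex.  */
int
decode_target_char (const char *target_charset,
                    const char *bytes, size_t len, wchar_t *out)
{
  assert (target_charset != NULL && bytes != NULL && out != NULL);

  /* Assumption: "WCHAR_T" names the host wide encoding.  This is a
     glibc / GNU libiconv convention, not something POSIX requires.  */
  iconv_t cd = iconv_open ("WCHAR_T", target_charset);
  if (cd == (iconv_t) -1)
    return -1;

  /* iconv takes a non-const input pointer but does not modify the
     input bytes themselves.  */
  char *inbuf = (char *) bytes;
  size_t inleft = len;
  char *outbuf = (char *) out;
  size_t outleft = sizeof *out;

  size_t rc = iconv (cd, &inbuf, &inleft, &outbuf, &outleft);
  iconv_close (cd);

  /* Accept only a clean conversion that consumed all of the input
     and filled exactly one wchar_t.  */
  if (rc == (size_t) -1 || inleft != 0 || outleft != 0)
    return -1;
  return 0;
}
```

Unlike mbrtowc, the charset here is an explicit argument, so nothing
depends on LC_CTYPE. Opening a conversion descriptor for every
character would be slow in practice; a real implementation would
presumably open it once per target charset and reuse it.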