From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 23013 invoked by alias); 13 Sep 2002 03:24:48 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 23005 invoked from network); 13 Sep 2002 03:24:46 -0000
Received: from unknown (HELO zenia.red-bean.com) (66.244.67.22) by sources.redhat.com with SMTP; 13 Sep 2002 03:24:46 -0000
Received: (from jimb@localhost) by zenia.red-bean.com (8.11.6/8.11.6) id g8D3BAr10609; Thu, 12 Sep 2002 22:11:10 -0500
To: Daniel Jacobowitz
Cc: Kevin Buettner, gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support
References: <1020913003056.ZM15701@localhost.localdomain> <20020913004205.GB19479@nevyn.them.org>
From: Jim Blandy
Date: Thu, 12 Sep 2002 20:24:00 -0000
In-Reply-To: <20020913004205.GB19479@nevyn.them.org>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2.90
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2002-09/txt/msg00243.txt.bz2

> Two comments:
> There's a lot of passing integers around to refer to a character.
> That doesn't make a lot of sense to me; we should either be passing
> char *, so that we can decode multibyte sequences, or using wchar_t
> explicitly and autoconfing for it.
>
> I see hardcoded support for a couple of simplistic charsets; would it
> be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
> available?  Gcj is natively UTF-8, and I have some open Debian bug
> reports about this.

Absolutely --- as I say in the comments to charset.c:

    At the moment, GDB only supports single-byte, stateless character
    sets.  This includes the ISO-8859 family (ASCII extended with
    accented characters, and (I think) Cyrillic, for European
    languages), and the EBCDIC family (used on IBM's mainframes).
    Unfortunately, it excludes many Asian scripts, the fixed- and
    variable-width Unicode encodings, and other desirable things.
    Patches are welcome!  (For example, it would be nice if the Java
    string support could simply get absorbed into some more general
    multi-byte encoding support.)

But it seemed to me that supporting stateless variable-width encodings
was going to be a *lot* of work.  In particular, I could not see how
the printing code should change.

Regarding `int' vs. `wchar_t': the wchar_t we could detect with
autoconf is a host type.  It has no necessary relationship to the
`wchar_t' on the target.  LONGEST might be a better choice than `int',
but `wchar_t' is worse.
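
To make the host/target point concrete, here is a minimal sketch; it
is not code from the patch.  It assumes the target's character width
and byte order are known at run time, and it decodes one target
character into a wide host integer instead of relying on the host's
wchar_t.  The names `longest_t', `enum byte_order' and
`extract_target_char' are hypothetical; `longest_t' merely stands in
for LONGEST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for LONGEST: a wide host integer type.  */
typedef long long longest_t;

/* Hypothetical description of the target's byte order.  */
enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* Decode one target character of WIDTH bytes from BUF, honoring the
   target's byte order.  The result is a host integer wide enough for
   any target character up to sizeof (longest_t) bytes, so it does not
   depend on the size of the host's wchar_t.  */
static longest_t
extract_target_char (const unsigned char *buf, size_t width,
                     enum byte_order order)
{
  unsigned long long value = 0;
  size_t i;

  assert (width > 0 && width <= sizeof (longest_t));

  for (i = 0; i < width; i++)
    {
      /* Always consume the most significant byte first.  */
      size_t idx = (order == ORDER_BIG) ? i : width - 1 - i;
      value = (value << 8) | buf[idx];
    }

  return (longest_t) value;
}
```

For instance, the two bytes 0x20 0xAC read as a 2-byte big-endian
character, and the four bytes 0xAC 0x20 0x00 0x00 read as a 4-byte
little-endian character, both decode to 0x20AC, whatever the host's
own wchar_t looks like.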