From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-18569-listarch-gdb-patches=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 23013 invoked by alias); 13 Sep 2002 03:24:48 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 23005 invoked from network); 13 Sep 2002 03:24:46 -0000
Received: from unknown (HELO zenia.red-bean.com) (66.244.67.22)
  by sources.redhat.com with SMTP; 13 Sep 2002 03:24:46 -0000
Received: (from jimb@localhost)
	by zenia.red-bean.com (8.11.6/8.11.6) id g8D3BAr10609;
	Thu, 12 Sep 2002 22:11:10 -0500
To: Daniel Jacobowitz <drow@mvista.com>
Cc: Kevin Buettner <kevinb@redhat.com>, gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support
References: <1020913003056.ZM15701@localhost.localdomain>
	<20020913004205.GB19479@nevyn.them.org>
From: Jim Blandy <jimb@redhat.com>
Date: Thu, 12 Sep 2002 20:24:00 -0000
In-Reply-To: <20020913004205.GB19479@nevyn.them.org>
Message-ID: <vt2n0qmwpm9.fsf@zenia.red-bean.com>
User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2.90
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2002-09/txt/msg00243.txt.bz2


> Two comments:
>   There's a lot of passing integers around to refer to a character. 
> That doesn't make a lot of sense to me; we should either be passing
> char *, so that we can decode multibyte sequences, or using wchar_t
> explicitly and autoconfing for it.
> 
>   I see hardcoded support for a couple of simplistic charsets; would it
> be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
> available?  Gcj is natively UTF-8, and I have some open Debian bug
> reports about this.

Absolutely --- as I say in the comments to charset.c:

   At the moment, GDB only supports single-byte, stateless character
   sets.  This includes the ISO-8859 family (ASCII extended with
   accented characters, and (I think) Cyrillic, for European
   languages), and the EBCDIC family (used on IBM's mainframes).
   Unfortunately, it excludes many Asian scripts, the fixed- and
   variable-width Unicode encodings, and other desirable things.
   Patches are welcome!  (For example, it would be nice if the Java
   string support could simply get absorbed into some more general
   multi-byte encoding support.)

But it seemed to me that supporting stateless variable-width encodings
was going to be a *lot* of work.  Specifically, how the printing code
should change was a bit beyond me.
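
To make the gap concrete, here is a rough sketch of what decoding a
single UTF-8 sequence involves.  It is not taken from charset.c or from
the patch; the name utf8_decode_one and its interface are hypothetical.
The point is only that a variable-width decoder needs a pointer and a
length, not one `int' per character, and that it can fail:

```c
#include <stddef.h>

/* Hypothetical helper, not part of GDB.  Decode one UTF-8 sequence
   from BUF, which holds LEN bytes.  On success, store the code point
   in *OUT and return the number of bytes consumed.  Return 0 if the
   sequence is truncated, malformed, overlong, or out of range.  */
static size_t
utf8_decode_one (const unsigned char *buf, size_t len, long *out)
{
  /* Smallest code point that legitimately needs N bytes.  */
  static const long min_value[] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t need, i;
  long cp;

  if (len == 0)
    return 0;

  /* The lead byte tells us how long the sequence is.  */
  if (buf[0] < 0x80)
    { need = 1; cp = buf[0]; }
  else if ((buf[0] & 0xE0) == 0xC0)
    { need = 2; cp = buf[0] & 0x1F; }
  else if ((buf[0] & 0xF0) == 0xE0)
    { need = 3; cp = buf[0] & 0x0F; }
  else if ((buf[0] & 0xF8) == 0xF0)
    { need = 4; cp = buf[0] & 0x07; }
  else
    return 0;			/* Stray continuation or invalid lead byte.  */

  if (len < need)
    return 0;			/* Truncated sequence.  */

  /* Each continuation byte contributes six bits.  */
  for (i = 1; i < need; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
	return 0;
      cp = (cp << 6) | (buf[i] & 0x3F);
    }

  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (cp < min_value[need] || cp > 0x10FFFF
      || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;

  *out = cp;
  return need;
}
```

A caller would loop, advancing by the return value, and would have to
decide what to print when the function returns 0.  That decision is
exactly the part of the printing code I was unsure about.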

Regarding `int' vs. `wchar_t': the wchar_t we could detect with
autoconf is a host type.  It has no necessary relationship to the
`wchar_t' on the target.  LONGEST might be a better choice than `int',
but `wchar_t' is worse.
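
As an illustration of the host/target distinction, here is a minimal
sketch of reading a target character whose width and byte order are
known only at run time.  The names below (target_char_value,
extract_target_char, the byte-order enum) are made up for this example
and are not GDB's actual interfaces; `long long' merely stands in for
LONGEST:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for LONGEST: wide enough for any target character,
   whatever the host's own wchar_t happens to be.  */
typedef long long target_char_value;

/* Hypothetical description of the target's byte order.  */
enum target_byte_order { TARGET_LITTLE_ENDIAN, TARGET_BIG_ENDIAN };

/* Hypothetical helper, not part of GDB.  Assemble the WIDTH bytes at
   BUF, stored in the target's byte order ORDER, into an unsigned
   value.  WIDTH is the size of the target's character type.  */
static target_char_value
extract_target_char (const unsigned char *buf, size_t width,
		     enum target_byte_order order)
{
  unsigned long long value = 0;
  size_t i;

  assert (width >= 1 && width <= sizeof (value));

  /* Walk from the most significant byte to the least.  */
  for (i = 0; i < width; i++)
    {
      size_t idx = (order == TARGET_BIG_ENDIAN) ? i : width - 1 - i;
      value = (value << 8) | buf[idx];
    }

  return (target_char_value) value;
}
```

Nothing in this depends on sizeof (wchar_t) on the host, so a host
with a 16-bit wchar_t could still read a 32-bit target wchar_t, and
the reverse.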