From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-18570-listarch-gdb-patches=sourceware.cygnus.com@sources.redhat.com>
Received: (qmail 28637 invoked by alias); 13 Sep 2002 03:45:57 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 28626 invoked from network); 13 Sep 2002 03:45:55 -0000
Received: from unknown (HELO localhost.redhat.com) (24.112.240.27)
  by sources.redhat.com with SMTP; 13 Sep 2002 03:45:55 -0000
Received: from ges.redhat.com (localhost [127.0.0.1])
	by localhost.redhat.com (Postfix) with ESMTP
	id 1829B3C44; Thu, 12 Sep 2002 23:45:43 -0400 (EDT)
Message-ID: <3D815F66.4030605@ges.redhat.com>
Date: Thu, 12 Sep 2002 20:45:00 -0000
From: Andrew Cagney <ac131313@ges.redhat.com>
User-Agent: Mozilla/5.0 (X11; U; NetBSD macppc; en-US; rv:1.0.0) Gecko/20020824
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Jim Blandy <jimb@redhat.com>
Cc: Daniel Jacobowitz <drow@mvista.com>,
	Kevin Buettner <kevinb@redhat.com>, gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support
References: <1020913003056.ZM15701@localhost.localdomain>	<20020913004205.GB19479@nevyn.them.org> <vt2n0qmwpm9.fsf@zenia.red-bean.com>
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2002-09/txt/msg00244.txt.bz2

>> Two comments:
>>   There's a lot of passing integers around to refer to a character. 
>> That doesn't make a lot of sense to me; we should either be passing
>> char *, so that we can decode multibyte sequences, or using wchar_t
>> explicitly and autoconfing for it.
>> 
>>   I see hardcoded support for a couple of simplistic charsets; would it
>> be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
>> available?  Gcj is natively UTF-8, and I have some open Debian bug
>> reports about this.
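To illustrate the first point: a decoder that receives a `char *` can see a whole multi-byte sequence, while an `int` only ever holds one already-decoded value. The sketch below shows what a minimal UTF-8 fallback for use without iconv might look like. The helper name `utf8_decode_one` is hypothetical and is not part of GDB.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GDB: decode one UTF-8 sequence
   starting at BUF, with at most LEN bytes available.  On success, store
   the code point in *CP and return the number of bytes consumed (1-4).
   Return 0 if the sequence is truncated or malformed.  */
static size_t
utf8_decode_one (const char *buf, size_t len, unsigned long *cp)
{
  const unsigned char *s = (const unsigned char *) buf;
  unsigned long c;
  size_t n, i;

  if (len == 0)
    return 0;

  if (s[0] < 0x80)                  /* 0xxxxxxx: plain ASCII.  */
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)   /* 110xxxxx: two-byte sequence.  */
    {
      c = s[0] & 0x1F;
      n = 2;
    }
  else if ((s[0] & 0xF0) == 0xE0)   /* 1110xxxx: three-byte sequence.  */
    {
      c = s[0] & 0x0F;
      n = 3;
    }
  else if ((s[0] & 0xF8) == 0xF0)   /* 11110xxx: four-byte sequence.  */
    {
      c = s[0] & 0x07;
      n = 4;
    }
  else
    return 0;                       /* Stray continuation or bad lead byte.  */

  if (len < n)
    return 0;                       /* Sequence runs past the buffer.  */

  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;                   /* Expected a 10xxxxxx byte.  */
      c = (c << 6) | (s[i] & 0x3F);
    }

  /* Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.  */
  if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800)
      || (n == 4 && c < 0x10000)
      || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
    return 0;

  *cp = c;
  return n;
}
```

A printing loop would advance its `char *` by the returned byte count, which is why this interface does not fit code that passes a single `int` per character.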
> 
> 
> Absolutely --- as I say in the comments to charset.c:
> 
>    At the moment, GDB only supports single-byte, stateless character
>    sets.  This includes the ISO-8859 family (ASCII extended with
>    accented characters, and (I think) Cyrillic, for European
>    languages), and the EBCDIC family (used on IBM's mainframes).
>    Unfortunately, it excludes many Asian scripts, the fixed- and
>    variable-width Unicode encodings, and other desirable things.
>    Patches are welcome!  (For example, it would be nice if the Java
>    string support could simply get absorbed into some more general
>    multi-byte encoding support.)

I think this limitation should be mentioned in the documentation.

Andrew

> But it seemed to me that supporting stateless variable-width encodings
> was going to be a *lot* of work.  Specifically, how the printing code
> should change was a bit beyond me.
> 
> Regarding `int' vs. `wchar_t': the wchar_t we could detect with
> autoconf is a host type.  It has no necessary relationship to the
> `wchar_t' on the target.  LONGEST might be a better choice than `int',
> but `wchar_t' is worse.
>
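To make the host/target point concrete: the width and byte order of a target character come from the target's description, not from the host's `sizeof (wchar_t)`. For example, `wchar_t` is 16 bits on Windows but 32 bits with glibc. The sketch below assembles a target character into the widest host integer. `longest_t` stands in for GDB's `LONGEST`, and `target_char_value` and `byte_order_sketch` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Stand-in for GDB's LONGEST; assumed to be the widest host integer.  */
typedef long long longest_t;

/* Hypothetical byte-order tag for this sketch.  */
enum byte_order_sketch { SKETCH_BIG_ENDIAN, SKETCH_LITTLE_ENDIAN };

/* Hypothetical helper: assemble a WIDTH-byte target character from BUF,
   using the target's byte order ORDER.  WIDTH must not exceed
   sizeof (longest_t).  The result depends only on the target's
   description, never on the host's wchar_t.  */
static longest_t
target_char_value (const unsigned char *buf, size_t width,
                   enum byte_order_sketch order)
{
  unsigned long long value = 0;
  size_t i;

  for (i = 0; i < width; i++)
    {
      /* Visit bytes from most significant to least significant.  */
      size_t idx = (order == SKETCH_BIG_ENDIAN) ? i : width - 1 - i;
      value = (value << 8) | buf[idx];
    }
  return (longest_t) value;
}
```

A 2-byte big-endian target character and a 4-byte little-endian one both fit in the same host type. A host `wchar_t` could hold neither portably.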