From: Andrew Cagney
Date: Thu, 12 Sep 2002 20:45:00 -0000
To: Jim Blandy
Cc: Daniel Jacobowitz, Kevin Buettner, gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support

>> Two comments:
>>
>> There's a lot of passing integers around to refer to a character.
>> That doesn't make a lot of sense to me; we should either be passing
>> char *, so that we can decode multibyte sequences, or using wchar_t
>> explicitly and autoconfing for it.
>>
>> I see hardcoded support for a couple of simplistic charsets; would it
>> be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
>> available? Gcj is natively UTF-8, and I have some open Debian bug
>> reports about this.
>
>
> Absolutely --- as I say in the comments to charset.c:
>
>    At the moment, GDB only supports single-byte, stateless character
>    sets.
>    This includes the ISO-8859 family (ASCII extended with
>    accented characters, and (I think) Cyrillic, for European
>    languages), and the EBCDIC family (used on IBM's mainframes).
>    Unfortunately, it excludes many Asian scripts, the fixed- and
>    variable-width Unicode encodings, and other desirable things.
>    Patches are welcome!  (For example, it would be nice if the Java
>    string support could simply get absorbed into some more general
>    multi-byte encoding support.)

I think this limitation should be mentioned in the documentation.

Andrew

> But it seemed to me that supporting stateless variable-width encodings
> was going to be a *lot* of work.  Specifically, how the printing code
> should change was a bit beyond me.
>
> Regarding `int' vs. `wchar_t': the wchar_t we could detect with
> autoconf is a host type.  It has no necessary relationship to the
> `wchar_t' on the target.  LONGEST might be a better choice than `int',
> but `wchar_t' is worse.
>
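For illustration only, here is a minimal sketch of the two ideas raised in this thread: decoding a multibyte UTF-8 sequence from a byte buffer (so a character is read from a char pointer rather than passed around as a single int), and reading a target character of a given width and byte order into a wide host integer instead of relying on the host's wchar_t. The names utf8_decode, target_char_value, and longest_t are hypothetical; longest_t merely stands in for GDB's LONGEST, and none of this is code from the charset.c patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GDB's LONGEST: a host integer wide enough
   for any target character code, independent of the host's wchar_t.  */
typedef int64_t longest_t;

/* Decode one UTF-8 sequence starting at S (at most LEN bytes available).
   On success, store the code point in *CP and return the number of bytes
   consumed; return 0 if the sequence is malformed or truncated.
   Overlong forms, surrogates, and values above U+10FFFF are rejected.  */
static size_t
utf8_decode (const unsigned char *s, size_t len, longest_t *cp)
{
  size_t n, i;
  longest_t c;

  if (len == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];             /* Plain ASCII: one byte.  */
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { n = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0)
    { n = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0)
    { n = 4; c = s[0] & 0x07; }
  else
    return 0;                 /* Stray continuation or invalid lead byte.  */

  if (len < n)
    return 0;                 /* Truncated sequence.  */
  for (i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;             /* Expected a continuation byte.  */
      c = (c << 6) | (s[i] & 0x3F);
    }

  if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800)
      || (n == 4 && c < 0x10000)
      || (c >= 0xD800 && c <= 0xDFFF) || c > 0x10FFFF)
    return 0;

  *cp = c;
  return n;
}

/* Read a WIDTH-byte target character from BUF using the target's byte
   order (BIG_ENDIAN nonzero means big-endian).  WIDTH should not exceed
   sizeof (longest_t).  Nothing here depends on the host's wchar_t.  */
static longest_t
target_char_value (const unsigned char *buf, size_t width, int big_endian)
{
  uint64_t v = 0;
  size_t i;

  for (i = 0; i < width; i++)
    {
      /* Visit bytes from most to least significant.  */
      size_t idx = big_endian ? i : width - 1 - i;
      v = (v << 8) | buf[idx];
    }
  return (longest_t) v;
}
```

The point of the second function is the one made above about `wchar_t': the width and byte order come from a description of the target, while the value lands in a host type that is simply wide enough.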