From: Daniel Jacobowitz <drow@mvista.com>
To: Jim Blandy <jimb@redhat.com>
Cc: Kevin Buettner <kevinb@redhat.com>, gdb-patches@sources.redhat.com
Subject: Re: [PATCH RFC] Character set support
Date: Fri, 13 Sep 2002 07:02:00 -0000
Message-ID: <20020913140312.GA10942@nevyn.them.org>
In-Reply-To: <vt2n0qmwpm9.fsf@zenia.red-bean.com>
On Thu, Sep 12, 2002 at 10:11:10PM -0500, Jim Blandy wrote:
>
> > Two comments:
> > There's a lot of passing integers around to refer to a character.
> > That doesn't make a lot of sense to me; we should either be passing
> > char *, so that we can decode multibyte sequences, or using wchar_t
> > explicitly and autoconfing for it.
> >
> > I see hardcoded support for a couple of simplistic charsets; would it
> > be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
> > available? Gcj is natively UTF-8, and I have some open Debian bug
> > reports about this.
>
> Absolutely --- as I say in the comments to charset.c:
>
> At the moment, GDB only supports single-byte, stateless character
> sets. This includes the ISO-8859 family (ASCII extended with
> accented characters, and (I think) Cyrillic, for European
> languages), and the EBCDIC family (used on IBM's mainframes).
> Unfortunately, it excludes many Asian scripts, the fixed- and
> variable-width Unicode encodings, and other desireable things.
> Patches are welcome! (For example, it would be nice if the Java
> string support could simply get absorbed into some more general
> multi-byte encoding support.)
>
> But it seemed to me that supporting stateless variable-width encodings
> was going to be a *lot* of work. Specifically, how the printing code
> should change was a bit beyond me.
+ /* These all suggest that the input or output character sets
+ have multi-byte encodings of some characters, which means
+ it's unsuitable for use as a GDB character set. We should
+ never have selected it. */
Sigh - OK, I see that this can't even use iconv for UTF-8->ASCII.
That's a real shame. I have some code that does this, so if I get a
chance I can try to improve it in GDB; or someone who (unlike me)
actually groks iconv can try it...
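
For illustration only, here is a minimal sketch of what iconv-free
UTF-8 -> ASCII handling might look like. It is not the code mentioned
above and not part of the patch; the names utf8_decode and
utf8_to_ascii are hypothetical, and replacing every non-ASCII
character with '?' is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from BUF (LEN bytes available).  On
   success, store the code point in *CP and return the number of
   bytes consumed (1 to 4).  Return 0 if the sequence is malformed,
   truncated, overlong, a surrogate, or beyond U+10FFFF.  */
static size_t
utf8_decode (const unsigned char *buf, size_t len, uint32_t *cp)
{
  assert (buf != NULL && cp != NULL);
  if (len == 0)
    return 0;

  unsigned char b0 = buf[0];
  size_t need;
  uint32_t value, min;

  if (b0 < 0x80)
    {
      *cp = b0;                 /* plain ASCII */
      return 1;
    }
  else if ((b0 & 0xE0) == 0xC0)
    { need = 2; value = b0 & 0x1F; min = 0x80; }
  else if ((b0 & 0xF0) == 0xE0)
    { need = 3; value = b0 & 0x0F; min = 0x800; }
  else if ((b0 & 0xF8) == 0xF0)
    { need = 4; value = b0 & 0x07; min = 0x10000; }
  else
    return 0;                   /* stray continuation or invalid lead byte */

  if (len < need)
    return 0;                   /* truncated sequence */

  for (size_t i = 1; i < need; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return 0;               /* not a continuation byte */
      value = (value << 6) | (buf[i] & 0x3F);
    }

  if (value < min || value > 0x10FFFF
      || (value >= 0xD800 && value <= 0xDFFF))
    return 0;

  *cp = value;
  return need;
}

/* Convert INLEN bytes of UTF-8 at IN to a NUL-terminated ASCII
   string in OUT (capacity OUTSIZE), writing '?' for each non-ASCII
   character.  Return the number of characters written, or
   (size_t) -1 on malformed input or insufficient space.  */
static size_t
utf8_to_ascii (const unsigned char *in, size_t inlen,
               char *out, size_t outsize)
{
  size_t n = 0;

  assert (in != NULL && out != NULL);
  if (outsize == 0)
    return (size_t) -1;

  while (inlen > 0)
    {
      uint32_t cp;
      size_t used = utf8_decode (in, inlen, &cp);

      if (used == 0 || n + 1 >= outsize)
        return (size_t) -1;
      out[n++] = cp < 0x80 ? (char) cp : '?';
      in += used;
      inlen -= used;
    }
  out[n] = '\0';
  return n;
}
```

Passing a pointer and a length, rather than an int per character,
lets the caller step through multibyte sequences one at a time.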
> Regarding `int' vs. `wchar_t': the wchar_t we could detect with
> autoconf is a host type. It has no necessary relationship to the
> `wchar_t' on the target. LONGEST might be a better choice than `int',
> but `wchar_t' is worse.
The first part is accurate but not relevant. I'm not suggesting
reading wchar_t's from the target; that's not a terribly useful thing
to do. You _want_ the host wchar_t. It is a host type capable of
holding a wide character; the type changes based on platform and on
whether or not the platform actually has wide character support.
There's not much you can do if it doesn't, is there?

Rather than using iconv, which is meant for converting strings of
text, it seemed to me when I wrote the above comments that we should
be using the mbrtowc/wctomb functions. However, unlike iconv, they
appear to operate based on the current locale rather than a specified
charset. I suppose they are unsuitable and we'll have to figure out
how to use iconv appropriately.
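
The distinction above can be sketched briefly: iconv_open takes the
source and destination charset names explicitly, whereas mbrtowc
interprets bytes according to the LC_CTYPE category of the current
locale. The following is a rough sketch under stated assumptions, not
code from the patch: the wrapper name convert_charset is hypothetical,
and it assumes a host iconv that knows the charset names passed in.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert INLEN bytes at IN from charset FROM to charset TO, storing
   the result in OUT (capacity OUTSIZE) and its length in *OUTLEN.
   Return 0 on success, or -1 with errno set by iconv_open or iconv.
   Both charsets are named by the caller; the process locale plays
   no part.  Stateful encodings would also need a final flush call,
   which this sketch omits.  */
static int
convert_charset (const char *from, const char *to,
                 const char *in, size_t inlen,
                 char *out, size_t outsize, size_t *outlen)
{
  assert (from != NULL && to != NULL && in != NULL
          && out != NULL && outlen != NULL);

  /* Note the argument order: destination first, then source.  */
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t) -1)
    return -1;

  /* iconv advances these pointers and decrements the counts.  */
  char *inp = (char *) in;
  char *outp = out;
  size_t inleft = inlen;
  size_t outleft = outsize;

  size_t rc = iconv (cd, &inp, &inleft, &outp, &outleft);
  int saved_errno = errno;

  iconv_close (cd);
  if (rc == (size_t) -1)
    {
      errno = saved_errno;
      return -1;
    }

  *outlen = outsize - outleft;
  return 0;
}
```

For example, calling convert_charset ("UTF-8", "ISO-8859-1", ...) on
the two bytes 0xC3 0xA9 should yield the single byte 0xE9 whatever
locale the process is running in.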
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer