From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-24114-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 3403 invoked by alias); 27 Jan 2006 18:57:44 -0000
Received: (qmail 3392 invoked by uid 22791); 27 Jan 2006 18:57:43 -0000
X-Spam-Check-By: sourceware.org
Received: from nevyn.them.org (HELO nevyn.them.org) (66.93.172.17)     by sourceware.org (qpsmtpd/0.31.1) with ESMTP; Fri, 27 Jan 2006 18:57:42 +0000
Received: from drow by nevyn.them.org with local (Exim 4.54) 	id 1F2Ymx-0004RM-LJ 	for gdb@sourceware.org; Fri, 27 Jan 2006 13:57:39 -0500
Date: Sat, 28 Jan 2006 05:24:00 -0000
From: Daniel Jacobowitz <drow@false.org>
To: gdb@sourceware.org
Subject: Re: Using XML in GDB?
Message-ID: <20060127185739.GA16811@nevyn.them.org>
Mail-Followup-To: gdb@sourceware.org
References: <20060126055744.GA29647@nevyn.them.org> <u8xt1efg5.fsf@gnu.org> <20060127180429.GA15726@nevyn.them.org> <u64o5edeh.fsf@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <u64o5edeh.fsf@gnu.org>
User-Agent: Mutt/1.5.8i
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-01/txt/msg00314.txt.bz2

On Fri, Jan 27, 2006 at 08:25:42PM +0200, Eli Zaretskii wrote:
> Can we at least have a pipe-dream list of things we think GDB would
> ideally like to know about targets, and how structured each one of
> them is?

Well, I don't think I can.  I haven't a clue; every time I think I've
got a handle on the set, people come up with creative new ones.  For
instance, I hadn't considered that we might want memory-mapped I/O
devices to be explicitly explained to GDB.

> > If we're going to do that, it would be a real shame not to consider
> > localization; most ARM system programmers can probably manage the
> > English names of the registers, but if we want to offer help text,
> > being able to provide it in Japanese is a big win.  So that means
> > character encodings, and in turn that means we need to be somewhat
> > careful with the contents of descriptions.
> 
> That part is something I never understood in your reasoning: XML does
> not do anything special to allow UTF-8, nor help you deal with the
> resulting non-ASCII text on the GDB side.  If the underlying libc
> supports UTF-8, you have that now; if it doesn't, you won't be better
> off even if the target speaks XML.

The mere existence of character encodings isn't the issue; the
issue is encoding free-form text, possibly containing strange
"characters", within a structured element.  In particular, within a
structured element that a client may not recognize and support.

We've got field separators - colon and semicolon in my working copy,
and the status of newlines is fuzzy.  If any of these characters can
validly occur within free-form text, we need an alternate way to escape
them.  In ASCII, how to do this is quite clear-cut.  In UTF-8 it's a
little less clear-cut, although still pretty simple - but it does
require knowing
something about the contents of UTF-8 when defining the encoding, if
you want the encoded result to still be valid UTF-8.  And I do, because
otherwise it will become awkward to edit the descriptions in a text
editor.
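
A minimal sketch of one possible escaping scheme, in Python purely for illustration.  Everything in it is an assumption: ':' and ';' as field separators, newline as record terminator, '%XX' hex escapes (with '%' itself escaped), and the names `escape_field`/`unescape_field` are all hypothetical.  It relies on one property of UTF-8: the bytes 0x00-0x7F never appear inside a multi-byte sequence, so escaping only ASCII characters leaves every non-ASCII character untouched and the result stays valid, editor-friendly UTF-8.

```python
# Hypothetical field syntax (not from any GDB protocol): ':' and ';'
# separate fields, '\n' ends a record, and '%' introduces a %XX escape.
SPECIAL = {":", ";", "\n", "%"}


def escape_field(text: str) -> str:
    """Replace each special ASCII character with %XX (uppercase hex).

    Only ASCII characters are rewritten; in UTF-8 the bytes 0x00-0x7F never
    occur inside a multi-byte sequence, so non-ASCII text passes through
    unchanged and the encoded result remains valid UTF-8.
    """
    return "".join("%{:02X}".format(ord(ch)) if ch in SPECIAL else ch
                   for ch in text)


def unescape_field(field: str) -> str:
    """Invert escape_field: decode each %XX back to its character."""
    out = []
    i = 0
    while i < len(field):
        if field[i] == "%":
            out.append(chr(int(field[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


# A record with three free-form fields, one of them Japanese help text.
fields = ["r0", "汎用レジスタ: 0番", "a;b\nc%d"]
record = ";".join(escape_field(f) for f in fields)
decoded = [unescape_field(f) for f in record.split(";")]
```

Because separators never appear inside an escaped field, a client can split the record without understanding any field, and so can skip one it does not recognize.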

If you want to optionally support encodings other than UTF-8, it
becomes even trickier.  You have to know, eventually, how fields are
encoded.  For us I don't think that's necessary; we can define all
encoded text as UTF-8.  But there's a similar problem if someone wants
to add a descriptive element transferred as a binary blob for some
reason - I don't have an example for this, but I can certainly accept
that someone will come up with one someday.  Maybe bytecode!
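
One hypothetical way to make each field's encoding explicit is a one-character type tag in front of the value: 'T' for UTF-8 text and 'X' for a binary blob written as hex digits.  The tags and function names below are illustrative assumptions, not part of any existing GDB format.  Hex digits are plain ASCII, so an encoded blob can never contain a separator or break UTF-8 validity.

```python
def encode_value(value) -> str:
    """Prefix a field value with a hypothetical one-letter type tag."""
    if isinstance(value, bytes):
        return "X" + value.hex()   # binary blob: two lowercase hex digits per byte
    if isinstance(value, str):
        return "T" + value         # text (would still need separator escaping)
    raise TypeError("unsupported field type: " + type(value).__name__)


def decode_value(field: str):
    """Dispatch on the type tag; unknown tags are reported, not guessed."""
    tag, payload = field[:1], field[1:]
    if tag == "X":
        return bytes.fromhex(payload)
    if tag == "T":
        return payload
    raise ValueError("unknown field type tag: " + repr(tag))


blob = encode_value(b"\x00\xff\x10")
text = encode_value("help")
```

The cost is that every reader must agree on the tag set in advance, which is exactly the kind of ad-hoc convention a standard format would otherwise define.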

XML's already considered this and solved it.  There are defined ways
to express a document's encoding, and to escape characters that
would otherwise serve as syntax elements.  You can store arbitrary text,
or byte sequences encoded as e.g. base64, in an element (as OpenDocument
does).
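
As a rough illustration of those facilities, the sketch below uses Python's standard `xml.etree.ElementTree` module (the `xml_declaration` argument needs Python 3.8 or later).  The element and attribute names (`target`, `register`, `help`, `blob`, `encoding="base64"`) are invented for the example; no schema is defined here.  The serializer writes an XML declaration naming the encoding and escapes `&`, `<` and `>` in text automatically, while the binary payload travels as base64 text.

```python
import base64
import xml.etree.ElementTree as ET

# Invented schema: a <target> holding a register with localized help text
# and an opaque binary blob.
root = ET.Element("target")
reg = ET.SubElement(root, "register", name="cpsr")
help_text = 'Flags: N, Z; <mode> & "state" bits\n汎用レジスタ'
ET.SubElement(reg, "help").text = help_text
payload = b"\x00\x01<>;:"
blob_el = ET.SubElement(root, "blob", encoding="base64")
blob_el.text = base64.b64encode(payload).decode("ascii")

# Serialize as UTF-8 with an explicit declaration; markup characters in
# text are escaped by the serializer, not by hand.
doc = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Parse it back and recover both the text and the bytes.
parsed = ET.fromstring(doc)
round_trip_help = parsed.find("register/help").text
round_trip_blob = base64.b64decode(parsed.find("blob").text)
```

None of the escaping rules had to be designed or documented for this format; they come with the parser.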

> > The biggest win of XML, for me, is that there are standard answers to
> > all of these problems and standard tools for editing and
> > checking XML files.
> 
> Is XML the only widely used standard that supports what we want?

I'm sure it isn't, but I think it's the most standardized.  You could
do something similar in an RFC-822 style format, for instance ("Header:
value" lines as in email, for any list readers who aren't familiar with
RFC-822; it does handle multiline values, but I'm not sure how well it
handles encoded text).
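
For comparison, a sketch of the RFC-822 style, parsed with Python's standard `email` package.  The header names (`Register`, `Help`, `Help-Ja`) are hypothetical.  A continuation line starting with whitespace folds a long value over several lines, and non-ASCII text can be carried as an RFC 2047 encoded-word, which `email.header.Header` produces and the parser decodes; this is shown only as one existing mechanism, not a recommendation.

```python
from email import message_from_string
from email.header import Header
from email.policy import default

# Encode a non-ASCII value as an RFC 2047 encoded-word (=?utf-8?b?...?=).
ja_help = Header("汎用レジスタ", "utf-8").encode()

# Hypothetical description in "Header: value" form; the indented line
# continues (folds) the Help value.  A blank line ends the header block.
raw = (
    "Register: cpsr\n"
    "Help: Current program status register; holds the\n"
    " condition flags and the processor mode.\n"
    "Help-Ja: " + ja_help + "\n"
    "\n"
)

# The modern policy unfolds continuation lines and decodes encoded-words.
msg = message_from_string(raw, policy=default)
```

Note that the encoded-word hides the Japanese text behind base64, so unlike the XML case the raw file is no longer comfortable to edit by hand.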

I'm not a die-hard XML advocate.  In fact I've never used it before
for a new project, although I'm fairly familiar with it.  If someone
has an alternate representation that they believe is superior, I'm
listening.  What I want to do, however, is draw the line past which
we should use standardized representations instead of ad-hoc ones.

-- 
Daniel Jacobowitz
CodeSourcery