From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 3403 invoked by alias); 27 Jan 2006 18:57:44 -0000
Received: (qmail 3392 invoked by uid 22791); 27 Jan 2006 18:57:43 -0000
X-Spam-Check-By: sourceware.org
Received: from nevyn.them.org (HELO nevyn.them.org) (66.93.172.17) by sourceware.org (qpsmtpd/0.31.1) with ESMTP; Fri, 27 Jan 2006 18:57:42 +0000
Received: from drow by nevyn.them.org with local (Exim 4.54) id 1F2Ymx-0004RM-LJ for gdb@sourceware.org; Fri, 27 Jan 2006 13:57:39 -0500
Date: Sat, 28 Jan 2006 05:24:00 -0000
From: Daniel Jacobowitz
To: gdb@sourceware.org
Subject: Re: Using XML in GDB?
Message-ID: <20060127185739.GA16811@nevyn.them.org>
Mail-Followup-To: gdb@sourceware.org
References: <20060126055744.GA29647@nevyn.them.org> <20060127180429.GA15726@nevyn.them.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.8i
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-01/txt/msg00314.txt.bz2

On Fri, Jan 27, 2006 at 08:25:42PM +0200, Eli Zaretskii wrote:
> Can we at least have a pipe-dream list of things we think GDB would
> ideally like to know about targets, and how structured each one of
> them is?

Well, I don't think I can. I haven't a clue; every time I think I've
got a handle on the set, people come up with creative new ones. For
instance I hadn't considered that we might want memory-mapped I/O
devices to be explicitly explained to GDB.

> > If we're going to do that, it would be a real shame not to consider
> > localization; most ARM system programmers can probably manage the
> > English names of the registers, but if we want to offer help text,
> > being able to provide it in Japanese is a big win. So that means
> > character encodings, and in turn that means we need to be somewhat
> > careful with the contents of descriptions.
>
> That part is something I never understood in your reasoning: XML does
> not do anything special to allow UTF-8, nor help you deal with the
> resulting non-ASCII text on the GDB side. If the underlying libc
> supports UTF-8, you have that now; if it doesn't, you won't be better
> off even if the target speaks XML.

The mere existence of character encodings isn't the issue; the issue is
encoding free-form text, possibly containing strange "characters",
within a structured element. In particular, within a structured element
that a client may not recognize and support.

We've got field separators - colon and semicolon in my working copy, and
the status of newlines is fuzzy. If they may validly occur within
free-form text we need to have an alternate way to escape them. In
ASCII how to do this is quite clear-cut. In UTF-8 it's a little less
clear-cut although still pretty simple - but it does require knowing
something about the contents of UTF-8 when defining the encoding, if you
want the encoded result to still be valid UTF-8. And I do, because
otherwise it will become awkward to edit the descriptions in a text
editor.

If you want to optionally support other encodings rather than UTF-8 it
becomes even trickier. You have to know, eventually, how fields are
encoded. For us I don't think that's necessary; we can define all
encoded text as UTF-8. But there's a similar problem if someone wants
to add a descriptive element transferred as a binary blob for some
reason - I don't have an example for this, but I can certainly accept
that someone will come up with one someday. Maybe bytecode!

XML's already considered this and solved it. There are defined ways to
express a document's encoding, and to escape characters that would
otherwise serve as syntax elements. You can store arbitrary text or
byte sequences in an element (e.g. OpenDocument).

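As a rough illustration of the difference, here is the same help text
protected both ways in Python. This is a sketch only: the backslash
scheme, the "description" element name and the sample text are all made
up for this example and are not anything GDB defines.

```python
import xml.etree.ElementTree as ET

# Hypothetical free-form help text. It contains the field separators
# (colon, semicolon), a newline, XML syntax characters and non-ASCII text.
HELP_TEXT = "r0: 汎用レジスタ; holds <arg0> & result\nsecond line"

# Ad-hoc scheme (an assumption for illustration): backslash-escape the
# separators so they never appear raw inside a field.
_ESCAPES = {"\\": "\\\\", ":": "\\c", ";": "\\s", "\n": "\\n"}
_UNESCAPES = {"\\": "\\", "c": ":", "s": ";", "n": "\n"}


def adhoc_escape(text: str) -> str:
    """Replace each separator character with a two-character escape."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


def adhoc_unescape(text: str) -> str:
    """Reverse adhoc_escape; assumes the input is well-formed."""
    out = []
    chars = iter(text)
    for ch in chars:
        # After a backslash, the next character selects the original one.
        out.append(_UNESCAPES[next(chars)] if ch == "\\" else ch)
    return "".join(out)


def xml_encode(text: str) -> bytes:
    """Store the text in an element; the serializer does the escaping."""
    elem = ET.Element("description")  # hypothetical element name
    elem.text = text
    # xml_declaration=True (Python 3.8+) records the encoding in the output.
    return ET.tostring(elem, encoding="utf-8", xml_declaration=True)


def xml_decode(data: bytes) -> str:
    """Parse the document back and return the element's text."""
    return ET.fromstring(data).text


if __name__ == "__main__":
    escaped = adhoc_escape(HELP_TEXT)
    print(escaped)
    print(escaped.encode("utf-8").decode("utf-8") == escaped)
    print(xml_encode(HELP_TEXT).decode("utf-8"))
    print(xml_decode(xml_encode(HELP_TEXT)) == HELP_TEXT)
```

The ad-hoc half stays valid UTF-8 only because its three separators are
single ASCII bytes, which never occur inside a multi-byte UTF-8
sequence; that is the "knowing something about UTF-8" part. The XML
half needs no such knowledge from whoever defines the format.
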
> > The biggest win of XML, for me, is that there are standard answers to
> > all of these problems and standard tools for editing and
> > checking XML files.
>
> Is XML the only widely used standard that supports what we want?

I'm sure it isn't, but I think it's the most standardized. You could do
something similar in an RFC-822 style format, for instance (Header:
value as in email, in case any of the list readers aren't familiar with
RFC-822; it also does handle multiline values, but I'm not sure how it
is on encoded text).

I'm not a die-hard XML advocate. In fact I've never used it before for
a new project, although I'm fairly familiar with it. If someone has an
alternate representation that they believe is superior, I'm listening.
What I want to do, however, is draw the line past which we should use
standardized representations instead of ad-hoc.

--
Daniel Jacobowitz
CodeSourcery