Date: Wed, 22 Feb 2006 04:30:00 -0000
From: Daniel Jacobowitz
To: gdb@sourceware.org
Subject: Quoting, backslashes, CLI and MI

Prompted by some of Andrew's paths-with-spaces patches, and by another project I'm working on that had to add a var_filename set variable, I've been looking over the way GDB handles quoting of arguments. It's a mess.

For most users this is not a big deal; the big losers are (A) pathnames with spaces in them, and (B) pathnames with backslashes in them, e.g. DJGPP and MinGW32 paths.

My summary of the current state of commands is after the divider in this message. It's a bit long, so let's do the more interesting bits first: what can we do about this, and where do we want to end up?

I think that we want to continue using buildargv-style quoting for CLI commands, and that it would be desirable to use only MI-style quoting for MI commands. Does anyone disagree with this? The fact that the two are somewhat inconsistent is regrettable, but they are nominally independent interfaces.
While the Eclipse CDT does have quoting all messed up for commands like -file-exec-file (roughly matching the current state, which is an accident, rather than the MI protocol documentation), it doesn't actually use those commands. There may be other cases where the disagreement about backslashes will bite it, but I couldn't come up with anything in a quick survey.

The only broken thing I found was -environment-cd. The fix to PR gdb/741 three years ago changed GDB to behave a little more sensibly, but I think there's still trouble on both the GDB and Eclipse sides of the fence, and things definitely don't work if you start Eclipse on a workspace whose path contains a space.

So I think that we should take this opportunity to fix up all MI commands to quote the way the documentation says they do. I think this would have to go into the unfinished -interpreter=mi3 level and leave -interpreter=mi2 alone, for maximum compatibility.

I'd like to fix up all the CLI and "set" commands to use buildargv-style quoting, too. "All the set commands" is reasonable and pretty easy to track down. "All the CLI commands" is next to impossible; it would be a huge amount of work to go through everything that gets registered as a CLI command and might possibly want a filename or other quotable arguments. It'd be nice to do, but I think it's less urgent considering the amount of work involved. And Andrew Stubbs has kindly fixed the worst offenders.

So if folks agree with the general ideas I've put together here, I can try to sweep for the affected "set" and MI commands.

Andrew Cagney pointed out in a PR that we might need to update the readline filename completion, too. I'm not sure if that's still relevant; it needs a third look.

==

First, CLI commands. Some of them take arguments literally or in an ad-hoc manner; others use buildargv() to turn the argument string into an argv vector.
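To make "turn the argument string into an argv vector" concrete, here is a rough sketch of a buildargv-style splitter. It is a hypothetical reimplementation for illustration, not libiberty's code: `split_args` and `free_args` are made-up names, and allocation failures are not checked. It assumes a backslash escapes the next character even inside quotes, quotes protect whitespace and the other quote character, and unquoted whitespace separates arguments.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buildargv-style splitter, for illustration only; it is not
   libiberty's buildargv.  Rules assumed:
     - a backslash escapes the following character, even inside quotes
       (a trailing lone backslash is dropped);
     - inside '...' or "...", whitespace and the other quote character
       are literal;
     - unquoted whitespace separates arguments.
   Returns a malloc'd, NULL-terminated vector; release it with free_args.
   Allocation failures are not checked.  */
char **split_args(const char *input)
{
  size_t cap = 4, argc = 0;
  char **argv = malloc(cap * sizeof *argv);
  char *buf = malloc(strlen(input) + 1);   /* scratch space for one argument */
  const char *p = input;

  while (*p != '\0')
    {
      while (isspace((unsigned char) *p))
        p++;
      if (*p == '\0')
        break;

      size_t len = 0;
      char quote = 0;   /* 0 when unquoted, else the open quote character */
      for (; *p != '\0'; p++)
        {
          if (*p == '\\')
            {
              if (p[1] != '\0')
                buf[len++] = *++p;   /* copy the escaped character literally */
            }
          else if (quote != 0)
            {
              if (*p == quote)
                quote = 0;           /* closing quote is dropped */
              else
                buf[len++] = *p;
            }
          else if (*p == '\'' || *p == '"')
            quote = *p;              /* opening quote is dropped */
          else if (isspace((unsigned char) *p))
            break;                   /* unquoted whitespace ends the argument */
          else
            buf[len++] = *p;
        }

      /* Keep room for this argument plus the terminating NULL.  */
      if (argc + 1 >= cap)
        argv = realloc(argv, (cap *= 2) * sizeof *argv);
      argv[argc] = malloc(len + 1);
      memcpy(argv[argc], buf, len);
      argv[argc][len] = '\0';
      argc++;
    }

  argv[argc] = NULL;
  free(buf);
  return argv;
}

/* Free a vector returned by split_args.  */
void free_args(char **argv)
{
  for (char **a = argv; *a != NULL; a++)
    free(*a);
  free(argv);
}
```

Under these rules, an argument string like `"C:\Program Files\foo.exe"` yields the single argument `C:Program Filesfoo.exe`: the quotes keep the space, but every backslash is consumed as an escape, which is exactly what bites DJGPP and MinGW32 paths.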
Buildargv quoting works like this (the implementation is in libiberty/argv.c):

- Backslash always escapes the following character.
- Single quotes and double quotes escape each other and whitespace (inside '...', whitespace and " are literal, and vice versa).
- Unquoted whitespace separates arguments.

It's not quite the same as POSIX shell quoting or C quoting, but somewhere in the middle. But it's fairly simple and seems to work.

buildargv is used in:

  exec-file
  symbol-file
  file
  handle
  interpreter-exec
  info proc
  target mips
  load
  target sim
  run (sim)
  backtrace
  add-symbol-file
  maint print symbols
  maint print psymbols
  maint print msymbols
  path
  directory

Some of these weren't always on this list; Andrew Stubbs has been hard at work on them. path, directory, and add-symbol-file, at least, have been changed recently. Most of the commands here that actually take filenames also do tilde expansion.

There are probably some important GDB commands that aren't covered by this list, but at least it covers the most frequently used file-related commands (especially "file"). And it's got a reasonably well-defined quoting strategy. I suggest that we continue to use it more and more aggressively. In fact, I'd be happy if we could use it _everywhere_ and have commands take an argv argument instead of an args string. Someone would have to do a study on what commands might be broken by such a change.

==

Next, the "set" commands. I left them out of the above; they have their own quoting rules.

- var_string variables use C-style escape sequences, but do not strip quote marks. They take the whole rest of the line. There are fairly few of these, and I think we could change all six of them. It's not clear that any of them need to handle C-style escape sequences unless someone wants to put control characters in their GDB prompt. All the others are in target-specific functionality, some of which is scheduled to go away.
- var_string_noescape is the same, without the C-style escape sequences. I think we could change all of these, too; there are only eight.
- var_filename and var_optional_filename are a bit different. var_filename does tilde expansion and var_optional_filename (often used for paths???) does not. Neither strips quotes or processes escape characters, and both consume the rest of the line.

This is messy. The number of affected commands is small enough that I would recommend changing them all to use buildargv-style quoting, sooner rather than later, and living with the problems.

==

Third, GDB/MI commands. The MI documentation says:

  `PARAMETER ==>'
       `NON-BLANK-SEQUENCE | C-STRING'

  `NON-BLANK-SEQUENCE ==>'
       _anything, provided it doesn't contain special characters such as
       "-", NL, """ and of course " "_

  `C-STRING ==>'
       `""" SEVEN-BIT-ISO-C-STRING-CONTENT """'

SEVEN-BIT-ISO-C-STRING-CONTENT is not further defined.

The MI implementation does one of two things. For some commands, those with an MI-specific implementation (which is 100% a GDB implementation detail that we really shouldn't be exposing, and don't document, but do somewhat expose), it uses its own quoting rules. These are implemented by mi_parse_argv. Strings surrounded by double quotes get C escape processing. Strings not surrounded by double quotes are split at whitespace and get no further processing; backslashes are passed right through.

For commands _without_ an MI implementation, GDB passes everything after -the-mi-command and one or more blank spaces to the equivalent CLI command, which then parses the arguments however it wants.

This is very confusing. The MI documentation suggests that these two should be equivalent:

  -file-exec-file .\T_MATH.ELF
  -file-exec-file ".\\T_MATH.ELF"

But since this string gets passed straight to the CLI command "file", this isn't true. If you don't use the quotes, you need to say this instead:

  -file-exec-file .\\T_MATH.ELF

Which, incidentally, is how Eclipse quotes MI commands; it's got the quoting rules all buggered up. Argh!

--
Daniel Jacobowitz
CodeSourcery
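For comparison, a hypothetical sketch of the PARAMETER rule as the MI documentation states it. `mi_split_params` and `free_params` are illustrative names, not GDB's mi_parse_argv; only the \\, \", \n and \t escapes are handled, and the special characters the grammar forbids in a NON-BLANK-SEQUENCE are not rejected. A parameter starting with a double quote is treated as a C-STRING with escape processing; anything else is taken verbatim up to the next whitespace.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated vector returned by mi_split_params.  */
void free_params(char **argv)
{
  for (char **a = argv; *a != NULL; a++)
    free(*a);
  free(argv);
}

/* Hypothetical sketch of the documented MI PARAMETER rule; this is not
   GDB's mi_parse_argv.  A parameter starting with '"' is a C-STRING: the
   quotes are stripped and \n and \t are translated; any other escaped
   character (including \\ and \") is copied without its backslash.
   Anything else is a NON-BLANK-SEQUENCE, copied verbatim up to the next
   whitespace, so backslashes pass straight through.  Returns a
   NULL-terminated vector, or NULL if a C-STRING is unterminated.
   Allocation failures are not checked.  */
char **mi_split_params(const char *p)
{
  size_t cap = 4, argc = 0;
  char **argv = malloc(cap * sizeof *argv);

  for (;;)
    {
      while (isspace((unsigned char) *p))
        p++;
      if (*p == '\0')
        break;

      char *arg = malloc(strlen(p) + 1);   /* output never exceeds input */
      char *out = arg;

      if (*p == '"')
        {
          for (p++; *p != '"'; p++)
            {
              if (*p == '\0')
                {
                  /* Unterminated C-STRING: clean up and report failure.  */
                  free(arg);
                  argv[argc] = NULL;
                  free_params(argv);
                  return NULL;
                }
              if (*p == '\\' && p[1] != '\0')
                {
                  p++;
                  if (*p == 'n')
                    *out++ = '\n';
                  else if (*p == 't')
                    *out++ = '\t';
                  else
                    *out++ = *p;   /* covers \\ and \" */
                }
              else
                *out++ = *p;
            }
          p++;   /* skip the closing quote */
        }
      else
        {
          while (*p != '\0' && !isspace((unsigned char) *p))
            *out++ = *p++;
        }
      *out = '\0';

      /* Keep room for this parameter plus the terminating NULL.  */
      if (argc + 1 >= cap)
        argv = realloc(argv, (cap *= 2) * sizeof *argv);
      argv[argc++] = arg;
    }

  argv[argc] = NULL;
  return argv;
}
```

Under these documented rules, both spellings of the -file-exec-file argument above come out as `.\T_MATH.ELF`. The discrepancy arises only because commands without an MI implementation hand their raw text to the CLI instead, where buildargv-style parsing consumes the unquoted backslash.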