From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 6531 invoked by alias); 1 Mar 2007 00:43:24 -0000 Received: (qmail 6522 invoked by uid 22791); 1 Mar 2007 00:43:23 -0000 X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 01 Mar 2007 00:43:17 +0000 Received: (qmail 5204 invoked from network); 1 Mar 2007 00:43:15 -0000 Received: from unknown (HELO localhost) (jimb@127.0.0.2) by mail.codesourcery.com with ESMTPA; 1 Mar 2007 00:43:15 -0000 To: Mark Kettenis Cc: eliz@gnu.org, dewar@adacore.com, nickrob@snap.net.nz, jan.kratochvil@redhat.com, Mathieu.Lacage@sophia.inria.fr, gdb@sourceware.org Subject: Re: [RFC] Signed/unsigned character arrays are not strings References: <17888.39894.136355.447008@kahikatea.snap.net.nz> <1172390381.2584.18.camel@mathieu> <20070225195350.GA12811@host0.dyn.jankratochvil.net> <20070226004457.GA9926@caradoc.them.org> <17892.4014.160191.285423@kahikatea.snap.net.nz> <45E42969.1030007@adacore.com> <20070227131442.GA20718@caradoc.them.org> <20070227215316.GA26262@caradoc.them.org> <200702272211.l1RMBVvI028239@brahms.sibelius.xs4all.nl> <20070228134640.GA11261@caradoc.them.org> From: Jim Blandy Date: Thu, 01 Mar 2007 00:43:00 -0000 In-Reply-To: <20070228134640.GA11261@caradoc.them.org> (Daniel Jacobowitz's message of "Wed, 28 Feb 2007 08:46:40 -0500") Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-IsSubscribed: yes Mailing-List: contact gdb-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-owner@sourceware.org X-SW-Source: 2007-03/txt/msg00006.txt.bz2 Daniel Jacobowitz writes: > On Tue, Feb 27, 2007 at 11:11:32PM +0100, Mark Kettenis wrote: >> Anyway, I'm in favour of restore the traditional gdb behaviour of >> printing all three as strings. > > While I haven't seen responses to some of my arguments in favor of the > new behavior, it's obvious that I'm in the minority here. So, the > behavior ought to change. > > I was concerned about Jim's patch, but thinking about it some more, I > would be happy with it (probably with Paul's suggestion). That's > because "typedef char *name" is reasonably common, but "typedef char > name_unit;" is much less so. What do those who prefer the old > behavior think of that? For whatever it's worth, here's a revised patch that implements Paul's suggestions and has better comments (more accurately apologetic): gdb/ChangeLog: 2007-02-27 Jim Blandy * c-valprint.c (textual_element_type): New function. (c_val_print): Use textual_element_type to decide whether to print arrays, pointer types, and integer types as strings and characters. (c_value_print): Doc fix. Index: gdb/c-valprint.c =================================================================== RCS file: /cvs/src/src/gdb/c-valprint.c,v retrieving revision 1.42 diff -u -r1.42 c-valprint.c --- gdb/c-valprint.c 26 Jan 2007 20:54:16 -0000 1.42 +++ gdb/c-valprint.c 28 Feb 2007 23:09:29 -0000 @@ -56,6 +56,66 @@ } +/* Apply a heuristic to decide whether an array of TYPE or a pointer + to TYPE should be printed as a textual string. Return non-zero if + it should, or zero if it should be treated as an array of /pointer + to integers. + + It's a shame that we need to use a heuristic here, but C hasn't + historically provided distinct types for bytes and characters; you + get 'char', which is guaranteed to occupy a byte. You can put + qualifiers on it, but people use the qualified char types for text + too, so the qualifier's presence doesn't really tell you anything. + + What's especially a shame here is that the heuristic is fixed. The + user knows which of their types are textual and which are numeric, + but we don't give them a way to tell us, beyond using '/s' every time + they print something. */ +static int +textual_element_type (struct type *type) +{ + /* GDB doesn't use TYPE_CODE_CHAR for the C 'char' types; instead, + it uses one-byte TYPE_CODE_INT types, with TYPE_NAMEs like + "char", "unsigned char", etc. and appropriate flags. For various + reasons, this works out well in some places. */ + + + struct type *true_type = check_typedef (type); + + /* TYPE_CODE_CHAR is always textual. But I don't think it ever + occurs in C code. */ + if (TYPE_CODE (true_type) == TYPE_CODE_CHAR) + return 1; + + /* If a one-byte TYPE_CODE_INT has a TYPE_NAME of "char" or + something ending with " char", then we treat it as text; + otherwise, we assume it's being used as data. This makes all our + SIMD types like builtin_type_v8_int8 and the types + like uint8_t print numerically, but all 'char' types print + textually. Code which says what it means does well. + We also recognize types ending in 'char_t'. */ + if (TYPE_CODE (true_type) == TYPE_CODE_INT + && TYPE_LENGTH (true_type) == 1) + { + int name_len; + + /* All integer types should have names. */ + gdb_assert (TYPE_NAME (type)); + + name_len = strlen (TYPE_NAME (type)); + + if (strcmp (TYPE_NAME (type), "char") == 0 + || (name_len > 5 + && strcmp (TYPE_NAME (type) + name_len - 5, " char") == 0) + || (name_len >= 6 + && strcmp (TYPE_NAME (type) + name_len - 6, " char_t") == 0)) + return 1; + } + + return 0; +} + + /* Print data of type TYPE located at VALADDR (within GDB), which came from the inferior at address ADDRESS, onto stdio stream STREAM according to FORMAT (a letter or 0 for natural format). The data at VALADDR is in @@ -94,11 +154,9 @@ { print_spaces_filtered (2 + 2 * recurse, stream); } - /* For an array of chars, print with string syntax. */ - if (eltlen == 1 && - ((TYPE_CODE (elttype) == TYPE_CODE_INT && TYPE_NOSIGN (elttype)) - || ((current_language->la_language == language_m2) - && (TYPE_CODE (elttype) == TYPE_CODE_CHAR))) + + /* Print arrays of textual chars with a string syntax. */ + if (textual_element_type (TYPE_TARGET_TYPE (type)) && (format == 0 || format == 's')) { /* If requested, look for the first null char and only print @@ -191,12 +249,11 @@ deprecated_print_address_numeric (addr, 1, stream); } - /* For a pointer to char or unsigned char, also print the string + /* For a pointer to a textual type, also print the string pointed to, unless pointer is null. */ /* FIXME: need to handle wchar_t here... */ - if (TYPE_LENGTH (elttype) == 1 - && TYPE_CODE (elttype) == TYPE_CODE_INT + if (textual_element_type (TYPE_TARGET_TYPE (type)) && (format == 0 || format == 's') && addr != 0) { @@ -398,7 +455,7 @@ Since we don't know whether the value is really intended to be used as an integer or a character, print the character equivalent as well. */ - if (TYPE_LENGTH (type) == 1) + if (textual_element_type (type)) { fputs_filtered (" ", stream); LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset), @@ -500,7 +557,9 @@ || TYPE_CODE (type) == TYPE_CODE_REF) { /* Hack: remove (char *) for char strings. Their - type is indicated by the quoted string anyway. */ + type is indicated by the quoted string anyway. + (Don't use textual_element_type here; quoted strings + are always exactly (char *). */ if (TYPE_CODE (type) == TYPE_CODE_PTR && TYPE_NAME (type) == NULL && TYPE_NAME (TYPE_TARGET_TYPE (type)) != NULL