Re: [RFC] Signed/unsigned character arrays are not strings

Mirror of the gdb mailing list
 help / color / mirror / Atom feed

From: Jim Blandy <jimb@codesourcery.com>
To: Mark Kettenis <mark.kettenis@xs4all.nl>
Cc: eliz@gnu.org,  dewar@adacore.com,  nickrob@snap.net.nz,
		  jan.kratochvil@redhat.com,  Mathieu.Lacage@sophia.inria.fr,
		  gdb@sourceware.org
Subject: Re: [RFC] Signed/unsigned character arrays are not strings
Date: Thu, 01 Mar 2007 00:43:00 -0000	[thread overview]
Message-ID: <m3zm6x7pzx.fsf@codesourcery.com> (raw)
In-Reply-To: <20070228134640.GA11261@caradoc.them.org> (Daniel Jacobowitz's message of "Wed, 28 Feb 2007 08:46:40 -0500")


Daniel Jacobowitz <drow@false.org> writes:
> On Tue, Feb 27, 2007 at 11:11:32PM +0100, Mark Kettenis wrote:
>> Anyway, I'm in favour of restore the traditional gdb behaviour of
>> printing all three as strings.
>
> While I haven't seen responses to some of my arguments in favor of the
> new behavior, it's obvious that I'm in the minority here.  So, the
> behavior ought to change.
>
> I was concerned about Jim's patch, but thinking about it some more, I
> would be happy with it (probably with Paul's suggestion).  That's
> because "typedef char *name" is reasonably common, but "typedef char
> name_unit;" is much less so.  What do those who prefer the old
> behavior think of that?

For whatever it's worth, here's a revised patch that implements Paul's
suggestions and has better comments (more accurately apologetic):

gdb/ChangeLog:
2007-02-27  Jim Blandy  <jimb@codesourcery.com>

	* c-valprint.c (textual_element_type): New function.
	(c_val_print): Use textual_element_type to decide whether to print
	arrays, pointer types, and integer types as strings and characters.
	(c_value_print): Doc fix.

Index: gdb/c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.42
diff -u -r1.42 c-valprint.c
--- gdb/c-valprint.c	26 Jan 2007 20:54:16 -0000	1.42
+++ gdb/c-valprint.c	28 Feb 2007 23:09:29 -0000
@@ -56,6 +56,66 @@
 }
 
 
+/* Apply a heuristic to decide whether an array of TYPE or a pointer
+   to TYPE should be printed as a textual string.  Return non-zero if
+   it should, or zero if it should be treated as an array of /pointer
+   to integers.
+
+   It's a shame that we need to use a heuristic here, but C hasn't
+   historically provided distinct types for bytes and characters; you
+   get 'char', which is guaranteed to occupy a byte.  You can put
+   qualifiers on it, but people use the qualified char types for text
+   too, so the qualifier's presence doesn't really tell you anything.
+
+   What's especially a shame here is that the heuristic is fixed.  The
+   user knows which of their types are textual and which are numeric,
+   but we don't give them a way to tell us, beyond using '/s' every time
+   they print something.  */
+static int
+textual_element_type (struct type *type)
+{
+  /* GDB doesn't use TYPE_CODE_CHAR for the C 'char' types; instead,
+     it uses one-byte TYPE_CODE_INT types, with TYPE_NAMEs like
+     "char", "unsigned char", etc. and appropriate flags.  For various
+     reasons, this works out well in some places.  */
+
+
+  struct type *true_type = check_typedef (type);
+
+  /* TYPE_CODE_CHAR is always textual.  But I don't think it ever
+     occurs in C code.  */
+  if (TYPE_CODE (true_type) == TYPE_CODE_CHAR)
+    return 1;
+
+  /* If a one-byte TYPE_CODE_INT has a TYPE_NAME of "char" or
+     something ending with " char", then we treat it as text;
+     otherwise, we assume it's being used as data.  This makes all our
+     SIMD types like builtin_type_v8_int8 and the <stdint.h> types
+     like uint8_t print numerically, but all 'char' types print
+     textually.  Code which says what it means does well.
+     We also recognize types ending in 'char_t'.  */
+  if (TYPE_CODE (true_type) == TYPE_CODE_INT
+      && TYPE_LENGTH (true_type) == 1)
+    {
+      int name_len;
+      
+      /* All integer types should have names.  */
+      gdb_assert (TYPE_NAME (type));
+
+      name_len = strlen (TYPE_NAME (type));
+
+      if (strcmp (TYPE_NAME (type), "char") == 0
+          || (name_len > 5
+              && strcmp (TYPE_NAME (type) + name_len - 5, " char") == 0)
+          || (name_len >= 6
+              && strcmp (TYPE_NAME (type) + name_len - 6, " char_t") == 0))
+        return 1;
+    }
+
+  return 0;
+}
+
+
 /* Print data of type TYPE located at VALADDR (within GDB), which came from
    the inferior at address ADDRESS, onto stdio stream STREAM according to
    FORMAT (a letter or 0 for natural format).  The data at VALADDR is in
@@ -94,11 +154,9 @@
 	    {
 	      print_spaces_filtered (2 + 2 * recurse, stream);
 	    }
-	  /* For an array of chars, print with string syntax.  */
-	  if (eltlen == 1 &&
-	      ((TYPE_CODE (elttype) == TYPE_CODE_INT && TYPE_NOSIGN (elttype))
-	       || ((current_language->la_language == language_m2)
-		   && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+
+	  /* Print arrays of textual chars with a string syntax.  */
+          if (textual_element_type (TYPE_TARGET_TYPE (type))
 	      && (format == 0 || format == 's'))
 	    {
 	      /* If requested, look for the first null char and only print
@@ -191,12 +249,11 @@
 	      deprecated_print_address_numeric (addr, 1, stream);
 	    }
 
-	  /* For a pointer to char or unsigned char, also print the string
+	  /* For a pointer to a textual type, also print the string
 	     pointed to, unless pointer is null.  */
 	  /* FIXME: need to handle wchar_t here... */
 
-	  if (TYPE_LENGTH (elttype) == 1
-	      && TYPE_CODE (elttype) == TYPE_CODE_INT
+	  if (textual_element_type (TYPE_TARGET_TYPE (type))
 	      && (format == 0 || format == 's')
 	      && addr != 0)
 	    {
@@ -398,7 +455,7 @@
 	     Since we don't know whether the value is really intended to
 	     be used as an integer or a character, print the character
 	     equivalent as well. */
-	  if (TYPE_LENGTH (type) == 1)
+	  if (textual_element_type (type))
 	    {
 	      fputs_filtered (" ", stream);
 	      LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset),
@@ -500,7 +557,9 @@
       || TYPE_CODE (type) == TYPE_CODE_REF)
     {
       /* Hack:  remove (char *) for char strings.  Their
-         type is indicated by the quoted string anyway. */
+         type is indicated by the quoted string anyway.
+         (Don't use textual_element_type here; quoted strings
+         are always exactly (char *).  */
       if (TYPE_CODE (type) == TYPE_CODE_PTR
 	  && TYPE_NAME (type) == NULL
 	  && TYPE_NAME (TYPE_TARGET_TYPE (type)) != NULL

next prev parent reply	other threads:[~2007-03-01  0:43 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-02-24 16:13 Nick Roberts
2007-02-24 20:11 ` Daniel Jacobowitz
2007-02-24 20:53   ` Nick Roberts
2007-02-24 21:07     ` Jan Kratochvil
2007-02-25  8:00       ` Daniel Jacobowitz
2007-02-25 19:54         ` Nick Roberts
2007-02-25 21:07     ` mathieu lacage
2007-02-26  0:45       ` Jan Kratochvil
2007-02-27  7:17         ` Eli Zaretskii
2007-02-27  9:29         ` Daniel Jacobowitz
2007-02-27 12:02           ` Nick Roberts
2007-02-27 17:06             ` Robert Dewar
2007-02-27 18:42               ` Daniel Jacobowitz
2007-02-27 21:53                 ` Eli Zaretskii
2007-02-27 22:12                   ` Daniel Jacobowitz
2007-02-27 22:14                     ` Mark Kettenis
2007-02-28  0:47                       ` Paul Koning
2007-02-28  1:14                       ` Jim Blandy
2007-02-28  1:59                         ` Jim Blandy
2007-02-28  5:26                           ` Nick Roberts
2007-02-28 14:35                       ` Daniel Jacobowitz
2007-03-01  0:43                         ` Jim Blandy [this message]
2007-03-01  0:54                         ` Nick Roberts
2007-02-27 21:47               ` Eli Zaretskii
2007-02-27 22:12             ` Daniel Jacobowitz
2007-04-10 21:59         ` Daniel Jacobowitz
2007-02-28 13:05 pkoning
2007-03-01 11:01 ` Mark Kettenis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m3zm6x7pzx.fsf@codesourcery.com \
    --to=jimb@codesourcery.com \
    --cc=Mathieu.Lacage@sophia.inria.fr \
    --cc=dewar@adacore.com \
    --cc=eliz@gnu.org \
    --cc=gdb@sourceware.org \
    --cc=jan.kratochvil@redhat.com \
    --cc=mark.kettenis@xs4all.nl \
    --cc=nickrob@snap.net.nz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox