[RFC/RFA] Cleaner handling of character entities ?

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [RFC/RFA] Cleaner handling of character entities ?
@ 2006-05-05 18:24 Joel Brobecker
  2006-05-05 18:29 ` Daniel Jacobowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2006-05-05 18:24 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2000 bytes --]

Hello,

We are currently working on transitioning from GCC 3.4 to GCC 4.1,
and we found an issue with character types. For a program like this:

        procedure P is
           A : Character := 'A';
        begin
           A := 'B'; --  START
        end;

The debugging info produced for the character is:

        .uleb128 0x2    # (DIE (0x62) DW_TAG_base_type)
        .long   .LASF0  # DW_AT_name: "__unknown__"
        .byte   0x1     # DW_AT_byte_size
        .byte   0x7     # DW_AT_encoding

The DW_AT_name used to be "character", and ada-lang.c was using
the name to identify character types.

After a small discussion with the company engineer for GCC debug
info production, he agreed that the name is wrong, and should be
changed back.

However, he also suggested that the debugger should avoid relying
on the type name, and use the encoding if available. In the case
above, 0x7 is DW_ATE_unsigned but it should be DW_ATE_unsigned_char
(0x8), so he will change that.

I looked at the GDB side and came up with a few changes here and
there that implement his suggestion. I discovered that we rely
quite a bit on the type name to identify characters, and I guessed
that it was historical because of debugging format shortcomings
(with stabs for instance).

I think the attached patch improves the situation in terms of making
things cleaner in the case of dwarf2, without impacting targets that
still use older debugging format like stabs.

2006-05-05  Joel Brobecker  <brobecker@adacore.com>

        * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
        for char and unsigned char types.
        * ada-lang.c (ada_is_character_type): Always return true if
        the type code is TYPE_CODE_CHAR.
        * c-valprint.c (c_val_print): Print arrays whose element type
        code is TYPE_CODE_CHAR as strings.

Tested on x86-linux, with GCC 3.4 (dwarf2, stabs+), GCC 4.1 (dwarf2).
No regression.

What do you guys think? Wouldn't that be a step forward?

Thanks,
-- 
Joel

[-- Attachment #2: char.diff --]
[-- Type: text/plain, Size: 2725 bytes --]

Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.194
diff -u -p -r1.194 dwarf2read.c
--- dwarf2read.c	21 Apr 2006 20:26:07 -0000	1.194
+++ dwarf2read.c	5 May 2006 17:31:53 -0000
@@ -4728,10 +4728,15 @@ read_base_type (struct die_info *die, st
 	  code = TYPE_CODE_FLT;
 	  break;
 	case DW_ATE_signed:
+	  break;
 	case DW_ATE_signed_char:
+          code = TYPE_CODE_CHAR;
 	  break;
 	case DW_ATE_unsigned:
+	  type_flags |= TYPE_FLAG_UNSIGNED;
+	  break;
 	case DW_ATE_unsigned_char:
+          code = TYPE_CODE_CHAR;
 	  type_flags |= TYPE_FLAG_UNSIGNED;
 	  break;
 	default:
Index: ada-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/ada-lang.c,v
retrieving revision 1.84
diff -u -p -r1.84 ada-lang.c
--- ada-lang.c	12 Jan 2006 08:36:29 -0000	1.84
+++ ada-lang.c	5 May 2006 17:30:55 -0000
@@ -7145,10 +7145,15 @@ int
 ada_is_character_type (struct type *type)
 {
   const char *name = ada_type_name (type);
+
+  /* If the type code says it's a character, then assume it really is,
+     and don't check any further.  */
+  if (TYPE_CODE (type) == TYPE_CODE_CHAR)
+    return 1;
+  
   return
     name != NULL
-    && (TYPE_CODE (type) == TYPE_CODE_CHAR
-        || TYPE_CODE (type) == TYPE_CODE_INT
+    && (TYPE_CODE (type) == TYPE_CODE_INT
         || TYPE_CODE (type) == TYPE_CODE_RANGE)
     && (strcmp (name, "character") == 0
         || strcmp (name, "wide_character") == 0
Index: c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.39
diff -u -p -r1.39 c-valprint.c
--- c-valprint.c	18 Jan 2006 21:24:19 -0000	1.39
+++ c-valprint.c	5 May 2006 17:31:16 -0000
@@ -96,9 +96,8 @@ c_val_print (struct type *type, const gd
 	    }
 	  /* For an array of chars, print with string syntax.  */
 	  if (eltlen == 1 &&
-	      ((TYPE_CODE (elttype) == TYPE_CODE_INT)
-	       || ((current_language->la_language == language_m2)
-		   && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+	      (TYPE_CODE (elttype) == TYPE_CODE_INT
+	       || TYPE_CODE (elttype) == TYPE_CODE_CHAR)
 	      && (format == 0 || format == 's'))
 	    {
 	      /* If requested, look for the first null char and only print
@@ -192,7 +191,8 @@ c_val_print (struct type *type, const gd
 	  /* FIXME: need to handle wchar_t here... */
 
 	  if (TYPE_LENGTH (elttype) == 1
-	      && TYPE_CODE (elttype) == TYPE_CODE_INT
+	      && (TYPE_CODE (elttype) == TYPE_CODE_INT
+                  || TYPE_CODE (elttype) == TYPE_CODE_CHAR)
 	      && (format == 0 || format == 's')
 	      && addr != 0)
 	    {

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 18:24 [RFC/RFA] Cleaner handling of character entities ? Joel Brobecker
@ 2006-05-05 18:29 ` Daniel Jacobowitz
  2006-05-05 19:06   ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2006-05-05 18:29 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

On Fri, May 05, 2006 at 11:23:51AM -0700, Joel Brobecker wrote:
> I think the attached patch improves the situation in terms of making
> things cleaner in the case of dwarf2, without impacting targets that
> still use older debugging format like stabs.
> 
> 2006-05-05  Joel Brobecker  <brobecker@adacore.com>
> 
>         * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
>         for char and unsigned char types.
>         * ada-lang.c (ada_is_character_type): Always return true if
>         the type code is TYPE_CODE_CHAR.
>         * c-valprint.c (c_val_print): Print arrays whose element type
>         code is TYPE_CODE_CHAR as strings.
> 
> Tested on x86-linux, with GCC 3.4 (dwarf2, stabs+), GCC 4.1 (dwarf2).
> No regression.
> 
> What do you guys think? Wouldn't that be a step forward?

I really suspect this impacts the existing producers in some way,
though I don't know exactly how.  You should take a look at the most
recent Modula-2 patch, and the discussion around the previous posting
of it, which contained very similar changes (conditionalized for M2).

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 18:29 ` Daniel Jacobowitz
@ 2006-05-05 19:06   ` Joel Brobecker
  2006-05-05 19:40     ` Jim Blandy
  0 siblings, 1 reply; 9+ messages in thread
From: Joel Brobecker @ 2006-05-05 19:06 UTC (permalink / raw)
  To: gdb-patches

> > 2006-05-05  Joel Brobecker  <brobecker@adacore.com>
> > 
> >         * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
> >         for char and unsigned char types.
> >         * ada-lang.c (ada_is_character_type): Always return true if
> >         the type code is TYPE_CODE_CHAR.
> >         * c-valprint.c (c_val_print): Print arrays whose element type
> >         code is TYPE_CODE_CHAR as strings.
> > 
> > Tested on x86-linux, with GCC 3.4 (dwarf2, stabs+), GCC 4.1 (dwarf2).
> > No regression.
> > 
> > What do you guys think? Wouldn't that be a step forward?
> 
> I really suspect this impacts the existing producers in some way,
> though I don't know exactly how.  You should take a look at the most
> recent Modula-2 patch, and the discussion around the previous posting
> of it, which contained very similar changes (conditionalized for M2).

Thanks for the pointer. I think I found the actual discussion, which
started in Feb. Here is the message from Jim expressing these concerns.

        http://sources.redhat.com/ml/gdb-patches/2006-02/msg00396.html

He only says that a discussion about this lead you to stick with INT
as opposed to CHAR for C at least, and the resolution was to use CHAR
only for Modula 2. It would have been nice to have a pointer to the
actual discussion so we could remember the reasons for this, but it's
not important.

It looks like the Modula-2 patch has been approved, but not applied
yet. But I can make the same modification in dwarf2read.c regarding
the type code for character types, so that Ada follows Modula-2's
lead. Then the changes in c-valprint.c become obsolete, I believe.

We would be left with:

        * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
        for char and unsigned char types of Ada compilation units.
        * ada-lang.c (ada_is_character_type): Always return true if
        the type code is TYPE_CODE_CHAR.

Would that be OK?

Thanks,
-- 
Joel

PS: Testing with -gstabs+ where we do end up with the char type being
    defined as an int did not reveal any regression with both the
    official testsuite and our internal testsuite. Being a curious
    little bugger, I can't help but wondering what the problems are.
    Darn curiosity!


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 19:06   ` Joel Brobecker
@ 2006-05-05 19:40     ` Jim Blandy
  2006-05-05 19:48       ` Daniel Jacobowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Jim Blandy @ 2006-05-05 19:40 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches


Joel Brobecker <brobecker@adacore.com> writes:
> We would be left with:
>
>         * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
>         for char and unsigned char types of Ada compilation units.
>         * ada-lang.c (ada_is_character_type): Always return true if
>         the type code is TYPE_CODE_CHAR.
>
> Would that be OK?

Yeah, I think that sounds like the right thing.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 19:40     ` Jim Blandy
@ 2006-05-05 19:48       ` Daniel Jacobowitz
  2006-05-05 20:54         ` Joel Brobecker
  2006-05-05 21:47         ` Jim Blandy
  0 siblings, 2 replies; 9+ messages in thread
From: Daniel Jacobowitz @ 2006-05-05 19:48 UTC (permalink / raw)
  To: Jim Blandy; +Cc: Joel Brobecker, gdb-patches

On Fri, May 05, 2006 at 12:40:02PM -0700, Jim Blandy wrote:
> 
> Joel Brobecker <brobecker@adacore.com> writes:
> > We would be left with:
> >
> >         * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
> >         for char and unsigned char types of Ada compilation units.
> >         * ada-lang.c (ada_is_character_type): Always return true if
> >         the type code is TYPE_CODE_CHAR.
> >
> > Would that be OK?
> 
> Yeah, I think that sounds like the right thing.

Alternatively, do we think we ought to be using TYPE_CODE_CHAR, and if
so, should we try it?

One possible problem: there are three times as many or so references to
TYPE_CODE_INT in tdep files as there are to TYPE_CODE_CHAR so this
might break argument passing for some targets for C.  On those targets
it would probably already be broken for Ada.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 19:48       ` Daniel Jacobowitz
@ 2006-05-05 20:54         ` Joel Brobecker
  2006-05-05 21:47         ` Jim Blandy
  1 sibling, 0 replies; 9+ messages in thread
From: Joel Brobecker @ 2006-05-05 20:54 UTC (permalink / raw)
  To: Jim Blandy, gdb-patches

> Alternatively, do we think we ought to be using TYPE_CODE_CHAR, and if
> so, should we try it?

I would be willing to give it a try, and possibly back the change out
if we end up finding some compelling reasons to do so.

> One possible problem: there are three times as many or so references to
> TYPE_CODE_INT in tdep files as there are to TYPE_CODE_CHAR so this
> might break argument passing for some targets for C.  On those targets
> it would probably already be broken for Ada.

You are probably correct. This means I would have to review all uses
of TYPE_CODE_INT in tdep files. There are not that many as far as I
can tell. I think I can review some of them, but many architectures
are alien to me (arm, h8300, m32c, m88k, mn10300, s390).

-- 
Joel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 19:48       ` Daniel Jacobowitz
  2006-05-05 20:54         ` Joel Brobecker
@ 2006-05-05 21:47         ` Jim Blandy
  2006-05-15 19:06           ` Daniel Jacobowitz
  1 sibling, 1 reply; 9+ messages in thread
From: Jim Blandy @ 2006-05-05 21:47 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches


Daniel Jacobowitz <drow@false.org> writes:
> On Fri, May 05, 2006 at 12:40:02PM -0700, Jim Blandy wrote:
>> 
>> Joel Brobecker <brobecker@adacore.com> writes:
>> > We would be left with:
>> >
>> >         * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
>> >         for char and unsigned char types of Ada compilation units.
>> >         * ada-lang.c (ada_is_character_type): Always return true if
>> >         the type code is TYPE_CODE_CHAR.
>> >
>> > Would that be OK?
>> 
>> Yeah, I think that sounds like the right thing.
>
> Alternatively, do we think we ought to be using TYPE_CODE_CHAR, and if
> so, should we try it?

If we do that, we're effectively signing up to go through GDB and make
CHAR cases behave more like the INT cases.  Which makes that code less
likely to work properly in languages that really do distinguish the
two.

I think we should reserve the TYPE_CODE_INT / TYPE_CODE_CHAR
distinction for use in source languages that really make the
distinction, and let languages where characters are just another kind
of integer use TYPE_CODE_INT for everything.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-05 21:47         ` Jim Blandy
@ 2006-05-15 19:06           ` Daniel Jacobowitz
  2007-12-24  8:00             ` Joel Brobecker
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2006-05-15 19:06 UTC (permalink / raw)
  To: Jim Blandy; +Cc: Joel Brobecker, gdb-patches

On Fri, May 05, 2006 at 02:47:00PM -0700, Jim Blandy wrote:
> 
> Daniel Jacobowitz <drow@false.org> writes:
> > On Fri, May 05, 2006 at 12:40:02PM -0700, Jim Blandy wrote:
> >> 
> >> Joel Brobecker <brobecker@adacore.com> writes:
> >> > We would be left with:
> >> >
> >> >         * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
> >> >         for char and unsigned char types of Ada compilation units.
> >> >         * ada-lang.c (ada_is_character_type): Always return true if
> >> >         the type code is TYPE_CODE_CHAR.
> >> >
> >> > Would that be OK?
> >> 
> >> Yeah, I think that sounds like the right thing.
> >
> > Alternatively, do we think we ought to be using TYPE_CODE_CHAR, and if
> > so, should we try it?
> 
> If we do that, we're effectively signing up to go through GDB and make
> CHAR cases behave more like the INT cases.  Which makes that code less
> likely to work properly in languages that really do distinguish the
> two.

Or more likely to work properly.  Because C does not treat them the
same, the TYPE_CODE_CHAR case is fairly likely to be bitrotten anyway.

> I think we should reserve the TYPE_CODE_INT / TYPE_CODE_CHAR
> distinction for use in source languages that really make the
> distinction, and let languages where characters are just another kind
> of integer use TYPE_CODE_INT for everything.

Anyway, this is fine by me.  However, in tdep files, we don't want to
make language distinctions.  So, Joel, I think that a patch along the
lines of that changelog entry above is probably the way to go, but then
you may want to audit uses of TYPE_CODE_INT and TYPE_CODE_CHAR in
backends or else calling Ada procedures which take chars may not work
well.

[Sounds like a new testcase, doesn't it?]

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC/RFA] Cleaner handling of character entities ?
  2006-05-15 19:06           ` Daniel Jacobowitz
@ 2007-12-24  8:00             ` Joel Brobecker
  0 siblings, 0 replies; 9+ messages in thread
From: Joel Brobecker @ 2007-12-24  8:00 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 1058 bytes --]

This is an old patch that, for some reason, I forgot to commit:

> So, Joel, I think that a patch along the lines of that changelog entry
> above is probably the way to go,

So I took the opportunity to add some comments, and do a minor
reformatting...

2007-12-24  Joel Brobecker  <brobecker@adacore.com>

        * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR
        for char and unsigned char types of Ada compilation units.
        * ada-lang.c (ada_is_character_type): Always return true if
        the type code is TYPE_CODE_CHAR.

Tested on x86-linux with DWARF and stabs. Checked in.

You also said:

> you may want to audit uses of TYPE_CODE_INT and TYPE_CODE_CHAR in
> backends or else calling Ada procedures which take chars may not work
> well.
> [Sounds like a new testcase, doesn't it?]

That's a good idea. I will write one ASAP.

Incidentally, that reminded me of a recent change that Daniel made
regarding handling of (unsigned?) characters. Perhaps one day we
won't have to set TYPE_CODE_CHAR only for Ada and M2...

-- 
Joel

[-- Attachment #2: char-ada.diff --]
[-- Type: text/plain, Size: 2200 bytes --]

Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.243
diff -u -p -r1.243 dwarf2read.c
--- dwarf2read.c	22 Dec 2007 20:58:30 -0000	1.243
+++ dwarf2read.c	24 Dec 2007 04:35:03 -0000
@@ -5007,11 +5007,11 @@ read_base_type (struct die_info *die, st
 	type_flags |= TYPE_FLAG_UNSIGNED;
 	break;
       case DW_ATE_signed_char:
-	if (cu->language == language_m2)
+	if (cu->language == language_ada && cu->language == language_m2)
 	  code = TYPE_CODE_CHAR;
 	break;
       case DW_ATE_unsigned_char:
-	if (cu->language == language_m2)
+	if (cu->language == language_ada && cu->language == language_m2)
 	  code = TYPE_CODE_CHAR;
 	type_flags |= TYPE_FLAG_UNSIGNED;
 	break;
Index: ada-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/ada-lang.c,v
retrieving revision 1.110
diff -u -p -r1.110 ada-lang.c
--- ada-lang.c	21 Dec 2007 11:50:11 -0000	1.110
+++ ada-lang.c	24 Dec 2007 04:35:28 -0000
@@ -7286,15 +7286,22 @@ value_val_atr (struct type *type, struct
 int
 ada_is_character_type (struct type *type)
 {
-  const char *name = ada_type_name (type);
-  return
-    name != NULL
-    && (TYPE_CODE (type) == TYPE_CODE_CHAR
-        || TYPE_CODE (type) == TYPE_CODE_INT
-        || TYPE_CODE (type) == TYPE_CODE_RANGE)
-    && (strcmp (name, "character") == 0
-        || strcmp (name, "wide_character") == 0
-        || strcmp (name, "unsigned char") == 0);
+  const char *name;
+
+  /* If the type code says it's a character, then assume it really is,
+     and don't check any further.  */
+  if (TYPE_CODE (type) == TYPE_CODE_CHAR)
+    return 1;
+  
+  /* Otherwise, assume it's a character type iff it is a discrete type
+     with a known character type name.  */
+  name = ada_type_name (type);
+  return (name != NULL
+          && (TYPE_CODE (type) == TYPE_CODE_INT
+              || TYPE_CODE (type) == TYPE_CODE_RANGE)
+          && (strcmp (name, "character") == 0
+              || strcmp (name, "wide_character") == 0
+              || strcmp (name, "unsigned char") == 0));
 }
 
 /* True if TYPE appears to be an Ada string type.  */

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-12-24  6:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-05 18:24 [RFC/RFA] Cleaner handling of character entities ? Joel Brobecker
2006-05-05 18:29 ` Daniel Jacobowitz
2006-05-05 19:06   ` Joel Brobecker
2006-05-05 19:40     ` Jim Blandy
2006-05-05 19:48       ` Daniel Jacobowitz
2006-05-05 20:54         ` Joel Brobecker
2006-05-05 21:47         ` Jim Blandy
2006-05-15 19:06           ` Daniel Jacobowitz
2007-12-24  8:00             ` Joel Brobecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox