From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 5555 invoked by alias); 5 May 2006 18:24:07 -0000
Received: (qmail 5546 invoked by uid 22791); 5 May 2006 18:24:06 -0000
X-Spam-Check-By: sourceware.org
Received: from nile.gnat.com (HELO nile.gnat.com) (205.232.38.5)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Fri, 05 May 2006 18:23:59 +0000
Received: from localhost (localhost [127.0.0.1])
    by filtered-nile.gnat.com (Postfix) with ESMTP id 6610D48CBDB
    for ; Fri, 5 May 2006 14:23:53 -0400 (EDT)
Received: from nile.gnat.com ([127.0.0.1])
    by localhost (nile.gnat.com [127.0.0.1]) (amavisd-new, port 10024)
    with LMTP id 27388-01-8
    for ; Fri, 5 May 2006 14:23:53 -0400 (EDT)
Received: from takamaka.act-europe.fr (s142-179-108-108.bc.hsia.telus.net [142.179.108.108])
    by nile.gnat.com (Postfix) with ESMTP id CD35448CBDA
    for ; Fri, 5 May 2006 14:23:52 -0400 (EDT)
Received: by takamaka.act-europe.fr (Postfix, from userid 507)
    id 09D4647E7F; Fri, 5 May 2006 11:23:52 -0700 (PDT)
Date: Fri, 05 May 2006 18:24:00 -0000
From: Joel Brobecker
To: gdb-patches@sources.redhat.com
Subject: [RFC/RFA] Cleaner handling of character entities ?
Message-ID: <20060505182351.GK1109@adacore.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="BOKacYhQ+x31HxR3"
Content-Disposition: inline
User-Agent: Mutt/1.4i
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2006-05/txt/msg00077.txt.bz2

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-length: 2000

Hello,

We are currently working on transitioning from GCC 3.4 to GCC 4.1,
and we found an issue with character types.
For a program like this:

    procedure P is
       A : Character := 'A';
    begin
       A := 'B';  --  START
    end;

The debugging info produced for the character is:

        .uleb128 0x2    # (DIE (0x62) DW_TAG_base_type)
        .long   .LASF0  # DW_AT_name: "__unknown__"
        .byte   0x1     # DW_AT_byte_size
        .byte   0x7     # DW_AT_encoding

The DW_AT_name used to be "character", and ada-lang.c was using the
name to identify character types.

After a short discussion with our company's engineer for GCC debug
info production, he agreed that the name is wrong and should be
changed back. However, he also suggested that the debugger should
avoid relying on the type name, and use the encoding when it is
available. In the case above, the encoding 0x7 is DW_ATE_unsigned,
but it should be DW_ATE_unsigned_char (0x8), so he will change that
too.

I looked at the GDB side and came up with a few changes here and there
that implement his suggestion. I discovered that we rely quite a bit
on the type name to identify characters, and I suspect this is
historical, due to shortcomings in debugging formats (stabs, for
instance).

I think the attached patch makes things cleaner in the case of dwarf2,
without impacting targets that still use older debugging formats such
as stabs.

2006-05-05  Joel Brobecker

        * dwarf2read.c (read_base_type): Set code to TYPE_CODE_CHAR for
        char and unsigned char types.
        * ada-lang.c (ada_is_character_type): Always return true if
        the type code is TYPE_CODE_CHAR.
        * c-valprint.c (c_val_print): Print arrays whose element type
        code is TYPE_CODE_CHAR as strings.

Tested on x86-linux, with GCC 3.4 (dwarf2, stabs+) and GCC 4.1 (dwarf2).
No regression.

What do you guys think? Wouldn't that be a step forward?

Thanks,
--
Joel

--BOKacYhQ+x31HxR3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="char.diff"
Content-length: 2725

Index: dwarf2read.c
===================================================================
RCS file: /cvs/src/src/gdb/dwarf2read.c,v
retrieving revision 1.194
diff -u -p -r1.194 dwarf2read.c
--- dwarf2read.c	21 Apr 2006 20:26:07 -0000	1.194
+++ dwarf2read.c	5 May 2006 17:31:53 -0000
@@ -4728,10 +4728,15 @@ read_base_type (struct die_info *die, st
       code = TYPE_CODE_FLT;
       break;
     case DW_ATE_signed:
+      break;
     case DW_ATE_signed_char:
+      code = TYPE_CODE_CHAR;
       break;
     case DW_ATE_unsigned:
+      type_flags |= TYPE_FLAG_UNSIGNED;
+      break;
     case DW_ATE_unsigned_char:
+      code = TYPE_CODE_CHAR;
       type_flags |= TYPE_FLAG_UNSIGNED;
       break;
     default:
Index: ada-lang.c
===================================================================
RCS file: /cvs/src/src/gdb/ada-lang.c,v
retrieving revision 1.84
diff -u -p -r1.84 ada-lang.c
--- ada-lang.c	12 Jan 2006 08:36:29 -0000	1.84
+++ ada-lang.c	5 May 2006 17:30:55 -0000
@@ -7145,10 +7145,15 @@ int
 ada_is_character_type (struct type *type)
 {
   const char *name = ada_type_name (type);
+
+  /* If the type code says it's a character, then assume it really is,
+     and don't check any further.  */
+  if (TYPE_CODE (type) == TYPE_CODE_CHAR)
+    return 1;
+
   return
     name != NULL
-    && (TYPE_CODE (type) == TYPE_CODE_CHAR
-        || TYPE_CODE (type) == TYPE_CODE_INT
+    && (TYPE_CODE (type) == TYPE_CODE_INT
         || TYPE_CODE (type) == TYPE_CODE_RANGE)
     && (strcmp (name, "character") == 0
         || strcmp (name, "wide_character") == 0
Index: c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.39
diff -u -p -r1.39 c-valprint.c
--- c-valprint.c	18 Jan 2006 21:24:19 -0000	1.39
+++ c-valprint.c	5 May 2006 17:31:16 -0000
@@ -96,9 +96,8 @@ c_val_print (struct type *type, const gd
             }
           /* For an array of chars, print with string syntax.  */
           if (eltlen == 1 &&
-              ((TYPE_CODE (elttype) == TYPE_CODE_INT)
-               || ((current_language->la_language == language_m2)
-                   && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+              (TYPE_CODE (elttype) == TYPE_CODE_INT
+               || TYPE_CODE (elttype) == TYPE_CODE_CHAR)
               && (format == 0 || format == 's'))
             {
               /* If requested, look for the first null char and only print
@@ -192,7 +191,8 @@ c_val_print (struct type *type, const gd
           /* FIXME: need to handle wchar_t here... */
 
           if (TYPE_LENGTH (elttype) == 1
-              && TYPE_CODE (elttype) == TYPE_CODE_INT
+              && (TYPE_CODE (elttype) == TYPE_CODE_INT
+                  || TYPE_CODE (elttype) == TYPE_CODE_CHAR)
               && (format == 0 || format == 's')
               && addr != 0)
             {

--BOKacYhQ+x31HxR3--