From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-27899-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 5153 invoked by alias); 27 Feb 2007 21:53:35 -0000
Received: (qmail 5145 invoked by uid 22791); 27 Feb 2007 21:53:34 -0000
X-Spam-Check-By: sourceware.org
Received: from nevyn.them.org (HELO nevyn.them.org) (66.93.172.17)     by sourceware.org (qpsmtpd/0.31.1) with ESMTP; Tue, 27 Feb 2007 21:53:28 +0000
Received: from dsl093-172-095.pit1.dsl.speakeasy.net ([66.93.172.95] helo=caradoc.them.org) 	by nevyn.them.org with esmtp (Exim 4.63) 	(envelope-from <drow@false.org>) 	id 1HMAG6-0008PG-07; Tue, 27 Feb 2007 16:53:18 -0500
Received: from drow by caradoc.them.org with local (Exim 4.63) 	(envelope-from <drow@caradoc.them.org>) 	id 1HMAG5-0007Ky-4a; Tue, 27 Feb 2007 16:53:17 -0500
Date: Tue, 27 Feb 2007 22:12:00 -0000
From: Daniel Jacobowitz <drow@false.org>
To: Eli Zaretskii <eliz@gnu.org>
Cc: dewar@adacore.com, nickrob@snap.net.nz, jan.kratochvil@redhat.com, 	Mathieu.Lacage@sophia.inria.fr, gdb@sourceware.org
Subject: Re: [RFC] Signed/unsigned character arrays are not strings
Message-ID: <20070227215316.GA26262@caradoc.them.org>
Mail-Followup-To: Eli Zaretskii <eliz@gnu.org>, dewar@adacore.com, 	nickrob@snap.net.nz, jan.kratochvil@redhat.com, 	Mathieu.Lacage@sophia.inria.fr, gdb@sourceware.org
References: <17887.62990.937672.281975@kahikatea.snap.net.nz> <20070224161315.GA27534@caradoc.them.org> <17888.39894.136355.447008@kahikatea.snap.net.nz> <1172390381.2584.18.camel@mathieu> <20070225195350.GA12811@host0.dyn.jankratochvil.net> <20070226004457.GA9926@caradoc.them.org> <17892.4014.160191.285423@kahikatea.snap.net.nz> <45E42969.1030007@adacore.com> <20070227131442.GA20718@caradoc.them.org> <ulkij2tva.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <ulkij2tva.fsf@gnu.org>
User-Agent: Mutt/1.5.13 (2006-08-11)
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2007-02/txt/msg00283.txt.bz2

On Tue, Feb 27, 2007 at 11:06:17PM +0200, Eli Zaretskii wrote:
> Doesn't a similar situation exist with "unsigned int" and "int", or
> with "unsigned long" and "long"?  And yet we don't treat them
> differently.
> 
> IOW, I think it's quite expected that explicit signedness is
> relatively rare, since in the vast majority of cases it is simply not
> needed.  Interpreting this phenomenon as saying something about what
> kind of data is stored is not necessarily a good idea.

I feel that this is different for two reasons.  One is that the
situation for int and long is not the same, because "int" and "signed
int" are the same type in C - but "char" and "signed char" are not.
The signedness of plain char is explicitly left to the implementation.
The other is that there is a widespread use of "char" for string data
and "signed char" or "unsigned char" for non-string data.

Of course, the first reason is a matter of standards and the second is
only a matter of my feeling and fumbling around with search engines.
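
To make the standards point concrete, here is a minimal sketch, assuming
a C11 compiler with _Generic.  The macro and function names are
hypothetical, chosen only for illustration.  It shows that the three
character types select three different branches, while a "signed int"
object selects the plain "int" branch.

```c
#include <string.h>

/* Hypothetical helper: names which character type an expression has.
   char, signed char and unsigned char are three distinct types, so
   all three may appear as separate _Generic associations. */
#define CHAR_KIND(x) _Generic((x),      \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    default:       "other")

/* Listing both "int" and "signed int" in one _Generic would be rejected,
   because they name the same type; only "int" is listed here. */
#define INT_KIND(x) _Generic((x), int: "int", default: "other")

const char *kind_of_plain_char(void)    { char c = 'a';          return CHAR_KIND(c); }
const char *kind_of_signed_char(void)   { signed char c = 'a';   return CHAR_KIND(c); }
const char *kind_of_unsigned_char(void) { unsigned char c = 'a'; return CHAR_KIND(c); }
const char *kind_of_signed_int(void)    { signed int i = 0;      return INT_KIND(i); }
```

Whether plain char behaves as signed or unsigned on a given target does
not change which branch is selected; it stays a separate type either way.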

> > I know that as a GDB developer, debugging GDB, I'd want explicitly
> > signed or unsigned characters to be printed as data
> 
> That is indeed one reason to use unsigned char.  But there is another,
> as demonstrated by Emacs's Lisp_String type: to store non-ASCII
> characters whose upper bit might be set.  And in those latter cases,
> we do want the data displayed as text, not as numeric codes.

Yes, there are good counterexamples.  Though I believe Emacs also
stores some numeric non-character data in its strings (isn't there a
length or kind byte?).  Plus, this gets dangerously close to support
for explicitly printing strings of different character sets and
encodings - UTF-8 support is requested once a year or so.

Anyway, that's a project for another week :-)

> > This is user interface, not core
> > functionality.  It's more like clarifying the text of one of GCC's
> > warning messages than changing the dialect of C it accepts.  I think
> > we have a lot of freedom to adapt our default output to be more useful
> > to our users, especially when we provide a way to get the old
> > behavior.
> 
> The issue is precisely that it is controversial whether the proposed
> output is necessarily more useful to the user.  It is clearly more
> useful in some cases, but not in the others.

Yes, this is the part I think is really important.  I've provided what
information I can to support the claim that the new behavior is more
useful:

  - it's right for debugging GDB itself
  - it's right for processor vector registers supported by GDB
  - it seems to be right more often than not, given my abuse of
    CodeSearch.

But this is fuzzy.  I don't know how to find out more.  We have no way
to poll users.  I tried polling my coworkers, and got only agreement
that strings are usually stored in char * and not in unsigned char *.

There are some concrete reasons to do that, too.  GCC in some
configurations warns about passing an unsigned char * to a function
expecting a char * - functions like strlen.  As you know, some
applications have disabled this warning, either because they disagree
with it or because they agree but cleaning up would be too much labor;
but I know of plenty of projects that are fine with the new warning.
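
As a small illustration of that kind of mismatch, here is a sketch in C.
The function name is hypothetical, and I am assuming the warning in
question is the one GCC controls with -Wpointer-sign.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a NUL-terminated buffer that happens
   to be declared as unsigned char. */
size_t ubuf_length(const unsigned char *buf)
{
    /* Writing strlen(buf) here passes an unsigned char pointer where a
       char pointer is expected, which is the pattern the warning flags.
       The explicit cast states the intent and avoids the mismatch. */
    return strlen((const char *) buf);
}
```

Code that keeps its strings in plain char never needs the cast, which is
one practical reason to prefer char for string data.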

> > In this case that method is even completely backwards compatible.
> 
> ??? Now _I_ would like to ask for explanations.  Do you mean the cast
> to "char *" trick? if so, that's not backward compatibility, because
> existing scripts in .gdbinit files need to be modified to get back
> past behavior.

Other way round: using (char *) casts would result in a backwards
compatible .gdbinit, because it would work with both old and new
versions of GDB.

Anyway, if we end up leaving the change in, I will try to clarify the
manual.  There is almost nothing in it now about printing strings
using "print".

-- 
Daniel Jacobowitz
CodeSourcery