Date: Tue, 27 Feb 2007 22:12:00 -0000
From: Daniel Jacobowitz
To: Eli Zaretskii
Cc: dewar@adacore.com, nickrob@snap.net.nz, jan.kratochvil@redhat.com,
    Mathieu.Lacage@sophia.inria.fr, gdb@sourceware.org
Subject: Re: [RFC] Signed/unsigned character arrays are not strings
Message-ID: <20070227215316.GA26262@caradoc.them.org>
References: <17887.62990.937672.281975@kahikatea.snap.net.nz>
    <20070224161315.GA27534@caradoc.them.org>
    <17888.39894.136355.447008@kahikatea.snap.net.nz>
    <1172390381.2584.18.camel@mathieu>
    <20070225195350.GA12811@host0.dyn.jankratochvil.net>
    <20070226004457.GA9926@caradoc.them.org>
    <17892.4014.160191.285423@kahikatea.snap.net.nz>
    <45E42969.1030007@adacore.com>
    <20070227131442.GA20718@caradoc.them.org>

On Tue, Feb 27, 2007 at 11:06:17PM
+0200, Eli Zaretskii wrote:
> Doesn't a similar situation exist with "unsigned int" and "int", or
> with "unsigned long" and "long"?  And yet we don't treat them
> differently.
>
> IOW, I think it's quite expected that explicit signedness is
> relatively rare, since in the vast majority of cases it is simply not
> needed.  Interpreting this phenomenon as saying something about what
> kind of data is stored is not necessarily a good idea.

I feel that this is different for two reasons.  One is that the
situation for int and long is not the same, because "int" and "signed
int" are the same type in C - but "char" and "signed char" are not.
Char is explicitly of indeterminate sign.  The other is that there is
a widespread use of "char" for string data and "signed char" or
"unsigned char" for non-string data.

Of course, the first reason is a matter of standards and the second is
only a matter of my feeling and fumbling around with search engines.

> > I know that as a GDB developer, debugging GDB, I'd want explicitly
> > signed or unsigned characters to be printed as data
>
> That is indeed one reason to use unsigned char.  But there is another,
> as demonstrated by Emacs's Lisp_String type: to store non-ASCII
> characters whose upper bit might be set.  And in those latter cases,
> we do want the data displayed as text, not as numeric codes.

Yes, there are good counterexamples.  Though I believe emacs also
stores some numeric non-character data in its strings (isn't there a
length or kind byte?).  Plus, this gets dangerously close to support
for explicitly printing strings of different character sets and
encodings - UTF-8 support is requested once a year or so.  Anyway,
that's a project for another week :-)

> > This is user interface, not core
> > functionality.  It's more like clarifying the text of one of GCC's
> > warning messages than changing the dialect of C it accepts.
> > I think we have a lot of freedom to adapt our default output to be
> > more useful to our users, especially when we provide a way to get
> > the old behavior.
>
> The issue is precisely that it is controversial whether the proposed
> output is necessarily more useful to the user.  It is clearly more
> useful in some cases, but not in the others.

Yes, this is the part I think is really important.  I've provided what
information I can to support the fact that the new behavior is more
useful:

  - it's right for debugging GDB itself
  - it's right for processor vector registers supported by GDB
  - it seems to be right more often than not, given my abuse of
    CodeSearch.

But this is fuzzy.  I don't know how to find out more.  We have no way
to poll users.  I tried polling my coworkers, and got only agreement
that strings are usually stored in char * and not in unsigned char *.

There are some concrete reasons to do that, too.  GCC in some
configurations warns about passing an unsigned char * to a function
expecting a char * - functions like strlen.  As you know, some
applications have disabled this warning because they disagree or
because they agree but it would be too much labor to clean up; but I
know of plenty of projects that are fine with the new warning.

> > In this case that method is even completely backwards compatible.
>
> ??? Now _I_ would like to ask for explanations.  Do you mean the cast
> to "char *" trick?  if so, that's not backward compatibility, because
> existing scripts in .gdbinit files need to be modified to get back
> past behavior.

Other way round: using (char *) would result in a backwards compatible
.gdbinit, because it would work with both old and new versions of GDB.

Anyway, if we end up leaving the change in I will try to clarify the
manual.  There is almost nothing in it now about printing strings
using "print".

-- 
Daniel Jacobowitz
CodeSourcery
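
The two C-level points in the message above (that "char" and "signed
char" are distinct types while "int" and "signed int" are not, and that
an unsigned char buffer needs a cast before it goes to strlen) can be
sketched as below.  This is only an illustration under stated
assumptions: it needs a C11 compiler for _Generic, and TYPE_NAME,
name_of_plain_char, name_of_signed_char, name_of_signed_int and
ubuf_length are hypothetical names invented here, not anything from GDB
or from this thread.  The GCC diagnostic the cast avoids is, as far as I
know, the one controlled by -Wpointer-sign.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: name the type of an expression (C11 _Generic).
   Listing char, signed char and unsigned char side by side is only
   accepted because they are three distinct types.  Adding a
   "signed int" entry next to "int" would be rejected, since those
   two spellings name the same type. */
#define TYPE_NAME(x) _Generic((x),      \
    char:          "char",              \
    signed char:   "signed char",       \
    unsigned char: "unsigned char",     \
    int:           "int",               \
    unsigned int:  "unsigned int",      \
    default:       "other")

const char *name_of_plain_char(void)
{
    char c = 'x';
    return TYPE_NAME(c);
}

const char *name_of_signed_char(void)
{
    signed char c = 'x';
    return TYPE_NAME(c);
}

const char *name_of_signed_int(void)
{
    signed int i = 0;   /* same type as plain int */
    return TYPE_NAME(i);
}

/* strlen takes a const char *.  Passing an unsigned char * directly
   draws a pointer-signedness diagnostic in some GCC configurations,
   so the cast is spelled out here. */
size_t ubuf_length(const unsigned char *buf)
{
    assert(buf != NULL);
    return strlen((const char *)buf);
}
```

The cast in ubuf_length is the source-level counterpart of the
(char *) cast discussed for .gdbinit scripts: it states explicitly that
the bytes are meant to be read as a string.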