From: Daniel Jacobowitz
To: Jan Kratochvil
Cc: mathieu lacage, Nick Roberts, gdb@sourceware.org
Date: Tue, 10 Apr 2007 21:59:00 -0000
Subject: Re: [RFC] Signed/unsigned character arrays are not strings
Message-ID: <20070410215951.GC6338@caradoc.them.org>
In-Reply-To: <20070225195350.GA12811@host0.dyn.jankratochvil.net>
References: <17887.62990.937672.281975@kahikatea.snap.net.nz> <20070224161315.GA27534@caradoc.them.org> <17888.39894.136355.447008@kahikatea.snap.net.nz> <1172390381.2584.18.camel@mathieu> <20070225195350.GA12811@host0.dyn.jankratochvil.net>

On Sun, Feb 25, 2007 at 08:53:50PM +0100, Jan Kratochvil wrote:
> On Sun, 25 Feb 2007 08:59:41 +0100, mathieu lacage wrote:
> ...
> > I don't know how useful that is to you but a lot of people (the first
> > which comes to my mind is libxml2) decided to use "unsigned char *" to
> > identify utf-8 encoded strings in C.
>
> Together with the attached RMS's response I became more inclined to revert this
> change and provide only "$xmm"-specific fix instead (probably for the GDB
> int8_t/uint8_t internal types).

There was a lot of discussion about how to treat signed char, unsigned
char, signed char *, et cetera.  There weren't a lot of conclusions, but
several people did not like the new behavior, and then discussion
trailed off.

I don't want to just revert the patch, because the problem that Jan was
fixing (unuseful display of $xmm registers) is really quite annoying.
I see these options:

1. Make vector types special.  Treat arrays of single byte integers as
   characters, like before, unless they occur in a vector type.  This is
   reasonable, but tricky to implement.

2. Make two special single byte integer types, with a GDB internal
   "not a char" flag set.  Use them for our builtin int8_t and uint8_t.
   Use these to build types for vector registers.  Print all other
   single byte types from user code as chars or strings.  This is
   similar to #1, a little less helpful, but fairly easy.

3. Treat "char" as a character, but "unsigned char" and "signed char"
   as numbers (Jan's patch started down this road and Jim's went a bit
   further).  Treat pointers/arrays of char as strings and
   pointers/arrays of unsigned or signed char as numbers.  Add a "/s"
   flag to the print command that treats single byte types as
   characters or strings.  For example:

     char str[] = "hi";
     unsigned char version[] = "6.5";

     (gdb) p version
     $1 = { 54, 46, 53 }
     (gdb) p/s version
     $2 = "6.5"
     (gdb) p str
     $3 = "hi"

4. Like #3, except that instead of adding a /s modifier, add a "set"
   knob.  Of course in this case we get to argue about the default
   value.
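To make the decision rule in option 3 concrete, here is a minimal standalone sketch in C. It is not GDB code: byte_kind, print_as_string, format_bytes and the slash_s parameter are hypothetical names invented for illustration, and the output format only approximates what GDB prints. The rule it models is the one described above: plain char is shown as a string, signed and unsigned char are shown as numbers, and a /s-style flag forces the string view.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical classification of a single-byte element type. */
enum byte_kind { KIND_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR };

/* Option 3's rule: plain char is text; signed/unsigned char are numbers
   unless the (proposed) /s modifier is given. */
static int print_as_string(enum byte_kind kind, int slash_s)
{
    return kind == KIND_CHAR || slash_s;
}

/* Append formatted text to out without overflowing; truncates if needed. */
static void append(char *out, size_t outsz, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (*pos >= outsz)
        return;
    va_start(ap, fmt);
    n = vsnprintf(out + *pos, outsz - *pos, fmt, ap);
    va_end(ap);
    if (n > 0) {
        *pos += (size_t)n;
        if (*pos >= outsz)
            *pos = outsz - 1;   /* output was truncated */
    }
}

/* Render len bytes either as a quoted string (stopping at the first NUL)
   or as a brace-enclosed list of integers. Returns the length written. */
static size_t format_bytes(char *out, size_t outsz,
                           const unsigned char *data, size_t len,
                           enum byte_kind kind, int slash_s)
{
    size_t pos = 0, i;

    if (outsz == 0)
        return 0;
    out[0] = '\0';

    if (print_as_string(kind, slash_s)) {
        append(out, outsz, &pos, "\"");
        for (i = 0; i < len && data[i] != 0; i++)
            append(out, outsz, &pos, "%c", data[i]);
        append(out, outsz, &pos, "\"");
    } else {
        append(out, outsz, &pos, "{");
        for (i = 0; i < len; i++) {
            /* Signed elements are shown with their sign. */
            int v = (kind == KIND_SIGNED_CHAR) ? (signed char)data[i]
                                               : data[i];
            append(out, outsz, &pos, "%s%d", i ? ", " : "", v);
        }
        append(out, outsz, &pos, "}");
    }
    return pos;
}
```

Calling format_bytes on the four bytes of version[] with KIND_UNSIGNED_CHAR and slash_s set to 0 yields the numeric view, and the same call with slash_s set to 1 yields "6.5". Because the sketch dumps every element, the numeric view also includes the terminating NUL as a fourth value. Option 4 would amount to replacing the slash_s argument with a global setting.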
I think it's important that we resolve this open issue before we release
a new version of GDB, so please post which you prefer.

I like #3 best, followed by #2; #4 is a good compromise but I worry that
we are proliferating knobs that no one ever changes.  I'm interested in
any other suggestions, though I think we've ruled out guessing based on
the type name.

-- 
Daniel Jacobowitz
CodeSourcery