From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 15 Jan 2009 22:16:00 -0000
From: Julian Brown <julian@codesourcery.com>
To: Tom Tromey <tromey@redhat.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH/WIP] C/C++ wchar_t/Unicode printing support
Message-ID: <20090115221523.28c15971@rex.config>
In-Reply-To: <m3r634jneg.fsf@fleche.redhat.com>
References: <20090115202411.5f154657@rex.config> 	<m3r634jneg.fsf@fleche.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

Thanks for the quick reply!

On Thu, 15 Jan 2009 13:59:51 -0700
Tom Tromey <tromey@redhat.com> wrote:

> >>>>> "Julian" == Julian Brown <julian@codesourcery.com> writes:
> 
> Julian> 3. Types which are literally called "wchar_t" are assumed to
> Julian> be wide characters.
> 
> I did something similar -- my patch looks at TYPE_NAME to see if it is
> "wchar_t".  In C, this is a typedef, and so I needed the appended to
> make it work.  Without this patch, lookup_typename will find a
> "wchar_t" symbol whose type has a TYPE_NAME which is not "wchar_t".
> That seemed odd.  The patch changes the dwarf reader so that the
> wchar_t symbol points to a type whose name is "wchar_t".
> 
> I think the failing case here was "p L'a'", so I suppose it would not
> necessarily show up with your patch.

I don't think I'd run across that problem, no...

> Julian> $3 = (wchar_t *) 0x85c4 "Sch\x00f6ne Gr\x00fc\x00dfe"
> 
> It should probably print L"..." :-)

Yeah, true.

> Yeah.  Mine:
> 
> * Handles input and output of wide characters and strings, and also
>   the new C0X u"" and U"" syntax.
> * Adds "%ls" and "%lc" to the gdb printf.

Sounds good.
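
For reference, I'd assume these mirror the ISO C conversions. A minimal
sketch of what "%ls" and "%lc" do in the standard C printf family (this
is not gdb's own printf code, and format_wide_demo is a made-up name for
illustration only):

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: format a wide string and a wide character into
   a narrow buffer with the standard %ls and %lc conversions.  Returns
   the snprintf result, which is negative if a character cannot be
   converted in the current locale.  */
static int
format_wide_demo (char *buf, size_t size, const wchar_t *ws, wchar_t wc)
{
  return snprintf (buf, size, "%ls %lc", ws, (wint_t) wc);
}
```

Note that the standard versions convert through the current locale, so
anything outside ASCII depends on the locale being set up first.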

> * Handles all target character sets, in particular variable length
>   encodings are handled.

My patch is supposed to handle variable-length encodings for the target
wide character set -- but that's untested, so it is probably broken :-)

> * Auto-selects the appropriate endianness for wide characters on the
>   target.

Cool.
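
To spell out what that involves: the same bytes in target memory give a
different wide character depending on byte order. A minimal sketch,
assuming the width and byte order are already known from the target
description (extract_wide_char is a hypothetical helper, not code from
either patch):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assemble one wide character of WIDTH bytes
   (1 to 4) from target memory at BUF, honouring the target byte
   order.  */
static uint32_t
extract_wide_char (const unsigned char *buf, size_t width, int big_endian)
{
  uint32_t value = 0;
  size_t i;

  for (i = 0; i < width; i++)
    {
      /* Big-endian: the first byte is the most significant.
         Little-endian: the last byte is the most significant.  */
      unsigned char byte = big_endian ? buf[i] : buf[width - 1 - i];

      value = (value << 8) | byte;
    }

  return value;
}
```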

> * Getting the list of character sets supported by iconv is a pain.
>   Right now I just have a list of dubious provenance (read: iconv -l
>   | sed).
> 
>   Perhaps we can invoke "iconv -l" at startup... eww.

I ran into this problem too. An earlier version of my patch had this,
in register_iconv_charsets():

  FILE *fh;
  /* Fixed buffers never caused anyone problems did they?  */
  char charset[200];
  int seen_a_charset = 0;
  struct charset *cs;

  fh = popen ("iconv -l", "r");

  if (!fh)
    return 1;

  /* Read each name up to the first '/' or whitespace (the field width
     keeps fscanf within charset[]), then skip the rest of the token.
     Testing the return value avoids looping on feof.  */
  while (fscanf (fh, " %199[^/ \t\n]%*[^ \t\n]", charset) == 1)
    {
      seen_a_charset = 1;

      register_charset (simple_charset (xstrdup (charset), 1, NULL,
                        NULL, NULL, NULL));
    }

  pclose (fh);

  return !seen_a_charset;

...which isn't quite right, but can maybe be adapted into something
which is.
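
Another option that avoids parsing "iconv -l" output altogether would be
to keep a candidate list and probe each name with iconv_open. A minimal
sketch, assuming a POSIX iconv and that UTF-8 is usable as the
conversion target (host_iconv_supports is a made-up name, not something
in either patch):

```c
#include <iconv.h>

/* Hypothetical helper: return 1 if the host iconv can convert from
   NAME to UTF-8, 0 otherwise.  */
static int
host_iconv_supports (const char *name)
{
  iconv_t cd = iconv_open ("UTF-8", name);

  /* iconv_open reports failure with (iconv_t) -1.  */
  if (cd == (iconv_t) -1)
    return 0;

  iconv_close (cd);
  return 1;
}
```

That still needs a list of names to try, but at least nothing ends up
registered that the host can't actually convert.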

> Another difference is that I have the intermediate step go through the
> host wchar_t rather than UCS-4.  This is nice because it means we can
> use iswprint to decide if something is printable.  But, it may have
> limitations, I suppose, on a host where wchar_t is less capable.

I think that might break on recent win32, where wchar_t is 16 bits and
holds UTF-16 code units (i.e. a code point outside the BMP needs a
surrogate pair, so two wide characters rather than one).
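
To illustrate the problem, here is a minimal sketch of decoding UTF-16
code units into code points (utf16_decode is a hypothetical helper, not
code from either patch). When it consumes two units, no single wchar_t
on such a host holds the whole character, so iswprint on one unit can't
classify it:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from the UTF-16 code
   units at UNITS (LEN units available).  Stores the code point in *CP
   and returns the number of units consumed (1 or 2), or 0 on malformed
   or truncated input.  */
static size_t
utf16_decode (const uint16_t *units, size_t len, uint32_t *cp)
{
  if (len == 0)
    return 0;

  if (units[0] < 0xD800 || units[0] > 0xDFFF)
    {
      /* Basic Multilingual Plane: one unit is the code point.  */
      *cp = units[0];
      return 1;
    }

  /* A high surrogate must be followed by a low surrogate.  */
  if (units[0] <= 0xDBFF && len >= 2
      && units[1] >= 0xDC00 && units[1] <= 0xDFFF)
    {
      *cp = 0x10000 + (((uint32_t) units[0] - 0xD800) << 10)
            + (units[1] - 0xDC00);
      return 2;
    }

  return 0;
}
```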

> Julian> OK to apply, or any comments?
> 
> If you wouldn't mind holding off, my patch is nearing completion.  It
> is feature complete, and at the moment I am writing test cases.

Sure, I don't mind holding off.

> I'm happy to send what I have now, if you want to see it.  Or it is
> all in the archer git repository on the tromey-archer-charset branch.
> 
> I've lifted stuff -- ideas and code -- from your patch, but the result
> is pretty different.  Perhaps we could discuss the areas where we made
> different decisions and try to plot the best route forward.

OK, I'll have a look, but I'm not sure if I'll have anything sensible to
say :-)

Cheers,

Julian