From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Pierre Muller" <pierre.muller@ics-cnrs.unistra.fr>
To: "'Tom Tromey'" <tromey@redhat.com>
Cc: <gdb-patches@sourceware.org>
References: <5928.31498147479$1302882967@news.gmane.org>	<m3ei53cres.fsf@fleche.redhat.com>	<005101cbfc50$193136b0$4b93a410$%muller@ics-cnrs.unistra.fr>	<20110416162455.GA5599@host1.jankratochvil.net>	<000001cbfc7d$3f67f440$be37dcc0$%muller@ics-cnrs.unistra.fr>	<83zknpoacd.fsf@gnu.org> <21014.6501930014$1303139687@news.gmane.org>	<m3zknn7a2v.fsf@fleche.redhat.com>	<34716.7311156683$1303204711@news.gmane.org>	<m3fwpe5qg1.fsf@fleche.redhat.com>	<16656.7281041809$1303221408@news.gmane.org> <m3wriq3zbu.fsf@fleche.redhat.com>
In-Reply-To: <m3wriq3zbu.fsf@fleche.redhat.com>
Subject: RE: [RFC-v5] Handle cygwin wchar_t specifics
Date: Wed, 20 Apr 2011 07:59:00 -0000
Message-ID: <000201cbff30$e2266030$a6732090$@muller@ics-cnrs.unistra.fr>
MIME-Version: 1.0
Content-Type: text/plain;	charset="us-ascii"
Content-Transfer-Encoding: 7bit
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2011-04/txt/msg00346.txt.bz2

> Tom> I don't think DEFAULT_INTERMEDIATE_ENCODING is needed.
> 
> Pierre>   I assumed you meant: not necessary if PHONY_ICONV is defined,
> Pierre> and this is what I changed below.
> Pierre> (I would personally have preferred to remove the
> Pierre> INTERMEDIATE_ENCODING macro completely and call the function directly.)
> 
> Sorry, that isn't what I meant.
  Hopefully I got it right this time...
 
> All this new code is needed only in the __STDC_ISO_10646__ case.
> All other cases are already handled ok.
> So, I think it is best to only introduce new code along the
> __STDC_ISO_10646__ branches.  Thus far your patches have touched all the
> other branches -- but there is no reason to do that, and I think it just
> makes it more complicated without an associated benefit.
> 
> Pierre> +#ifdef __STDC_ISO_10646__
> Pierre> +  if (sizeof (gdb_wchar_t) == 2)
> 
> You might as well unify the 2 and 4 byte cases like I said earlier, and
> just die for any other value.
Done below.
>  You can use a static assert trick to make
> it die during compilation, which I think is better than dying at
> runtime.  E.g.:
> extern char your_platform_is_bogus[(sizeof (gdb_wchar_t) == 2
>                                     || sizeof (gdb_wchar_t) == 4)
>                                     ? 1 : -1];
Used below (renamed your_gdb_wchar_t_is_bogus).
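
For readers unfamiliar with the trick, here is a minimal standalone sketch
of it next to the runtime check it replaces. The names demo_wchar_t,
demo_wchar_size_is_bogus, demo_wchar_bits and
demo_check_wchar_size_at_runtime are hypothetical and not part of the
patch; demo_wchar_t is assumed to be plain wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for gdb_wchar_t (assumption: plain wchar_t).  */
typedef wchar_t demo_wchar_t;

/* Compile-time check: the array size is 1 when the condition holds and
   -1 otherwise.  A negative array size is a constraint violation, so an
   unsupported width stops the build.  The array is only declared, never
   defined or referenced, so it costs nothing at link time.  */
extern char demo_wchar_size_is_bogus[(sizeof (demo_wchar_t) == 2
                                      || sizeof (demo_wchar_t) == 4)
                                     ? 1 : -1];

/* Runtime counterpart, shown only for contrast: the same condition is
   not detected until this function is actually called.  */
void
demo_check_wchar_size_at_runtime (void)
{
  assert (sizeof (demo_wchar_t) == 2 || sizeof (demo_wchar_t) == 4);
}

/* Width of the wide character type in bits (16 or 32 once the
   declaration above has compiled).  */
int
demo_wchar_bits (void)
{
  return (int) (sizeof (demo_wchar_t) * 8);
}
```

The declaration form needs no C11 support, which is presumably why it was
suggested over _Static_assert.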
 
> Pierre> +      /* Check that the name is in the list of handled charsets.  */
> Pierre> +      for (i = 0; charset_enum[i]; i++)
> 
> I don't think this is really needed either.
> Or, if you really want to do the check, do it by calling iconv_open at
> initialization, and then just make gdb die early -- whatever platform
> does this is really messed up.
 Also done below. 
> Pierre> +  /* if gdb_wchar_t is not of size 2, or if "UTF-16XE" and "UCS-2XE" are
> Pierre> +     not known, use DEFAULT_INTERMEDIATE_ENCODING macro.  */
> Pierre> +  return DEFAULT_INTERMEDIATE_ENCODING;
> 
> I don't think this will generally do the right thing.
> For example, your patch defines DEFAULT_INTERMEDIATE_ENCODING to
> "UCS-4LE" in the !WORDS_BIGENDIAN case.  But we already know that
> gdb_wchar_t has 2 bytes.  So I think this will just result in the same
> bug as today.
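
To make the point above concrete, here is a small sketch that derives the
encoding name from the actual character width instead of hard-coding a
4-byte name. demo_encoding_name and its parameters are hypothetical and
only illustrate the idea; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write "UTF-16LE", "UTF-16BE", "UTF-32LE" or "UTF-32BE" into BUF,
   depending on WIDTH (bytes per wide character) and BIG_ENDIAN.
   Return 0 on success, -1 if WIDTH is unsupported or BUF is too small.  */
int
demo_encoding_name (char *buf, size_t buflen, size_t width, int big_endian)
{
  int n;

  assert (buf != NULL && buflen > 0);

  if (width != 2 && width != 4)
    return -1;

  /* Cast to int: sizeof-style values are size_t, which does not match
     the %d conversion on every host.  */
  n = snprintf (buf, buflen, "UTF-%d%s", (int) (width * 8),
                big_endian ? "BE" : "LE");
  return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

With a 2-byte wide character this yields a 16-bit name, so a host with a
2-byte gdb_wchar_t would never be handed a 4-byte encoding.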

  I hope I have now understood what you wanted:
the new code makes fewer changes to gdb_wchar.h.
It only uses the intermediate_encoding function in the case where
UCS-4LE/BE were set before.
  To avoid compiling this code in other cases,
I defined a new macro called USE_INTERMEDIATE_ENCODING_FUNCTION,
and the charset.c changes are limited to this conditional.
  I used iconv_open to check for working charset names
and added a call to error if none is found.
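
  The probing idea, reduced to a standalone sketch. The helper names
demo_charset_supported and demo_first_supported are hypothetical, and
the source charset is passed in by the caller here, whereas the patch
uses host_charset ().

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Return 1 if iconv_open accepts a conversion from FROM to TO,
   0 otherwise.  The descriptor is closed again immediately, since only
   the availability of the name is of interest.  */
int
demo_charset_supported (const char *to, const char *from)
{
  iconv_t desc;

  assert (to != NULL && from != NULL);

  desc = iconv_open (to, from);
  if (desc == (iconv_t) -1)
    return 0;
  iconv_close (desc);
  return 1;
}

/* Return the first entry of the NULL-terminated list CANDIDATES that
   iconv_open accepts as a target for FROM, or NULL if none works.  */
const char *
demo_first_supported (const char *const *candidates, const char *from)
{
  size_t i;

  for (i = 0; candidates[i] != NULL; i++)
    if (demo_charset_supported (candidates[i], from))
      return candidates[i];
  return NULL;
}
```

  A caller would pass something like { "UTF-32LE", "UCS-4LE", NULL } and
report an error when NULL comes back.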

  Comments?

Pierre 


2011-04-20  Pierre Muller  <muller@ics.u-strasbg.fr>

	* gdb_wchar.h (USE_INTERMEDIATE_ENCODING_FUNCTION): New macro.
	(INTERMEDIATE_ENCODING): Change value to intermediate_encoding
	function call if __STDC_ISO_10646__ macro is defined.
	(intermediate_encoding): New prototype.
	* charset.c (your_gdb_wchar_t_is_bogus): New test variable
	to generate compile time error for unsupported gdb_wchar_t
	size.
	(ENDIAN_SUFFIX): New macro.
	(intermediate_encoding): New function.
	

Index: charset.c
===================================================================
RCS file: /cvs/src/src/gdb/charset.c,v
retrieving revision 1.43
diff -u -p -r1.43 charset.c
--- charset.c	11 Jan 2011 15:10:01 -0000	1.43
+++ charset.c	20 Apr 2011 07:48:21 -0000
@@ -922,6 +922,70 @@ default_auto_wide_charset (void)
   return GDB_DEFAULT_TARGET_WIDE_CHARSET;
 }
 
+
+#ifdef USE_INTERMEDIATE_ENCODING_FUNCTION
+/* Macro used for UTF or UCS endianness suffix.  */
+#if WORDS_BIGENDIAN
+#define ENDIAN_SUFFIX "BE"
+#else
+#define ENDIAN_SUFFIX "LE"
+#endif
+
+/* The code below serves to generate a compile time error if
+   gdb_wchar_t type is not of size 2 nor 4, despite the fact that
+   macro __STDC_ISO_10646__ is defined.
+   This is better than a gdb_assert call, because GDB cannot handle
+   strings correctly if this size is different.  */
+
+static char your_gdb_wchar_t_is_bogus[(sizeof (gdb_wchar_t) == 2
+				       || sizeof (gdb_wchar_t) == 4)
+				      ? 1 : -1];
+
+/* intermediate_encoding returns the charset used internally by
+   GDB to convert between target and host encodings. As the test above
+   compiled, sizeof (gdb_wchar_t) is either 2 or 4 bytes.
+   UTF-16/32 is tested first, UCS-2/4 is tested as a second option,
+   otherwise an error is generated.  */
+
+const char *
+intermediate_encoding (void)
+{
+  iconv_t desc;
+  static const char *stored_result = NULL;
+  const char *result;
+  int i;
+
+  if (stored_result)
+    return stored_result;
+  result = xstrprintf ("UTF-%d%s", sizeof (gdb_wchar_t) * 8, ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree ((void *) result);
+  /* Second try, with UCS-2 type.  */
+  result = xstrprintf ("UCS-%d%s", sizeof (gdb_wchar_t), ENDIAN_SUFFIX);
+  /* Check that the name is supported by iconv_open.  */
+  desc = iconv_open (result, host_charset ());
+  if (desc != (iconv_t) -1)
+    {
+      iconv_close (desc);
+      stored_result = result;
+      return result;
+    }
+  /* Not valid, free the allocated memory.  */
+  xfree ((void *) result);
+  /* No valid charset found, generate error here.  */
+  error ("Unable to find a valid charset for string conversions");
+}
+
+#endif /* USE_INTERMEDIATE_ENCODING_FUNCTION */
+
 void
 _initialize_charset (void)
 {
Index: gdb_wchar.h
===================================================================
RCS file: /cvs/src/src/gdb/gdb_wchar.h,v
retrieving revision 1.6
diff -u -p -r1.6 gdb_wchar.h
--- gdb_wchar.h	1 Jan 2011 15:33:05 -0000	1.6
+++ gdb_wchar.h	20 Apr 2011 07:48:21 -0000
@@ -78,11 +78,10 @@ typedef wint_t gdb_wint_t;
    iconv_open.  We put the endianness into the encoding name to avoid
    hosts that emit a BOM when the unadorned name is used.  */
 #if defined (__STDC_ISO_10646__)
-#if WORDS_BIGENDIAN
-#define INTERMEDIATE_ENCODING "UCS-4BE"
-#else
-#define INTERMEDIATE_ENCODING "UCS-4LE"
-#endif
+#define USE_INTERMEDIATE_ENCODING_FUNCTION
+#define INTERMEDIATE_ENCODING intermediate_encoding ()
+const char *intermediate_encoding (void);
+
 #elif defined (_LIBICONV_VERSION) && _LIBICONV_VERSION >= 0x108
 #define INTERMEDIATE_ENCODING "wchar_t"
 #else