From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-13526-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 16618 invoked by alias); 22 Apr 2003 19:59:31 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 16605 invoked from network); 22 Apr 2003 19:59:30 -0000
Received: from unknown (HELO mx1.redhat.com) (66.187.233.31)
  by sources.redhat.com with SMTP; 22 Apr 2003 19:59:30 -0000
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by mx1.redhat.com (8.11.6/8.11.6) with ESMTP id h3MJxUD18169
	for <gdb@sources.redhat.com>; Tue, 22 Apr 2003 15:59:30 -0400
Received: from pobox.corp.redhat.com (pobox.corp.redhat.com [172.16.52.156])
	by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id h3MJxUq21571
	for <gdb@sources.redhat.com>; Tue, 22 Apr 2003 15:59:30 -0400
Received: from localhost.redhat.com (romulus-int.sfbay.redhat.com [172.16.27.46])
	by pobox.corp.redhat.com (8.11.6/8.11.6) with ESMTP id h3MJxTk32653;
	Tue, 22 Apr 2003 15:59:29 -0400
Received: by localhost.redhat.com (Postfix, from userid 469)
	id 34DAC2C438; Tue, 22 Apr 2003 16:04:03 -0400 (EDT)
From: Elena Zannoni <ezannoni@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16037.41011.517603.566953@localhost.redhat.com>
Date: Tue, 22 Apr 2003 19:59:00 -0000
To: gdb@sources.redhat.com, jimb@redhat.com
Subject: charset.c problem with non-en_US locales
X-SW-Source: 2003-04/txt/msg00258.txt.bz2


I got a bug report against the gdb shipped in RH Linux that I found
interesting, and since it also occurs in FSF gdb, I am raising it here.

Problem: If you set the locale to Turkish, gdb errors out, complaining
that it cannot find the ISO-8859-1 charset.

[ezannoni@localhost gdb]$ LC_ALL=tr_TR.UTF-8 ./gdb
GDB doesn't know of any character set named `ISO-8859-1'.
No display number 0.
Disabling display 0 to avoid infinite recursion.
[ezannoni@localhost gdb]$ 

[ezannoni@localhost gdb]$ LC_ALL=tr_TR.ISO-8859-9 ./gdb
GDB doesn't know of any character set named `ISO-8859-1'.
No display number 0.
Disabling display 0 to avoid infinite recursion.
[ezannoni@localhost gdb]$


The real problem is the use of the tolower() function in charset.c
to do a case-insensitive comparison between the two strings
"ISO-8859-1" and "iso-8859-1".

/* Character set names are always compared ignoring case.  */
static int
strcmp_case_insensitive (const char *p, const char *q)
{
  while (*p && *q && tolower (*p) == tolower (*q))
    p++, q++;

  return tolower (*p) - tolower (*q);
}


When the locale is set to Turkish (or any other locale whose case
mapping differs from English), the tolower/toupper functions don't
behave as they would in English.  For instance, the lowercase version
of 'I' is not 'i' but a different character (dotless 'i').  Indeed, the
man pages for tolower/toupper warn about this.  strcasecmp() has the
same problem.
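
One way to check this behaviour on a given system is to ask the C
library directly.  The following is only a sketch: the helper name
lower_I_in_locale is hypothetical (it is not part of gdb), and it
assumes the locale being probed is installed.  It switches LC_CTYPE to
the named locale, records what tolower ('I') returns, and restores the
previous setting.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gdb.  Return tolower ('I') as seen
   under the locale NAME, or -1 if the locale cannot be selected or the
   current setting cannot be saved.  LC_CTYPE is restored afterwards.  */
int
lower_I_in_locale (const char *name)
{
  char saved[256];
  const char *cur;
  int result;

  assert (name != NULL);

  /* Remember the current LC_CTYPE setting so it can be restored.  The
     returned string may be overwritten by later setlocale calls, so
     copy it.  */
  cur = setlocale (LC_CTYPE, NULL);
  if (cur == NULL || strlen (cur) >= sizeof saved)
    return -1;
  strcpy (saved, cur);

  if (setlocale (LC_CTYPE, name) == NULL)
    return -1;			/* Locale not available.  */

  result = tolower ('I');

  setlocale (LC_CTYPE, saved);
  return result;
}
```

Calling lower_I_in_locale ("C") should give 'i'.  Going by the
behaviour described above, a call with "tr_TR.ISO-8859-9" would be
expected to give something other than 'i' on a system where that
locale is installed, but the exact value depends on the C library.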

So, I think the whole case-insensitive approach for the names of the
charsets and the translation tables should probably be removed.  What
was the reason behind it? Was it that the user could type upper/lower
case charset names at the command line? After all the official name is
'ISO' not 'iso'.
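
If case-insensitive names turn out to be worth keeping, one possible
alternative is a comparison that folds case for the ASCII letters only
and never consults the locale.  This is a sketch rather than a tested
gdb change: the names ascii_tolower and strcmp_ascii_case_insensitive
are made up for illustration, and the code assumes the host character
set is ASCII-compatible, so that 'A'..'Z' are contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lowercase C if it is an ASCII uppercase letter,
   otherwise return it unchanged.  Does not depend on the locale.
   Assumes 'A'..'Z' are contiguous, as they are in ASCII.  */
static int
ascii_tolower (int c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare P and Q ignoring the case of ASCII letters only.  Returns
   zero if they match, and a negative or positive value otherwise, in
   the manner of strcmp.  */
int
strcmp_ascii_case_insensitive (const char *p, const char *q)
{
  assert (p != NULL && q != NULL);

  while (*p && *q
	 && (ascii_tolower ((unsigned char) *p)
	     == ascii_tolower ((unsigned char) *q)))
    p++, q++;

  return (ascii_tolower ((unsigned char) *p)
	  - ascii_tolower ((unsigned char) *q));
}
```

With this, "ISO-8859-1" and "iso-8859-1" compare equal whatever LC_ALL
is set to, because the standard library's tolower is never called.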

This patch works, but I am not confident that it's enough.

elena


Index: charset.c
===================================================================
RCS file: /cvs/uberbaum/gdb/charset.c,v
retrieving revision 1.3
diff -u -p -r1.3 charset.c
--- charset.c	14 Jan 2003 00:49:03 -0000	1.3
+++ charset.c	22 Apr 2003 19:56:53 -0000
@@ -160,10 +160,13 @@ struct translation {
 static int
 strcmp_case_insensitive (const char *p, const char *q)
 {
-  while (*p && *q && tolower (*p) == tolower (*q))
+#if 0
+  while (*p && *q && tolower (*p) == tolower (*q))
     p++, q++;
 
   return tolower (*p) - tolower (*q);
+#endif
+  return strcmp (p, q);
 }
 
 
@@ -1207,24 +1210,24 @@ _initialize_charset (void)
   register_charset (simple_charset ("ascii", 1,
                                     ascii_print_literally, 0,
                                     ascii_to_control, 0));
-  register_charset (iso_8859_family_charset ("iso-8859-1"));
+  register_charset (iso_8859_family_charset ("ISO-8859-1"));
   register_charset (ebcdic_family_charset ("ebcdic-us"));
   register_charset (ebcdic_family_charset ("ibm1047"));
   register_iconv_charsets ();
 
   {
     struct { char *from; char *to; int *table; } tlist[] = {
-      { "ascii",      "iso-8859-1", ascii_to_iso_8859_1_table },
+      { "ascii",      "ISO-8859-1", ascii_to_iso_8859_1_table },
       { "ascii",      "ebcdic-us",  ascii_to_ebcdic_us_table },
       { "ascii",      "ibm1047",    ascii_to_ibm1047_table },
-      { "iso-8859-1", "ascii",      iso_8859_1_to_ascii_table },
-      { "iso-8859-1", "ebcdic-us",  iso_8859_1_to_ebcdic_us_table },
-      { "iso-8859-1", "ibm1047",    iso_8859_1_to_ibm1047_table },
+      { "ISO-8859-1", "ascii",      iso_8859_1_to_ascii_table },
+      { "ISO-8859-1", "ebcdic-us",  iso_8859_1_to_ebcdic_us_table },
+      { "ISO-8859-1", "ibm1047",    iso_8859_1_to_ibm1047_table },
       { "ebcdic-us",  "ascii",      ebcdic_us_to_ascii_table },
-      { "ebcdic-us",  "iso-8859-1", ebcdic_us_to_iso_8859_1_table },
+      { "ebcdic-us",  "ISO-8859-1", ebcdic_us_to_iso_8859_1_table },
       { "ebcdic-us",  "ibm1047",    ebcdic_us_to_ibm1047_table },
       { "ibm1047",    "ascii",      ibm1047_to_ascii_table },
-      { "ibm1047",    "iso-8859-1", ibm1047_to_iso_8859_1_table },
+      { "ibm1047",    "ISO-8859-1", ibm1047_to_iso_8859_1_table },
       { "ibm1047",    "ebcdic-us",  ibm1047_to_ebcdic_us_table }
     };