From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16618 invoked by alias); 22 Apr 2003 19:59:31 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-owner@sources.redhat.com
Received: (qmail 16605 invoked from network); 22 Apr 2003 19:59:30 -0000
Received: from unknown (HELO mx1.redhat.com) (66.187.233.31) by sources.redhat.com with SMTP; 22 Apr 2003 19:59:30 -0000
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.11.6/8.11.6) with ESMTP id h3MJxUD18169 for ; Tue, 22 Apr 2003 15:59:30 -0400
Received: from pobox.corp.redhat.com (pobox.corp.redhat.com [172.16.52.156]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id h3MJxUq21571 for ; Tue, 22 Apr 2003 15:59:30 -0400
Received: from localhost.redhat.com (romulus-int.sfbay.redhat.com [172.16.27.46]) by pobox.corp.redhat.com (8.11.6/8.11.6) with ESMTP id h3MJxTk32653; Tue, 22 Apr 2003 15:59:29 -0400
Received: by localhost.redhat.com (Postfix, from userid 469) id 34DAC2C438; Tue, 22 Apr 2003 16:04:03 -0400 (EDT)
From: Elena Zannoni
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <16037.41011.517603.566953@localhost.redhat.com>
Date: Tue, 22 Apr 2003 19:59:00 -0000
To: gdb@sources.redhat.com, jimb@redhat.com
Subject: charset.c problem with non-en_US locales
X-SW-Source: 2003-04/txt/msg00258.txt.bz2

I got a bug report against the gdb in RH Linux which I found
interesting, and since it occurs in FSF gdb as well...

Problem: If you set the locale to Turkish, gdb errors out, complaining
that it cannot find the ISO-8859-1 charset.

[ezannoni@localhost gdb]$ LC_ALL=tr_TR.UTF-8 ./gdb
GDB doesn't know of any character set named `ISO-8859-1'.
No display number 0.
Disabling display 0 to avoid infinite recursion.
[ezannoni@localhost gdb]$
[ezannoni@localhost gdb]$ LC_ALL=tr_TR.ISO-8859-9 ./gdb
GDB doesn't know of any character set named `ISO-8859-1'.
No display number 0.
Disabling display 0 to avoid infinite recursion.
[ezannoni@localhost gdb]$

The real problem is the use of the tolower() function in charset.c to
do a case-insensitive comparison between the two strings "ISO-8859-1"
and "iso-8859-1".

/* Character set names are always compared ignoring case.  */
static int
strcmp_case_insensitive (const char *p, const char *q)
{
  while (*p && *q && tolower (*p) == tolower (*q))
    p++, q++;

  return tolower (*p) - tolower (*q);
}

When the locale is set to Turkish (or any other locale whose
case-mapping rules differ from English), the tolower/toupper functions
don't work as they would in English. The lowercase version of 'I' is
not 'i', for instance, but some other character ('i' without the dot).
Indeed, the man pages for tolower/toupper warn about this.
strcasecmp() has the same problem.

So, I think the whole case-insensitive approach for the names of the
charsets and the translation tables should probably be removed. What
was the reason behind it? Was it so that the user could type upper- or
lower-case charset names at the command line? After all, the official
name is 'ISO', not 'iso'.

This patch works, but I am not confident that it's enough.

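The tolower() behavior described above can be checked directly. The following is only a sketch, not GDB code: the helper name upper_i_folds_to_ascii_i is hypothetical, and it assumes the locale to be tested is installed on the host (it reports -1 when setlocale() rejects the name).

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of GDB.  Returns 1 if tolower ('I')
   yields 'i' under LOCALE_NAME, 0 if it yields anything else, and -1
   if setlocale () does not accept LOCALE_NAME on this system.  The
   LC_CTYPE setting that was active on entry is restored on exit.  */
int
upper_i_folds_to_ascii_i (const char *locale_name)
{
  /* The string returned by setlocale () may be overwritten by later
     calls, so keep a private copy of the current setting.  */
  const char *current = setlocale (LC_CTYPE, NULL);
  char *saved = NULL;
  int result;

  if (current != NULL)
    {
      saved = malloc (strlen (current) + 1);
      if (saved != NULL)
        strcpy (saved, current);
    }

  if (setlocale (LC_CTYPE, locale_name) == NULL)
    result = -1;                        /* Locale not available.  */
  else
    result = (tolower ('I') == 'i');    /* The mapping charset.c relies on.  */

  if (saved != NULL)
    {
      setlocale (LC_CTYPE, saved);
      free (saved);
    }

  return result;
}
```

Called with "C" this returns 1. Called with "tr_TR.ISO-8859-9" or "tr_TR.UTF-8" on a system that has those locales, a result of 0 would match the failure reported here.
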
elena

Index: charset.c
===================================================================
RCS file: /cvs/uberbaum/gdb/charset.c,v
retrieving revision 1.3
diff -u -p -r1.3 charset.c
--- charset.c   14 Jan 2003 00:49:03 -0000      1.3
+++ charset.c   22 Apr 2003 19:56:53 -0000
@@ -160,10 +160,13 @@ struct translation {
 static int
 strcmp_case_insensitive (const char *p, const char *q)
 {
-  while (*p && *q && tolower (*p) == tolower (*q))
+#if 0
+  while (*p && *q && tolower (*p) == tolower (*q))
     p++, q++;
 
   return tolower (*p) - tolower (*q);
+#endif
+  return strcmp (p, q);
 }
 
@@ -1207,24 +1210,24 @@ _initialize_charset (void)
   register_charset (simple_charset ("ascii", 1,
                                     ascii_print_literally, 0,
                                     ascii_to_control, 0));
-  register_charset (iso_8859_family_charset ("iso-8859-1"));
+  register_charset (iso_8859_family_charset ("ISO-8859-1"));
   register_charset (ebcdic_family_charset ("ebcdic-us"));
   register_charset (ebcdic_family_charset ("ibm1047"));
   register_iconv_charsets ();
 
   {
     struct { char *from; char *to; int *table; } tlist[] = {
-      { "ascii",      "iso-8859-1", ascii_to_iso_8859_1_table },
+      { "ascii",      "ISO-8859-1", ascii_to_iso_8859_1_table },
       { "ascii",      "ebcdic-us",  ascii_to_ebcdic_us_table },
       { "ascii",      "ibm1047",    ascii_to_ibm1047_table },
-      { "iso-8859-1", "ascii",      iso_8859_1_to_ascii_table },
-      { "iso-8859-1", "ebcdic-us",  iso_8859_1_to_ebcdic_us_table },
-      { "iso-8859-1", "ibm1047",    iso_8859_1_to_ibm1047_table },
+      { "ISO-8859-1", "ascii",      iso_8859_1_to_ascii_table },
+      { "ISO-8859-1", "ebcdic-us",  iso_8859_1_to_ebcdic_us_table },
+      { "ISO-8859-1", "ibm1047",    iso_8859_1_to_ibm1047_table },
       { "ebcdic-us",  "ascii",      ebcdic_us_to_ascii_table },
-      { "ebcdic-us",  "iso-8859-1", ebcdic_us_to_iso_8859_1_table },
+      { "ebcdic-us",  "ISO-8859-1", ebcdic_us_to_iso_8859_1_table },
       { "ebcdic-us",  "ibm1047",    ebcdic_us_to_ibm1047_table },
       { "ibm1047",    "ascii",      ibm1047_to_ascii_table },
-      { "ibm1047",    "iso-8859-1", ibm1047_to_iso_8859_1_table },
+      { "ibm1047",    "ISO-8859-1", ibm1047_to_iso_8859_1_table },
       { "ibm1047",    "ebcdic-us",  ibm1047_to_ebcdic_us_table }
     };
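
If case-insensitive charset names turn out to be worth keeping, one possible alternative to the strcmp() change in the patch above is to fold case for the ASCII letters only, without going through the locale at all. This is a sketch under assumptions, not what charset.c does: the names ascii_tolower and strcmp_ascii_case_insensitive are hypothetical, and the range test assumes the host character set is ASCII-compatible (it would not be correct on an EBCDIC host).

```c
/* Lower-case C only if it is one of the ASCII capitals 'A'..'Z'.
   Every other value is returned unchanged, whatever the current
   locale is.  Assumes an ASCII-compatible host character set.  */
static int
ascii_tolower (int c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical locale-independent variant of strcmp_case_insensitive.
   Returns zero if P and Q are equal ignoring ASCII case, and a
   negative or positive value otherwise, like strcmp.  */
int
strcmp_ascii_case_insensitive (const char *p, const char *q)
{
  /* Go through unsigned char so bytes above 0x7f compare consistently
     regardless of whether plain char is signed.  */
  while (*p && *q
         && (ascii_tolower ((unsigned char) *p)
             == ascii_tolower ((unsigned char) *q)))
    p++, q++;

  return (ascii_tolower ((unsigned char) *p)
          - ascii_tolower ((unsigned char) *q));
}
```

With this comparison, "ISO-8859-1" and "iso-8859-1" compare equal under any LC_ALL setting, so the registered names and the tlist[] entries could stay in lower case.
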