From: Eli Zaretskii
Date: Sat, 09 Oct 2010 21:05:00 -0000
Subject: Re: Sevenbit-strings only partially respected?
To: Vladimir Prus
Cc: gdb@sources.redhat.com
Message-id: <83k4lrm611.fsf@gnu.org>
References: <201010092326.36112.ghost@cs.msu.su>

> From: Vladimir Prus
> Date: Sat, 09 Oct 2010 23:48:12 +0400
>
> >       if (c < 0x20 ||                       /* Low control chars */
> >           (c >= 0x7F && c < 0xA0) ||        /* DEL, High controls */
> >           (sevenbit_strings && c >= 0x80))
> >         {/* high order bit set */
> >
> > Apparently, the second condition fires and causes 0xD0 to be quoted. Is
> > this expected behaviour?
>
> Doh. Of course 0xD0 is larger than 0xA0. The value that causes the actual
> problem is 0x83. Russian letter 'у' is encoded in UTF8 as 0xD1 0x83, and
> because of the above code, strings with that letter (and some other letters)
> get messed up completely.

That `(c >= 0x7F && c < 0xA0)' condition assumes ISO-8859-n encodings
(probably was coded for 8859-1), and should not be used with anything
else.
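
To make the effect concrete, here is a minimal sketch that wraps the quoted
condition in a predicate and applies it to the two UTF-8 bytes of 'у'. It is
an illustration only, not the layout of the GDB source: the function names
needs_escape_latin1 and count_escaped are hypothetical, and sevenbit_strings
is passed as a parameter here rather than read from a global.

```c
#include <assert.h>
#include <stddef.h>

/* The check quoted above, as a predicate: nonzero if byte C would be
   printed as an escape.  (Hypothetical name, for illustration only.)  */
static int
needs_escape_latin1 (unsigned char c, int sevenbit_strings)
{
  return (c < 0x20                              /* Low control chars */
          || (c >= 0x7F && c < 0xA0)            /* DEL, High controls (ISO-8859-n) */
          || (sevenbit_strings && c >= 0x80));  /* high order bit set */
}

/* Count how many of the N bytes at S the predicate would escape.
   (Hypothetical helper, for illustration only.)  */
static size_t
count_escaped (const unsigned char *s, size_t n, int sevenbit_strings)
{
  size_t i, count = 0;

  assert (s != NULL);
  for (i = 0; i < n; i++)
    if (needs_escape_latin1 (s[i], sevenbit_strings))
      count++;
  return count;
}
```

For the bytes 0xD1 0x83, count_escaped reports 1 with sevenbit_strings off
(only the continuation byte 0x83 lands in the 0x7F..0x9F range, so the
character is split) and 2 with it on (both bytes are escaped).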