From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Sender: gdb-patches-owner@sourceware.org
From:
To:
CC: , ,
Subject: Re: support C/C++ identifiers named with non-ASCII characters
Date: Mon, 21 May 2018 19:25:00 -0000
Message-ID:
References: <9418d4f0-f22a-c587-cc34-2fa67afbd028@zjz.name> <8c8af079-dbb8-207b-5edf-86b99e9f5db8@simark.ca> <834lj1f0ne.fsf@gnu.org> <83tvr0ev0p.fsf@gnu.org>
In-Reply-To: <83tvr0ev0p.fsf@gnu.org>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
X-SW-Source: 2018-05/txt/msg00502.txt.bz2

> On May 21, 2018, at 2:14 PM, Eli Zaretskii wrote:
>
>> From:
>> CC: , ,
>> Date: Mon, 21 May 2018 18:03:17 +0000
>>
>>> Is it a fact that non-ASCII identifiers must be encoded in UTF-8, and
>>> can not include invalid UTF-8 sequences?
>>
>> Encoding is a I/O question.
>
> Not necessarily.
>
> I asked that question because scanning a string for certain ASCII
> characters using a 'char *' pointer will only work reliably if the
> string is in UTF-8 or in some single-byte encoding.  Otherwise, we
> might find false hits for the delimiters, which are actually parts of
> multibyte sequences.

I see your point.

The I/O encoding ties to the internal encoding.  UTF-8 can be read into
char[] and processed using C string primitives.  Other encodings cannot.
For example, if you have UTF-16 or UTF-32, you'd have to read it into a
wchar_t string of the correct character width and use the wchar string
functions.

So there are two questions:

1. What are the valid characters?  (Unicode question, independent of
   encoding)
2. What encoding do we expect in I/O (UTF question) from which we
   conclude what processing functions we need.

	paul
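
To make the false-hit concern from the quoted text concrete, here is a
minimal C sketch.  It is only an illustration, not code from GDB: the
function names count_delim_bytes and count_delim_utf16le and the sample
buffers are hypothetical.  Both buffers hold the same three characters,
"a", ".", and U+012E, once as UTF-8 and once as UTF-16LE.

```c
#include <stddef.h>

/* The text "a", ".", U+012E encoded as UTF-8.  Every byte of the
   multibyte sequence for U+012E is >= 0x80, so none of them can be
   mistaken for an ASCII delimiter.  */
const unsigned char utf8_text[] = { 0x61, 0x2E, 0xC4, 0xAE };

/* The same text encoded as UTF-16LE.  The low byte of U+012E is 0x2E,
   which is also the ASCII code for '.'.  */
const unsigned char utf16le_text[] = { 0x61, 0x00, 0x2E, 0x00, 0x2E, 0x01 };

/* Count occurrences of an ASCII delimiter by looking at one byte at a
   time, which is what a 'char *' scan does.  */
size_t
count_delim_bytes (const unsigned char *buf, size_t len, unsigned char delim)
{
  size_t n = 0;

  for (size_t i = 0; i < len; i++)
    if (buf[i] == delim)
      n++;
  return n;
}

/* Count occurrences of a delimiter in UTF-16LE by looking at one
   16-bit code unit at a time.  Assumes LEN is even and the delimiter
   is a BMP character.  */
size_t
count_delim_utf16le (const unsigned char *buf, size_t len, unsigned int delim)
{
  size_t n = 0;

  for (size_t i = 0; i + 1 < len; i += 2)
    {
      unsigned int unit = (unsigned int) buf[i]
			  | ((unsigned int) buf[i + 1] << 8);

      if (unit == delim)
	n++;
    }
  return n;
}
```

With these sample buffers, the byte scan for '.' finds one match in the
UTF-8 copy but two in the UTF-16LE copy; the second is the low byte of
U+012E, a false hit.  Scanning the UTF-16LE copy one code unit at a
time finds one match, which is the sense in which the expected encoding
decides what processing functions are needed.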