From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-147484-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 95597 invoked by alias); 21 May 2018 18:34:07 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 95557 invoked by uid 89); 21 May 2018 18:34:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW autolearn=ham version=3.3.2 spammy=
X-HELO: esa8.dell-outbound.iphmx.com
Received: from esa8.dell-outbound.iphmx.com (HELO esa8.dell-outbound.iphmx.com) (68.232.149.218) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 21 May 2018 18:34:04 +0000
Received: from esa6.dell-outbound2.iphmx.com ([68.232.154.99])  by esa8.dell-outbound.iphmx.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 May 2018 13:34:03 -0500
From: <Paul.Koning@dell.com>
Received: from ausxippc110.us.dell.com ([143.166.85.200])  by esa6.dell-outbound2.iphmx.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 22 May 2018 00:34:03 +0600
X-LoopCount0: from 10.166.136.215
X-DLP: DLP_GlobalPCIDSS
To: <eliz@gnu.org>
CC: <simark@simark.ca>, <zjz@zjz.name>, <gdb-patches@sourceware.org>
Subject: Re: support C/C++ identifiers named with non-ASCII characters
Date: Mon, 21 May 2018 19:25:00 -0000
Message-ID: <DFDB152A-578E-4FDE-8370-60CADCBD664C@dell.com>
References: <9418d4f0-f22a-c587-cc34-2fa67afbd028@zjz.name> <8c8af079-dbb8-207b-5edf-86b99e9f5db8@simark.ca> <CF83AA8F-D3F8-446C-A078-252ADFB6D4C8@dell.com> <834lj1f0ne.fsf@gnu.org> <FCEC48CC-5F04-438F-9B6C-2D8933E64A97@dell.com> <83tvr0ev0p.fsf@gnu.org>
In-Reply-To: <83tvr0ev0p.fsf@gnu.org>
Content-Type: text/plain; charset="us-ascii"
Content-ID: <89871D396E94CF47AB26C9A4B52E6742@dell.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-SW-Source: 2018-05/txt/msg00502.txt.bz2


> On May 21, 2018, at 2:14 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: <Paul.Koning@dell.com>
>> CC: <simark@simark.ca>, <zjz@zjz.name>, <gdb-patches@sourceware.org>
>> Date: Mon, 21 May 2018 18:03:17 +0000
>>
>>> Is it a fact that non-ASCII identifiers must be encoded in UTF-8, and
>>> can not include invalid UTF-8 sequences?
>>
>> Encoding is an I/O question.
>
> Not necessarily.
>
> I asked that question because scanning a string for certain ASCII
> characters using a 'char *' pointer will only work reliably if the
> string is in UTF-8 or in some single-byte encoding.  Otherwise, we
> might find false hits for the delimiters, which are actually parts of
> multibyte sequences.

I see your point.

The I/O encoding ties to the internal encoding.  UTF-8 can be read into
char[] and processed using C string primitives.  Other encodings cannot.
For example, if you have UTF-16 or UTF-32, you'd have to read it into a
wchar_t string of the correct character width and use the wchar string
functions.
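To make the false-hit concern concrete, here is a minimal sketch in C.
It is not GDB code: find_delim, check_examples and the two sample
buffers are hypothetical names made up for this illustration.  It scans
raw bytes for '.', first in a UTF-8 buffer and then in a UTF-16LE
buffer that contains no '.' character at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first byte equal to delim in
   buf[0..len), or -1 if there is none.  A plain byte scan, the same
   thing a 'char *' loop or strchr would do. */
static long find_delim(const unsigned char *buf, size_t len,
                       unsigned char delim)
{
    const unsigned char *p = memchr(buf, delim, len);
    return p ? (long)(p - buf) : -1;
}

/* U+00E9, '.', 'x' in UTF-8.  U+00E9 encodes as C3 A9; every byte of a
   multibyte UTF-8 sequence has the high bit set, so it can never be
   mistaken for an ASCII delimiter. */
static const unsigned char utf8_sample[] = { 0xC3, 0xA9, '.', 'x' };

/* U+012E, 'x' in UTF-16LE: 2E 01 78 00.  There is no '.' character
   here, but the low byte of U+012E is 0x2E, the ASCII code of '.'. */
static const unsigned char utf16le_sample[] = { 0x2E, 0x01, 0x78, 0x00 };

void check_examples(void)
{
    /* UTF-8: the only hit is the real '.' at offset 2. */
    assert(find_delim(utf8_sample, sizeof utf8_sample, '.') == 2);

    /* UTF-16LE: a false hit at offset 0, inside the code unit that
       encodes U+012E. */
    assert(find_delim(utf16le_sample, sizeof utf16le_sample, '.') == 0);
}
```

The same byte scan is also safe for single-byte encodings, since there
each byte is a whole character.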

So there are two questions:

1. What are the valid characters?  (Unicode question, independent of
   encoding)
2. What encoding do we expect in I/O?  (UTF question, from which we
   conclude what processing functions we need)
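
One way to keep the two questions apart in code, as a rough sketch and
not a proposal for the actual patch: a decoder that knows only about
the encoding (UTF-8 is assumed here), and a predicate that knows only
about code points.  All names below are hypothetical, and the predicate
is a deliberate simplification: it accepts every non-ASCII code point,
which is looser than the real C/C++ identifier rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Question 2 (encoding): decode one UTF-8 sequence from s[0..len).
   Stores the code point in *cp and returns the number of bytes
   consumed, or 0 if the input is truncated or malformed. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else
        return 0;               /* stray continuation or invalid lead byte */
    if (len < n)
        return 0;               /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and out-of-range values. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Question 1 (valid characters): works on code points only.
   ASSUMPTION: any code point >= 0x80 is accepted; a real
   implementation would consult the language's identifier ranges. */
int is_ident_char(uint32_t cp, int first)
{
    if (cp >= 0x80)
        return 1;
    if (cp == '_' || (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z'))
        return 1;
    return !first && cp >= '0' && cp <= '9';
}

/* Combine the two: s[0..len) is an identifier if it is non-empty,
   decodes as valid UTF-8, and every code point passes the predicate. */
int is_identifier(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;
    int first = 1;

    if (len == 0)
        return 0;
    while (i < len) {
        uint32_t cp;
        size_t n = utf8_decode(p + i, len - i, &cp);
        if (n == 0 || !is_ident_char(cp, first))
            return 0;
        i += n;
        first = 0;
    }
    return 1;
}
```

Switching the expected encoding would only touch the decoder; the
question of which characters are valid stays in is_ident_char.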

	paul