Subject: Re: support C/C++ identifiers named with non-ASCII characters
From: 張俊芝
To: Paul.Koning@dell.com, gdb-patches@sourceware.org
Date: Tue, 22 May 2018 08:34:00 -0000

On 2018/5/22 at 2:03 AM, Paul.Koning@dell.com wrote:
>
> Not all byte strings are valid UTF-8 strings. When a byte string is delivered from the outside, it makes sense to validate if it's a valid encoding before it is used. Or you can assume that inputs are valid and rely on "symbol not found" as the general way to handle anything that doesn't match. For gdb, that may be good enough.

I prefer the latter (i.e. assume all non-ASCII characters are valid and rely on "symbol not found"), and that is what the patch actually does. A compiler has to be strict about the validity of non-ASCII characters, but for GDB the latter approach is good enough: because GDB only checks ASCII characters, it works with any ASCII-compatible encoding, such as UTF-8.