From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 67690 invoked by alias); 21 May 2018 17:37:18 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 67675 invoked by uid 89); 21 May 2018 17:37:18 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: =?ISO-8859-1?Q?No, score=-20.9 required=5.0 tests=BAYES_00,BODY_8BITS,FROM_EXCESS_BASE64,GARBLED_BODY,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy==e5=af=ab=e9?= X-HELO: m176116.mail.qiye.163.com Received: from m176116.mail.qiye.163.com (HELO m176116.mail.qiye.163.com) (59.111.176.116) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 21 May 2018 17:37:15 +0000 Received: from [192.168.0.105] (unknown [59.59.164.76]) by m176116.mail.qiye.163.com (HMail) with ESMTPSA id 69E56B426E3 for ; Tue, 22 May 2018 01:37:02 +0800 (CST) Subject: Re: support C/C++ identifiers named with non-ASCII characters References: <1b915196-3e97-4892-7426-be4211fe7889@zjz.name> To: gdb-patches@sourceware.org From: =?UTF-8?B?5by15L+K6Iqd?= X-Forwarded-Message-Id: <1b915196-3e97-4892-7426-be4211fe7889@zjz.name> Message-ID: Date: Mon, 21 May 2018 18:00:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:55.0) Gecko/20100101 Thunderbird/55.0 MIME-Version: 1.0 In-Reply-To: <1b915196-3e97-4892-7426-be4211fe7889@zjz.name> Content-Type: multipart/mixed; boundary="------------FE32E4F35DB3BEE7C1983EA7" X-HM-Spam-Status: e1ktWUFJV1koWUFPN1dZCBgUCR5ZQUtVS1dZCQ4XHghZQVkyNS06NzI*QU tVS1kG X-HM-Sender-Digest: e1kSHx4VD1lBWUc6NzI6Cjo4LzovDSosEStRMhdPNhoKCklVSlVKTklN QklPSUlJT0NMVTMWGhIXVQERATsBEQFVFRoWHkVZV1kMHhlZQR0aFwgeV1kIAVlBQ0lISzdXWRIL WUFZTkJVTkJVSk1PVUxNWQY+ X-HM-Tid: 0a6383c69ff8926akuws69e56b426e3 X-SW-Source: 2018-05/txt/msg00497.txt.bz2 This is a multi-part message in MIME format. --------------FE32E4F35DB3BEE7C1983EA7 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-length: 2837 Simon Marchi 於 2018/5/21 下午10:03 寫道: > On 2018-05-21 04:52 AM, 張俊芝 wrote: > > Hi Zhang, > > Thanks for the patch, I tested it quickly, it seems to work as expected. > > Could you please write a small test case in testsuite/gdb.base with the example > you gave, so we make sure this doesn't get broken later? If you can write it > in such a way that both clang and gcc understand it would be better, because > most people run the testuite using gcc to compile test programs. > > I am not a specialist in lexing and parsing C, so can you explain quickly why > you think this is a good solution? Quickly, I understand that you change the > identifier recognition algorithm to a blacklist of characters rather than > a whitelist, so bytes that are not recognized (such as those that compose > the utf-8 encoded characters) are not rejected. > > Given unlimited time, would the right solution be to use a lib to parse the > string as utf-8, and reject strings that are not valid utf-8? > > Here are some not and formatting comments: > >> +static bool is_identifier_separator (char); > > You don't have to forward declare the function if it's not necessary. > >> + /* '<' should not be a token separator, because it can be an open angle >> + bracket followed by a nested template identifier in C++. */ > > Please use two spaces after the final period (...C++. */). > >> + if (is_identifier_separator(c)) > > Please use a space before the parentheses: > > is_identifier_separator (c) > > >> + for (c = tokstart[namelen]; !is_identifier_separator(c);) > > Here too. > > Thanks! > > Simon > Thank you for the reply, Simon. This new diff addresses all the code style issues you mentioned. Yes, you are right in that the blacklist is limited. Actually, `is_identifier_separator` only blacklists the invalid ASCII characters for an identifier, leaving all the invalid non-ASCII characters unchecked. So seems it would be better if non-ASCII characters were also checked. However, unfortunately, GDB is neither aware of the encoding of the terminal input, nor the encoding of the generated symbolic information in the executable. So the blacklist is made to restrict to the invalid ASCII characters in order to support all ASCII-compliant encodings. Having said that, I find that it does no harm to the user if we only check the ASCII characters. If the user is trying to print an identifier which includes an invalid non-ASCII character, say, p 測】, where 】is invalid, they will get an error message: No symbol "測】" in current context. which doesn't seem any worse than an error message like: Invalid character "】" in expression. Perhaps the former might be even more intuitional. So personally, I think it might be safe enough to use this limited blacklist method. --------------FE32E4F35DB3BEE7C1983EA7 Content-Type: text/plain; charset=UTF-8; name="diff" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="diff" Content-length: 3717 ZGlmZiAtLWdpdCBhL2dkYi9jLWV4cC55IGIvZ2RiL2MtZXhwLnkKaW5kZXgg NWUxMGQyYTNiNC4uZTEwYjZiNDc0ZCAxMDA2NDQKLS0tIGEvZ2RiL2MtZXhw LnkKKysrIGIvZ2RiL2MtZXhwLnkKQEAgLTE3MTgsNiArMTcxOCw1MyBAQCB0 eXBlX2FnZ3JlZ2F0ZV9wIChzdHJ1Y3QgdHlwZSAqdHlwZSkKIAkgICAgICAm JiBUWVBFX0RFQ0xBUkVEX0NMQVNTICh0eXBlKSkpOwogfQogCisvKiBXaGls ZSBpdGVyYXRpbmcgYWxsIHRoZSBjaGFyYWN0ZXJzIGluIGFuIGlkZW50aWZp ZXIsIGFuIGlkZW50aWZpZXIgc2VwYXJhdG9yCisgICBpcyBhIGJvdW5kYXJ5 IHdoZXJlIHdlIGtub3cgdGhlIGl0ZXJhdGlvbiBpcyBkb25lLiAqLworCitz dGF0aWMgYm9vbAoraXNfaWRlbnRpZmllcl9zZXBhcmF0b3IgKGNoYXIgYykK K3sKKyAgc3dpdGNoIChjKQorICAgIHsKKyAgICBjYXNlICcgJzoKKyAgICBj YXNlICdcdCc6CisgICAgY2FzZSAnXG4nOgorICAgIGNhc2UgJ1wwJzoKKyAg ICBjYXNlICdcJyc6CisgICAgY2FzZSAnIic6CisgICAgY2FzZSAnXFwnOgor ICAgIGNhc2UgJygnOgorICAgIGNhc2UgJyknOgorICAgIGNhc2UgJywnOgor ICAgIGNhc2UgJy4nOgorICAgIGNhc2UgJysnOgorICAgIGNhc2UgJy0nOgor ICAgIGNhc2UgJyonOgorICAgIGNhc2UgJy8nOgorICAgIGNhc2UgJ3wnOgor ICAgIGNhc2UgJyYnOgorICAgIGNhc2UgJ14nOgorICAgIGNhc2UgJ34nOgor ICAgIGNhc2UgJyEnOgorICAgIGNhc2UgJ0AnOgorICAgIGNhc2UgJ1snOgor ICAgIGNhc2UgJ10nOgorICAgIC8qICc8JyBzaG91bGQgbm90IGJlIGEgdG9r ZW4gc2VwYXJhdG9yLCBiZWNhdXNlIGl0IGNhbiBiZSBhbiBvcGVuIGFuZ2xl CisgICAgICAgYnJhY2tldCBmb2xsb3dlZCBieSBhIG5lc3RlZCB0ZW1wbGF0 ZSBpZGVudGlmaWVyIGluIEMrKy4gICovCisgICAgY2FzZSAnPic6CisgICAg Y2FzZSAnPyc6CisgICAgY2FzZSAnOic6CisgICAgY2FzZSAnPSc6CisgICAg Y2FzZSAneyc6CisgICAgY2FzZSAnfSc6CisgICAgY2FzZSAnOyc6CisgICAg ICByZXR1cm4gdHJ1ZTsKKyAgICBkZWZhdWx0OgorICAgICAgYnJlYWs7Cisg ICAgfQorICByZXR1cm4gZmFsc2U7Cit9CisKIC8qIFZhbGlkYXRlIGEgcGFy YW1ldGVyIHR5cGVsaXN0LiAgKi8KIAogc3RhdGljIHZvaWQKQEAgLTE5MjAs NyArMTk2Nyw3IEBAIHBhcnNlX251bWJlciAoc3RydWN0IHBhcnNlcl9zdGF0 ZSAqcGFyX3N0YXRlLAogCSBGSVhNRTogVGhpcyBjaGVjayBpcyB3cm9uZzsg Zm9yIGV4YW1wbGUgaXQgZG9lc24ndCBmaW5kIG92ZXJmbG93CiAJIG9uIDB4 MTIzNDU2Nzg5IHdoZW4gTE9OR0VTVCBpcyAzMiBiaXRzLiAgKi8KICAgICAg IGlmIChjICE9ICdsJyAmJiBjICE9ICd1JyAmJiBuICE9IDApCi0JewkKKwl7 CiAJICBpZiAoKHVuc2lnbmVkX3AgJiYgKFVMT05HRVNUKSBwcmV2biA+PSAo VUxPTkdFU1QpIG4pKQogCSAgICBlcnJvciAoXygiTnVtZXJpYyBjb25zdGFu dCB0b28gbGFyZ2UuIikpOwogCX0KQEAgLTI3NDEsMTYgKzI3ODgsMTMgQEAg bGV4X29uZV90b2tlbiAoc3RydWN0IHBhcnNlcl9zdGF0ZSAqcGFyX3N0YXRl LCBib29sICppc19xdW90ZWRfbmFtZSkKICAgICAgIH0KICAgICB9CiAKLSAg aWYgKCEoYyA9PSAnXycgfHwgYyA9PSAnJCcKLQl8fCAoYyA+PSAnYScgJiYg YyA8PSAneicpIHx8IChjID49ICdBJyAmJiBjIDw9ICdaJykpKQorICBpZiAo aXNfaWRlbnRpZmllcl9zZXBhcmF0b3IgKGMpKQogICAgIC8qIFdlIG11c3Qg aGF2ZSBjb21lIGFjcm9zcyBhIGJhZCBjaGFyYWN0ZXIgKGUuZy4gJzsnKS4g ICovCiAgICAgZXJyb3IgKF8oIkludmFsaWQgY2hhcmFjdGVyICclYycgaW4g ZXhwcmVzc2lvbi4iKSwgYyk7CiAKICAgLyogSXQncyBhIG5hbWUuICBTZWUg aG93IGxvbmcgaXQgaXMuICAqLwogICBuYW1lbGVuID0gMDsKLSAgZm9yIChj ID0gdG9rc3RhcnRbbmFtZWxlbl07Ci0gICAgICAgKGMgPT0gJ18nIHx8IGMg PT0gJyQnIHx8IChjID49ICcwJyAmJiBjIDw9ICc5JykKLQl8fCAoYyA+PSAn YScgJiYgYyA8PSAneicpIHx8IChjID49ICdBJyAmJiBjIDw9ICdaJykgfHwg YyA9PSAnPCcpOykKKyAgZm9yIChjID0gdG9rc3RhcnRbbmFtZWxlbl07ICFp c19pZGVudGlmaWVyX3NlcGFyYXRvciAoYyk7KQogICAgIHsKICAgICAgIC8q IFRlbXBsYXRlIHBhcmFtZXRlciBsaXN0cyBhcmUgcGFydCBvZiB0aGUgbmFt ZS4KIAkgRklYTUU6IFRoaXMgbWlzaGFuZGxlcyBgcHJpbnQgJGE8NCYmJGE+ MycuICAqLwpAQCAtMjkzMiw3ICsyOTc2LDcgQEAgY2xhc3NpZnlfbmFtZSAo c3RydWN0IHBhcnNlcl9zdGF0ZSAqcGFyX3N0YXRlLCBjb25zdCBzdHJ1Y3Qg YmxvY2sgKmJsb2NrLAogCSBmaWxlbmFtZS4gIEhvd2V2ZXIsIGlmIHRoZSBu YW1lIHdhcyBxdW90ZWQsIHRoZW4gaXQgaXMgYmV0dGVyCiAJIHRvIGNoZWNr IGZvciBhIGZpbGVuYW1lIG9yIGEgYmxvY2ssIHNpbmNlIHRoaXMgaXMgdGhl IG9ubHkKIAkgd2F5IHRoZSB1c2VyIGhhcyBvZiByZXF1aXJpbmcgdGhlIGV4 dGVuc2lvbiB0byBiZSB1c2VkLiAgKi8KLSAgICAgIGlmICgoaXNfYV9maWVs ZF9vZl90aGlzLnR5cGUgPT0gTlVMTCAmJiAhaXNfYWZ0ZXJfc3RydWN0b3Ap IAorICAgICAgaWYgKChpc19hX2ZpZWxkX29mX3RoaXMudHlwZSA9PSBOVUxM ICYmICFpc19hZnRlcl9zdHJ1Y3RvcCkKIAkgIHx8IGlzX3F1b3RlZF9uYW1l KQogCXsKIAkgIC8qIFNlZSBpZiBpdCdzIGEgZmlsZSBuYW1lLiAqLwo= --------------FE32E4F35DB3BEE7C1983EA7--