From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Sender: gdb-patches-owner@sourceware.org
Subject: Re: [PATCH 1/3] 0xff chars in name components table; cp-name-parser lex UTF-8 identifiers
To: Pedro Alves, Simon Marchi
References: <5d721d13-d886-0400-db6b-76485c545142@redhat.com> <1511138515-25996-1-git-send-email-palves@redhat.com> <087a3f24-13ec-77b3-3b2b-fff1d0814ec1@simark.ca>
From: Simon Marchi
Message-ID: <5cb02694-8f35-94ad-8bb6-6bde13258fe4@ericsson.com>
Date: Mon, 20 Nov 2017 16:50:00 -0000
X-SW-Source: 2017-11/txt/msg00419.txt.bz2

On 2017-11-20 06:56 AM, Pedro Alves wrote:
>>> +/* Starting from a search name, return the string that finds the upper
>>> +   bound of all strings that start with SEARCH_NAME in a sorted name
>>> +   list. Returns the empty string to indicate that the upper bound is
>>> +   the end of the list. */
>>> +
>>> +static std::string
>>> +make_sort_after_prefix_name (const char *search_name)
>>> +{
>>> +  /* When looking to complete "func", we find the upper bound of all
>>> +     symbols that start with "func" by looking for where we'd insert
>>> +     "func"-with-last-character-incremented, i.e. "fund". */
>>> +  std::string after = search_name;
>>> +
>>> +  /* Mind 0xff though, which is a valid character in non-UTF-8 source
>>> +     character sets (e.g. Latin1 'ÿ'), and we can't rule out compilers
>>> +     allowing it in identifiers. If we run into it, increment the
>>> +     previous character instead and shorten the string. If the very
>>> +     first character turns out to be 0xff, then the upper bound is the
>>> +     end of the list.
>>
>> It's a bit of a nit, but I think this explanation could be a bit more
>> precise, and maybe simpler. Maybe you could just say that you strip all
>> trailing 0xff characters, and increment the last non-0xff character of
>> the string. If the string is composed only of 0xff characters, then the
>> upper bound is the end of the list.
>
> My problem with that is that it wouldn't explain _why_ we strip
> the 0xffs.

Right, the comment should say why, not how.

>>
>> The "If the very first character turns out to be 0xff" threw me off a bit,
>> because if you have the string "\xffa\xff", the upper bound will be "\xffb",
>> not the end of the list, despite the very first character being 0xff.
>
> I like that example. How about the following. It's even longer, but
> I think it's justified.
>
> /* Starting from a search name, return the string that finds the upper
>    bound of all strings that start with SEARCH_NAME in a sorted name
>    list. Returns the empty string to indicate that the upper bound is
>    the end of the list. */
>
> static std::string
> make_sort_after_prefix_name (const char *search_name)
> {
>   /* When looking to complete "func", we find the upper bound of all
>      symbols that start with "func" by looking for where we'd insert
>      the closest string that would follow "func" in lexicographical
>      order. Usually, that's "func"-with-last-character-incremented,
>      i.e. "fund". Mind non-ASCII characters, though. Usually those
>      will be UTF-8 multi-byte sequences, but we can't be certain.
>      Especially mind the 0xff character, which is a valid character in
>      non-UTF-8 source character sets (e.g. Latin1 'ÿ'), and we can't
>      rule out compilers allowing it in identifiers. Note that
>      conveniently, strcmp/strcasecmp are specified to compare
>      characters interpreted as unsigned char. So what we do is treat
>      the whole string as a base 255 number composed of a sequence of
>      base 255 "digits" and add 1 to it. I.e., adding 1 to 0xff wraps
>      to 0, and carries 1 to the following more-significant position.
>      If the very first character carries/overflows, then the upper
>      bound is the end of the list. Also the string after the empty
>      string is also the empty string.

Making an analogy with base-10 arithmetic is actually what made me
understand it. The number after 149 is not 140, it's 150. We're doing
the string equivalent of that.

Your explanation with base-255 numbers is very good. It doesn't really
work for all-0xff strings, because adding one (with carry) to "\xff\xff"
would give "\x01\x00\x00", but it doesn't really matter for the
explanation :).

Simon
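
For reference, here is a minimal sketch of the increment-with-carry idea
discussed above. It is not the code from the patch: the names
sort_after_prefix and check_examples are made up for illustration, and
the sketch assumes names are compared byte by byte as unsigned char, as
the quoted comment says strcmp does.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the patch's make_sort_after_prefix_name.
// Returns the smallest string that sorts after every string starting
// with PREFIX, assuming bytes compare as unsigned char.  An empty
// result means "the upper bound is the end of the list".
static std::string
sort_after_prefix (const std::string &prefix)
{
  std::string after = prefix;

  // A trailing 0xff byte cannot be incremented, so drop it and let the
  // carry move to the previous byte (like 149 + 1 = 150 in base 10,
  // except that the wrapped "digit" is removed rather than kept).
  while (!after.empty () && (unsigned char) after.back () == 0xff)
    after.pop_back ();

  // Increment the last remaining byte, if there is one.  If nothing is
  // left, the input was empty or all-0xff and the result stays empty.
  if (!after.empty ())
    after.back () = (char) ((unsigned char) after.back () + 1);

  return after;
}

// The examples mentioned in this thread, written as assertions.
static void
check_examples ()
{
  assert (sort_after_prefix ("func") == "fund");
  // The literal is split so that "\xff" is not parsed as "\xffa".
  assert (sort_after_prefix ("\xff" "a\xff") == "\xff" "b");
  assert (sort_after_prefix ("\xff\xff").empty ());
  assert (sort_after_prefix ("").empty ());
}
```

Returning the empty string for the all-0xff case sidesteps the
"\x01\x00\x00" result that a literal base-255 addition would give, since
no longer string is needed to express "end of the list".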