From: Simon Marchi <simon.marchi@ericsson.com>
To: Pedro Alves <palves@redhat.com>, Simon Marchi <simark@simark.ca>,
<gdb-patches@sourceware.org>
Subject: Re: [PATCH 1/3] 0xff chars in name components table; cp-name-parser lex UTF-8 identifiers
Date: Mon, 20 Nov 2017 16:50:00 -0000 [thread overview]
Message-ID: <5cb02694-8f35-94ad-8bb6-6bde13258fe4@ericsson.com> (raw)
In-Reply-To: <db3aa595-1d35-51f5-bd34-1aa04707a6c8@redhat.com>
On 2017-11-20 06:56 AM, Pedro Alves wrote:
>>> +/* Starting from a search name, return the string that finds the upper
>>> + bound of all strings that start with SEARCH_NAME in a sorted name
>>> + list. Returns the empty string to indicate that the upper bound is
>>> + the end of the list. */
>>> +
>>> +static std::string
>>> +make_sort_after_prefix_name (const char *search_name)
>>> +{
>>> + /* When looking to complete "func", we find the upper bound of all
>>> + symbols that start with "func" by looking for where we'd insert
>>> + "func"-with-last-character-incremented, i.e. "fund". */
>>> + std::string after = search_name;
>>> +
>>> + /* Mind 0xff though, which is a valid character in non-UTF-8 source
>>> + character sets (e.g. Latin1 'ÿ'), and we can't rule out compilers
>>> + allowing it in identifiers. If we run into it, increment the
>>> + previous character instead and shorten the string. If the very
>>> + first character turns out to be 0xff, then the upper bound is the
>>> + end of the list.
>>
>> It's a bit of a nit, but I think this explanation could be a bit more
>> precise, and maybe simpler. Maybe you could just say that you strip all
>> trailing 0xff characters, and increment the last non-0xff character of
>> the string. If the string is composed only of 0xff characters, then the
>> upper bound is the end of the list.
>
> My problem with that is that it wouldn't explain _why_ we strip
> the 0xffs.
Right, the comment should say why, not how.
>>
>> The "If the very first character turns out to be 0xff" threw me off a bit,
>> because if you have the string "\xffa\xff", the upper bound will be "\xffb",
>> not the end of the list, despite the very first character being 0xff.
>
> I like that example. How about the following. It's even longer, but
> I think it's justified.
>
> /* Starting from a search name, return the string that finds the upper
> bound of all strings that start with SEARCH_NAME in a sorted name
> list. Returns the empty string to indicate that the upper bound is
> the end of the list. */
>
> static std::string
> make_sort_after_prefix_name (const char *search_name)
> {
> /* When looking to complete "func", we find the upper bound of all
> symbols that start with "func" by looking for where we'd insert
> the closest string that would follow "func" in lexicographical
> order. Usually, that's "func"-with-last-character-incremented,
> i.e. "fund". Mind non-ASCII characters, though. Usually those
> will be UTF-8 multi-byte sequences, but we can't be certain.
> Especially mind the 0xff character, which is a valid character in
> non-UTF-8 source character sets (e.g. Latin1 'ÿ'), and we can't
> rule out compilers allowing it in identifiers. Note that
> conveniently, strcmp/strcasecmp are specified to compare
> characters interpreted as unsigned char. So what we do is treat
> the whole string as a base 255 number composed of a sequence of
> base 255 "digits" and add 1 to it. I.e., adding 1 to 0xff wraps
> to 0, and carries 1 to the following more-significant position.
> If the very first character carries/overflows, then the upper
> bound is the end of the list. Also the string after the empty
> string is also the empty string.
Making an analogy with base-10 arithmetic is actually what made me
understand it. The number after 149 is not 140, it's 150. We're
doing the string equivalent of that. Your explanation with base-255
numbers is very good. It doesn't really work for all-0xff strings,
because adding one (with carry) to "\xff\xff" would give "\x01\x00\x00",
but it doesn't really matter for the explanation :).
Simon
next prev parent reply other threads:[~2017-11-20 16:50 UTC|newest]
Thread overview: 182+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-02 12:22 [PATCH 00/40] C++ debugging improvements: breakpoints, TAB completion, more Pedro Alves
2017-06-02 12:22 ` [PATCH 02/40] Eliminate make_cleanup_obstack_free, introduce auto_obstack Pedro Alves
2017-06-26 13:47 ` Yao Qi
2017-06-27 10:25 ` Pedro Alves
2017-06-28 10:36 ` Yao Qi
2017-06-28 14:39 ` Pedro Alves
2017-06-28 21:33 ` Yao Qi
2017-06-02 12:22 ` [PATCH 03/40] Fix gdb.base/completion.exp with --target_board=dwarf4-gdb-index Pedro Alves
2017-07-13 20:28 ` Keith Seitz
2017-07-14 16:02 ` Pedro Alves
2017-06-02 12:22 ` [PATCH 01/40] Make gdb.base/dmsym.exp independent of "set language ada" Pedro Alves
2017-07-18 19:42 ` Simon Marchi
2017-07-20 17:00 ` Pedro Alves
2017-06-02 12:22 ` [PATCH 14/40] Introduce CP_OPERATOR_STR/CP_OPERATOR_LEN and use throughout Pedro Alves
2017-07-14 18:04 ` Keith Seitz
2017-07-17 14:55 ` Pedro Alves
2017-06-02 12:22 ` [PATCH 06/40] Expression completer should not match explicit location options Pedro Alves
2017-06-29 8:29 ` Yao Qi
2017-06-29 10:56 ` Pedro Alves
2017-06-29 11:08 ` Pedro Alves
2017-06-29 15:23 ` Pedro Alves
2017-06-29 11:24 ` Yao Qi
2017-06-29 15:25 ` Pedro Alves
2017-06-02 12:22 ` [PATCH 08/40] completion_list_add_name wrapper functions Pedro Alves
2017-06-27 12:56 ` Yao Qi
2017-06-27 15:35 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 11/40] Introduce class completion_tracker & rewrite completion<->readline interaction Pedro Alves
2017-07-14 17:23 ` Keith Seitz
2017-07-17 13:56 ` Pedro Alves
2017-07-18 8:23 ` Christophe Lyon
[not found] ` <845f435e-d3d5-b327-4e3a-ce9434bd6ffd@redhat.com>
2017-07-18 10:42 ` [pushed] Fix GDB builds that include the simulator (Re: [PATCH 11/40] Introduce class completion_tracker & rewrite completion<->readline interaction) Pedro Alves
2018-03-05 21:43 ` [PATCH 11/40] Introduce class completion_tracker & rewrite completion<->readline interaction Simon Marchi
2017-06-02 12:23 ` [PATCH 38/40] Use TOLOWER in SYMBOL_HASH_NEXT Pedro Alves
2017-08-09 19:25 ` Keith Seitz
2017-11-25 0:35 ` [pushed] " Pedro Alves
2017-06-02 12:23 ` [PATCH 13/40] Introduce strncmp_iw Pedro Alves
2017-06-29 8:42 ` Yao Qi
2017-07-17 19:16 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 09/40] Rename make_symbol_completion_list_fn -> symbol_completer Pedro Alves
2017-06-28 21:40 ` Yao Qi
2017-07-13 20:46 ` Keith Seitz
2017-07-17 11:00 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 40/40] Document breakpoints / linespec & co improvements (manual + NEWS) Pedro Alves
2017-06-02 13:01 ` Eli Zaretskii
2017-06-02 13:33 ` Pedro Alves
2017-06-21 15:50 ` Pedro Alves
2017-06-21 19:14 ` Pedro Alves
2017-06-22 19:45 ` Eli Zaretskii
2017-06-22 19:42 ` Eli Zaretskii
2017-06-21 13:32 ` Pedro Alves
2017-06-21 18:26 ` Eli Zaretskii
2017-06-21 19:01 ` Pedro Alves
2017-06-22 19:43 ` Eli Zaretskii
2017-06-02 12:23 ` [PATCH 19/40] Fix cp_find_first_component_aux bug Pedro Alves
2017-07-17 19:17 ` Keith Seitz
2017-07-17 19:50 ` Pedro Alves
2017-07-17 21:38 ` Keith Seitz
2017-07-20 17:03 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 28/40] lookup_name_info::make_ignore_params Pedro Alves
2017-08-08 20:55 ` Keith Seitz
2017-11-08 16:18 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 18/40] A smarter linespec completer Pedro Alves
2017-07-15 0:07 ` Keith Seitz
2017-07-17 18:21 ` Pedro Alves
2017-07-17 19:02 ` Keith Seitz
2017-07-17 19:33 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 37/40] Fix completing an empty string Pedro Alves
2017-08-09 18:01 ` Keith Seitz
2017-11-25 0:28 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 27/40] Make cp_remove_params return a unique_ptr Pedro Alves
2017-08-08 20:35 ` Keith Seitz
2017-10-09 15:13 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 10/40] Clean up "completer_handle_brkchars" callback handling Pedro Alves
2017-07-13 21:08 ` Keith Seitz
2017-07-17 11:14 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens Pedro Alves
2017-08-09 15:48 ` Keith Seitz
2017-11-24 23:38 ` [pushed] " Pedro Alves
2017-06-02 12:23 ` [PATCH 15/40] Rewrite/enhance explicit locations completer, parse left->right Pedro Alves
2017-07-14 20:55 ` Keith Seitz
2017-07-17 19:24 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 39/40] Breakpoints in symbols with ABI tags (PR c++/19436) Pedro Alves
2017-08-09 19:34 ` Keith Seitz
2017-11-27 17:14 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 35/40] Comprehensive C++ linespec/completer tests Pedro Alves
2017-08-09 17:30 ` Keith Seitz
2017-11-24 16:25 ` Pedro Alves
2017-06-02 12:23 ` [PATCH 36/40] Add comprehensive C++ operator linespec/location/completion tests Pedro Alves
2017-08-09 17:59 ` Keith Seitz
2017-11-25 0:18 ` [pushed] " Pedro Alves
2017-11-30 15:43 ` Yao Qi
2017-11-30 16:06 ` Pedro Alves
2017-11-30 16:35 ` [pushed] Fix gdb.linespec/cpls-ops.exp on 32-bit (Re: [pushed] Re: [PATCH 36/40] Add comprehensive C++ operator linespec/location/completion tests) Pedro Alves
2017-06-02 12:28 ` [PATCH 24/40] Per-language symbol name hashing algorithm Pedro Alves
2017-07-18 17:33 ` Keith Seitz
2017-07-20 18:53 ` Pedro Alves
2017-11-08 16:08 ` Pedro Alves
2017-06-02 12:29 ` [PATCH 07/40] objfile_per_bfd_storage non-POD Pedro Alves
2017-06-27 12:00 ` Yao Qi
2017-06-27 15:30 ` Pedro Alves
2017-06-02 12:29 ` [PATCH 16/40] Explicit locations -label completer Pedro Alves
2017-07-14 21:32 ` Keith Seitz
2017-06-02 12:29 ` [PATCH 17/40] Linespec lexing and C++ operators Pedro Alves
2017-07-14 21:45 ` Keith Seitz
2017-07-17 19:34 ` Pedro Alves
2017-06-02 12:29 ` [PATCH 12/40] "complete" command and completion word break characters Pedro Alves
2017-07-14 17:50 ` Keith Seitz
2017-07-17 14:36 ` Pedro Alves
2017-06-02 12:29 ` [PATCH 33/40] Make the linespec/location completer ignore data symbols Pedro Alves
2017-08-09 15:42 ` Keith Seitz
2017-11-08 16:22 ` Pedro Alves
2017-06-02 12:29 ` [PATCH 21/40] Use SYMBOL_MATCHES_SEARCH_NAME some more Pedro Alves
2017-07-17 21:39 ` Keith Seitz
2017-07-20 17:08 ` Pedro Alves
2017-06-02 12:29 ` [PATCH 05/40] command.h: Include scoped_restore_command.h Pedro Alves
2017-06-27 11:30 ` Yao Qi
2017-06-27 11:45 ` Pedro Alves
2017-06-27 11:52 ` Pedro Alves
2017-06-27 12:03 ` Pedro Alves
2017-06-27 15:46 ` [PATCH 05/40] command.h: Include common/scoped_restore.h Pedro Alves
2017-06-28 7:54 ` Yao Qi
2017-06-28 14:20 ` Pedro Alves
2017-06-02 12:30 ` [PATCH 32/40] Make "break foo" find "A::foo", A::B::foo", etc. [C++ and wild matching] Pedro Alves
2017-08-08 23:48 ` Keith Seitz
2017-11-22 16:48 ` Pedro Alves
2017-11-24 16:48 ` Pedro Alves
2017-11-24 16:57 ` Pedro Alves
2017-11-28 0:39 ` Keith Seitz
2017-11-28 0:02 ` Keith Seitz
2017-11-28 0:21 ` Pedro Alves
2017-11-28 0:42 ` Keith Seitz
2017-06-02 12:30 ` [PATCH 30/40] Use search_domain::FUNCTIONS_DOMAIN when setting breakpoints Pedro Alves
2017-08-08 21:07 ` Keith Seitz
2017-11-08 16:20 ` Pedro Alves
2017-06-02 12:30 ` [PATCH 20/40] Eliminate block_iter_name_* Pedro Alves
2017-07-17 19:47 ` Keith Seitz
2017-07-20 17:05 ` Pedro Alves
2017-06-02 12:31 ` [PATCH 29/40] Simplify completion_list_add_name | remove sym_text / sym_text_len Pedro Alves
2017-08-08 20:59 ` Keith Seitz
2017-11-08 16:19 ` Pedro Alves
2017-06-02 12:31 ` [PATCH 04/40] Fix TAB-completion + .gdb_index slowness (generalize filename_seen_cache) Pedro Alves
2017-07-13 20:41 ` Keith Seitz
2017-07-14 19:40 ` Pedro Alves
2017-07-17 10:51 ` Pedro Alves
2017-06-02 12:31 ` [PATCH 22/40] get_int_var_value Pedro Alves
2017-07-17 22:11 ` Keith Seitz
2017-07-20 17:15 ` Pedro Alves
2017-06-02 12:33 ` [PATCH 31/40] Handle custom completion match prefix / LCD Pedro Alves
2017-08-08 21:28 ` Keith Seitz
2017-11-27 17:11 ` Pedro Alves
2017-06-02 12:39 ` [PATCH 25/40] Introduce lookup_name_info and generalize Ada's FULL/WILD name matching Pedro Alves
2017-07-18 20:14 ` Keith Seitz
2017-07-18 22:31 ` Pedro Alves
2017-07-20 19:00 ` Pedro Alves
2017-07-20 19:06 ` Pedro Alves
2017-08-08 20:29 ` Keith Seitz
2017-10-19 17:36 ` Pedro Alves
2017-11-01 15:38 ` Joel Brobecker
2017-11-08 16:10 ` Pedro Alves
2017-11-08 22:15 ` Joel Brobecker
2017-06-02 12:39 ` [PATCH 26/40] Optimize .gdb_index symbol name searching Pedro Alves
2017-08-08 20:32 ` Keith Seitz
2017-11-08 16:14 ` Pedro Alves
2017-11-08 16:16 ` [pushed] Reorder/reindent dw2_expand_symtabs_matching & friends (Re: [PATCH 26/40] Optimize .gdb_index symbol name searching) Pedro Alves
2017-11-18 5:23 ` [PATCH 26/40] Optimize .gdb_index symbol name searching Simon Marchi
2017-11-20 0:33 ` Pedro Alves
2017-11-20 0:42 ` [PATCH 3/3] Fix mapped_index::find_name_components_bounds upper bound computation Pedro Alves
2017-11-20 3:17 ` Simon Marchi
2017-11-20 0:42 ` [PATCH 2/3] Unit test name-component bounds searching directly Pedro Alves
2017-11-20 3:16 ` Simon Marchi
2017-11-20 14:17 ` Pedro Alves
2017-11-20 0:42 ` [PATCH 1/3] 0xff chars in name components table; cp-name-parser lex UTF-8 identifiers Pedro Alves
2017-11-20 1:38 ` Simon Marchi
2017-11-20 11:56 ` Pedro Alves
2017-11-20 16:50 ` Simon Marchi [this message]
2017-11-21 0:11 ` Pedro Alves
2017-06-02 12:39 ` [PATCH 23/40] Make language_def O(1) Pedro Alves
2017-07-17 23:03 ` Keith Seitz
2017-07-20 17:40 ` Pedro Alves
2017-07-20 18:12 ` Get rid of "set language local"? (was: Re: [PATCH 23/40] Make language_def O(1)) Pedro Alves
2017-07-20 23:44 ` Matt Rice
2017-06-02 15:26 ` [PATCH 00/40] C++ debugging improvements: breakpoints, TAB completion, more Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5cb02694-8f35-94ad-8bb6-6bde13258fe4@ericsson.com \
--to=simon.marchi@ericsson.com \
--cc=gdb-patches@sourceware.org \
--cc=palves@redhat.com \
--cc=simark@simark.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox