From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 99087 invoked by alias); 24 Nov 2017 23:38:32 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 99072 invoked by uid 89); 24 Nov 2017 23:38:30 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.2 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_STOCKGEN,KB_WAM_FROM_NAME_SINGLEWORD,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=us!, teaching, tor, START X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 24 Nov 2017 23:38:26 +0000 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id A9408806A9 for ; Fri, 24 Nov 2017 23:38:25 +0000 (UTC) Received: from [127.0.0.1] (ovpn04.gateway.prod.ext.ams2.redhat.com [10.39.146.4]) by smtp.corp.redhat.com (Postfix) with ESMTP id EEE2360A98; Fri, 24 Nov 2017 23:38:23 +0000 (UTC) From: Pedro Alves Subject: [pushed] Re: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens To: Keith Seitz , gdb-patches@sourceware.org References: <1496406158-12663-1-git-send-email-palves@redhat.com> <1496406158-12663-35-git-send-email-palves@redhat.com> <50337ab0-512c-92e5-5b37-115f5dd8c427@redhat.com> Message-ID: <5bd82949-0f23-4b20-2c7f-1b6b357c9396@redhat.com> Date: Fri, 24 Nov 2017 23:38:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <50337ab0-512c-92e5-5b37-115f5dd8c427@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-SW-Source: 2017-11/txt/msg00639.txt.bz2 On 08/09/2017 04:48 PM, Keith Seitz wrote: > On 06/02/2017 05:22 AM, Pedro Alves wrote: >> currently "b func tion" manages to set a breakpoint at "function" ! >> >> All this years I had never noticed this, but now that the linespec >> completer actually works, this easily happens by accident, with: > > That makes two of us! > >> The operator_stoken changes are necessary due to a latent bug -- >> currently "operator char" becomes "operatorchar", and later look ups >> only find it because strcmp_iw ignores the whitespace... > > I have a similar fix on the compile branch. :-) > >> gdb/ChangeLog: >> yyyy-mm-dd Pedro Alves >> >> * c-exp.y (oper): Add space to operator names. >> * cp-support.c (cp_symbol_name_matches_1) >> (cp_fq_symbol_name_matches): Pass language to >> strncmp_iw_with_mode. >> (test_cp_symbol_name_cmp): Add unit tests. >> * language.c (default_symbol_name_matcher): Pass language to >> strncmp_iw_with_mode. >> * utils.c: Include "cp-support.h" and . >> (valid_identifier_name_char, cp_skip_operator_token, skip_ws) >> (cp_is_operator): New functions. >> (strncmp_iw_with_mode): Use them. Add language parameter. Don't >> skip whitespace in the symbol name when the lookup name doesn't >> have spaces, and vice versa. >> (strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode. > ^^^^^^^^ > > Not just language, but language_minimal. > >> * utils.h (strncmp_iw_with_mode): Add language parameter. > >> diff --git a/gdb/c-exp.y b/gdb/c-exp.y >> index 24a2fbd..0a182cc 100644 >> --- a/gdb/c-exp.y >> +++ b/gdb/c-exp.y >> @@ -1487,7 +1487,7 @@ oper: OPERATOR NEW >> | OPERATOR '>' >> { $$ = operator_stoken (">"); } >> | OPERATOR ASSIGN_MODIFY >> - { const char *op = "unknown"; >> + { const char *op = " unknown"; > > Good catch. I missed that. > >> switch ($2) >> { >> case BINOP_RSH: >> @@ -1563,7 +1563,8 @@ oper: OPERATOR NEW >> >> c_print_type ($2, NULL, &buf, -1, 0, >> &type_print_raw_options); >> - $$ = operator_stoken (buf.c_str ()); >> + std::string name = " " + buf.string (); >> + $$ = operator_stoken (name.c_str ()); >> } >> ; >> > > The only additional change that I have in my compile branch is > that since this type's name could come from a user, it needs to be canonicalized. > But I can hit that when(ever?!) I start submitting some of the precursor > patches that I have. [I have a test that demonstrates the need for > canonicalization in the c++compile branch.] > >> diff --git a/gdb/cp-support.c b/gdb/cp-support.c >> index 84d8a6b..4c353c5 100644 >> --- a/gdb/cp-support.c >> +++ b/gdb/cp-support.c >> @@ -1857,6 +1857,67 @@ test_cp_symbol_name_cmp () >> CHECK_MATCH_C ("function(int)", "function(int)"); > > [snip a WHOLE LOTTA TESTS] > > AWESOME! > >> diff --git a/gdb/utils.c b/gdb/utils.c >> index 9798edc..484c1ef 100644 >> --- a/gdb/utils.c >> +++ b/gdb/utils.c >> @@ -65,6 +65,8 @@ >> #include "gdb_usleep.h" >> #include "interps.h" >> #include "gdb_regex.h" >> +#include "cp-support.h" >> +#include >> >> #if !HAVE_DECL_MALLOC >> extern PTR malloc (); /* ARI: PTR */ >> @@ -2418,22 +2420,227 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name, >> } >> } >> >> +/* True if CH is a character that can be part of a symbol name. I.e., >> + either a number, a letter, or a '_'. */ >> + >> +static bool >> +valid_identifier_name_char (int ch) >> +{ >> + return (isalnum (ch) || ch == '_'); >> +} > > Couldn't this be language-dependent? [Yikes!] > Also note that there are a handful of places where this could be used > [follow-up patch?] in linespec.c, location.c, symtab.c. Maybe more. > Yeah, and in the language parsers too. I have a follow up patch (in another branch where I'm playing with UTF-8) that normalizes this. It involves including safe-ctype.h, which then forces isalnum -> ISALNUM etc., so I'd rather leave such normalization for that branch. > We have logic for c++ operators all over the place. Note to self: this needs > to be cleaned up/consolidated. Definitely agreed. I'd like to move some if this code out of the gdb/utils.c kitchensink too, but I was leaving that for follow up patches, because moving this code around makes it harder to maintain/review. >> + >> +/* Skip to end of token, or to END, whatever comes first. */ >> + > > I think a(n explicit) mention that the input is assumed to be an operator name. It's > mentioned in the name, but please consider repeating that in the comment. It's important. Fixed. >> + >> +static bool >> +cp_is_operator (const char *string, const char *start) > > Missing comment? Yes, fixed. > > Just a passing comment: I'm kinda torn on this. When new languages are added, > this is going to be yet another place that language implementers are going to > have to modify. While a language method would probably be better (for some > definition of "better"), I don't want to see the language vector bloat beyond > control either. So IMO there's no clear better path. Inheritance is overrated. :-P >> + if (cp_is_operator (string1, string1_start)) >> + { >> + /* An operator name in STRING1. Check STRING2. */ >> + size_t cmplen = std::min (CP_OPERATOR_LEN, end_str2 - string2); > > line length == 85 Indeed. Fixed. >> strcmp_iw (const char *string1, const char *string2) >> { >> return strncmp_iw_with_mode (string1, string2, strlen (string2), >> - strncmp_iw_mode::MATCH_PARAMS); >> + strncmp_iw_mode::MATCH_PARAMS, language_minimal); >> } > > I think the comments for both of these functions should be updated, since > they pass language_minimal to strncmp_iw_with_mode. Therefore, > > strncmp_iw_with_mode (string1, string2, len, MATCH_PARAMS, a_language) > > may not necessarily equal > > strncmp_iw (string1, string2) > > That may not be obvious to the casual user. Some sort of caveat seems prudent. Agreed, I've added comments. >> --- a/gdb/utils.h >> +++ b/gdb/utils.h >> @@ -56,7 +56,8 @@ enum class strncmp_iw_mode >> extern int strncmp_iw_with_mode (const char *string1, >> const char *string2, >> size_t string2_len, >> - strncmp_iw_mode mode); >> + strncmp_iw_mode mode, >> + enum language language); > > While most of these parameters are rather obvious usage, it is not obvious to > me why a strncmp-like function needs a language definition. [Of course, > I understand why after reading the code, but a brief mention of how LANGUAGE > affects the operation might be useful IMO. YMMV.] > Fixed. I realized I could rebase this on top of master (instead of on top of the wildmatching), and pushed it in. This will allow pushing in a good part of the tests (in a follow up patch), exercising the earlier get-rid-of-quoting-linespecs improvements, label completion, etc., before the wild matching part is in. Here's what I pushed in. >From 0662b6a7c1b3b04a4ca31a09af703c91c7aa9646 Mon Sep 17 00:00:00 2001 From: Pedro Alves Date: Fri, 24 Nov 2017 23:30:04 +0000 Subject: [PATCH] Make strcmp_iw NOT ignore whitespace in the middle of tokens currently "b func tion" manages to set a breakpoint at "function" ! All these years I had never noticed this, but now that the linespec completer actually works, this easily happens by accident, with: "b func t" expecting to get "thread", but getting instead: "b func tion" ... Also, this: "b rettypefunc" manages to set a breakpoint on "rettype func()". These things happen due to strcmp_iw "magic". Fix it by teaching strcmp_iw about when can it skip whitespace. This required handling user-defined operators, and scope operators, complicating the code a bit, unfortunately. I added unit tests for all the corner cases I stumbled on, as I was developing this, and then in the end wrote a testsuite testcase covering many of the same things and more (to be added later). gdb/ChangeLog: 2017-11-24 Pedro Alves * cp-support.c (cp_symbol_name_matches_1): New, factored out from cp_fq_symbol_name_matches. Pass language_cplus to strncmp_with_mode. (cp_fq_symbol_name_matches): Call cp_symbol_name_matches_1. (selftests::test_cp_symbol_name_cmp): New. (_initialize_cp_support): Register "cp_symbol_name_matches" selftests. * language.c (default_symbol_name_matcher): Pass language_minimal to strncmp_iw_with_mode. * utils.c: Include "cp-support.h" and . (valid_identifier_name_char, cp_skip_operator_token, skip_ws) (cp_is_operator): New functions. (strncmp_iw_with_mode): Use them. Add language parameter. Don't skip whitespace in the symbol name when the lookup name doesn't have spaces, and vice versa. (strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode. * utils.h (strncmp_iw_with_mode): Add language parameter. --- gdb/ChangeLog | 20 +++++ gdb/cp-support.c | 178 +++++++++++++++++++++++++++++++++++++++--- gdb/language.c | 2 +- gdb/utils.c | 233 +++++++++++++++++++++++++++++++++++++++++++++++++++++-- gdb/utils.h | 18 ++++- 5 files changed, 429 insertions(+), 22 deletions(-) diff --git a/gdb/ChangeLog b/gdb/ChangeLog index 26d5cd3..befce60 100644 --- a/gdb/ChangeLog +++ b/gdb/ChangeLog @@ -1,3 +1,23 @@ +2017-11-24 Pedro Alves + + * cp-support.c (cp_symbol_name_matches_1): New, factored out from + cp_fq_symbol_name_matches. Pass language_cplus to + strncmp_with_mode. + (cp_fq_symbol_name_matches): Call cp_symbol_name_matches_1. + (selftests::test_cp_symbol_name_cmp): New. + (_initialize_cp_support): Register "cp_symbol_name_matches" + selftests. + * language.c (default_symbol_name_matcher): Pass language_minimal + to strncmp_iw_with_mode. + * utils.c: Include "cp-support.h" and . + (valid_identifier_name_char, cp_skip_operator_token, skip_ws) + (cp_is_operator): New functions. + (strncmp_iw_with_mode): Use them. Add language parameter. Don't + skip whitespace in the symbol name when the lookup name doesn't + have spaces, and vice versa. + (strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode. + * utils.h (strncmp_iw_with_mode): Add language parameter. + 2017-11-24 Joel Brobecker * ada-lang.c (ada_exception_message_1, ada_exception_message): diff --git a/gdb/cp-support.c b/gdb/cp-support.c index 1cab69b..368112a 100644 --- a/gdb/cp-support.c +++ b/gdb/cp-support.c @@ -1617,6 +1617,39 @@ gdb_sniff_from_mangled_name (const char *mangled, char **demangled) /* C++ symbol_name_matcher_ftype implementation. */ +/* Helper for cp_fq_symbol_name_matches (i.e., + symbol_name_matcher_ftype implementation). Split to a separate + function for unit-testing convenience. + + See symbol_name_matcher_ftype for description of SYMBOL_SEARCH_NAME + and COMP_MATCH_RES. + + LOOKUP_NAME/LOOKUP_NAME_LEN is the name we're looking up. + + See strncmp_iw_with_mode for description of MODE. +*/ + +static bool +cp_symbol_name_matches_1 (const char *symbol_search_name, + const char *lookup_name, + size_t lookup_name_len, + strncmp_iw_mode mode, + completion_match *match) +{ + if (strncmp_iw_with_mode (symbol_search_name, + lookup_name, lookup_name_len, + mode, language_cplus) == 0) + { + if (match != NULL) + match->set_match (symbol_search_name); + return true; + } + + return false; +} + +/* C++ symbol_name_matcher_ftype implementation. */ + static bool cp_fq_symbol_name_matches (const char *symbol_search_name, const lookup_name_info &lookup_name, @@ -1629,16 +1662,9 @@ cp_fq_symbol_name_matches (const char *symbol_search_name, ? strncmp_iw_mode::NORMAL : strncmp_iw_mode::MATCH_PARAMS); - if (strncmp_iw_with_mode (symbol_search_name, - name.c_str (), name.size (), - mode) == 0) - { - if (match != NULL) - match->set_match (symbol_search_name); - return true; - } - - return false; + return cp_symbol_name_matches_1 (symbol_search_name, + name.c_str (), name.size (), + mode, match); } /* See cp-support.h. */ @@ -1653,6 +1679,136 @@ cp_get_symbol_name_matcher (const lookup_name_info &lookup_name) namespace selftests { +void +test_cp_symbol_name_matches () +{ +#define CHECK_MATCH(SYMBOL, INPUT) \ + SELF_CHECK (cp_symbol_name_matches_1 (SYMBOL, \ + INPUT, sizeof (INPUT) - 1, \ + strncmp_iw_mode::MATCH_PARAMS, \ + NULL)) + +#define CHECK_NOT_MATCH(SYMBOL, INPUT) \ + SELF_CHECK (!cp_symbol_name_matches_1 (SYMBOL, \ + INPUT, sizeof (INPUT) - 1, \ + strncmp_iw_mode::MATCH_PARAMS, \ + NULL)) + + /* Like CHECK_MATCH, and also check that INPUT (and all substrings + that start at index 0) completes to SYMBOL. */ +#define CHECK_MATCH_C(SYMBOL, INPUT) \ + do \ + { \ + CHECK_MATCH (SYMBOL, INPUT); \ + for (size_t i = 0; i < sizeof (INPUT) - 1; i++) \ + SELF_CHECK (cp_symbol_name_matches_1 (SYMBOL, INPUT, i, \ + strncmp_iw_mode::NORMAL, \ + NULL)); \ + } while (0) + + /* Like CHECK_NOT_MATCH, and also check that INPUT does NOT complete + to SYMBOL. */ +#define CHECK_NOT_MATCH_C(SYMBOL, INPUT) \ + do \ + { \ + CHECK_NOT_MATCH (SYMBOL, INPUT); \ + SELF_CHECK (!cp_symbol_name_matches_1 (SYMBOL, INPUT, \ + sizeof (INPUT) - 1, \ + strncmp_iw_mode::NORMAL, \ + NULL)); \ + } while (0) + + /* Lookup name without parens matches all overloads. */ + CHECK_MATCH_C ("function()", "function"); + CHECK_MATCH_C ("function(int)", "function"); + + /* Check whitespace around parameters is ignored. */ + CHECK_MATCH_C ("function()", "function ()"); + CHECK_MATCH_C ("function ( )", "function()"); + CHECK_MATCH_C ("function ()", "function( )"); + CHECK_MATCH_C ("func(int)", "func( int )"); + CHECK_MATCH_C ("func(int)", "func ( int ) "); + CHECK_MATCH_C ("func ( int )", "func( int )"); + CHECK_MATCH_C ("func ( int )", "func ( int ) "); + + /* Check symbol name prefixes aren't incorrectly matched. */ + CHECK_NOT_MATCH ("func", "function"); + CHECK_NOT_MATCH ("function", "func"); + CHECK_NOT_MATCH ("function()", "func"); + + /* Check that if the lookup name includes parameters, only the right + overload matches. */ + CHECK_MATCH_C ("function(int)", "function(int)"); + CHECK_NOT_MATCH_C ("function(int)", "function()"); + + /* Check that whitespace within symbol names is not ignored. */ + CHECK_NOT_MATCH_C ("function", "func tion"); + CHECK_NOT_MATCH_C ("func__tion", "func_ _tion"); + CHECK_NOT_MATCH_C ("func11tion", "func1 1tion"); + + /* Check the converse, which can happen with template function, + where the return type is part of the demangled name. */ + CHECK_NOT_MATCH_C ("func tion", "function"); + CHECK_NOT_MATCH_C ("func1 1tion", "func11tion"); + CHECK_NOT_MATCH_C ("func_ _tion", "func__tion"); + + /* Within parameters too. */ + CHECK_NOT_MATCH_C ("func(param)", "func(par am)"); + + /* Check handling of whitespace around C++ operators. */ + CHECK_NOT_MATCH_C ("operator<<", "opera tor<<"); + CHECK_NOT_MATCH_C ("operator<<", "operator< <"); + CHECK_NOT_MATCH_C ("operator<<", "operator < <"); + CHECK_NOT_MATCH_C ("operator==", "operator= ="); + CHECK_NOT_MATCH_C ("operator==", "operator = ="); + CHECK_MATCH_C ("operator<<", "operator <<"); + CHECK_MATCH_C ("operator<<()", "operator <<"); + CHECK_NOT_MATCH_C ("operator<<()", "operator<<(int)"); + CHECK_NOT_MATCH_C ("operator<<(int)", "operator<<()"); + CHECK_MATCH_C ("operator==", "operator =="); + CHECK_MATCH_C ("operator==()", "operator =="); + CHECK_MATCH_C ("operator <<", "operator<<"); + CHECK_MATCH_C ("operator ==", "operator=="); + CHECK_MATCH_C ("operator bool", "operator bool"); + CHECK_MATCH_C ("operator bool ()", "operator bool"); + CHECK_MATCH_C ("operatorX<<", "operatorX < <"); + CHECK_MATCH_C ("Xoperator<<", "Xoperator < <"); + + CHECK_MATCH_C ("operator()(int)", "operator()(int)"); + CHECK_MATCH_C ("operator()(int)", "operator ( ) ( int )"); + CHECK_MATCH_C ("operator()(int)", "operator ( ) < long > ( int )"); + /* The first "()" is not the parameter list. */ + CHECK_NOT_MATCH ("operator()(int)", "operator"); + + /* Misc user-defined operator tests. */ + + CHECK_NOT_MATCH_C ("operator/=()", "operator ^="); + /* Same length at end of input. */ + CHECK_NOT_MATCH_C ("operator>>", "operator[]"); + /* Same length but not at end of input. */ + CHECK_NOT_MATCH_C ("operator>>()", "operator[]()"); + + CHECK_MATCH_C ("base::operator char*()", "base::operator char*()"); + CHECK_MATCH_C ("base::operator char*()", "base::operator char * ()"); + CHECK_MATCH_C ("base::operator char**()", "base::operator char * * ()"); + CHECK_MATCH ("base::operator char**()", "base::operator char * *"); + CHECK_MATCH_C ("base::operator*()", "base::operator*()"); + CHECK_NOT_MATCH_C ("base::operator char*()", "base::operatorc"); + CHECK_NOT_MATCH ("base::operator char*()", "base::operator char"); + CHECK_NOT_MATCH ("base::operator char*()", "base::operat"); + + /* Check handling of whitespace around C++ scope operators. */ + CHECK_NOT_MATCH_C ("foo::bar", "foo: :bar"); + CHECK_MATCH_C ("foo::bar", "foo :: bar"); + CHECK_MATCH_C ("foo :: bar", "foo::bar"); + + CHECK_MATCH_C ("abc::def::ghi()", "abc::def::ghi()"); + CHECK_MATCH_C ("abc::def::ghi ( )", "abc::def::ghi()"); + CHECK_MATCH_C ("abc::def::ghi()", "abc::def::ghi ( )"); + CHECK_MATCH_C ("function()", "function()"); + CHECK_MATCH_C ("bar::function()", "bar::function()"); +} + /* If non-NULL, return STR wrapped in quotes. Otherwise, return a "" string (with no quotes). */ @@ -1856,6 +2012,8 @@ display the offending symbol."), #endif #if GDB_SELF_TEST + selftests::register_test ("cp_symbol_name_matches", + selftests::test_cp_symbol_name_matches); selftests::register_test ("cp_remove_params", selftests::test_cp_remove_params); #endif diff --git a/gdb/language.c b/gdb/language.c index 76047c7..2a1419c 100644 --- a/gdb/language.c +++ b/gdb/language.c @@ -713,7 +713,7 @@ default_symbol_name_matcher (const char *symbol_search_name, : strncmp_iw_mode::MATCH_PARAMS); if (strncmp_iw_with_mode (symbol_search_name, name.c_str (), name.size (), - mode) == 0) + mode, language_minimal) == 0) { if (match != NULL) match->set_match (symbol_search_name); diff --git a/gdb/utils.c b/gdb/utils.c index b5c011b..3e817ed 100644 --- a/gdb/utils.c +++ b/gdb/utils.c @@ -68,6 +68,8 @@ #include "job-control.h" #include "common/selftest.h" #include "common/gdb_optional.h" +#include "cp-support.h" +#include #if !HAVE_DECL_MALLOC extern PTR malloc (); /* ARI: PTR */ @@ -2156,22 +2158,233 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name, } } +/* True if CH is a character that can be part of a symbol name. I.e., + either a number, a letter, or a '_'. */ + +static bool +valid_identifier_name_char (int ch) +{ + return (isalnum (ch) || ch == '_'); +} + +/* Skip to end of token, or to END, whatever comes first. Input is + assumed to be a C++ operator name. */ + +static const char * +cp_skip_operator_token (const char *token, const char *end) +{ + const char *p = token; + while (p != end && !isspace (*p) && *p != '(') + { + if (valid_identifier_name_char (*p)) + { + while (p != end && valid_identifier_name_char (*p)) + p++; + return p; + } + else + { + /* Note, ordered such that among ops that share a prefix, + longer comes first. This is so that the loop below can + bail on first match. */ + static const char *ops[] = + { + "[", + "]", + "~", + ",", + "-=", "--", "->", "-", + "+=", "++", "+", + "*=", "*", + "/=", "/", + "%=", "%", + "|=", "||", "|", + "&=", "&&", "&", + "^=", "^", + "!=", "!", + "<<=", "<=", "<<", "<", + ">>=", ">=", ">>", ">", + "==", "=", + }; + + for (const char *op : ops) + { + size_t oplen = strlen (op); + size_t lencmp = std::min (oplen, end - p); + + if (strncmp (p, op, lencmp) == 0) + return p + lencmp; + } + /* Some unidentified character. Return it. */ + return p + 1; + } + } + + return p; +} + +/* Advance STRING1/STRING2 past whitespace. */ + +static void +skip_ws (const char *&string1, const char *&string2, const char *end_str2) +{ + while (isspace (*string1)) + string1++; + while (string2 < end_str2 && isspace (*string2)) + string2++; +} + +/* True if STRING points at the start of a C++ operator name. START + is the start of the string that STRING points to, hence when + reading backwards, we must not read any character before START. */ + +static bool +cp_is_operator (const char *string, const char *start) +{ + return ((string == start + || !valid_identifier_name_char (string[-1])) + && strncmp (string, CP_OPERATOR_STR, CP_OPERATOR_LEN) == 0 + && !valid_identifier_name_char (string[CP_OPERATOR_LEN])); +} + /* See utils.h. */ int strncmp_iw_with_mode (const char *string1, const char *string2, - size_t string2_len, strncmp_iw_mode mode) + size_t string2_len, strncmp_iw_mode mode, + enum language language) { + const char *string1_start = string1; const char *end_str2 = string2 + string2_len; + bool skip_spaces = true; + bool have_colon_op = (language == language_cplus + || language == language_rust + || language == language_fortran); while (1) { - while (isspace (*string1)) - string1++; - while (string2 < end_str2 && isspace (*string2)) - string2++; + if (skip_spaces + || ((isspace (*string1) && !valid_identifier_name_char (*string2)) + || (isspace (*string2) && !valid_identifier_name_char (*string1)))) + { + skip_ws (string1, string2, end_str2); + skip_spaces = false; + } + if (*string1 == '\0' || string2 == end_str2) break; + + /* Handle the :: operator. */ + if (have_colon_op && string1[0] == ':' && string1[1] == ':') + { + if (*string2 != ':') + return 1; + + string1++; + string2++; + + if (string2 == end_str2) + break; + + if (*string2 != ':') + return 1; + + string1++; + string2++; + + while (isspace (*string1)) + string1++; + while (string2 < end_str2 && isspace (*string2)) + string2++; + continue; + } + + /* Handle C++ user-defined operators. */ + else if (language == language_cplus + && *string1 == 'o') + { + if (cp_is_operator (string1, string1_start)) + { + /* An operator name in STRING1. Check STRING2. */ + size_t cmplen + = std::min (CP_OPERATOR_LEN, end_str2 - string2); + if (strncmp (string1, string2, cmplen) != 0) + return 1; + + string1 += cmplen; + string2 += cmplen; + + if (string2 != end_str2) + { + /* Check for "operatorX" in STRING2. */ + if (valid_identifier_name_char (*string2)) + return 1; + + skip_ws (string1, string2, end_str2); + } + + /* Handle operator(). */ + if (*string1 == '(') + { + if (string2 == end_str2) + { + if (mode == strncmp_iw_mode::NORMAL) + return 0; + else + { + /* Don't break for the regular return at the + bottom, because "operator" should not + match "operator()", since this open + parentheses is not the parameter list + start. */ + return *string1 != '\0'; + } + } + + if (*string1 != *string2) + return 1; + + string1++; + string2++; + } + + while (1) + { + skip_ws (string1, string2, end_str2); + + /* Skip to end of token, or to END, whatever comes + first. */ + const char *end_str1 = string1 + strlen (string1); + const char *p1 = cp_skip_operator_token (string1, end_str1); + const char *p2 = cp_skip_operator_token (string2, end_str2); + + cmplen = std::min (p1 - string1, p2 - string2); + if (p2 == end_str2) + { + if (strncmp (string1, string2, cmplen) != 0) + return 1; + } + else + { + if (p1 - string1 != p2 - string2) + return 1; + if (strncmp (string1, string2, cmplen) != 0) + return 1; + } + + string1 += cmplen; + string2 += cmplen; + + if (*string1 == '\0' || string2 == end_str2) + break; + if (*string1 == '(' || *string2 == '(') + break; + } + + continue; + } + } + if (case_sensitivity == case_sensitive_on && *string1 != *string2) break; if (case_sensitivity == case_sensitive_off @@ -2179,6 +2392,12 @@ strncmp_iw_with_mode (const char *string1, const char *string2, != tolower ((unsigned char) *string2))) break; + /* If we see any non-whitespace, non-identifier-name character + (any of "()<>*&" etc.), then skip spaces the next time + around. */ + if (!isspace (*string1) && !valid_identifier_name_char (*string1)) + skip_spaces = true; + string1++; string2++; } @@ -2200,7 +2419,7 @@ int strncmp_iw (const char *string1, const char *string2, size_t string2_len) { return strncmp_iw_with_mode (string1, string2, string2_len, - strncmp_iw_mode::NORMAL); + strncmp_iw_mode::NORMAL, language_minimal); } /* See utils.h. */ @@ -2209,7 +2428,7 @@ int strcmp_iw (const char *string1, const char *string2) { return strncmp_iw_with_mode (string1, string2, strlen (string2), - strncmp_iw_mode::MATCH_PARAMS); + strncmp_iw_mode::MATCH_PARAMS, language_minimal); } /* This is like strcmp except that it ignores whitespace and treats diff --git a/gdb/utils.h b/gdb/utils.h index e2fa430..dff4b17 100644 --- a/gdb/utils.h +++ b/gdb/utils.h @@ -48,17 +48,24 @@ enum class strncmp_iw_mode /* Helper for strcmp_iw and strncmp_iw. Exported so that languages can implement both NORMAL and MATCH_PARAMS variants in a single - function and defer part of the work to strncmp_iw_with_mode. */ + function and defer part of the work to strncmp_iw_with_mode. + LANGUAGE is used to implement some context-sensitive + language-specific comparisons. For example, for C++, + "string1=operator()" should not match "string2=operator" even in + MATCH_PARAMS mode. */ extern int strncmp_iw_with_mode (const char *string1, const char *string2, size_t string2_len, - strncmp_iw_mode mode); + strncmp_iw_mode mode, + enum language language); /* Do a strncmp() type operation on STRING1 and STRING2, ignoring any differences in whitespace. STRING2_LEN is STRING2's length. Returns 0 if STRING1 matches STRING2_LEN characters of STRING2, non-zero otherwise (slightly different than strncmp()'s range of - return values). */ + return values). Note: passes language_minimal to + strncmp_iw_with_mode, and should therefore be avoided if a more + suitable language is available. */ extern int strncmp_iw (const char *string1, const char *string2, size_t string2_len); @@ -70,7 +77,10 @@ extern int strncmp_iw (const char *string1, const char *string2, As an extra hack, string1=="FOO(ARGS)" matches string2=="FOO". This "feature" is useful when searching for matching C++ function names (such as if the user types 'break FOO', where FOO is a - mangled C++ function). */ + mangled C++ function). + + Note: passes language_minimal to strncmp_iw_with_mode, and should + therefore be avoided if a more suitable language is available. */ extern int strcmp_iw (const char *string1, const char *string2); extern int strcmp_iw_ordered (const char *, const char *); -- 2.5.5