From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 80635 invoked by alias); 9 Aug 2017 15:48:49 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 80166 invoked by uid 89); 9 Aug 2017 15:48:21 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=casual, prudent X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 09 Aug 2017 15:48:11 +0000 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 8B597357E44 for ; Wed, 9 Aug 2017 15:48:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 8B597357E44 Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=keiths@redhat.com Received: from valrhona.uglyboxes.com (ovpn04.gateway.prod.ext.phx2.redhat.com [10.5.9.4]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 449E05D9CB; Wed, 9 Aug 2017 15:48:10 +0000 (UTC) Subject: Re: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens To: Pedro Alves , gdb-patches@sourceware.org References: <1496406158-12663-1-git-send-email-palves@redhat.com> <1496406158-12663-35-git-send-email-palves@redhat.com> From: Keith Seitz Message-ID: <50337ab0-512c-92e5-5b37-115f5dd8c427@redhat.com> Date: Wed, 09 Aug 2017 15:48:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1 MIME-Version: 1.0 In-Reply-To: <1496406158-12663-35-git-send-email-palves@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2017-08/txt/msg00187.txt.bz2 On 06/02/2017 05:22 AM, Pedro Alves wrote: > currently "b func tion" manages to set a breakpoint at "function" ! > > All this years I had never noticed this, but now that the linespec > completer actually works, this easily happens by accident, with: That makes two of us! > The operator_stoken changes are necessary due to a latent bug -- > currently "operator char" becomes "operatorchar", and later look ups > only find it because strcmp_iw ignores the whitespace... I have a similar fix on the compile branch. :-) > gdb/ChangeLog: > yyyy-mm-dd Pedro Alves > > * c-exp.y (oper): Add space to operator names. > * cp-support.c (cp_symbol_name_matches_1) > (cp_fq_symbol_name_matches): Pass language to > strncmp_iw_with_mode. > (test_cp_symbol_name_cmp): Add unit tests. > * language.c (default_symbol_name_matcher): Pass language to > strncmp_iw_with_mode. > * utils.c: Include "cp-support.h" and . > (valid_identifier_name_char, cp_skip_operator_token, skip_ws) > (cp_is_operator): New functions. > (strncmp_iw_with_mode): Use them. Add language parameter. Don't > skip whitespace in the symbol name when the lookup name doesn't > have spaces, and vice versa. > (strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode. ^^^^^^^^ Not just language, but language_minimal. > * utils.h (strncmp_iw_with_mode): Add language parameter. > diff --git a/gdb/c-exp.y b/gdb/c-exp.y > index 24a2fbd..0a182cc 100644 > --- a/gdb/c-exp.y > +++ b/gdb/c-exp.y > @@ -1487,7 +1487,7 @@ oper: OPERATOR NEW > | OPERATOR '>' > { $$ = operator_stoken (">"); } > | OPERATOR ASSIGN_MODIFY > - { const char *op = "unknown"; > + { const char *op = " unknown"; Good catch. I missed that. > switch ($2) > { > case BINOP_RSH: > @@ -1563,7 +1563,8 @@ oper: OPERATOR NEW > > c_print_type ($2, NULL, &buf, -1, 0, > &type_print_raw_options); > - $$ = operator_stoken (buf.c_str ()); > + std::string name = " " + buf.string (); > + $$ = operator_stoken (name.c_str ()); > } > ; > The only additional change that I have in my compile branch is that since this type's name could come from a user, it needs to be canonicalized. But I can hit that when(ever?!) I start submitting some of the precursor patches that I have. [I have a test that demonstrates the need for canonicalization in the c++compile branch.] > diff --git a/gdb/cp-support.c b/gdb/cp-support.c > index 84d8a6b..4c353c5 100644 > --- a/gdb/cp-support.c > +++ b/gdb/cp-support.c > @@ -1857,6 +1857,67 @@ test_cp_symbol_name_cmp () > CHECK_MATCH_C ("function(int)", "function(int)"); [snip a WHOLE LOTTA TESTS] AWESOME! > diff --git a/gdb/utils.c b/gdb/utils.c > index 9798edc..484c1ef 100644 > --- a/gdb/utils.c > +++ b/gdb/utils.c > @@ -65,6 +65,8 @@ > #include "gdb_usleep.h" > #include "interps.h" > #include "gdb_regex.h" > +#include "cp-support.h" > +#include > > #if !HAVE_DECL_MALLOC > extern PTR malloc (); /* ARI: PTR */ > @@ -2418,22 +2420,227 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name, > } > } > > +/* True if CH is a character that can be part of a symbol name. I.e., > + either a number, a letter, or a '_'. */ > + > +static bool > +valid_identifier_name_char (int ch) > +{ > + return (isalnum (ch) || ch == '_'); > +} Couldn't this be language-dependent? [Yikes!] Also note that there are a handful of places where this could be used [follow-up patch?] in linespec.c, location.c, symtab.c. Maybe more. We have logic for c++ operators all over the place. Note to self: this needs to be cleaned up/consolidated. > + > +/* Skip to end of token, or to END, whatever comes first. */ > + I think a(n explicit) mention that the input is assumed to be an operator name. It's mentioned in the name, but please consider repeating that in the comment. It's important. > +static const char * > +cp_skip_operator_token (const char *token, const char *end) > +{ > + const char *p = token; > + while (p != end && !isspace (*p) && *p != '(') > + { > + if (valid_identifier_name_char (*p)) > + { > + while (p != end && valid_identifier_name_char (*p)) > + p++; > + return p; > + } > + else > + { > + /* Note, ordered such that among ops that share a prefix, > + longer comes first. This is so that the loop below can > + bail on first match. */ > + static const char *ops[] = > + { > + "[", > + "]", > + "~", > + ",", > + "-=", "--", "->", "-", > + "+=", "++", "+", > + "*=", "*", > + "/=", "/", > + "%=", "%", > + "|=", "||", "|", > + "&=", "&&", "&", > + "^=", "^", > + "!=", "!", > + "<<=", "<=", "<<", "<", > + ">>=", ">=", ">>", ">", > + "==", "=", > + }; > + Man, I'd *swear* that we have this code repeated in numerous places, but we actually don't. [Not that I looked that hard...] > + for (const char *op : ops) > + { > + size_t oplen = strlen (op); > + size_t lencmp = std::min (oplen, end - p); > + > + if (strncmp (p, op, lencmp) == 0) > + return p + lencmp; > + } > + /* Some unidentified character. Return it. */ > + return p + 1; > + } > + } > + > + return p; > +} > + > +/* Advance string1/string2 past whitespace. */ > + > +static void > +skip_ws (const char *&string1, const char *&string2, const char *end_str2) > +{ > + while (isspace (*string1)) > + string1++; > + while (string2 < end_str2 && isspace (*string2)) > + string2++; > +} > + > +static bool > +cp_is_operator (const char *string, const char *start) Missing comment? > +{ > + return ((string == start > + || !valid_identifier_name_char (string[-1])) > + && strncmp (string, CP_OPERATOR_STR, CP_OPERATOR_LEN) == 0 > + && !valid_identifier_name_char (string[CP_OPERATOR_LEN])); > +} > + > /* See utils.h. */ > > int > strncmp_iw_with_mode (const char *string1, const char *string2, > - size_t string2_len, strncmp_iw_mode mode) > + size_t string2_len, strncmp_iw_mode mode, > + enum language language) > { > + const char *string1_start = string1; > const char *end_str2 = string2 + string2_len; > + bool skip_spaces = true; > + bool have_colon_op = (language == language_cplus > + || language == language_rust > + || language == language_fortran); > Just a passing comment: I'm kinda torn on this. When new languages are added, this is going to be yet another place that language implementers are going to have to modify. While a language method would probably be better (for some definition of "better"), I don't want to see the language vector bloat beyond control either. So IMO there's no clear better path. > while (1) > { > - while (isspace (*string1)) > - string1++; > - while (string2 < end_str2 && isspace (*string2)) > - string2++; > + if (skip_spaces > + || ((isspace (*string1) && !valid_identifier_name_char (*string2)) > + || (isspace (*string2) && !valid_identifier_name_char (*string1)))) > + { > + skip_ws (string1, string2, end_str2); > + skip_spaces = false; > + } > + > if (*string1 == '\0' || string2 == end_str2) > break; > + > + /* Handle the :: operator. */ > + if (have_colon_op && string1[0] == ':' && string1[1] == ':') At least this part is language-agnostic. That's a plus. > + { > + if (*string2 != ':') > + return 1; > + > + string1++; > + string2++; > + > + if (string2 == end_str2) > + break; > + > + if (*string2 != ':') > + return 1; > + > + string1++; > + string2++; > + > + while (isspace (*string1)) > + string1++; > + while (string2 < end_str2 && isspace (*string2)) > + string2++; > + continue; > + } > + > + /* Handle C++ user-defined operators. */ > + else if (language == language_cplus > + && *string1 == 'o') > + { > + if (cp_is_operator (string1, string1_start)) > + { > + /* An operator name in STRING1. Check STRING2. */ > + size_t cmplen = std::min (CP_OPERATOR_LEN, end_str2 - string2); line length == 85 > + if (strncmp (string1, string2, cmplen) != 0) > + return 1; [snip] > @@ -2462,7 +2675,7 @@ int > strncmp_iw (const char *string1, const char *string2, size_t string2_len) > { > return strncmp_iw_with_mode (string1, string2, string2_len, > - strncmp_iw_mode::NORMAL); > + strncmp_iw_mode::NORMAL, language_minimal); > } > > /* See utils.h. */ > @@ -2471,7 +2684,7 @@ int > strcmp_iw (const char *string1, const char *string2) > { > return strncmp_iw_with_mode (string1, string2, strlen (string2), > - strncmp_iw_mode::MATCH_PARAMS); > + strncmp_iw_mode::MATCH_PARAMS, language_minimal); > } I think the comments for both of these functions should be updated, since they pass language_minimal to strncmp_iw_with_mode. Therefore, strncmp_iw_with_mode (string1, string2, len, MATCH_PARAMS, a_language) may not necessarily equal strncmp_iw (string1, string2) That may not be obvious to the casual user. Some sort of caveat seems prudent. > /* This is like strcmp except that it ignores whitespace and treats > diff --git a/gdb/utils.h b/gdb/utils.h > index 9e531e0..4ce263e 100644 > --- a/gdb/utils.h > +++ b/gdb/utils.h > @@ -56,7 +56,8 @@ enum class strncmp_iw_mode > extern int strncmp_iw_with_mode (const char *string1, > const char *string2, > size_t string2_len, > - strncmp_iw_mode mode); > + strncmp_iw_mode mode, > + enum language language); While most of these parameters are rather obvious usage, it is not obvious to me why a strncmp-like function needs a language definition. [Of course, I understand why after reading the code, but a brief mention of how LANGUAGE affects the operation might be useful IMO. YMMV.] > > /* Do a strncmp() type operation on STRING1 and STRING2, ignoring any > differences in whitespace. STRING2_LEN is STRING2's length. > Keith