Mirror of the gdb-patches mailing list
From: Pedro Alves <palves@redhat.com>
To: Keith Seitz <keiths@redhat.com>, gdb-patches@sourceware.org
Subject: [pushed] Re: [PATCH 34/40] Make strcmp_iw NOT ignore whitespace in the middle of tokens
Date: Fri, 24 Nov 2017 23:38:00 -0000
Message-ID: <5bd82949-0f23-4b20-2c7f-1b6b357c9396@redhat.com>
In-Reply-To: <50337ab0-512c-92e5-5b37-115f5dd8c427@redhat.com>

On 08/09/2017 04:48 PM, Keith Seitz wrote:
> On 06/02/2017 05:22 AM, Pedro Alves wrote:
>> currently "b func tion" manages to set a breakpoint at "function" !
>>
>> All these years I had never noticed this, but now that the linespec
>> completer actually works, this easily happens by accident, with:
> 
> That makes two of us!
> 
>> The operator_stoken changes are necessary due to a latent bug --
>> currently "operator char" becomes "operatorchar", and later look ups
>> only find it because strcmp_iw ignores the whitespace...
> 
> I have a similar fix on the compile branch. :-)
> 
>> gdb/ChangeLog:
>> yyyy-mm-dd  Pedro Alves  <palves@redhat.com>
>>
>> 	* c-exp.y (oper): Add space to operator names.
>> 	* cp-support.c (cp_symbol_name_matches_1)
>> 	(cp_fq_symbol_name_matches): Pass language to
>> 	strncmp_iw_with_mode.
>> 	(test_cp_symbol_name_cmp): Add unit tests.
>> 	* language.c (default_symbol_name_matcher): Pass language to
>> 	strncmp_iw_with_mode.
>> 	* utils.c: Include "cp-support.h" and <algorithm>.
>> 	(valid_identifier_name_char, cp_skip_operator_token, skip_ws)
>> 	(cp_is_operator): New functions.
>> 	(strncmp_iw_with_mode): Use them.  Add language parameter.  Don't
>> 	skip whitespace in the symbol name when the lookup name doesn't
>> 	have spaces, and vice versa.
>> 	(strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode.
>                                       ^^^^^^^^
> 
> Not just language, but language_minimal.
> 
>> 	* utils.h (strncmp_iw_with_mode): Add language parameter.
> 
>> diff --git a/gdb/c-exp.y b/gdb/c-exp.y
>> index 24a2fbd..0a182cc 100644
>> --- a/gdb/c-exp.y
>> +++ b/gdb/c-exp.y
>> @@ -1487,7 +1487,7 @@ oper:	OPERATOR NEW
>>  	|	OPERATOR '>'
>>  			{ $$ = operator_stoken (">"); }
>>  	|	OPERATOR ASSIGN_MODIFY
>> -			{ const char *op = "unknown";
>> +			{ const char *op = " unknown";
> 
> Good catch. I missed that.
> 
>>  			  switch ($2)
>>  			    {
>>  			    case BINOP_RSH:
>> @@ -1563,7 +1563,8 @@ oper:	OPERATOR NEW
>>  
>>  			  c_print_type ($2, NULL, &buf, -1, 0,
>>  					&type_print_raw_options);
>> -			  $$ = operator_stoken (buf.c_str ());
>> +			  std::string name = " " + buf.string ();
>> +			  $$ = operator_stoken (name.c_str ());
>>  			}
>>  	;
>>  
> 
> The only additional change that I have in my compile branch is
> that since this type's name could come from a user, it needs to be canonicalized.
> But I can hit that when(ever?!) I start submitting some of the precursor
> patches that I have. [I have a test that demonstrates the need for
> canonicalization in the c++compile branch.]
> 
>> diff --git a/gdb/cp-support.c b/gdb/cp-support.c
>> index 84d8a6b..4c353c5 100644
>> --- a/gdb/cp-support.c
>> +++ b/gdb/cp-support.c
>> @@ -1857,6 +1857,67 @@ test_cp_symbol_name_cmp ()
>>    CHECK_MATCH_C ("function(int)", "function(int)");
> 
> [snip a WHOLE LOTTA TESTS]
> 
> AWESOME!
> 
>> diff --git a/gdb/utils.c b/gdb/utils.c
>> index 9798edc..484c1ef 100644
>> --- a/gdb/utils.c
>> +++ b/gdb/utils.c
>> @@ -65,6 +65,8 @@
>>  #include "gdb_usleep.h"
>>  #include "interps.h"
>>  #include "gdb_regex.h"
>> +#include "cp-support.h"
>> +#include <algorithm>
>>  
>>  #if !HAVE_DECL_MALLOC
>>  extern PTR malloc ();		/* ARI: PTR */
>> @@ -2418,22 +2420,227 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
>>      }
>>  }
>>  
>> +/* True if CH is a character that can be part of a symbol name.  I.e.,
>> +   either a number, a letter, or a '_'.  */
>> +
>> +static bool
>> +valid_identifier_name_char (int ch)
>> +{
>> +  return (isalnum (ch) || ch == '_');
>> +}
> 
> Couldn't this be language-dependent? [Yikes!]
> Also note that there are a handful of places where this could be used
> [follow-up patch?] in linespec.c, location.c, symtab.c. Maybe more.
> 

Yeah, and in the language parsers too.  I have a follow-up patch
(in another branch where I'm playing with UTF-8) that normalizes this.
It involves including safe-ctype.h, which then forces isalnum -> ISALNUM
etc., so I'd rather leave such normalization for that branch.
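
For illustration only, here is a minimal sketch of what a
locale-independent, ASCII-only version of this predicate could look
like.  It is not the code from the UTF-8 branch mentioned above, and
ascii_identifier_char is a hypothetical name.  Unlike isalnum, it does
not consult the current locale, and it stays well defined when passed a
negative char value.  It assumes an ASCII-compatible execution
character set, where the letter ranges are contiguous.

```cpp
/* Hypothetical helper, not part of GDB.  True if CH is an ASCII
   letter, an ASCII digit, or '_'.  Assumes an ASCII-compatible
   character set.  */

static bool
ascii_identifier_char (char ch)
{
  return ((ch >= '0' && ch <= '9')
	  || (ch >= 'a' && ch <= 'z')
	  || (ch >= 'A' && ch <= 'Z')
	  || ch == '_');
}
```

With this sketch, ascii_identifier_char ('_') is true, while a space or
'(' yields false, which is the distinction the whitespace-skipping
logic relies on.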

> We have logic for c++ operators all over the place. Note to self: this needs
> to be cleaned up/consolidated.

Definitely agreed.  I'd like to move some of this code out of the gdb/utils.c
kitchen sink too, but I was leaving that for follow-up patches, because moving
this code around makes it harder to maintain/review.

>> +
>> +/* Skip to end of token, or to END, whatever comes first.  */
>> +
> 
> I think a(n explicit) mention that the input is assumed to be an operator name
> would help. It's mentioned in the name, but please consider repeating that in
> the comment. It's important.

Fixed.


>> +
>> +static bool
>> +cp_is_operator (const char *string, const char *start)
> 
> Missing comment?

Yes, fixed.


> 
> Just a passing comment: I'm kinda torn on this. When new languages are added,
> this is going to be yet another place that language implementers are going to
> have to modify. While a language method would probably be better (for some
> definition of "better"), I don't want to see the language vector bloat beyond
> control either. So IMO there's no clear better path.

Inheritance is overrated.  :-P


>> +	  if (cp_is_operator (string1, string1_start))
>> +	    {
>> +	      /* An operator name in STRING1.  Check STRING2.  */
>> +	      size_t cmplen = std::min<size_t> (CP_OPERATOR_LEN, end_str2 - string2);
> 
> line length == 85

Indeed.  Fixed.


>>  strcmp_iw (const char *string1, const char *string2)
>>  {
>>    return strncmp_iw_with_mode (string1, string2, strlen (string2),
>> -			       strncmp_iw_mode::MATCH_PARAMS);
>> +			       strncmp_iw_mode::MATCH_PARAMS, language_minimal);
>>  }
> 
> I think the comments for both of these functions should be updated, since
> they pass language_minimal to strncmp_iw_with_mode. Therefore,
> 
>   strncmp_iw_with_mode (string1, string2, len, MATCH_PARAMS, a_language)
> 
> may not necessarily equal
> 
>   strncmp_iw (string1, string2)
> 
> That may not be obvious to the casual user. Some sort of caveat seems prudent.

Agreed, I've added comments.
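
To make the caveat concrete, here is a toy model.  It is not GDB's
code, and every name in it is hypothetical.  It compares two names
token by token, and only the "cplus" flavor treats "::" as a single
token.  The convenience wrapper hard-codes the "minimal" flavor, so it
can disagree with an explicit-language call on the same inputs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

/* Hypothetical stand-in for enum language.  */
enum class sketch_language { minimal, cplus };

/* Split S into tokens.  A run of identifier characters is one token,
   "::" is one token for sketch_language::cplus only, and any other
   non-whitespace character is a token of its own.  */

static std::vector<std::string>
sketch_tokens (const std::string &s, sketch_language lang)
{
  std::vector<std::string> tokens;
  std::size_t i = 0;

  while (i < s.size ())
    {
      unsigned char ch = s[i];
      std::size_t start = i;

      if (std::isspace (ch))
	{
	  i++;
	  continue;
	}

      if (std::isalnum (ch) || ch == '_')
	{
	  while (i < s.size ()
		 && (std::isalnum ((unsigned char) s[i]) || s[i] == '_'))
	    i++;
	}
      else if (lang == sketch_language::cplus
	       && ch == ':' && i + 1 < s.size () && s[i + 1] == ':')
	i += 2;
      else
	i++;

      tokens.push_back (s.substr (start, i - start));
    }

  return tokens;
}

/* Explicit-language comparison: true if the token sequences match.  */

static bool
sketch_equal (const std::string &symbol, const std::string &lookup,
	      sketch_language lang)
{
  return sketch_tokens (symbol, lang) == sketch_tokens (lookup, lang);
}

/* Convenience wrapper that always uses the minimal flavor.  */

static bool
sketch_equal_iw (const std::string &symbol, const std::string &lookup)
{
  return sketch_equal (symbol, lookup, sketch_language::minimal);
}
```

In this toy model, sketch_equal_iw ("foo::bar", "foo: :bar") accepts
the pair, while sketch_equal with sketch_language::cplus rejects it,
which is the kind of difference the added comments warn about.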

>> --- a/gdb/utils.h
>> +++ b/gdb/utils.h
>> @@ -56,7 +56,8 @@ enum class strncmp_iw_mode
>>  extern int strncmp_iw_with_mode (const char *string1,
>>  				 const char *string2,
>>  				 size_t string2_len,
>> -				 strncmp_iw_mode mode);
>> +				 strncmp_iw_mode mode,
>> +				 enum language language);
> 
> While most of these parameters are rather obvious usage, it is not obvious to
> me why a strncmp-like function needs a language definition. [Of course,
> I understand why after reading the code, but a brief mention of how LANGUAGE
> affects the operation might be useful IMO. YMMV.]
> 

Fixed.

I realized I could rebase this on top of master (instead of on top of the
wild matching), and pushed it in.  This will allow pushing in a good
part of the tests (in a follow-up patch), exercising the earlier
get-rid-of-quoting-linespecs improvements, label completion, etc., before
the wild matching part is in.

Here's what I pushed in.

From 0662b6a7c1b3b04a4ca31a09af703c91c7aa9646 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Fri, 24 Nov 2017 23:30:04 +0000
Subject: [PATCH] Make strcmp_iw NOT ignore whitespace in the middle of tokens

Currently "b func tion" manages to set a breakpoint at "function"!

All these years I had never noticed this, but now that the linespec
completer actually works, this easily happens by accident, with:

  "b func t<tab>"

expecting to get "thread", but getting instead:

  "b func tion"

...

Also, this:

  "b rettypefunc<int>"

manages to set a breakpoint on "rettype func<int>()".

These things happen due to strcmp_iw "magic".

Fix it by teaching strcmp_iw when it can skip whitespace.  This
required handling user-defined operators and scope operators, which
unfortunately complicates the code a bit.  I added unit tests for all
the corner cases I stumbled on while developing this, and in the end
wrote a testsuite testcase covering many of the same things and more
(to be added later).
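
As a rough illustration of the difference (a simplified sketch, not the
code in this patch), the old behavior can be modeled as comparing the
names with all whitespace removed, and the new behavior as comparing
them token by token.  All function names below are hypothetical.  The
sketch only does exact comparison: it ignores prefix matching and the
"FOO(ARGS) matches FOO" special case.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

/* Split S into tokens.  A run of identifier characters is one token;
   any other non-whitespace character is a token of its own.
   Whitespace only separates tokens.  */

static std::vector<std::string>
sketch_tokenize (const std::string &s)
{
  std::vector<std::string> tokens;
  std::size_t i = 0;

  while (i < s.size ())
    {
      unsigned char ch = s[i];
      std::size_t start = i;

      if (std::isspace (ch))
	{
	  i++;
	  continue;
	}

      if (std::isalnum (ch) || ch == '_')
	{
	  while (i < s.size ()
		 && (std::isalnum ((unsigned char) s[i]) || s[i] == '_'))
	    i++;
	}
      else
	i++;

      tokens.push_back (s.substr (start, i - start));
    }

  return tokens;
}

/* Model of the old behavior: whitespace is ignored everywhere.  */

static bool
sketch_match_old (const std::string &symbol, const std::string &lookup)
{
  auto strip = [] (const std::string &s)
    {
      std::string result;
      for (char ch : s)
	if (!std::isspace ((unsigned char) ch))
	  result += ch;
      return result;
    };

  return strip (symbol) == strip (lookup);
}

/* Model of the new behavior: whitespace may separate tokens, but it
   may not split an identifier.  */

static bool
sketch_match_new (const std::string &symbol, const std::string &lookup)
{
  return sketch_tokenize (symbol) == sketch_tokenize (lookup);
}
```

In this sketch, "func tion" matches "function" only under
sketch_match_old, while "function ( )" matches "function()" under both.
Note that the token model still accepts "foo: :bar" for "foo::bar" and
"operator< <" for "operator<<", which is why the real change needs the
extra scope-operator and user-defined-operator handling.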

gdb/ChangeLog:
2017-11-24  Pedro Alves  <palves@redhat.com>

	* cp-support.c (cp_symbol_name_matches_1): New, factored out from
	cp_fq_symbol_name_matches.  Pass language_cplus to
	strncmp_iw_with_mode.
	(cp_fq_symbol_name_matches): Call cp_symbol_name_matches_1.
	(selftests::test_cp_symbol_name_matches): New.
	(_initialize_cp_support): Register "cp_symbol_name_matches"
	selftests.
	* language.c (default_symbol_name_matcher): Pass language_minimal
	to strncmp_iw_with_mode.
	* utils.c: Include "cp-support.h" and <algorithm>.
	(valid_identifier_name_char, cp_skip_operator_token, skip_ws)
	(cp_is_operator): New functions.
	(strncmp_iw_with_mode): Use them.  Add language parameter.  Don't
	skip whitespace in the symbol name when the lookup name doesn't
	have spaces, and vice versa.
	(strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode.
	* utils.h (strncmp_iw_with_mode): Add language parameter.
---
 gdb/ChangeLog    |  20 +++++
 gdb/cp-support.c | 178 +++++++++++++++++++++++++++++++++++++++---
 gdb/language.c   |   2 +-
 gdb/utils.c      | 233 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 gdb/utils.h      |  18 ++++-
 5 files changed, 429 insertions(+), 22 deletions(-)

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 26d5cd3..befce60 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,23 @@
+2017-11-24  Pedro Alves  <palves@redhat.com>
+
+	* cp-support.c (cp_symbol_name_matches_1): New, factored out from
+	cp_fq_symbol_name_matches.  Pass language_cplus to
+	strncmp_with_mode.
+	(cp_fq_symbol_name_matches): Call cp_symbol_name_matches_1.
+	(selftests::test_cp_symbol_name_cmp): New.
+	(_initialize_cp_support): Register "cp_symbol_name_matches"
+	selftests.
+	* language.c (default_symbol_name_matcher): Pass language_minimal
+	to strncmp_iw_with_mode.
+	* utils.c: Include "cp-support.h" and <algorithm>.
+	(valid_identifier_name_char, cp_skip_operator_token, skip_ws)
+	(cp_is_operator): New functions.
+	(strncmp_iw_with_mode): Use them.  Add language parameter.  Don't
+	skip whitespace in the symbol name when the lookup name doesn't
+	have spaces, and vice versa.
+	(strncmp_iw, strcmp_iw): Pass language to strncmp_iw_with_mode.
+	* utils.h (strncmp_iw_with_mode): Add language parameter.
+
 2017-11-24  Joel Brobecker  <brobecker@adacore.com>
 
 	* ada-lang.c (ada_exception_message_1, ada_exception_message):
diff --git a/gdb/cp-support.c b/gdb/cp-support.c
index 1cab69b..368112a 100644
--- a/gdb/cp-support.c
+++ b/gdb/cp-support.c
@@ -1617,6 +1617,39 @@ gdb_sniff_from_mangled_name (const char *mangled, char **demangled)
 
 /* C++ symbol_name_matcher_ftype implementation.  */
 
+/* Helper for cp_fq_symbol_name_matches (i.e.,
+   symbol_name_matcher_ftype implementation).  Split to a separate
+   function for unit-testing convenience.
+
+   See symbol_name_matcher_ftype for description of SYMBOL_SEARCH_NAME
+   and COMP_MATCH_RES.
+
+   LOOKUP_NAME/LOOKUP_NAME_LEN is the name we're looking up.
+
+   See strncmp_iw_with_mode for description of MODE.
+*/
+
+static bool
+cp_symbol_name_matches_1 (const char *symbol_search_name,
+			  const char *lookup_name,
+			  size_t lookup_name_len,
+			  strncmp_iw_mode mode,
+			  completion_match *match)
+{
+  if (strncmp_iw_with_mode (symbol_search_name,
+			    lookup_name, lookup_name_len,
+			    mode, language_cplus) == 0)
+    {
+      if (match != NULL)
+	match->set_match (symbol_search_name);
+      return true;
+    }
+
+  return false;
+}
+
+/* C++ symbol_name_matcher_ftype implementation.  */
+
 static bool
 cp_fq_symbol_name_matches (const char *symbol_search_name,
 			   const lookup_name_info &lookup_name,
@@ -1629,16 +1662,9 @@ cp_fq_symbol_name_matches (const char *symbol_search_name,
 			  ? strncmp_iw_mode::NORMAL
 			  : strncmp_iw_mode::MATCH_PARAMS);
 
-  if (strncmp_iw_with_mode (symbol_search_name,
-			    name.c_str (), name.size (),
-			    mode) == 0)
-    {
-      if (match != NULL)
-	match->set_match (symbol_search_name);
-      return true;
-    }
-
-  return false;
+  return cp_symbol_name_matches_1 (symbol_search_name,
+				   name.c_str (), name.size (),
+				   mode, match);
 }
 
 /* See cp-support.h.  */
@@ -1653,6 +1679,136 @@ cp_get_symbol_name_matcher (const lookup_name_info &lookup_name)
 
 namespace selftests {
 
+void
+test_cp_symbol_name_matches ()
+{
+#define CHECK_MATCH(SYMBOL, INPUT)					\
+  SELF_CHECK (cp_symbol_name_matches_1 (SYMBOL,				\
+					INPUT, sizeof (INPUT) - 1,	\
+					strncmp_iw_mode::MATCH_PARAMS,	\
+					NULL))
+
+#define CHECK_NOT_MATCH(SYMBOL, INPUT)					\
+  SELF_CHECK (!cp_symbol_name_matches_1 (SYMBOL,			\
+					 INPUT, sizeof (INPUT) - 1,	\
+					 strncmp_iw_mode::MATCH_PARAMS,	\
+					 NULL))
+
+  /* Like CHECK_MATCH, and also check that INPUT (and all substrings
+     that start at index 0) completes to SYMBOL.  */
+#define CHECK_MATCH_C(SYMBOL, INPUT)					\
+  do									\
+    {									\
+      CHECK_MATCH (SYMBOL, INPUT);					\
+      for (size_t i = 0; i < sizeof (INPUT) - 1; i++)			\
+	SELF_CHECK (cp_symbol_name_matches_1 (SYMBOL, INPUT, i,		\
+					      strncmp_iw_mode::NORMAL,	\
+					      NULL));			\
+    } while (0)
+
+  /* Like CHECK_NOT_MATCH, and also check that INPUT does NOT complete
+     to SYMBOL.  */
+#define CHECK_NOT_MATCH_C(SYMBOL, INPUT)				\
+  do									\
+    { 									\
+      CHECK_NOT_MATCH (SYMBOL, INPUT);					\
+      SELF_CHECK (!cp_symbol_name_matches_1 (SYMBOL, INPUT,		\
+					     sizeof (INPUT) - 1,	\
+					     strncmp_iw_mode::NORMAL,	\
+					     NULL));			\
+    } while (0)
+
+  /* Lookup name without parens matches all overloads.  */
+  CHECK_MATCH_C ("function()", "function");
+  CHECK_MATCH_C ("function(int)", "function");
+
+  /* Check whitespace around parameters is ignored.  */
+  CHECK_MATCH_C ("function()", "function ()");
+  CHECK_MATCH_C ("function ( )", "function()");
+  CHECK_MATCH_C ("function ()", "function( )");
+  CHECK_MATCH_C ("func(int)", "func( int )");
+  CHECK_MATCH_C ("func(int)", "func ( int ) ");
+  CHECK_MATCH_C ("func ( int )", "func( int )");
+  CHECK_MATCH_C ("func ( int )", "func ( int ) ");
+
+  /* Check symbol name prefixes aren't incorrectly matched.  */
+  CHECK_NOT_MATCH ("func", "function");
+  CHECK_NOT_MATCH ("function", "func");
+  CHECK_NOT_MATCH ("function()", "func");
+
+  /* Check that if the lookup name includes parameters, only the right
+     overload matches.  */
+  CHECK_MATCH_C ("function(int)", "function(int)");
+  CHECK_NOT_MATCH_C ("function(int)", "function()");
+
+  /* Check that whitespace within symbol names is not ignored.  */
+  CHECK_NOT_MATCH_C ("function", "func tion");
+  CHECK_NOT_MATCH_C ("func__tion", "func_ _tion");
+  CHECK_NOT_MATCH_C ("func11tion", "func1 1tion");
+
+  /* Check the converse, which can happen with template function,
+     where the return type is part of the demangled name.  */
+  CHECK_NOT_MATCH_C ("func tion", "function");
+  CHECK_NOT_MATCH_C ("func1 1tion", "func11tion");
+  CHECK_NOT_MATCH_C ("func_ _tion", "func__tion");
+
+  /* Within parameters too.  */
+  CHECK_NOT_MATCH_C ("func(param)", "func(par am)");
+
+  /* Check handling of whitespace around C++ operators.  */
+  CHECK_NOT_MATCH_C ("operator<<", "opera tor<<");
+  CHECK_NOT_MATCH_C ("operator<<", "operator< <");
+  CHECK_NOT_MATCH_C ("operator<<", "operator < <");
+  CHECK_NOT_MATCH_C ("operator==", "operator= =");
+  CHECK_NOT_MATCH_C ("operator==", "operator = =");
+  CHECK_MATCH_C ("operator<<", "operator <<");
+  CHECK_MATCH_C ("operator<<()", "operator <<");
+  CHECK_NOT_MATCH_C ("operator<<()", "operator<<(int)");
+  CHECK_NOT_MATCH_C ("operator<<(int)", "operator<<()");
+  CHECK_MATCH_C ("operator==", "operator ==");
+  CHECK_MATCH_C ("operator==()", "operator ==");
+  CHECK_MATCH_C ("operator <<", "operator<<");
+  CHECK_MATCH_C ("operator ==", "operator==");
+  CHECK_MATCH_C ("operator bool", "operator  bool");
+  CHECK_MATCH_C ("operator bool ()", "operator  bool");
+  CHECK_MATCH_C ("operatorX<<", "operatorX < <");
+  CHECK_MATCH_C ("Xoperator<<", "Xoperator < <");
+
+  CHECK_MATCH_C ("operator()(int)", "operator()(int)");
+  CHECK_MATCH_C ("operator()(int)", "operator ( ) ( int )");
+  CHECK_MATCH_C ("operator()<long>(int)", "operator ( ) < long > ( int )");
+  /* The first "()" is not the parameter list.  */
+  CHECK_NOT_MATCH ("operator()(int)", "operator");
+
+  /* Misc user-defined operator tests.  */
+
+  CHECK_NOT_MATCH_C ("operator/=()", "operator ^=");
+  /* Same length at end of input.  */
+  CHECK_NOT_MATCH_C ("operator>>", "operator[]");
+  /* Same length but not at end of input.  */
+  CHECK_NOT_MATCH_C ("operator>>()", "operator[]()");
+
+  CHECK_MATCH_C ("base::operator char*()", "base::operator char*()");
+  CHECK_MATCH_C ("base::operator char*()", "base::operator char * ()");
+  CHECK_MATCH_C ("base::operator char**()", "base::operator char * * ()");
+  CHECK_MATCH ("base::operator char**()", "base::operator char * *");
+  CHECK_MATCH_C ("base::operator*()", "base::operator*()");
+  CHECK_NOT_MATCH_C ("base::operator char*()", "base::operatorc");
+  CHECK_NOT_MATCH ("base::operator char*()", "base::operator char");
+  CHECK_NOT_MATCH ("base::operator char*()", "base::operat");
+
+  /* Check handling of whitespace around C++ scope operators.  */
+  CHECK_NOT_MATCH_C ("foo::bar", "foo: :bar");
+  CHECK_MATCH_C ("foo::bar", "foo :: bar");
+  CHECK_MATCH_C ("foo :: bar", "foo::bar");
+
+  CHECK_MATCH_C ("abc::def::ghi()", "abc::def::ghi()");
+  CHECK_MATCH_C ("abc::def::ghi ( )", "abc::def::ghi()");
+  CHECK_MATCH_C ("abc::def::ghi()", "abc::def::ghi ( )");
+  CHECK_MATCH_C ("function()", "function()");
+  CHECK_MATCH_C ("bar::function()", "bar::function()");
+}
+
 /* If non-NULL, return STR wrapped in quotes.  Otherwise, return a
    "<null>" string (with no quotes).  */
 
@@ -1856,6 +2012,8 @@ display the offending symbol."),
 #endif
 
 #if GDB_SELF_TEST
+  selftests::register_test ("cp_symbol_name_matches",
+			    selftests::test_cp_symbol_name_matches);
   selftests::register_test ("cp_remove_params",
 			    selftests::test_cp_remove_params);
 #endif
diff --git a/gdb/language.c b/gdb/language.c
index 76047c7..2a1419c 100644
--- a/gdb/language.c
+++ b/gdb/language.c
@@ -713,7 +713,7 @@ default_symbol_name_matcher (const char *symbol_search_name,
 			  : strncmp_iw_mode::MATCH_PARAMS);
 
   if (strncmp_iw_with_mode (symbol_search_name, name.c_str (), name.size (),
-			    mode) == 0)
+			    mode, language_minimal) == 0)
     {
       if (match != NULL)
 	match->set_match (symbol_search_name);
diff --git a/gdb/utils.c b/gdb/utils.c
index b5c011b..3e817ed 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -68,6 +68,8 @@
 #include "job-control.h"
 #include "common/selftest.h"
 #include "common/gdb_optional.h"
+#include "cp-support.h"
+#include <algorithm>
 
 #if !HAVE_DECL_MALLOC
 extern PTR malloc ();		/* ARI: PTR */
@@ -2156,22 +2158,233 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
     }
 }
 
+/* True if CH is a character that can be part of a symbol name.  I.e.,
+   either a number, a letter, or a '_'.  */
+
+static bool
+valid_identifier_name_char (int ch)
+{
+  return (isalnum (ch) || ch == '_');
+}
+
+/* Skip to end of token, or to END, whatever comes first.  Input is
+   assumed to be a C++ operator name.  */
+
+static const char *
+cp_skip_operator_token (const char *token, const char *end)
+{
+  const char *p = token;
+  while (p != end && !isspace (*p) && *p != '(')
+    {
+      if (valid_identifier_name_char (*p))
+	{
+	  while (p != end && valid_identifier_name_char (*p))
+	    p++;
+	  return p;
+	}
+      else
+	{
+	  /* Note, ordered such that among ops that share a prefix,
+	     longer comes first.  This is so that the loop below can
+	     bail on first match.  */
+	  static const char *ops[] =
+	    {
+	      "[",
+	      "]",
+	      "~",
+	      ",",
+	      "-=", "--", "->", "-",
+	      "+=", "++", "+",
+	      "*=", "*",
+	      "/=", "/",
+	      "%=", "%",
+	      "|=", "||", "|",
+	      "&=", "&&", "&",
+	      "^=", "^",
+	      "!=", "!",
+	      "<<=", "<=", "<<", "<",
+	      ">>=", ">=", ">>", ">",
+	      "==", "=",
+	    };
+
+	  for (const char *op : ops)
+	    {
+	      size_t oplen = strlen (op);
+	      size_t lencmp = std::min<size_t> (oplen, end - p);
+
+	      if (strncmp (p, op, lencmp) == 0)
+		return p + lencmp;
+	    }
+	  /* Some unidentified character.  Return it.  */
+	  return p + 1;
+	}
+    }
+
+  return p;
+}
+
+/* Advance STRING1/STRING2 past whitespace.  */
+
+static void
+skip_ws (const char *&string1, const char *&string2, const char *end_str2)
+{
+  while (isspace (*string1))
+    string1++;
+  while (string2 < end_str2 && isspace (*string2))
+    string2++;
+}
+
+/* True if STRING points at the start of a C++ operator name.  START
+   is the start of the string that STRING points to, hence when
+   reading backwards, we must not read any character before START.  */
+
+static bool
+cp_is_operator (const char *string, const char *start)
+{
+  return ((string == start
+	   || !valid_identifier_name_char (string[-1]))
+	  && strncmp (string, CP_OPERATOR_STR, CP_OPERATOR_LEN) == 0
+	  && !valid_identifier_name_char (string[CP_OPERATOR_LEN]));
+}
+
 /* See utils.h.  */
 
 int
 strncmp_iw_with_mode (const char *string1, const char *string2,
-		      size_t string2_len, strncmp_iw_mode mode)
+		      size_t string2_len, strncmp_iw_mode mode,
+		      enum language language)
 {
+  const char *string1_start = string1;
   const char *end_str2 = string2 + string2_len;
+  bool skip_spaces = true;
+  bool have_colon_op = (language == language_cplus
+			|| language == language_rust
+			|| language == language_fortran);
 
   while (1)
     {
-      while (isspace (*string1))
-	string1++;
-      while (string2 < end_str2 && isspace (*string2))
-	string2++;
+      if (skip_spaces
+	  || ((isspace (*string1) && !valid_identifier_name_char (*string2))
+	      || (isspace (*string2) && !valid_identifier_name_char (*string1))))
+	{
+	  skip_ws (string1, string2, end_str2);
+	  skip_spaces = false;
+	}
+
       if (*string1 == '\0' || string2 == end_str2)
 	break;
+
+      /* Handle the :: operator.  */
+      if (have_colon_op && string1[0] == ':' && string1[1] == ':')
+	{
+	  if (*string2 != ':')
+	    return 1;
+
+	  string1++;
+	  string2++;
+
+	  if (string2 == end_str2)
+	    break;
+
+	  if (*string2 != ':')
+	    return 1;
+
+	  string1++;
+	  string2++;
+
+	  while (isspace (*string1))
+	    string1++;
+	  while (string2 < end_str2 && isspace (*string2))
+	    string2++;
+	  continue;
+	}
+
+      /* Handle C++ user-defined operators.  */
+      else if (language == language_cplus
+	       && *string1 == 'o')
+	{
+	  if (cp_is_operator (string1, string1_start))
+	    {
+	      /* An operator name in STRING1.  Check STRING2.  */
+	      size_t cmplen
+		= std::min<size_t> (CP_OPERATOR_LEN, end_str2 - string2);
+	      if (strncmp (string1, string2, cmplen) != 0)
+		return 1;
+
+	      string1 += cmplen;
+	      string2 += cmplen;
+
+	      if (string2 != end_str2)
+		{
+		  /* Check for "operatorX" in STRING2.  */
+		  if (valid_identifier_name_char (*string2))
+		    return 1;
+
+		  skip_ws (string1, string2, end_str2);
+		}
+
+	      /* Handle operator().  */
+	      if (*string1 == '(')
+		{
+		  if (string2 == end_str2)
+		    {
+		      if (mode == strncmp_iw_mode::NORMAL)
+			return 0;
+		      else
+			{
+			  /* Don't break for the regular return at the
+			     bottom, because "operator" should not
+			     match "operator()", since this open
+			     parentheses is not the parameter list
+			     start.  */
+			  return *string1 != '\0';
+			}
+		    }
+
+		  if (*string1 != *string2)
+		    return 1;
+
+		  string1++;
+		  string2++;
+		}
+
+	      while (1)
+		{
+		  skip_ws (string1, string2, end_str2);
+
+		  /* Skip to end of token, or to END, whatever comes
+		     first.  */
+		  const char *end_str1 = string1 + strlen (string1);
+		  const char *p1 = cp_skip_operator_token (string1, end_str1);
+		  const char *p2 = cp_skip_operator_token (string2, end_str2);
+
+		  cmplen = std::min (p1 - string1, p2 - string2);
+		  if (p2 == end_str2)
+		    {
+		      if (strncmp (string1, string2, cmplen) != 0)
+			return 1;
+		    }
+		  else
+		    {
+		      if (p1 - string1 != p2 - string2)
+			return 1;
+		      if (strncmp (string1, string2, cmplen) != 0)
+			return 1;
+		    }
+
+		  string1 += cmplen;
+		  string2 += cmplen;
+
+		  if (*string1 == '\0' || string2 == end_str2)
+		    break;
+		  if (*string1 == '(' || *string2 == '(')
+		    break;
+		}
+
+	      continue;
+	    }
+	}
+
       if (case_sensitivity == case_sensitive_on && *string1 != *string2)
 	break;
       if (case_sensitivity == case_sensitive_off
@@ -2179,6 +2392,12 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	      != tolower ((unsigned char) *string2)))
 	break;
 
+      /* If we see any non-whitespace, non-identifier-name character
+	 (any of "()<>*&" etc.), then skip spaces the next time
+	 around.  */
+      if (!isspace (*string1) && !valid_identifier_name_char (*string1))
+	skip_spaces = true;
+
       string1++;
       string2++;
     }
@@ -2200,7 +2419,7 @@ int
 strncmp_iw (const char *string1, const char *string2, size_t string2_len)
 {
   return strncmp_iw_with_mode (string1, string2, string2_len,
-			       strncmp_iw_mode::NORMAL);
+			       strncmp_iw_mode::NORMAL, language_minimal);
 }
 
 /* See utils.h.  */
@@ -2209,7 +2428,7 @@ int
 strcmp_iw (const char *string1, const char *string2)
 {
   return strncmp_iw_with_mode (string1, string2, strlen (string2),
-			       strncmp_iw_mode::MATCH_PARAMS);
+			       strncmp_iw_mode::MATCH_PARAMS, language_minimal);
 }
 
 /* This is like strcmp except that it ignores whitespace and treats
diff --git a/gdb/utils.h b/gdb/utils.h
index e2fa430..dff4b17 100644
--- a/gdb/utils.h
+++ b/gdb/utils.h
@@ -48,17 +48,24 @@ enum class strncmp_iw_mode
 
 /* Helper for strcmp_iw and strncmp_iw.  Exported so that languages
    can implement both NORMAL and MATCH_PARAMS variants in a single
-   function and defer part of the work to strncmp_iw_with_mode.  */
+   function and defer part of the work to strncmp_iw_with_mode.
+   LANGUAGE is used to implement some context-sensitive
+   language-specific comparisons.  For example, for C++,
+   "string1=operator()" should not match "string2=operator" even in
+   MATCH_PARAMS mode.  */
 extern int strncmp_iw_with_mode (const char *string1,
 				 const char *string2,
 				 size_t string2_len,
-				 strncmp_iw_mode mode);
+				 strncmp_iw_mode mode,
+				 enum language language);
 
 /* Do a strncmp() type operation on STRING1 and STRING2, ignoring any
    differences in whitespace.  STRING2_LEN is STRING2's length.
    Returns 0 if STRING1 matches STRING2_LEN characters of STRING2,
    non-zero otherwise (slightly different than strncmp()'s range of
-   return values).  */
+   return values).  Note: passes language_minimal to
+   strncmp_iw_with_mode, and should therefore be avoided if a more
+   suitable language is available.  */
 extern int strncmp_iw (const char *string1, const char *string2,
 		       size_t string2_len);
 
@@ -70,7 +77,10 @@ extern int strncmp_iw (const char *string1, const char *string2,
    As an extra hack, string1=="FOO(ARGS)" matches string2=="FOO".
    This "feature" is useful when searching for matching C++ function
    names (such as if the user types 'break FOO', where FOO is a
-   mangled C++ function).  */
+   mangled C++ function).
+
+   Note: passes language_minimal to strncmp_iw_with_mode, and should
+   therefore be avoided if a more suitable language is available.  */
 extern int strcmp_iw (const char *string1, const char *string2);
 
 extern int strcmp_iw_ordered (const char *, const char *);
-- 
2.5.5


2017-08-09 15:42   ` Keith Seitz
2017-11-08 16:22     ` Pedro Alves
2017-06-02 12:29 ` [PATCH 16/40] Explicit locations -label completer Pedro Alves
2017-07-14 21:32   ` Keith Seitz
2017-06-02 12:29 ` [PATCH 07/40] objfile_per_bfd_storage non-POD Pedro Alves
2017-06-27 12:00   ` Yao Qi
2017-06-27 15:30     ` Pedro Alves
2017-06-02 12:29 ` [PATCH 17/40] Linespec lexing and C++ operators Pedro Alves
2017-07-14 21:45   ` Keith Seitz
2017-07-17 19:34     ` Pedro Alves
2017-06-02 12:29 ` [PATCH 05/40] command.h: Include scoped_restore_command.h Pedro Alves
2017-06-27 11:30   ` Yao Qi
2017-06-27 11:45     ` Pedro Alves
2017-06-27 11:52       ` Pedro Alves
2017-06-27 12:03         ` Pedro Alves
2017-06-27 15:46           ` [PATCH 05/40] command.h: Include common/scoped_restore.h Pedro Alves
2017-06-28  7:54             ` Yao Qi
2017-06-28 14:20               ` Pedro Alves
2017-06-02 12:29 ` [PATCH 21/40] Use SYMBOL_MATCHES_SEARCH_NAME some more Pedro Alves
2017-07-17 21:39   ` Keith Seitz
2017-07-20 17:08     ` Pedro Alves
2017-06-02 12:30 ` [PATCH 20/40] Eliminate block_iter_name_* Pedro Alves
2017-07-17 19:47   ` Keith Seitz
2017-07-20 17:05     ` Pedro Alves
2017-06-02 12:30 ` [PATCH 32/40] Make "break foo" find "A::foo", A::B::foo", etc. [C++ and wild matching] Pedro Alves
2017-08-08 23:48   ` Keith Seitz
2017-11-22 16:48     ` Pedro Alves
2017-11-24 16:48       ` Pedro Alves
2017-11-24 16:57         ` Pedro Alves
2017-11-28  0:39         ` Keith Seitz
2017-11-28  0:02       ` Keith Seitz
2017-11-28  0:21         ` Pedro Alves
2017-11-28  0:42           ` Keith Seitz
2017-06-02 12:30 ` [PATCH 30/40] Use search_domain::FUNCTIONS_DOMAIN when setting breakpoints Pedro Alves
2017-08-08 21:07   ` Keith Seitz
2017-11-08 16:20     ` Pedro Alves
2017-06-02 12:31 ` [PATCH 29/40] Simplify completion_list_add_name | remove sym_text / sym_text_len Pedro Alves
2017-08-08 20:59   ` Keith Seitz
2017-11-08 16:19     ` Pedro Alves
2017-06-02 12:31 ` [PATCH 04/40] Fix TAB-completion + .gdb_index slowness (generalize filename_seen_cache) Pedro Alves
2017-07-13 20:41   ` Keith Seitz
2017-07-14 19:40     ` Pedro Alves
2017-07-17 10:51       ` Pedro Alves
2017-06-02 12:31 ` [PATCH 22/40] get_int_var_value Pedro Alves
2017-07-17 22:11   ` Keith Seitz
2017-07-20 17:15     ` Pedro Alves
2017-06-02 12:33 ` [PATCH 31/40] Handle custom completion match prefix / LCD Pedro Alves
2017-08-08 21:28   ` Keith Seitz
2017-11-27 17:11     ` Pedro Alves
2017-06-02 12:39 ` [PATCH 25/40] Introduce lookup_name_info and generalize Ada's FULL/WILD name matching Pedro Alves
2017-07-18 20:14   ` Keith Seitz
2017-07-18 22:31     ` Pedro Alves
2017-07-20 19:00       ` Pedro Alves
2017-07-20 19:06         ` Pedro Alves
2017-08-08 20:29           ` Keith Seitz
2017-10-19 17:36             ` Pedro Alves
2017-11-01 15:38               ` Joel Brobecker
2017-11-08 16:10                 ` Pedro Alves
2017-11-08 22:15                   ` Joel Brobecker
2017-06-02 12:39 ` [PATCH 26/40] Optimize .gdb_index symbol name searching Pedro Alves
2017-08-08 20:32   ` Keith Seitz
2017-11-08 16:14     ` Pedro Alves
2017-11-08 16:16       ` [pushed] Reorder/reindent dw2_expand_symtabs_matching & friends (Re: [PATCH 26/40] Optimize .gdb_index symbol name searching) Pedro Alves
2017-11-18  5:23   ` [PATCH 26/40] Optimize .gdb_index symbol name searching Simon Marchi
2017-11-20  0:33     ` Pedro Alves
2017-11-20  0:42       ` [PATCH 3/3] Fix mapped_index::find_name_components_bounds upper bound computation Pedro Alves
2017-11-20  3:17         ` Simon Marchi
2017-11-20  0:42       ` [PATCH 2/3] Unit test name-component bounds searching directly Pedro Alves
2017-11-20  3:16         ` Simon Marchi
2017-11-20 14:17           ` Pedro Alves
2017-11-20  0:42       ` [PATCH 1/3] 0xff chars in name components table; cp-name-parser lex UTF-8 identifiers Pedro Alves
2017-11-20  1:38         ` Simon Marchi
2017-11-20 11:56           ` Pedro Alves
2017-11-20 16:50             ` Simon Marchi
2017-11-21  0:11               ` Pedro Alves
2017-06-02 12:39 ` [PATCH 23/40] Make language_def O(1) Pedro Alves
2017-07-17 23:03   ` Keith Seitz
2017-07-20 17:40     ` Pedro Alves
2017-07-20 18:12       ` Get rid of "set language local"? (was: Re: [PATCH 23/40] Make language_def O(1)) Pedro Alves
2017-07-20 23:44         ` Matt Rice
2017-06-02 15:26 ` [PATCH 00/40] C++ debugging improvements: breakpoints, TAB completion, more Pedro Alves