[PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
@ 2019-06-09 15:17 Shawn Landden
  2019-06-09 15:42 ` Eli Zaretskii
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Shawn Landden @ 2019-06-09 15:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Shawn Landden

I was getting 8% and 6% cpu usage in tolower() and isspace(),
respectively, waiting for a breakpoint on ppc64el.

Also, gdb doesn't want non-deterministic behavior here.
---
 gdb/utils.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/gdb/utils.c b/gdb/utils.c
index 9686927473..d36b942cc5 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2626,10 +2626,29 @@ strcmp_iw (const char *string1, const char *string2)
    user searches for "foo", then strcmp will sort "foo" before "foo$".
    Then lookup_partial_symbol will notice that strcmp_iw("foo$",
    "foo") is false, so it won't proceed to the actual match of
    "foo(int)" with "foo".  */
 
+/* glibc versions of these have non-deterministic locale-dependant behavior,
+   and are very slow, taking 8% and 6% of total CPU time with some use-cases */
+#undef isspace
+static int isspace(int c)
+{
+  return c == ' ' || (unsigned)c-'\t' < 5;
+}
+#undef isupper
+static int isupper(int c)
+{
+  return (unsigned)c-'A' < 26;
+}
+#undef tolower
+static int tolower(int c)
+{
+  if (isupper(c)) return c | 32;
+  return c;
+}
+
 int
 strcmp_iw_ordered (const char *string1, const char *string2)
 {
   const char *saved_string1 = string1, *saved_string2 = string2;
   enum case_sensitivity case_pass = case_sensitive_off;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden
@ 2019-06-09 15:42 ` Eli Zaretskii
  2019-06-09 15:51   ` Shawn Landden
  2019-06-10 21:14 ` John Baldwin
  2019-06-10 21:30 ` [PATCH v2] " Shawn Landden
  2 siblings, 1 reply; 12+ messages in thread
From: Eli Zaretskii @ 2019-06-09 15:42 UTC (permalink / raw)
  To: Shawn Landden; +Cc: gdb-patches

> From: Shawn Landden <shawn@git.icu>
> Cc: Shawn Landden <shawn@git.icu>
> Date: Sun,  9 Jun 2019 10:17:04 -0500
> 
> I was getting 8% and 6% cpu usage in tolower() and isspace(),
> respectively, waiting for a breakpoint on ppc64el.
> 
> Also, gdb doesn't want non-deterministic behavior here.

Instead of inventing our own wheel, how about using the Gnulib's
c-ctype module?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-09 15:42 ` Eli Zaretskii
@ 2019-06-09 15:51   ` Shawn Landden
  0 siblings, 0 replies; 12+ messages in thread
From: Shawn Landden @ 2019-06-09 15:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches

On Sun, Jun 9, 2019 at 10:42 AM Eli Zaretskii <eliz@gnu.org> wrote:
>
> > From: Shawn Landden <shawn@git.icu>
> > Cc: Shawn Landden <shawn@git.icu>
> > Date: Sun,  9 Jun 2019 10:17:04 -0500
> >
> > I was getting 8% and 6% cpu usage in tolower() and isspace(),
> > respectively, waiting for a breakpoint on ppc64el.
> >
> > Also, gdb doesn't want non-deterministic behavior here.
>
> Instead of inventing our own wheel, how about using the Gnulib's
> c-ctype module?
The tolower() really should only apply to languages that need it, like Ada.
It is kinda a bug to use it with C, which is case sensitive. Also we
probably don't need
the full isspace() either, and these are simple functions. I think it
is much better to be able to
see the full algorithm right in front of you. (even if I just did the
base minimum to speed this up,
without having to study it).

-Shawn Landden


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden
  2019-06-09 15:42 ` Eli Zaretskii
@ 2019-06-10 21:14 ` John Baldwin
  2019-06-10 21:30 ` [PATCH v2] " Shawn Landden
  2 siblings, 0 replies; 12+ messages in thread
From: John Baldwin @ 2019-06-10 21:14 UTC (permalink / raw)
  To: Shawn Landden, gdb-patches

On 6/9/19 8:17 AM, Shawn Landden wrote:
> I was getting 8% and 6% cpu usage in tolower() and isspace(),
> respectively, waiting for a breakpoint on ppc64el.
> 
> Also, gdb doesn't want non-deterministic behavior here.

These routines are not always macros.  For example, FreeBSD's ctype.h
exposes these as functions for C++ and only as macros for pure C.
Perhaps it would be better to define custom functions that don't use
the POSIX names to avoid clashing with whatever libc provides if we
want something with very specific behavior.

-- 
John Baldwin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden
  2019-06-09 15:42 ` Eli Zaretskii
  2019-06-10 21:14 ` John Baldwin
@ 2019-06-10 21:30 ` Shawn Landden
  2019-06-21 17:35   ` Pedro Alves
  2019-06-21 19:20   ` [PATCH] " Shawn Landden
  2 siblings, 2 replies; 12+ messages in thread
From: Shawn Landden @ 2019-06-10 21:30 UTC (permalink / raw)
  To: gdb-patches; +Cc: jhb, eliz, Shawn Landden

I was getting 8% and 6% cpu usage in tolower() and isspace(),
respectively, waiting for a breakpoint on ppc64el.

Also, gdb doesn't want non-deterministic behavior here.

v2: do not clash with C99 names
---
 gdb/utils.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/gdb/utils.c b/gdb/utils.c
index 9686927473..0b68fabe4d 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2626,10 +2626,29 @@ strcmp_iw (const char *string1, const char *string2)
    user searches for "foo", then strcmp will sort "foo" before "foo$".
    Then lookup_partial_symbol will notice that strcmp_iw("foo$",
    "foo") is false, so it won't proceed to the actual match of
    "foo(int)" with "foo".  */
 
+/* glibc versions of these have non-deterministic locale-dependant behavior,
+   and are very slow, taking 8% and 6% of total CPU time with some use-cases */
+
+static inline int isspace_inline(int c)
+{
+  return c == ' ' || (unsigned)c-'\t' < 5;
+}
+
+static inline int isupper_inline(int c)
+{
+  return (unsigned)c-'A' < 26;
+}
+
+static inline int tolower_inline(int c)
+{
+  if (isupper(c)) return c | 32;
+  return c;
+}
+
 int
 strcmp_iw_ordered (const char *string1, const char *string2)
 {
   const char *saved_string1 = string1, *saved_string2 = string2;
   enum case_sensitivity case_pass = case_sensitive_off;
@@ -2641,20 +2660,20 @@ strcmp_iw_ordered (const char *string1, const char *string2)
 	 strings.  */
       char c1 = 'X', c2 = 'X';
 
       while (*string1 != '\0' && *string2 != '\0')
 	{
-	  while (isspace (*string1))
+	  while (isspace_inline (*string1))
 	    string1++;
-	  while (isspace (*string2))
+	  while (isspace_inline (*string2))
 	    string2++;
 
 	  switch (case_pass)
 	  {
 	    case case_sensitive_off:
-	      c1 = tolower ((unsigned char) *string1);
-	      c2 = tolower ((unsigned char) *string2);
+	      c1 = tolower_inline ((unsigned char) *string1);
+	      c2 = tolower_inline ((unsigned char) *string2);
 	      break;
 	    case case_sensitive_on:
 	      c1 = *string1;
 	      c2 = *string2;
 	      break;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-10 21:30 ` [PATCH v2] " Shawn Landden
@ 2019-06-21 17:35   ` Pedro Alves
  2019-06-21 17:59     ` Shawn Landden
  2019-06-21 19:20   ` [PATCH] " Shawn Landden
  1 sibling, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2019-06-21 17:35 UTC (permalink / raw)
  To: Shawn Landden, gdb-patches; +Cc: jhb, eliz

Hi,

On 6/10/19 10:30 PM, Shawn Landden wrote:
> I was getting 8% and 6% cpu usage in tolower() and isspace(),
> respectively, waiting for a breakpoint on ppc64el.
> 
> Also, gdb doesn't want non-deterministic behavior here.
> 
> v2: do not clash with C99 names

When I was working on the C++ wildmatching support a
couple years ago, I had some testcases that would stress name
parsing that I was running under perf, and I also noticed these functions
higher up on the profile.  I wrote a few patches back then:

 https://github.com/palves/gdb/commits/palves/ada-decode-speedups

And this one has the same idea as yours:

 https://github.com/palves/gdb/commit/f701531b79380356134d53db97adb6f467f9d784

So, I agree that this makes sense.

I also agree that we don't want to depend on the current
locale when parsing symbol names.

In my version I was naming the functions gdb_xxx while in
your version you're using xxx_inline.  gdb_xxx naming is
more common as a gdb version of some standard function,
so I would prefer that.

But, meanwhile, last year I merged this patch:
<https://sourceware.org/ml/gdb-patches/2018-05/msg00561.html>
which touches a somewhat subject.

That patch uses the existing libiberty uppercase TOLOWER, ISXDIGIT,
etc. macros, which are inline and locale independent by design.
See include/safe-ctype.h.  Can we use those instead of adding new
functions?  I don't recall if I benchmarked ISSPACE vs the gdb_isspace
in that optimization patch on my github, but I think I just didn't
remember ISSPACE back then.

Least but not least, the patch as is is not following the 
GNU/GDB coding format conventions.  Take a look here:

 https://sourceware.org/gdb/wiki/ContributionChecklist

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-21 17:35   ` Pedro Alves
@ 2019-06-21 17:59     ` Shawn Landden
  2019-06-21 18:08       ` Pedro Alves
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Landden @ 2019-06-21 17:59 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches, jhb, eliz

El vie., 21 de jun. de 2019 12:35, Pedro Alves <palves@redhat.com> escribió:

> Hi,
>
> On 6/10/19 10:30 PM, Shawn Landden wrote:
> > I was getting 8% and 6% cpu usage in tolower() and isspace(),
> > respectively, waiting for a breakpoint on ppc64el.
> >
> > Also, gdb doesn't want non-deterministic behavior here.
> >
> > v2: do not clash with C99 names
>
> When I was working on the C++ wildmatching support a
> couple years ago, I had some testcases that would stress name
> parsing that I was running under perf, and I also noticed these functions
> higher up on the profile.  I wrote a few patches back then:
>
>  https://github.com/palves/gdb/commits/palves/ada-decode-speedups
>
> And this one has the same idea as yours:
>
>
> https://github.com/palves/gdb/commit/f701531b79380356134d53db97adb6f467f9d784
>
> So, I agree that this makes sense.
>
I don't care how it gets fixed, and the GNU coding standard (which I write
to for glibc) will take more time than writing this patch. (Or your
well-documented response) Also, while I have a copyright assignment for
glibc, mine for GCC and binutils-gdb is only pending.

Go ahead and fix this, and give me credit.

Cheers.

>
> I also agree that we don't want to depend on the current
> locale when parsing symbol names.
>
> In my version I was naming the functions gdb_xxx while in
> your version you're using xxx_inline.  gdb_xxx naming is
> more common as a gdb version of some standard function,
> so I would prefer that.
>
> But, meanwhile, last year I merged this patch:
> <https://sourceware.org/ml/gdb-patches/2018-05/msg00561.html>
> which touches a somewhat subject.
>
> That patch uses the existing libiberty uppercase TOLOWER, ISXDIGIT,
> etc. macros, which are inline and locale independent by design.
> See include/safe-ctype.h.  Can we use those instead of adding new
> functions?  I don't recall if I benchmarked ISSPACE vs the gdb_isspace
> in that optimization patch on my github, but I think I just didn't
> remember ISSPACE back then.
>
> Least but not least, the patch as is is not following the
> GNU/GDB coding format conventions.  Take a look here:
>
>  https://sourceware.org/gdb/wiki/ContributionChecklist
>
> Thanks,
> Pedro Alves
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-21 17:59     ` Shawn Landden
@ 2019-06-21 18:08       ` Pedro Alves
  0 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2019-06-21 18:08 UTC (permalink / raw)
  To: Shawn Landden; +Cc: gdb-patches, jhb, eliz

On 6/21/19 6:59 PM, Shawn Landden wrote:
>> On 6/10/19 10:30 PM, Shawn Landden wrote:
>>> I was getting 8% and 6% cpu usage in tolower() and isspace(),
>>> respectively, waiting for a breakpoint on ppc64el.
>>>
>>> Also, gdb doesn't want non-deterministic behavior here.
>>>
>>> v2: do not clash with C99 names
>>
>> When I was working on the C++ wildmatching support a
>> couple years ago, I had some testcases that would stress name
>> parsing that I was running under perf, and I also noticed these functions
>> higher up on the profile.  I wrote a few patches back then:
>>
>>  https://github.com/palves/gdb/commits/palves/ada-decode-speedups
>>
>> And this one has the same idea as yours:
>>
>>
>> https://github.com/palves/gdb/commit/f701531b79380356134d53db97adb6f467f9d784
>>
>> So, I agree that this makes sense.
>>
> I don't care how it gets fixed, and the GNU coding standard (which I write
> to for glibc) will take more time than writing this patch. (Or your
> well-documented response) Also, while I have a copyright assignment for
> glibc, mine for GCC and binutils-gdb is only pending.
> 
> Go ahead and fix this, and give me credit.

The interesting thing to do here is 

 >> That patch uses the existing libiberty uppercase TOLOWER, ISXDIGIT,
 >> etc. macros, which are inline and locale independent by design.
 >> See include/safe-ctype.h.  Can we use those instead of adding new
 >> functions?  I don't recall if I benchmarked ISSPACE vs the gdb_isspace
 >> in that optimization patch on my github, but I think I just didn't
 >> remember ISSPACE back then.

How were you benchmarking this?

Thanks,
Pedro Alves


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-10 21:30 ` [PATCH v2] " Shawn Landden
  2019-06-21 17:35   ` Pedro Alves
@ 2019-06-21 19:20   ` Shawn Landden
  2019-06-21 20:12     ` Pedro Alves
  1 sibling, 1 reply; 12+ messages in thread
From: Shawn Landden @ 2019-06-21 19:20 UTC (permalink / raw)
  To: gdb-patches; +Cc: Pedro Alves, jhb, eliz, Shawn Landden

I was getting 8% and 6% cpu usage in tolower() and isspace(),
respectively, waiting for a breakpoint on ppc64el.

Also, gdb doesn't want non-deterministic behavior here.

v2: do not touch C namespace
v3: Turns out there are two places using these in performance-critical
parts.
    Follow GNU coding standards.

Signed-off-by: Shawn Landden <shawn@git.icu>
Co-Author: Pedro Alves <palves@redhat.com>
Mailing-list: https://sourceware.org/ml/gdb-patches/2019-06/threads.html#00187
---
 gdb/common/common-utils.c |  6 +++---
 gdb/common/common-utils.h | 22 ++++++++++++++++++++++
 gdb/utils.c               |  8 ++++----
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/gdb/common/common-utils.c b/gdb/common/common-utils.c
index ae2dd9db2b..8215f505e8 100644
--- a/gdb/common/common-utils.c
+++ b/gdb/common/common-utils.c
@@ -338,7 +338,7 @@ skip_spaces (char *chp)
 {
   if (chp == NULL)
     return NULL;
-  while (*chp && isspace (*chp))
+  while (*chp && gdb_isspace (*chp))
     chp++;
   return chp;
 }
@@ -350,7 +350,7 @@ skip_spaces (const char *chp)
 {
   if (chp == NULL)
     return NULL;
-  while (*chp && isspace (*chp))
+  while (*chp && gdb_isspace (*chp))
     chp++;
   return chp;
 }
@@ -362,7 +362,7 @@ skip_to_space (const char *chp)
 {
   if (chp == NULL)
     return NULL;
-  while (*chp && !isspace (*chp))
+  while (*chp && !gdb_isspace (*chp))
     chp++;
   return chp;
 }
diff --git a/gdb/common/common-utils.h b/gdb/common/common-utils.h
index 2320318de7..5e6e0afb9c 100644
--- a/gdb/common/common-utils.h
+++ b/gdb/common/common-utils.h
@@ -146,4 +146,26 @@ in_inclusive_range (T value, T low, T high)
   return value >= low && value <= high;
 }
 
+static inline
+int
+gdb_isspace (int c)
+{
+  return c == ' ' || (unsigned)c - '\t' < 5;
+}
+
+static inline
+int
+gdb_isupper (int c)
+{
+  return (unsigned)c - 'A' < 26;
+}
+
+static inline
+int
+gdb_tolower (int c)
+{
+  if (isupper(c)) return c | 32;
+  return c;
+}
+
 #endif
diff --git a/gdb/utils.c b/gdb/utils.c
index c531748fe4..f92e8f5346 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2572,16 +2572,16 @@ strcmp_iw_ordered (const char *string1, const char *string2)
 
       while (*string1 != '\0' && *string2 != '\0')
 	{
-	  while (isspace (*string1))
+	  while (gdb_isspace (*string1))
 	    string1++;
-	  while (isspace (*string2))
+	  while (gdb_isspace (*string2))
 	    string2++;
 
 	  switch (case_pass)
 	  {
 	    case case_sensitive_off:
-	      c1 = tolower ((unsigned char) *string1);
-	      c2 = tolower ((unsigned char) *string2);
+	      c1 = gdb_tolower ((unsigned char) *string1);
+	      c2 = gdb_tolower ((unsigned char) *string2);
 	      break;
 	    case case_sensitive_on:
 	      c1 = *string1;
-- 
2.20.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-21 19:20   ` [PATCH] " Shawn Landden
@ 2019-06-21 20:12     ` Pedro Alves
  2019-06-21 20:30       ` Shawn Landden
  0 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2019-06-21 20:12 UTC (permalink / raw)
  To: Shawn Landden, gdb-patches; +Cc: jhb, eliz

On 6/21/19 8:19 PM, Shawn Landden wrote:
> I was getting 8% and 6% cpu usage in tolower() and isspace(),
> respectively, waiting for a breakpoint on ppc64el.
> 
> Also, gdb doesn't want non-deterministic behavior here.
> 
> v2: do not touch C namespace
> v3: Turns out there are two places using these in performance-critical
> parts.
>     Follow GNU coding standards.

This doesn't address my comments about
<https://sourceware.org/ml/gdb-patches/2019-06/msg00442.html>.

Something like this.  How does this compare?  Any significant
difference in your benchmark?  And again, how are you benchmarking
this?

From 1bd3203a26f989ba146376842f5b6a78bdfae181 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Fri, 21 Jun 2019 19:12:30 +0100
Subject: [PATCH] Use ISSPACE etc

---
 gdb/common/gdb-safe-ctype.h | 46 ++++++++++++++++++++++++++++++++++++++++++++
 gdb/utils.c                 | 47 +++++++++++++++++++++++----------------------
 2 files changed, 70 insertions(+), 23 deletions(-)
 create mode 100644 gdb/common/gdb-safe-ctype.h

diff --git a/gdb/common/gdb-safe-ctype.h b/gdb/common/gdb-safe-ctype.h
new file mode 100644
index 00000000000..01a97b7ce39
--- /dev/null
+++ b/gdb/common/gdb-safe-ctype.h
@@ -0,0 +1,46 @@
+/* Wrapper around libiberty's safe-ctype.h for GDB, the GNU debugger.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef GDB_SAFE_CTYPE_H
+#define GDB_SAFE_CTYPE_H
+
+/* After safe-ctype.h is included, we can no longer use the host's
+   ctype routines.  Trying to do so results in compile errors.  Code
+   that uses safe-ctype.h that wants to refer to the locale-dependent
+   ctype functions must call these wrapper versions instead.  */
+
+static inline int
+gdb_isprint (int ch)
+{
+  return isprint (ch);
+}
+
+/* readline.h defines these symbols too, but we want libiberty's
+   versions.  */
+#undef ISALPHA
+#undef ISALNUM
+#undef ISDIGIT
+#undef ISLOWER
+#undef ISPRINT
+#undef ISUPPER
+#undef ISXDIGIT
+
+#include "safe-ctype.h"
+
+#endif
diff --git a/gdb/utils.c b/gdb/utils.c
index c7922cf7f56..367ea277c7d 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -73,6 +73,7 @@
 #include "common/pathstuff.h"
 #include "cli/cli-style.h"
 #include "common/scope-exit.h"
+#include "common/gdb-safe-ctype.h"
 
 void (*deprecated_error_begin_hook) (void);
 
@@ -1057,7 +1058,7 @@ parse_escape (struct gdbarch *gdbarch, const char **string_ptr)
 	  while (++count < 3)
 	    {
 	      c = (**string_ptr);
-	      if (isdigit (c) && c != '8' && c != '9')
+	      if (ISDIGIT (c) && c != '8' && c != '9')
 		{
 		  (*string_ptr)++;
 		  i *= 8;
@@ -1985,7 +1986,7 @@ puts_debug (char *prefix, char *string, char *suffix)
       switch (ch)
 	{
 	default:
-	  if (isprint (ch))
+	  if (gdb_isprint (ch))
 	    fputc_unfiltered (ch, gdb_stdlog);
 
 	  else
@@ -2268,7 +2269,7 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
 static bool
 valid_identifier_name_char (int ch)
 {
-  return (isalnum (ch) || ch == '_');
+  return (ISALNUM (ch) || ch == '_');
 }
 
 /* Skip to end of token, or to END, whatever comes first.  Input is
@@ -2278,7 +2279,7 @@ static const char *
 cp_skip_operator_token (const char *token, const char *end)
 {
   const char *p = token;
-  while (p != end && !isspace (*p) && *p != '(')
+  while (p != end && !ISSPACE (*p) && *p != '(')
     {
       if (valid_identifier_name_char (*p))
 	{
@@ -2332,9 +2333,9 @@ cp_skip_operator_token (const char *token, const char *end)
 static void
 skip_ws (const char *&string1, const char *&string2, const char *end_str2)
 {
-  while (isspace (*string1))
+  while (ISSPACE (*string1))
     string1++;
-  while (string2 < end_str2 && isspace (*string2))
+  while (string2 < end_str2 && ISSPACE (*string2))
     string2++;
 }
 
@@ -2396,8 +2397,8 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
   while (1)
     {
       if (skip_spaces
-	  || ((isspace (*string1) && !valid_identifier_name_char (*string2))
-	      || (isspace (*string2) && !valid_identifier_name_char (*string1))))
+	  || ((ISSPACE (*string1) && !valid_identifier_name_char (*string2))
+	      || (ISSPACE (*string2) && !valid_identifier_name_char (*string1))))
 	{
 	  skip_ws (string1, string2, end_str2);
 	  skip_spaces = false;
@@ -2430,7 +2431,7 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	  if (match_for_lcd != NULL && abi_start != string1)
 	    match_for_lcd->mark_ignored_range (abi_start, string1);
 
-	  while (isspace (*string1))
+	  while (ISSPACE (*string1))
 	    string1++;
 	}
 
@@ -2455,9 +2456,9 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
 	  string1++;
 	  string2++;
 
-	  while (isspace (*string1))
+	  while (ISSPACE (*string1))
 	    string1++;
-	  while (string2 < end_str2 && isspace (*string2))
+	  while (string2 < end_str2 && ISSPACE (*string2))
 	    string2++;
 	  continue;
 	}
@@ -2551,14 +2552,14 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
       if (case_sensitivity == case_sensitive_on && *string1 != *string2)
 	break;
       if (case_sensitivity == case_sensitive_off
-	  && (tolower ((unsigned char) *string1)
-	      != tolower ((unsigned char) *string2)))
+	  && (TOLOWER ((unsigned char) *string1)
+	      != TOLOWER ((unsigned char) *string2)))
 	break;
 
       /* If we see any non-whitespace, non-identifier-name character
 	 (any of "()<>*&" etc.), then skip spaces the next time
 	 around.  */
-      if (!isspace (*string1) && !valid_identifier_name_char (*string1))
+      if (!ISSPACE (*string1) && !valid_identifier_name_char (*string1))
 	skip_spaces = true;
 
       string1++;
@@ -2679,16 +2680,16 @@ strcmp_iw_ordered (const char *string1, const char *string2)
 
       while (*string1 != '\0' && *string2 != '\0')
 	{
-	  while (isspace (*string1))
+	  while (ISSPACE (*string1))
 	    string1++;
-	  while (isspace (*string2))
+	  while (ISSPACE (*string2))
 	    string2++;
 
 	  switch (case_pass)
 	  {
 	    case case_sensitive_off:
-	      c1 = tolower ((unsigned char) *string1);
-	      c2 = tolower ((unsigned char) *string2);
+	      c1 = TOLOWER ((unsigned char) *string1);
+	      c2 = TOLOWER ((unsigned char) *string2);
 	      break;
 	    case case_sensitive_on:
 	      c1 = *string1;
@@ -2927,17 +2928,17 @@ string_to_core_addr (const char *my_string)
 {
   CORE_ADDR addr = 0;
 
-  if (my_string[0] == '0' && tolower (my_string[1]) == 'x')
+  if (my_string[0] == '0' && TOLOWER (my_string[1]) == 'x')
     {
       /* Assume that it is in hex.  */
       int i;
 
       for (i = 2; my_string[i] != '\0'; i++)
 	{
-	  if (isdigit (my_string[i]))
+	  if (ISDIGIT (my_string[i]))
 	    addr = (my_string[i] - '0') + (addr * 16);
-	  else if (isxdigit (my_string[i]))
-	    addr = (tolower (my_string[i]) - 'a' + 0xa) + (addr * 16);
+	  else if (ISXDIGIT (my_string[i]))
+	    addr = (TOLOWER (my_string[i]) - 'a' + 0xa) + (addr * 16);
 	  else
 	    error (_("invalid hex \"%s\""), my_string);
 	}
@@ -2949,7 +2950,7 @@ string_to_core_addr (const char *my_string)
 
       for (i = 0; my_string[i] != '\0'; i++)
 	{
-	  if (isdigit (my_string[i]))
+	  if (ISDIGIT (my_string[i]))
 	    addr = (my_string[i] - '0') + (addr * 10);
 	  else
 	    error (_("invalid decimal \"%s\""), my_string);
-- 
2.14.5


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-21 20:12     ` Pedro Alves
@ 2019-06-21 20:30       ` Shawn Landden
  2019-06-21 21:14         ` Pedro Alves
  0 siblings, 1 reply; 12+ messages in thread
From: Shawn Landden @ 2019-06-21 20:30 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches, jhb, Eli Zaretskii

On Fri, Jun 21, 2019 at 3:12 PM Pedro Alves <palves@redhat.com> wrote:
>
> On 6/21/19 8:19 PM, Shawn Landden wrote:
> > I was getting 8% and 6% cpu usage in tolower() and isspace(),
> > respectively, waiting for a breakpoint on ppc64el.
> >
> > Also, gdb doesn't want non-deterministic behavior here.
> >
> > v2: do not touch C namespace
> > v3: Turns out there are two places using these in performance-critical
> > parts.
> >     Follow GNU coding standards.
>
> This doesn't address my comments about
> <https://sourceware.org/ml/gdb-patches/2019-06/msg00442.html>.
>
> Something like this.  How does this compare?  Any significant
> difference in your benchmark?  And again, how are you benchmarking
> this?
perf record. goes from like 6-8% to almost nothing (if you remove the static).
>
> From 1bd3203a26f989ba146376842f5b6a78bdfae181 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Fri, 21 Jun 2019 19:12:30 +0100
> Subject: [PATCH] Use ISSPACE etc
>
> ---
>  gdb/common/gdb-safe-ctype.h | 46 ++++++++++++++++++++++++++++++++++++++++++++
>  gdb/utils.c                 | 47 +++++++++++++++++++++++----------------------
>  2 files changed, 70 insertions(+), 23 deletions(-)
>  create mode 100644 gdb/common/gdb-safe-ctype.h
>
> diff --git a/gdb/common/gdb-safe-ctype.h b/gdb/common/gdb-safe-ctype.h
> new file mode 100644
> index 00000000000..01a97b7ce39
> --- /dev/null
> +++ b/gdb/common/gdb-safe-ctype.h
> @@ -0,0 +1,46 @@
> +/* Wrapper around libiberty's safe-ctype.h for GDB, the GNU debugger.
> +
> +   Copyright (C) 2019 Free Software Foundation, Inc.
> +
> +   This file is part of GDB.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GDB_SAFE_CTYPE_H
> +#define GDB_SAFE_CTYPE_H
> +
> +/* After safe-ctype.h is included, we can no longer use the host's
> +   ctype routines.  Trying to do so results in compile errors.  Code
> +   that uses safe-ctype.h that wants to refer to the locale-dependent
> +   ctype functions must call these wrapper versions instead.  */
> +
> +static inline int
> +gdb_isprint (int ch)
> +{
> +  return isprint (ch);
> +}
> +
> +/* readline.h defines these symbols too, but we want libiberty's
> +   versions.  */
> +#undef ISALPHA
> +#undef ISALNUM
> +#undef ISDIGIT
> +#undef ISLOWER
> +#undef ISPRINT
> +#undef ISUPPER
> +#undef ISXDIGIT
> +
> +#include "safe-ctype.h"
> +
> +#endif
> diff --git a/gdb/utils.c b/gdb/utils.c
> index c7922cf7f56..367ea277c7d 100644
> --- a/gdb/utils.c
> +++ b/gdb/utils.c
> @@ -73,6 +73,7 @@
>  #include "common/pathstuff.h"
>  #include "cli/cli-style.h"
>  #include "common/scope-exit.h"
> +#include "common/gdb-safe-ctype.h"
>
>  void (*deprecated_error_begin_hook) (void);
>
> @@ -1057,7 +1058,7 @@ parse_escape (struct gdbarch *gdbarch, const char **string_ptr)
>           while (++count < 3)
>             {
>               c = (**string_ptr);
> -             if (isdigit (c) && c != '8' && c != '9')
> +             if (ISDIGIT (c) && c != '8' && c != '9')
>                 {
>                   (*string_ptr)++;
>                   i *= 8;
> @@ -1985,7 +1986,7 @@ puts_debug (char *prefix, char *string, char *suffix)
>        switch (ch)
>         {
>         default:
> -         if (isprint (ch))
> +         if (gdb_isprint (ch))
>             fputc_unfiltered (ch, gdb_stdlog);
>
>           else
> @@ -2268,7 +2269,7 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name,
>  static bool
>  valid_identifier_name_char (int ch)
>  {
> -  return (isalnum (ch) || ch == '_');
> +  return (ISALNUM (ch) || ch == '_');
>  }
>
>  /* Skip to end of token, or to END, whatever comes first.  Input is
> @@ -2278,7 +2279,7 @@ static const char *
>  cp_skip_operator_token (const char *token, const char *end)
>  {
>    const char *p = token;
> -  while (p != end && !isspace (*p) && *p != '(')
> +  while (p != end && !ISSPACE (*p) && *p != '(')
>      {
>        if (valid_identifier_name_char (*p))
>         {
> @@ -2332,9 +2333,9 @@ cp_skip_operator_token (const char *token, const char *end)
>  static void
>  skip_ws (const char *&string1, const char *&string2, const char *end_str2)
>  {
> -  while (isspace (*string1))
> +  while (ISSPACE (*string1))
>      string1++;
> -  while (string2 < end_str2 && isspace (*string2))
> +  while (string2 < end_str2 && ISSPACE (*string2))
>      string2++;
>  }
>
> @@ -2396,8 +2397,8 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>    while (1)
>      {
>        if (skip_spaces
> -         || ((isspace (*string1) && !valid_identifier_name_char (*string2))
> -             || (isspace (*string2) && !valid_identifier_name_char (*string1))))
> +         || ((ISSPACE (*string1) && !valid_identifier_name_char (*string2))
> +             || (ISSPACE (*string2) && !valid_identifier_name_char (*string1))))
>         {
>           skip_ws (string1, string2, end_str2);
>           skip_spaces = false;
> @@ -2430,7 +2431,7 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>           if (match_for_lcd != NULL && abi_start != string1)
>             match_for_lcd->mark_ignored_range (abi_start, string1);
>
> -         while (isspace (*string1))
> +         while (ISSPACE (*string1))
>             string1++;
>         }
>
> @@ -2455,9 +2456,9 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>           string1++;
>           string2++;
>
> -         while (isspace (*string1))
> +         while (ISSPACE (*string1))
>             string1++;
> -         while (string2 < end_str2 && isspace (*string2))
> +         while (string2 < end_str2 && ISSPACE (*string2))
>             string2++;
>           continue;
>         }
> @@ -2551,14 +2552,14 @@ strncmp_iw_with_mode (const char *string1, const char *string2,
>        if (case_sensitivity == case_sensitive_on && *string1 != *string2)
>         break;
>        if (case_sensitivity == case_sensitive_off
> -         && (tolower ((unsigned char) *string1)
> -             != tolower ((unsigned char) *string2)))
> +         && (TOLOWER ((unsigned char) *string1)
> +             != TOLOWER ((unsigned char) *string2)))
>         break;
>
>        /* If we see any non-whitespace, non-identifier-name character
>          (any of "()<>*&" etc.), then skip spaces the next time
>          around.  */
> -      if (!isspace (*string1) && !valid_identifier_name_char (*string1))
> +      if (!ISSPACE (*string1) && !valid_identifier_name_char (*string1))
>         skip_spaces = true;
>
>        string1++;
> @@ -2679,16 +2680,16 @@ strcmp_iw_ordered (const char *string1, const char *string2)
>
>        while (*string1 != '\0' && *string2 != '\0')
>         {
> -         while (isspace (*string1))
> +         while (ISSPACE (*string1))
>             string1++;
> -         while (isspace (*string2))
> +         while (ISSPACE (*string2))
>             string2++;
>
>           switch (case_pass)
>           {
>             case case_sensitive_off:
> -             c1 = tolower ((unsigned char) *string1);
> -             c2 = tolower ((unsigned char) *string2);
> +             c1 = TOLOWER ((unsigned char) *string1);
> +             c2 = TOLOWER ((unsigned char) *string2);
>               break;
>             case case_sensitive_on:
>               c1 = *string1;
> @@ -2927,17 +2928,17 @@ string_to_core_addr (const char *my_string)
>  {
>    CORE_ADDR addr = 0;
>
> -  if (my_string[0] == '0' && tolower (my_string[1]) == 'x')
> +  if (my_string[0] == '0' && TOLOWER (my_string[1]) == 'x')
>      {
>        /* Assume that it is in hex.  */
>        int i;
>
>        for (i = 2; my_string[i] != '\0'; i++)
>         {
> -         if (isdigit (my_string[i]))
> +         if (ISDIGIT (my_string[i]))
>             addr = (my_string[i] - '0') + (addr * 16);
> -         else if (isxdigit (my_string[i]))
> -           addr = (tolower (my_string[i]) - 'a' + 0xa) + (addr * 16);
> +         else if (ISXDIGIT (my_string[i]))
> +           addr = (TOLOWER (my_string[i]) - 'a' + 0xa) + (addr * 16);
>           else
>             error (_("invalid hex \"%s\""), my_string);
>         }
> @@ -2949,7 +2950,7 @@ string_to_core_addr (const char *my_string)
>
>        for (i = 0; my_string[i] != '\0'; i++)
>         {
> -         if (isdigit (my_string[i]))
> +         if (ISDIGIT (my_string[i]))
>             addr = (my_string[i] - '0') + (addr * 10);
>           else
>             error (_("invalid decimal \"%s\""), my_string);
> --
> 2.14.5
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
  2019-06-21 20:30       ` Shawn Landden
@ 2019-06-21 21:14         ` Pedro Alves
  0 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2019-06-21 21:14 UTC (permalink / raw)
  To: Shawn Landden; +Cc: gdb-patches, jhb, Eli Zaretskii

On 6/21/19 9:30 PM, Shawn Landden wrote:
> On Fri, Jun 21, 2019 at 3:12 PM Pedro Alves <palves@redhat.com> wrote:
>> On 6/21/19 8:19 PM, Shawn Landden wrote:
>>> I was getting 8% and 6% cpu usage in tolower() and isspace(),
>>> respectively, waiting for a breakpoint on ppc64el.
>>>
>>> Also, gdb doesn't want non-deterministic behavior here.
>>>
>>> v2: do not touch C namespace
>>> v3: Turns out there are two places using these in performance-critical
>>> parts.
>>>     Follow GNU coding standards.
>> This doesn't address my comments about
>> <https://sourceware.org/ml/gdb-patches/2019-06/msg00442.html>.
>>
>> Something like this.  How does this compare?  Any significant
>> difference in your benchmark?  And again, how are you benchmarking
>> this?
> perf record. 

What was the use case?  Did you have some speficic set of commands?
Some script?  Please be detailed/more specific, so that
others can reproduce it, either now, or in the future when we
look back into this thread.

> goes from like 6-8% to almost nothing (if you remove the static).

What do you mean by "remove the static"?

Thanks,
Pedro Alves


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-06-21 21:14 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden
2019-06-09 15:42 ` Eli Zaretskii
2019-06-09 15:51   ` Shawn Landden
2019-06-10 21:14 ` John Baldwin
2019-06-10 21:30 ` [PATCH v2] " Shawn Landden
2019-06-21 17:35   ` Pedro Alves
2019-06-21 17:59     ` Shawn Landden
2019-06-21 18:08       ` Pedro Alves
2019-06-21 19:20   ` [PATCH] " Shawn Landden
2019-06-21 20:12     ` Pedro Alves
2019-06-21 20:30       ` Shawn Landden
2019-06-21 21:14         ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox