* [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower()
@ 2019-06-09 15:17 Shawn Landden
2019-06-09 15:42 ` Eli Zaretskii
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Shawn Landden @ 2019-06-09 15:17 UTC (permalink / raw)
To: gdb-patches; +Cc: Shawn Landden
I was getting 8% and 6% cpu usage in tolower() and isspace(),
respectively, waiting for a breakpoint on ppc64el.
Also, gdb doesn't want non-deterministic behavior here.
---
gdb/utils.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/gdb/utils.c b/gdb/utils.c
index 9686927473..d36b942cc5 100644
--- a/gdb/utils.c
+++ b/gdb/utils.c
@@ -2626,10 +2626,29 @@ strcmp_iw (const char *string1, const char *string2)
user searches for "foo", then strcmp will sort "foo" before "foo$".
Then lookup_partial_symbol will notice that strcmp_iw("foo$",
"foo") is false, so it won't proceed to the actual match of
"foo(int)" with "foo". */
+/* glibc versions of these have non-deterministic locale-dependant behavior,
+ and are very slow, taking 8% and 6% of total CPU time with some use-cases */
+#undef isspace
+static int isspace(int c)
+{
+ return c == ' ' || (unsigned)c-'\t' < 5;
+}
+#undef isupper
+static int isupper(int c)
+{
+ return (unsigned)c-'A' < 26;
+}
+#undef tolower
+static int tolower(int c)
+{
+ if (isupper(c)) return c | 32;
+ return c;
+}
+
int
strcmp_iw_ordered (const char *string1, const char *string2)
{
const char *saved_string1 = string1, *saved_string2 = string2;
enum case_sensitivity case_pass = case_sensitive_off;
--
2.20.1
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden @ 2019-06-09 15:42 ` Eli Zaretskii 2019-06-09 15:51 ` Shawn Landden 2019-06-10 21:14 ` John Baldwin 2019-06-10 21:30 ` [PATCH v2] " Shawn Landden 2 siblings, 1 reply; 12+ messages in thread From: Eli Zaretskii @ 2019-06-09 15:42 UTC (permalink / raw) To: Shawn Landden; +Cc: gdb-patches > From: Shawn Landden <shawn@git.icu> > Cc: Shawn Landden <shawn@git.icu> > Date: Sun, 9 Jun 2019 10:17:04 -0500 > > I was getting 8% and 6% cpu usage in tolower() and isspace(), > respectively, waiting for a breakpoint on ppc64el. > > Also, gdb doesn't want non-deterministic behavior here. Instead of inventing our own wheel, how about using the Gnulib's c-ctype module? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-09 15:42 ` Eli Zaretskii @ 2019-06-09 15:51 ` Shawn Landden 0 siblings, 0 replies; 12+ messages in thread From: Shawn Landden @ 2019-06-09 15:51 UTC (permalink / raw) To: Eli Zaretskii; +Cc: gdb-patches On Sun, Jun 9, 2019 at 10:42 AM Eli Zaretskii <eliz@gnu.org> wrote: > > > From: Shawn Landden <shawn@git.icu> > > Cc: Shawn Landden <shawn@git.icu> > > Date: Sun, 9 Jun 2019 10:17:04 -0500 > > > > I was getting 8% and 6% cpu usage in tolower() and isspace(), > > respectively, waiting for a breakpoint on ppc64el. > > > > Also, gdb doesn't want non-deterministic behavior here. > > Instead of inventing our own wheel, how about using the Gnulib's > c-ctype module? The tolower() really should only apply to languages that need it, like Ada. It is kinda a bug to use it with C, which is case sensitive. Also we probably don't need the full isspace() either, and these are simple functions. I think it is much better to be able to see the full algorithm right in front of you. (even if I just did the base minimum to speed this up, without having to study it). -Shawn Landden ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden 2019-06-09 15:42 ` Eli Zaretskii @ 2019-06-10 21:14 ` John Baldwin 2019-06-10 21:30 ` [PATCH v2] " Shawn Landden 2 siblings, 0 replies; 12+ messages in thread From: John Baldwin @ 2019-06-10 21:14 UTC (permalink / raw) To: Shawn Landden, gdb-patches On 6/9/19 8:17 AM, Shawn Landden wrote: > I was getting 8% and 6% cpu usage in tolower() and isspace(), > respectively, waiting for a breakpoint on ppc64el. > > Also, gdb doesn't want non-deterministic behavior here. These routines are not always macros. For example, FreeBSD's ctype.h exposes these as functions for C++ and only as macros for pure C. Perhaps it would be better to define custom functions that don't use the POSIX names to avoid clashing with whatever libc provides if we want something with very specific behavior. -- John Baldwin ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden 2019-06-09 15:42 ` Eli Zaretskii 2019-06-10 21:14 ` John Baldwin @ 2019-06-10 21:30 ` Shawn Landden 2019-06-21 17:35 ` Pedro Alves 2019-06-21 19:20 ` [PATCH] " Shawn Landden 2 siblings, 2 replies; 12+ messages in thread From: Shawn Landden @ 2019-06-10 21:30 UTC (permalink / raw) To: gdb-patches; +Cc: jhb, eliz, Shawn Landden I was getting 8% and 6% cpu usage in tolower() and isspace(), respectively, waiting for a breakpoint on ppc64el. Also, gdb doesn't want non-deterministic behavior here. v2: do not clash with C99 names --- gdb/utils.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/gdb/utils.c b/gdb/utils.c index 9686927473..0b68fabe4d 100644 --- a/gdb/utils.c +++ b/gdb/utils.c @@ -2626,10 +2626,29 @@ strcmp_iw (const char *string1, const char *string2) user searches for "foo", then strcmp will sort "foo" before "foo$". Then lookup_partial_symbol will notice that strcmp_iw("foo$", "foo") is false, so it won't proceed to the actual match of "foo(int)" with "foo". */ +/* glibc versions of these have non-deterministic locale-dependant behavior, + and are very slow, taking 8% and 6% of total CPU time with some use-cases */ + +static inline int isspace_inline(int c) +{ + return c == ' ' || (unsigned)c-'\t' < 5; +} + +static inline int isupper_inline(int c) +{ + return (unsigned)c-'A' < 26; +} + +static inline int tolower_inline(int c) +{ + if (isupper(c)) return c | 32; + return c; +} + int strcmp_iw_ordered (const char *string1, const char *string2) { const char *saved_string1 = string1, *saved_string2 = string2; enum case_sensitivity case_pass = case_sensitive_off; @@ -2641,20 +2660,20 @@ strcmp_iw_ordered (const char *string1, const char *string2) strings. */ char c1 = 'X', c2 = 'X'; while (*string1 != '\0' && *string2 != '\0') { - while (isspace (*string1)) + while (isspace_inline (*string1)) string1++; - while (isspace (*string2)) + while (isspace_inline (*string2)) string2++; switch (case_pass) { case case_sensitive_off: - c1 = tolower ((unsigned char) *string1); - c2 = tolower ((unsigned char) *string2); + c1 = tolower_inline ((unsigned char) *string1); + c2 = tolower_inline ((unsigned char) *string2); break; case case_sensitive_on: c1 = *string1; c2 = *string2; break; -- 2.20.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-10 21:30 ` [PATCH v2] " Shawn Landden @ 2019-06-21 17:35 ` Pedro Alves 2019-06-21 17:59 ` Shawn Landden 2019-06-21 19:20 ` [PATCH] " Shawn Landden 1 sibling, 1 reply; 12+ messages in thread From: Pedro Alves @ 2019-06-21 17:35 UTC (permalink / raw) To: Shawn Landden, gdb-patches; +Cc: jhb, eliz Hi, On 6/10/19 10:30 PM, Shawn Landden wrote: > I was getting 8% and 6% cpu usage in tolower() and isspace(), > respectively, waiting for a breakpoint on ppc64el. > > Also, gdb doesn't want non-deterministic behavior here. > > v2: do not clash with C99 names When I was working on the C++ wildmatching support a couple years ago, I had some testcases that would stress name parsing that I was running under perf, and I also noticed these functions higher up on the profile. I wrote a few patches back then: https://github.com/palves/gdb/commits/palves/ada-decode-speedups And this one has the same idea as yours: https://github.com/palves/gdb/commit/f701531b79380356134d53db97adb6f467f9d784 So, I agree that this makes sense. I also agree that we don't want to depend on the current locale when parsing symbol names. In my version I was naming the functions gdb_xxx while in your version you're using xxx_inline. gdb_xxx naming is more common as a gdb version of some standard function, so I would prefer that. But, meanwhile, last year I merged this patch: <https://sourceware.org/ml/gdb-patches/2018-05/msg00561.html> which touches a somewhat subject. That patch uses the existing libiberty uppercase TOLOWER, ISXDIGIT, etc. macros, which are inline and locale independent by design. See include/safe-ctype.h. Can we use those instead of adding new functions? I don't recall if I benchmarked ISSPACE vs the gdb_isspace in that optimization patch on my github, but I think I just didn't remember ISSPACE back then. Least but not least, the patch as is is not following the GNU/GDB coding format conventions. Take a look here: https://sourceware.org/gdb/wiki/ContributionChecklist Thanks, Pedro Alves ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-21 17:35 ` Pedro Alves @ 2019-06-21 17:59 ` Shawn Landden 2019-06-21 18:08 ` Pedro Alves 0 siblings, 1 reply; 12+ messages in thread From: Shawn Landden @ 2019-06-21 17:59 UTC (permalink / raw) To: Pedro Alves; +Cc: gdb-patches, jhb, eliz El vie., 21 de jun. de 2019 12:35, Pedro Alves <palves@redhat.com> escribió: > Hi, > > On 6/10/19 10:30 PM, Shawn Landden wrote: > > I was getting 8% and 6% cpu usage in tolower() and isspace(), > > respectively, waiting for a breakpoint on ppc64el. > > > > Also, gdb doesn't want non-deterministic behavior here. > > > > v2: do not clash with C99 names > > When I was working on the C++ wildmatching support a > couple years ago, I had some testcases that would stress name > parsing that I was running under perf, and I also noticed these functions > higher up on the profile. I wrote a few patches back then: > > https://github.com/palves/gdb/commits/palves/ada-decode-speedups > > And this one has the same idea as yours: > > > https://github.com/palves/gdb/commit/f701531b79380356134d53db97adb6f467f9d784 > > So, I agree that this makes sense. > I don't care how it gets fixed, and the GNU coding standard (which I write to for glibc) will take more time than writing this patch. (Or your well-documented response) Also, while I have a copyright assignment for glibc, mine for GCC and binutils-gdb is only pending. Go ahead and fix this, and give me credit. Cheers. > > I also agree that we don't want to depend on the current > locale when parsing symbol names. > > In my version I was naming the functions gdb_xxx while in > your version you're using xxx_inline. gdb_xxx naming is > more common as a gdb version of some standard function, > so I would prefer that. > > But, meanwhile, last year I merged this patch: > <https://sourceware.org/ml/gdb-patches/2018-05/msg00561.html> > which touches a somewhat subject. > > That patch uses the existing libiberty uppercase TOLOWER, ISXDIGIT, > etc. macros, which are inline and locale independent by design. > See include/safe-ctype.h. Can we use those instead of adding new > functions? I don't recall if I benchmarked ISSPACE vs the gdb_isspace > in that optimization patch on my github, but I think I just didn't > remember ISSPACE back then. > > Least but not least, the patch as is is not following the > GNU/GDB coding format conventions. Take a look here: > > https://sourceware.org/gdb/wiki/ContributionChecklist > > Thanks, > Pedro Alves > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-21 17:59 ` Shawn Landden @ 2019-06-21 18:08 ` Pedro Alves 0 siblings, 0 replies; 12+ messages in thread From: Pedro Alves @ 2019-06-21 18:08 UTC (permalink / raw) To: Shawn Landden; +Cc: gdb-patches, jhb, eliz On 6/21/19 6:59 PM, Shawn Landden wrote: >> On 6/10/19 10:30 PM, Shawn Landden wrote: >>> I was getting 8% and 6% cpu usage in tolower() and isspace(), >>> respectively, waiting for a breakpoint on ppc64el. >>> >>> Also, gdb doesn't want non-deterministic behavior here. >>> >>> v2: do not clash with C99 names >> >> When I was working on the C++ wildmatching support a >> couple years ago, I had some testcases that would stress name >> parsing that I was running under perf, and I also noticed these functions >> higher up on the profile. I wrote a few patches back then: >> >> https://github.com/palves/gdb/commits/palves/ada-decode-speedups >> >> And this one has the same idea as yours: >> >> >> https://github.com/palves/gdb/commit/f701531b79380356134d53db97adb6f467f9d784 >> >> So, I agree that this makes sense. >> > I don't care how it gets fixed, and the GNU coding standard (which I write > to for glibc) will take more time than writing this patch. (Or your > well-documented response) Also, while I have a copyright assignment for > glibc, mine for GCC and binutils-gdb is only pending. > > Go ahead and fix this, and give me credit. The interesting thing to do here is >> That patch uses the existing libiberty uppercase TOLOWER, ISXDIGIT, >> etc. macros, which are inline and locale independent by design. >> See include/safe-ctype.h. Can we use those instead of adding new >> functions? I don't recall if I benchmarked ISSPACE vs the gdb_isspace >> in that optimization patch on my github, but I think I just didn't >> remember ISSPACE back then. How were you benchmarking this? Thanks, Pedro Alves ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-10 21:30 ` [PATCH v2] " Shawn Landden 2019-06-21 17:35 ` Pedro Alves @ 2019-06-21 19:20 ` Shawn Landden 2019-06-21 20:12 ` Pedro Alves 1 sibling, 1 reply; 12+ messages in thread From: Shawn Landden @ 2019-06-21 19:20 UTC (permalink / raw) To: gdb-patches; +Cc: Pedro Alves, jhb, eliz, Shawn Landden I was getting 8% and 6% cpu usage in tolower() and isspace(), respectively, waiting for a breakpoint on ppc64el. Also, gdb doesn't want non-deterministic behavior here. v2: do not touch C namespace v3: Turns out there are two places using these in performance-critical parts. Follow GNU coding standards. Signed-off-by: Shawn Landden <shawn@git.icu> Co-Author: Pedro Alves <palves@redhat.com> Mailing-list: https://sourceware.org/ml/gdb-patches/2019-06/threads.html#00187 --- gdb/common/common-utils.c | 6 +++--- gdb/common/common-utils.h | 22 ++++++++++++++++++++++ gdb/utils.c | 8 ++++---- 3 files changed, 29 insertions(+), 7 deletions(-) diff --git a/gdb/common/common-utils.c b/gdb/common/common-utils.c index ae2dd9db2b..8215f505e8 100644 --- a/gdb/common/common-utils.c +++ b/gdb/common/common-utils.c @@ -338,7 +338,7 @@ skip_spaces (char *chp) { if (chp == NULL) return NULL; - while (*chp && isspace (*chp)) + while (*chp && gdb_isspace (*chp)) chp++; return chp; } @@ -350,7 +350,7 @@ skip_spaces (const char *chp) { if (chp == NULL) return NULL; - while (*chp && isspace (*chp)) + while (*chp && gdb_isspace (*chp)) chp++; return chp; } @@ -362,7 +362,7 @@ skip_to_space (const char *chp) { if (chp == NULL) return NULL; - while (*chp && !isspace (*chp)) + while (*chp && !gdb_isspace (*chp)) chp++; return chp; } diff --git a/gdb/common/common-utils.h b/gdb/common/common-utils.h index 2320318de7..5e6e0afb9c 100644 --- a/gdb/common/common-utils.h +++ b/gdb/common/common-utils.h @@ -146,4 +146,26 @@ in_inclusive_range (T value, T low, T high) return value >= low && value <= high; } +static inline +int +gdb_isspace (int c) +{ + return c == ' ' || (unsigned)c - '\t' < 5; +} + +static inline +int +gdb_isupper (int c) +{ + return (unsigned)c - 'A' < 26; +} + +static inline +int +gdb_tolower (int c) +{ + if (isupper(c)) return c | 32; + return c; +} + #endif diff --git a/gdb/utils.c b/gdb/utils.c index c531748fe4..f92e8f5346 100644 --- a/gdb/utils.c +++ b/gdb/utils.c @@ -2572,16 +2572,16 @@ strcmp_iw_ordered (const char *string1, const char *string2) while (*string1 != '\0' && *string2 != '\0') { - while (isspace (*string1)) + while (gdb_isspace (*string1)) string1++; - while (isspace (*string2)) + while (gdb_isspace (*string2)) string2++; switch (case_pass) { case case_sensitive_off: - c1 = tolower ((unsigned char) *string1); - c2 = tolower ((unsigned char) *string2); + c1 = gdb_tolower ((unsigned char) *string1); + c2 = gdb_tolower ((unsigned char) *string2); break; case case_sensitive_on: c1 = *string1; -- 2.20.1 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-21 19:20 ` [PATCH] " Shawn Landden @ 2019-06-21 20:12 ` Pedro Alves 2019-06-21 20:30 ` Shawn Landden 0 siblings, 1 reply; 12+ messages in thread From: Pedro Alves @ 2019-06-21 20:12 UTC (permalink / raw) To: Shawn Landden, gdb-patches; +Cc: jhb, eliz On 6/21/19 8:19 PM, Shawn Landden wrote: > I was getting 8% and 6% cpu usage in tolower() and isspace(), > respectively, waiting for a breakpoint on ppc64el. > > Also, gdb doesn't want non-deterministic behavior here. > > v2: do not touch C namespace > v3: Turns out there are two places using these in performance-critical > parts. > Follow GNU coding standards. This doesn't address my comments about <https://sourceware.org/ml/gdb-patches/2019-06/msg00442.html>. Something like this. How does this compare? Any significant difference in your benchmark? And again, how are you benchmarking this? From 1bd3203a26f989ba146376842f5b6a78bdfae181 Mon Sep 17 00:00:00 2001 From: Pedro Alves <palves@redhat.com> Date: Fri, 21 Jun 2019 19:12:30 +0100 Subject: [PATCH] Use ISSPACE etc --- gdb/common/gdb-safe-ctype.h | 46 ++++++++++++++++++++++++++++++++++++++++++++ gdb/utils.c | 47 +++++++++++++++++++++++---------------------- 2 files changed, 70 insertions(+), 23 deletions(-) create mode 100644 gdb/common/gdb-safe-ctype.h diff --git a/gdb/common/gdb-safe-ctype.h b/gdb/common/gdb-safe-ctype.h new file mode 100644 index 00000000000..01a97b7ce39 --- /dev/null +++ b/gdb/common/gdb-safe-ctype.h @@ -0,0 +1,46 @@ +/* Wrapper around libiberty's safe-ctype.h for GDB, the GNU debugger. + + Copyright (C) 2019 Free Software Foundation, Inc. + + This file is part of GDB. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#ifndef GDB_SAFE_CTYPE_H +#define GDB_SAFE_CTYPE_H + +/* After safe-ctype.h is included, we can no longer use the host's + ctype routines. Trying to do so results in compile errors. Code + that uses safe-ctype.h that wants to refer to the locale-dependent + ctype functions must call these wrapper versions instead. */ + +static inline int +gdb_isprint (int ch) +{ + return isprint (ch); +} + +/* readline.h defines these symbols too, but we want libiberty's + versions. */ +#undef ISALPHA +#undef ISALNUM +#undef ISDIGIT +#undef ISLOWER +#undef ISPRINT +#undef ISUPPER +#undef ISXDIGIT + +#include "safe-ctype.h" + +#endif diff --git a/gdb/utils.c b/gdb/utils.c index c7922cf7f56..367ea277c7d 100644 --- a/gdb/utils.c +++ b/gdb/utils.c @@ -73,6 +73,7 @@ #include "common/pathstuff.h" #include "cli/cli-style.h" #include "common/scope-exit.h" +#include "common/gdb-safe-ctype.h" void (*deprecated_error_begin_hook) (void); @@ -1057,7 +1058,7 @@ parse_escape (struct gdbarch *gdbarch, const char **string_ptr) while (++count < 3) { c = (**string_ptr); - if (isdigit (c) && c != '8' && c != '9') + if (ISDIGIT (c) && c != '8' && c != '9') { (*string_ptr)++; i *= 8; @@ -1985,7 +1986,7 @@ puts_debug (char *prefix, char *string, char *suffix) switch (ch) { default: - if (isprint (ch)) + if (gdb_isprint (ch)) fputc_unfiltered (ch, gdb_stdlog); else @@ -2268,7 +2269,7 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name, static bool valid_identifier_name_char (int ch) { - return (isalnum (ch) || ch == '_'); + return (ISALNUM (ch) || ch == '_'); } /* Skip to end of token, or to END, whatever comes first. Input is @@ -2278,7 +2279,7 @@ static const char * cp_skip_operator_token (const char *token, const char *end) { const char *p = token; - while (p != end && !isspace (*p) && *p != '(') + while (p != end && !ISSPACE (*p) && *p != '(') { if (valid_identifier_name_char (*p)) { @@ -2332,9 +2333,9 @@ cp_skip_operator_token (const char *token, const char *end) static void skip_ws (const char *&string1, const char *&string2, const char *end_str2) { - while (isspace (*string1)) + while (ISSPACE (*string1)) string1++; - while (string2 < end_str2 && isspace (*string2)) + while (string2 < end_str2 && ISSPACE (*string2)) string2++; } @@ -2396,8 +2397,8 @@ strncmp_iw_with_mode (const char *string1, const char *string2, while (1) { if (skip_spaces - || ((isspace (*string1) && !valid_identifier_name_char (*string2)) - || (isspace (*string2) && !valid_identifier_name_char (*string1)))) + || ((ISSPACE (*string1) && !valid_identifier_name_char (*string2)) + || (ISSPACE (*string2) && !valid_identifier_name_char (*string1)))) { skip_ws (string1, string2, end_str2); skip_spaces = false; @@ -2430,7 +2431,7 @@ strncmp_iw_with_mode (const char *string1, const char *string2, if (match_for_lcd != NULL && abi_start != string1) match_for_lcd->mark_ignored_range (abi_start, string1); - while (isspace (*string1)) + while (ISSPACE (*string1)) string1++; } @@ -2455,9 +2456,9 @@ strncmp_iw_with_mode (const char *string1, const char *string2, string1++; string2++; - while (isspace (*string1)) + while (ISSPACE (*string1)) string1++; - while (string2 < end_str2 && isspace (*string2)) + while (string2 < end_str2 && ISSPACE (*string2)) string2++; continue; } @@ -2551,14 +2552,14 @@ strncmp_iw_with_mode (const char *string1, const char *string2, if (case_sensitivity == case_sensitive_on && *string1 != *string2) break; if (case_sensitivity == case_sensitive_off - && (tolower ((unsigned char) *string1) - != tolower ((unsigned char) *string2))) + && (TOLOWER ((unsigned char) *string1) + != TOLOWER ((unsigned char) *string2))) break; /* If we see any non-whitespace, non-identifier-name character (any of "()<>*&" etc.), then skip spaces the next time around. */ - if (!isspace (*string1) && !valid_identifier_name_char (*string1)) + if (!ISSPACE (*string1) && !valid_identifier_name_char (*string1)) skip_spaces = true; string1++; @@ -2679,16 +2680,16 @@ strcmp_iw_ordered (const char *string1, const char *string2) while (*string1 != '\0' && *string2 != '\0') { - while (isspace (*string1)) + while (ISSPACE (*string1)) string1++; - while (isspace (*string2)) + while (ISSPACE (*string2)) string2++; switch (case_pass) { case case_sensitive_off: - c1 = tolower ((unsigned char) *string1); - c2 = tolower ((unsigned char) *string2); + c1 = TOLOWER ((unsigned char) *string1); + c2 = TOLOWER ((unsigned char) *string2); break; case case_sensitive_on: c1 = *string1; @@ -2927,17 +2928,17 @@ string_to_core_addr (const char *my_string) { CORE_ADDR addr = 0; - if (my_string[0] == '0' && tolower (my_string[1]) == 'x') + if (my_string[0] == '0' && TOLOWER (my_string[1]) == 'x') { /* Assume that it is in hex. */ int i; for (i = 2; my_string[i] != '\0'; i++) { - if (isdigit (my_string[i])) + if (ISDIGIT (my_string[i])) addr = (my_string[i] - '0') + (addr * 16); - else if (isxdigit (my_string[i])) - addr = (tolower (my_string[i]) - 'a' + 0xa) + (addr * 16); + else if (ISXDIGIT (my_string[i])) + addr = (TOLOWER (my_string[i]) - 'a' + 0xa) + (addr * 16); else error (_("invalid hex \"%s\""), my_string); } @@ -2949,7 +2950,7 @@ string_to_core_addr (const char *my_string) for (i = 0; my_string[i] != '\0'; i++) { - if (isdigit (my_string[i])) + if (ISDIGIT (my_string[i])) addr = (my_string[i] - '0') + (addr * 10); else error (_("invalid decimal \"%s\""), my_string); -- 2.14.5 ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-21 20:12 ` Pedro Alves @ 2019-06-21 20:30 ` Shawn Landden 2019-06-21 21:14 ` Pedro Alves 0 siblings, 1 reply; 12+ messages in thread From: Shawn Landden @ 2019-06-21 20:30 UTC (permalink / raw) To: Pedro Alves; +Cc: gdb-patches, jhb, Eli Zaretskii On Fri, Jun 21, 2019 at 3:12 PM Pedro Alves <palves@redhat.com> wrote: > > On 6/21/19 8:19 PM, Shawn Landden wrote: > > I was getting 8% and 6% cpu usage in tolower() and isspace(), > > respectively, waiting for a breakpoint on ppc64el. > > > > Also, gdb doesn't want non-deterministic behavior here. > > > > v2: do not touch C namespace > > v3: Turns out there are two places using these in performance-critical > > parts. > > Follow GNU coding standards. > > This doesn't address my comments about > <https://sourceware.org/ml/gdb-patches/2019-06/msg00442.html>. > > Something like this. How does this compare? Any significant > difference in your benchmark? And again, how are you benchmarking > this? perf record. goes from like 6-8% to almost nothing (if you remove the static). > > From 1bd3203a26f989ba146376842f5b6a78bdfae181 Mon Sep 17 00:00:00 2001 > From: Pedro Alves <palves@redhat.com> > Date: Fri, 21 Jun 2019 19:12:30 +0100 > Subject: [PATCH] Use ISSPACE etc > > --- > gdb/common/gdb-safe-ctype.h | 46 ++++++++++++++++++++++++++++++++++++++++++++ > gdb/utils.c | 47 +++++++++++++++++++++++---------------------- > 2 files changed, 70 insertions(+), 23 deletions(-) > create mode 100644 gdb/common/gdb-safe-ctype.h > > diff --git a/gdb/common/gdb-safe-ctype.h b/gdb/common/gdb-safe-ctype.h > new file mode 100644 > index 00000000000..01a97b7ce39 > --- /dev/null > +++ b/gdb/common/gdb-safe-ctype.h > @@ -0,0 +1,46 @@ > +/* Wrapper around libiberty's safe-ctype.h for GDB, the GNU debugger. > + > + Copyright (C) 2019 Free Software Foundation, Inc. > + > + This file is part of GDB. > + > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see <http://www.gnu.org/licenses/>. */ > + > +#ifndef GDB_SAFE_CTYPE_H > +#define GDB_SAFE_CTYPE_H > + > +/* After safe-ctype.h is included, we can no longer use the host's > + ctype routines. Trying to do so results in compile errors. Code > + that uses safe-ctype.h that wants to refer to the locale-dependent > + ctype functions must call these wrapper versions instead. */ > + > +static inline int > +gdb_isprint (int ch) > +{ > + return isprint (ch); > +} > + > +/* readline.h defines these symbols too, but we want libiberty's > + versions. */ > +#undef ISALPHA > +#undef ISALNUM > +#undef ISDIGIT > +#undef ISLOWER > +#undef ISPRINT > +#undef ISUPPER > +#undef ISXDIGIT > + > +#include "safe-ctype.h" > + > +#endif > diff --git a/gdb/utils.c b/gdb/utils.c > index c7922cf7f56..367ea277c7d 100644 > --- a/gdb/utils.c > +++ b/gdb/utils.c > @@ -73,6 +73,7 @@ > #include "common/pathstuff.h" > #include "cli/cli-style.h" > #include "common/scope-exit.h" > +#include "common/gdb-safe-ctype.h" > > void (*deprecated_error_begin_hook) (void); > > @@ -1057,7 +1058,7 @@ parse_escape (struct gdbarch *gdbarch, const char **string_ptr) > while (++count < 3) > { > c = (**string_ptr); > - if (isdigit (c) && c != '8' && c != '9') > + if (ISDIGIT (c) && c != '8' && c != '9') > { > (*string_ptr)++; > i *= 8; > @@ -1985,7 +1986,7 @@ puts_debug (char *prefix, char *string, char *suffix) > switch (ch) > { > default: > - if (isprint (ch)) > + if (gdb_isprint (ch)) > fputc_unfiltered (ch, gdb_stdlog); > > else > @@ -2268,7 +2269,7 @@ fprintf_symbol_filtered (struct ui_file *stream, const char *name, > static bool > valid_identifier_name_char (int ch) > { > - return (isalnum (ch) || ch == '_'); > + return (ISALNUM (ch) || ch == '_'); > } > > /* Skip to end of token, or to END, whatever comes first. Input is > @@ -2278,7 +2279,7 @@ static const char * > cp_skip_operator_token (const char *token, const char *end) > { > const char *p = token; > - while (p != end && !isspace (*p) && *p != '(') > + while (p != end && !ISSPACE (*p) && *p != '(') > { > if (valid_identifier_name_char (*p)) > { > @@ -2332,9 +2333,9 @@ cp_skip_operator_token (const char *token, const char *end) > static void > skip_ws (const char *&string1, const char *&string2, const char *end_str2) > { > - while (isspace (*string1)) > + while (ISSPACE (*string1)) > string1++; > - while (string2 < end_str2 && isspace (*string2)) > + while (string2 < end_str2 && ISSPACE (*string2)) > string2++; > } > > @@ -2396,8 +2397,8 @@ strncmp_iw_with_mode (const char *string1, const char *string2, > while (1) > { > if (skip_spaces > - || ((isspace (*string1) && !valid_identifier_name_char (*string2)) > - || (isspace (*string2) && !valid_identifier_name_char (*string1)))) > + || ((ISSPACE (*string1) && !valid_identifier_name_char (*string2)) > + || (ISSPACE (*string2) && !valid_identifier_name_char (*string1)))) > { > skip_ws (string1, string2, end_str2); > skip_spaces = false; > @@ -2430,7 +2431,7 @@ strncmp_iw_with_mode (const char *string1, const char *string2, > if (match_for_lcd != NULL && abi_start != string1) > match_for_lcd->mark_ignored_range (abi_start, string1); > > - while (isspace (*string1)) > + while (ISSPACE (*string1)) > string1++; > } > > @@ -2455,9 +2456,9 @@ strncmp_iw_with_mode (const char *string1, const char *string2, > string1++; > string2++; > > - while (isspace (*string1)) > + while (ISSPACE (*string1)) > string1++; > - while (string2 < end_str2 && isspace (*string2)) > + while (string2 < end_str2 && ISSPACE (*string2)) > string2++; > continue; > } > @@ -2551,14 +2552,14 @@ strncmp_iw_with_mode (const char *string1, const char *string2, > if (case_sensitivity == case_sensitive_on && *string1 != *string2) > break; > if (case_sensitivity == case_sensitive_off > - && (tolower ((unsigned char) *string1) > - != tolower ((unsigned char) *string2))) > + && (TOLOWER ((unsigned char) *string1) > + != TOLOWER ((unsigned char) *string2))) > break; > > /* If we see any non-whitespace, non-identifier-name character > (any of "()<>*&" etc.), then skip spaces the next time > around. */ > - if (!isspace (*string1) && !valid_identifier_name_char (*string1)) > + if (!ISSPACE (*string1) && !valid_identifier_name_char (*string1)) > skip_spaces = true; > > string1++; > @@ -2679,16 +2680,16 @@ strcmp_iw_ordered (const char *string1, const char *string2) > > while (*string1 != '\0' && *string2 != '\0') > { > - while (isspace (*string1)) > + while (ISSPACE (*string1)) > string1++; > - while (isspace (*string2)) > + while (ISSPACE (*string2)) > string2++; > > switch (case_pass) > { > case case_sensitive_off: > - c1 = tolower ((unsigned char) *string1); > - c2 = tolower ((unsigned char) *string2); > + c1 = TOLOWER ((unsigned char) *string1); > + c2 = TOLOWER ((unsigned char) *string2); > break; > case case_sensitive_on: > c1 = *string1; > @@ -2927,17 +2928,17 @@ string_to_core_addr (const char *my_string) > { > CORE_ADDR addr = 0; > > - if (my_string[0] == '0' && tolower (my_string[1]) == 'x') > + if (my_string[0] == '0' && TOLOWER (my_string[1]) == 'x') > { > /* Assume that it is in hex. */ > int i; > > for (i = 2; my_string[i] != '\0'; i++) > { > - if (isdigit (my_string[i])) > + if (ISDIGIT (my_string[i])) > addr = (my_string[i] - '0') + (addr * 16); > - else if (isxdigit (my_string[i])) > - addr = (tolower (my_string[i]) - 'a' + 0xa) + (addr * 16); > + else if (ISXDIGIT (my_string[i])) > + addr = (TOLOWER (my_string[i]) - 'a' + 0xa) + (addr * 16); > else > error (_("invalid hex \"%s\""), my_string); > } > @@ -2949,7 +2950,7 @@ string_to_core_addr (const char *my_string) > > for (i = 0; my_string[i] != '\0'; i++) > { > - if (isdigit (my_string[i])) > + if (ISDIGIT (my_string[i])) > addr = (my_string[i] - '0') + (addr * 10); > else > error (_("invalid decimal \"%s\""), my_string); > -- > 2.14.5 > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() 2019-06-21 20:30 ` Shawn Landden @ 2019-06-21 21:14 ` Pedro Alves 0 siblings, 0 replies; 12+ messages in thread From: Pedro Alves @ 2019-06-21 21:14 UTC (permalink / raw) To: Shawn Landden; +Cc: gdb-patches, jhb, Eli Zaretskii On 6/21/19 9:30 PM, Shawn Landden wrote: > On Fri, Jun 21, 2019 at 3:12 PM Pedro Alves <palves@redhat.com> wrote: >> On 6/21/19 8:19 PM, Shawn Landden wrote: >>> I was getting 8% and 6% cpu usage in tolower() and isspace(), >>> respectively, waiting for a breakpoint on ppc64el. >>> >>> Also, gdb doesn't want non-deterministic behavior here. >>> >>> v2: do not touch C namespace >>> v3: Turns out there are two places using these in performance-critical >>> parts. >>> Follow GNU coding standards. >> This doesn't address my comments about >> <https://sourceware.org/ml/gdb-patches/2019-06/msg00442.html>. >> >> Something like this. How does this compare? Any significant >> difference in your benchmark? And again, how are you benchmarking >> this? > perf record. What was the use case? Did you have some speficic set of commands? Some script? Please be detailed/more specific, so that others can reproduce it, either now, or in the future when we look back into this thread. > goes from like 6-8% to almost nothing (if you remove the static). What do you mean by "remove the static"? Thanks, Pedro Alves ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2019-06-21 21:14 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-09 15:17 [PATCH] Fix slow and non-deterministic behavior of isspace() and tolower() Shawn Landden 2019-06-09 15:42 ` Eli Zaretskii 2019-06-09 15:51 ` Shawn Landden 2019-06-10 21:14 ` John Baldwin 2019-06-10 21:30 ` [PATCH v2] " Shawn Landden 2019-06-21 17:35 ` Pedro Alves 2019-06-21 17:59 ` Shawn Landden 2019-06-21 18:08 ` Pedro Alves 2019-06-21 19:20 ` [PATCH] " Shawn Landden 2019-06-21 20:12 ` Pedro Alves 2019-06-21 20:30 ` Shawn Landden 2019-06-21 21:14 ` Pedro Alves
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox