From: "Zack Weinberg" <zackw@stanford.edu>
To: Eli Zaretskii <eliz@is.elta.co.il>
Cc: dj@redhat.com, gcc@gcc.gnu.org, gdb@sources.redhat.com,
binutils@sources.redhat.com, cygwin@sources.redhat.com
Subject: Re: Another RFC: regex in libiberty
Date: Fri, 08 Jun 2001 09:59:00 -0000
Message-ID: <20010608095932.S979@stanford.edu>
In-Reply-To: <9003-Fri08Jun2001100651+0300-eliz@is.elta.co.il>
On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
>
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking. It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings. For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.
I think the null characters are a red herring. Last year I looked into
GNU regex's performance in the context of GCC's fixincludes program.
On a platform that has mostly-okay headers, fixincludes spends most of
its time matching regular expressions.
The regex.c that came with GDB 4.18, which I think is the one that got
spread around widely, had a bug in its implementation of the POSIX
regcomp/regexec interface, which caused a major performance hit. That
bug has been fixed in GNU libc for a long time. When I replaced
fixincludes' copy of regex.c with a more recent version from glibc,
fixincludes was sped up by a factor of nine. That same bug affects
Sed 3.02: replace its bundled regex.c with the one from glibc 2.2.x
and I bet you'll see better performance.
There's some discussion in these messages:
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html
The relevant fix is in there, too, if you want to pull it out and
apply it.
I did some benchmarking of fixincludes with Spencer's regexp library
as well. IIRC, it was about the same as the fixed GNU regex.c.
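For anyone who wants to compare regex.c versions themselves, here is a
minimal sketch of a timing harness for the POSIX regcomp/regexec
interface. It is not the benchmark used for fixincludes: the helper
name bench_regexec, its parameters, and the choice of clock() for
timing are all assumptions made for illustration.

```c
#include <regex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not taken from fixincludes or Sed.
   Compile PATTERN once with the POSIX interface, then run regexec
   over all N lines, ROUNDS times.  Returns the number of lines that
   matched in the last pass (0 if ROUNDS is 0), or -1 if PATTERN does
   not compile.  If SECONDS is non-null, the CPU time spent in the
   matching loop is stored there.  */
static long
bench_regexec (const char *pattern, const char *const *lines, size_t n,
               int rounds, double *seconds)
{
  regex_t re;
  long hits = 0;
  clock_t start, stop;
  size_t i;
  int r;

  /* REG_NOSUB: we only ask whether a line matches, not where.  */
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  start = clock ();
  for (r = 0; r < rounds; r++)
    {
      hits = 0;
      for (i = 0; i < n; i++)
        if (regexec (&re, lines[i], 0, NULL, 0) == 0)
          hits++;
    }
  stop = clock ();

  regfree (&re);
  if (seconds)
    *seconds = (double) (stop - start) / CLOCKS_PER_SEC;
  return hits;
}
```

To compare two implementations, one would build this once against each
regex.c, call bench_regexec with the same pattern, the same set of
lines, and a large ROUNDS from a small main, and compare the reported
times. The pattern is compiled outside the timed loop, so the figure
reflects matching cost only.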
--
zw This is, no doubt, the rational strategy; quite possibly the
only one that will work. But it ignores the exigencies of
the tenure system and is therefore impractical.
-- Jerry Fodor, _The Mind Doesn't Work That Way_