Mirror of the gdb mailing list
* go32-nat.c compilation problem
       [not found]                   ` <381D94AD.B37EC167@grafzahl.de>
@ 1999-11-08  8:54                     ` Pierre Muller
  0 siblings, 0 replies; 19+ messages in thread
From: Pierre Muller @ 1999-11-08  8:54 UTC
  To: gdb

  Trying to compile go32-nat.c, I ran into the problem that
that file uses the fatal() function, which is no longer defined in the gdb directory!

  Replacing it with error() allowed me to compile the code, but
what should the replacement for fatal() be?


Pierre Muller
Institut Charles Sadron
6,rue Boussingault
F 67083 STRASBOURG CEDEX (France)
mailto:muller@ics.u-strasbg.fr
Phone : (33)-3-88-41-40-07  Fax : (33)-3-88-41-40-99
From ezannoni@cygnus.com Mon Nov 08 09:02:00 1999
From: Elena Zannoni <ezannoni@cygnus.com>
To: Pierre Muller <muller@cerbere.u-strasbg.fr>
Cc: gdb@sourceware.cygnus.com
Subject: go32-nat.c compilation problem
Date: Mon, 08 Nov 1999 09:02:00 -0000
Message-id: <14375.536.118347.328812@kwikemart.cygnus.com>
References: <Daniel> <Vogel's> <message> <of> <Mon,> <01> <Nov> <1999> <14:25:01> <+0100> <381D94AD.B37EC167@grafzahl.de> <199911081709.SAA23904@cerbere.u-strasbg.fr>
X-SW-Source: 1999-q4/msg00222.html
Content-length: 566

Pierre Muller writes:
 > 
 >   Trying to compile go32-nat.c I ran into the problem that
 > that file uses fatal function which is not defined anymore in gdb directory !
 > 
 >   Replacing it by error allowed me to compile the code but 
 > what should be the replacement of fatal function ??
 > 
 > 
 > Pierre Muller
 > Institut Charles Sadron
 > 6,rue Boussingault
 > F 67083 STRASBOURG CEDEX (France)
 > mailto:muller@ics.u-strasbg.fr
 > Phone : (33)-3-88-41-40-07  Fax : (33)-3-88-41-40-99

I believe you need to change fatal() to internal_error().

Elena Zannoni
From eliz@gnu.org Mon Nov 08 09:43:00 1999
From: Eli Zaretskii <eliz@gnu.org>
To: Elena Zannoni <ezannoni@cygnus.com>
Cc: Pierre Muller <muller@cerbere.u-strasbg.fr>, Stan Shebs <shebs@cygnus.com>, gdb@sourceware.cygnus.com
Subject: Re: go32-nat.c compilation problem
Date: Mon, 08 Nov 1999 09:43:00 -0000
Message-id: <199911081742.MAA20623@mescaline.gnu.org>
References: <199911081709.SAA23904@cerbere.u-strasbg.fr> <14375.536.118347.328812@kwikemart.cygnus.com>
X-SW-Source: 1999-q4/msg00223.html
Content-length: 385

> I believe you need to change fatal() to internal_error().

WIBNI, when a function is renamed or removed from the sources,
somebody would grep all the sources and change all the callers, or at
least alert the various platform maintainers that a function used by
their code is going to become extinct?

(If such a message *was* indeed posted in this case, I apologize for
not seeing it.)
From ezannoni@cygnus.com Mon Nov 08 10:17:00 1999
From: Elena Zannoni <ezannoni@cygnus.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Elena Zannoni <ezannoni@cygnus.com>, Pierre Muller <muller@cerbere.u-strasbg.fr>, Stan Shebs <shebs@cygnus.com>, gdb@sourceware.cygnus.com
Subject: Re: go32-nat.c compilation problem
Date: Mon, 08 Nov 1999 10:17:00 -0000
Message-id: <14375.5038.377535.816858@kwikemart.cygnus.com>
References: <199911081709.SAA23904@cerbere.u-strasbg.fr> <14375.536.118347.328812@kwikemart.cygnus.com> <199911081742.MAA20623@mescaline.gnu.org>
X-SW-Source: 1999-q4/msg00224.html
Content-length: 779

Eli Zaretskii writes:
 > 
 > > I believe you need to change fatal() to internal_error().
 > 
 > WIBNI, when a function is renamed or removed from the sources,
 > somebody would grep all the sources and change all the callers, or
 > least alert the various platform maintainers that a function used by
 > their code is going to become extinct?
 > 
 > (If such a message *was* indeed posted in this case, I apologize for
 > not seeing it.)
 > 

Eli, I see what happened.
Fatal() was deleted, and then changes to go32-nat.c were made that
reintroduced calls to fatal(). I believe the changes were part of a patch
you submitted, *before* the function fatal was replaced by internal_error().

This is a consequence of not having applied the patch sooner. Our fault.
Apologies.

Elena
From jimb@cygnus.com Mon Nov 08 16:18:00 1999
From: Jim Blandy <jimb@cygnus.com>
To: Mark Kettenis <kettenis@wins.uva.nl>, Eli Zaretskii <eliz@gnu.org>, Chris Faylor <cgf@cygnus.com>, "J. T. Conklin" <jtc@redbacknetworks.com>, "J. Kean Johnston" <jkj@sco.com>, "H. J. Lu" <hjl@valinux.com>
Cc: gdb@sourceware.cygnus.com
Subject: i386: Are we settled?
Date: Mon, 08 Nov 1999 16:18:00 -0000
Message-id: <199911090018.TAA12933@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00225.html
Content-length: 1468

Are we settled on the essential contents of tm-i386.h?  Can we start
removing the little [regs] and [fpregs] boxes from
http://sourceware.cygnus.com/gdb/papers/linux/i386-includes.png ?

Essentially, I see two outstanding questions remaining:

- How should i386 targets handle the x86 FPU's 80-bit float type?  How
  can we make sure that hosts capable of handling it properly don't
  perform lossy conversions?

- What format should the output from "info float" take?  (Actually, it
  sounds like this is pretty much resolved.)

Notably missing from this list are any other questions about tm-i386.h
as it stands.  Am I correct in thinking that the other x86 port
maintainers think it's basically sane?

If so, I encourage folks to start deleting stuff from their more
specialized tm-*.h files, and using the definitions in tm-i386.h.
It's been done for Linux and the HURD, so it's had some testing, but
if the definitions there don't suit you for reasons other than some odd
platform-specific quirk, then we don't yet have a consensus, and I
would like to continue talking about what you do want in tm-i386.h.

(Again, I'm excluding issues related to `long double'; I do expect
folks to retain their own definitions for coping with that.)

I'd specifically like responses from the people addressed directly in
the "To:" header --- those are the people who look to me most likely
to be doing the work for a specific platform, and/or who have
participated in the discussion.
From cgf@cygnus.com Mon Nov 08 16:21:00 1999
From: Chris Faylor <cgf@cygnus.com>
To: Jim Blandy <jimb@cygnus.com>
Cc: Mark Kettenis <kettenis@wins.uva.nl>, Eli Zaretskii <eliz@gnu.org>, "J. T. Conklin" <jtc@redbacknetworks.com>, "J. Kean Johnston" <jkj@sco.com>, "H. J. Lu" <hjl@valinux.com>, gdb@sourceware.cygnus.com
Subject: Re: i386: Are we settled?
Date: Mon, 08 Nov 1999 16:21:00 -0000
Message-id: <19991108192442.B2703@cygnus.com>
References: <199911090018.TAA12933@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00226.html
Content-length: 504

On Mon, Nov 08, 1999 at 07:18:11PM -0500, Jim Blandy wrote:
>- What format should the output from "info float" take?  (Actually, it
>  sounds like this is pretty much resolved.)

Did we resolve that there would be a generic routine for displaying the
stuff from "info float"?

>Notably missing from this list are any other questions about tm-i386.h
>as it stands.  Am I correct in thinking that the other x86 port
>maintainers think it's basically sane?

AFAICT, the plan makes sense.  Let's do it.

cgf
From ac131313@cygnus.com Mon Nov 08 16:22:00 1999
From: Andrew Cagney <ac131313@cygnus.com>
To: Eli Zaretskii <eliz@gnu.org>
Cc: Stan Shebs <shebs@cygnus.com>, gdb@sourceware.cygnus.com
Subject: Re: go32-nat.c compilation problem
Date: Mon, 08 Nov 1999 16:22:00 -0000
Message-id: <382768ED.EC306A06@cygnus.com>
References: <199911081709.SAA23904@cerbere.u-strasbg.fr> <14375.536.118347.328812@kwikemart.cygnus.com> <199911081742.MAA20623@mescaline.gnu.org>
X-SW-Source: 1999-q4/msg00227.html
Content-length: 689

Eli Zaretskii wrote:
> 
> > I believe you need to change fatal() to internal_error().
> 
> WIBNI, when a function is renamed or removed from the sources,
> somebody would grep all the sources and change all the callers, or
> least alert the various platform maintainers that a function used by
> their code is going to become extinct?
> 
> (If such a message *was* indeed posted in this case, I apologize for
> not seeing it.)

See the thread
http://sourceware.cygnus.com/ml/gdb-patches/1999-q3/msg00108.html
fatal() -> internal_error() jumbo patch

At the time J.T.C. identified one file I missed: .gdbinit.

As Elena noted, it came about as a result of crossed patches.

	sorry,
		Andrew
From jimb@cygnus.com Mon Nov 08 17:34:00 1999
From: Jim Blandy <jimb@cygnus.com>
To: Chris Faylor <cgf@cygnus.com>
Cc: Mark Kettenis <kettenis@wins.uva.nl>, Eli Zaretskii <eliz@gnu.org>, "J. T. Conklin" <jtc@redbacknetworks.com>, "J. Kean Johnston" <jkj@sco.com>, "H. J. Lu" <hjl@valinux.com>, gdb@sourceware.cygnus.com
Subject: Re: i386: Are we settled?
Date: Mon, 08 Nov 1999 17:34:00 -0000
Message-id: <npogd4h2i7.fsf@zwingli.cygnus.com>
References: <199911090018.TAA12933@zwingli.cygnus.com> <19991108192442.B2703@cygnus.com>
X-SW-Source: 1999-q4/msg00228.html
Content-length: 652

> >- What format should the output from "info float" take?  (Actually, it
> >  sounds like this is pretty much resolved.)
> 
> Did we resolve that there would be a generic routine for displaying the
> stuff from "info float"?

Yep.  Mark has written an implementation.  There have been comments
and suggestions, but no major objections.  I think there are copyright
issues to be resolved.

> >Notably missing from this list are any other questions about tm-i386.h
> >as it stands.  Am I correct in thinking that the other x86 port
> >maintainers think it's basically sane?
> 
> AFAICT, the plan makes sense.  Let's do it.

Okay.  One down, five to go.
From cgf@cygnus.com Mon Nov 08 17:54:00 1999
From: Chris Faylor <cgf@cygnus.com>
To: Jim Blandy <jimb@cygnus.com>
Cc: Mark Kettenis <kettenis@wins.uva.nl>, Eli Zaretskii <eliz@gnu.org>, "J. T. Conklin" <jtc@redbacknetworks.com>, "J. Kean Johnston" <jkj@sco.com>, "H. J. Lu" <hjl@valinux.com>, gdb@sourceware.cygnus.com
Subject: Re: i386: Are we settled?
Date: Mon, 08 Nov 1999 17:54:00 -0000
Message-id: <19991108205721.A1683@cygnus.com>
References: <199911090018.TAA12933@zwingli.cygnus.com> <19991108192442.B2703@cygnus.com> <npogd4h2i7.fsf@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00229.html
Content-length: 513

On Mon, Nov 08, 1999 at 08:34:08PM -0500, Jim Blandy wrote:
>
>> >- What format should the output from "info float" take?  (Actually, it
>> >  sounds like this is pretty much resolved.)
>> 
>> Did we resolve that there would be a generic routine for displaying the
>> stuff from "info float"?
>
>Yep.  Mark has written an implementation.  There have been comments
>and suggestions, but no major objections.  I think there are copyright
>issues to be resolved.

Ah, that's right.  I remember seeing this now.

cgf
From jimb@cygnus.com Mon Nov 08 23:06:00 1999
From: Jim Blandy <jimb@cygnus.com>
To: gdb@sourceware.cygnus.com
Subject: MMX: Messy Multimedia eXtensions
Date: Mon, 08 Nov 1999 23:06:00 -0000
Message-id: <199911090706.CAA13120@zwingli.cygnus.com>
X-SW-Source: 1999-q4/msg00230.html
Content-length: 9093

Intel has contracted with Cygnus to provide support for the MMX and
SSE registers in the GNU toolchain.  We've just finished the beta.
The work is on a branch at the moment, so it won't be showing up in
snapshots.  I'd like to explain what I did in GDB, and get folks'
criticisms and ideas for what we would be happy with in mainline GDB.
There are some exquisitely twisted problems here.

If someone wants to really scrutinize my code, then I can post diffs.

In GCC, you can now write code like this (toy example, not tested):

    #include <math.h>

    /* This declares the type v4sf to be a vector of four single-precision
       floats, in a way that encourages GCC to map it onto the Pentium-III
       SSE registers.  */
    typedef int v4sf __attribute__ ((mode(V4SF)));

    /* Given a bunch of points (X[i], Y[i]), 0 <= i < N, rotate each one
       counterclockwise by ANGLE radians.  For simplicity's sake, N must
       be a multiple of four, and X and Y must be 16-byte aligned, since
       the aligned load and store builtins require it.  */
    void
    rotate (double angle, int n, float *x, float *y)
    {
      int i;

      /* Load all four slots of these with the cos and sin of angle.  */
      v4sf cos_angle = __builtin_ia32_setps1 (cos (angle));
      v4sf sin_angle = __builtin_ia32_setps1 (sin (angle));

      /* Rotate all the points, four at a time.  */
      for (i = 0; i < n; i += 4)
        {
          /* new_x = cos (angle) * x - sin (angle) * y
             new_y = sin (angle) * x + cos (angle) * y */
          v4sf x4 = __builtin_ia32_loadaps (x + i);
          v4sf y4 = __builtin_ia32_loadaps (y + i);
          v4sf new_x4
            = __builtin_ia32_subps (__builtin_ia32_mulps (cos_angle, x4),
                                    __builtin_ia32_mulps (sin_angle, y4));
          v4sf new_y4
            = __builtin_ia32_addps (__builtin_ia32_mulps (sin_angle, x4),
                                    __builtin_ia32_mulps (cos_angle, y4));

          __builtin_ia32_storeaps (x + i, new_x4);
          __builtin_ia32_storeaps (y + i, new_y4);
        }
    }

All the v4sf values get mapped onto SSE registers automatically, and
the __builtin_ia32_foo forms turn into single SSE instructions.  It's
very sexy.  (Automatic vectorization would be even sexier, of course,
but that's another day.)
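The __builtin_ia32_* names in the toy example belong to the GCC work of that time and are not available in current compilers. As a rough, hedged equivalent, here is the same loop written against the <xmmintrin.h> intrinsics that current GCC and Clang provide on x86. It is a sketch, not the code from the branch: the name rotate_sse is made up, and it takes a precomputed cosine and sine instead of an angle so that it needs no libm.

```c
#include <xmmintrin.h>

/* Rotate each point (X[i], Y[i]), 0 <= i < N, counterclockwise by the
   angle whose cosine and sine are COS_A and SIN_A.  The caller computes
   them (for instance with cos() and sin() from <math.h>).  N must be a
   multiple of four, and X and Y must be 16-byte aligned, because
   _mm_load_ps and _mm_store_ps require aligned addresses.  */
static void
rotate_sse (float cos_a, float sin_a, int n, float *x, float *y)
{
  /* Broadcast the cosine and sine into all four lanes.  */
  __m128 cos_angle = _mm_set1_ps (cos_a);
  __m128 sin_angle = _mm_set1_ps (sin_a);

  /* Rotate all the points, four at a time.  */
  for (int i = 0; i < n; i += 4)
    {
      __m128 x4 = _mm_load_ps (x + i);
      __m128 y4 = _mm_load_ps (y + i);
      /* new_x = cos * x - sin * y;  new_y = sin * x + cos * y  */
      __m128 new_x4 = _mm_sub_ps (_mm_mul_ps (cos_angle, x4),
                                  _mm_mul_ps (sin_angle, y4));
      __m128 new_y4 = _mm_add_ps (_mm_mul_ps (sin_angle, x4),
                                  _mm_mul_ps (cos_angle, y4));
      _mm_store_ps (x + i, new_x4);
      _mm_store_ps (y + i, new_y4);
    }
}
```

With cos_a = 0 and sin_a = 1 (a quarter turn), the point (1, 0) maps to (0, 1) and (0, 1) maps to (-1, 0).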

In GDB, you can now debug code like this (real example):


    (gdb) break *0x0804846b
    Breakpoint 1 at 0x804846b: file sse-mandel.c, line 42.
    (gdb) run
    Starting program: /home/jimb/play/sse-mandel 

    Breakpoint 1, 0x804846b in iter.aligned () at sse-mandel.c:42
    (gdb) next
    iter.aligned () at sse-mandel.c:43
    (gdb) p count
    $1 = {f = {1, 1, 1, 1}}
    (gdb) p countadd
    $2 = {f = {0, 0, 0, 0}}
    (gdb) p countadd
    $3 = {f = {1, 1, 1, 1}}
    (gdb) p zx
    $4 = {f = {-2.5, -2.482337, -2.464674, -2.44701076}}
    (gdb) p zy
    $5 = {f = {-1.25, -1.25, -1.25, -1.25}}
    (gdb) p countadd
    $6 = {f = {1, 1, 1, 1}}
    (gdb) set countadd.f[1] = 0
    (gdb) p countadd
    $7 = {f = {1, 0, 1, 1}}
    (gdb) 

If you want to print SSE registers, you can:

    (gdb) p $xmm3
    $14 = {f = {-2.5, -2.482337, -2.464674, -2.44701076}}

You can print MMX registers, too, but it's messier, since GDB doesn't
know whether it's eight 8-bit values, four 16-bit values, et cetera:

    (gdb) p $mm2
    $1 = {v8qi = {f = "\001\000\001\000\001\000\001"}, v4hi = {f = {1,
      1, 1, 1}}, v2si = {f = {65537, 65537}}, uint64 = 281479271743489}

(Please ignore the fact that the eight 8-bit integers are printed as
characters.  I'm going to fix that.)

The SSE work is pretty uncontroversial.  I think there's basically one
right way to do this.  The only unusual step is to assign the
appropriate virtual type to the registers --- choose something like

	struct __builtin_v4sf { float f[4]; };

and everything just works.
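To make "assign the virtual type" concrete: the raw contents of an XMM register are just 16 bytes, and viewing them through a type like struct __builtin_v4sf amounts to copying those bytes into an object of that type, with no conversion. A minimal sketch, assuming target and host share byte order and 32-bit IEEE floats (true for x86 on x86); struct xmm_v4sf and xmm_as_v4sf are hypothetical names, not GDB's.

```c
#include <string.h>

/* Mirrors the virtual type suggested above: four single-precision
   floats, 16 bytes in total, the size of one XMM register.  */
struct xmm_v4sf { float f[4]; };

/* Hypothetical helper: interpret the 16 raw bytes of an XMM register,
   as a debugger would fetch them from the inferior, using the virtual
   type.  The bits are reinterpreted, not converted.  */
static struct xmm_v4sf
xmm_as_v4sf (const unsigned char raw[16])
{
  struct xmm_v4sf v;
  memcpy (&v, raw, sizeof v);
  return v;
}
```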

The MMX arrangement, however, is controversial.  That's what I'd like
people's criticism and comments on.

There are eight MMX registers, 64 bits long each.  They're actually
not new registers --- they occupy the 64-bit mantissas of the eight
floating-point registers.  The MMX registers map to physical FP
registers; the correspondence is unaffected by the FPU's top-of-stack
register.

The interaction between the MMX instructions and the FPU is odd.
Whenever you read or write an MMX register, the processor sets the
FPU's TOS to zero, and marks all FP registers as "Valid".  That is,
the stack is now full.  If you write an MMX register, the processor
sets the corresponding FP register's upper 16 bits to 0xffff.  (I
think this is a quiet NaN.)
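A small model of this aliasing may help. The sketch below builds the 80-bit image an FP register holds after an MMX write (the 64-bit value becomes the mantissa, and the 16 sign/exponent bits become 0xffff) and classifies it by the x87 extended-format rules. The struct and function names are hypothetical. It also suggests the "quiet NaN" guess holds only for some values: with the exponent all ones, the result is an infinity, a signaling NaN, a quiet NaN, or an unsupported "pseudo" encoding, depending on the top mantissa bits.

```c
#include <stdint.h>

/* Hypothetical model of one 80-bit x87 register: a 64-bit mantissa with
   an explicit integer bit (bit 63), plus 16 bits of sign and exponent.  */
struct x87_reg
{
  uint64_t mantissa;
  uint16_t sign_exp;
};

/* Model an MMX write of VALUE: the value lands in the mantissa and the
   sign/exponent field is forced to all ones.  */
static void
mmx_write (struct x87_reg *r, uint64_t value)
{
  r->mantissa = value;
  r->sign_exp = 0xffff;
}

enum x87_class
{
  X87_NOT_SPECIAL,   /* exponent is not all ones */
  X87_INFINITY,
  X87_SNAN,
  X87_QNAN,
  X87_UNSUPPORTED    /* pseudo-infinity or pseudo-NaN */
};

/* Classify a register using the extended-precision encoding rules.  */
static enum x87_class
classify (const struct x87_reg *r)
{
  const uint64_t int_bit = UINT64_C (1) << 63;
  const uint64_t quiet_bit = UINT64_C (1) << 62;

  if ((r->sign_exp & 0x7fff) != 0x7fff)
    return X87_NOT_SPECIAL;
  if (!(r->mantissa & int_bit))
    return X87_UNSUPPORTED;
  if (r->mantissa & quiet_bit)
    return X87_QNAN;
  if ((r->mantissa & ~int_bit) == 0)
    return X87_INFINITY;     /* negative, since the sign bit is set */
  return X87_SNAN;
}
```

For instance, the $mm2 value shown earlier, 0x0001000100010001 (281479271743489), has the integer bit clear, so under this model the aliased FP register would hold an unsupported encoding rather than a quiet NaN.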

So, how should we represent the MMX registers in GDB's register file?
There are two basic approaches:
- Assign them register numbers separate from the FP stack registers'.
- Assign them the same numbers as the FP stack registers, and treat them as
  an alternative way of looking at the FP registers' mantissas.

The first approach has some problems.
- Do you assign the MMX registers a separate region of the register
  file as well?
  - If so, when your target-specific code writes back GDB register values to
    the inferior, which copy does it write --- the FP registers, or the
    MMX registers?
  - If the user assigns to an FP stack register, the corresponding MMX
    registers' contents must be updated.  Is that handled in
    architecture-specific code?  Via what interface to the
    architecture-independent code?  How can that interface be designed
    so that future hackers, perhaps innocent of the delights of the
    x86, won't break it?  Would *you* expect writing register 12 to
    affect the value, in GDB's register file, of register 42?

I think this approach is fundamentally wrong, because the register
file doesn't match reality.  There are not really two separate sets of
bits --- the FP mantissas and the MMX registers are the same object.
If our model doesn't reflect that, we're going to be perpetually
discovering bugs with no correct solution.  I hate that.

The second approach is the one I took.  The typing information
provided by the compiler tells GDB how to interpret the register's
bits anyway.  The only wrinkle is that the FP registers are
REGISTER_CONVERTIBLE, so REGISTER_CONVERT_TO_{VIRTUAL,RAW} need to
expect MMX types as well as FP types.  They simply memcpy them.
With this approach, the register file accurately reflects the reality:
there is only one set of bits.
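The conversion step can be sketched as follows. This is not GDB's code: it is modelled loosely on the REGISTER_CONVERT_TO_VIRTUAL hook, and the view tags, the 10-byte little-endian raw layout, and the function name are assumptions for illustration. For an MMX view the mantissa bytes are copied unchanged; for a floating-point view the extended value is narrowed to a host double, which can round. It also assumes an x86 host whose long double is the 80-bit x87 format, and refuses to build otherwise.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the host's long double is the x87 80-bit format.  */
_Static_assert (LDBL_MANT_DIG == 64, "needs x87 extended long double");

/* Hypothetical tags for the virtual types an FP register may have.  */
enum view_kind { VIEW_DOUBLE, VIEW_MMX };

/* RAW holds the 10 raw bytes of the register, little-endian, mantissa
   first.  TO receives the value in the virtual type KIND.  */
static void
fp_reg_to_virtual (enum view_kind kind, const unsigned char raw[10],
                   void *to)
{
  if (kind == VIEW_MMX)
    {
      /* The low 64 bits are the MMX register: just copy them.  */
      memcpy (to, raw, 8);
    }
  else
    {
      /* Reinterpret the 10 bytes as an extended value, then narrow it
         to double.  */
      long double ld = 0;
      double d;
      memcpy (&ld, raw, 10);
      d = (double) ld;
      memcpy (to, &d, sizeof d);
    }
}
```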

To let people access the MMX registers using names like `$mm2', I
added a new thing, "register views".  Register views allow you to see a
register's bits using different types, depending on the name you call
it by.

When the parser sees `$FOO', after checking whether `FOO' is a
register name, it calls the architecture-defined macro
IS_REGISTER_VIEW_NAME.  This macro either returns -1, meaning that it
doesn't recognize the name, or a register view number.  The macros
REGISTER_VIEW_REGNO and REGISTER_VIEW_TYPE map this register view
number to an ordinary register number, and a type to apply to that
register.  So for the x86, we have register views named "mm0", "mm1",
and so on, for which REGISTER_VIEW_REGNO returns FP0_REGNUM,
FP0_REGNUM + 1, and so on, and for which REGISTER_VIEW_TYPE returns an
appropriate union type for MMX registers.  There is a new expression
op, OP_REGISTER_VIEW, which works much like OP_REGISTER, but uses
REGISTER_VIEW_TYPE instead of REGISTER_VIRTUAL_TYPE.
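The x86 side of these hooks might look roughly like this. The macro names come from the description above, but they are written here as ordinary functions, and FP0_REGNUM = 16 (general registers numbered first) is an assumed value for illustration, not taken from a particular tm-*.h file. The union stands in for the type REGISTER_VIEW_TYPE would return.

```c
#include <string.h>

/* Assumed register numbering: the first x87 data register.  */
#define FP0_REGNUM 16

/* IS_REGISTER_VIEW_NAME: return a view number for NAME (without the
   `$'), or -1 if NAME is not a register view.  Views 0..7 are
   "mm0".."mm7".  */
static int
is_register_view_name (const char *name)
{
  if (strncmp (name, "mm", 2) == 0
      && name[2] >= '0' && name[2] <= '7'
      && name[3] == '\0')
    return name[2] - '0';
  return -1;
}

/* REGISTER_VIEW_REGNO: the ordinary register a view aliases.  */
static int
register_view_regno (int view)
{
  return FP0_REGNUM + view;
}

/* What REGISTER_VIEW_TYPE would describe for an MMX view.  */
union mmx_view
{
  signed char v8qi[8];
  short v4hi[4];
  int v2si[2];
  unsigned long long uint64;
};
```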

I think this concept is useful for other architectures, too.  You
could use register views to provide more helpful interpretations of
control registers.  For example, perhaps a new register view $ftos
could apply the type
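  struct { unsigned int :11; unsigned int tos:3; }
to $fstat (TOP is bits 11-13 of the status word), or $fprec could apply
the type
  struct { unsigned int :8; enum { single, reserved, double, extended } pc:2; }
to $fctrl (PC is bits 8-9 of the control word).  Thus: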

    (gdb) print $ftos
    $1 = 3
    (gdb) print $fprec
    $2 = extended

Or something like that.
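What $ftos and $fprec would display can also be computed by hand with shifts and masks. A sketch, using the x87 layout in which TOP occupies bits 11-13 of the status word and precision control occupies bits 8-9 of the control word; the function and enumerator names are hypothetical.

```c
#include <stdint.h>

/* Precision-control encodings, in the order listed above.  */
enum fpu_precision
{
  PC_SINGLE = 0, PC_RESERVED = 1, PC_DOUBLE = 2, PC_EXTENDED = 3
};

/* TOP: the physical index of %st(0), bits 11-13 of the status word.  */
static unsigned
fstat_tos (uint16_t fstat)
{
  return (fstat >> 11) & 7;
}

/* PC: bits 8-9 of the control word.  */
static enum fpu_precision
fctrl_precision (uint16_t fctrl)
{
  return (enum fpu_precision) ((fctrl >> 8) & 3);
}
```

For example, the control word 0x037f that FNINIT loads yields PC_EXTENDED.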

But, getting back to the MMX registers...

The problem is, we're using the same register number for %mm0 and
%st(0), but %mm0 doesn't really correspond to %st(0).  It depends on
the value of the FPU TOS register.  However, every MMX instruction
does reset TOS to zero.  And you can't really mix FP and MMX code very
effectively; the processor's behavior (marking the stack as full;
resetting TOS) seems designed to prevent this, without actually losing
data.  So it's almost always right.
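The dependence on TOS can be stated exactly: %st(i) lives in physical register (TOP + i) mod 8, while %mmN is always physical register N. A sketch of both directions (the function names are hypothetical):

```c
/* Physical x87 register holding %st(I) when the stack top is TOP.
   I and TOP are in 0..7.  */
static int
st_to_physical (int i, int top)
{
  return (top + i) & 7;
}

/* The stack slot I such that %st(I) aliases %mmN, i.e. physical
   register N.  Adding 8 keeps the operand non-negative.  */
static int
mm_to_st (int n, int top)
{
  return (n - top + 8) & 7;
}
```

When TOP is 0, which every MMX instruction ensures, %mmN is %st(N); that is why reusing the FP register numbers is almost always right.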

Another problem is, we've added an entirely new concept --- register
views --- which affects the parser and the evaluator.  But the changes
are simple and straightforward, and they could be useful on other
architectures that need to view a single register's bits in more than
one way.

Still, though, it's not quite right.  All the information is available
to do the job perfectly --- we have the TOS in $fstat and everything.
And for something which requires (even simple) changes to the parser,
expression evaluator, and everything else that touches expressions,
you'd like to get perfection.

The real obstacle is the assumption, pervasive in GDB, that each
distinct register is an independent part of the machine state.  This
makes it very difficult to implement a truly accurate solution.  I
don't really know how to work around that.

So, I'm interested in folks' opinions on the current support, and
ideas on how to do better.  How should we do this?
From lplos@essegi.net Tue Nov 09 03:06:00 1999
From: "Livio Plos - Essegi s.r.l." <lplos@essegi.net>
To: gdb@sourceware.cygnus.com
Subject: PPC gdb stub
Date: Tue, 09 Nov 1999 03:06:00 -0000
Message-id: <3.0.6.32.19991109120052.007e9bf0@titano>
X-SW-Source: 1999-q4/msg00231.html
Content-length: 138

Can anyone help me by pointing me to a site where I can
find sources for a PowerPC gdb stub (resident debug monitor)?

Thank you.
Livio Plos


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Another RFC: regex in libiberty
@ 2001-06-07 18:27                   ` DJ Delorie
  2001-06-07 18:31                     ` Ian Lance Taylor
                                       ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: DJ Delorie @ 2001-06-07 18:27 UTC
  To: gcc, gdb, binutils, cygwin

[More lists added to get a wider audience]

I didn't get a clear feeling about what people wanted wrt this.  I saw
three people propose three versions of regex, not much to go on.  Is
this a big deal?  Will it really get used by everyone who currently
has their own regex?  Is it important to try to use a BSD-licensed
regex to minimize future problems?

The two contenders seem to be a modified GNU regex and the
ever-popular Henry Spencer's regex.  Does anyone have any strong
opinions for either of these, or against any regex in libiberty at
all?



* Re: Another RFC: regex in libiberty
  2001-06-07 18:27                   ` Another RFC: regex in libiberty DJ Delorie
@ 2001-06-07 18:31                     ` Ian Lance Taylor
  2001-06-07 18:33                       ` DJ Delorie
  2001-06-08  0:11                     ` Eli Zaretskii
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Ian Lance Taylor @ 2001-06-07 18:31 UTC
  To: DJ Delorie; +Cc: gcc, gdb, binutils, cygwin

DJ Delorie <dj@redhat.com> writes:

> [More lists added to get a wider audience]
> 
> I didn't get a clear feeling about what people wanted wrt this.  I saw
> three people propose three versions of regex, not much to go on.  Is
> this a big deal?  Will it really get used by everyone who currently
> has their own regex?  Is it important to try to use a BSD-licensed
> regex to minimize future problems?
> 
> The two contenders seem to be a modified GNU regex and the
> ever-popular Henry Spencer's regex.  Does anyone have any strong
> opinions for either of these, or against any regex in libiberty at
> all?

gdb already ships with gnu-regex.c.  Why not just move that to
libiberty?

I can't see any reason for a BSD-licensed regex in libiberty.
libiberty already contains GPL code.

Ian



* Re: Another RFC: regex in libiberty
  2001-06-07 18:31                     ` Ian Lance Taylor
@ 2001-06-07 18:33                       ` DJ Delorie
  2001-06-07 18:43                         ` Ian Lance Taylor
  0 siblings, 1 reply; 19+ messages in thread
From: DJ Delorie @ 2001-06-07 18:33 UTC
  To: ian; +Cc: gcc, gdb, binutils, cygwin

> gdb already ships with gnu-regex.c.  Why not just move that to
> libiberty?

Because gdb, tcl, expect, cygwin, and gcc each have a copy of regex,
and they're all different.  Which to choose?

> I can't see any reason for a BSD-licensed regex in libiberty.
> libiberty already GPL code.

Any regex added to libiberty becomes part of newlib and cygwin as
well, and those projects are sensitive to GPL vs non-GPL licensing
issues.



* Re: Another RFC: regex in libiberty
  2001-06-07 18:33                       ` DJ Delorie
@ 2001-06-07 18:43                         ` Ian Lance Taylor
  0 siblings, 0 replies; 19+ messages in thread
From: Ian Lance Taylor @ 2001-06-07 18:43 UTC
  To: DJ Delorie; +Cc: gcc, gdb, binutils, cygwin

DJ Delorie <dj@delorie.com> writes:

> > gdb already ships with gnu-regex.c.  Why not just move that to
> > libiberty?
> 
> Because gdb, tcl, expect, cygwin, and gcc each have a copy of regex,
> and they're all different.  Which to choose?

The ones in gdb and gcc are basically the same.  TCL and Expect are
not GNU projects, and will continue to have their own regex code.
Cygwin has different licensing constraints; cygwin already has its own
copy of getopt, for instance.

> > I can't see any reason for a BSD-licensed regex in libiberty.
> > libiberty already GPL code.
> 
> Any regex added to libiberty becomes part of newlib and cygwin as
> well, and those projects are sensitive to GPL vs non-GPL licensing
> issues.

I see no reason to confuse the regex in libiberty with the regex in
newlib and cygwin, any more than there is reason to confuse the getopt
in libiberty with theirs.  regex in libiberty should satisfy the needs of GNU tools,
and as such I think it is appropriate to use the GNU regex.  Of
course, if the GNU regex is inferior, then it might make sense to
choose something else.  But I don't think we should avoid using GNU
code for GNU tools because of licensing issues for non-GNU tools.

Ian



* Re: Another RFC: regex in libiberty
  2001-06-07 18:27                   ` Another RFC: regex in libiberty DJ Delorie
  2001-06-07 18:31                     ` Ian Lance Taylor
@ 2001-06-08  0:11                     ` Eli Zaretskii
  2001-06-08  9:18                       ` Mark Mitchell
                                         ` (2 more replies)
  2001-06-08  1:15                     ` Pierre Muller
  2001-06-09 13:34                     ` Another RFC: regex in libiberty Andrew Cagney
  3 siblings, 3 replies; 19+ messages in thread
From: Eli Zaretskii @ 2001-06-08  0:11 UTC
  To: dj; +Cc: gcc, gdb, binutils, cygwin

> Date: Thu, 7 Jun 2001 21:27:31 -0400
> From: DJ Delorie <dj@redhat.com>
> 
> I didn't get a clear feeling about what people wanted wrt this.  I saw
> three people propose three versions of regex, not much to go on.  Is
> this a big deal?  Will it really get used by everyone who currently
> has their own regex?  Is it important to try to use a BSD-licensed
> regex to minimize future problems?
> 
> The two contenders seem to be a modified GNU regex and the
> ever-popular Henry Spencer's regex.  Does anyone have any strong
> opinions for either of these, or against any regex in libiberty at
> all?

One notorious problem with GNU regex is that it is quite slow for many
simple jobs, such as matching a simple regular expression with no
backtracking.  It seems that the main reason for this slowness is the
fact that GNU regex supports null characters in strings.  For
example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
on simple jobs than the same Sed compiled with Spencer's regex
library.  (The DJGPP port of Sed is actually distributed with two
executables, one built with GNU regex, the other with Spencer's, for
this very reason.)

So perhaps it might help to have more than just GNU regex in
libiberty, for those applications that don't need to support null
characters, and where regular expressions are used a lot, and so need
to be fast.
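The null-character point is tied to the interface. The POSIX entry points regcomp() and regexec() take NUL-terminated strings, so a subject with an embedded '\0' is silently cut short, whereas GNU-specific calls such as re_search() take an explicit length. A small demonstration using only the POSIX interface; the helper name posix_matches is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Return 1 if PATTERN (POSIX extended syntax) matches somewhere in
   SUBJECT, -1 if the pattern fails to compile, and 0 otherwise.
   SUBJECT is read only up to its first '\0'.  */
static int
posix_matches (const char *pattern, const char *subject)
{
  regex_t re;
  int rc;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  rc = regexec (&re, subject, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Given the subject "abc\0def", "abc" matches but "def" does not, because regexec() never looks past the embedded null.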



* Re: Another RFC: regex in libiberty
  2001-06-07 18:27                   ` Another RFC: regex in libiberty DJ Delorie
  2001-06-07 18:31                     ` Ian Lance Taylor
  2001-06-08  0:11                     ` Eli Zaretskii
@ 2001-06-08  1:15                     ` Pierre Muller
  2001-06-08  1:36                       ` About struct bpp_transfer_params 김득중
  2001-06-09 13:34                     ` Another RFC: regex in libiberty Andrew Cagney
  3 siblings, 1 reply; 19+ messages in thread
From: Pierre Muller @ 2001-06-08  1:15 UTC
  To: DJ Delorie; +Cc: gdb


At 03:27 08/06/01, you wrote:

>[More lists added to get a wider audience]
>
>I didn't get a clear feeling about what people wanted wrt this.  I saw
>three people propose three versions of regex, not much to go on.  Is
>this a big deal?  Will it really get used by everyone who currently
>has their own regex?  Is it important to try to use a BSD-licensed
>regex to minimize future problems?


  I would really like to get a clearer understanding of this file.
It is so full of #ifdefs that I never really understood much of it.

   Nevertheless, I once sent a patch (without really expecting that it
would be accepted) to allow case-insensitive expression parsing; it is
included in my patch to the gdb-5.0 version for
support of the Pascal language.

   See
http://sources.redhat.com/ml/gdb/2000-06/msg00146.html
and also Mark Kettenis's answers
http://sources.redhat.com/ml/gdb/2000-06/msg00150.html
and
http://sources.redhat.com/ml/gdb/2000-06/msg00155.html

   No decision to switch to the POSIX entry points has been made since
(unless I missed something, which is of course quite possible).



Pierre Muller
Institut Charles Sadron
6,rue Boussingault
F 67083 STRASBOURG CEDEX (France)
mailto:muller@ics.u-strasbg.fr
Phone : (33)-3-88-41-40-07  Fax : (33)-3-88-41-40-99



* About struct bpp_transfer_params...
  2001-06-08  1:15                     ` Pierre Muller
@ 2001-06-08  1:36                       ` ±èµæÁß
  2001-06-08  7:43                         ` Fernando Nasser
  0 siblings, 1 reply; 19+ messages in thread
From: 김득중 @ 2001-06-08  1:36 UTC
  To: gdb

Hi..
 
I'm studying the gdb parallel port source in the gdb/rdi-share/ directory,
but I don't know what struct bpp_transfer_parms is,
and I couldn't find any information about it.
Can anyone tell me how to find out, or give me some information about struct bpp_transfer_parms?
 
Have a nice day!
From f.haverkamp@web.de Fri Jun 08 03:55:00 2001
From: "Frank Haverkamp" <f.haverkamp@web.de>
To: gdb@sources.redhat.com
Subject: How can I analyze a core from the target on my hostsystem?
Date: Fri, 08 Jun 2001 03:55:00 -0000
Message-id: <200106081055.f58AtMj18637@mailgate4.cinetic.de>
X-SW-Source: 2001-06/msg00047.html
Content-length: 1223

Hi,
 
One of the programs on my PowerPC embedded system generated a
core file, and I want to analyze it on my host PC.  So I started my
powerpc-linux-gdb debugger and asked it to load the core, but it
refused:

There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "--host=i586-pc-linux-gnu --target=powerpc-linux".
(gdb)
(gdb)
(gdb) core-file core
GDB can't read core files on this machine.
(gdb)

The gdb running on the target can analyze the core.  But I want to
do that on the host system, because in production I will no longer have
access to the target system, but I will get the core files :-)
and I should analyze them.

So my question:
Is this a misconfiguration of my self-compiled powerpc-linux-gdb, or is it not possible in general, and if it isn't, what is the reason?
What options do I have if analyzing the target core file is not possible on the host?  Are there any other tools for that purpose?

Thanks,

Frank


--
Frank Haverkamp
f.haverkamp@web.de



* Re: About struct bpp_transfer_params...
  2001-06-08  1:36                       ` About struct bpp_transfer_params 김득중
@ 2001-06-08  7:43                         ` Fernando Nasser
  0 siblings, 0 replies; 19+ messages in thread
From: Fernando Nasser @ 2001-06-08  7:43 UTC
  To: ±èµæÁß
  Cc: gdb


±èµæÁß wrote:
> 
> Hi..
> 
> i'm studying a gdb parallel port source that is gdb/rdi-share/ directory..
> 
> but, i dont know about struct bpp_transfer_parms..
> and i couldn't find any infomation about that..
> so., who tell me a method? or give me a infomation about struct bpp_transfer_parms..
> 
> Have a nice day!


bpp_transfer_parms only exists on Sun machines. It is defined in a
header file called bpp_io.h, under either /usr/include/sys or
/usr/include/sbusdev.


Note that you can figure these things out by yourself. If you had looked at
the start of that source file, you would have seen:

#ifdef sun
# include <sys/ioccom.h>
# ifdef __svr4__
#  include <sys/bpp_io.h>
# else
#  include <sbusdev/bpp_io.h>
# endif
#endif


Always remember that there is no magic. If the compiler did not
complain about a symbol and it was not defined in that source file,
then it must have been defined in something the file includes, most
likely one of its header files.

Also, if you look at the lines where it is used, like:

    struct bpp_transfer_parms tp;

    /*
     * we need to set the parallel port up for BUSY handshaking,
     * and select the timeout
     */
    if (ioctl(parpfd, BPPIOC_GETPARMS, &tp) < 0)

you can determine that it is associated with ioctl() in some way, in
particular with parallel ports and the operation BPPIOC_GETPARMS.
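
The same guard pattern can be carried into code that uses the structure. What follows is a minimal sketch, not code from rdi-share: the helper name query_parport_parms is hypothetical, the Sun branch has not been compiled here, and the <unistd.h>/<sys/ioctl.h> includes for ioctl() on Sun are an assumption. Only struct bpp_transfer_parms and BPPIOC_GETPARMS come from the source quoted above. A real caller would go on to inspect or modify the fields of tp, which are deliberately not shown.

```c
#include <errno.h>

#ifdef sun
# include <unistd.h>        /* assumed: ioctl() declaration on Sun */
# include <sys/ioctl.h>
# include <sys/ioccom.h>
# ifdef __svr4__
#  include <sys/bpp_io.h>
# else
#  include <sbusdev/bpp_io.h>
# endif
#endif

/* Hypothetical helper: fetch the parallel-port transfer parameters for
   the open descriptor `parpfd`.  Returns 0 on success and -1 (with errno
   set) on failure.  struct bpp_transfer_parms and BPPIOC_GETPARMS come
   from the Sun-only bpp_io.h header, so every other host compiles the
   fallback branch and reports ENOSYS. */
static int query_parport_parms(int parpfd)
{
#ifdef sun
    struct bpp_transfer_parms tp;

    if (ioctl(parpfd, BPPIOC_GETPARMS, &tp) < 0)
        return -1;              /* errno was set by ioctl() */
    return 0;
#else
    (void)parpfd;               /* unused on non-Sun hosts */
    errno = ENOSYS;             /* no bpp driver interface here */
    return -1;
#endif
}
```

On a non-Sun host the call simply fails with ENOSYS, which makes it obvious at run time that the parallel-port code path is Sun-specific.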


Good luck with your project.


-- 
Fernando Nasser
Red Hat Canada Ltd.                     E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9



* Re: Another RFC: regex in libiberty
  2001-06-08  0:11                     ` Eli Zaretskii
@ 2001-06-08  9:18                       ` Mark Mitchell
  2001-06-08  9:59                       ` Zack Weinberg
  2001-06-11 22:49                       ` Jim Blandy
  2 siblings, 0 replies; 19+ messages in thread
From: Mark Mitchell @ 2001-06-08  9:18 UTC (permalink / raw)
  To: eliz; +Cc: dj, gcc, gdb, binutils, cygwin

>>>>> "Eli" == Eli Zaretskii <eliz@is.elta.co.il> writes:

    >> The two contenders seem to be a modified GNU regex and the
    >> ever-popular Henry Spencer's regex.  Does anyone have any
    >> strong opinions for either of these, or against any regex in
    >> libiberty at all?

My opinion may or may not matter in this debate, but here it is.
Since libiberty is for use in GNU software, we must use GNU regex.  If
GNU regex is slow, we should make it faster.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com



* Re: Another RFC: regex in libiberty
  2001-06-08  0:11                     ` Eli Zaretskii
  2001-06-08  9:18                       ` Mark Mitchell
@ 2001-06-08  9:59                       ` Zack Weinberg
  2001-06-08 10:05                         ` H . J . Lu
  2001-06-08 10:37                         ` Eli Zaretskii
  2001-06-11 22:49                       ` Jim Blandy
  2 siblings, 2 replies; 19+ messages in thread
From: Zack Weinberg @ 2001-06-08  9:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dj, gcc, gdb, binutils, cygwin

On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
> 
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking.  It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings.  For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.

I think the null characters are a red herring.  I looked into GNU
regex's performance in the context of GCC's fixincludes program, last
year.  On a platform that has mostly-okay headers, fixincludes spends
most of its time matching regular expressions.

The regex.c that came with GDB 4.18, which I think is the one that got
spread around widely, had a bug in its implementation of the POSIX
regcomp/regexec interface, which caused a major performance hit.  That
bug has been fixed in GNU libc for a long time.  When I replaced
fixincludes' copy of regex.c with a more recent version from glibc,
fixincludes was sped up by a factor of nine.  That same bug affects
Sed 3.02 - replace the regex.c it ships with with the one from glibc
2.2.x and I bet you'll see better performance.

There's some discussion in these messages:

http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html

The relevant fix is in there, too, if you want to pull it out and
apply it.

I did some benchmarking of fixincludes with Spencer's regexp library
as well.  IIRC, it was about the same as the fixed GNU regex.c.

-- 
zw        This is, no doubt, the rational strategy; quite possibly the
          only one that will work.  But it ignores the exigencies of
          the tenure system and is therefore impractical.
          	-- Jerry Fodor, _The Mind Doesn't Work That Way_



* Re: Another RFC: regex in libiberty
  2001-06-08  9:59                       ` Zack Weinberg
@ 2001-06-08 10:05                         ` H . J . Lu
  2001-06-08 10:31                           ` Eli Zaretskii
  2001-06-08 10:37                         ` Eli Zaretskii
  1 sibling, 1 reply; 19+ messages in thread
From: H . J . Lu @ 2001-06-08 10:05 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Eli Zaretskii, dj, gcc, gdb, binutils, cygwin

On Fri, Jun 08, 2001 at 09:59:32AM -0700, Zack Weinberg wrote:
> 
> The regex.c that came with GDB 4.18, which I think is the one that got
> spread around widely, had a bug in its implementation of the POSIX
> regcomp/regexec interface, which caused a major performance hit.  That
> bug has been fixed in GNU libc for a long time.  When I replaced
> fixincludes' copy of regex.c with a more recent version from glibc,
> fixincludes was sped up by a factor of nine.  That same bug affects
> Sed 3.02 - replace the regex.c it ships with with the one from glibc
> 2.2.x and I bet you'll see better performance.
> 

I have been telling people that you should use regex.c in glibc if
at all possible if you are using gnu-regex. Every package which uses
gnu-regex should have a configuration option not to use the included
gnu-regex.


H.J.



* Re: Another RFC: regex in libiberty
  2001-06-08 10:05                         ` H . J . Lu
@ 2001-06-08 10:31                           ` Eli Zaretskii
  2001-06-08 10:39                             ` H . J . Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Eli Zaretskii @ 2001-06-08 10:31 UTC (permalink / raw)
  To: hjl; +Cc: zackw, dj, gcc, gdb, binutils, cygwin

> Date: Fri, 8 Jun 2001 10:05:32 -0700
> From: "H . J . Lu" <hjl@lucon.org>
> 
> On Fri, Jun 08, 2001 at 09:59:32AM -0700, Zack Weinberg wrote:
> > 
> > The regex.c that came with GDB 4.18, which I think is the one that got
> > spread around widely, had a bug in its implementation of the POSIX
> > regcomp/regexec interface, which caused a major performance hit.  That
> > bug has been fixed in GNU libc for a long time.  When I replaced
> > fixincludes' copy of regex.c with a more recent version from glibc,
> > fixincludes was sped up by a factor of nine.  That same bug affects
> > Sed 3.02 - replace the regex.c it ships with with the one from glibc
> > 2.2.x and I bet you'll see better performance.
> > 
> 
> I have been telling people that you should use regex.c in glibc if
> at all possible if you are using gnu-regex. Every package which uses
> gnu-regex should have a configuration option not to use the included
> gnu-regex.

Sed does have such an option (I used it to build the binary with
Spencer's regex which is the standard regex included in the DJGPP
library).



* Re: Another RFC: regex in libiberty
  2001-06-08  9:59                       ` Zack Weinberg
  2001-06-08 10:05                         ` H . J . Lu
@ 2001-06-08 10:37                         ` Eli Zaretskii
  1 sibling, 0 replies; 19+ messages in thread
From: Eli Zaretskii @ 2001-06-08 10:37 UTC (permalink / raw)
  To: zackw; +Cc: dj, gcc, gdb, binutils, cygwin

> From: "Zack Weinberg" <zackw@stanford.edu>
> Date: Fri, 8 Jun 2001 09:59:32 -0700
> 
> On Fri, Jun 08, 2001 at 10:06:51AM +0300, Eli Zaretskii wrote:
> > 
> > One notorious problem with GNU regex is that it is quite slow for many
> > simple jobs, such as matching a simple regular expression with no
> > backtracking.  It seems that the main reason for this slowness is the
> > fact that GNU regex supports null characters in strings.  For
> > example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> > on simple jobs than the same Sed compiled with Spencer's regex
> > library.
> 
> I think the null characters are a red herring.

It's possible; I never had time to look into it far enough to be
sure.  All I know is that the slow-down happened between two specific
versions of GNU regex, and the support for null characters was
introduced between those two versions.

> The regex.c that came with GDB 4.18, which I think is the one that got
> spread around widely, had a bug in its implementation of the POSIX
> regcomp/regexec interface, which caused a major performance hit.  That
> bug has been fixed in GNU libc for a long time.  When I replaced
> fixincludes' copy of regex.c with a more recent version from glibc,
> fixincludes was sped up by a factor of nine.  That same bug affects
> Sed 3.02 - replace the regex.c it ships with with the one from glibc
> 2.2.x and I bet you'll see better performance.
> 
> There's some discussion in these messages:
> 
> http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00764.html
> http://gcc.gnu.org/ml/gcc-patches/2000-01/msg00765.html

Thanks for the pointers.



* Re: Another RFC: regex in libiberty
  2001-06-08 10:31                           ` Eli Zaretskii
@ 2001-06-08 10:39                             ` H . J . Lu
  0 siblings, 0 replies; 19+ messages in thread
From: H . J . Lu @ 2001-06-08 10:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: zackw, dj, gcc, gdb, binutils, cygwin

On Fri, Jun 08, 2001 at 08:26:00PM +0300, Eli Zaretskii wrote:
> > 
> > I have been telling people that you should use regex.c in glibc if
> > at all possible if you are using gnu-regex. Every package which uses
> > gnu-regex should have a configuration option not to use the included
> > gnu-regex.
> 
> Sed does have such an option (I used it to build the binary with
> Spencer's regex which is the standard regex included in the DJGPP
> library).

Glad to hear that. It makes even more sense when gnu-regex is the
standard regex in the system C library.


H.J.



* Re: Another RFC: regex in libiberty
  2001-06-07 18:27                   ` Another RFC: regex in libiberty DJ Delorie
                                       ` (2 preceding siblings ...)
  2001-06-08  1:15                     ` Pierre Muller
@ 2001-06-09 13:34                     ` Andrew Cagney
  3 siblings, 0 replies; 19+ messages in thread
From: Andrew Cagney @ 2001-06-09 13:34 UTC (permalink / raw)
  To: DJ Delorie; +Cc: gcc, gdb, binutils, cygwin

For GDB:

For 5.0 it included its own REGEXP *but* if so configured and if the 
system GNU REGEXP is > version xyz, it could use that.

For 5.1 (if someone remembers to do it) the logic was going to be 
switched.  Use the system REGEXP provided it is > version XYZ.  Well I 
think that is what we decided.

Given GDB is GNU code, it shall use GNU regex.

	enjoy,
		Andrew



* Re: Another RFC: regex in libiberty
  2001-06-08  0:11                     ` Eli Zaretskii
  2001-06-08  9:18                       ` Mark Mitchell
  2001-06-08  9:59                       ` Zack Weinberg
@ 2001-06-11 22:49                       ` Jim Blandy
  2001-06-11 23:51                         ` Randall R Schulz
  2001-06-12  6:48                         ` Jim Blandy
  2 siblings, 2 replies; 19+ messages in thread
From: Jim Blandy @ 2001-06-11 22:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dj, gcc, gdb, binutils, cygwin

"Eli Zaretskii" <eliz@is.elta.co.il> writes:
> One notorious problem with GNU regex is that it is quite slow for many
> simple jobs, such as matching a simple regular expression with no
> backtracking.  It seems that the main reason for this slowness is the
> fact that GNU regex supports null characters in strings.  For
> example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> on simple jobs than the same Sed compiled with Spencer's regex
> library.  (The DJGPP port of Sed is actually distributed with two
> executables, one built with GNU regex, the other with Spencer's, for
> this very reason.)

I'm suspicious of this assertion.  I've worked on GNU regexp in the
past, and I don't see any reason this should be so.

However, I was messing around with regexps a lot when GNU regexp
suddenly became slow on certain regexps.  I looked into the cause, and
it turned out that this was because GNU regexp had been made to comply
with the POSIX regexp specification.  POSIX regexp semantics require
that the regexp match the longest possible string (I may have the
details wrong, but it's something like that).  For backtracking regexp
engines (the GNU, Henry Spencer, and Perl regexp matchers are all of
this design), that innocent-sounding constraint basically requires
insanely slow behavior.

GNU regexp has a flag that allows you to turn this behavior off, and
get the traditional, faster, non-POSIX-compliant behavior.  So I don't
see any reason the GNU regexp library couldn't serve all the GPL'd
software's needs.

----

The details, for the curious:

Suppose you have a regexp like this (assume the obvious
metacharacters)

  (FOObar|FOO)(barbar)+
   ---a-- -b-  ---c--      <= labels for pieces of the regexp

which you're matching against the string

  FOObarbarbarbar
  0  3  6  9  12

and let's suppose you're calling the regexp library in a manner which
asks "does a prefix of this string match this regexp?"  (That is,
you're not asking "does this regexp match this entire string?")

The traditional behavior is for the regexp engine to match part ---a--
of the regexp against data[0..5], match one repetition of part ---c--
against data[6..11], and say it's done.  The Perl regexp matcher will
return this match.

But this isn't the behavior POSIX requires.  POSIX says you must
return the *longest possible* match.  So a POSIX-compliant matcher
must match -b- against data[0..2], and then two repetitions of ---c--
against data[3..8] and data[9..14].  This is a longer match.

To find the longest match, in general, a backtracking matcher has to
generate every possible match, and return the longest one it found.
This is what GNU regexp does.
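
Jim's example can be tried against a POSIX regcomp()/regexec() implementation. The sketch below is an editorial illustration, not code from this thread, and the helper name posix_prefix_match_len is hypothetical. It compiles the pattern as an extended regular expression (REG_EXTENDED) and reports how many characters the overall match covers when that match starts at offset 0. On a conforming implementation such as current glibc, we would expect the match on FOObarbarbarbar to cover all 15 characters (FOO followed by two repetitions of barbar), not the 12 characters at which a leftmost-first engine would stop.

```c
#include <regex.h>

/* Hypothetical helper: length of the leftmost-longest POSIX ERE match
   of `pattern` when that match begins at offset 0 of `subject`.
   Returns -1 if the pattern fails to compile, nothing matches, or the
   match starts later in the string. */
static long posix_prefix_match_len(const char *pattern, const char *subject)
{
    regex_t re;
    regmatch_t m[1];            /* only the overall match is needed */
    long len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 1, m, 0) == 0 && m[0].rm_so == 0)
        len = (long)(m[0].rm_eo - m[0].rm_so);
    regfree(&re);
    return len;
}
```

For comparison, on the shorter string FOObarbarbar the longest match is the 12-character FOObar + barbar, so which alternative "wins" depends on the whole match, not on which branch is tried first.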

So, just how bad is this?  Well, suppose you're matching a regexp like:

        .*.*.*.*.*.*

against a string like

     aaaaaaaaaaaaaaaaaaaa

To generate every possible match, you have to choose every possible
way to divide up those twenty a's amongst six .* patterns.  I think
this is 20 choose 5, or 1.9 million, matches you have to try.  In
general, I think the time to match POSIXly can increase exponentially
in the length of your regexp, given a long enough data string.

If you have a smart regexp compiler (I understand Perl is pretty
clever), then you could probably handle this regexp with a bit more
aplomb.  But I'll bet that I can find a regexp where you really do
have to try all matchings, no matter how smart your regexp compiler
is.

(So think of that the next time you propose some innocent-sounding
constraint, like "longest match always"!)

Anyway, the outcome is that all the really popular regexp matchers
either don't implement the POSIX behavior, or provide options to turn
it off.  For GNU regexp, you can use the RE_NO_POSIX_BACKTRACKING
flag, and you'll get the traditional not-always-the-longest-match nice
fast behavior.  Perl simply documents the traditional behavior ("The
[Perl regexp] engine thinks locally and acts globally," as the Camel
book puts it).



* Re: Another RFC: regex in libiberty
  2001-06-11 22:49                       ` Jim Blandy
@ 2001-06-11 23:51                         ` Randall R Schulz
  2001-06-12  6:48                         ` Jim Blandy
  1 sibling, 0 replies; 19+ messages in thread
From: Randall R Schulz @ 2001-06-11 23:51 UTC (permalink / raw)
  To: Jim Blandy, Eli Zaretskii; +Cc: dj, gcc, gdb, binutils

Jim,

[ This isn't cygwin-specific, so I removed it from the recipient list. ]

Your analysis is correct, basically, but the requirement for "maximum bite" 
or "greediness" (as it's variously called) is quite common and has been the 
behavior of all the Unix-based or -inspired regular expression matchers 
I've worked with for about 20 years.

If there are regular expression matchers out there that do otherwise, I 
haven't encountered them.

The maximum bite requirement really is far from "insane," because without 
it, there's no other well-defined and meaningful specification of how 
(much) to match when the regular expression is ambiguous w.r.t. the target 
text. It would hardly do to just return the first match found, since that 
would (well, might) depend on implementation details. I'm not sure, but I'd 
want to think about whether relaxing maximum bite was significant w.r.t. 
the choice of NFA vs. DFA matcher (I don't know which approach is used by 
regex).

It's fine to have an option to change this, I guess, but regular expression 
matchers that don't implement maximum bite by default would not be what 
people expect at all. Actually, I'm not certain how the user would be able 
to predict what would be matched if maximum bite were disabled.

It seems to me that if you have an on / off option for maximum bite, then 
the only meaningful choice when maximum bite is off would be minimum bite, 
and for * that would always be zero and for + it would be one, so what's 
the point of closure? If there's no closure involved, then static 
examination of the regular expression (when the NFA or DFA is constructed) 
is enough to determine maximum bite, so there's no need for a performance 
hit. Implementing that optimization isn't difficult at all.

As you suggest, there are other more subtle (but still well defined) 
optimizations that make some common cases much better behaved (e.g., 
detecting disjointedness of the sets of characters that can be matched at 
the boundaries of closed (* and +) or alternated (|) sub-expressions can be 
used to eliminate backtracking). Differentiating parenthesized 
sub-expressions that must be tracked for independent extraction from those 
that only group the sub-expression they enclose to override the normal 
operator precedence can also be used to advantage (I think).

Anyway, I really don't think you should change this behavior--you'd be 
breaking regex. Maximum bite is specified for a reason--a good reason.

Randall Schulz
Mountain View, CA USA


At 22:49 2001-06-11, Jim Blandy wrote:

>"Eli Zaretskii" <eliz@is.elta.co.il> writes:
> > One notorious problem with GNU regex is that it is quite slow for many
> > simple jobs, such as matching a simple regular expression with no
> > backtracking.  It seems that the main reason for this slowness is the
> > fact that GNU regex supports null characters in strings.  For
> > example, Sed 3.02 compiled with GNU regex is about 2-4 times slower
> > on simple jobs than the same Sed compiled with Spencer's regex
> > library.  (The DJGPP port of Sed is actually distributed with two
> > executables, one built with GNU regex, the other with Spencer's, for
> > this very reason.)
>
>I'm suspicious of this assertion.  I've worked on GNU regexp in the
>past, and I don't see any reason this should be so.
>
>However, I was messing around with regexps a lot when GNU regexp
>suddenly became slow on certain regexps.  I looked into the cause, and
>it turned out that this was because GNU regexp had been made to comply
>with the POSIX regexp specification.  POSIX regexp semantics require
>that the regexp match the longest possible string (I may have the
>details wrong, but it's something like that).  For backtracking regexp
>engines (the GNU, Henry Spencer, and Perl regexp matchers are all of
>this design), that innocent-sounding constraint basically requires
>insanely slow behavior.
>
>GNU regexp has a flag that allows you to turn this behavior off, and
>get the traditional, faster, non-POSIX-compliant behavior.  So I don't
>see any reason the GNU regexp library couldn't serve all the GPL'd
>software's needs.
>
>----
>
>The details, for the curious:
>
>Suppose you have a regexp like this (assume the obvious
>metacharacters)
>
>   (FOObar|FOO)(barbar)+
>    ---a-- -b-  ---c--      <= labels for pieces of the regexp
>
>which you're matching against the string
>
>   FOObarbarbarbar
>   0  3  6  9  12
>
>and let's suppose you're calling the regexp library in a manner which
>asks "does a prefix of this string match this regexp?"  (That is,
>you're not asking "does this regexp match this entire string?")
>
>The traditional behavior is for the regexp engine to match part ---a--
>of the regexp against data[0..5], match one repetition of part ---c--
>against data[6..11], and say it's done.  The Perl regexp matcher will
>return this match.
>
>But this isn't the behavior POSIX requires.  POSIX says you must
>return the *longest possible* match.  So a POSIX-compliant matcher
>must match -b- against data[0..2], and then two repetitions of ---c--
>against data[3..8] and data[9..14].  This is a longer match.
>
>To find the longest match, in general, a backtracking matcher has to
>generate every possible match, and return the longest one it found.
>This is what GNU regexp does.
>
>So, just how bad is this?  Well, suppose you're matching a regexp like:
>
>         .*.*.*.*.*.*
>
>against a string like
>
>      aaaaaaaaaaaaaaaaaaaa
>
>To generate every possible match, you have to choose every possible
>way to divide up those twenty a's amongst six .* patterns.  I think
>this is 20 choose 5, or 1.9 million, matches you have to try.  In
>general, I think the time to match POSIXly can increase exponentially
>in the length of your regexp, given a long enough data string.
>
>If you have a smart regexp compiler (I understand Perl is pretty
>clever), then you could probably handle this regexp with a bit more
>aplomb.  But I'll bet that I can find a regexp where you really do
>have to try all matchings, no matter how smart your regexp compiler
>is.
>
>(So think of that the next time you propose some innocent-sounding
>constraint, like "longest match always"!)
>
>Anyway, the outcome is that all the really popular regexp matchers
>either don't implement the POSIX behavior, or provide options to turn
>it off.  For GNU regexp, you can use the RE_NO_POSIX_BACKTRACKING
>flag, and you'll get the traditional not-always-the-longest-match nice
>fast behavior.  Perl simply documents the traditional behavior ("The
>[Perl regexp] engine thinks locally and acts globally," as the Camel
>book puts it).
>



* Re: Another RFC: regex in libiberty
  2001-06-11 22:49                       ` Jim Blandy
  2001-06-11 23:51                         ` Randall R Schulz
@ 2001-06-12  6:48                         ` Jim Blandy
  1 sibling, 0 replies; 19+ messages in thread
From: Jim Blandy @ 2001-06-12  6:48 UTC (permalink / raw)
  To: Jim Blandy; +Cc: Eli Zaretskii, dj, gcc, gdb, binutils, cygwin

Jim Blandy <jimb@cygnus.com> writes:
> To generate every possible match, you have to choose every possible
> way to divide up those twenty a's amongst six .* patterns.  I think
> this is 20 choose 5, or 1.9 million, matches you have to try.  In
> general, I think the time to match POSIXly can increase exponentially
> in the length of your regexp, given a long enough data string.

20 choose 5 is, of course, only 15504, not 1.9 million.  Oops.
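
As an editorial cross-check of the arithmetic (not part of Jim's message): if each of the six .* pieces may match the empty string and together they must consume all twenty a's, the number of splits is the number of ways to write 20 as an ordered sum of six non-negative parts. By the stars-and-bars argument that is C(25, 5) = 53130, somewhat more than 20 choose 5 = 15504, and it still ignores matches that stop before the end of the string. Either way, an exhaustive search faces tens of thousands of candidates for a 20-character string. The helper names choose and count_splits below are hypothetical; count_splits uses dynamic programming so the counting argument is visible rather than hidden in a formula.

```c
/* choose(n, r): binomial coefficient.  After step i, result equals
   C(n - r + i, i), so every intermediate division is exact. */
static unsigned long long choose(int n, int r)
{
    unsigned long long result = 1;

    if (r < 0 || r > n)
        return 0;
    for (int i = 1; i <= r; i++)
        result = result * (unsigned long long)(n - r + i) / i;
    return result;
}

/* count_splits(n, k): number of ways to hand n characters to k
   consecutive `.*` pieces, where each piece may take zero or more
   characters and together they must take all n.  ways[j] is the number
   of ways to share j characters among the pieces processed so far. */
static unsigned long long count_splits(int n, int k)
{
    unsigned long long ways[64];

    if (n < 0 || n >= 64 || k < 1)
        return 0;
    for (int j = 0; j <= n; j++)
        ways[j] = 1;                 /* a single piece takes everything */
    for (int p = 2; p <= k; p++)     /* add one more piece at a time */
        for (int j = 1; j <= n; j++)
            ways[j] += ways[j - 1];  /* sum of old ways[0..j]: new piece takes the rest */
    return ways[n];
}
```

Holding n at 20 and increasing k shows how quickly the count climbs as more .* pieces are added to the regexp.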



end of thread, other threads:[~2001-06-12  6:48 UTC | newest]

Thread overview: 19+ messages
     [not found] <Daniel>
     [not found] ` <Vogel's>
     [not found]   ` <message>
     [not found]     ` <of>
     [not found]       ` <Mon,>
     [not found]         ` <01>
     [not found]           ` <Nov>
     [not found]             ` <1999>
     [not found]               ` <14:25:01>
     [not found]                 ` <+0100>
     [not found]                   ` <381D94AD.B37EC167@grafzahl.de>
1999-11-08  8:54                     ` go32-nat.c compilation problem Pierre Muller
     [not found]       ` <Fri,>
     [not found]         ` <08>
     [not found]           ` <Jun>
     [not found]             ` <2001>
     [not found]               ` <10:06:51>
     [not found]                 ` <+0300>
2001-06-07 18:27                   ` Another RFC: regex in libiberty DJ Delorie
2001-06-07 18:31                     ` Ian Lance Taylor
2001-06-07 18:33                       ` DJ Delorie
2001-06-07 18:43                         ` Ian Lance Taylor
2001-06-08  0:11                     ` Eli Zaretskii
2001-06-08  9:18                       ` Mark Mitchell
2001-06-08  9:59                       ` Zack Weinberg
2001-06-08 10:05                         ` H . J . Lu
2001-06-08 10:31                           ` Eli Zaretskii
2001-06-08 10:39                             ` H . J . Lu
2001-06-08 10:37                         ` Eli Zaretskii
2001-06-11 22:49                       ` Jim Blandy
2001-06-11 23:51                         ` Randall R Schulz
2001-06-12  6:48                         ` Jim Blandy
2001-06-08  1:15                     ` Pierre Muller
2001-06-08  1:36                       ` About struct bpp_transfer_params ±èµæÁß
2001-06-08  7:43                         ` Fernando Nasser
2001-06-09 13:34                     ` Another RFC: regex in libiberty Andrew Cagney
     [not found] <Eli>
