From mboxrd@z Thu Jan 1 00:00:00 1970
From: Donn Terry
To: 'Daniel Berlin', Kevin Buettner
Cc: Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: RE: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 10:59:00 -0000
Message-id: <309F4FC4705DC844987051A517E9E39B16EFE6@red-pt-02.redmond.corp.microsoft.com>
X-SW-Source: 2000-08/msg00041.html

One of the first rules of testing: don't discard useful tests. Thus, I
agree that the tests should be added as they are generated.

The problem ISN'T the tests, it's the apparent regression (and the work
that that causes). Blaming the tests is like blaming the messenger.
Having been in the business of doing a new port over a protracted period
(and of the whole toolchain), it has been a real pain for me to deal
with this problem, but I would never, ever object to including a new
(valid, of course) test.

In the gcc case there has been a database of "who failed what", which
gives me a clue as to whether it's a new test that most systems don't
pass (and I should defer working on it in favor of more immediate (for
me) things) or something I should fix now. It's the lack of THAT data
that makes including the new tests painful. (The gcc tool is less than
perfect, but moving it to a central server would address the biggest of
the problems.)

XFAIL is a very crude tool: it doesn't really say what "reasonable
expectation" is for any given platform. However, it's a help, but it
has often been abused.

Bright ideas on how to manage the problem of apparent regressions would
be most useful. Certainly it would be useful to educate the people who
look at the regression results that, because the set of regression
tests is growing, there will be continuous new failures.

(One bright idea: insisting that the body of the test describe precisely
what is being tested and why, the expectation of which systems should or
should not pass (not in detail, but "RISCs will probably have trouble
with this because..."), and a date(!!!)
will make it much easier to make an informed judgement on a new
failure. Another: include the date of addition in a place where the
final report can say something like:
"Tests younger than 30 days: nnn pass; mmm fail.
Tests younger than 90 days...")

(This makes it easier to show some manager that it's the new tests
that are the problem.)

Donn

> -----Original Message-----
> From: Daniel Berlin [mailto:dberlin@redhat.com]
> Sent: Wednesday, August 02, 2000 9:45 AM
> To: Kevin Buettner
> Cc: Andrew Cagney; GDB Discussion; GDB Patches Mail List
> Subject: Re: RFA: Testsuite patches...
>
> Kevin Buettner writes:
>
> > On Aug 2, 9:55pm, Andrew Cagney wrote:
> >
> > > I think GDB should be accepting tests (provided that they are
> > > rigorously examined) even when they add failures - just as long
> > > as the failures examine real bugs. I think this also better
> > > reflects what really goes on.
> >
> > I agree.
> >
> > If we make "no new failures" the criterion for whether a test is
> > added to the testsuite or not, then it seems to me that we'll end
> > up adding very few new tests, just because it's so difficult for
> > any one person to test on all affected targets. (And it really
> > doesn't work to post a patch and expect everyone affected to try
> > it.)
> >
> > It makes sense to me to spread the workload by adding a test and
> > then expecting the various maintainers to make sure that the test
> > passes (or gets suitably tweaked) for their targets.
> >
> > Kevin
>
> This seems like a better idea.
>
> In fact, I propose the following, or something like it:
>
> We accept all new tests people are willing to contribute, whether GDB
> passes them or not, on any platform (assuming the test itself is
> showing a problem with GDB, or something that should eventually work
> in GDB, like, say, virtual function calling).
> We have a separate directory in the testsuite for tests that nobody
> knows whether it will pass on all platforms, or whether GDB can do
> that yet.
>
> That way, even if you XFAIL'd the test (so people didn't bitch about
> the failures), at least I could look at the test results for that
> directory when I wanted to know what should be working, but isn't,
> etc.
>
> Or maybe I'm just babbling.
> --Dan

From fnasser@cygnus.com Wed Aug 02 11:11:00 2000
From: Fernando Nasser
To: Donn Terry
Cc: 'Daniel Berlin', Kevin Buettner, Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: Re: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 11:11:00 -0000
Message-id: <39886458.CAC5D8EB@cygnus.com>
References: <309F4FC4705DC844987051A517E9E39B16EFE6@red-pt-02.redmond.corp.microsoft.com>
X-SW-Source: 2000-08/msg00042.html
Content-length: 1153

One thing that I proposed once, precisely because of this problem, was
to create a KNOWN failures category. This was to prevent abuse of
XFAIL, which means that a test is expected to fail on a certain
platform because of problems not related to the tool being tested (an
OS portability problem, a bug, an old OS version, etc.).

This category would also be used to link to the bug database (GNATS?),
so that from the test results we could compile a list of known issues,
and also so that we could activate the tests after fixing the bug.

It was decided, at the time, that we should just XFAIL the tests and
add appropriate comments saying that this was a known bug and
mentioning the bug database ticket (abusing XFAIL a little more, IMO).

We can add the tests and mark them as XFAILs (with the appropriate
comments). This way we keep increasing the test base and can still use
the number of failures to check the result of changes, etc.

--
Fernando Nasser
Red Hat - Toronto                    E-Mail: fnasser@cygnus.com
2323 Yonge Street, Suite #300        Tel: 416-482-2661 ext.
311
Toronto, Ontario M4P 2C9             Fax: 416-482-6299

From msnyder@redhat.com Wed Aug 02 11:13:00 2000
From: Michael Snyder
To: Donn Terry
Cc: 'Daniel Berlin', Kevin Buettner, Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: Re: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 11:13:00 -0000
Message-id: <398864C6.426E@redhat.com>
References: <309F4FC4705DC844987051A517E9E39B16EFE6@red-pt-02.redmond.corp.microsoft.com>
X-SW-Source: 2000-08/msg00043.html
Content-length: 4497

Donn Terry wrote:
>
> One of the first rules of testing: don't discard useful tests.
> Thus, I agree that the tests should be added as they are generated.

We do not have to throw away the tests. Convert them into problem
reports. That is how we traditionally keep track of problems. We do
not traditionally note a problem by adding a test that will fail. If
we know of a problem, there is an established way to keep track of it.
Adding a test that will fail is a very annoying way to ask someone to
fix something, IMHO.

> The problem ISN'T the tests, it's the apparent regression (and the
> work that that causes). Blaming the tests is like blaming the
> messenger. Having been in the business of doing a new port over a
> protracted period (and of the whole toolchain), it has been a real
> pain for me to deal with this problem, but I would never, ever
> object to including a new (valid, of course) test.
>
> In the gcc case there has been a database of "who failed what",
> which gives me a clue as to whether it's a new test that most
> systems don't pass (and I should defer working on it in favor of
> more immediate (for me) things) or something I should fix now. It's
> the lack of THAT data that makes including the new tests painful.
> (The gcc tool is less than perfect, but moving it to a central
> server would address the biggest of the problems.)
>
> XFAIL is a very crude tool: it doesn't really say what "reasonable
> expectation" is for any given platform.
> However, it's a help, but has often been abused.
>
> Bright ideas on how to manage the problem of apparent regressions
> would be most useful. Certainly it would be useful to educate the
> people who look at the regression results that, because the set of
> regression tests is growing, there will be continuous new failures.
>
> (One bright idea: insisting that the body of the test describe
> precisely what is being tested and why, the expectation of which
> systems should or should not pass (not in detail, but "RISCs will
> probably have trouble with this because..."), and a date(!!!) will
> make it much easier to make an informed judgement on a new failure.
> Another: include the date of addition in a place where the final
> report can say something like:
> "Tests younger than 30 days: nnn pass; mmm fail.
> Tests younger than 90 days...")
>
> (This makes it easier to show some manager that it's the new tests
> that are the problem.)
>
> Donn
>
> > -----Original Message-----
> > From: Daniel Berlin [mailto:dberlin@redhat.com]
> > Sent: Wednesday, August 02, 2000 9:45 AM
> > To: Kevin Buettner
> > Cc: Andrew Cagney; GDB Discussion; GDB Patches Mail List
> > Subject: Re: RFA: Testsuite patches...
> >
> > Kevin Buettner writes:
> >
> > > On Aug 2, 9:55pm, Andrew Cagney wrote:
> > >
> > > > I think GDB should be accepting tests (provided that they are
> > > > rigorously examined) even when they add failures - just as
> > > > long as the failures examine real bugs. I think this also
> > > > better reflects what really goes on.
> > >
> > > I agree.
> > >
> > > If we make "no new failures" the criterion for whether a test is
> > > added to the testsuite or not, then it seems to me that we'll
> > > end up adding very few new tests, just because it's so difficult
> > > for any one person to test on all affected targets. (And it
> > > really doesn't work to post a patch and expect everyone affected
> > > to try it.)
> > >
> > > It makes sense to me to spread the workload by adding a test and
> > > then expecting the various maintainers to make sure that the
> > > test passes (or gets suitably tweaked) for their targets.
> > >
> > > Kevin
> >
> > This seems like a better idea.
> >
> > In fact, I propose the following, or something like it:
> >
> > We accept all new tests people are willing to contribute, whether
> > GDB passes them or not, on any platform (assuming the test itself
> > is showing a problem with GDB, or something that should eventually
> > work in GDB, like, say, virtual function calling).
> >
> > We have a separate directory in the testsuite for tests that
> > nobody knows whether it will pass on all platforms, or whether GDB
> > can do that yet.
> >
> > That way, even if you XFAIL'd the test (so people didn't bitch
> > about the failures), at least I could look at the test results for
> > that directory when I wanted to know what should be working, but
> > isn't, etc.
> >
> > Or maybe I'm just babbling.
> > --Dan

From guo@cup.hp.com Wed Aug 02 11:20:00 2000
From: Jimmy Guo
To: Daniel Berlin
Cc: Kevin Buettner, Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: Re: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 11:20:00 -0000
Message-id:
References:
X-SW-Source: 2000-08/msg00044.html
Content-length: 1824

I'd just like to point out that with my latest patch to DejaGnu, which
relaxes the PRMS id pattern (and which I will commit today), you can
do something like:

    setup_xfail "*-*-*" NOT_YET_SUPPORTED

This gives us control at the level of individual test points. For
brand-new tests awaiting check-out on supported platforms, maybe a
simple naming convention with a leading _ would be effective enough.
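[The "interpret test outcomes selectively" idea discussed in this thread could be sketched as a small filter over a DejaGnu-style summary file. This is only an illustrative sketch: the line format assumed here (`PASS: gdb.base/break.exp: name`, as in a gdb.sum file) and the script itself are not part of the original messages.]

```python
import collections

def tally_by_directory(lines):
    """Count PASS/FAIL/XFAIL/XPASS per test directory from DejaGnu-style
    summary lines such as 'FAIL: gdb.base/break.exp: break main'."""
    counts = collections.defaultdict(collections.Counter)
    for line in lines:
        parts = line.split(": ", 2)
        # Skip anything that is not an outcome line.
        if len(parts) < 2 or parts[0] not in ("PASS", "FAIL", "XFAIL", "XPASS"):
            continue
        outcome, test = parts[0], parts[1]
        directory = test.split("/", 1)[0]   # e.g. 'gdb.base'
        counts[directory][outcome] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "PASS: gdb.base/break.exp: break main",
        "FAIL: gdb.c++/virtfunc.exp: virtual function call",
        "XFAIL: gdb.c++/virtfunc.exp: thunk adjustment",
    ]
    for directory, c in sorted(tally_by_directory(sample).items()):
        print(directory, dict(c))
```

With a per-directory breakdown like this, failures confined to a staging directory can be reported separately from regressions in the established suite.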
In principle I agree we should relax the test acceptance criteria a
bit, but at the same time we need to consider the impact of a high
level of FAILs on the ongoing development effort (at HP we have no
FAILs in our top-of-trunk runs, and all XFAILs are explained by defect
IDs, or by something like 'NOT_YET_SUPPORTED'). I'm just suggesting a
couple of ways to let people interpret test outcomes selectively.

A new temporary staging test tree might be useful, but I'm not sure
how it would interact with HP's multipass testing scheme, in which we
selectively run tests with different compilers/options based on test
directories -- unless you replicate the top-level test tree under this
staging tree.

- Jimmy

> Kevin Buettner writes:
>
> In fact, I propose the following, or something like it:
>
> We accept all new tests people are willing to contribute, whether GDB
> passes them or not, on any platform (assuming the test itself is
> showing a problem with GDB, or something that should eventually work
> in GDB, like, say, virtual function calling).
>
> We have a separate directory in the testsuite for tests that nobody
> knows whether it will pass on all platforms, or whether GDB can do
> that yet.
>
> That way, even if you XFAIL'd the test (so people didn't bitch about
> the failures), at least I could look at the test results for that
> directory when I wanted to know what should be working, but isn't,
> etc.
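[Donn's age-bucketed summary ("Tests younger than 30 days: nnn pass; mmm fail") could be sketched as below. The data structures are assumptions made for illustration: the thread never specifies where the date of addition would be recorded, so the dicts mapping test names to outcomes and to dates are hypothetical.]

```python
import datetime

def age_buckets(results, added_on, today, buckets=(30, 90)):
    """Group pass/fail counts by how recently each test was added.

    results  -- dict mapping test name -> True (pass) / False (fail)
    added_on -- dict mapping test name -> datetime.date of addition
    Returns {bucket_days: (passes, fails)} counting tests added within
    that many days of 'today'.
    """
    report = {days: [0, 0] for days in buckets}
    for test, passed in results.items():
        age = (today - added_on[test]).days
        for days in buckets:
            if age <= days:
                report[days][0 if passed else 1] += 1
    return {days: tuple(pf) for days, pf in report.items()}

if __name__ == "__main__":
    today = datetime.date(2000, 8, 2)
    results = {"new.exp": False, "older.exp": True, "ancient.exp": True}
    added = {
        "new.exp": datetime.date(2000, 7, 20),
        "older.exp": datetime.date(2000, 6, 1),
        "ancient.exp": datetime.date(1999, 1, 1),
    }
    for days, (p, f) in sorted(age_buckets(results, added, today).items()):
        print("Tests younger than %d days: %d pass; %d fail" % (days, p, f))
```

A report in this shape makes it easy to show that recent failures come from newly added tests rather than from regressions in long-established ones, which is exactly the managerial argument Donn describes.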