From mboxrd@z Thu Jan 1 00:00:00 1970
From: Donn Terry
To: 'Daniel Berlin', Kevin Buettner
Cc: Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: RE: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 10:59:00 -0000
Message-id: <309F4FC4705DC844987051A517E9E39B16EFE6@red-pt-02.redmond.corp.microsoft.com>
X-SW-Source: 2000-08/msg00041.html

One of the first rules of testing: don't discard useful tests. Thus, I
agree that the tests should be added as they are generated.

The problem ISN'T the tests, it's the apparent regression (and the work
that that causes). Blaming the tests is like blaming the messenger.
Having been in the business of doing a new port over a protracted period
(and of the whole toolchain), it has been a real pain for me to deal
with this problem, but I would never, ever object to including a new
(valid, of course) test.

In the gcc case there has been a database of "who failed what", which
gives me a clue as to whether it's a new test that most systems don't
pass (and I should defer working on it in favor of more immediate (for
me) things) or something I should fix now. It's the lack of THAT data
that makes including the new tests painful. (The gcc tool is less than
perfect, but moving it to a central server would address the biggest of
the problems.)

XFAIL is a very crude tool: it doesn't really say what "reasonable
expectation" is for any given platform. However, it's a help, but it
has often been abused.

Bright ideas on how to manage the problem of apparent regressions would
be most useful. Certainly it would be useful to educate the people who
look at the regression results that, because the set of regression
tests is growing, there will be continuous new failures.

(One bright idea: insisting that the body of the test describe precisely
what is being tested and why, the expectation of which systems should or
should not pass (not in detail, but "RISCs will probably have trouble
with this because..."), and a date(!!!)
will make it much easier to make an informed judgement on a new
failure. Another: include the date of addition in a place where the
final report can say something like:
"Tests younger than 30 days: nnn pass; mmm fail.
Tests younger than 90 days...")

(This makes it easier to show some manager that it's the new tests
that are the problem.)

Donn

> -----Original Message-----
> From: Daniel Berlin [mailto:dberlin@redhat.com]
> Sent: Wednesday, August 02, 2000 9:45 AM
> To: Kevin Buettner
> Cc: Andrew Cagney; GDB Discussion; GDB Patches Mail List
> Subject: Re: RFA: Testsuite patches...
>
> Kevin Buettner writes:
>
> > On Aug 2, 9:55pm, Andrew Cagney wrote:
> >
> > > I think GDB should be accepting tests (provided that they are
> > > rigorously examined) even when they add failures - just as long
> > > as the failures examine real bugs. I think this also better
> > > reflects what really goes on.
> >
> > I agree.
> >
> > If we make "no new failures" the criterion for whether a test is
> > added to the testsuite or not, then it seems to me that we'll end
> > up adding very few new tests, just because it's so difficult for
> > any one person to test on all affected targets. (And it really
> > doesn't work to post a patch and expect everyone affected to try
> > it.)
> >
> > It makes sense to me to spread the workload by adding a test and
> > then expecting the various maintainers to make sure that the test
> > passes (or gets suitably tweaked) for their targets.
> >
> > Kevin
>
> This seems like a better idea.
>
> In fact, I propose the following, or something like it:
>
> We accept all new tests people are willing to contribute, whether GDB
> passes them or not, on any platform (assuming the test itself is
> showing a problem with GDB, or something that should eventually work
> in GDB, like, say, virtual function calling).
> We have a separate directory in the testsuite for tests that nobody
> knows whether it will pass on all platforms, or whether GDB can do
> that yet.
>
> That way, even if you XFAIL'd the test (so people didn't bitch about
> the failures), at least I could look at the test results for that
> directory when I wanted to know what should be working, but isn't,
> etc.
>
> Or maybe I'm just babbling.
> --Dan

From fnasser@cygnus.com Wed Aug 02 11:11:00 2000
From: Fernando Nasser
To: Donn Terry
Cc: 'Daniel Berlin', Kevin Buettner, Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: Re: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 11:11:00 -0000
Message-id: <39886458.CAC5D8EB@cygnus.com>
References: <309F4FC4705DC844987051A517E9E39B16EFE6@red-pt-02.redmond.corp.microsoft.com>
X-SW-Source: 2000-08/msg00042.html
Content-length: 1153

One thing that I proposed once, precisely because of this problem, was
to create a KNOWN failures category. This was to prevent abuse of
XFAIL, which means that a test is expected to fail on a certain
platform because of problems not related to the tool being tested (an
OS portability problem, a bug, an old OS version, etc.).

This category would also be used to link to the bug database (GNATS?),
so that from the test results we could compile a list of known issues,
and also so that we could activate the tests after fixing the bug.

It was decided, at the time, that we should just XFAIL the tests and
add appropriate comments saying that this was a known bug and
mentioning the bug database ticket (abusing XFAIL a little more, IMO).

We can add the tests and mark them as XFAILs (with the appropriate
comments). This way we keep increasing the test base and can still use
the number of failures to check the result of changes, etc.

--
Fernando Nasser
Red Hat - Toronto                    E-Mail: fnasser@cygnus.com
2323 Yonge Street, Suite #300        Tel: 416-482-2661 ext.
311
Toronto, Ontario M4P 2C9             Fax: 416-482-6299

From msnyder@redhat.com Wed Aug 02 11:13:00 2000
From: Michael Snyder
To: Donn Terry
Cc: 'Daniel Berlin', Kevin Buettner, Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: Re: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 11:13:00 -0000
Message-id: <398864C6.426E@redhat.com>
References: <309F4FC4705DC844987051A517E9E39B16EFE6@red-pt-02.redmond.corp.microsoft.com>
X-SW-Source: 2000-08/msg00043.html
Content-length: 4497

Donn Terry wrote:
>
> One of the first rules of testing: don't discard useful tests.
> Thus, I agree that the tests should be added as they are generated.

We do not have to throw away the tests. Convert them into problem
reports. That is how we traditionally keep track of problems. We do
not traditionally note a problem by adding a test that will fail. If
we know of a problem, there is an established way to keep track of it.
Adding a test that will fail is a very annoying way to ask someone to
fix something, IMHO.

> The problem ISN'T the tests, it's the apparent regression (and the
> work that that causes). Blaming the tests is like blaming the
> messenger. Having been in the business of doing a new port over a
> protracted period (and of the whole toolchain), it has been a real
> pain for me to deal with this problem, but I would never, ever
> object to including a new (valid, of course) test.
>
> In the gcc case there has been a database of "who failed what",
> which gives me a clue as to whether it's a new test that most
> systems don't pass (and I should defer working on it in favor of
> more immediate (for me) things) or something I should fix now. It's
> the lack of THAT data that makes including the new tests painful.
> (The gcc tool is less than perfect, but moving it to a central
> server would address the biggest of the problems.)
>
> XFAIL is a very crude tool: it doesn't really say what "reasonable
> expectation" is for any given platform.
> However, it's a help, but has often been abused.
>
> Bright ideas on how to manage the problem of apparent regressions
> would be most useful. Certainly it would be useful to educate the
> people who look at the regression results that, because the set of
> regression tests is growing, there will be continuous new failures.
>
> (One bright idea: insisting that the body of the test describe
> precisely what is being tested and why, the expectation of which
> systems should or should not pass (not in detail, but "RISCs will
> probably have trouble with this because..."), and a date(!!!) will
> make it much easier to make an informed judgement on a new failure.
> Another: include the date of addition in a place where the final
> report can say something like:
> "Tests younger than 30 days: nnn pass; mmm fail.
> Tests younger than 90 days...")
>
> (This makes it easier to show some manager that it's the new tests
> that are the problem.)
>
> Donn
>
> > -----Original Message-----
> > From: Daniel Berlin [mailto:dberlin@redhat.com]
> > Sent: Wednesday, August 02, 2000 9:45 AM
> > To: Kevin Buettner
> > Cc: Andrew Cagney; GDB Discussion; GDB Patches Mail List
> > Subject: Re: RFA: Testsuite patches...
> >
> > Kevin Buettner writes:
> >
> > > On Aug 2, 9:55pm, Andrew Cagney wrote:
> > >
> > > > I think GDB should be accepting tests (provided that they are
> > > > rigorously examined) even when they add failures - just as
> > > > long as the failures examine real bugs. I think this also
> > > > better reflects what really goes on.
> > >
> > > I agree.
> > >
> > > If we make "no new failures" the criterion for whether a test is
> > > added to the testsuite or not, then it seems to me that we'll
> > > end up adding very few new tests, just because it's so difficult
> > > for any one person to test on all affected targets. (And it
> > > really doesn't work to post a patch and expect everyone affected
> > > to try it.)
> > >
> > > It makes sense to me to spread the workload by adding a test and
> > > then expecting the various maintainers to make sure that the
> > > test passes (or gets suitably tweaked) for their targets.
> > >
> > > Kevin
> >
> > This seems like a better idea.
> >
> > In fact, I propose the following, or something like it:
> >
> > We accept all new tests people are willing to contribute, whether
> > GDB passes them or not, on any platform (assuming the test itself
> > is showing a problem with GDB, or something that should eventually
> > work in GDB, like, say, virtual function calling).
> >
> > We have a separate directory in the testsuite for tests that
> > nobody knows whether it will pass on all platforms, or whether GDB
> > can do that yet.
> >
> > That way, even if you XFAIL'd the test (so people didn't bitch
> > about the failures), at least I could look at the test results for
> > that directory when I wanted to know what should be working, but
> > isn't, etc.
> >
> > Or maybe I'm just babbling.
> > --Dan

From guo@cup.hp.com Wed Aug 02 11:20:00 2000
From: Jimmy Guo
To: Daniel Berlin
Cc: Kevin Buettner, Andrew Cagney, GDB Discussion, GDB Patches Mail List
Subject: Re: RFA: Testsuite patches...
Date: Wed, 02 Aug 2000 11:20:00 -0000
Message-id:
References:
X-SW-Source: 2000-08/msg00044.html
Content-length: 1824

I'd just like to point out that with my latest patch to DejaGnu, which
relaxes the PRMS id pattern (and which I will commit today), you can
do something like:

    setup_xfail "*-*-*" NOT_YET_SUPPORTED

This gives us control at the level of individual test points. For
brand-new tests awaiting check-out on supported platforms, maybe a
simple naming convention with a leading _ would be effective enough.
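[The "interpret test outcomes selectively" idea discussed in this thread could be sketched as a small filter over a DejaGnu-style summary file. This is only an illustrative sketch: the line format assumed here (`PASS: gdb.base/break.exp: name`, as in a gdb.sum file) and the script itself are not part of the original messages.]

```python
import collections

def tally_by_directory(lines):
    """Count PASS/FAIL/XFAIL/XPASS per test directory from DejaGnu-style
    summary lines such as 'FAIL: gdb.base/break.exp: break main'."""
    counts = collections.defaultdict(collections.Counter)
    for line in lines:
        parts = line.split(": ", 2)
        # Skip anything that is not an outcome line.
        if len(parts) < 2 or parts[0] not in ("PASS", "FAIL", "XFAIL", "XPASS"):
            continue
        outcome, test = parts[0], parts[1]
        directory = test.split("/", 1)[0]   # e.g. 'gdb.base'
        counts[directory][outcome] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "PASS: gdb.base/break.exp: break main",
        "FAIL: gdb.c++/virtfunc.exp: virtual function call",
        "XFAIL: gdb.c++/virtfunc.exp: thunk adjustment",
    ]
    for directory, c in sorted(tally_by_directory(sample).items()):
        print(directory, dict(c))
```

With a per-directory breakdown like this, failures confined to a staging directory can be reported separately from regressions in the established suite.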
In principle I agree we should relax the test acceptance criteria a
bit, but at the same time we need to consider the impact of a high
level of FAILs on the ongoing development effort (at HP we have no
FAILs in our top-of-trunk runs, and all XFAILs are explained by defect
IDs, or by something like 'NOT_YET_SUPPORTED'). I'm just suggesting a
couple of ways to let people interpret test outcomes selectively.

A new temporary staging test tree might be useful, but I'm not sure
how it would interact with HP's multipass testing scheme, in which we
selectively run tests with different compilers/options based on test
directories -- unless you replicate the top-level test tree under this
staging tree.

- Jimmy

> Kevin Buettner writes:
>
> In fact, I propose the following, or something like it:
>
> We accept all new tests people are willing to contribute, whether GDB
> passes them or not, on any platform (assuming the test itself is
> showing a problem with GDB, or something that should eventually work
> in GDB, like, say, virtual function calling).
>
> We have a separate directory in the testsuite for tests that nobody
> knows whether it will pass on all platforms, or whether GDB can do
> that yet.
>
> That way, even if you XFAIL'd the test (so people didn't bitch about
> the failures), at least I could look at the test results for that
> directory when I wanted to know what should be working, but isn't,
> etc.
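[Donn's age-bucketed summary ("Tests younger than 30 days: nnn pass; mmm fail") could be sketched as below. The data structures are assumptions made for illustration: the thread never specifies where the date of addition would be recorded, so the dicts mapping test names to outcomes and to dates are hypothetical.]

```python
import datetime

def age_buckets(results, added_on, today, buckets=(30, 90)):
    """Group pass/fail counts by how recently each test was added.

    results  -- dict mapping test name -> True (pass) / False (fail)
    added_on -- dict mapping test name -> datetime.date of addition
    Returns {bucket_days: (passes, fails)} counting tests added within
    that many days of 'today'.
    """
    report = {days: [0, 0] for days in buckets}
    for test, passed in results.items():
        age = (today - added_on[test]).days
        for days in buckets:
            if age <= days:
                report[days][0 if passed else 1] += 1
    return {days: tuple(pf) for days, pf in report.items()}

if __name__ == "__main__":
    today = datetime.date(2000, 8, 2)
    results = {"new.exp": False, "older.exp": True, "ancient.exp": True}
    added = {
        "new.exp": datetime.date(2000, 7, 20),
        "older.exp": datetime.date(2000, 6, 1),
        "ancient.exp": datetime.date(1999, 1, 1),
    }
    for days, (p, f) in sorted(age_buckets(results, added, today).items()):
        print("Tests younger than %d days: %d pass; %d fail" % (days, p, f))
```

A report in this shape makes it easy to show that recent failures come from newly added tests rather than from regressions in long-established ones, which is exactly the managerial argument Donn describes.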