Mirror of the gdb-patches mailing list
* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-07  9:10 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-07  9:10 UTC
  To: fnasser; +Cc: ac131313, drow, gdb-patches, rob

Okay, I checked two gdb.log files chosen at random, and did
before-and-after diffs.  The only differences are the usual differences
in process id's and addresses and times and stuff (3000 lines per file!)

I'll proofread the code and documents tonight.

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-09  9:13 Michael Elizabeth Chastain
  2002-04-09  9:34 ` Rob Savoye
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-09  9:13 UTC
  To: rob; +Cc: ac131313, drow, eliz, fnasser, gdb-patches

Hi Rob,

I'm not looking for any changes in 1.4.3.  Think of this more as one
user's feedback.  If you get any good ideas from it, that's fine.

mec> ERRORs and WARNINGs: it would help if all ERRORs and WARNINGs ...
rob> This is probably a good idea.

Now that I know Dejagnu is actively maintained, maybe I'll pick this
one up myself.  I would have to learn more TCL but Dejagnu is not
that large.

mec> Duplicate test names: ...
rob> Um, I don't think it's the responsibility of DejaGnu to work around a
rob> poor programming practice.

Okay, that's fine.  I admit it's a poor practice.  It would have been
nice to move the anti-duplication code into a common place (that isn't
in my scripts, grin).

mec> TIMEOUT: ...
rob> If GDB has crashed, this is supposed to be UNRESOLVED, according to POSIX.
rob> UNRESOLVED is a test case that can't be finished, and needs a human to look
rob> at it.  UNTESTED is when there isn't support in the underlying OS or
rob> hardware to run a test case.

If gdb has crashed, indeed a human has to look at it.  But the tone of
the documentation is that the human is deciding whether the test passed
or not.  Actually a human is needed here to fix the tool, or perhaps fix
the testing environment.  I think TIMEOUT would be more accurate than
using UNRESOLVED for this.

Right now we use FAIL "... (timeout)" for this, so in principle, we're
getting the information we need.  So I can live without this if I really
need to.
  
mec> Split gdb.log file: when I look at gdb.log, I am usually interested in
mec> just one test script.  So I'd like to have a directory of gdb.log files,
mec> one per each test script.  I don't need the giant log.
rob> Hum... I might try that. Mind you, the log files were more for debugging
rob> purposes, than analysing them for test results.

Here is the situation: each week, I run the test suite on a few dozen
configurations.  I have a Perl script that looks for changes in the
results since the last test run.  When I find a negative change, then
I file a bug report.

The problem is what to put in that bug report so that an independent
person can use it effectively.  I'd like to include a copy of the gdb.log
section for that test script, so that people perusing the bug report
can see what's going on without downloading a big tarball from me first.
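
A rough sketch of the per-script split, in Python rather than Perl.  It
assumes each test script's section in gdb.log starts with a line of the form
"Running <path>.exp ..."; that marker, the function name split_log, and the
sample lines are illustrative assumptions, not part of any DejaGnu patch.

```python
import re

# Assumed section marker: "Running <path-to-script>.exp ..."
RUNNING = re.compile(r"^Running (\S+\.exp) \.\.\.\s*$")

def split_log(lines):
    """Group log lines by the test script that produced them."""
    sections = {}
    current = None  # lines before the first marker are dropped
    for line in lines:
        match = RUNNING.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        if current is not None:
            sections[current].append(line)
    return sections

# Hypothetical log excerpt, for illustration only.
sample = [
    "log header line",
    "Running ./gdb.base/foo.exp ...",
    "PASS: gdb.base/foo.exp: continue",
    "Running ./gdb.base/bar.exp ...",
    "FAIL: gdb.base/bar.exp: step",
]
for script, chunk in split_log(sample).items():
    print(script, len(chunk))
```

Each value in the returned dictionary could be written to its own file or
pasted into a bug report as is.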

Another way to look at it: as projects get larger, the roles separate.
I'm planning to publish my scripts so that lots of people can run gdb
tests and mail in useful information.  My end goal is an Internet scale
"gnutest@home" facility, where developers can inject proposed patches and
get some kind of differential analysis over a diverse group of test beds.

rob> Any changes I make have to work generically, since the gdb team is
rob> only a part of the DejaGnu user community.

Sure, I understand.

Thanks for being here and doing this.

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-08 11:56 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-08 11:56 UTC
  To: drow; +Cc: ac131313, eliz, fnasser, gdb-patches, rob

Daniel Jacobowitz writes:
> That would only be useful if we always marked all tests - which we're
> awful about.  continue {2} might be any number of different continue
> statements in the test.

Let me explain in more detail.

Right now there are tests with output like this:

  gdb.base/foo.exp: PASS: continue
  gdb.base/foo.exp: PASS: continue
  gdb.base/foo.exp: PASS: continue
  gdb.base/foo.exp: PASS: continue

I have to do something to make the test names unique.  So I behave
as if the input is this:

  gdb.base/foo.exp: PASS: continue
  gdb.base/foo.exp: PASS: continue {2}
  gdb.base/foo.exp: PASS: continue {3}
  gdb.base/foo.exp: PASS: continue {4}

This is flawed, because if someone adds or subtracts sections from
the test, the sequence numbers will get re-numbered, and I lose the
ability to compare across many runs.  As you point out, "continue {2}"
might be in different places depending on conditional execution and
so on.

But I have to do *something*.  If I just do "$hash{$name} = $result",
then the totals don't even add up correctly.
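
The renumbering described above can be sketched as follows, in Python
rather than Perl.  The function name number_duplicates and the
(script, result, name) tuple layout are assumptions made for illustration.

```python
from collections import Counter

def number_duplicates(results):
    """Append {2}, {3}, ... to repeated test names within one test script.

    `results` is an iterable of (script, result, name) tuples in file order.
    """
    seen = Counter()
    numbered = []
    for script, result, name in results:
        seen[(script, name)] += 1
        count = seen[(script, name)]
        unique = name if count == 1 else f"{name} {{{count}}}"
        numbered.append((script, result, unique))
    return numbered

runs = [("gdb.base/foo.exp", "PASS", "continue")] * 4

# Keying a plain dictionary on the raw name collapses the four results
# into one entry, which is why the totals stop adding up.
flat = {(script, name): result for script, result, name in runs}
print(len(runs), len(flat))

for script, result, name in number_duplicates(runs):
    print(f"{script}: {result}: {name}")
```

The sequence numbers depend only on the order of appearance, so they carry
the same weakness: inserting or removing one "continue" shifts every later
number.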

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-08 10:34 Michael Elizabeth Chastain
  2002-04-08 11:38 ` Daniel Jacobowitz
  2002-04-09  7:50 ` Rob Savoye
  0 siblings, 2 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-08 10:34 UTC
  To: eliz, rob; +Cc: ac131313, drow, fnasser, gdb-patches

Rob Savoye writes:
> btw - Matt says he'll add the extra fields for ERROR and WARNING to the
> XML output, which he's hoping will be done as soon as he firms up the DTD.
> Anything else you need to have that be useful ?

Now there is an open-ended question.  I'll ramble on it for a while.

ERRORs and WARNINGs: it would help if all ERRORs and WARNINGs went
through report_test and got treated like the other results, including
printing the name of the current test script.  Right now I pick this up
when I am lexing the gdb.sum file.

Duplicate test names: we have as many as 30 tests in the same file
with the same name (often named "continue").  Right now I add sequence
numbers to this when I am lexing the gdb.sum file:

  gdb.base/foo.exp: continue
  gdb.base/foo.exp: continue {2}
  gdb.base/foo.exp: continue {3}

It would be nice if DejaGnu did this automatically.

TIMEOUT: I would like to have a new test result of TIMEOUT for a test
that times out.  A TIMEOUT often indicates that gdb has crashed, and is
much more serious than an ordinary FAIL.

Split gdb.log file: when I look at gdb.log, I am usually interested in
just one test script.  So I'd like to have a directory of gdb.log files,
one per each test script.  I don't need the giant log.

Clean up the bug id stuff: the name "PRMS" is hard wired into Dejagnu!

I also have a lot of vague stuff related to scaling up the testing process.
Right now I run 30 configurations per week.  I use external scripts to
manage my archive of test results.  I don't want dejagnu to be in the
business of managing 100's of test scripts, but eventually I will want some
features to add more stuff to the log file (like a uuid, the email
address of the person who ran the test, the version of binutils and
gcc used when testing gdb, and lots of stuff like that).

All of this stuff is just me; the gdb group hasn't talked this over
and reached any kind of consensus yet.

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-08  9:57 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-08  9:57 UTC
  To: fnasser, rob; +Cc: ac131313, drow, eliz, gdb-patches

Rob Savoye writes:
> Oops, I spaced that part. I think we'll need the kpass/kfail patch first,
> to then make those output routines spit out XML if the flag is set.

record_test puts out a <$type>$message</$type> for all types,
so it will automatically put out kpass/kfail when they are available.

The xml output section in log_summary has an explicit list
of "PASS FAIL XPASS XFAIL ..." so the xml patch needs to co-ordinate with
the kpass/kfail patch somehow.

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-07 18:41 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-07 18:41 UTC
  To: fnasser; +Cc: ac131313, drow, gdb-patches, rob

More comments ...

gdb.c++/cplusfuncs.exp has a good place for some KFAIL's.  Grep for
hairyfunc5, hairyfunc6, hairyfunc7.  They are associated with PR gdb/19.
I added some setup_kfail's to that and it appears to work okay,
except that the bug id is not printed on xpass.

Okay, here are some proofreading comments.

Michael C

=== Distinction between XFAIL and KFAIL

An XFAIL is:

  A test is expected to fail in some environment(s) due to some
  bug in the environment ...

And a KFAIL is:

  A test is known to fail in some environment(s) due to a known bug
  in the tool being tested (identified by a bug id string).

I like this distinction.  The "K" letter means that it is a known problem
inside the tool, and the "X" letter means that it is an expected problem
outside the tool.

That makes it weird for a KFAIL to turn into an XPASS.  I've got test
results here with XPASS for the gcc v2 compilers and KFAIL for the gcc
v3 compilers.  I really want KFAIL/KPASS, not KFAIL/XPASS.

A specific note: proc record_test has this code:

  XPASS {
      set exit_status 1
      if { $xfail_prms != 0 } {
	  set message [concat $message "\t(PRMS $xfail_prms)"]
      }
  }

In fact that code could use a little refactoring, because
UNRESOLVED/UNSUPPORTED/UNTESTED all have the same code to pick up
kfail_prms and xfail_prms.

So an XPASS that comes from setup_xfail will have a PRMS id,
but an XPASS that comes from setup_kfail will not.

=== KFAIL as a way of hiding problems

Later on in the documentation:

  @item KFAIL
  ...
  This exists so that, after a bug is identified and properly registered
  in a bug tracking database (Gnats, for instance), the count of failures
  can be kept as zero.  Having zero as a baseline in all platforms allows
  the tool developers to immediately detect regressions caused by changes
  (which may affect some platforms and not others).

Conceptually, there is a disconnect here.  To me, KFAIL means: "there
is a problem report associated with this test".  The problem could be
minor, or it could be a showstopper.  But you say that KFAIL means:
"tool developers can ignore this test when they look for regressions".

I think this is fundamentally wrong.  The right way to look for
regressions is to compare before-and-after test runs.  gdb.sum files
are already pretty good for this; you can just diff them.

I see two practical problems that come out of this wrongness:

(1) If I find a showstopper regression, such as PR gdb/379, and I mark
    it with setup_kfail (as I should), then someone who is using the
    "# of FAILs" metric is not going to see the regression.

(2) If the test suite has a significant number of setup_kfail (and
    it should), then a regression bug may manifest as a transition from
    PASS -> KFAIL.  I will see that because my regression reports look
    at all transitions.  But someone looking at "# of FAILs" will not see
    this transition, so they won't see the regression.
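
To make point (2) concrete, here is a small sketch in Python.  The test
names, the helper names fail_count and transitions, and the "ABSENT"
placeholder for a test missing from one run are all made up for
illustration.

```python
def fail_count(results):
    """The '# of FAILs' metric: count only plain FAIL results."""
    return sum(1 for result in results.values() if result == "FAIL")

def transitions(before, after):
    """Return {test name: (old result, new result)} for every change."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old = before.get(name, "ABSENT")
        new = after.get(name, "ABSENT")
        if old != new:
            changes[name] = (old, new)
    return changes

before = {
    "gdb.base/foo.exp: continue": "PASS",
    "gdb.base/foo.exp: step": "PASS",
}
after = {
    "gdb.base/foo.exp: continue": "KFAIL",  # regression hidden by setup_kfail
    "gdb.base/foo.exp: step": "PASS",
}

print(fail_count(before), fail_count(after))
print(transitions(before, after))
```

The FAIL count is zero in both runs, while the transition report still
shows the PASS -> KFAIL change.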

I believe we already have this problem with XFAIL.  People conflate
the idea of "this test fails due to an external problem" with the idea
"this failure is not important enough to care about at this time".

=== ChangeLog entry

DejaGnu 1.4.2 has a ChangeLog, so the patch needs a ChangeLog entry.

=== Tests

testsuite/ needs some tests for the new KFAIL feature.

=== lib/dg.exp

lib/dg.exp has a bunch of XFAIL stuff, so it needs KFAIL stuff.


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-06 14:40 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-06 14:40 UTC
  To: fnasser; +Cc: ac131313, drow, gdb-patches, rob

I have preliminary results.  On my full test bed (30 configurations,
native i686-pc-linux-gnu), there is no significant difference in
the gdb.sum files produced.

The "before" set is tcl 8.3.4 + expect 5.33 + dejagnu 1.4.2.
The "after"  set is tcl 8.3.4 + expect 5.33 + dejagnu 1.4.2 + fn kfail patch.

Later this weekend, I will pick two or three of the configurations at
random and look carefully at every difference in gdb.log.  There is a lot
of noise to wade through (machine addresses and process id's different
from run to run).

Also I will actually proofread the code.  So far I just skimmed it.

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-05 17:57 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-05 17:57 UTC
  To: ac131313, fnasser; +Cc: drow, gdb-patches, rob

ac> To be honest, I think just removing all but the most recent (i.e. in 
ac> last two years) xfails might be for the best.

I'm okay with that.

I just changed my report generator so that everything except PASS is an
attention line.  So starting with the next report, the attention reports
are going to include KPASS, KFAIL, XPASS, XFAIL.  That increases the
attention table for gdb HEAD from 86 lines to 271 lines.  I think that
is manageable.

The difference tables have always been agnostic to the type of result,
and those are what I use most of the time.

Michael C


* Re: RFC: KFAIL DejaGnu patch
@ 2002-04-05 16:51 Michael Elizabeth Chastain
  0 siblings, 0 replies; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-05 16:51 UTC
  To: fnasser; +Cc: ac131313, drow, gdb-patches, rob

Fernando Nasser writes:
> Michael Chastain: Will you be willing to help me test this?
> I've tried it and it seems to work.  Your scripts must also
> be happy with it.

You got it.  I use dejagnu 1.4.2, so I'll be working from that base.
I'm willing to switch to sourceware dejagnu but I'd rather stay close
to the fsf dejagnu if I can.

> I was trying to find some test to try the setup_kfail on.

Try PR gdb/460.  It mentions one failure in the bug report.  You can
download my testsuite directory tarballs off the bug report and diff
the gdb.log files to pick up the regressions in gdb.base/condbreak.exp
and gdb.base/ena-dis-br.exp.

I have 2.0 gigabytes of test directories now.  I'm still experimenting
with how to communicate stuff so that people can understand it.  The testsuite
directory tarballs are just another experiment; let me know how it works.

It's kinda depressing that the test suite and the bug database are
so disjoint.  I guess we're here to fix that.

Michael C


* Re: RFC: KFAILs [Was: [RFA/mi-testsuite] XFAIL mi*-console.exp]
@ 2002-04-05  9:53 Michael Elizabeth Chastain
  2002-04-05 16:32 ` RFC: KFAIL DejaGnu patch Fernando Nasser
  0 siblings, 1 reply; 35+ messages in thread
From: Michael Elizabeth Chastain @ 2002-04-05  9:53 UTC
  To: fnasser; +Cc: ac131313, drow, gdb-patches, rob

Fernando Nasser writes:

fna> KFAIL: could not run to marker1 (PRMS gdb/999)
fna> Would that make the scripts happy?

Err, I'm not sure if you mean the dejagnu scripts, or my scripts.
My scripts are happy with this format.

A lot of tests use "(...)" for various things, so the "PRMS"
and "gdb/NNN" bits need to be mandatory in order to pick out this
information from the noise.
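
As an illustration of why the mandatory "PRMS" keyword helps, here is a
sketch in Python of picking the bug id off a result line.  The regular
expression and the function name split_bug_id are assumptions; the only
format taken from this thread is a trailing "(PRMS gdb/NNN)".

```python
import re

# Match a trailing "(PRMS category/number)" and nothing else in parentheses.
BUG_ID = re.compile(r"\(PRMS\s+([A-Za-z0-9_+.-]+/\d+)\)\s*$")

def split_bug_id(line):
    """Return (line without the bug id, bug id or None)."""
    match = BUG_ID.search(line)
    if match is None:
        return line.rstrip(), None
    return line[:match.start()].rstrip(), match.group(1)

print(split_bug_id("KFAIL: could not run to marker1 (PRMS gdb/999)"))
print(split_bug_id("FAIL: continue (timeout)"))
```

A parenthesized note such as "(timeout)" is left alone because it lacks the
"PRMS" keyword and the "category/number" shape.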

fna> setup_kfail "gdb/999" *-*-*

Fine with me.  setup_kfail *-*-* "gdb/999" is fine with me as well.

fna> 4) Note that, when a test that was expected to fail due to a known
fna> bug suddenly starts to pass, it becomes a KPASS (as XFAILs do).

Okay, I added a KPASS column to my tables.

fna> I will do it in Perl (I still don't know how to programmatically access
fna> the Gnats database though).  But I have very little spare time, so I
fna> will not mind if someone that can do it sooner volunteers to do this.

You can access the Gnats database through a URL:

  http://sources.redhat.com/cgi-bin/gnatsweb.pl?database=gdb&cmd=view&pr=460

For programmatic access, there may already be a more suitable "cmd" than
"cmd=view", or someone may need to update gnatsweb.pl.
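
For a script, the URL above can be assembled from its parts.  This Python
sketch only builds the string and does not contact the server; the function
name gnats_url is an assumption, and "view" is the only cmd value taken from
this thread.

```python
from urllib.parse import urlencode

GNATSWEB = "http://sources.redhat.com/cgi-bin/gnatsweb.pl"

def gnats_url(pr, database="gdb", cmd="view"):
    """Build a gnatsweb URL for one problem report number."""
    query = urlencode({"database": database, "cmd": cmd, "pr": pr})
    return f"{GNATSWEB}?{query}"

print(gnats_url(460))
```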

I volunteer to write Perl analysis scripts.  My test bed is almost all Perl,
and I am planning to release it.

Michael C



end of thread, other threads:[~2002-04-09 16:34 UTC | newest]

Thread overview: 35+ messages
2002-04-07  9:10 RFC: KFAIL DejaGnu patch Michael Elizabeth Chastain
  -- strict thread matches above, loose matches on Subject: below --
2002-04-09  9:13 Michael Elizabeth Chastain
2002-04-09  9:34 ` Rob Savoye
2002-04-08 11:56 Michael Elizabeth Chastain
2002-04-08 10:34 Michael Elizabeth Chastain
2002-04-08 11:38 ` Daniel Jacobowitz
2002-04-09  7:50 ` Rob Savoye
2002-04-08  9:57 Michael Elizabeth Chastain
2002-04-07 18:41 Michael Elizabeth Chastain
2002-04-06 14:40 Michael Elizabeth Chastain
2002-04-05 17:57 Michael Elizabeth Chastain
2002-04-05 16:51 Michael Elizabeth Chastain
2002-04-05  9:53 RFC: KFAILs [Was: [RFA/mi-testsuite] XFAIL mi*-console.exp] Michael Elizabeth Chastain
2002-04-05 16:32 ` RFC: KFAIL DejaGnu patch Fernando Nasser
2002-04-05 17:05   ` Andrew Cagney
2002-04-07 16:25     ` Rob Savoye
2002-04-05 17:10   ` Daniel Jacobowitz
2002-04-05 17:40     ` Andrew Cagney
2002-04-08  8:37     ` Fernando Nasser
2002-04-07 16:29   ` Rob Savoye
2002-04-07 22:05     ` Eli Zaretskii
2002-04-08  8:22       ` Rob Savoye
2002-04-08  8:52         ` Fernando Nasser
2002-04-08  9:01           ` Rob Savoye
2002-04-08  8:41     ` Fernando Nasser
2002-04-08  9:00       ` Rob Savoye
2002-04-08 13:55         ` Andrew Cagney
2002-04-08 16:21           ` Rob Savoye
2002-04-08 16:34             ` Andrew Cagney
2002-04-08 16:48               ` Rob Savoye
2002-04-08 16:58                 ` Andrew Cagney
2002-04-08 17:09                   ` Rob Savoye
2002-04-08 23:58                     ` Eli Zaretskii
2002-04-09  7:38                       ` Rob Savoye
2002-04-08 23:53               ` Eli Zaretskii
2002-04-09  7:06                 ` Daniel Jacobowitz
