From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2532 invoked by alias); 25 Feb 2016 17:01:42 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 1864 invoked by uid 89); 25 Feb 2016 17:01:41 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.8 required=5.0 tests=BAYES_50,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=UD:watchpoint-fork.exp, UD:fork-plus-threads.exp, sk:forkpl, sk:fork-pl X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 25 Feb 2016 17:01:31 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (Postfix) with ESMTPS id 298428EA38 for ; Thu, 25 Feb 2016 17:01:30 +0000 (UTC) Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.phx2.redhat.com [10.5.9.1]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id u1PH1S4K025606; Thu, 25 Feb 2016 12:01:29 -0500 Message-ID: <56CF3368.5070101@redhat.com> Date: Thu, 25 Feb 2016 17:01:00 -0000 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Sergio Durigan Junior , GDB Patches Subject: Re: [PATCH] Improve analysis of racy testcases References: <87r3gcgm91.fsf@redhat.com> In-Reply-To: <87r3gcgm91.fsf@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-SW-Source: 2016-02/txt/msg00807.txt.bz2 On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote: > This patch is a proposal to introduce some mechanisms to identify racy > testcases present in our testsuite. As can be seen in previous > discussions, racy tests are really bothersome and cause our BuildBot to > pollute the gdb-testers mailing list with hundreds of false-positives > messages every month. Hopefully, by identifying these racy tests in > advance (and automatically) will contribute to the reduction of noise > traffic to gdb-testers, maybe to the point where we will be able to send > the failure messages directly to the authors of the commits. > > I spent some time trying to decide the best way to tackle this problem, > and decided that there is no silver bullet. Racy tests are tricky and > it is difficult to catch them, so the best solution I could find (for > now?) is to run our testsuite a number of times in a row, and then > compare the results (i.e., the gdb.sum files generated during each run). > The more times you run the tests, the more racy tests you are likely to > detect (at the expense of waiting longer and longer). You can also run > the tests in parallel, which makes things faster (and contribute to > catching more racy tests, because your machine will have less resources > for each test and some of them are likely to fail when this happens). I > did some tests in my machine (8-core i7, 16GB RAM), and running the > whole GDB testsuite 5 times using -j6 took 23 minutes. Not bad. > > In order to run the racy test machinery, you need to specify the > RACY_ITER environment variable. You will assign a number to this > variable, which represents the number of times you want to run the > tests. So, for example, if you want to run the whole testsuite 3 times > in parallel (using 2 cores), you will do: > > make check RACY_ITER=3 -j2 > > It is also possible to use the TESTS variable and specify which tests > you want to run: > > make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2 I did this, and noticed: $ ls testsuite/racy_outputs/ gdb1.log gdb1.sum gdb2.log gdb2.sum gdb3.log gdb3.sum I think it'd make more sense, and it'd be easier to copy test run results around, diff result dirs etc., if these were put in subdirs instead: testsuite/racy_outputs/1/gdb.sum testsuite/racy_outputs/1/gdb.sum testsuite/race_outputs/1/gdb.log testsuite/race_outputs/2/gdb.sum testsuite/race_outputs/2/gdb.log etc. Any reason not to? > > And so on. The output files will be put at the directory > gdb/testsuite/racy_outputs/. > > After make invokes the necessary rules to run the tests, it finally runs > a Python script that will analyze the resulting gdb.sum files. This > Python script will read each file, and construct a series of sets based > on the results of the tests (one set for FAIL's, one for PASS'es, one > for KFAIL's, etc.). It will then do some set operations and come up > with a list of unique, sorted testcases that are racy. The algorithm > behind this is: > > for state in PASS, FAIL, XFAIL, XPASS...; do > if a test's state in every sumfile is $state; then > it is not racy > else > it is racy > > IOW, a test must have the same state in every sumfile. Is it easy to start with preexisting dirs with gdb.sum/gdb.log files, and run this script on them? E.g., say I've already run the testsuite three times, and now I want to check for racy tests, using the test results I already have: testruns/2015-02-21/1/gdb.sum testruns/2015-02-21/2/gdb.sum testruns/2015-02-21/3/gdb.sum I think we should document that. > > After processing everything, the script prints the racy tests it could > identify on stdout. I am redirecting this to a file named racy.sum. I tried: make check TESTS='gdb.base/default.exp' RACY_ITER=3 and was surprised to see nothing output at the end. Then I found that the racy.sum file is completely empty. How about we instead make sure that something is already printed? Maybe follow dejagnu's output style, and print a summary at the end? I've now run "make check -j8 RACY_ITER=3" and got this: $ cat testsuite/racy.sum gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM) gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM) gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749) gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749) gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749) gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749) gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left gdb.threads/watchpoint-fork.exp: child: multithreaded: finish gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork The gdb.threads/process-dies-while-handling-bp.exp ones are actually: -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749) Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS. > > Something else that I wasn't sure how to deal with was non-unique > messages in our testsuite. I decided to do the same thing I do in our > BuildBot: include a unique identifier in the end of message, like: > > gdb.base/xyz.exp: non-unique message > gdb.base/xyz.exp: non-unique message <<2>> > > This means that you will have to be careful about them when you use the > racy.sum file. > > I ran the script several times here, and it did a good job catching some > well-known racy tests. Overall, I am satisfied with this approach and I > think it will be helpful to have it upstream'ed. I think so too. We don't have to fix everything before its put in either. And having this doesn't preclude exploring other approaches either. The only thing I do wish we should do, is use the fruits of this to somehow mark racy tests in the testsuite itself, instead of only making the buildbot ignore them, so that local development benefits as well. > I also intend to > extend our BuildBot and create new, specialized builders that will be > responsible for detecting the racy tests every X number of days, but > that will only be done when this patch is accepted. That'll be great. > Comments are welcome, as usual. Thanks. On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote: > diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in > index 38c3052..4bf6191 100644 > --- a/gdb/testsuite/Makefile.in > +++ b/gdb/testsuite/Makefile.in > @@ -52,6 +52,8 @@ RUNTESTFLAGS = > > FORCE_PARALLEL = > > +DEFAULT_RACY_ITER = 3 Are users supposed to be able to trigger the default? I didn't see it documented. > +Racy testcases > +************** > + > +Unfortunately, GDB has a number of testcases that can randomly pass or > +fail. We call them "racy testcases", and they can be bothersome when > +one is comparing different testsuite runs. In order to help I'd suggest future-proofing this, by rephrasing it as if we were at a stage where the testsuite is stable, and new racy tests occasionally are added to the testsuite. > > diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py > new file mode 100755 > index 0000000..4f05185 > --- /dev/null > +++ b/gdb/testsuite/analyze-racy-logs.py > @@ -0,0 +1,131 @@ > +#!/usr/bin/env python > + > +# Copyright (C) 2016 Free Software Foundation, Inc. > +# > +# This file is part of GDB. > +# > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > + > +# This program is used to analyze the test results (i.e., *.sum files) > +# generated by GDB's testsuite, and print the testcases that are found > +# to be racy. > +# > +# Racy testcases are considered as being testcases which can > +# intermitently FAIL (or PASS) when run two or more times intermittently I think you should put this in as is, and we can keep working on it on top. Thanks, Pedro Alves