From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-151289-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 105773 invoked by alias); 1 Oct 2018 09:36:34 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 105761 invoked by uid 89); 1 Oct 2018 09:36:32 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=meanwhile, vintage, spuriously, row
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 01 Oct 2018 09:36:30 +0000
Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.24])	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id BF8BF44BD5;	Mon,  1 Oct 2018 09:36:28 +0000 (UTC)
Received: from [127.0.0.1] (ovpn04.gateway.prod.ext.ams2.redhat.com [10.39.146.4])	by smtp.corp.redhat.com (Postfix) with ESMTP id A9E1E308BDAF;	Mon,  1 Oct 2018 09:36:27 +0000 (UTC)
Subject: dg-extract-results's bad sorting behavior (Re: Update dg-extract-results.* from gcc)
To: Rainer Orth <ro@grover.CeBiTec.Uni-Bielefeld.DE>
References: <yddh8ku2mrc.fsf@CeBiTec.Uni-Bielefeld.DE> <515bfbff-e2f7-3574-2726-ee1592786891@redhat.com> <yddva8lx6wc.fsf@grover.CeBiTec.Uni-Bielefeld.DE> <400a6540-c7fb-8de8-d5dc-3f07e2d632ca@redhat.com>
Cc: gdb-patches@sourceware.org
From: Pedro Alves <palves@redhat.com>
Message-ID: <c62772c0-05d9-2ece-0a28-f8e4f8b7caba@redhat.com>
Date: Mon, 01 Oct 2018 09:36:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <400a6540-c7fb-8de8-d5dc-3f07e2d632ca@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2018-10/txt/msg00007.txt.bz2

On 08/08/2018 04:24 PM, Pedro Alves wrote:
> On 08/08/2018 03:36 PM, Rainer Orth wrote:
> 
>>> On 07/20/2018 12:02 PM, Rainer Orth wrote:
>>>> When looking at the gdb.sum file produced by dg-extract-results.sh on
>>>> Solaris 11/x86, I noticed some wrong sorting, like this:
>>>>
>>>> PASS: gdb.ada/addr_arith.exp: print something'address + 0
>>>> PASS: gdb.ada/addr_arith.exp: print 0 + something'address
>>>> PASS: gdb.ada/addr_arith.exp: print something'address - 0
>>>> PASS: gdb.ada/addr_arith.exp: print 0 - something'address
>>>>
>>>> Looking closer, I noticed that while dg-extract-results.sh had been
>>>> copied over from contrib in the gcc repo, the corresponding
>>>> dg-extract-results.py file had not.  The latter not only fixes the
>>>> sorting problem I'd observed, but is also way faster than the shell
>>>> version (like a factor of 50 faster).
>>>
>>> We used to have the dg-extract-results.py file, but we deleted it
>>> because it caused (funnily enough, sorting) problems.  See:
>>>
>>>   https://sourceware.org/ml/gdb-patches/2015-02/msg00333.html
>>>
>>> Has that sorting stability issue been meanwhile fixed upstream?
>>
>> not that I can see: between the version of dg-extract-results.py removed
>> in early 2015 and the one in current gcc trunk, there's only added
>> handling for DejaGnu ERRORs and another minor change to do with
>> summaries that doesn't seem to change anything wrt. sorting on first
>> blush.
> 
> OK.
> 
>> Howver, I've just run make -j16 check three times in a row on
>> amd64-pc-solaris2.11, followed by make -j48 check, and the only
>> differences were to to 200+ racy tests, the vast majority of them in
>> gdb.threads.  
> 
> Thanks for testing.
> 
>> Maybe the prior problems have been due to bugs in older
>> versions of python?
> Might be.
> 
> IIRC, the sorting that would change would be the order that the different
> individual gdb.sum results would  be merged (the per-gdb.sum order was
> stable).  So comparing two runs, you'd get something like, in one run this:
> 
>  gdb.base/foo.exp:PASS: test1
>  gdb.base/foo.exp:PASS: test2
>  gdb.base/foo.exp:PASS: test3
>  gdb.base/bar.exp:PASS: testA
>  gdb.base/bar.exp:PASS: testB
>  gdb.base/bar.exp:PASS: testC
> 
> and another run this:
> 
>  gdb.base/bar.exp:PASS: testA
>  gdb.base/bar.exp:PASS: testB
>  gdb.base/bar.exp:PASS: testC
>  gdb.base/foo.exp:PASS: test1
>  gdb.base/foo.exp:PASS: test2
>  gdb.base/foo.exp:PASS: test3
> 
> which would result in all those tests spuriously showing up in a
> gdb.sum old/new diff.
> 
> I'm not sure whether we were seeing that if you compared runs
> of the same tree multiple times.  It could be that it only happened
> when comparing the results of different trees, which contained
> a slightly different set of tests and testcases, like for example
> comparing testresults of a patched master against the testresults
> of master from a month or week ago, which is something I frequently
> do, for example.
> 
> *time passes*
> 
> Wait wait wait, can you clarify what you meant by wrong sorting in:
> 
>  PASS: gdb.ada/addr_arith.exp: print something'address + 0
>  PASS: gdb.ada/addr_arith.exp: print 0 + something'address
>  PASS: gdb.ada/addr_arith.exp: print something'address - 0
>  PASS: gdb.ada/addr_arith.exp: print 0 - something'address
> 
> ?
> 
> Why do you think those results _should_ be sorted?  And in what order?
> 
> Typically, the order/sequence in which the tests of a given exp
> file is executed is important.  The order in the gdb.sum file must
> be the order in which the fail/pass calls are written/issued in the .exp file.
> It'd be absolutely incorrect to alphabetically sort the gdb.sum output.
> Is that what the .py version does?  That's not what I recall, though.
> I guess I may be confused.

Getting back to this, because I just diffed testresults between
runs of different vintage, and got bitten by the sorting problems.

I'm diffing testresults between a run on 20180713 and a run
against today's gdb, and I got a _ton_ of spurious diffs like these:

 -PASS: gdb.ada/complete.exp: complete p my_glob 
 -PASS: gdb.ada/complete.exp: complete p insi 
 -PASS: gdb.ada/complete.exp: complete p inner.insi 
 -PASS: gdb.ada/complete.exp: complete p pck.inne 
 -PASS: gdb.ada/complete.exp: complete p pck__inner__ins 
 -PASS: gdb.ada/complete.exp: complete p pck.inner.ins 
 -PASS: gdb.ada/complete.exp: complete p side 
 -PASS: gdb.ada/complete.exp: complete p exported 
 +PASS: gdb.ada/complete.exp: complete break ada 
  PASS: gdb.ada/complete.exp: complete p <Exported
 -PASS: gdb.ada/complete.exp: p <Exported_Capitalized> 
 -PASS: gdb.ada/complete.exp: p Exported_Capitalized 
 -PASS: gdb.ada/complete.exp: p exported_capitalized 
 -PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra 
  PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
 -PASS: gdb.ada/complete.exp: complete p some 
 +PASS: gdb.ada/complete.exp: complete p <pck__my 
 +PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra 
 +PASS: gdb.ada/complete.exp: complete p ambig 
 +PASS: gdb.ada/complete.exp: complete p ambiguous_f 
 +PASS: gdb.ada/complete.exp: complete p ambiguous_func 
 +PASS: gdb.ada/complete.exp: complete p exported 
 +PASS: gdb.ada/complete.exp: complete p external_ident 
 +PASS: gdb.ada/complete.exp: complete p inner.insi 
 +PASS: gdb.ada/complete.exp: complete p insi 
 +PASS: gdb.ada/complete.exp: complete p local_ident 
 +PASS: gdb.ada/complete.exp: complete p my_glob 
  PASS: gdb.ada/complete.exp: complete p not_in_sco
 -PASS: gdb.ada/complete.exp: complete p pck.ins 
 -PASS: gdb.ada/complete.exp: complete p pck.my 
 +PASS: gdb.ada/complete.exp: complete p pck 
 +PASS: gdb.ada/complete.exp: complete p pck. 
 +PASS: gdb.ada/complete.exp: complete p pck.inne 
  PASS: gdb.ada/complete.exp: complete p pck.inne
  PASS: gdb.ada/complete.exp: complete p pck.inner.
 -PASS: gdb.ada/complete.exp: complete p local_ident 
 +PASS: gdb.ada/complete.exp: complete p pck.inner.ins 
 +PASS: gdb.ada/complete.exp: complete p pck.ins 
  PASS: gdb.ada/complete.exp: complete p pck.local_ident
 +PASS: gdb.ada/complete.exp: complete p pck.my 
 +PASS: gdb.ada/complete.exp: complete p pck__inner__ins 
  PASS: gdb.ada/complete.exp: complete p pck__local_ident
 -PASS: gdb.ada/complete.exp: complete p external_ident 
 -PASS: gdb.ada/complete.exp: complete p pck 
 -PASS: gdb.ada/complete.exp: complete p pck. 
 -PASS: gdb.ada/complete.exp: complete p <pck__my 
 +PASS: gdb.ada/complete.exp: complete p side 
 +PASS: gdb.ada/complete.exp: complete p some 
  PASS: gdb.ada/complete.exp: interactive complete 'print some'
 -PASS: gdb.ada/complete.exp: complete p ambig 
 -PASS: gdb.ada/complete.exp: complete p ambiguous_f 
 -PASS: gdb.ada/complete.exp: complete p ambiguous_func 
 +PASS: gdb.ada/complete.exp: p <Exported_Capitalized> 
 +PASS: gdb.ada/complete.exp: p Exported_Capitalized 
 +PASS: gdb.ada/complete.exp: p exported_capitalized 
  PASS: gdb.ada/complete.exp: set max-completions unlimited
 -PASS: gdb.ada/complete.exp: complete break ada 


Given the earlier discussions about sorting, I could
immediately recognize what is wrong.  It's that while
testsuite/outputs/gdb.ada/complete/gdb.sum lists the
test results in chronological order, preserving
execution sequence:

 Running src/gdb/testsuite/gdb.ada/complete.exp ...
 PASS: gdb.ada/complete.exp: compilation foo.adb
 PASS: gdb.ada/complete.exp: complete p my_glob
 PASS: gdb.ada/complete.exp: complete p insi
 PASS: gdb.ada/complete.exp: complete p inner.insi
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck__inner__ins
 PASS: gdb.ada/complete.exp: complete p pck.inner.ins
 PASS: gdb.ada/complete.exp: complete p side
 PASS: gdb.ada/complete.exp: complete p exported
 PASS: gdb.ada/complete.exp: complete p <Exported
 PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
 PASS: gdb.ada/complete.exp: p Exported_Capitalized
 PASS: gdb.ada/complete.exp: p exported_capitalized
 PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
 PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
 PASS: gdb.ada/complete.exp: complete p some
 PASS: gdb.ada/complete.exp: complete p not_in_sco
 PASS: gdb.ada/complete.exp: complete p pck.ins
 PASS: gdb.ada/complete.exp: complete p pck.my
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck.inner.
 PASS: gdb.ada/complete.exp: complete p local_ident
 PASS: gdb.ada/complete.exp: complete p pck.local_ident
 PASS: gdb.ada/complete.exp: complete p pck__local_ident
 PASS: gdb.ada/complete.exp: complete p external_ident
 PASS: gdb.ada/complete.exp: complete p pck
 PASS: gdb.ada/complete.exp: complete p pck.
 PASS: gdb.ada/complete.exp: complete p <pck__my
 PASS: gdb.ada/complete.exp: interactive complete 'print some'
 PASS: gdb.ada/complete.exp: complete p ambig
 PASS: gdb.ada/complete.exp: complete p ambiguous_f
 PASS: gdb.ada/complete.exp: complete p ambiguous_func
 PASS: gdb.ada/complete.exp: set max-completions unlimited
 PASS: gdb.ada/complete.exp: complete break ada

... the squashed testsuite/gdb.sum ended up with those tests above
sorted lexically:

 PASS: gdb.ada/complete.exp: compilation foo.adb
 PASS: gdb.ada/complete.exp: complete break ada
 PASS: gdb.ada/complete.exp: complete p <Exported
 PASS: gdb.ada/complete.exp: complete p <__gnat_ada_main_prog
 PASS: gdb.ada/complete.exp: complete p <pck__my
 PASS: gdb.ada/complete.exp: complete p __gnat_ada_main_progra
 PASS: gdb.ada/complete.exp: complete p ambig
 PASS: gdb.ada/complete.exp: complete p ambiguous_f
 PASS: gdb.ada/complete.exp: complete p ambiguous_func
 PASS: gdb.ada/complete.exp: complete p exported
 PASS: gdb.ada/complete.exp: complete p external_ident
 PASS: gdb.ada/complete.exp: complete p inner.insi
 PASS: gdb.ada/complete.exp: complete p insi
 PASS: gdb.ada/complete.exp: complete p local_ident
 PASS: gdb.ada/complete.exp: complete p my_glob
 PASS: gdb.ada/complete.exp: complete p not_in_sco
 PASS: gdb.ada/complete.exp: complete p pck
 PASS: gdb.ada/complete.exp: complete p pck.
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck.inne
 PASS: gdb.ada/complete.exp: complete p pck.inner.
 PASS: gdb.ada/complete.exp: complete p pck.inner.ins
 PASS: gdb.ada/complete.exp: complete p pck.ins
 PASS: gdb.ada/complete.exp: complete p pck.local_ident
 PASS: gdb.ada/complete.exp: complete p pck.my
 PASS: gdb.ada/complete.exp: complete p pck__inner__ins
 PASS: gdb.ada/complete.exp: complete p pck__local_ident
 PASS: gdb.ada/complete.exp: complete p side
 PASS: gdb.ada/complete.exp: complete p some
 PASS: gdb.ada/complete.exp: interactive complete 'print some'
 PASS: gdb.ada/complete.exp: p <Exported_Capitalized>
 PASS: gdb.ada/complete.exp: p Exported_Capitalized
 PASS: gdb.ada/complete.exp: p exported_capitalized
 PASS: gdb.ada/complete.exp: set max-completions unlimited

... which is clearly incorrect.

So you won't see the problem if you compare test results of
two runs that both postdate the dg-extract-results update,
and if they're both run in parallel mode.  I assume the problem
is visible if you compare a parallel mode run against
a serial mode run, since the latter won't sort.

Is this something that can be easily fixed?

Thanks,
Pedro Alves