From: Pedro Alves
To: David Edelsohn
Cc: Sergio Durigan Junior, GDB Patches, Edjunior Machado
Subject: Re: [BuildBot] Notifications disabled for Debian-s390x-* and Fedora-ppc64*-* builders
Date: Fri, 15 Dec 2017 16:20:00 -0000

On 12/15/2017 03:53 PM, David Edelsohn wrote:
> On Fri, Dec 15, 2017 at 10:42 AM, Pedro Alves wrote:
>> On 12/15/2017 03:06 PM, David Edelsohn wrote:
>>
>>> Third, the testsuite summaries that no one from the GDB community
>>> monitored show that the testsuite runtime jumped from a relatively
>>> short amount of time to over 9 hours for each run, which points to a
>>> newly introduced problem in GDB or in the testsuite (timeouts?).
>>
>> That may well be.  Can you point at some representative builds,
>> before/after the jump?
>
> The testsuite runs for 6 minutes on RHEL7 s390x buildslave and 9 hours
> on Debian Jessie s390x buildslave.

Those are separate machines.  I'd like to see the jump on
the same machine, so we can maybe pinpoint what caused it.

I was really asking for URLs.  It looks like there are some here:

  https://gdb-build.sergiodj.net/builders/Debian-s390x-native-gdbserver-m64

Here, for example:

  https://gdb-build.sergiodj.net/builders/Debian-s390x-native-gdbserver-m64/builds/4351

  "test gdb tested GDB failed (9 hrs, 2 mins, 56 secs)"

That's definitely too long.

I downloaded the gdb.log file, and did:

 $ grep FAIL gdb.log | grep timeout | sed 's/.exp.*/.exp/g' | sort | uniq -c | sort -n
      1 FAIL: gdb.base/watch-cond.exp
      1 FAIL: gdb.multi/watchpoint-multi-exit.exp
      1 FAIL: gdb.threads/interrupted-hand-call.exp
      1 FAIL: gdb.threads/thread-unwindonsignal.exp
      2 FAIL: gdb.base/value-double-free.exp
      3 FAIL: gdb.mi/mi-async.exp
      3 FAIL: gdb.threads/process-dies-while-detaching.exp
      4 FAIL: gdb.base/pr11022.exp
     10 FAIL: gdb.base/watch-bitfields.exp
     15 FAIL: gdb.base/watchpoints.exp
     20 FAIL: gdb.threads/interrupt-while-step-over.exp
     32 FAIL: gdb.threads/watchpoint-fork.exp
     45 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp
     46 FAIL: gdb.base/display.exp
     51 FAIL: gdb.base/watchpoint.exp

Not _that_ many.  Could they explain the long time?  I suspect not.
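
For anyone repeating this on another gdb.log, here is a rough Python
equivalent of the shell pipeline above, plus a back-of-the-envelope
bound on how much wall time the timeouts alone could cost.  This is
only a sketch: the function names are made up for illustration, and
the per-timeout figure is a parameter the caller has to supply, since
the log excerpt does not state it.

```python
from collections import Counter


def count_timeout_fails(log_text):
    """Count FAIL lines mentioning 'timeout', grouped by .exp file.

    Roughly mirrors:
      grep FAIL | grep timeout | sed 's/.exp.*/.exp/g' | sort | uniq -c
    """
    counts = Counter()
    for line in log_text.splitlines():
        if "FAIL" not in line or "timeout" not in line:
            continue
        # Keep everything up to and including the first ".exp".
        end = line.find(".exp")
        key = line[: end + len(".exp")] if end != -1 else line
        counts[key] += 1
    return counts


def sorted_by_count(counts):
    """Return (count, name) pairs in ascending order, like `sort -n`."""
    return sorted((n, name) for name, n in counts.items())


def serial_timeout_cost(counts, seconds_per_timeout):
    """Seconds spent waiting if every timeout ran back to back.

    seconds_per_timeout is an assumption supplied by the caller;
    it is not taken from the log.
    """
    return sum(counts.values()) * seconds_per_timeout
```

The counts listed above add up to 235 timed-out tests.  With an
assumed placeholder of 10 seconds each, run strictly one after
another, that is 2350 seconds, or about 39 minutes, and the run used
-j8.  Each timeout would have to cost more than two minutes, with no
parallelism at all, for the timeouts by themselves to add up to 9
hours.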
We see this:

 $ grep "Test run by" gdb.log | head -n 3
 Test run by dje on Tue Nov 21 03:23:01 2017
 Test run by dje on Tue Nov 21 03:23:01 2017
 Test run by dje on Tue Nov 21 03:23:01 2017

 $ grep "Test run by" gdb.log | tail -n 3
 Test run by dje on Tue Nov 21 03:29:54 2017
 Test run by dje on Tue Nov 21 03:29:54 2017
 Test run by dje on Tue Nov 21 03:29:54 2017

So most of the testsuite actually ran for 7 minutes.  And then
something hung for 9 hours?  I have no idea how that could
happen from the existing logs.

The tail end of the log has:

~~~
FAIL: gdb.base/watchpoint.exp: delete all breakpoints in delete_breakpoints (timeout)
ERROR: breakpoints not deleted
ERROR: breakpoints not deleted

command timed out: 1200 seconds without output running ['make', '-k', 'check', 'RUNTESTFLAGS=--target_board native-gdbserver', '-j8', 'FORCE_PARALLEL=1'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=32576.210392
~~~

I don't understand how 7 minutes plus 1200 seconds (~20min)
resulted in "elapsedTime=32576.210392" (~9h).  Maybe that number
isn't to be trusted.

Anyway, I'm sorry, but I really don't have the time to be
looking at this.  Someone with the motivation and access to
the machine could try running the testsuite manually, for
example, see how long that takes, and where the hang is.

> The Debian Jessie system also runs a Python buildslave without
> problem.  The system has 4 virtual cpus and 16GB of memory, which
> should be more than adequately sized.

Thanks,
Pedro Alves
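
A quick cross-check of the timing figures quoted in this message, as
a small Python sketch.  The timestamps, the 1200-second figure and
the elapsedTime value are copied from the log excerpts; the helper
names are hypothetical, and the parsing assumes an English/C locale
for the day and month names.

```python
from datetime import datetime, timedelta

# Format of the timestamp in DejaGnu's "Test run by ... on ..." lines.
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_run_stamp(line):
    """Extract the timestamp from a 'Test run by USER on DATE' line."""
    stamp = line.split(" on ", 1)[1].strip()
    return datetime.strptime(stamp, STAMP_FORMAT)


def hms(delta):
    """Split a timedelta into whole (hours, minutes, seconds)."""
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return hours, minutes, seconds


first = parse_run_stamp("Test run by dje on Tue Nov 21 03:23:01 2017")
last = parse_run_stamp("Test run by dje on Tue Nov 21 03:29:54 2017")

span = last - first                        # time between first and last test file
watchdog = timedelta(seconds=1200)         # "1200 seconds without output"
elapsed = timedelta(seconds=32576.210392)  # elapsedTime from the log tail

# Time not covered by the test span plus one watchdog interval.
unaccounted = elapsed - span - watchdog

print("test span:   %d h %d min %d s" % hms(span))
print("elapsed:     %d h %d min %d s" % hms(elapsed))
print("unaccounted: %d h %d min %d s" % hms(unaccounted))
```

The elapsedTime value works out to 9 h 2 min 56 s, the same duration
the builder page shows, so the two figures at least agree with each
other.  The span between the first and last "Test run by" lines is
6 min 53 s.  If the 1200 seconds of silence began right after the
last test file started (an assumption, the log does not say), about
8 h 36 min remain that the excerpts do not account for.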