From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 110936 invoked by alias); 13 Aug 2015 23:26:43 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 110927 invoked by uid 89); 13 Aug 2015 23:26:42 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.0 required=5.0 tests=AWL,BAYES_50,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 13 Aug 2015 23:26:41 +0000 Received: from int-mx14.intmail.prod.int.phx2.redhat.com (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27]) by mx1.redhat.com (Postfix) with ESMTPS id 6B030915AA; Thu, 13 Aug 2015 23:26:40 +0000 (UTC) Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id t7DNQcL4023090; Thu, 13 Aug 2015 19:26:39 -0400 Message-ID: <55CD27AE.2090002@redhat.com> Date: Thu, 13 Aug 2015 23:26:00 -0000 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Sergio Durigan Junior , Patrick Palka CC: "gdb-patches@sourceware.org" Subject: Re: Racy failures on gdb.base/gdbinit-history.exp (native-extended-gdbserver/-m64) References: <1433878062-23560-1-git-send-email-patrick@parcs.ath.cx> <1434466413-28892-1-git-send-email-patrick@parcs.ath.cx> <87mvym67i5.fsf_-_@redhat.com> <87zj1un7ro.fsf@redhat.com> In-Reply-To: <87zj1un7ro.fsf@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-SW-Source: 2015-08/txt/msg00374.txt.bz2 On 08/13/2015 11:27 PM, Sergio Durigan Junior wrote: > On Thursday, August 13 2015, Patrick Palka wrote: > >> On Thu, Jul 23, 2015 at 2:42 PM, Sergio Durigan Junior >> wrote: >>> On Tuesday, June 16 2015, Patrick Palka wrote: >>> >>>> We still do not handle "set history size unlimited" correctly. In >>>> particular, after writing to the history file, we truncate the history >>>> even if it is unlimited. >>>> >>>> This patch makes sure that we do not call history_truncate_file() if the >>>> history is not stifled (i.e. if it's unlimited). This bug causes the >>>> history file to be truncated to zero on exit when one has "set history >>>> size unlimited" in their gdbinit file. Although this code exists in GDB >>>> 7.8, the bug is masked by a pre-existing bug that's been only fixed in >>>> GDB 7.9 (PR gdb/17820). >>> >>> Hey Patrick, >>> >>> Looking at the BuildBot logs today, I found that this new test is >>> failing occasionally on native-extended-gdbserver testing. Take a look >>> at the following build: >>> >>> >>> >>> You can see that gdb.base/gdbinit-history.exp failed: >>> >>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands >>> PASS -> FAIL: gdb.base/gdbinit-history.exp: truncation: creating: server show commands >>> >>> The gdb.log is here: >>> >>> >>> >>> I haven't really investigated to determine what's going on here, but let >>> me know if you need any help with this. >> >> Hi Sergio, >> >> Have you seen these spurious FAILs pop up recently? I wonder if the >> fixes to SIGTERM handling I made a few weeks ago may have fixed this >> as well. > > Hey Patrick, > > Yeah, I still see those FAIL's from time to time: > > > > > I may be wrong, but I noticed that the FAIL's happen only on the PPC > buildslaves. > Here's a theory: (gdb) server show commands 1 set height 0 2 set width 0 3 target extended-remote localhost:2348 4 print 1 5 monitor exit 6 set height 0 7 set width 0 8 target extended-remote localhost:2349 (gdb) PASS: gdb.base/gdbinit-history.exp: truncation: creating: server show commands Remote debugging from host 127.0.0.1 monitor exit (gdb) spawn /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../../gdb/gdb -nw -nx -data-directory /home/gdb-buildbot/fedora -ppc64le-1/fedora-ppc64le-native-extended-gdbserver/build/gdb/testsuite/../data-directory -x /home/gdb-buildbot/fedora-ppc64le-1/fedora-ppc64le-native-extended-gdbserver/bu ild/gdb/testsuite/outputs/gdb.base/gdbinit-history/gdbinit-history.gdbinit -ex set auto-connect-native-target off GNU gdb (GDB) 7.10.50.20150812-cvs ... (gdb) set height 0 (gdb) set width 0 (gdb) spawn ../gdbserver/gdbserver --once --multi :2350 Listening on port 2350 target extended-remote localhost:2350 Remote debugging using localhost:2350 (gdb) server show commands 1 set height 0 2 set width 0 3 target extended-remote localhost:2350 (gdb) FAIL: gdb.base/gdbinit-history.exp: truncation: appending: server show commands Observations: #1 - In the FAILing case, the history only shows commands invoked in that gdb session. That means that either that GDB invocation failed to load the history file, or the previous invocation failed to write it. #2 - dejagnu's standard_close, what is ultimately used to exit gdb, does this: 2.1 - send gdb a SIGINT. This might help by interrupting any command that gdb might be stuck in, getting it back to the event loop processing input. 2.2 - close gdb's ptty, in effect closing gdb's stdin/stdout. 2.3 - if gdb doesn't exit after 5 seconds, kills it with SIGTERM. If it still doesn't exit after another 5 seconds, dejagnu kill it with SIGKILL. Unless you're running dejagnu git master, 2.1 and 2.2 happen in parallel. I'm assuming that 2.3 is not happening. It's 2.2 that normally causes gdb to exit. See event-top.c:stdin_event_handler. - Because, due to 2.2, stdin/stdout are already closed when gdb is running the teardown code (quit_force, what saves history), any warning/output that gdb might output goes nowhere, it won't appear in gdb.log. #3 - If you'll recall, not long ago we made the code that appends the commands to the history file roughly: - mv old-history-file old-history-file.tmp - append to old-history-file.tmp - mv old-history-file.tmp old-history-file That is top.c:gdb_safe_append_history. AFAICS, SIGINT is not set to SA_RESTART. Now imagine what happens if the SIGINT lands exactly just when gdb is about to call rename here, in gdb_safe_append_history: ret = rename (local_history_filename, history_filename); saved_errno = errno; if (ret < 0 && saved_errno != EEXIST) warning (_("Could not rename %s to %s: %s"), local_history_filename, history_filename, safe_strerror (saved_errno)); } and rename fails with EINTR. Then the following gdb invocation would not find any previous history file to load, which would perfectly explain that FAIL. If we still have the build/test directory of that test run, if the theory is right, I think we'd find a leftover gdbinit-history.gdb_history-gdb-$PID~ file. What doesn't look right in the theory is that the rename man does not document rename as failing with EINTR... But maybe it does? Thanks, Pedro Alves