From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) by sourceware.org (Postfix) with ESMTP id 6BF3A385BF81 for ; Thu, 16 Apr 2020 13:28:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 6BF3A385BF81 Received: from mail-ej1-f69.google.com (mail-ej1-f69.google.com [209.85.218.69]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-284-MMhRv2NtPkCj5zKORYlVhg-1; Thu, 16 Apr 2020 09:28:45 -0400 X-MC-Unique: MMhRv2NtPkCj5zKORYlVhg-1 Received: by mail-ej1-f69.google.com with SMTP id q24so979859ejb.3 for ; Thu, 16 Apr 2020 06:28:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=b/MUIS05JMhqobW6UWJ8YgL6WZIsxDJN4AR4YPoc9Ys=; b=MQrr6wCE83VkjiZ1gdWr4tIny0zMAoUitdPrU6a6s6ZfbtMN7kBNTT+l8QW/QZZynT kJf8PyWGIE2qDvK13EEY5VONPh3Odx23azAKjhWWv0HULwXdHvQ8c9fv1cHkcDnVfi0N 2q7WKAqRsc21mcpDmX2dXDrBlCVh6P4yvNVo55NveeJSc4pxDWDfUb2xM1CQQyHKv+g/ twrV74coQuvE+fP56w4oKOg42OxlTlEtiAMvzCI4jHm1Xv1+3U2aHq/PfH4IsCowYHVG 3KJwqndAc62OfWMUmJL9gNcsAzDN3k/lQItDgDjkhTK6L7mC/8cxVIGVQPFDEKNVCHwL 4Xiw== X-Gm-Message-State: AGi0PubQ0BlD9QmIyqceeQht2Em8OzAetAlX14hW9X2O85lUVimKsqrq IqSyyQnMskJXzsfBkxBCEZIo4dnfGaY9HpPz/p629g0zTtkXW/txUH/WUtpmsH5U0V8pQjZXV+4 3pY/nmL5P4KryNnkQt/tL4A== X-Received: by 2002:aa7:c2d8:: with SMTP id m24mr7972760edp.342.1587043722836; Thu, 16 Apr 2020 06:28:42 -0700 (PDT) X-Google-Smtp-Source: APiQypKGlgn13hH5ItthVix9mdqCkPy0ntkfn77gDh1gnMdWnRjRoWkrQh1Isble2bzB67knWzaFSA== X-Received: by 2002:aa7:c2d8:: with SMTP id m24mr7972722edp.342.1587043722417; Thu, 16 Apr 2020 06:28:42 -0700 (PDT) Received: from ?IPv6:2001:8a0:f909:7b00:56ee:75ff:fe8d:232b? ([2001:8a0:f909:7b00:56ee:75ff:fe8d:232b]) by smtp.gmail.com with ESMTPSA id u7sm2227895edy.25.2020.04.16.06.28.41 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 16 Apr 2020 06:28:41 -0700 (PDT) Subject: Re: [PATCH][gdb] Fix hang after ext sigkill To: Tom de Vries , Simon Marchi , gdb-patches@sourceware.org References: <20200224201403.GA7079@delia> <831161db-85a9-74da-1833-7bab3cc41d15@suse.de> <4fc1ee75-397b-8c05-14a9-fcedf584e8b8@suse.de> From: Pedro Alves Message-ID: <8094f685-b037-3e65-e8c3-b35c21c79f96@redhat.com> Date: Thu, 16 Apr 2020 14:28:40 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: <4fc1ee75-397b-8c05-14a9-fcedf584e8b8@suse.de> Content-Language: en-US X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-24.4 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Apr 2020 13:28:50 -0000 Hi, Sorry for the delay, and thanks much for working on this. On 3/25/20 3:51 PM, Tom de Vries wrote: > On 25-03-2020 15:44, Simon Marchi wrote: >> On 2020-03-25 6:29 a.m., Tom de Vries wrote: >>> Here's the updated patch. >> Thanks. Some comments about the test: >> >> - Please add a comment at the top to describe briefly what this is testing. >> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)), >> so that the test program eventually exits if something goes wrong and it is allowed to run >> freely. > Done. > 0001-gdb-Fix-hang-after-ext-sigkill.patch > > [gdb] Fix hang after ext sigkill > > Consider the test-case from this patch, compiled with pthread support: > ... > $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread > ... > > After running, the program sleeps: > ... > $ gdb a.out > Reading symbols from a.out... > (gdb) r > Starting program: /data/gdb_versions/devel/a.out > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > [New Thread 0x7ffff77fe700 (LWP 22604)] > ... > > Until we interrupt it with a control-C: > ... > ^C > Thread 1 "a.out" received signal SIGINT, Interrupt. > 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6 > (gdb) > ... > > If we then kill the inferior using an external SIGKILL: > ... > (gdb) shell killall -s SIGKILL a.out > ... > and subsequently continue: > ... > (gdb) c > Continuing. > Couldn't get registers: No such process. > Couldn't get registers: No such process. > (gdb) Couldn't get registers: No such process. > (gdb) Couldn't get registers: No such process. > (gdb) Couldn't get registers: No such process. > > ... > gdb hangs repeating the same warning. Typing control-C no longer helps, > and we have to kill gdb. > > This is a regression since commit 873657b9e8 "Preserve selected thread in > all-stop w/ background execution". The commit adds a > scoped_restore_current_thread typed variable restore_thread to > fetch_inferior_event, and the hang is caused by the constructor throwing an > exception. > > Fix this by catching the exception in the constructor. > > Build and reg-tested on x86_64-linux. > > gdb/ChangeLog: > > 2020-02-24 Tom de Vries > > PR gdb/25471 > * thread.c > (scoped_restore_current_thread::scoped_restore_current_thread): Catch > exception in get_frame_id. > > gdb/testsuite/ChangeLog: > > 2020-02-24 Tom de Vries > > PR gdb/25471 > * gdb.threads/hang-after-ext-sigkill.c: New test. > * gdb.threads/hang-after-ext-sigkill.exp: New file. "hang-after-ext-sigkill" is named in terms of how the bug manifested (a hang), but once the bug is fixed, it won't be obvious to remember to look for "hang" when someone goes look for a testcase related to a process being killed outside of gdb's control. Plus, then testcase may be extended in the future for related bugs that do not cause a hang. There's a gdb.base/killed-outside.exp testcase already exactly for this sort of issue -- the testcase does the same thing with killing with SIGKILL from outside, and then making sure that GDB behaves. I'd rather this new testcase was given the same or a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"' runs it too. Or maybe merge the testcases, though it's useful to run the existing one on non-threaded environments too. But I'm not sure this one needs to be threaded at all. Won't we see the failure to read registers with a single-threaded program too? > * lib/gdb.exp (runto): Handle "Temporary breakpoint" string. > > --- > gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++ > .../gdb.threads/hang-after-ext-sigkill.exp | 88 ++++++++++++++++++++++ > gdb/testsuite/lib/gdb.exp | 2 +- > gdb/thread.c | 12 ++- > 4 files changed, 142 insertions(+), 3 deletions(-) > > diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c > new file mode 100644 > index 0000000000..b93d6c644a > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c > @@ -0,0 +1,43 @@ > +/* This testcase is part of GDB, the GNU debugger. > + > + Copyright 2020 Free Software Foundation, Inc. > + > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see . */ > + > +#include > +#include > + > +static void * > +fun (void *dummy) > +{ > + int i; > + > + for (i = 0; i < 300; i++) > + sleep (1); > + > + return NULL; > +} > + > +int > +main (void) > +{ > + int i; > + pthread_t thread; > + pthread_create (&thread, NULL, fun, NULL); > + > + for (i = 0; i < 300; i++) > + sleep (1); > + > + return 0; > +} > diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp > new file mode 100644 > index 0000000000..89b38b1f6c > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp > @@ -0,0 +1,88 @@ > +# Copyright (C) 2020 Free Software Foundation, Inc. > + > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > +# This test-case tests that continuing an inferior that has been killed > +# using an external sigkill does not make gdb hang. > + > +standard_testfile > + > +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \ > + {pthreads}] == -1} { > + return -1 > +} > + > +set res [runto main no-message temporary] > +if { $res != 1 } { > + return -1 > +} > + > +set pid -1 > +gdb_test_multiple "info inferior 1" "get inferior pid" { > + -re -wrap "process (\[0-9\]*).*" { > + set pid $expect_out(1,string) > + pass $gdb_test_name > + } > +} This won't work with remote targets that don't support the process extensions, since on that case, you'll get "Remote target" instead of "process $PID". See remote_target::pid_to_str. Likewise probably other targets. See gdb.base/killed-outside.exp. > +if { $pid == -1 } { > + return -1 > +} > + > +gdb_test_multiple "continue" "" { > + -re "Continuing" { > + pass $gdb_test_name > + } > +} > + > +send_gdb "\003" > + > +gdb_test_multiple "" "get sigint" { > + -re -wrap "received signal SIGINT, Interrupt\..*" { > + pass $gdb_test_name > + } > +} > + I don't think interrupting with Ctrl-C is really important, compared to e.g., running to a breakpoint. I'd prefer to run to a breakpoint instead. E.g., call a all_started(); function after the child thread is spawned and after a pthread_barrier_wait call, to make sure the child was scheduled. See e.g., gdb.threads/async.c. Then all you need is to "runto all_started" instead of runto_main. > +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid" This will always kill a process on the host gdb is running on, which of course does wrong the thing in cross scenarios. So this should do instead: remote_exec target "kill -9 ${testpid}" and sleeping a bit is a good idea to make sure the kill is actually scheduled and does its thing before gdb does the "continue". See gdb.base/killed-outside.exp. > + > +set no_such_process_msg "Couldn't get registers: No such process\." > +set killed_msg "Program terminated with signal SIGKILL, Killed\." > +set no_longer_exists_msg "The program no longer exists\." > +set not_being_run_msg "The program is not being run\." > + > +gdb_test_multiple "continue" "prompt after first continue" { > + -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" { > + pass $gdb_test_name > + # Regular output, bug condition was not triggered, we're done. > + return -1 > + } > + -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " { > + pass $gdb_test_name > + # Two times $no_such_process_msg. The bug condition was triggered, go > + # check for it. > + } > + -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" { > + pass $gdb_test_name > + # One time $no_such_process_msg. We're stuck here. The bug condition > + # was not triggered, but we're not getting correct gdb behaviour either: > + # every subsequent continue produces one no_such_process_msg. Give up. > + return -1 I'm confused here -- the comment says we're not getting correct behavior, but this won't result in any FAIL? > + } > +} > + > +gdb_test_multiple "" "messages" { > + -re ".*$killed_msg.*$no_longer_exists_msg\r\n" { > + pass $gdb_test_name > + gdb_test "continue" $not_being_run_msg "second continue" > + } > +} It isn't obvious to me why put this one separately instead of nested within the pass case in the other gdb_test_multiple above. Is this also meant to run if the previous gdb_test_multiple fails due to an internal gdb_test_multiple case being hit, like the default gdb_prompt match? > diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp > index e17ac0ef75..4cf2beca00 100644 > --- a/gdb/testsuite/lib/gdb.exp > +++ b/gdb/testsuite/lib/gdb.exp > @@ -570,7 +570,7 @@ proc runto { function args } { > } > return 1 > } > - -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { > + -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { > if { $print_pass } { > pass $test_name > } > diff --git a/gdb/thread.c b/gdb/thread.c > index c6e3d356a5..d287bce45f 100644 > --- a/gdb/thread.c > +++ b/gdb/thread.c > @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread () > else > frame = NULL; > > - m_selected_frame_id = get_frame_id (frame); > - m_selected_frame_level = frame_relative_level (frame); > + try > + { > + m_selected_frame_id = get_frame_id (frame); > + m_selected_frame_level = frame_relative_level (frame); > + } > + catch (const gdb_exception &ex) This silently swallows Ctrl-C/QUIT too. That's usually not a good idea. gdb_exception_error should be the default choice, unless you really want to handle Ctrl-C here. Thanks, Pedro Alves