From: Tom de Vries <tdevries@suse.de>
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
Cc: Pedro Alves <palves@redhat.com>
Subject: Re: [PATCH][gdb] Fix hang after ext sigkill
Date: Wed, 25 Mar 2020 11:29:54 +0100
Message-ID: <831161db-85a9-74da-1833-7bab3cc41d15@suse.de>
In-Reply-To: <e04d1b2c-1db3-57d7-f80f-493a7d0bc2f3@simark.ca>
[-- Attachment #1: Type: text/plain, Size: 4477 bytes --]
On 24-03-2020 16:35, Simon Marchi wrote:
> Hi Tom,
>
> The test fails for me more often than not. I've attached a gdb.log showing one such
> failure. There seems to be a problem matching the output of the last "continue".
>
Hi Simon,
I've managed to reproduce that, by running the test-case in parallel
with stress -c 5.
> This is what I see when I reproduce the case by hand:
>
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) [Thread 0x7ffff7d99700 (LWP 514079) exited]
>
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
>
I've modified the test-case to count the "Couldn't get registers: No such
process." messages between the continue and the following gdb prompt, and
to handle each case appropriately. Hopefully this will fix the FAILs
you're seeing.
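To illustrate the idea, here is a rough sketch in C++ (the test-case itself
does this with Tcl/expect patterns, so none of this is the actual test code).
The names count_before_prompt, classify_continue and continue_outcome are
made up for the example; the mapping from count to outcome follows the
comments in the .exp file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical outcome labels, mirroring the three branches in the test.
enum class continue_outcome { regular, stuck, bug_triggered, unexpected };

// Count occurrences of MSG in OUTPUT that appear before the first PROMPT.
std::size_t
count_before_prompt (const std::string &output, const std::string &msg,
                     const std::string &prompt)
{
  // An empty message would never advance the search below.
  assert (!msg.empty ());

  // If there is no prompt, find returns npos and substr keeps everything.
  const std::string head = output.substr (0, output.find (prompt));

  std::size_t count = 0;
  for (std::size_t pos = head.find (msg); pos != std::string::npos;
       pos = head.find (msg, pos + msg.size ()))
    ++count;
  return count;
}

// Map the number of "No such process" messages to an outcome.
continue_outcome
classify_continue (std::size_t no_such_process_count)
{
  switch (no_such_process_count)
    {
    case 0:
      return continue_outcome::regular;       // bug condition not triggered
    case 1:
      return continue_outcome::stuck;         // give up
    case 2:
      return continue_outcome::bug_triggered; // go check for the hang
    default:
      return continue_outcome::unexpected;
    }
}
```

Only the messages before the first prompt are counted, since anything after
it belongs to later output.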
> I didn't really understand why we saw a prompt coming back before the other messages,
> so I looked into it a bit and this is what I think happens:
>
> 1. While we are stopped at the prompt, the linux-nat event handler is unregistered from the event loop
> 2. Inferior gets SIGKILL'ed, so GDB gets SIGCHLD'ed, that posts an event to the event pipe, but since it's
> not registered in the event loop, nothing happens
> 3. User does continue, the linux-nat event handler gets registered with the event loop. Then, the continue
> fails (No such process), which brings us back at the prompt. However, the linux-nat event handler stays
> registered with the event loop.
> 4. When we come back to the event loop, we process the event for the SIGKILL, which makes GDB print the
> thread exit message and "Program terminated" message.
>
> Normally, after the "continue" fails, I don't think we would want to leave the linux-nat
> handler registered with the event loop: if it was not registered before, why should it be
> registered after? However, if it wasn't left there, we wouldn't see the messages saying
> the program has terminated, so that wouldn't be good either.
>
> Maybe there's a better way to handle it?
>
Thanks for the investigation. Unfortunately I'm not able to comment on this.
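For readers following along, the four steps can be mimicked with a toy model.
This is only a sketch under the assumption that the event pipe behaves like a
queue that is drained when a handler is registered; toy_event_loop and
replay_sigkill_scenario are invented names and have nothing to do with GDB's
real event loop code.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Toy event loop: NOT GDB's event loop, just enough to mirror steps 1-4.
class toy_event_loop
{
public:
  using handler = std::function<void (const std::string &)>;

  // Posting always succeeds, even with no handler registered; the event
  // simply waits in the queue (standing in for the event pipe).
  void post (std::string event) { m_pending.push (std::move (event)); }

  void register_handler (handler h) { m_handler = std::move (h); }
  void unregister_handler () { m_handler = nullptr; }

  // One pass over the loop; returns how many events were delivered.
  int run_once ()
  {
    if (!m_handler)
      return 0;

    int delivered = 0;
    while (!m_pending.empty ())
      {
        std::string event = std::move (m_pending.front ());
        m_pending.pop ();
        m_handler (event);
        ++delivered;
      }
    return delivered;
  }

private:
  std::queue<std::string> m_pending;
  handler m_handler;
};

// Replay the four steps and return what the user would see, in order.
std::vector<std::string>
replay_sigkill_scenario ()
{
  std::vector<std::string> log;
  toy_event_loop loop;

  // Steps 1 and 2: stopped at the prompt with no handler registered; the
  // SIGCHLD event is posted but nothing consumes it.
  loop.post ("inferior killed");
  int delivered = loop.run_once ();
  assert (delivered == 0);
  (void) delivered;

  // Step 3: "continue" registers the handler, then fails and returns to
  // the prompt, leaving the handler registered.
  loop.register_handler ([&log] (const std::string &event)
    {
      log.push_back ("event: " + event);
    });
  log.push_back ("error: No such process");
  log.push_back ("(gdb)");

  // Step 4: back in the event loop, the stale event is finally handled.
  loop.run_once ();
  return log;
}
```

In this model the prompt is logged before the termination event, which
matches the ordering in the transcript quoted above.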
> However, I still think it would be a good idea to merge a patch like yours, it's already a
> step forward (especially since it fixes a regression).
>
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index 81518b9646..745694df2d 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -571,7 +571,7 @@ proc runto { function args } {
>> }
>> return 1
>> }
>> - -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>> + -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>> if { $print_pass } {
>> pass $test_name
>> }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index 54b59e2244..9876ca3c76 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>>
>> scoped_restore_current_thread::~scoped_restore_current_thread ()
>> {
>> + if (m_inf == NULL)
>> + return;
>> if (!m_dont_restore)
>> {
>> try
>> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>> else
>> frame = NULL;
>>
>> - m_selected_frame_id = get_frame_id (frame);
>> + try
>> + {
>> + m_selected_frame_id = get_frame_id (frame);
>> + }
>> + catch (const gdb_exception &ex)
>> + {
>> + m_inf = NULL;
>> + m_selected_frame_id = null_frame_id;
>> + m_selected_frame_level = -1;
>> + return;
>> + }
>
> The indentation is a bit off here.
>
> Instead of clearing everything, I think we should just set m_selected_frame_id to
> null_frame_id and m_selected_frame_level to -1. m_inf and m_thread can still be
> set. This way, the right inferior will be restored, at least, I think this is
> desirable. scoped_restore_current_thread::restore handles well the case where the
> thread to restore has exited. The thread_info object is refcounted for this exact
> use case, where the thread would get deleted while
>
> In other words:
>
>   try
>     {
>       m_selected_frame_id = get_frame_id (frame);
>       m_selected_frame_level = frame_relative_level (frame);
>     }
>   catch (const gdb_exception &ex)
>     {
>       m_selected_frame_id = null_frame_id;
>       m_selected_frame_level = -1;
>     }
Yep, that works.
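As a standalone illustration of the pattern (a sketch only, with made-up
stand-ins: read_frame_id, read_frame_level, null_frame_id_sketch and
scoped_restore_sketch are not GDB names, and std::exception replaces
gdb_exception), a constructor can catch the failure and fall back to "no
frame" values so that the object is still constructed:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sentinel playing the role of null_frame_id.
constexpr int null_frame_id_sketch = -1;

// Pretend frame lookups that fail once the process is gone, like the
// "Couldn't get registers: No such process." error.
inline int
read_frame_id (bool process_alive)
{
  if (!process_alive)
    throw std::runtime_error ("Couldn't get registers: No such process.");
  return 42;
}

inline int
read_frame_level (bool process_alive)
{
  if (!process_alive)
    throw std::runtime_error ("Couldn't get registers: No such process.");
  return 0;
}

class scoped_restore_sketch
{
public:
  explicit scoped_restore_sketch (bool process_alive)
  {
    try
      {
        m_frame_id = read_frame_id (process_alive);
        m_frame_level = read_frame_level (process_alive);
      }
    catch (const std::exception &)
      {
        // Reset both fields together; everything else the object tracks
        // stays valid, so the exception does not escape the constructor.
        m_frame_id = null_frame_id_sketch;
        m_frame_level = -1;
      }

    // The id and the level are either both set or both reset.
    assert ((m_frame_level == -1) == (m_frame_id == null_frame_id_sketch));
  }

  bool has_frame () const { return m_frame_level != -1; }
  int frame_id () const { return m_frame_id; }
  int frame_level () const { return m_frame_level; }

private:
  int m_frame_id = null_frame_id_sketch;
  int m_frame_level = -1;
};
```

Constructing scoped_restore_sketch with a dead process gives an object whose
has_frame () is false instead of propagating the exception to the caller.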
Here's the updated patch.
Thanks,
- Tom
[-- Attachment #2: 0001-gdb-Fix-hang-after-ext-sigkill.patch --]
[-- Type: text/x-patch, Size: 7593 bytes --]
[gdb] Fix hang after ext sigkill
Consider the test-case from this patch, compiled with pthread support:
...
$ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
...
After running, the program sleeps:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) r
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fe700 (LWP 22604)]
...
Until we interrupt it with a control-C:
...
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
(gdb)
...
If we then kill the inferior using an external SIGKILL:
...
(gdb) shell killall -s SIGKILL a.out
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs repeating the same warning. Typing control-C no longer helps,
and we have to kill gdb.
This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution". The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.
Fix this by catching the exception in the constructor.
Built and regression-tested on x86_64-linux.
gdb/ChangeLog:
2020-02-24 Tom de Vries <tdevries@suse.de>
PR gdb/25471
* thread.c
(scoped_restore_current_thread::scoped_restore_current_thread): Catch
exception in get_frame_id.
gdb/testsuite/ChangeLog:
2020-02-24 Tom de Vries <tdevries@suse.de>
PR gdb/25471
* gdb.threads/hang-after-ext-sigkill.c: New test.
* gdb.threads/hang-after-ext-sigkill.exp: New file.
* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
---
gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++
.../gdb.threads/hang-after-ext-sigkill.exp | 85 ++++++++++++++++++++++
gdb/testsuite/lib/gdb.exp | 2 +-
gdb/thread.c | 12 ++-
4 files changed, 136 insertions(+), 3 deletions(-)
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
new file mode 100644
index 0000000000..bfce6c3085
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
@@ -0,0 +1,40 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+ Copyright 2020 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <unistd.h>
+
+static void *
+fun (void *dummy)
+{
+  while (1)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t thread;
+  pthread_create (&thread, NULL, fun, NULL);
+
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
new file mode 100644
index 0000000000..37577592cd
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
@@ -0,0 +1,85 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+         {pthreads}] == -1} {
+    return -1
+}
+
+set res [runto main no-message temporary]
+if { $res != 1 } {
+    return -1
+}
+
+set pid -1
+gdb_test_multiple "info inferior 1" "get inferior pid" {
+    -re -wrap "process (\[0-9\]*).*" {
+        set pid $expect_out(1,string)
+        pass $gdb_test_name
+    }
+}
+if { $pid == -1 } {
+    return -1
+}
+
+gdb_test_multiple "continue" "" {
+    -re "Continuing" {
+        pass $gdb_test_name
+    }
+}
+
+send_gdb "\003"
+
+gdb_test_multiple "" "get sigint" {
+    -re -wrap "received signal SIGINT, Interrupt\..*" {
+        pass $gdb_test_name
+    }
+}
+
+gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
+        pass $gdb_test_name
+        # Regular output, bug condition was not triggered, we're done.
+        return -1
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+        pass $gdb_test_name
+        # Two times $no_such_process_msg. The bug condition was triggered, go
+        # check for it.
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
+        pass $gdb_test_name
+        # One time $no_such_process_msg. We're stuck here. The bug condition
+        # was not triggered, but we're not getting correct gdb behaviour either:
+        # every subsequent continue produces one no_such_process_msg. Give up.
+        return -1
+    }
+}
+
+gdb_test_multiple "" "messages" {
+    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+        pass $gdb_test_name
+        gdb_test "continue" $not_being_run_msg "second continue"
+    }
+}
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index e17ac0ef75..4cf2beca00 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -570,7 +570,7 @@ proc runto { function args } {
             }
             return 1
         }
-        -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
+        -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
             if { $print_pass } {
                 pass $test_name
             }
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..d287bce45f 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
         frame = NULL;

-      m_selected_frame_id = get_frame_id (frame);
-      m_selected_frame_level = frame_relative_level (frame);
+      try
+        {
+          m_selected_frame_id = get_frame_id (frame);
+          m_selected_frame_level = frame_relative_level (frame);
+        }
+      catch (const gdb_exception &ex)
+        {
+          m_selected_frame_id = null_frame_id;
+          m_selected_frame_level = -1;
+        }

       tp->incref ();
       m_thread = tp;