Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Tom de Vries <tdevries@suse.de>
To: Pedro Alves <palves@redhat.com>, Simon Marchi <simark@simark.ca>,
	gdb-patches@sourceware.org
Subject: Re: [PATCH][gdb] Fix hang after ext sigkill
Date: Tue, 21 Apr 2020 14:38:17 +0200	[thread overview]
Message-ID: <e941745e-2662-7e7e-77ae-d828d5c649e4@suse.de> (raw)
In-Reply-To: <8094f685-b037-3e65-e8c3-b35c21c79f96@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 12258 bytes --]

On 16-04-2020 15:28, Pedro Alves wrote:
> Hi,
> 
> Sorry for the delay, and thanks much for working on this.
> 
> On 3/25/20 3:51 PM, Tom de Vries wrote:
>> On 25-03-2020 15:44, Simon Marchi wrote:
>>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>>> Here's the updated patch.
>>> Thanks.  Some comments about the test:
>>>
>>> - Please add a comment at the top to describe briefly what this is testing.
>>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>>   so that the test program eventually exits if something goes wrong and it is allowed to run
>>>   freely.
>> Done.
>> 0001-gdb-Fix-hang-after-ext-sigkill.patch
>>
>> [gdb] Fix hang after ext sigkill
>>
>> Consider the test-case from this patch, compiled with pthread support:
>> ...
>> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
>> ...
>>
>> After running, the program sleeps:
>> ...
>> $ gdb a.out
>> Reading symbols from a.out...
>> (gdb) r
>> Starting program: /data/gdb_versions/devel/a.out
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> [New Thread 0x7ffff77fe700 (LWP 22604)]
>> ...
>>
>> Until we interrupt it with a control-C:
>> ...
>> ^C
>> Thread 1 "a.out" received signal SIGINT, Interrupt.
>> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
>> (gdb)
>> ...
>>
>> If we then kill the inferior using an external SIGKILL:
>> ...
>> (gdb) shell killall -s SIGKILL a.out
>> ...
>> and subsequently continue:
>> ...
>> (gdb) c
>> Continuing.
>> Couldn't get registers: No such process.
>> Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> (gdb) Couldn't get registers: No such process.
>> <repeat>
>> ...
>> gdb hangs repeating the same warning.  Typing control-C no longer helps,
>> and we have to kill gdb.
>>
>> This is a regression since commit 873657b9e8 "Preserve selected thread in
>> all-stop w/ background execution".  The commit adds a
>> scoped_restore_current_thread typed variable restore_thread to
>> fetch_inferior_event, and the hang is caused by the constructor throwing an
>> exception.
>>
>> Fix this by catching the exception in the constructor.
>>
>> Build and reg-tested on x86_64-linux.
>>
>> gdb/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* thread.c
>> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
>> 	exception in get_frame_id.
>>
>> gdb/testsuite/ChangeLog:
>>
>> 2020-02-24  Tom de Vries  <tdevries@suse.de>
>>
>> 	PR gdb/25471
>> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
>> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.
> 
> "hang-after-ext-sigkill" is named in terms of how the
> bug manifested (a hang), but once the bug is fixed, it won't
> be obvious to remember to look for "hang" when someone goes
> look for a testcase related to a process being killed outside of
> gdb's control.  Plus, then testcase may be extended in the future
> for related bugs that do not cause a hang.
> 
> There's a gdb.base/killed-outside.exp testcase already exactly for
> this sort of issue -- the testcase does the same thing with killing
> with SIGKILL from outside, and then making sure that GDB
> behaves.  I'd rather this new testcase was given the same or
> a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"'
> runs it too.

Ack, test-case renamed to gdb.threads/killed-outside.{exp,c}.

>  Or maybe merge the testcases, though it's useful
> to run the existing one on non-threaded environments too.
> But I'm not sure this one needs to be threaded at all.  Won't
> we see the failure to read registers with a single-threaded program
> too?
> 

I was not able to reproduce the hang with a single-threaded program.

>> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
>>
>> ---
>>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
>>  .../gdb.threads/hang-after-ext-sigkill.exp         | 88 ++++++++++++++++++++++
>>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>>  gdb/thread.c                                       | 12 ++-
>>  4 files changed, 142 insertions(+), 3 deletions(-)
>>
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> new file mode 100644
>> index 0000000000..b93d6c644a
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
>> @@ -0,0 +1,43 @@
>> +/* This testcase is part of GDB, the GNU debugger.
>> +
>> +   Copyright 2020 Free Software Foundation, Inc.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#include <pthread.h>
>> +#include <unistd.h>
>> +
>> +static void *
>> +fun (void *dummy)
>> +{
>> +  int i;
>> +
>> +  for (i = 0; i < 300; i++)
>> +    sleep (1);
>> +
>> +  return NULL;
>> +}
>> +
>> +int
>> +main (void)
>> +{
>> +  int i;
>> +  pthread_t thread;
>> +  pthread_create (&thread, NULL, fun, NULL);
>> +
>> +  for (i = 0; i < 300; i++)
>> +    sleep (1);
>> +
>> +  return 0;
>> +}
>> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> new file mode 100644
>> index 0000000000..89b38b1f6c
>> --- /dev/null
>> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
>> @@ -0,0 +1,88 @@
>> +# Copyright (C) 2020 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +# This test-case tests that continuing an inferior that has been killed
>> +# using an external sigkill does not make gdb hang.
>> +
>> +standard_testfile
>> +
>> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
>> +	 {pthreads}] == -1} {
>> +    return -1
>> +}
>> +
>> +set res [runto main no-message temporary]
>> +if { $res != 1 } {
>> +    return -1
>> +}
>> +
>> +set pid -1
>> +gdb_test_multiple "info inferior 1" "get inferior pid" {
>> +    -re -wrap "process (\[0-9\]*).*" {
>> +       set pid $expect_out(1,string)
>> +       pass $gdb_test_name
>> +    }
>> +}
> 
> This won't work with remote targets that don't support
> the process extensions, since on that case, you'll
> get "Remote target" instead of "process $PID".  See
> remote_target::pid_to_str.  Likewise probably other
> targets.  See gdb.base/killed-outside.exp.
> 
> 

Ack, updated accordingly.

>> +if { $pid == -1 } {
>> +    return -1
>> +}
>> +
>> +gdb_test_multiple "continue" "" {
>> +    -re "Continuing" {
>> +	pass $gdb_test_name
>> +    }
>> +}
>> +
>> +send_gdb "\003"
>> +
>> +gdb_test_multiple "" "get sigint" {
>> +    -re -wrap "received signal SIGINT, Interrupt\..*" {
>> +       pass $gdb_test_name
>> +   }
>> +}
>> +
> 
> 
> I don't think interrupting with Ctrl-C is really important,
> compared to e.g., running to a breakpoint.  I'd prefer to run
> to a breakpoint instead.  E.g., call a all_started(); function
> after the child thread is spawned and after a pthread_barrier_wait
> call, to make sure the child was scheduled.
> See e.g., gdb.threads/async.c.  Then all you need is to
> "runto all_started" instead of runto_main.
> 

Done.

>> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
> 
> This will always kill a process on the host gdb is running on, which
> of course does wrong the thing in cross scenarios.  So this should
> do instead:
> 
>     remote_exec target "kill -9 ${testpid}"
> 
> and sleeping a bit is a good idea to make sure the kill is actually
> scheduled and does its thing before gdb does the "continue".
> See gdb.base/killed-outside.exp.
> 
> 

Ack, done.

>> +
>> +set no_such_process_msg "Couldn't get registers: No such process\."
>> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
>> +set no_longer_exists_msg "The program no longer exists\."
>> +set not_being_run_msg "The program is not being run\."
>> +
>> +gdb_test_multiple "continue" "prompt after first continue" {
>> +    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
>> +	pass $gdb_test_name
>> +	# Regular output, bug condition was not triggered, we're done.
>> +	return -1
>> +    }
>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
>> +	pass $gdb_test_name
>> +	# Two times $no_such_process_msg.  The bug condition was triggered, go
>> +	# check for it.
>> +    }
>> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
>> +	pass $gdb_test_name
>> +	# One time $no_such_process_msg.  We're stuck here.  The bug condition
>> +	# was not triggered, but we're not getting correct gdb behaviour either:
>> +	# every subsequent continue produces one no_such_process_msg.  Give up.
>> +	return -1
> 
> I'm confused here -- the comment says we're not getting correct behavior,
> but this won't result in any FAIL?
> 

Um, yes. I made that decision because I was trying to trigger another
scenario. But agreed, it's debatable. Anyway, it also doesn't matter
anymore since rewriting the test to stop at a breakpoint eliminated the
need for this. The scenario I was trying to trigger now always reproduces.

>> +    }
>> +}
>> +
>> +gdb_test_multiple "" "messages" {
>> +    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
>> +	pass $gdb_test_name
>> +	gdb_test "continue" $not_being_run_msg "second continue"
>> +    }
>> +}
> 
> It isn't obvious to me why put this one separately instead of
> nested within the pass case in the other gdb_test_multiple above.
> Is this also meant to run if the previous gdb_test_multiple fails
> due to an internal gdb_test_multiple case being hit,
> like the default gdb_prompt match?
> 

Fixed.

> 
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index e17ac0ef75..4cf2beca00 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -570,7 +570,7 @@ proc runto { function args } {
>>  	    }
>>  	    return 1
>>  	}
>> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
>> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>  	    if { $print_pass } {
>>  		pass $test_name
>>  	    }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index c6e3d356a5..d287bce45f 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>        else
>>  	frame = NULL;
>>  
>> -      m_selected_frame_id = get_frame_id (frame);
>> -      m_selected_frame_level = frame_relative_level (frame);
>> +      try
>> +	{
>> +	  m_selected_frame_id = get_frame_id (frame);
>> +	  m_selected_frame_level = frame_relative_level (frame);
>> +	}
>> +      catch (const gdb_exception &ex)
> 
> This silently swallows Ctrl-C/QUIT too.  That's usually not a good
> idea.  gdb_exception_error should be the default choice, unless you
> really want to handle Ctrl-C here.
> 

Fixed.

Thanks for the review, that was helpful.

Any more comments?

- Tom

[-- Attachment #2: 0001-gdb-Fix-hang-after-ext-sigkill.patch --]
[-- Type: text/x-patch, Size: 6699 bytes --]

[gdb] Fix hang after ext sigkill

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc gdb/testsuite/gdb.threads/killed-outside.c -lpthread -g
...

After running to all_started, we can print pid:
...
$ gdb a.out -ex "b all_started" -ex run -ex "delete 1" -ex "p pid"
...
Reading symbols from a.out...
Breakpoint 1 at 0x40072b: file killed-outside.c, line 29.
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fc700 (LWP 3155)]

Thread 1 "a.out" hit Breakpoint 1, all_started () at killed-outside.c:29
29      }
$1 = 3151
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell kill -9 3151
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".  The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.

Fix this by catching the exception in the constructor.

Build and reg-tested on x86_64-linux.

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* thread.c
	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
	exception in get_frame_id.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
        * gdb.threads/killed-outside.c: New test.
        * gdb.threads/killed-outside.exp: New file.

---
 gdb/testsuite/gdb.threads/killed-outside.c   | 64 ++++++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/killed-outside.exp | 57 +++++++++++++++++++++++++
 gdb/thread.c                                 | 12 +++++-
 3 files changed, 131 insertions(+), 2 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/killed-outside.c b/gdb/testsuite/gdb.threads/killed-outside.c
new file mode 100644
index 0000000000..ac4cf0b07a
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/killed-outside.c
@@ -0,0 +1,64 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+pid_t pid;
+
+static pthread_barrier_t threads_started_barrier;
+
+static void
+all_started (void)
+{
+}
+
+static void *
+fun (void *dummy)
+{
+  int i;
+
+  pthread_barrier_wait (&threads_started_barrier);
+
+  for (i = 0; i < 180; i++)
+    sleep (1);
+
+  pthread_exit (NULL);
+}
+
+int
+main (void)
+{
+  int i;
+  pthread_t thread;
+
+  pid = getpid ();
+
+  pthread_barrier_init (&threads_started_barrier, NULL, 2);
+
+  pthread_create (&thread, NULL, fun, NULL);
+
+  pthread_barrier_wait (&threads_started_barrier);
+
+  all_started ();
+
+  for (i = 0; i < 180; i++)
+    sleep (1);
+
+  exit (EXIT_SUCCESS);
+}
diff --git a/gdb/testsuite/gdb.threads/killed-outside.exp b/gdb/testsuite/gdb.threads/killed-outside.exp
new file mode 100644
index 0000000000..ff5a115728
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/killed-outside.exp
@@ -0,0 +1,57 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test-case tests that continuing an inferior that has been killed
+# using an external sigkill does not make gdb misbehave.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 {pthreads debug}] == -1} {
+    return -1
+}
+
+if { ![runto "all_started"] } {
+    return -1
+}
+delete_breakpoints
+
+set testpid [get_valueof "" "pid" -1 "get pid of inferior"]
+if { $testpid == -1 } {
+    return -1
+}
+remote_exec target "kill -9 ${testpid}"
+
+# Give it some time to die.
+sleep 2
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+	pass $gdb_test_name
+	# Two times $no_such_process_msg.  The bug condition was triggered, go
+	# check for it.
+	gdb_test_multiple "" "messages" {
+	    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+		pass $gdb_test_name
+		gdb_test "continue" $not_being_run_msg "second continue"
+	    }
+	}
+    }
+}
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..03805bd256 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
 	frame = NULL;
 
-      m_selected_frame_id = get_frame_id (frame);
-      m_selected_frame_level = frame_relative_level (frame);
+      try
+	{
+	  m_selected_frame_id = get_frame_id (frame);
+	  m_selected_frame_level = frame_relative_level (frame);
+	}
+      catch (const gdb_exception_error &ex)
+	{
+	  m_selected_frame_id = null_frame_id;
+	  m_selected_frame_level = -1;
+	}
 
       tp->incref ();
       m_thread = tp;

  reply	other threads:[~2020-04-21 12:38 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24 20:14 Tom de Vries
2020-03-09 12:52 ` [PING][PATCH][gdb] " Tom de Vries
2020-03-23 19:16   ` [PING^2][PATCH][gdb] " Tom de Vries
2020-03-24 15:35 ` [PATCH][gdb] " Simon Marchi
2020-03-25 10:29   ` Tom de Vries
2020-03-25 14:44     ` Simon Marchi
2020-03-25 15:51       ` Tom de Vries
2020-03-25 15:57         ` Simon Marchi
2020-04-16 13:28         ` Pedro Alves
2020-04-21 12:38           ` Tom de Vries [this message]
2020-04-21 13:42             ` Pedro Alves
2020-09-25  9:39             ` Tom de Vries

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e941745e-2662-7e7e-77ae-d828d5c649e4@suse.de \
    --to=tdevries@suse.de \
    --cc=gdb-patches@sourceware.org \
    --cc=palves@redhat.com \
    --cc=simark@simark.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox