Mirror of the gdb-patches mailing list
From: Tom de Vries <tdevries@suse.de>
To: Simon Marchi <simark@simark.ca>, gdb-patches@sourceware.org
Cc: Pedro Alves <palves@redhat.com>
Subject: Re: [PATCH][gdb] Fix hang after ext sigkill
Date: Wed, 25 Mar 2020 11:29:54 +0100
Message-ID: <831161db-85a9-74da-1833-7bab3cc41d15@suse.de>
In-Reply-To: <e04d1b2c-1db3-57d7-f80f-493a7d0bc2f3@simark.ca>

On 24-03-2020 16:35, Simon Marchi wrote:
> Hi Tom,
> 
> The test fails for me more often than not.  I've attached a gdb.log showing one such
> failure.  There seems to be a problem matching the output of the last "continue".
> 

Hi Simon,

I've managed to reproduce that, by running the test-case in parallel
with stress -c 5.

> This is what I see when I reproduce the case by hand:
> 
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) [Thread 0x7ffff7d99700 (LWP 514079) exited]
> 
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
> 

I've modified the test-case to count the "Couldn't get registers: No
such process." messages between the continue and the following gdb
prompt, and to handle each count separately: two means the bug
condition was triggered, one means the test gives up.  Hopefully this
will fix the FAILs you're seeing.
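
To illustrate the counting idea outside of expect, here is a minimal
C++ sketch.  It is not part of the patch; the names continue_outcome,
count_occurrences and classify_continue_output are hypothetical, and
the message strings are the ones used in the test-case.  It only looks
at the text printed before the first prompt and maps the message count
to the three branches the test distinguishes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical outcomes, mirroring the three branches of the test-case.
enum class continue_outcome { regular, bug_triggered, stuck, unknown };

// Count non-overlapping occurrences of NEEDLE in HAYSTACK.
static std::size_t
count_occurrences (const std::string &haystack, const std::string &needle)
{
  std::size_t count = 0;
  for (std::size_t pos = haystack.find (needle);
       pos != std::string::npos;
       pos = haystack.find (needle, pos + needle.size ()))
    ++count;
  return count;
}

// Classify the output of "continue" up to the first gdb prompt.
static continue_outcome
classify_continue_output (const std::string &output)
{
  static const std::string prompt = "(gdb) ";
  static const std::string no_such_process
    = "Couldn't get registers: No such process.";
  static const std::string killed
    = "Program terminated with signal SIGKILL, Killed.";

  // Keep only what was printed before the first prompt.  If there is
  // no prompt, find returns npos and substr keeps the whole string.
  const std::string head = output.substr (0, output.find (prompt));

  switch (count_occurrences (head, no_such_process))
    {
    case 0:
      // Regular output: the termination message precedes the prompt.
      return (head.find (killed) != std::string::npos
              ? continue_outcome::regular
              : continue_outcome::unknown);
    case 1:
      // One message: the bug was not triggered, the test gives up.
      return continue_outcome::stuck;
    case 2:
      // Two messages: the bug condition was triggered.
      return continue_outcome::bug_triggered;
    default:
      return continue_outcome::unknown;
    }
}
```

The expect patterns in the test-case do the same thing with three
separate regexps instead of an explicit count.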

> I didn't really understand why we saw a prompt coming back before the other messages,
> so I looked into it a bit and this is what I think happens:
> 
> 1. While we are stopped at the prompt, the linux-nat event handler is unregistered from the event loop
> 2. Inferior gets SIGKILL'ed, so GDB gets SIGCHLD'ed, that posts an event to the event pipe, but since it's
>    not registered in the event loop, nothing happens
> 3. User does continue, the linux-nat event handler gets registered with the event loop.  Then, the continue
>    fails (No such process), which brings us back at the prompt.  However, the linux-nat event handler stays
>    registered with the event loop.
> 4. When we come back to the event loop, we process the event for the SIGKILL, which makes GDB print the
>    thread exit message and "Program terminated" message.
> 
> Normally, after the "continue" fails, I don't think we would want to leave the linux-nat
> handler registered with the event loop: if it was not registered before, why should it be
> registered after?  However, if it wasn't left there, we wouldn't see the messages saying
> the program has terminated, so that wouldn't be good either.
> 
> Maybe there's a better way to handle it?
> 

Thanks for the investigation. Unfortunately I'm not able to comment on this.
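
For readers following along, the four steps above can be modelled with
a toy C++ sketch.  This is only an illustration of the ordering Simon
describes, under the assumption that his analysis is right; the type
toy_event_loop and its members are hypothetical and do not correspond
to gdb's actual event-loop or linux-nat code.

```cpp
#include <string>
#include <vector>

// Toy model of the sequence described above.  Not gdb code.
struct toy_event_loop
{
  // Step 1: at the prompt, the handler is not registered.
  bool handler_registered = false;
  // True once something has been posted to the event pipe.
  bool event_pending = false;
  // What the user would see, in order.
  std::vector<std::string> log;

  // Step 2: SIGCHLD posts an event, whether or not anyone listens.
  void post_sigchld () { event_pending = true; }

  // Step 3: "continue" registers the handler, the resume fails, and
  // we are back at the prompt with the handler still registered.
  void do_continue ()
  {
    handler_registered = true;
    log.push_back ("Couldn't get registers: No such process.");
    log.push_back ("(gdb) ");
  }

  // Step 4: one event-loop iteration; a pending event is only
  // processed when the handler is registered.
  void run_once ()
  {
    if (handler_registered && event_pending)
      {
        event_pending = false;
        log.push_back ("Program terminated with signal SIGKILL, Killed.");
      }
  }
};
```

In this model the termination message can only appear after the prompt,
which matches the output quoted earlier.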

> However, I still think it would be a good idea to merge a patch like yours, it's already a
> step forward (especially since it fixes a regression).
> 
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index 81518b9646..745694df2d 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -571,7 +571,7 @@ proc runto { function args } {
>>  	    }
>>  	    return 1
>>  	}
>> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
>> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>  	    if { $print_pass } {
>>  		pass $test_name
>>  	    }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index 54b59e2244..9876ca3c76 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>>  
>>  scoped_restore_current_thread::~scoped_restore_current_thread ()
>>  {
>> +  if (m_inf == NULL)
>> +    return;
>>    if (!m_dont_restore)
>>      {
>>        try
>> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>        else
>>  	frame = NULL;
>>  
>> -      m_selected_frame_id = get_frame_id (frame);
>> +      try
>> +       {
>> +	 m_selected_frame_id = get_frame_id (frame);
>> +       }
>> +      catch (const gdb_exception &ex)
>> +       {
>> +	 m_inf = NULL;
>> +	 m_selected_frame_id = null_frame_id;
>> +	 m_selected_frame_level = -1;
>> +	 return;
>> +       }
> 
> The indentation is a bit off here.
> 
> Instead of clearing everything, I think we should just set m_selected_frame_id to
> null_frame_id and m_selected_frame_level to -1.  m_inf and m_thread can still be
> set.  This way, the right inferior will be restored, at least, I think this is
> desirable.  scoped_restore_current_thread::restore handles well the case where the
> thread to restore has exited.  The thread_info object is refcounted for this exact
> use case, where the thread would get deleted while
> 
> In other words:
> 
>       try
> 	{
> 	  m_selected_frame_id = get_frame_id (frame);
> 	  m_selected_frame_level = frame_relative_level (frame);
> 	}
>       catch (const gdb_exception &ex)
> 	{
> 	  m_selected_frame_id = null_frame_id;
> 	  m_selected_frame_level = -1;
> 	}

Yep, that works.
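
As a standalone illustration of the pattern (not the gdb code itself),
here is a small C++ sketch of a constructor that falls back to null
frame information when the lookup throws, while still recording the
inferior.  The names toy_get_frame_id, toy_null_frame_id and
toy_scoped_restore are hypothetical, and std::runtime_error stands in
for gdb_exception.

```cpp
#include <stdexcept>

// Hypothetical stand-in for null_frame_id.
constexpr int toy_null_frame_id = 0;

// Hypothetical stand-in for get_frame_id: throws when the process
// is gone, otherwise returns some non-null id.
static int
toy_get_frame_id (bool process_gone)
{
  if (process_gone)
    throw std::runtime_error ("Couldn't get registers: No such process.");
  return 42;
}

class toy_scoped_restore
{
public:
  toy_scoped_restore (int inferior, bool process_gone)
    : m_inf (inferior)
  {
    try
      {
        m_selected_frame_id = toy_get_frame_id (process_gone);
        m_selected_frame_level = 0;
      }
    catch (const std::runtime_error &)
      {
        // Only the frame information is reset; m_inf stays set so
        // that the right inferior can still be restored later.
        m_selected_frame_id = toy_null_frame_id;
        m_selected_frame_level = -1;
      }
  }

  int inferior () const { return m_inf; }
  int frame_id () const { return m_selected_frame_id; }
  int frame_level () const { return m_selected_frame_level; }

private:
  int m_inf;
  int m_selected_frame_id = toy_null_frame_id;
  int m_selected_frame_level = -1;
};
```

The point of the sketch is that the exception no longer escapes the
constructor, so the caller always gets a fully constructed object.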

Here's the updated patch.

Thanks,
- Tom

[-- Attachment #2: 0001-gdb-Fix-hang-after-ext-sigkill.patch --]
[-- Type: text/x-patch, Size: 7593 bytes --]

[gdb] Fix hang after ext sigkill

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
...

After running, the program sleeps:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) r
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fe700 (LWP 22604)]
...

Until we interrupt it with a control-C:
...
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell killall -s SIGKILL a.out
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
<repeat>
...
gdb hangs repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".  The commit adds a
scoped_restore_current_thread typed variable restore_thread to
fetch_inferior_event, and the hang is caused by the constructor throwing an
exception.

Fix this by catching the exception in the constructor.

Built and regression-tested on x86_64-linux.

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* thread.c
	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
	exception in get_frame_id.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/25471
	* gdb.threads/hang-after-ext-sigkill.c: New test.
	* gdb.threads/hang-after-ext-sigkill.exp: New file.
	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.

---
 gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++
 .../gdb.threads/hang-after-ext-sigkill.exp         | 85 ++++++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                          |  2 +-
 gdb/thread.c                                       | 12 ++-
 4 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
new file mode 100644
index 0000000000..bfce6c3085
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
@@ -0,0 +1,40 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+static void *
+fun (void *dummy)
+{
+  while (1)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t thread;
+  pthread_create (&thread, NULL, fun, NULL);
+
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
new file mode 100644
index 0000000000..37577592cd
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
@@ -0,0 +1,85 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+	 {pthreads}] == -1} {
+    return -1
+}
+
+set res [runto main no-message temporary]
+if { $res != 1 } {
+    return -1
+}
+
+set pid -1
+gdb_test_multiple "info inferior 1" "get inferior pid" {
+    -re -wrap "process (\[0-9\]*).*" {
+       set pid $expect_out(1,string)
+       pass $gdb_test_name
+    }
+}
+if { $pid == -1 } {
+    return -1
+}
+
+gdb_test_multiple "continue" "" {
+    -re "Continuing" {
+	pass $gdb_test_name
+    }
+}
+
+send_gdb "\003"
+
+gdb_test_multiple "" "get sigint" {
+    -re -wrap "received signal SIGINT, Interrupt\..*" {
+       pass $gdb_test_name
+   }
+}
+
+gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+	# Regular output, bug condition was not triggered, we're done.
+	return -1
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+	pass $gdb_test_name
+	# Two times $no_such_process_msg.  The bug condition was triggered, go
+	# check for it.
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
+	pass $gdb_test_name
+	# One time $no_such_process_msg.  We're stuck here.  The bug condition
+	# was not triggered, but we're not getting correct gdb behaviour either:
+	# every subsequent continue produces one no_such_process_msg.  Give up.
+	return -1
+    }
+}
+
+gdb_test_multiple "" "messages" {
+    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+	pass $gdb_test_name
+	gdb_test "continue" $not_being_run_msg "second continue"
+    }
+}
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index e17ac0ef75..4cf2beca00 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -570,7 +570,7 @@ proc runto { function args } {
 	    }
 	    return 1
 	}
-	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
+	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
 	    if { $print_pass } {
 		pass $test_name
 	    }
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..d287bce45f 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
       else
 	frame = NULL;
 
-      m_selected_frame_id = get_frame_id (frame);
-      m_selected_frame_level = frame_relative_level (frame);
+      try
+	{
+	  m_selected_frame_id = get_frame_id (frame);
+	  m_selected_frame_level = frame_relative_level (frame);
+	}
+      catch (const gdb_exception &ex)
+	{
+	  m_selected_frame_id = null_frame_id;
+	  m_selected_frame_level = -1;
+	}
 
       tp->incref ();
       m_thread = tp;
