Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Pedro Alves <palves@redhat.com>
To: Tom de Vries <tdevries@suse.de>, Simon Marchi <simark@simark.ca>,
	gdb-patches@sourceware.org
Subject: Re: [PATCH][gdb] Fix hang after ext sigkill
Date: Thu, 16 Apr 2020 14:28:40 +0100	[thread overview]
Message-ID: <8094f685-b037-3e65-e8c3-b35c21c79f96@redhat.com> (raw)
In-Reply-To: <4fc1ee75-397b-8c05-14a9-fcedf584e8b8@suse.de>

Hi,

Sorry for the delay, and thanks much for working on this.

On 3/25/20 3:51 PM, Tom de Vries wrote:
> On 25-03-2020 15:44, Simon Marchi wrote:
>> On 2020-03-25 6:29 a.m., Tom de Vries wrote:
>>> Here's the updated patch.
>> Thanks.  Some comments about the test:
>>
>> - Please add a comment at the top to describe briefly what this is testing.
>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)),
>>   so that the test program eventually exits if something goes wrong and it is allowed to run
>>   freely.
> Done.
> 0001-gdb-Fix-hang-after-ext-sigkill.patch
> 
> [gdb] Fix hang after ext sigkill
> 
> Consider the test-case from this patch, compiled with pthread support:
> ...
> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
> ...
> 
> After running, the program sleeps:
> ...
> $ gdb a.out
> Reading symbols from a.out...
> (gdb) r
> Starting program: /data/gdb_versions/devel/a.out
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff77fe700 (LWP 22604)]
> ...
> 
> Until we interrupt it with a control-C:
> ...
> ^C
> Thread 1 "a.out" received signal SIGINT, Interrupt.
> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
> (gdb)
> ...
> 
> If we then kill the inferior using an external SIGKILL:
> ...
> (gdb) shell killall -s SIGKILL a.out
> ...
> and subsequently continue:
> ...
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> (gdb) Couldn't get registers: No such process.
> <repeat>
> ...
> gdb hangs repeating the same warning.  Typing control-C no longer helps,
> and we have to kill gdb.
> 
> This is a regression since commit 873657b9e8 "Preserve selected thread in
> all-stop w/ background execution".  The commit adds a
> scoped_restore_current_thread typed variable restore_thread to
> fetch_inferior_event, and the hang is caused by the constructor throwing an
> exception.
> 
> Fix this by catching the exception in the constructor.
> 
> Build and reg-tested on x86_64-linux.
> 
> gdb/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* thread.c
> 	(scoped_restore_current_thread::scoped_restore_current_thread): Catch
> 	exception in get_frame_id.
> 
> gdb/testsuite/ChangeLog:
> 
> 2020-02-24  Tom de Vries  <tdevries@suse.de>
> 
> 	PR gdb/25471
> 	* gdb.threads/hang-after-ext-sigkill.c: New test.
> 	* gdb.threads/hang-after-ext-sigkill.exp: New file.

"hang-after-ext-sigkill" is named in terms of how the
bug manifested (a hang), but once the bug is fixed, it won't
be obvious to remember to look for "hang" when someone goes
look for a testcase related to a process being killed outside of
gdb's control.  Plus, then testcase may be extended in the future
for related bugs that do not cause a hang.

There's a gdb.base/killed-outside.exp testcase already exactly for
this sort of issue -- the testcase does the same thing with killing
with SIGKILL from outside, and then making sure that GDB
behaves.  I'd rather this new testcase was given the same or
a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"'
runs it too.  Or maybe merge the testcases, though it's useful
to run the existing one on non-threaded environments too.
But I'm not sure this one needs to be threaded at all.  Won't
we see the failure to read registers with a single-threaded program
too?

> 	* lib/gdb.exp (runto): Handle "Temporary breakpoint" string.
> 
> ---
>  gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++
>  .../gdb.threads/hang-after-ext-sigkill.exp         | 88 ++++++++++++++++++++++
>  gdb/testsuite/lib/gdb.exp                          |  2 +-
>  gdb/thread.c                                       | 12 ++-
>  4 files changed, 142 insertions(+), 3 deletions(-)
> 
> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
> new file mode 100644
> index 0000000000..b93d6c644a
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
> @@ -0,0 +1,43 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2020 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include <pthread.h>
> +#include <unistd.h>
> +
> +static void *
> +fun (void *dummy)
> +{
> +  int i;
> +
> +  for (i = 0; i < 300; i++)
> +    sleep (1);
> +
> +  return NULL;
> +}
> +
> +int
> +main (void)
> +{
> +  int i;
> +  pthread_t thread;
> +  pthread_create (&thread, NULL, fun, NULL);
> +
> +  for (i = 0; i < 300; i++)
> +    sleep (1);
> +
> +  return 0;
> +}
> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
> new file mode 100644
> index 0000000000..89b38b1f6c
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
> @@ -0,0 +1,88 @@
> +# Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +# This test-case tests that continuing an inferior that has been killed
> +# using an external sigkill does not make gdb hang.
> +
> +standard_testfile
> +
> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
> +	 {pthreads}] == -1} {
> +    return -1
> +}
> +
> +set res [runto main no-message temporary]
> +if { $res != 1 } {
> +    return -1
> +}
> +
> +set pid -1
> +gdb_test_multiple "info inferior 1" "get inferior pid" {
> +    -re -wrap "process (\[0-9\]*).*" {
> +       set pid $expect_out(1,string)
> +       pass $gdb_test_name
> +    }
> +}

This won't work with remote targets that don't support
the process extensions, since on that case, you'll
get "Remote target" instead of "process $PID".  See
remote_target::pid_to_str.  Likewise probably other
targets.  See gdb.base/killed-outside.exp.


> +if { $pid == -1 } {
> +    return -1
> +}
> +
> +gdb_test_multiple "continue" "" {
> +    -re "Continuing" {
> +	pass $gdb_test_name
> +    }
> +}
> +
> +send_gdb "\003"
> +
> +gdb_test_multiple "" "get sigint" {
> +    -re -wrap "received signal SIGINT, Interrupt\..*" {
> +       pass $gdb_test_name
> +   }
> +}
> +


I don't think interrupting with Ctrl-C is really important,
compared to e.g., running to a breakpoint.  I'd prefer to run
to a breakpoint instead.  E.g., call a all_started(); function
after the child thread is spawned and after a pthread_barrier_wait
call, to make sure the child was scheduled.
See e.g., gdb.threads/async.c.  Then all you need is to
"runto all_started" instead of runto_main.

> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"

This will always kill a process on the host gdb is running on, which
of course does wrong the thing in cross scenarios.  So this should
do instead:

    remote_exec target "kill -9 ${testpid}"

and sleeping a bit is a good idea to make sure the kill is actually
scheduled and does its thing before gdb does the "continue".
See gdb.base/killed-outside.exp.


> +
> +set no_such_process_msg "Couldn't get registers: No such process\."
> +set killed_msg "Program terminated with signal SIGKILL, Killed\."
> +set no_longer_exists_msg "The program no longer exists\."
> +set not_being_run_msg "The program is not being run\."
> +
> +gdb_test_multiple "continue" "prompt after first continue" {
> +    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
> +	pass $gdb_test_name
> +	# Regular output, bug condition was not triggered, we're done.
> +	return -1
> +    }
> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
> +	pass $gdb_test_name
> +	# Two times $no_such_process_msg.  The bug condition was triggered, go
> +	# check for it.
> +    }
> +    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
> +	pass $gdb_test_name
> +	# One time $no_such_process_msg.  We're stuck here.  The bug condition
> +	# was not triggered, but we're not getting correct gdb behaviour either:
> +	# every subsequent continue produces one no_such_process_msg.  Give up.
> +	return -1

I'm confused here -- the comment says we're not getting correct behavior,
but this won't result in any FAIL?

> +    }
> +}
> +
> +gdb_test_multiple "" "messages" {
> +    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
> +	pass $gdb_test_name
> +	gdb_test "continue" $not_being_run_msg "second continue"
> +    }
> +}

It isn't obvious to me why put this one separately instead of
nested within the pass case in the other gdb_test_multiple above.
Is this also meant to run if the previous gdb_test_multiple fails
due to an internal gdb_test_multiple case being hit,
like the default gdb_prompt match?


> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
> index e17ac0ef75..4cf2beca00 100644
> --- a/gdb/testsuite/lib/gdb.exp
> +++ b/gdb/testsuite/lib/gdb.exp
> @@ -570,7 +570,7 @@ proc runto { function args } {
>  	    }
>  	    return 1
>  	}
> -	-re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { 
> +	-re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>  	    if { $print_pass } {
>  		pass $test_name
>  	    }
> diff --git a/gdb/thread.c b/gdb/thread.c
> index c6e3d356a5..d287bce45f 100644
> --- a/gdb/thread.c
> +++ b/gdb/thread.c
> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>        else
>  	frame = NULL;
>  
> -      m_selected_frame_id = get_frame_id (frame);
> -      m_selected_frame_level = frame_relative_level (frame);
> +      try
> +	{
> +	  m_selected_frame_id = get_frame_id (frame);
> +	  m_selected_frame_level = frame_relative_level (frame);
> +	}
> +      catch (const gdb_exception &ex)

This silently swallows Ctrl-C/QUIT too.  That's usually not a good
idea.  gdb_exception_error should be the default choice, unless you
really want to handle Ctrl-C here.

Thanks,
Pedro Alves



  parent reply	other threads:[~2020-04-16 13:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-24 20:14 Tom de Vries
2020-03-09 12:52 ` [PING][PATCH][gdb] " Tom de Vries
2020-03-23 19:16   ` [PING^2][PATCH][gdb] " Tom de Vries
2020-03-24 15:35 ` [PATCH][gdb] " Simon Marchi
2020-03-25 10:29   ` Tom de Vries
2020-03-25 14:44     ` Simon Marchi
2020-03-25 15:51       ` Tom de Vries
2020-03-25 15:57         ` Simon Marchi
2020-04-16 13:28         ` Pedro Alves [this message]
2020-04-21 12:38           ` Tom de Vries
2020-04-21 13:42             ` Pedro Alves
2020-09-25  9:39             ` Tom de Vries

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8094f685-b037-3e65-e8c3-b35c21c79f96@redhat.com \
    --to=palves@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=simark@simark.ca \
    --cc=tdevries@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox