On 16-04-2020 15:28, Pedro Alves wrote: > Hi, > > Sorry for the delay, and thanks much for working on this. > > On 3/25/20 3:51 PM, Tom de Vries wrote: >> On 25-03-2020 15:44, Simon Marchi wrote: >>> On 2020-03-25 6:29 a.m., Tom de Vries wrote: >>>> Here's the updated patch. >>> Thanks. Some comments about the test: >>> >>> - Please add a comment at the top to describe briefly what this is testing. >>> - Please replace the infinite loops with bounded ones (e.g. for (i = 0; i < 300; i++)), >>> so that the test program eventually exits if something goes wrong and it is allowed to run >>> freely. >> Done. >> 0001-gdb-Fix-hang-after-ext-sigkill.patch >> >> [gdb] Fix hang after ext sigkill >> >> Consider the test-case from this patch, compiled with pthread support: >> ... >> $ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread >> ... >> >> After running, the program sleeps: >> ... >> $ gdb a.out >> Reading symbols from a.out... >> (gdb) r >> Starting program: /data/gdb_versions/devel/a.out >> [Thread debugging using libthread_db enabled] >> Using host libthread_db library "/lib64/libthread_db.so.1". >> [New Thread 0x7ffff77fe700 (LWP 22604)] >> ... >> >> Until we interrupt it with a control-C: >> ... >> ^C >> Thread 1 "a.out" received signal SIGINT, Interrupt. >> 0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6 >> (gdb) >> ... >> >> If we then kill the inferior using an external SIGKILL: >> ... >> (gdb) shell killall -s SIGKILL a.out >> ... >> and subsequently continue: >> ... >> (gdb) c >> Continuing. >> Couldn't get registers: No such process. >> Couldn't get registers: No such process. >> (gdb) Couldn't get registers: No such process. >> (gdb) Couldn't get registers: No such process. >> (gdb) Couldn't get registers: No such process. >> >> ... >> gdb hangs repeating the same warning. Typing control-C no longer helps, >> and we have to kill gdb. >> >> This is a regression since commit 873657b9e8 "Preserve selected thread in >> all-stop w/ background execution". The commit adds a >> scoped_restore_current_thread typed variable restore_thread to >> fetch_inferior_event, and the hang is caused by the constructor throwing an >> exception. >> >> Fix this by catching the exception in the constructor. >> >> Build and reg-tested on x86_64-linux. >> >> gdb/ChangeLog: >> >> 2020-02-24 Tom de Vries >> >> PR gdb/25471 >> * thread.c >> (scoped_restore_current_thread::scoped_restore_current_thread): Catch >> exception in get_frame_id. >> >> gdb/testsuite/ChangeLog: >> >> 2020-02-24 Tom de Vries >> >> PR gdb/25471 >> * gdb.threads/hang-after-ext-sigkill.c: New test. >> * gdb.threads/hang-after-ext-sigkill.exp: New file. > > "hang-after-ext-sigkill" is named in terms of how the > bug manifested (a hang), but once the bug is fixed, it won't > be obvious to remember to look for "hang" when someone goes > look for a testcase related to a process being killed outside of > gdb's control. Plus, then testcase may be extended in the future > for related bugs that do not cause a hang. > > There's a gdb.base/killed-outside.exp testcase already exactly for > this sort of issue -- the testcase does the same thing with killing > with SIGKILL from outside, and then making sure that GDB > behaves. I'd rather this new testcase was given the same or > a similar name, so that e.g. 'make check TESTS="*/*killed-outside*.exp"' > runs it too. Ack, test-case renamed to gdb.threads/killed-outside.{exp,c}. > Or maybe merge the testcases, though it's useful > to run the existing one on non-threaded environments too. > But I'm not sure this one needs to be threaded at all. Won't > we see the failure to read registers with a single-threaded program > too? > I was not able to reproduce the hang with a single-threaded program. >> * lib/gdb.exp (runto): Handle "Temporary breakpoint" string. >> >> --- >> gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 43 +++++++++++ >> .../gdb.threads/hang-after-ext-sigkill.exp | 88 ++++++++++++++++++++++ >> gdb/testsuite/lib/gdb.exp | 2 +- >> gdb/thread.c | 12 ++- >> 4 files changed, 142 insertions(+), 3 deletions(-) >> >> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c >> new file mode 100644 >> index 0000000000..b93d6c644a >> --- /dev/null >> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c >> @@ -0,0 +1,43 @@ >> +/* This testcase is part of GDB, the GNU debugger. >> + >> + Copyright 2020 Free Software Foundation, Inc. >> + >> + This program is free software; you can redistribute it and/or modify >> + it under the terms of the GNU General Public License as published by >> + the Free Software Foundation; either version 3 of the License, or >> + (at your option) any later version. >> + >> + This program is distributed in the hope that it will be useful, >> + but WITHOUT ANY WARRANTY; without even the implied warranty of >> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + GNU General Public License for more details. >> + >> + You should have received a copy of the GNU General Public License >> + along with this program. If not, see . */ >> + >> +#include >> +#include >> + >> +static void * >> +fun (void *dummy) >> +{ >> + int i; >> + >> + for (i = 0; i < 300; i++) >> + sleep (1); >> + >> + return NULL; >> +} >> + >> +int >> +main (void) >> +{ >> + int i; >> + pthread_t thread; >> + pthread_create (&thread, NULL, fun, NULL); >> + >> + for (i = 0; i < 300; i++) >> + sleep (1); >> + >> + return 0; >> +} >> diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp >> new file mode 100644 >> index 0000000000..89b38b1f6c >> --- /dev/null >> +++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp >> @@ -0,0 +1,88 @@ >> +# Copyright (C) 2020 Free Software Foundation, Inc. >> + >> +# This program is free software; you can redistribute it and/or modify >> +# it under the terms of the GNU General Public License as published by >> +# the Free Software Foundation; either version 3 of the License, or >> +# (at your option) any later version. >> +# >> +# This program is distributed in the hope that it will be useful, >> +# but WITHOUT ANY WARRANTY; without even the implied warranty of >> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> +# GNU General Public License for more details. >> +# >> +# You should have received a copy of the GNU General Public License >> +# along with this program. If not, see . >> + >> +# This test-case tests that continuing an inferior that has been killed >> +# using an external sigkill does not make gdb hang. >> + >> +standard_testfile >> + >> +if {[prepare_for_testing "failed to prepare" $testfile $srcfile \ >> + {pthreads}] == -1} { >> + return -1 >> +} >> + >> +set res [runto main no-message temporary] >> +if { $res != 1 } { >> + return -1 >> +} >> + >> +set pid -1 >> +gdb_test_multiple "info inferior 1" "get inferior pid" { >> + -re -wrap "process (\[0-9\]*).*" { >> + set pid $expect_out(1,string) >> + pass $gdb_test_name >> + } >> +} > > This won't work with remote targets that don't support > the process extensions, since on that case, you'll > get "Remote target" instead of "process $PID". See > remote_target::pid_to_str. Likewise probably other > targets. See gdb.base/killed-outside.exp. > > Ack, updated accordingly. >> +if { $pid == -1 } { >> + return -1 >> +} >> + >> +gdb_test_multiple "continue" "" { >> + -re "Continuing" { >> + pass $gdb_test_name >> + } >> +} >> + >> +send_gdb "\003" >> + >> +gdb_test_multiple "" "get sigint" { >> + -re -wrap "received signal SIGINT, Interrupt\..*" { >> + pass $gdb_test_name >> + } >> +} >> + > > > I don't think interrupting with Ctrl-C is really important, > compared to e.g., running to a breakpoint. I'd prefer to run > to a breakpoint instead. E.g., call a all_started(); function > after the child thread is spawned and after a pthread_barrier_wait > call, to make sure the child was scheduled. > See e.g., gdb.threads/async.c. Then all you need is to > "runto all_started" instead of runto_main. > Done. >> +gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid" > > This will always kill a process on the host gdb is running on, which > of course does wrong the thing in cross scenarios. So this should > do instead: > > remote_exec target "kill -9 ${testpid}" > > and sleeping a bit is a good idea to make sure the kill is actually > scheduled and does its thing before gdb does the "continue". > See gdb.base/killed-outside.exp. > > Ack, done. >> + >> +set no_such_process_msg "Couldn't get registers: No such process\." >> +set killed_msg "Program terminated with signal SIGKILL, Killed\." >> +set no_longer_exists_msg "The program no longer exists\." >> +set not_being_run_msg "The program is not being run\." >> + >> +gdb_test_multiple "continue" "prompt after first continue" { >> + -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" { >> + pass $gdb_test_name >> + # Regular output, bug condition was not triggered, we're done. >> + return -1 >> + } >> + -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " { >> + pass $gdb_test_name >> + # Two times $no_such_process_msg. The bug condition was triggered, go >> + # check for it. >> + } >> + -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" { >> + pass $gdb_test_name >> + # One time $no_such_process_msg. We're stuck here. The bug condition >> + # was not triggered, but we're not getting correct gdb behaviour either: >> + # every subsequent continue produces one no_such_process_msg. Give up. >> + return -1 > > I'm confused here -- the comment says we're not getting correct behavior, > but this won't result in any FAIL? > Um, yes. I made that decision because I was trying to trigger another scenario. But agreed, it's debatable. Anyway, it also doesn't matter anymore since rewriting the test to stop at a breakpoint eliminated the need for this. The scenario I was trying to trigger now always reproduces. >> + } >> +} >> + >> +gdb_test_multiple "" "messages" { >> + -re ".*$killed_msg.*$no_longer_exists_msg\r\n" { >> + pass $gdb_test_name >> + gdb_test "continue" $not_being_run_msg "second continue" >> + } >> +} > > It isn't obvious to me why put this one separately instead of > nested within the pass case in the other gdb_test_multiple above. > Is this also meant to run if the previous gdb_test_multiple fails > due to an internal gdb_test_multiple case being hit, > like the default gdb_prompt match? > Fixed. > >> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp >> index e17ac0ef75..4cf2beca00 100644 >> --- a/gdb/testsuite/lib/gdb.exp >> +++ b/gdb/testsuite/lib/gdb.exp >> @@ -570,7 +570,7 @@ proc runto { function args } { >> } >> return 1 >> } >> - -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { >> + -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { >> if { $print_pass } { >> pass $test_name >> } >> diff --git a/gdb/thread.c b/gdb/thread.c >> index c6e3d356a5..d287bce45f 100644 >> --- a/gdb/thread.c >> +++ b/gdb/thread.c >> @@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread () >> else >> frame = NULL; >> >> - m_selected_frame_id = get_frame_id (frame); >> - m_selected_frame_level = frame_relative_level (frame); >> + try >> + { >> + m_selected_frame_id = get_frame_id (frame); >> + m_selected_frame_level = frame_relative_level (frame); >> + } >> + catch (const gdb_exception &ex) > > This silently swallows Ctrl-C/QUIT too. That's usually not a good > idea. gdb_exception_error should be the default choice, unless you > really want to handle Ctrl-C here. > Fixed. Thanks for the review, that was helpful. Any more comments? - Tom