From mboxrd@z Thu Jan 1 00:00:00 1970
From: Tom de Vries <tdevries@suse.de>
To: Simon Marchi, gdb-patches@sourceware.org
Cc: Pedro Alves
Subject: Re: [PATCH][gdb] Fix hang after ext sigkill
Date: Wed, 25 Mar 2020 11:29:54 +0100
Message-ID: <831161db-85a9-74da-1833-7bab3cc41d15@suse.de>
References: <20200224201403.GA7079@delia>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="------------C476365E14FF408D5FECFE2F"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------C476365E14FF408D5FECFE2F
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 24-03-2020 16:35, Simon Marchi wrote:
> Hi Tom,
>
> The test fails for me more often than not.  I've attached a gdb.log showing one such
> failure.  There seems to be a problem matching the output of the last "continue".
>

Hi Simon,

I've managed to reproduce that, by running the test-case in parallel with
stress -c 5.

> This is what I see when I reproduce the case by hand:
>
> (gdb) c
> Continuing.
> Couldn't get registers: No such process.
> Couldn't get registers: No such process.
> (gdb) [Thread 0x7ffff7d99700 (LWP 514079) exited]
>
> Program terminated with signal SIGKILL, Killed.
> The program no longer exists.
>

I've modified the test-case to check for the number of "Couldn't get
registers: No such process." messages between the continue and the
following gdb prompt, and handle that appropriately.

Hopefully this will fix the FAILs you're seeing.

> I didn't really understand why we saw a prompt coming back before the other messages,
> so I looked into it a bit and this is what I think happens:
>
> 1. While we are stopped at the prompt, the linux-nat event handler is unregistered from the event loop
> 2. Inferior gets SIGKILL'ed, so GDB gets SIGCHLD'ed, that posts an event to the event pipe, but since it's
>    not registered in the event loop, nothing happens
> 3. User does continue, the linux-nat event handler gets registered with the event loop.  Then, the continue
>    fails (No such process), which brings us back at the prompt.  However, the linux-nat event handler stays
>    registered with the event loop.
> 4. When we come back to the event loop, we process the event for the SIGKILL, which makes GDB print the
>    thread exit message and "Program terminated" message.
>
> Normally, after the "continue" fails, I don't think we would want to leave the linux-nat
> handler registered with the event loop: if it was not registered before, why should it be
> registered after?  However, if it wasn't left there, we wouldn't see the messages saying
> the program has terminated, so that wouldn't be good either.
>
> Maybe there's a better way to handle it?
>

Thanks for the investigation.  Unfortunately I'm not able to comment on this.
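
As a reading aid only, the four-step sequence quoted above can be modelled
with a toy event loop.  This is a minimal sketch and not GDB's
implementation: toy_event_source, toy_run_event_loop and toy_replay are
hypothetical names invented for illustration, and the only behaviour
assumed is the one the quoted steps describe (a posted event stays pending
until its handler is registered with the loop).

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Toy model, NOT GDB code: events posted to the source are dispatched
// only while its handler is registered with the event loop.
struct toy_event_source
{
  bool registered = false;          // Handler registered with the loop?
  std::queue<std::string> pending;  // Events waiting in the "event pipe".

  void post (const std::string &ev) { pending.push (ev); }
};

// Run one pass of the loop and return the events that were handled.
std::vector<std::string>
toy_run_event_loop (toy_event_source &src)
{
  std::vector<std::string> handled;
  if (!src.registered)
    return handled;                 // Event stays queued, nothing happens.
  while (!src.pending.empty ())
    {
      handled.push_back (src.pending.front ());
      src.pending.pop ();
    }
  return handled;
}

// Replay steps 1-4 and return the resulting "console" transcript.
std::vector<std::string>
toy_replay ()
{
  std::vector<std::string> out;
  toy_event_source linux_nat;       // Step 1: at the prompt, unregistered.

  // Step 2: the SIGKILL is reported, but nobody is listening yet.
  linux_nat.post ("inferior killed by SIGKILL");
  for (const std::string &ev : toy_run_event_loop (linux_nat))
    out.push_back ("handled: " + ev);

  // Step 3: "continue" registers the handler, then fails and returns to
  // the prompt while leaving the handler registered.
  linux_nat.registered = true;
  out.push_back ("Couldn't get registers: No such process.");
  out.push_back ("(gdb) ");

  // Step 4: back in the event loop, the stale event is finally handled.
  for (const std::string &ev : toy_run_event_loop (linux_nat))
    out.push_back ("handled: " + ev);
  return out;
}
```

In this model toy_replay () returns three entries, with the prompt ahead of
the termination event, which mirrors the ordering seen in the log above.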
> However, I still think it would be a good idea to merge a patch like yours, it's already a
> step forward (especially since it fixes a regression).
>
>> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
>> index 81518b9646..745694df2d 100644
>> --- a/gdb/testsuite/lib/gdb.exp
>> +++ b/gdb/testsuite/lib/gdb.exp
>> @@ -571,7 +571,7 @@ proc runto { function args } {
>>              }
>>              return 1
>>          }
>> -        -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>> +        -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
>>              if { $print_pass } {
>>                  pass $test_name
>>              }
>> diff --git a/gdb/thread.c b/gdb/thread.c
>> index 54b59e2244..9876ca3c76 100644
>> --- a/gdb/thread.c
>> +++ b/gdb/thread.c
>> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore ()
>>
>>  scoped_restore_current_thread::~scoped_restore_current_thread ()
>>  {
>> +  if (m_inf == NULL)
>> +    return;
>>    if (!m_dont_restore)
>>      {
>>        try
>> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
>>    else
>>      frame = NULL;
>>
>> -  m_selected_frame_id = get_frame_id (frame);
>> +  try
>> +    {
>> +      m_selected_frame_id = get_frame_id (frame);
>> +    }
>> +  catch (const gdb_exception &ex)
>> +    {
>> +      m_inf = NULL;
>> +      m_selected_frame_id = null_frame_id;
>> +      m_selected_frame_level = -1;
>> +      return;
>> +    }
>
> The indentation is a bit off here.
>
> Instead of clearing everything, I think we should just set m_selected_frame_id to
> null_frame_id and m_selected_frame_level to -1.  m_inf and m_thread can still be
> set.  This way, the right inferior will be restored, at least, I think this is
> desirable.  scoped_restore_current_thread::restore handles well the case where the
> thread to restore has exited.  The thread_info object is refcounted for this exact
> use case, where the thread would get deleted while
>
> In other words:
>
>   try
>     {
>       m_selected_frame_id = get_frame_id (frame);
>       m_selected_frame_level = frame_relative_level (frame);
>     }
>   catch (const gdb_exception &ex)
>     {
>       m_selected_frame_id = null_frame_id;
>       m_selected_frame_level = -1;
>     }

Yep, that works.

Here's the updated patch.

Thanks,
- Tom

--------------C476365E14FF408D5FECFE2F
Content-Type: text/x-patch; charset=UTF-8;
 name="0001-gdb-Fix-hang-after-ext-sigkill.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="0001-gdb-Fix-hang-after-ext-sigkill.patch"

[gdb] Fix hang after ext sigkill

Consider the test-case from this patch, compiled with pthread support:
...
$ gcc src/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c -lpthread
...

After running, the program sleeps:
...
$ gdb a.out
Reading symbols from a.out...
(gdb) r
Starting program: /data/gdb_versions/devel/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77fe700 (LWP 22604)]
...

Until we interrupt it with a control-C:
...
^C
Thread 1 "a.out" received signal SIGINT, Interrupt.
0x00007ffff78c50f0 in nanosleep () from /lib64/libc.so.6
(gdb)
...

If we then kill the inferior using an external SIGKILL:
...
(gdb) shell killall -s SIGKILL a.out
...
and subsequently continue:
...
(gdb) c
Continuing.
Couldn't get registers: No such process.
Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
(gdb) Couldn't get registers: No such process.
...
gdb hangs repeating the same warning.  Typing control-C no longer helps,
and we have to kill gdb.

This is a regression since commit 873657b9e8 "Preserve selected thread in
all-stop w/ background execution".
The commit adds a scoped_restore_current_thread typed variable
restore_thread to fetch_inferior_event, and the hang is caused by the
constructor throwing an exception.

Fix this by catching the exception in the constructor.

Build and reg-tested on x86_64-linux.

gdb/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

        PR gdb/25471
        * thread.c
        (scoped_restore_current_thread::scoped_restore_current_thread): Catch
        exception in get_frame_id.

gdb/testsuite/ChangeLog:

2020-02-24  Tom de Vries  <tdevries@suse.de>

        PR gdb/25471
        * gdb.threads/hang-after-ext-sigkill.c: New test.
        * gdb.threads/hang-after-ext-sigkill.exp: New file.
        * lib/gdb.exp (runto): Handle "Temporary breakpoint" string.

---
 gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c | 40 ++++++++++
 .../gdb.threads/hang-after-ext-sigkill.exp         | 85 ++++++++++++++++++++++
 gdb/testsuite/lib/gdb.exp                          |  2 +-
 gdb/thread.c                                       | 12 ++-
 4 files changed, 136 insertions(+), 3 deletions(-)

diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
new file mode 100644
index 0000000000..bfce6c3085
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.c
@@ -0,0 +1,40 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2020 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <pthread.h>
+#include <unistd.h>
+
+static void *
+fun (void *dummy)
+{
+  while (1)
+    sleep (1);
+
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t thread;
+  pthread_create (&thread, NULL, fun, NULL);
+
+  while (1)
+    sleep (1);
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
new file mode 100644
index 0000000000..37577592cd
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/hang-after-ext-sigkill.exp
@@ -0,0 +1,85 @@
+# Copyright (C) 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+standard_testfile
+
+if {[prepare_for_testing "failed to prepare" $testfile $srcfile \
+         {pthreads}] == -1} {
+    return -1
+}
+
+set res [runto main no-message temporary]
+if { $res != 1 } {
+    return -1
+}
+
+set pid -1
+gdb_test_multiple "info inferior 1" "get inferior pid" {
+    -re -wrap "process (\[0-9\]*).*" {
+        set pid $expect_out(1,string)
+        pass $gdb_test_name
+    }
+}
+if { $pid == -1 } {
+    return -1
+}
+
+gdb_test_multiple "continue" "" {
+    -re "Continuing" {
+        pass $gdb_test_name
+    }
+}
+
+send_gdb "\003"
+
+gdb_test_multiple "" "get sigint" {
+    -re -wrap "received signal SIGINT, Interrupt\..*" {
+        pass $gdb_test_name
+    }
+}
+
+gdb_test_no_output "shell kill -s SIGKILL $pid" "shell kill -s SIGKILL pid"
+
+set no_such_process_msg "Couldn't get registers: No such process\."
+set killed_msg "Program terminated with signal SIGKILL, Killed\."
+set no_longer_exists_msg "The program no longer exists\."
+set not_being_run_msg "The program is not being run\."
+
+gdb_test_multiple "continue" "prompt after first continue" {
+    -re "Continuing\.\r\n\r\n$killed_msg\r\n$no_longer_exists_msg\r\n$gdb_prompt $" {
+        pass $gdb_test_name
+        # Regular output, bug condition was not triggered, we're done.
+        return -1
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$no_such_process_msg\r\n$gdb_prompt " {
+        pass $gdb_test_name
+        # Two times $no_such_process_msg.  The bug condition was triggered, go
+        # check for it.
+    }
+    -re "Continuing\.\r\n$no_such_process_msg\r\n$gdb_prompt $" {
+        pass $gdb_test_name
+        # One time $no_such_process_msg.  We're stuck here.  The bug condition
+        # was not triggered, but we're not getting correct gdb behaviour either:
+        # every subsequent continue produces one no_such_process_msg.  Give up.
+        return -1
+    }
+}
+
+gdb_test_multiple "" "messages" {
+    -re ".*$killed_msg.*$no_longer_exists_msg\r\n" {
+        pass $gdb_test_name
+        gdb_test "continue" $not_being_run_msg "second continue"
+    }
+}
diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp
index e17ac0ef75..4cf2beca00 100644
--- a/gdb/testsuite/lib/gdb.exp
+++ b/gdb/testsuite/lib/gdb.exp
@@ -570,7 +570,7 @@ proc runto { function args } {
             }
             return 1
         }
-        -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
+        -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" {
             if { $print_pass } {
                 pass $test_name
             }
diff --git a/gdb/thread.c b/gdb/thread.c
index c6e3d356a5..d287bce45f 100644
--- a/gdb/thread.c
+++ b/gdb/thread.c
@@ -1488,8 +1488,16 @@ scoped_restore_current_thread::scoped_restore_current_thread ()
   else
     frame = NULL;
 
-  m_selected_frame_id = get_frame_id (frame);
-  m_selected_frame_level = frame_relative_level (frame);
+  try
+    {
+      m_selected_frame_id = get_frame_id (frame);
+      m_selected_frame_level = frame_relative_level (frame);
+    }
+  catch (const gdb_exception &ex)
+    {
+      m_selected_frame_id = null_frame_id;
+      m_selected_frame_level = -1;
+    }
 
   tp->incref ();
   m_thread = tp;

--------------C476365E14FF408D5FECFE2F--
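
For readers who want to try the pattern from the thread.c hunk in
isolation, here is a minimal standalone sketch.  It is not GDB code:
toy_frame_id, toy_get_frame_id and toy_scoped_restore are hypothetical
stand-ins, std::exception replaces gdb_exception, and the frame level 0 on
the success path is an arbitrary placeholder.  The point it illustrates is
only that a constructor which catches the failure and falls back to "no
frame" values still runs the rest of its body, so the object ends up fully
constructed either way.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for illustration; these are not GDB's types.
struct toy_frame_id { int value; };
static const toy_frame_id toy_null_frame_id = { 0 };

// Simulates a frame lookup that throws once the inferior is gone.
static toy_frame_id
toy_get_frame_id (bool inferior_alive)
{
  if (!inferior_alive)
    throw std::runtime_error ("Couldn't get registers: No such process.");
  return { 42 };
}

class toy_scoped_restore
{
public:
  explicit toy_scoped_restore (bool inferior_alive)
  {
    try
      {
        m_selected_frame_id = toy_get_frame_id (inferior_alive);
        m_selected_frame_level = 0;
      }
    catch (const std::exception &)
      {
        // Fall back to "no selected frame" instead of propagating.
        m_selected_frame_id = toy_null_frame_id;
        m_selected_frame_level = -1;
      }

    // Reached on both paths, like the thread bookkeeping that follows
    // the try/catch in the patch.
    m_have_thread = true;
  }

  int frame_id () const { return m_selected_frame_id.value; }
  int frame_level () const { return m_selected_frame_level; }
  bool have_thread () const { return m_have_thread; }

private:
  toy_frame_id m_selected_frame_id = toy_null_frame_id;
  int m_selected_frame_level = -1;
  bool m_have_thread = false;
};
```

Constructing toy_scoped_restore (false) does not throw in this sketch; the
object reports frame level -1 and still records its thread, which is the
property the reviewed version of the patch relies on.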