From: Hui Zhu <hui_zhu@mentor.com>
To: Pedro Alves <palves@redhat.com>,
gdb-patches ml <gdb-patches@sourceware.org>
Subject: Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
Date: Fri, 11 Jul 2014 08:21:00 -0000 [thread overview]
Message-ID: <53BF935B.3000303@mentor.com> (raw)
In-Reply-To: <53BEAE5E.7030209@redhat.com>
Hi Pedro,
Thanks for your patch. It fixed the issue in my part.
Best,
Hui
On 07/10/14 23:16, Pedro Alves wrote:
> Hi Hui,
>
> On 07/02/2014 10:07 AM, Pedro Alves wrote:
>>> This issue is introduced by the patches for PR 12702 that was pushed by you.
>>> Maybe you could take a look on this patch.
>>
>> Thanks. I'll need to think a bit about whether this is the right
>> solution.
>>
>> I've added it to the 7.8 regression queue at:
>>
>> https://sourceware.org/gdb/wiki/GDB_7.8_Release
>>
>> I'll look better at it as soon as I have a chance.
>>
>
> First, it's really annoying that the testsuite doesn't catch this!
> We're really missing such a basic test, so I went ahead and wrote one.
> I was actually quite surprised that we didn't have any
> test that makes sure "kill" works without complaints/spurious output.
>
> However, unfortunately, the new test revealed that the fix isn't right. :-/
>
> Sometimes the test passed, but other times (that'll depend on
> machine, of course), I'd get:
>
> kill
> Kill the program being debugged? (y or n) y
> Ignoring packet error, continuing...
> (gdb) FAIL: gdb.threads/kill.exp: kill
>
> That means GDBserver didn't reply to the vKill packet.
>
> Attaching to GDBserver at this point shows that it's stuck forever
> in the new loop:
>
>
> + /* Keep call kill_one_lwp_callback until find_inferior cannot find any
> + lwps that is for pid. */
> + while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);
>
>
> The reason was that the leader lwp goes zombie before the other
> lwps are reaped, and linux_wait_for_event would delete it. We'd be
> left with e.g., only one lwp that is _not_ the leader.
>
> That lwp had disappeared already (due to SIGKILL), but there's
> no status to collect (waitpid would return -1). Nothing in linux-low.c
> would notice the lwp is already gone. Because that lwp has lwp->stopped == 0,
> linux_wait_for_event would return, here:
>
> if ((find_inferior (&all_threads,
> not_stopped_callback,
> &wait_ptid) == NULL))
> {
> if (debug_threads)
> debug_printf ("LLW: exit (no unwaited-for LWP)\n");
> sigprocmask (SIG_SETMASK, &prev_mask, NULL);
> return -1;
> }
>
> So this would return true:
>
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
> pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
> } while (pid > 0 && WIFSTOPPED (wstat));
>
> - return 0;
> + /* Let find_inferior return because maybe other lwp in the list will
> + be deleted by delete_lwp. */
> + return 1;
> }
>
> So the new loop would never break.
>
> Here's a different patch that passes the test cleanly for me.
> Let me know if it works for you.
>
> 8<-------------------
> From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 10 Jul 2014 15:13:26 +0100
> Subject: [PATCH] GDBserver crashes when killing a multi-thread process
>
> Here's an example, with the new test:
>
> gdbserver :9999 gdb.threads/kill
> gdb gdb.threads/kill
> (gdb) b 52
> Breakpoint 1 at 0x4007f4: file kill.c, line 52.
> Continuing.
>
> Breakpoint 1, main () at kill.c:52
> 52 return 0; /* set break here */
> (gdb) k
> Kill the program being debugged? (y or n) y
>
> gdbserver :9999 gdb.threads/kill
> Process gdb.base/watch_thread_num created; pid = 9719
> Listening on port 1234
> Remote debugging from host 127.0.0.1
> Killing all inferiors
> Segmentation fault (core dumped)
>
> Backtrace:
>
> (gdb) bt
> #0 0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
> #1 0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
> #2 0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
> #3 0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
> #4 0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
> #5 0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
> #6 0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
> #7 0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
> #8 0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
> #9 0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
> #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283
>
> The problem is that linux_wait_for_event deletes lwps that have exited
> (even those not passed in as lwps of interest), while the lwp/thread
> list is being walked on with find_inferior. find_inferior can handle
> the current iterated inferior being deleted, but not others.
>
> When killing lwps, we don't really care about any of the pending
> status handling of linux_wait_for_event. We can just waitpid the lwps
> directly, which is also what GDB does (see
> linux-nat.c:kill_wait_callback). This way the lwps are not deleted
> while we're walking the list. They'll be deleted by linux_mourn
> afterwards.
>
> This crash triggers several times when running the testsuite against
> GDBserver with the native-gdbserver board (target remote), but as GDB
> can't distinguish between GDBserver crashing and "kill" being
> sucessful, as in both cases the connection is closed (the 'k' packet
> doesn't require a reply), and the inferior is gone, that results in no
> FAIL.
>
> The patch adds a generic test that catches the issue with
> extended-remote mode (and works fine with native testing too). Here's
> how it fails with the native-extended-gdbserver board without the fix:
>
> (gdb) info threads
> Id Target Id Frame
> 6 Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> 5 Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> 4 Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> 3 Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> 2 Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> * 1 Thread 15367.15367 main () at .../gdb.threads/kill.c:52
> (gdb) kill
> Kill the program being debugged? (y or n) y
> Remote connection closed
> ^^^^^^^^^^^^^^^^^^^^^^^^
> (gdb) FAIL: gdb.threads/kill.exp: kill
>
> Extended remote should remain connected after the kill.
>
> gdb/gdbserver/
> 2014-07-10 Pedro Alves <palves@redhat.com>
>
> * linux-low.c (kill_wait_lwp): New function, based on
> kill_one_lwp_callback, but use my_waitpid directly.
> (kill_one_lwp_callback, linux_kill): Use it.
>
> gdb/testsuite/
> 2014-07-10 Pedro Alves <palves@redhat.com>
>
> * gdb.threads/kill.c: New file.
> * gdb.threads/kill.exp: New file.
> ---
> gdb/gdbserver/linux-low.c | 68 ++++++++++++++++++++-------------
> gdb/testsuite/gdb.threads/kill.c | 64 +++++++++++++++++++++++++++++++
> gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 183 insertions(+), 26 deletions(-)
> create mode 100644 gdb/testsuite/gdb.threads/kill.c
> create mode 100644 gdb/testsuite/gdb.threads/kill.exp
>
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 61552f4..bdc9aaa 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp)
> errno ? strerror (errno) : "OK");
> }
>
> +/* Kill LWP and wait for it to die. */
> +
> +static void
> +kill_wait_lwp (struct lwp_info *lwp)
> +{
> + struct thread_info *thr = get_lwp_thread (lwp);
> + int pid = ptid_get_pid (ptid_of (thr));
> + int lwpid = ptid_get_lwp (ptid_of (thr));
> + int wstat;
> + int res;
> +
> + if (debug_threads)
> + debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
> +
> + do
> + {
> + linux_kill_one_lwp (lwp);
> +
> + /* Make sure it died. Notes:
> +
> + - The loop is most likely unnecessary.
> +
> + - We don't use linux_wait_for_event as that could delete lwps
> + while we're iterating over them. We're not interested in
> + any pending status at this point, only in making sure all
> + wait status on the kernel side are collected until the
> + process is reaped.
> +
> + - We don't use __WALL here as the __WALL emulation relies on
> + SIGCHLD, and killing a stopped process doesn't generate
> + one, nor an exit status.
> + */
> + res = my_waitpid (lwpid, &wstat, 0);
> + if (res == -1 && errno == ECHILD)
> + res = my_waitpid (lwpid, &wstat, __WCLONE);
> + } while (res > 0 && WIFSTOPPED (wstat));
> +
> + gdb_assert (res > 0);
> +}
> +
> /* Callback for `find_inferior'. Kills an lwp of a given process,
> except the leader. */
>
> @@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
> {
> struct thread_info *thread = (struct thread_info *) entry;
> struct lwp_info *lwp = get_thread_lwp (thread);
> - int wstat;
> int pid = * (int *) args;
>
> if (ptid_get_pid (entry->id) != pid)
> @@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
> return 0;
> }
>
> - do
> - {
> - linux_kill_one_lwp (lwp);
> -
> - /* Make sure it died. The loop is most likely unnecessary. */
> - pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
> - } while (pid > 0 && WIFSTOPPED (wstat));
> -
> + kill_wait_lwp (lwp);
> return 0;
> }
>
> @@ -952,8 +984,6 @@ linux_kill (int pid)
> {
> struct process_info *process;
> struct lwp_info *lwp;
> - int wstat;
> - int lwpid;
>
> process = find_process_pid (pid);
> if (process == NULL)
> @@ -976,21 +1006,7 @@ linux_kill (int pid)
> pid);
> }
> else
> - {
> - struct thread_info *thr = get_lwp_thread (lwp);
> -
> - if (debug_threads)
> - debug_printf ("lk_1: killing lwp %ld, for pid: %d\n",
> - lwpid_of (thr), pid);
> -
> - do
> - {
> - linux_kill_one_lwp (lwp);
> -
> - /* Make sure it died. The loop is most likely unnecessary. */
> - lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL);
> - } while (lwpid > 0 && WIFSTOPPED (wstat));
> - }
> + kill_wait_lwp (lwp);
>
> the_target->mourn (process);
>
> diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c
> new file mode 100644
> index 0000000..aefbb06
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/kill.c
> @@ -0,0 +1,64 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> + Copyright 2014 Free Software Foundation, Inc.
> +
> + This program is free software; you can redistribute it and/or modify
> + it under the terms of the GNU General Public License as published by
> + the Free Software Foundation; either version 3 of the License, or
> + (at your option) any later version.
> +
> + This program is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + GNU General Public License for more details.
> +
> + You should have received a copy of the GNU General Public License
> + along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +#ifdef USE_THREADS
> +
> +#include <unistd.h>
> +#include <pthread.h>
> +
> +#define NUM 5
> +
> +pthread_barrier_t barrier;
> +
> +void *
> +thread_function (void *arg)
> +{
> + volatile unsigned int counter = 1;
> +
> + pthread_barrier_wait (&barrier);
> +
> + while (counter > 0)
> + {
> + counter++;
> + usleep (1);
> + }
> +
> + pthread_exit (NULL);
> +}
> +
> +#endif /* USE_THREADS */
> +
> +void
> +setup (void)
> +{
> +#ifdef USE_THREADS
> + pthread_t threads[NUM];
> + int i;
> +
> + pthread_barrier_init (&barrier, NULL, NUM + 1);
> + for (i = 0; i < NUM; i++)
> + pthread_create (&threads[i], NULL, thread_function, NULL);
> + pthread_barrier_wait (&barrier);
> +#endif /* USE_THREADS */
> +}
> +
> +int
> +main (void)
> +{
> + setup ();
> + return 0; /* set break here */
> +}
> diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp
> new file mode 100644
> index 0000000..a3f921f
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/kill.exp
> @@ -0,0 +1,77 @@
> +# This testcase is part of GDB, the GNU debugger.
> +
> +# Copyright 2014 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>. */
> +
> +standard_testfile
> +
> +# Run the test proper. THREADED indicates whether to build a threaded
> +# program and spawn several threads before trying to kill the program.
> +
> +proc test {threaded} {
> + global testfile srcfile
> +
> + with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] {
> +
> + set options {debug}
> + if {$threaded} {
> + lappend options "pthreads"
> + lappend options "additional_flags=-DUSE_THREADS"
> + set prog ${testfile}_threads
> + } else {
> + set prog ${testfile}_nothreads
> + }
> +
> + if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} {
> + return -1
> + }
> +
> + if { ![runto main] } then {
> + fail "run to main"
> + return
> + }
> +
> + set linenum [gdb_get_line_number "set break here"]
> + gdb_breakpoint "$srcfile:$linenum"
> + gdb_continue_to_breakpoint "break here" ".*break here.*"
> +
> + if {$threaded} {
> + gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started"
> + }
> +
> + # This kills and ensures no output other than the prompt comes out,
> + # like:
> + #
> + # (gdb) kill
> + # Kill the program being debugged? (y or n) y
> + # (gdb)
> + #
> + # If we instead saw more output, like e.g., with an extended-remote
> + # connection:
> + #
> + # (gdb) kill
> + # Kill the program being debugged? (y or n) y
> + # Remote connection closed
> + # (gdb)
> + #
> + # the above would mean that the remote end crashed.
> +
> + gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y"
> + }
> +}
> +
> +foreach threaded {true false} {
> + test $threaded
> +}
>
next prev parent reply other threads:[~2014-07-11 7:34 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-23 10:10 [PATCH] Fix crash of gdbserver when kill threads Hui Zhu
2014-06-29 3:28 ` Hui Zhu
2014-07-02 9:07 ` Pedro Alves
2014-07-10 15:17 ` [PATCH v2] GDBserver crashes when killing a multi-thread process Pedro Alves
2014-07-11 8:21 ` Hui Zhu [this message]
2014-07-11 10:53 ` [PUSHED+7.8] " Pedro Alves
2015-07-13 16:07 ` Yao Qi
2015-07-13 17:32 ` Pedro Alves
2015-07-14 8:00 ` Yao Qi
2015-07-14 9:13 ` Pedro Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53BF935B.3000303@mentor.com \
--to=hui_zhu@mentor.com \
--cc=gdb-patches@sourceware.org \
--cc=palves@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox