From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13214 invoked by alias); 11 Jul 2014 07:34:19 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 13128 invoked by uid 89); 11 Jul 2014 07:34:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.2 X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 11 Jul 2014 07:33:58 +0000 Received: from svr-orw-exc-10.mgc.mentorg.com ([147.34.98.58]) by relay1.mentorg.com with esmtp id 1X5VL7-0002lE-4P from Hui_Zhu@mentor.com ; Fri, 11 Jul 2014 00:33:53 -0700 Received: from SVR-ORW-FEM-02.mgc.mentorg.com ([147.34.96.206]) by SVR-ORW-EXC-10.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.4675); Fri, 11 Jul 2014 00:33:52 -0700 Received: from localhost.localdomain (147.34.91.1) by svr-orw-fem-02.mgc.mentorg.com (147.34.96.168) with Microsoft SMTP Server id 14.2.247.3; Fri, 11 Jul 2014 00:33:22 -0700 Message-ID: <53BF935B.3000303@mentor.com> Date: Fri, 11 Jul 2014 08:21:00 -0000 From: Hui Zhu User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.6.0 MIME-Version: 1.0 To: Pedro Alves , gdb-patches ml Subject: Re: [PATCH v2] GDBserver crashes when killing a multi-thread process References: <510f2362-8d33-4c3c-9a13-5d187f26abdf@SVR-ORW-FEM-04.mgc.mentorg.com> <53AF87EB.60703@mentor.com> <53B3CBDB.5030207@redhat.com> <53BEAE5E.7030209@redhat.com> In-Reply-To: <53BEAE5E.7030209@redhat.com> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2014-07/txt/msg00227.txt.bz2 Hi Pedro, Thanks for your patch. It fixed the issue in my part. Best, Hui On 07/10/14 23:16, Pedro Alves wrote: > Hi Hui, > > On 07/02/2014 10:07 AM, Pedro Alves wrote: >>> This issue is introduced by the patches for PR 12702 that was pushed by you. >>> Maybe you could take a look on this patch. >> >> Thanks. I'll need to think a bit about whether this is the right >> solution. >> >> I've added it to the 7.8 regression queue at: >> >> https://sourceware.org/gdb/wiki/GDB_7.8_Release >> >> I'll look better at it as soon as I have a chance. >> > > First, it's really annoying that the testsuite doesn't catch this! > We're really missing such a basic test, so I went ahead and wrote one. > I was actually quite surprised that we didn't have any > test that makes sure "kill" works without complaints/spurious output. > > However, unfortunately, the new test revealed that the fix isn't right. :-/ > > Sometimes the test passed, but other times (that'll depend on > machine, of course), I'd get: > > kill > Kill the program being debugged? (y or n) y > Ignoring packet error, continuing... > (gdb) FAIL: gdb.threads/kill.exp: kill > > That means GDBserver didn't reply to the vKill packet. > > Attaching to GDBserver at this point shows that it's stuck forever > in the new loop: > > > + /* Keep call kill_one_lwp_callback until find_inferior cannot find any > + lwps that is for pid. */ > + while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL); > > > The reason was that the leader lwp goes zombie before the other > lwps are reaped, and linux_wait_for_event would delete it. We'd be > left with e.g., only one lwp that is _not_ the leader. > > That lwp had disappeared already (due to SIGKILL), but there's > no status to collect (waitpid would return -1). Nothing in linux-low.c > would notice the lwp is already gone. Because that lwp has lwp->stopped == 0, > linux_wait_for_event would return, here: > > if ((find_inferior (&all_threads, > not_stopped_callback, > &wait_ptid) == NULL)) > { > if (debug_threads) > debug_printf ("LLW: exit (no unwaited-for LWP)\n"); > sigprocmask (SIG_SETMASK, &prev_mask, NULL); > return -1; > } > > So this would return true: > > --- a/gdb/gdbserver/linux-low.c > +++ b/gdb/gdbserver/linux-low.c > @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l > pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL); > } while (pid > 0 && WIFSTOPPED (wstat)); > > - return 0; > + /* Let find_inferior return because maybe other lwp in the list will > + be deleted by delete_lwp. */ > + return 1; > } > > So the new loop would never break. > > Here's a different patch that passes the test cleanly for me. > Let me know if it works for you. > > 8<------------------- > From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001 > From: Pedro Alves > Date: Thu, 10 Jul 2014 15:13:26 +0100 > Subject: [PATCH] GDBserver crashes when killing a multi-thread process > > Here's an example, with the new test: > > gdbserver :9999 gdb.threads/kill > gdb gdb.threads/kill > (gdb) b 52 > Breakpoint 1 at 0x4007f4: file kill.c, line 52. > Continuing. > > Breakpoint 1, main () at kill.c:52 > 52 return 0; /* set break here */ > (gdb) k > Kill the program being debugged? (y or n) y > > gdbserver :9999 gdb.threads/kill > Process gdb.base/watch_thread_num created; pid = 9719 > Listening on port 1234 > Remote debugging from host 127.0.0.1 > Killing all inferiors > Segmentation fault (core dumped) > > Backtrace: > > (gdb) bt > #0 0x00000000004068a0 in find_inferior (list=0x66b060 , func=0x427637 , arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199 > #1 0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966 > #2 0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163 > #3 0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934 > #4 0x0000000000406522 in for_each_inferior (list=0x66b050 , action=0x4107a6 ) at src/gdb/gdbserver/inferiors.c:57 > #5 0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767 > #6 0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880 > #7 0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434 > #8 0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189 > #9 0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552 > #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283 > > The problem is that linux_wait_for_event deletes lwps that have exited > (even those not passed in as lwps of interest), while the lwp/thread > list is being walked on with find_inferior. find_inferior can handle > the current iterated inferior being deleted, but not others. > > When killing lwps, we don't really care about any of the pending > status handling of linux_wait_for_event. We can just waitpid the lwps > directly, which is also what GDB does (see > linux-nat.c:kill_wait_callback). This way the lwps are not deleted > while we're walking the list. They'll be deleted by linux_mourn > afterwards. > > This crash triggers several times when running the testsuite against > GDBserver with the native-gdbserver board (target remote), but as GDB > can't distinguish between GDBserver crashing and "kill" being > sucessful, as in both cases the connection is closed (the 'k' packet > doesn't require a reply), and the inferior is gone, that results in no > FAIL. > > The patch adds a generic test that catches the issue with > extended-remote mode (and works fine with native testing too). Here's > how it fails with the native-extended-gdbserver board without the fix: > > (gdb) info threads > Id Target Id Frame > 6 Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > 5 Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > 4 Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > 3 Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > 2 Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > * 1 Thread 15367.15367 main () at .../gdb.threads/kill.c:52 > (gdb) kill > Kill the program being debugged? (y or n) y > Remote connection closed > ^^^^^^^^^^^^^^^^^^^^^^^^ > (gdb) FAIL: gdb.threads/kill.exp: kill > > Extended remote should remain connected after the kill. > > gdb/gdbserver/ > 2014-07-10 Pedro Alves > > * linux-low.c (kill_wait_lwp): New function, based on > kill_one_lwp_callback, but use my_waitpid directly. > (kill_one_lwp_callback, linux_kill): Use it. > > gdb/testsuite/ > 2014-07-10 Pedro Alves > > * gdb.threads/kill.c: New file. > * gdb.threads/kill.exp: New file. > --- > gdb/gdbserver/linux-low.c | 68 ++++++++++++++++++++------------- > gdb/testsuite/gdb.threads/kill.c | 64 +++++++++++++++++++++++++++++++ > gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 183 insertions(+), 26 deletions(-) > create mode 100644 gdb/testsuite/gdb.threads/kill.c > create mode 100644 gdb/testsuite/gdb.threads/kill.exp > > diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c > index 61552f4..bdc9aaa 100644 > --- a/gdb/gdbserver/linux-low.c > +++ b/gdb/gdbserver/linux-low.c > @@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp) > errno ? strerror (errno) : "OK"); > } > > +/* Kill LWP and wait for it to die. */ > + > +static void > +kill_wait_lwp (struct lwp_info *lwp) > +{ > + struct thread_info *thr = get_lwp_thread (lwp); > + int pid = ptid_get_pid (ptid_of (thr)); > + int lwpid = ptid_get_lwp (ptid_of (thr)); > + int wstat; > + int res; > + > + if (debug_threads) > + debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid); > + > + do > + { > + linux_kill_one_lwp (lwp); > + > + /* Make sure it died. Notes: > + > + - The loop is most likely unnecessary. > + > + - We don't use linux_wait_for_event as that could delete lwps > + while we're iterating over them. We're not interested in > + any pending status at this point, only in making sure all > + wait status on the kernel side are collected until the > + process is reaped. > + > + - We don't use __WALL here as the __WALL emulation relies on > + SIGCHLD, and killing a stopped process doesn't generate > + one, nor an exit status. > + */ > + res = my_waitpid (lwpid, &wstat, 0); > + if (res == -1 && errno == ECHILD) > + res = my_waitpid (lwpid, &wstat, __WCLONE); > + } while (res > 0 && WIFSTOPPED (wstat)); > + > + gdb_assert (res > 0); > +} > + > /* Callback for `find_inferior'. Kills an lwp of a given process, > except the leader. */ > > @@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args) > { > struct thread_info *thread = (struct thread_info *) entry; > struct lwp_info *lwp = get_thread_lwp (thread); > - int wstat; > int pid = * (int *) args; > > if (ptid_get_pid (entry->id) != pid) > @@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args) > return 0; > } > > - do > - { > - linux_kill_one_lwp (lwp); > - > - /* Make sure it died. The loop is most likely unnecessary. */ > - pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL); > - } while (pid > 0 && WIFSTOPPED (wstat)); > - > + kill_wait_lwp (lwp); > return 0; > } > > @@ -952,8 +984,6 @@ linux_kill (int pid) > { > struct process_info *process; > struct lwp_info *lwp; > - int wstat; > - int lwpid; > > process = find_process_pid (pid); > if (process == NULL) > @@ -976,21 +1006,7 @@ linux_kill (int pid) > pid); > } > else > - { > - struct thread_info *thr = get_lwp_thread (lwp); > - > - if (debug_threads) > - debug_printf ("lk_1: killing lwp %ld, for pid: %d\n", > - lwpid_of (thr), pid); > - > - do > - { > - linux_kill_one_lwp (lwp); > - > - /* Make sure it died. The loop is most likely unnecessary. */ > - lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL); > - } while (lwpid > 0 && WIFSTOPPED (wstat)); > - } > + kill_wait_lwp (lwp); > > the_target->mourn (process); > > diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c > new file mode 100644 > index 0000000..aefbb06 > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/kill.c > @@ -0,0 +1,64 @@ > +/* This testcase is part of GDB, the GNU debugger. > + > + Copyright 2014 Free Software Foundation, Inc. > + > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see . */ > + > +#ifdef USE_THREADS > + > +#include > +#include > + > +#define NUM 5 > + > +pthread_barrier_t barrier; > + > +void * > +thread_function (void *arg) > +{ > + volatile unsigned int counter = 1; > + > + pthread_barrier_wait (&barrier); > + > + while (counter > 0) > + { > + counter++; > + usleep (1); > + } > + > + pthread_exit (NULL); > +} > + > +#endif /* USE_THREADS */ > + > +void > +setup (void) > +{ > +#ifdef USE_THREADS > + pthread_t threads[NUM]; > + int i; > + > + pthread_barrier_init (&barrier, NULL, NUM + 1); > + for (i = 0; i < NUM; i++) > + pthread_create (&threads[i], NULL, thread_function, NULL); > + pthread_barrier_wait (&barrier); > +#endif /* USE_THREADS */ > +} > + > +int > +main (void) > +{ > + setup (); > + return 0; /* set break here */ > +} > diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp > new file mode 100644 > index 0000000..a3f921f > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/kill.exp > @@ -0,0 +1,77 @@ > +# This testcase is part of GDB, the GNU debugger. > + > +# Copyright 2014 Free Software Foundation, Inc. > + > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . */ > + > +standard_testfile > + > +# Run the test proper. THREADED indicates whether to build a threaded > +# program and spawn several threads before trying to kill the program. > + > +proc test {threaded} { > + global testfile srcfile > + > + with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] { > + > + set options {debug} > + if {$threaded} { > + lappend options "pthreads" > + lappend options "additional_flags=-DUSE_THREADS" > + set prog ${testfile}_threads > + } else { > + set prog ${testfile}_nothreads > + } > + > + if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} { > + return -1 > + } > + > + if { ![runto main] } then { > + fail "run to main" > + return > + } > + > + set linenum [gdb_get_line_number "set break here"] > + gdb_breakpoint "$srcfile:$linenum" > + gdb_continue_to_breakpoint "break here" ".*break here.*" > + > + if {$threaded} { > + gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started" > + } > + > + # This kills and ensures no output other than the prompt comes out, > + # like: > + # > + # (gdb) kill > + # Kill the program being debugged? (y or n) y > + # (gdb) > + # > + # If we instead saw more output, like e.g., with an extended-remote > + # connection: > + # > + # (gdb) kill > + # Kill the program being debugged? (y or n) y > + # Remote connection closed > + # (gdb) > + # > + # the above would mean that the remote end crashed. > + > + gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y" > + } > +} > + > +foreach threaded {true false} { > + test $threaded > +} >