Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Hui Zhu <hui_zhu@mentor.com>
To: Pedro Alves <palves@redhat.com>,
	gdb-patches ml	<gdb-patches@sourceware.org>
Subject: Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
Date: Fri, 11 Jul 2014 08:21:00 -0000	[thread overview]
Message-ID: <53BF935B.3000303@mentor.com> (raw)
In-Reply-To: <53BEAE5E.7030209@redhat.com>

Hi Pedro,

Thanks for your patch.  It fixed the issue in my part.

Best,
Hui

On 07/10/14 23:16, Pedro Alves wrote:
> Hi Hui,
>
> On 07/02/2014 10:07 AM, Pedro Alves wrote:
>>> This issue is introduced by the patches for PR 12702 that was pushed by you.
>>> Maybe you could take a look on this patch.
>>
>> Thanks.  I'll need to think a bit about whether this is the right
>> solution.
>>
>> I've added it to the 7.8 regression queue at:
>>
>>   https://sourceware.org/gdb/wiki/GDB_7.8_Release
>>
>> I'll look better at it as soon as I have a chance.
>>
>
> First, it's really annoying that the testsuite doesn't catch this!
> We're really missing such a basic test, so I went ahead and wrote one.
> I was actually quite surprised that we didn't have any
> test that makes sure "kill" works without complaints/spurious output.
>
> However, unfortunately, the new test revealed that the fix isn't right.  :-/
>
> Sometimes the test passed, but other times (that'll depend on
> machine, of course), I'd get:
>
>   kill
>   Kill the program being debugged? (y or n) y
>   Ignoring packet error, continuing...
>   (gdb) FAIL: gdb.threads/kill.exp: kill
>
> That means GDBserver didn't reply to the vKill packet.
>
> Attaching to GDBserver at this point shows that it's stuck forever
> in the new loop:
>
>
>   +  /* Keep call kill_one_lwp_callback until find_inferior cannot find any
>   +     lwps that is for pid.  */
>   +  while (find_inferior (&all_threads, kill_one_lwp_callback , &pid) != NULL);
>
>
> The reason was that the leader lwp goes zombie before the other
> lwps are reaped, and linux_wait_for_event would delete it.  We'd be
> left with e.g., only one lwp that is _not_ the leader.
>
> That lwp had disappeared already (due to SIGKILL), but there's
> no status to collect (waitpid would return -1).  Nothing in linux-low.c
> would notice the lwp is already gone.  Because that lwp has lwp->stopped == 0,
> linux_wait_for_event would return, here:
>
>        if ((find_inferior (&all_threads,
> 			  not_stopped_callback,
> 			  &wait_ptid) == NULL))
> 	{
> 	  if (debug_threads)
> 	    debug_printf ("LLW: exit (no unwaited-for LWP)\n");
> 	  sigprocmask (SIG_SETMASK, &prev_mask, NULL);
> 	  return -1;
> 	}
>
> So this would return true:
>
>   --- a/gdb/gdbserver/linux-low.c
>   +++ b/gdb/gdbserver/linux-low.c
>   @@ -944,7 +944,9 @@ kill_one_lwp_callback (struct inferior_l
>         pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
>       } while (pid > 0 && WIFSTOPPED (wstat));
>
>   -  return 0;
>   +  /* Let find_inferior return because maybe other lwp in the list will
>   +     be deleted by delete_lwp.  */
>   +  return 1;
>    }
>
> So the new loop would never break.
>
> Here's a different patch that passes the test cleanly for me.
> Let me know if it works for you.
>
> 8<-------------------
>  From ed3e42e5447e0276d6fc759a27408d06e9ccf0b7 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Thu, 10 Jul 2014 15:13:26 +0100
> Subject: [PATCH] GDBserver crashes when killing a multi-thread process
>
> Here's an example, with the new test:
>
>   gdbserver :9999 gdb.threads/kill
>   gdb gdb.threads/kill
>   (gdb) b 52
>   Breakpoint 1 at 0x4007f4: file kill.c, line 52.
>   Continuing.
>
>   Breakpoint 1, main () at kill.c:52
>   52        return 0; /* set break here */
>   (gdb) k
>   Kill the program being debugged? (y or n) y
>
>   gdbserver :9999 gdb.threads/kill
>   Process gdb.base/watch_thread_num created; pid = 9719
>   Listening on port 1234
>   Remote debugging from host 127.0.0.1
>   Killing all inferiors
>   Segmentation fault (core dumped)
>
> Backtrace:
>
>   (gdb) bt
>   #0  0x00000000004068a0 in find_inferior (list=0x66b060 <all_threads>, func=0x427637 <kill_one_lwp_callback>, arg=0x7fffffffd3fc) at src/gdb/gdbserver/inferiors.c:199
>   #1  0x00000000004277b6 in linux_kill (pid=15708) at src/gdb/gdbserver/linux-low.c:966
>   #2  0x000000000041354d in kill_inferior (pid=15708) at src/gdb/gdbserver/target.c:163
>   #3  0x00000000004107e9 in kill_inferior_callback (entry=0x6704f0) at src/gdb/gdbserver/server.c:2934
>   #4  0x0000000000406522 in for_each_inferior (list=0x66b050 <all_processes>, action=0x4107a6 <kill_inferior_callback>) at src/gdb/gdbserver/inferiors.c:57
>   #5  0x0000000000412377 in process_serial_event () at src/gdb/gdbserver/server.c:3767
>   #6  0x000000000041267c in handle_serial_event (err=0, client_data=0x0) at src/gdb/gdbserver/server.c:3880
>   #7  0x00000000004189ff in handle_file_event (event_file_desc=4) at src/gdb/gdbserver/event-loop.c:434
>   #8  0x00000000004181c6 in process_event () at src/gdb/gdbserver/event-loop.c:189
>   #9  0x0000000000418f45 in start_event_loop () at src/gdb/gdbserver/event-loop.c:552
>   #10 0x0000000000411272 in main (argc=3, argv=0x7fffffffd8d8) at src/gdb/gdbserver/server.c:3283
>
> The problem is that linux_wait_for_event deletes lwps that have exited
> (even those not passed in as lwps of interest), while the lwp/thread
> list is being walked on with find_inferior.  find_inferior can handle
> the current iterated inferior being deleted, but not others.
>
> When killing lwps, we don't really care about any of the pending
> status handling of linux_wait_for_event.  We can just waitpid the lwps
> directly, which is also what GDB does (see
> linux-nat.c:kill_wait_callback).  This way the lwps are not deleted
> while we're walking the list.  They'll be deleted by linux_mourn
> afterwards.
>
> This crash triggers several times when running the testsuite against
> GDBserver with the native-gdbserver board (target remote), but as GDB
> can't distinguish between GDBserver crashing and "kill" being
> sucessful, as in both cases the connection is closed (the 'k' packet
> doesn't require a reply), and the inferior is gone, that results in no
> FAIL.
>
> The patch adds a generic test that catches the issue with
> extended-remote mode (and works fine with native testing too).  Here's
> how it fails with the native-extended-gdbserver board without the fix:
>
>   (gdb) info threads
>     Id   Target Id         Frame
>     6    Thread 15367.15374 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     5    Thread 15367.15373 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     4    Thread 15367.15372 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     3    Thread 15367.15371 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>     2    Thread 15367.15370 0x000000373bcbc98d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
>   * 1    Thread 15367.15367 main () at .../gdb.threads/kill.c:52
>   (gdb) kill
>   Kill the program being debugged? (y or n) y
>   Remote connection closed
>   ^^^^^^^^^^^^^^^^^^^^^^^^
>   (gdb) FAIL: gdb.threads/kill.exp: kill
>
> Extended remote should remain connected after the kill.
>
> gdb/gdbserver/
> 2014-07-10  Pedro Alves  <palves@redhat.com>
>
> 	* linux-low.c (kill_wait_lwp): New function, based on
> 	kill_one_lwp_callback, but use my_waitpid directly.
> 	(kill_one_lwp_callback, linux_kill): Use it.
>
> gdb/testsuite/
> 2014-07-10  Pedro Alves  <palves@redhat.com>
>
> 	* gdb.threads/kill.c: New file.
> 	* gdb.threads/kill.exp: New file.
> ---
>   gdb/gdbserver/linux-low.c          | 68 ++++++++++++++++++++-------------
>   gdb/testsuite/gdb.threads/kill.c   | 64 +++++++++++++++++++++++++++++++
>   gdb/testsuite/gdb.threads/kill.exp | 77 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 183 insertions(+), 26 deletions(-)
>   create mode 100644 gdb/testsuite/gdb.threads/kill.c
>   create mode 100644 gdb/testsuite/gdb.threads/kill.exp
>
> diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
> index 61552f4..bdc9aaa 100644
> --- a/gdb/gdbserver/linux-low.c
> +++ b/gdb/gdbserver/linux-low.c
> @@ -909,6 +909,46 @@ linux_kill_one_lwp (struct lwp_info *lwp)
>   		  errno ? strerror (errno) : "OK");
>   }
>
> +/* Kill LWP and wait for it to die.  */
> +
> +static void
> +kill_wait_lwp (struct lwp_info *lwp)
> +{
> +  struct thread_info *thr = get_lwp_thread (lwp);
> +  int pid = ptid_get_pid (ptid_of (thr));
> +  int lwpid = ptid_get_lwp (ptid_of (thr));
> +  int wstat;
> +  int res;
> +
> +  if (debug_threads)
> +    debug_printf ("kwl: killing lwp %d, for pid: %d\n", lwpid, pid);
> +
> +  do
> +    {
> +      linux_kill_one_lwp (lwp);
> +
> +      /* Make sure it died.  Notes:
> +
> +	 - The loop is most likely unnecessary.
> +
> +         - We don't use linux_wait_for_event as that could delete lwps
> +           while we're iterating over them.  We're not interested in
> +           any pending status at this point, only in making sure all
> +           wait status on the kernel side are collected until the
> +           process is reaped.
> +
> +	 - We don't use __WALL here as the __WALL emulation relies on
> +	   SIGCHLD, and killing a stopped process doesn't generate
> +	   one, nor an exit status.
> +      */
> +      res = my_waitpid (lwpid, &wstat, 0);
> +      if (res == -1 && errno == ECHILD)
> +	res = my_waitpid (lwpid, &wstat, __WCLONE);
> +    } while (res > 0 && WIFSTOPPED (wstat));
> +
> +  gdb_assert (res > 0);
> +}
> +
>   /* Callback for `find_inferior'.  Kills an lwp of a given process,
>      except the leader.  */
>
> @@ -917,7 +957,6 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
>   {
>     struct thread_info *thread = (struct thread_info *) entry;
>     struct lwp_info *lwp = get_thread_lwp (thread);
> -  int wstat;
>     int pid = * (int *) args;
>
>     if (ptid_get_pid (entry->id) != pid)
> @@ -936,14 +975,7 @@ kill_one_lwp_callback (struct inferior_list_entry *entry, void *args)
>         return 0;
>       }
>
> -  do
> -    {
> -      linux_kill_one_lwp (lwp);
> -
> -      /* Make sure it died.  The loop is most likely unnecessary.  */
> -      pid = linux_wait_for_event (thread->entry.id, &wstat, __WALL);
> -    } while (pid > 0 && WIFSTOPPED (wstat));
> -
> +  kill_wait_lwp (lwp);
>     return 0;
>   }
>
> @@ -952,8 +984,6 @@ linux_kill (int pid)
>   {
>     struct process_info *process;
>     struct lwp_info *lwp;
> -  int wstat;
> -  int lwpid;
>
>     process = find_process_pid (pid);
>     if (process == NULL)
> @@ -976,21 +1006,7 @@ linux_kill (int pid)
>   		      pid);
>       }
>     else
> -    {
> -      struct thread_info *thr = get_lwp_thread (lwp);
> -
> -      if (debug_threads)
> -	debug_printf ("lk_1: killing lwp %ld, for pid: %d\n",
> -		      lwpid_of (thr), pid);
> -
> -      do
> -	{
> -	  linux_kill_one_lwp (lwp);
> -
> -	  /* Make sure it died.  The loop is most likely unnecessary.  */
> -	  lwpid = linux_wait_for_event (thr->entry.id, &wstat, __WALL);
> -	} while (lwpid > 0 && WIFSTOPPED (wstat));
> -    }
> +    kill_wait_lwp (lwp);
>
>     the_target->mourn (process);
>
> diff --git a/gdb/testsuite/gdb.threads/kill.c b/gdb/testsuite/gdb.threads/kill.c
> new file mode 100644
> index 0000000..aefbb06
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/kill.c
> @@ -0,0 +1,64 @@
> +/* This testcase is part of GDB, the GNU debugger.
> +
> +   Copyright 2014 Free Software Foundation, Inc.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#ifdef USE_THREADS
> +
> +#include <unistd.h>
> +#include <pthread.h>
> +
> +#define NUM 5
> +
> +pthread_barrier_t barrier;
> +
> +void *
> +thread_function (void *arg)
> +{
> +  volatile unsigned int counter = 1;
> +
> +  pthread_barrier_wait (&barrier);
> +
> +  while (counter > 0)
> +    {
> +      counter++;
> +      usleep (1);
> +    }
> +
> +  pthread_exit (NULL);
> +}
> +
> +#endif /* USE_THREADS */
> +
> +void
> +setup (void)
> +{
> +#ifdef USE_THREADS
> +  pthread_t threads[NUM];
> +  int i;
> +
> +  pthread_barrier_init (&barrier, NULL, NUM + 1);
> +  for (i = 0; i < NUM; i++)
> +    pthread_create (&threads[i], NULL, thread_function, NULL);
> +  pthread_barrier_wait (&barrier);
> +#endif /* USE_THREADS */
> +}
> +
> +int
> +main (void)
> +{
> +  setup ();
> +  return 0; /* set break here */
> +}
> diff --git a/gdb/testsuite/gdb.threads/kill.exp b/gdb/testsuite/gdb.threads/kill.exp
> new file mode 100644
> index 0000000..a3f921f
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/kill.exp
> @@ -0,0 +1,77 @@
> +# This testcase is part of GDB, the GNU debugger.
> +
> +# Copyright 2014 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +standard_testfile
> +
> +# Run the test proper.  THREADED indicates whether to build a threaded
> +# program and spawn several threads before trying to kill the program.
> +
> +proc test {threaded} {
> +    global testfile srcfile
> +
> +    with_test_prefix [expr ($threaded)?"threaded":"non-threaded"] {
> +
> +	set options {debug}
> +	if {$threaded} {
> +	    lappend options "pthreads"
> +	    lappend options "additional_flags=-DUSE_THREADS"
> +	    set prog ${testfile}_threads
> +	} else {
> +	    set prog ${testfile}_nothreads
> +	}
> +
> +	if {[prepare_for_testing "failed to prepare" $prog $srcfile $options] == -1} {
> +	    return -1
> +	}
> +
> +	if { ![runto main] } then {
> +	    fail "run to main"
> +	    return
> +	}
> +
> +	set linenum [gdb_get_line_number "set break here"]
> +	gdb_breakpoint "$srcfile:$linenum"
> +	gdb_continue_to_breakpoint "break here" ".*break here.*"
> +
> +	if {$threaded} {
> +	    gdb_test "info threads" "6.*5.*4.*3.*2.*1.*" "all threads started"
> +	}
> +
> +	# This kills and ensures no output other than the prompt comes out,
> +	# like:
> +	#
> +	#  (gdb) kill
> +	#  Kill the program being debugged? (y or n) y
> +	#  (gdb)
> +	#
> +	# If we instead saw more output, like e.g., with an extended-remote
> +	# connection:
> +	#
> +	#  (gdb) kill
> +	#  Kill the program being debugged? (y or n) y
> +	#  Remote connection closed
> +	#  (gdb)
> +	#
> +	# the above would mean that the remote end crashed.
> +
> +	gdb_test "kill" "^y" "kill program" "Kill the program being debugged\\? \\(y or n\\) $" "y"
> +    }
> +}
> +
> +foreach threaded {true false} {
> +    test $threaded
> +}
>


  reply	other threads:[~2014-07-11  7:34 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-06-23 10:10 [PATCH] Fix crash of gdbserver when kill threads Hui Zhu
2014-06-29  3:28 ` Hui Zhu
2014-07-02  9:07   ` Pedro Alves
2014-07-10 15:17     ` [PATCH v2] GDBserver crashes when killing a multi-thread process Pedro Alves
2014-07-11  8:21       ` Hui Zhu [this message]
2014-07-11 10:53         ` [PUSHED+7.8] " Pedro Alves
2015-07-13 16:07       ` Yao Qi
2015-07-13 17:32         ` Pedro Alves
2015-07-14  8:00           ` Yao Qi
2015-07-14  9:13             ` Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53BF935B.3000303@mentor.com \
    --to=hui_zhu@mentor.com \
    --cc=gdb-patches@sourceware.org \
    --cc=palves@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox