From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-117490-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 26676 invoked by alias); 13 Nov 2014 19:18:52 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 26665 invoked by uid 89); 13 Nov 2014 19:18:52 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 13 Nov 2014 19:18:50 +0000
Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sADJIl0Z005717	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL);	Thu, 13 Nov 2014 14:18:47 -0500
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11])	by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sADJIkox015539;	Thu, 13 Nov 2014 14:18:46 -0500
Message-ID: <54650415.9020705@redhat.com>
Date: Thu, 13 Nov 2014 19:18:00 -0000
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1
MIME-Version: 1.0
To: Don Breazeal <donb@codesourcery.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH 11/16 v3] Extended-remote Linux exit events
References: <1408580964-27916-1-git-send-email-donb@codesourcery.com> <1414798134-11536-9-git-send-email-donb@codesourcery.com>
In-Reply-To: <1414798134-11536-9-git-send-email-donb@codesourcery.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
X-SW-Source: 2014-11/txt/msg00282.txt.bz2

On 10/31/2014 11:28 PM, Don Breazeal wrote:

> 
> Now for the long answer: in my testing I ran into a race condition in
> check_zombie_leaders, which detects when a thread group leader has exited
> and other threads still exist.  On the Linux kernel, ptrace/waitpid don't
> allow reaping the leader thread until all other threads in the group are
> reaped.  When the leader exits, it goes zombie, but waitpid will not return
> an exit status until the other threads are gone.  When a non-leader thread
> calls exec, all other non-leader threads are destroyed, the leader becomes
> a zombie, and once the "other" threads have been reaped, the execing thread
> takes over the leader's pid (tgid) and appears to vanish.  In order to
> handle this situation in the current implementation, check_zombie_leaders
> polls the process state in /proc and deletes thread group leaders that are
> in a zombie state.  The replacement is added to the lwp list when the exec
> event is reported.
> 
> See https://sourceware.org/ml/gdb-patches/2011-10/msg00704.html for a more
> detailed explanation of how this works.
> 
> Here is the relevant part of check_zombie_leaders:
> 
> if (leader_lp != NULL
>           /* Check if there are other threads in the group, as we may
>              have raced with the inferior simply exiting.  */
>           && !last_thread_of_process_p (leader_pid)
>           && linux_proc_pid_is_zombie (leader_pid))
>         {
>           /* ...large informative comment block... */
>           delete_lwp (leader_lp);
> 
> The race occurred when there were two threads in the program, and the
> non-leader thread called exec.  In this case the leader thread passed
> through a very brief zombie state before being replaced by the exec'ing
> thread as the thread group leader.  This state transition was asynchronous,
> with no dependency on anything gdbserver did.  Because there were no other
> threads, there were no thread exit events, and thus there was no
> synchronization with the leader passing through the zombie state and the
> exec completing.  If there had been more threads, the leader would remain
> in the zombie state until they were waited for.  In the two-thread case,
> sometimes the leader exit was detected and sometimes it wasn't. (Recall
> that check_zombie_leaders is polling the state, via
> linux_proc_pid_is_zombie.  The race is between the leader thread passing
> through the zombie state and check_zombie_leaders testing for zombie
> state.)  If leader exit wasn't detected, gdbserver would end up with a
> dangling lwp entry that didn't correspond to any real lwp, and would hang
> waiting for that lwp to stop.  Using PTRACE_EVENT_EXIT guarantees that the
> leader exit will be detected.
> 
> Note that check_zombie_leaders works just fine for the scenarios where the
> leader thread exits and the other threads continue to run, with no exec
> calls.  It is required for systems that don't support the extended ptrace
> events.
> 
> The sequence of events resulting in the race condition was this:
> 
>     1) In the program, a CLONE event for a new thread occurs.
> 
>     2) In the program, both threads are resumed once gdbserver has
>        completed the new thread processing.
> 
>     3) In gdbserver, the function linux_wait_for_event_filtered loops until
>        waitpid returns "no more events" for the SIGCHLD generated by the
>        CLONE event.  Then linux_wait_for_event_filtered calls
>        check_zombie_leaders.
> 
>     4) In the program, the new thread is doing the exec.  During the exec
>        the leader thread will pass through a transitory zombie state.  If
>        there were more than two threads, the leader thread would remain a
>        zombie until all the non-leader, non-exec'ing threads were reaped by
>        gdbserver.  Since there are no such threads to reap, the leader just
>        becomes a zombie and is replaced by the exec'ing thread on-the-fly.
>        (Note that it appears that the leader thread is a zombie just for a
>        very brief instant.)
> 
>     5) In gdbserver, check_zombie_leaders checks whether an lwp entry
>        corresponds to a zombie leader thread, and if so, deletes it.  Here
>        is the race: in (4) above, the leader may or may not be in the
>        transitory zombie state.  In the case where a zombie isn't detected,
>        delete_lwp is not called.
> 
>     6) In gdbserver, an EXEC event is detected and processed.  When it gets
>        ready to report the event to GDB, it calls stop_all_lwps, which sends
>        a SIGSTOP to each lwp in the list and the waits until all the lwps in
>        the list have reported a stop event.  If the zombie leader wasn't
>        detected and processed in step (5), gdbserver blocks forever in
>        linux_wait_for_event_filtered, waiting for the undeleted lwp to be
>        stopped, which will never happen.

This is easy to see with thread-execl.exp, if we revert the PTRACE_EVENT_EXIT
parts:

linux_low_filter_event: pc is 0x3b37209237
pc is 0x3b37209237
stop pc is 0x3b37209237
HEW: Got exec event from LWP 2721
Hit a non-gdbserver trap event.
>>>> entering stop_all_lwps
stop_all_lwps (stop, except=none)
Sending sigstop to lwp 23192
wait_for_sigstop: pulling events
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x80000001): status(0), -1
LWFE: waitpid(-1, ...) returned -1, No child processes
leader_pid=2721, leader_lp!=NULL=1, num_lwps=2, zombie=0
sigsuspend'ing
*** hang ***

But I'm having trouble convincing myself that we do need
the PTRACE_EVENT_EXIT handling.

This seems to work for me just as well.  The fix is in
making check_zombie_leaders ignore stopped threads.

  @@ -1629,7 +1597,7 @@ check_zombie_leaders (void)
 		      leader_pid, leader_lp!= NULL, num_lwps (leader_pid),
 		      linux_proc_pid_is_zombie (leader_pid));

  -      if (leader_lp != NULL
  +      if (leader_lp != NULL && !leader_lp->stopped

But I'm in a hurry to leave now.  Maybe I'm missing something else.

>From ab07ee8afc3c2358cd6b23474655fde087e2b36e Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 13 Nov 2014 17:53:43 +0000
Subject: [PATCH] revert PTRACE_EVENT_EXIT bits

---
 gdb/gdbserver/linux-low.c | 56 +++--------------------------------------------
 1 file changed, 3 insertions(+), 53 deletions(-)

diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index f308333..316b302 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -538,38 +538,6 @@ handle_extended_wait (struct lwp_info *event_child, int *wstatp)
       event_child->waitstatus.kind = TARGET_WAITKIND_VFORK_DONE;
       return 0;
     }
-  else if (event == PTRACE_EVENT_EXIT)
-    {
-      unsigned long exit_status;
-      unsigned long lwpid = lwpid_of (event_thr);
-      int ret;
-
-      if (debug_threads)
-	debug_printf ("HEW: Got exit event from LWP %ld\n", lwpid );
-
-      ptrace (PTRACE_GETEVENTMSG, lwpid, (PTRACE_TYPE_ARG3) 0, &exit_status);
-
-      if (num_lwps (pid_of (event_thr)) > 1)
-	{
-	  /* If there is at least one more LWP, then the program still
-	     exists and the exit should not be reported to GDB.  */
-	  delete_lwp (event_child);
-	  ret = 1;
-	}
-      else
-	{
-	  /* Set the exit status to the actual exit status, so normal
-	     WIFEXITED/WIFSIGNALLED processing and reporting for the
-	     last lwp in the process can proceed from here.  */
-	  *wstatp = exit_status;
-	  ret = 0;
-	}
-
-      /* Resume the thread so that it actually exits.  Subsequent exit
-	 events for LWPs that were deleted above will be ignored.  */
-      ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0);
-      return ret;
-    }
   else if (event == PTRACE_EVENT_EXEC)
     {
       if (debug_threads)
@@ -1629,7 +1597,7 @@ check_zombie_leaders (void)
 		      leader_pid, leader_lp!= NULL, num_lwps (leader_pid),
 		      linux_proc_pid_is_zombie (leader_pid));

-      if (leader_lp != NULL
+      if (leader_lp != NULL && !leader_lp->stopped
 	  /* Check if there are other threads in the group, as we may
 	     have raced with the inferior simply exiting.  */
 	  && !last_thread_of_process_p (leader_pid)
@@ -2107,17 +2075,6 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int *wstatp)
       child = add_lwp (child_ptid);
       child->stopped = 1;
       current_thread = child->thread;
-
-      if (non_stop && stopping_threads == NOT_STOPPING_THREADS)
-	{
-	  /* Make sure we delete the lwp entry for the exec'ing thread,
-	     which will have vanished.  We do this by sending a signal
-	     to all the other threads in the lwp list, deleting any
-	     that are not found.  Note that in all-stop mode this will
-	     happen before reporting the event.  */
-	  stop_all_lwps (0, child);
-	  unstop_all_lwps (0, child);
-	}
     }

   /* If we didn't find a process, one of two things presumably happened:
@@ -2166,8 +2123,7 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int *wstatp)
      architecture being defined already (so that CHILD has a valid
      regcache), and on LAST_STATUS being set (to check for SIGTRAP or
      not).  */
-  if (WIFSTOPPED (wstat)
-      && (linux_ptrace_get_extended_event (wstat) != PTRACE_EVENT_EXIT))
+  if (WIFSTOPPED (wstat))
     {
       if (debug_threads
 	  && the_low_target.get_pc != NULL)
@@ -3486,8 +3442,6 @@ send_sigstop (struct lwp_info *lwp)
 	 calling exec.  See comments in linux_low_filter_event regarding
 	 PTRACE_EVENT_EXEC.  */
       delete_lwp (lwp);
-      set_desired_thread (0);
-
       if (debug_threads)
 	debug_printf ("send_sigstop: lwp %d has vanished\n", pid);
     }
@@ -5565,10 +5519,7 @@ linux_supports_vfork_events (void)
 static int
 linux_supports_exec_events (void)
 {
-  /* Check for PTRACE_O_TRACEEXIT, since our implementation of follow
-     exec depends on this option, which was implemented in a later
-     kernel version than PTRACE_O_TRACEFORK et al.  */
-  return linux_supports_traceexit ();
+  return linux_supports_tracefork ();
 }

 static int
@@ -6588,7 +6539,6 @@ initialize_low (void)
   linux_ptrace_set_requested_options (PTRACE_O_TRACEFORK
 				      | PTRACE_O_TRACEVFORK
 				      | PTRACE_O_TRACEVFORKDONE
-				      | PTRACE_O_TRACEEXIT
 				      | PTRACE_O_TRACEEXEC);
   linux_ptrace_check_options ();
 }
-- 
1.9.3