From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 26676 invoked by alias); 13 Nov 2014 19:18:52 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 26665 invoked by uid 89); 13 Nov 2014 19:18:52 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Thu, 13 Nov 2014 19:18:50 +0000 Received: from int-mx13.intmail.prod.int.phx2.redhat.com (int-mx13.intmail.prod.int.phx2.redhat.com [10.5.11.26]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id sADJIl0Z005717 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=FAIL); Thu, 13 Nov 2014 14:18:47 -0500 Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx13.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id sADJIkox015539; Thu, 13 Nov 2014 14:18:46 -0500 Message-ID: <54650415.9020705@redhat.com> Date: Thu, 13 Nov 2014 19:18:00 -0000 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.1 MIME-Version: 1.0 To: Don Breazeal , gdb-patches@sourceware.org Subject: Re: [PATCH 11/16 v3] Extended-remote Linux exit events References: <1408580964-27916-1-git-send-email-donb@codesourcery.com> <1414798134-11536-9-git-send-email-donb@codesourcery.com> In-Reply-To: <1414798134-11536-9-git-send-email-donb@codesourcery.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-SW-Source: 2014-11/txt/msg00282.txt.bz2 On 10/31/2014 11:28 PM, Don Breazeal wrote: > > Now for the long answer: in my testing I ran into a race condition in > check_zombie_leaders, which detects when a thread group leader has exited > and other threads still exist. On the Linux kernel, ptrace/waitpid don't > allow reaping the leader thread until all other threads in the group are > reaped. When the leader exits, it goes zombie, but waitpid will not return > an exit status until the other threads are gone. When a non-leader thread > calls exec, all other non-leader threads are destroyed, the leader becomes > a zombie, and once the "other" threads have been reaped, the execing thread > takes over the leader's pid (tgid) and appears to vanish. In order to > handle this situation in the current implementation, check_zombie_leaders > polls the process state in /proc and deletes thread group leaders that are > in a zombie state. The replacement is added to the lwp list when the exec > event is reported. > > See https://sourceware.org/ml/gdb-patches/2011-10/msg00704.html for a more > detailed explanation of how this works. > > Here is the relevant part of check_zombie_leaders: > > if (leader_lp != NULL > /* Check if there are other threads in the group, as we may > have raced with the inferior simply exiting. */ > && !last_thread_of_process_p (leader_pid) > && linux_proc_pid_is_zombie (leader_pid)) > { > /* ...large informative comment block... */ > delete_lwp (leader_lp); > > The race occurred when there were two threads in the program, and the > non-leader thread called exec. In this case the leader thread passed > through a very brief zombie state before being replaced by the exec'ing > thread as the thread group leader. This state transition was asynchronous, > with no dependency on anything gdbserver did. Because there were no other > threads, there were no thread exit events, and thus there was no > synchronization with the leader passing through the zombie state and the > exec completing. If there had been more threads, the leader would remain > in the zombie state until they were waited for. In the two-thread case, > sometimes the leader exit was detected and sometimes it wasn't. (Recall > that check_zombie_leaders is polling the state, via > linux_proc_pid_is_zombie. The race is between the leader thread passing > through the zombie state and check_zombie_leaders testing for zombie > state.) If leader exit wasn't detected, gdbserver would end up with a > dangling lwp entry that didn't correspond to any real lwp, and would hang > waiting for that lwp to stop. Using PTRACE_EVENT_EXIT guarantees that the > leader exit will be detected. > > Note that check_zombie_leaders works just fine for the scenarios where the > leader thread exits and the other threads continue to run, with no exec > calls. It is required for systems that don't support the extended ptrace > events. > > The sequence of events resulting in the race condition was this: > > 1) In the program, a CLONE event for a new thread occurs. > > 2) In the program, both threads are resumed once gdbserver has > completed the new thread processing. > > 3) In gdbserver, the function linux_wait_for_event_filtered loops until > waitpid returns "no more events" for the SIGCHLD generated by the > CLONE event. Then linux_wait_for_event_filtered calls > check_zombie_leaders. > > 4) In the program, the new thread is doing the exec. During the exec > the leader thread will pass through a transitory zombie state. If > there were more than two threads, the leader thread would remain a > zombie until all the non-leader, non-exec'ing threads were reaped by > gdbserver. Since there are no such threads to reap, the leader just > becomes a zombie and is replaced by the exec'ing thread on-the-fly. > (Note that it appears that the leader thread is a zombie just for a > very brief instant.) > > 5) In gdbserver, check_zombie_leaders checks whether an lwp entry > corresponds to a zombie leader thread, and if so, deletes it. Here > is the race: in (4) above, the leader may or may not be in the > transitory zombie state. In the case where a zombie isn't detected, > delete_lwp is not called. > > 6) In gdbserver, an EXEC event is detected and processed. When it gets > ready to report the event to GDB, it calls stop_all_lwps, which sends > a SIGSTOP to each lwp in the list and the waits until all the lwps in > the list have reported a stop event. If the zombie leader wasn't > detected and processed in step (5), gdbserver blocks forever in > linux_wait_for_event_filtered, waiting for the undeleted lwp to be > stopped, which will never happen. This is easy to see with thread-execl.exp, if we revert the PTRACE_EVENT_EXIT parts: linux_low_filter_event: pc is 0x3b37209237 pc is 0x3b37209237 stop pc is 0x3b37209237 HEW: Got exec event from LWP 2721 Hit a non-gdbserver trap event. >>>> entering stop_all_lwps stop_all_lwps (stop, except=none) Sending sigstop to lwp 23192 wait_for_sigstop: pulling events my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x80000001): status(0), -1 LWFE: waitpid(-1, ...) returned -1, No child processes leader_pid=2721, leader_lp!=NULL=1, num_lwps=2, zombie=0 sigsuspend'ing *** hang *** But I'm having trouble convincing myself that we do need the PTRACE_EVENT_EXIT handling. This seems to work for me just as well. The fix is in making check_zombie_leaders ignore stopped threads. @@ -1629,7 +1597,7 @@ check_zombie_leaders (void) leader_pid, leader_lp!= NULL, num_lwps (leader_pid), linux_proc_pid_is_zombie (leader_pid)); - if (leader_lp != NULL + if (leader_lp != NULL && !leader_lp->stopped But I'm in a hurry to leave now. Maybe I'm missing something else. >From ab07ee8afc3c2358cd6b23474655fde087e2b36e Mon Sep 17 00:00:00 2001 From: Pedro Alves Date: Thu, 13 Nov 2014 17:53:43 +0000 Subject: [PATCH] revert PTRACE_EVENT_EXIT bits --- gdb/gdbserver/linux-low.c | 56 +++-------------------------------------------- 1 file changed, 3 insertions(+), 53 deletions(-) diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c index f308333..316b302 100644 --- a/gdb/gdbserver/linux-low.c +++ b/gdb/gdbserver/linux-low.c @@ -538,38 +538,6 @@ handle_extended_wait (struct lwp_info *event_child, int *wstatp) event_child->waitstatus.kind = TARGET_WAITKIND_VFORK_DONE; return 0; } - else if (event == PTRACE_EVENT_EXIT) - { - unsigned long exit_status; - unsigned long lwpid = lwpid_of (event_thr); - int ret; - - if (debug_threads) - debug_printf ("HEW: Got exit event from LWP %ld\n", lwpid ); - - ptrace (PTRACE_GETEVENTMSG, lwpid, (PTRACE_TYPE_ARG3) 0, &exit_status); - - if (num_lwps (pid_of (event_thr)) > 1) - { - /* If there is at least one more LWP, then the program still - exists and the exit should not be reported to GDB. */ - delete_lwp (event_child); - ret = 1; - } - else - { - /* Set the exit status to the actual exit status, so normal - WIFEXITED/WIFSIGNALLED processing and reporting for the - last lwp in the process can proceed from here. */ - *wstatp = exit_status; - ret = 0; - } - - /* Resume the thread so that it actually exits. Subsequent exit - events for LWPs that were deleted above will be ignored. */ - ptrace (PTRACE_CONT, lwpid, (PTRACE_TYPE_ARG3) 0, (PTRACE_TYPE_ARG4) 0); - return ret; - } else if (event == PTRACE_EVENT_EXEC) { if (debug_threads) @@ -1629,7 +1597,7 @@ check_zombie_leaders (void) leader_pid, leader_lp!= NULL, num_lwps (leader_pid), linux_proc_pid_is_zombie (leader_pid)); - if (leader_lp != NULL + if (leader_lp != NULL && !leader_lp->stopped /* Check if there are other threads in the group, as we may have raced with the inferior simply exiting. */ && !last_thread_of_process_p (leader_pid) @@ -2107,17 +2075,6 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int *wstatp) child = add_lwp (child_ptid); child->stopped = 1; current_thread = child->thread; - - if (non_stop && stopping_threads == NOT_STOPPING_THREADS) - { - /* Make sure we delete the lwp entry for the exec'ing thread, - which will have vanished. We do this by sending a signal - to all the other threads in the lwp list, deleting any - that are not found. Note that in all-stop mode this will - happen before reporting the event. */ - stop_all_lwps (0, child); - unstop_all_lwps (0, child); - } } /* If we didn't find a process, one of two things presumably happened: @@ -2166,8 +2123,7 @@ linux_low_filter_event (ptid_t filter_ptid, int lwpid, int *wstatp) architecture being defined already (so that CHILD has a valid regcache), and on LAST_STATUS being set (to check for SIGTRAP or not). */ - if (WIFSTOPPED (wstat) - && (linux_ptrace_get_extended_event (wstat) != PTRACE_EVENT_EXIT)) + if (WIFSTOPPED (wstat)) { if (debug_threads && the_low_target.get_pc != NULL) @@ -3486,8 +3442,6 @@ send_sigstop (struct lwp_info *lwp) calling exec. See comments in linux_low_filter_event regarding PTRACE_EVENT_EXEC. */ delete_lwp (lwp); - set_desired_thread (0); - if (debug_threads) debug_printf ("send_sigstop: lwp %d has vanished\n", pid); } @@ -5565,10 +5519,7 @@ linux_supports_vfork_events (void) static int linux_supports_exec_events (void) { - /* Check for PTRACE_O_TRACEEXIT, since our implementation of follow - exec depends on this option, which was implemented in a later - kernel version than PTRACE_O_TRACEFORK et al. */ - return linux_supports_traceexit (); + return linux_supports_tracefork (); } static int @@ -6588,7 +6539,6 @@ initialize_low (void) linux_ptrace_set_requested_options (PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEVFORKDONE - | PTRACE_O_TRACEEXIT | PTRACE_O_TRACEEXEC); linux_ptrace_check_options (); } -- 1.9.3