From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 13561 invoked by alias); 18 May 2011 17:13:36 -0000 Received: (qmail 13548 invoked by uid 22791); 18 May 2011 17:13:34 -0000 X-SWARE-Spam-Status: No, hits=-6.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,TW_OC,T_FRT_PROFILE2,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 18 May 2011 17:13:16 +0000 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4IHDGuO014093 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Wed, 18 May 2011 13:13:16 -0400 Received: from host1.jankratochvil.net (ovpn-113-35.phx2.redhat.com [10.3.113.35]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p4IHDE26026386 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Wed, 18 May 2011 13:13:15 -0400 Received: from host1.jankratochvil.net (localhost [127.0.0.1]) by host1.jankratochvil.net (8.14.4/8.14.4) with ESMTP id p4IHDDIC022542; Wed, 18 May 2011 19:13:13 +0200 Received: (from jkratoch@localhost) by host1.jankratochvil.net (8.14.4/8.14.4/Submit) id p4IHDCqC022532; Wed, 18 May 2011 19:13:12 +0200 Date: Wed, 18 May 2011 17:13:00 -0000 From: Jan Kratochvil To: Doug Evans Cc: gdb-patches@sourceware.org Subject: Re: [patch] Workaround for 10970, 12702, avoid calling waitpid Message-ID: <20110518171311.GA13277@host1.jankratochvil.net> References: <20110517181920.7EE51246196@ruffy.mtv.corp.google.com> <20110517220751.GA14238@host1.jankratochvil.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110517220751.GA14238@host1.jankratochvil.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2011-05/txt/msg00401.txt.bz2 On Wed, 18 May 2011 00:07:52 +0200, Jan Kratochvil wrote: > I guess __WNOTHREAD should be enough but it > does not work, it may be a kernel Bug. Filed for evaluation: > ptrace: __WNOTHREAD does not work > https://bugzilla.redhat.com/show_bug.cgi?id=705583 __WNOTHREAD has a completely different purpose according to Oleg Nesterov in that Bug, __WNOTHREAD is not relevant to this GDB problem at all. > I am against checking in racy code which can be hopefully fixed properly. OK to check in this race-less fix instead? Thanks, Jan gdb/ 2011-05-18 Jan Kratochvil Doug Evans Fix PR 10970, PR 12702. * linux-nat.c (linux_lwp_is_zombie): New function. (wait_lwp): Initialize status. New variable prev_mask. Block signals. Check for linux_lwp_is_zombie. Use WNOHANG and sigsuspend. testsuite/ 2011-05-18 Jan Kratochvil * gdb.threads/leader-exit.c: New file. * gdb.threads/leader-exit.exp: New file. --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -2356,6 +2356,33 @@ linux_handle_extended_wait (struct lwp_info *lp, int status, _("unknown ptrace event %d"), event); } +/* Return non-zero if LWP is a zombie. */ + +static int +linux_lwp_is_zombie (long lwp) +{ + char buffer[MAXPATHLEN]; + FILE *procfile; + int retval = 0; + + sprintf (buffer, "/proc/%ld/status", lwp); + procfile = fopen (buffer, "r"); + if (procfile == NULL) + { + warning (_("unable to open /proc file '%s'"), buffer); + return 0; + } + while (fgets (buffer, sizeof (buffer), procfile) != NULL) + if (strcmp (buffer, "State:\tZ (zombie)\n") == 0) + { + retval = 1; + break; + } + fclose (procfile); + + return retval; +} + /* Wait for LP to stop. Returns the wait status, or 0 if the LWP has exited. */ @@ -2363,28 +2390,76 @@ static int wait_lwp (struct lwp_info *lp) { pid_t pid; - int status; + int status = 0; int thread_dead = 0; + sigset_t prev_mask; gdb_assert (!lp->stopped); gdb_assert (lp->status == 0); - pid = my_waitpid (GET_LWP (lp->ptid), &status, 0); - if (pid == -1 && errno == ECHILD) + /* Make sure SIGCHLD is blocked for sigsuspend avoiding a race below. */ + block_child_signals (&prev_mask); + + for (;;) { - pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE); - if (pid == -1 && errno == ECHILD) + /* Bugs 10970, 12702. + Thread group leader may have exited in which case we'll lock up in + waitpid if there are other threads, even if they are all zombies too. + Basically, we're not supposed to use waitpid this way. + __WCLONE is not applicable for the leader so we can't use that. + LINUX_NAT_THREAD_ALIVE cannot be used here as it requires a STOPPED + process; it gets ESRCH both for the zombie and for running processes. + + As a workaround, check if we're waiting for the thread group leader and + if it's a zombie, and avoid calling waitpid if it is. + + This is racy, what if the tgl becomes a zombie right after we check? + Therefore always use WNOHANG with sigsuspend - it is equivalent to + waiting waitpid but the linux_lwp_is_zombie is safe this way. */ + + if (is_lwp (lp->ptid) && GET_PID (lp->ptid) == GET_LWP (lp->ptid) + && linux_lwp_is_zombie (GET_LWP (lp->ptid))) { - /* The thread has previously exited. We need to delete it - now because, for some vendor 2.4 kernels with NPTL - support backported, there won't be an exit event unless - it is the main thread. 2.6 kernels will report an exit - event for each thread that exits, as expected. */ thread_dead = 1; if (debug_linux_nat) - fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n", + fprintf_unfiltered (gdb_stdlog, + "WL: Thread group leader %s vanished.\n", target_pid_to_str (lp->ptid)); + break; } + + /* If my_waitpid returns 0 it means the __WCLONE vs. non-__WCLONE kind + was right and we should just call sigsuspend. */ + + pid = my_waitpid (GET_LWP (lp->ptid), &status, WNOHANG); + if (pid != 0 && (pid == -1 && errno == ECHILD)) + pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE | WNOHANG); + if (pid != 0) + break; + + /* Wait for next SIGCHLD and try again. This may let SIGCHLD handlers + get invoked despite our caller had them intentionally blocked by + block_child_signals. This is sensitive only to the loop of + linux_nat_wait_1 and there if we get called my_waitpid gets called + again before it gets to sigsuspend so we can safely let the handlers + get executed here. */ + + sigsuspend (&suspend_mask); + } + + restore_child_signals_mask (&prev_mask); + + if (pid == -1 && errno == ECHILD) + { + /* The thread has previously exited. We need to delete it + now because, for some vendor 2.4 kernels with NPTL + support backported, there won't be an exit event unless + it is the main thread. 2.6 kernels will report an exit + event for each thread that exits, as expected. */ + thread_dead = 1; + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n", + target_pid_to_str (lp->ptid)); } if (!thread_dead) --- /dev/null +++ b/gdb/testsuite/gdb.threads/leader-exit.c @@ -0,0 +1,49 @@ +/* Clean exit of the thread group leader should not break GDB. + + Copyright 2007, 2011 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#include +#include +#include + +static volatile pthread_t main_thread; + +static void * +start (void *arg) +{ + int i; + + i = pthread_join (main_thread, NULL); + assert (i == 0); + + return arg; /* break-here */ +} + +int +main (void) +{ + pthread_t thread; + int i; + + main_thread = pthread_self (); + + i = pthread_create (&thread, NULL, start, NULL); + assert (i == 0); + + pthread_exit (NULL); + /* NOTREACHED */ + return 0; +} --- /dev/null +++ b/gdb/testsuite/gdb.threads/leader-exit.exp @@ -0,0 +1,38 @@ +# Copyright (C) 2007, 2011 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +# Exit of the thread group leader should not break GDB. + +set testfile "leader-exit" +set srcfile ${testfile}.c +set executable ${testfile} +set binfile ${objdir}/${subdir}/${executable} + +if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } { + return -1 +} + +clean_restart ${executable} + +if ![runto_main] { + return -1 +} + +gdb_breakpoint [gdb_get_line_number "break-here"] +gdb_continue_to_breakpoint "break-here" ".* break-here .*" + +gdb_test "info threads" \ + "\r\n\[ \t\]*Id\[ \t\]+Target\[ \t\]+Id\[ \t\]+Frame\[ \t\]*\r\n\\* 2 *Thread \[^\r\n\]* at \[^\r\n\]*" \ + "Single thread has been left"