From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16618 invoked by alias); 17 May 2011 18:19:43 -0000 Received: (qmail 16609 invoked by uid 22791); 17 May 2011 18:19:42 -0000 X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 tests=AWL,BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,SPF_HELO_PASS,TW_OC,T_FRT_PROFILE2,T_RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org Received: from smtp-out.google.com (HELO smtp-out.google.com) (74.125.121.67) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 17 May 2011 18:19:27 +0000 Received: from wpaz13.hot.corp.google.com (wpaz13.hot.corp.google.com [172.24.198.77]) by smtp-out.google.com with ESMTP id p4HIJLan032305; Tue, 17 May 2011 11:19:22 -0700 Received: from ruffy.mtv.corp.google.com (ruffy.mtv.corp.google.com [172.18.118.116]) by wpaz13.hot.corp.google.com with ESMTP id p4HIJK0a013899; Tue, 17 May 2011 11:19:20 -0700 Received: by ruffy.mtv.corp.google.com (Postfix, from userid 67641) id 7EE51246196; Tue, 17 May 2011 11:19:20 -0700 (PDT) To: gdb-patches@sourceware.org, jan.kratochvil@redhat.com Subject: [patch] Workaround for 10970, 12702, avoid calling waitpid Message-Id: <20110517181920.7EE51246196@ruffy.mtv.corp.google.com> Date: Tue, 17 May 2011 18:19:00 -0000 From: dje@google.com (Doug Evans) X-System-Of-Record: true X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2011-05/txt/msg00386.txt.bz2 Hi. A proper fix seems much more invasive, and this works well enough for a large number of cases. I'll be checking it in tomorrow if there are no objections. The testcase handles 10970. I'm not sure how to write a regression test for 12702 without employing something like what I wrote in 12702 to show the bug, or running gdb under gdb in order to effect the necessary sequencing of events. Testing for 10970 (main does pthread_exit) at least exercises the issue if not all the ways it can happen. 2011-05-17 Jan Kratochvil Doug Evans Workaround for bugs 10970, 12702. * linux-nat.c (linux_lwp_is_zombie): New function. (wait_lwp): Try to avoid calling waitpid on thread group leader if it will hang. testsuite/ * gdb.threads/leader-exit.c: New file. * gdb.threads/leader-exit.exp: New file. Index: linux-nat.c =================================================================== RCS file: /cvs/src/src/gdb/linux-nat.c,v retrieving revision 1.204 diff -u -p -r1.204 linux-nat.c --- linux-nat.c 13 May 2011 17:28:19 -0000 1.204 +++ linux-nat.c 17 May 2011 18:04:06 -0000 @@ -2356,6 +2356,33 @@ linux_handle_extended_wait (struct lwp_i _("unknown ptrace event %d"), event); } +/* Return non-zero if LWP is a zombie. */ + +static int +linux_lwp_is_zombie (long lwp) +{ + char buffer[MAXPATHLEN]; + FILE *procfile; + int retval = 0; + + sprintf (buffer, "/proc/%ld/status", lwp); + procfile = fopen (buffer, "r"); + if (procfile == NULL) + { + warning (_("unable to open /proc file '%s'"), buffer); + return 0; + } + while (fgets (buffer, sizeof (buffer), procfile) != NULL) + if (strcmp (buffer, "State:\tZ (zombie)\n") == 0) + { + retval = 1; + break; + } + fclose (procfile); + + return retval; +} + /* Wait for LP to stop. Returns the wait status, or 0 if the LWP has exited. */ @@ -2363,16 +2390,44 @@ static int wait_lwp (struct lwp_info *lp) { pid_t pid; - int status; + int status = 0; int thread_dead = 0; gdb_assert (!lp->stopped); gdb_assert (lp->status == 0); - pid = my_waitpid (GET_LWP (lp->ptid), &status, 0); - if (pid == -1 && errno == ECHILD) + /* Bugs 10970, 12702. + Thread group leader may have exited in which case we'll lock up in + waitpid if there are other threads, even if they are all zombies too. + Basically, we're not supposed to use waitpid this way. + We can call waitpid with WNOHANG and obviously avoid the lock up, but we + don't want that here. + __WCLONE is not applicable for the leader so we can't use that. + LINUX_NAT_THREAD_ALIVE cannot be used here as it requires a STOPPED + process; it gets ESRCH both for the zombie and for running processes. + + As a workaround, check if we're waiting for the thread group leader and + if it's a zombie, and avoid calling waitpid if it is. + This is racy, what if the tgl becomes a zombie right after we check? + But it seems to work in many cases, at least until a better fix/rewrite + can be done. */ + + if (is_lwp (lp->ptid) + && GET_PID (lp->ptid) == GET_LWP (lp->ptid) + && linux_lwp_is_zombie (GET_LWP (lp->ptid))) + { + thread_dead = 1; + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, + "WL: Thread group leader %s vanished.\n", + target_pid_to_str (lp->ptid)); + } + + if (!thread_dead) { - pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE); + pid = my_waitpid (GET_LWP (lp->ptid), &status, 0); + if (pid == -1 && errno == ECHILD) + pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE); if (pid == -1 && errno == ECHILD) { /* The thread has previously exited. We need to delete it Index: testsuite/gdb.threads/leader-exit.c =================================================================== RCS file: testsuite/gdb.threads/leader-exit.c diff -N testsuite/gdb.threads/leader-exit.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ testsuite/gdb.threads/leader-exit.c 17 May 2011 18:04:06 -0000 @@ -0,0 +1,43 @@ +/* Clean exit of the thread group leader should not break GDB. + + Copyright 2007, 2011 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . */ + +#include +#include +#include + +static void * +start (void *arg) +{ + for (;;) + pause (); + /* NOTREACHED */ + return arg; +} + +int +main (void) +{ + pthread_t thread; + int i; + + i = pthread_create (&thread, NULL, start, NULL); /* create1 */ + assert (i == 0); + + pthread_exit (NULL); + /* NOTREACHED */ + return 0; +} Index: testsuite/gdb.threads/leader-exit.exp =================================================================== RCS file: testsuite/gdb.threads/leader-exit.exp diff -N testsuite/gdb.threads/leader-exit.exp --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ testsuite/gdb.threads/leader-exit.exp 17 May 2011 18:04:06 -0000 @@ -0,0 +1,71 @@ +# Copyright (C) 2007, 2011 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . + +# Exit of the thread group leader should not break GDB. + +# This file was written by Jan Kratochvil . + +if $tracelevel then { + strace $tracelevel +} + +set testfile "leader-exit" +set srcfile ${testfile}.c +set binfile ${objdir}/${subdir}/${testfile} + +if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } { + return -1 +} + +gdb_exit +gdb_start +gdb_reinitialize_dir $srcdir/$subdir +gdb_load ${binfile} +runto_main + +# For this to work we must be sure to consume the "Continuing." +# message first, or GDB's signal handler may not be in place. +# We also want to ensure the thread is running. +send_gdb "continue\n" +gdb_expect { + -re "Continuing.\[\r\n\]+.*New Thread" { + pass "continue" + } + timeout { + fail "continue (timeout)" + } +} + +proc stop_process { description } { + global gdb_prompt + + after 1000 {send_gdb "\003"} + gdb_expect { + -re "Program received signal SIGINT.*$gdb_prompt $" + { + pass $description + } + timeout + { + fail "$description (timeout)" + } + } +} + +stop_process "Threads could be stopped" + +gdb_test "info threads" \ + "\\* 2 *Thread \[^\r\n\]* in \[^\r\n\]*" \ + "Single thread has been left"