* [patch] Workaround for 10970, 12702, avoid calling waitpid
@ 2011-05-17 18:19 Doug Evans
2011-05-17 22:08 ` Jan Kratochvil
0 siblings, 1 reply; 10+ messages in thread
From: Doug Evans @ 2011-05-17 18:19 UTC (permalink / raw)
To: gdb-patches, jan.kratochvil
Hi.
A proper fix seems much more invasive, and this works well enough
for a large number of cases.
I'll be checking it in tomorrow if there are no objections.
The testcase handles 10970.
I'm not sure how to write a regression test for 12702 without
employing something like what I wrote in 12702 to show the bug,
or running gdb under gdb in order to effect the necessary
sequencing of events. Testing for 10970 (main does pthread_exit)
at least exercises the issue if not all the ways it can happen.
2011-05-17 Jan Kratochvil <jan.kratochvil@redhat.com>
Doug Evans <dje@google.com>
Workaround for bugs 10970, 12702.
* linux-nat.c (linux_lwp_is_zombie): New function.
(wait_lwp): Try to avoid calling waitpid on thread group leader
if it will hang.
testsuite/
* gdb.threads/leader-exit.c: New file.
* gdb.threads/leader-exit.exp: New file.
Index: linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.204
diff -u -p -r1.204 linux-nat.c
--- linux-nat.c 13 May 2011 17:28:19 -0000 1.204
+++ linux-nat.c 17 May 2011 18:04:06 -0000
@@ -2356,6 +2356,33 @@ linux_handle_extended_wait (struct lwp_i
_("unknown ptrace event %d"), event);
}
+/* Return non-zero if LWP is a zombie. */
+
+static int
+linux_lwp_is_zombie (long lwp)
+{
+ char buffer[MAXPATHLEN];
+ FILE *procfile;
+ int retval = 0;
+
+ sprintf (buffer, "/proc/%ld/status", lwp);
+ procfile = fopen (buffer, "r");
+ if (procfile == NULL)
+ {
+ warning (_("unable to open /proc file '%s'"), buffer);
+ return 0;
+ }
+ while (fgets (buffer, sizeof (buffer), procfile) != NULL)
+ if (strcmp (buffer, "State:\tZ (zombie)\n") == 0)
+ {
+ retval = 1;
+ break;
+ }
+ fclose (procfile);
+
+ return retval;
+}
+
/* Wait for LP to stop. Returns the wait status, or 0 if the LWP has
exited. */
@@ -2363,16 +2390,44 @@ static int
wait_lwp (struct lwp_info *lp)
{
pid_t pid;
- int status;
+ int status = 0;
int thread_dead = 0;
gdb_assert (!lp->stopped);
gdb_assert (lp->status == 0);
- pid = my_waitpid (GET_LWP (lp->ptid), &status, 0);
- if (pid == -1 && errno == ECHILD)
+ /* Bugs 10970, 12702.
+ Thread group leader may have exited in which case we'll lock up in
+ waitpid if there are other threads, even if they are all zombies too.
+ Basically, we're not supposed to use waitpid this way.
+ We can call waitpid with WNOHANG and obviously avoid the lock up, but we
+ don't want that here.
+ __WCLONE is not applicable for the leader so we can't use that.
+ LINUX_NAT_THREAD_ALIVE cannot be used here as it requires a STOPPED
+ process; it gets ESRCH both for the zombie and for running processes.
+
+ As a workaround, check if we're waiting for the thread group leader and
+ if it's a zombie, and avoid calling waitpid if it is.
+ This is racy, what if the tgl becomes a zombie right after we check?
+ But it seems to work in many cases, at least until a better fix/rewrite
+ can be done. */
+
+ if (is_lwp (lp->ptid)
+ && GET_PID (lp->ptid) == GET_LWP (lp->ptid)
+ && linux_lwp_is_zombie (GET_LWP (lp->ptid)))
+ {
+ thread_dead = 1;
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog,
+ "WL: Thread group leader %s vanished.\n",
+ target_pid_to_str (lp->ptid));
+ }
+
+ if (!thread_dead)
{
- pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
+ pid = my_waitpid (GET_LWP (lp->ptid), &status, 0);
+ if (pid == -1 && errno == ECHILD)
+ pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
if (pid == -1 && errno == ECHILD)
{
/* The thread has previously exited. We need to delete it
Index: testsuite/gdb.threads/leader-exit.c
===================================================================
RCS file: testsuite/gdb.threads/leader-exit.c
diff -N testsuite/gdb.threads/leader-exit.c
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ testsuite/gdb.threads/leader-exit.c 17 May 2011 18:04:06 -0000
@@ -0,0 +1,43 @@
+/* Clean exit of the thread group leader should not break GDB.
+
+ Copyright 2007, 2011 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <assert.h>
+#include <unistd.h>
+
+static void *
+start (void *arg)
+{
+ for (;;)
+ pause ();
+ /* NOTREACHED */
+ return arg;
+}
+
+int
+main (void)
+{
+ pthread_t thread;
+ int i;
+
+ i = pthread_create (&thread, NULL, start, NULL); /* create1 */
+ assert (i == 0);
+
+ pthread_exit (NULL);
+ /* NOTREACHED */
+ return 0;
+}
Index: testsuite/gdb.threads/leader-exit.exp
===================================================================
RCS file: testsuite/gdb.threads/leader-exit.exp
diff -N testsuite/gdb.threads/leader-exit.exp
--- /dev/null 1 Jan 1970 00:00:00 -0000
+++ testsuite/gdb.threads/leader-exit.exp 17 May 2011 18:04:06 -0000
@@ -0,0 +1,71 @@
+# Copyright (C) 2007, 2011 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Exit of the thread group leader should not break GDB.
+
+# This file was written by Jan Kratochvil <jan.kratochvil@redhat.com>.
+
+if $tracelevel then {
+ strace $tracelevel
+}
+
+set testfile "leader-exit"
+set srcfile ${testfile}.c
+set binfile ${objdir}/${subdir}/${testfile}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
+ return -1
+}
+
+gdb_exit
+gdb_start
+gdb_reinitialize_dir $srcdir/$subdir
+gdb_load ${binfile}
+runto_main
+
+# For this to work we must be sure to consume the "Continuing."
+# message first, or GDB's signal handler may not be in place.
+# We also want to ensure the thread is running.
+send_gdb "continue\n"
+gdb_expect {
+ -re "Continuing.\[\r\n\]+.*New Thread" {
+ pass "continue"
+ }
+ timeout {
+ fail "continue (timeout)"
+ }
+}
+
+proc stop_process { description } {
+ global gdb_prompt
+
+ after 1000 {send_gdb "\003"}
+ gdb_expect {
+ -re "Program received signal SIGINT.*$gdb_prompt $"
+ {
+ pass $description
+ }
+ timeout
+ {
+ fail "$description (timeout)"
+ }
+ }
+}
+
+stop_process "Threads could be stopped"
+
+gdb_test "info threads" \
+ "\\* 2 *Thread \[^\r\n\]* in \[^\r\n\]*" \
+ "Single thread has been left"
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-17 18:19 [patch] Workaround for 10970, 12702, avoid calling waitpid Doug Evans
@ 2011-05-17 22:08 ` Jan Kratochvil
2011-05-18 17:13 ` Jan Kratochvil
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kratochvil @ 2011-05-17 22:08 UTC (permalink / raw)
To: Doug Evans; +Cc: gdb-patches
On Tue, 17 May 2011 20:19:20 +0200, Doug Evans wrote:
> A proper fix seems much more invasive,
Which proper fix do you mean? I guess __WNOTHREAD should be enough but it
does not work, it may be a kernel Bug. Filed for evaluation:
ptrace: __WNOTHREAD does not work
https://bugzilla.redhat.com/show_bug.cgi?id=705583
If it is a kernel Bug another issue is how it could be made backward compatible
with existing kernels and whether such fix should be in FSF GDB. Wouldn't a
non-racy backward fix be possible using blocked SIGCHLD, sigsuspend and
WNOHANG?
BTW the Fedora patch also changed linux_xfer_partial:
/* nptl_db expects being able to transfer memory just by specifying PID.
After the thread group leader exists the Linux kernel turns the task
into zombie no longer permitting accesses to its memory.
Transfer the memory from an arbitrary LWP_LIST entry in such case. */
which no longer seems to be needed, I guess due to Pedro's fix proving:
2008-07-10 85fdbde9dfc62928a36ece6c53a42b59e28b31e7 Non-stop linux native.
/* Access an lwp we know is stopped. */
info->proc_handle.ptid = ptid;
while the fix by me was written before - 2007-07-07.
> and this works well enough for a large number of cases.
Providing fixed testcase but it does not work as it already exploits the race
in the fix.
> I'll be checking it in tomorrow if there are no objections.
I am against checking in racy code which can be hopefully fixed properly.
Thanks,
Jan
testsuite/
2011-05-17 Jan Kratochvil <jan.kratochvil@redhat.com>
* gdb.threads/leader-exit.c: New file.
* gdb.threads/leader-exit.exp: New file.
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/leader-exit.c
@@ -0,0 +1,49 @@
+/* Clean exit of the thread group leader should not break GDB.
+
+ Copyright 2007, 2011 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <assert.h>
+#include <unistd.h>
+
+static volatile pthread_t main_thread;
+
+static void *
+start (void *arg)
+{
+ int i;
+
+ i = pthread_join (main_thread, NULL);
+ assert (i == 0);
+
+ return arg; /* break-here */
+}
+
+int
+main (void)
+{
+ pthread_t thread;
+ int i;
+
+ main_thread = pthread_self ();
+
+ i = pthread_create (&thread, NULL, start, NULL);
+ assert (i == 0);
+
+ pthread_exit (NULL);
+ /* NOTREACHED */
+ return 0;
+}
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/leader-exit.exp
@@ -0,0 +1,38 @@
+# Copyright (C) 2007, 2011 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Exit of the thread group leader should not break GDB.
+
+set testfile "leader-exit"
+set srcfile ${testfile}.c
+set executable ${testfile}
+set binfile ${objdir}/${subdir}/${executable}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
+ return -1
+}
+
+clean_restart ${executable}
+
+if ![runto_main] {
+ return -1
+}
+
+gdb_breakpoint [gdb_get_line_number "break-here"]
+gdb_continue_to_breakpoint "break-here" ".* break-here .*"
+
+gdb_test "info threads" \
+ "\r\n\[ \t\]*Id\[ \t\]+Target\[ \t\]+Id\[ \t\]+Frame\[ \t\]*\r\n\\* 2 *Thread \[^\r\n\]* at \[^\r\n\]*" \
+ "Single thread has been left"
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-17 22:08 ` Jan Kratochvil
@ 2011-05-18 17:13 ` Jan Kratochvil
2011-05-18 18:17 ` Pedro Alves
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kratochvil @ 2011-05-18 17:13 UTC (permalink / raw)
To: Doug Evans; +Cc: gdb-patches
On Wed, 18 May 2011 00:07:52 +0200, Jan Kratochvil wrote:
> I guess __WNOTHREAD should be enough but it
> does not work, it may be a kernel Bug. Filed for evaluation:
> ptrace: __WNOTHREAD does not work
> https://bugzilla.redhat.com/show_bug.cgi?id=705583
__WNOTHREAD has a completely different purpose according to Oleg Nesterov in
that Bug, __WNOTHREAD is not relevant to this GDB problem at all.
> I am against checking in racy code which can be hopefully fixed properly.
OK to check in this race-less fix instead?
Thanks,
Jan
gdb/
2011-05-18 Jan Kratochvil <jan.kratochvil@redhat.com>
Doug Evans <dje@google.com>
Fix PR 10970, PR 12702.
* linux-nat.c (linux_lwp_is_zombie): New function.
(wait_lwp): Initialize status. New variable prev_mask. Block signals.
Check for linux_lwp_is_zombie. Use WNOHANG and sigsuspend.
testsuite/
2011-05-18 Jan Kratochvil <jan.kratochvil@redhat.com>
* gdb.threads/leader-exit.c: New file.
* gdb.threads/leader-exit.exp: New file.
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2356,6 +2356,33 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
_("unknown ptrace event %d"), event);
}
+/* Return non-zero if LWP is a zombie. */
+
+static int
+linux_lwp_is_zombie (long lwp)
+{
+ char buffer[MAXPATHLEN];
+ FILE *procfile;
+ int retval = 0;
+
+ sprintf (buffer, "/proc/%ld/status", lwp);
+ procfile = fopen (buffer, "r");
+ if (procfile == NULL)
+ {
+ warning (_("unable to open /proc file '%s'"), buffer);
+ return 0;
+ }
+ while (fgets (buffer, sizeof (buffer), procfile) != NULL)
+ if (strcmp (buffer, "State:\tZ (zombie)\n") == 0)
+ {
+ retval = 1;
+ break;
+ }
+ fclose (procfile);
+
+ return retval;
+}
+
/* Wait for LP to stop. Returns the wait status, or 0 if the LWP has
exited. */
@@ -2363,28 +2390,76 @@ static int
wait_lwp (struct lwp_info *lp)
{
pid_t pid;
- int status;
+ int status = 0;
int thread_dead = 0;
+ sigset_t prev_mask;
gdb_assert (!lp->stopped);
gdb_assert (lp->status == 0);
- pid = my_waitpid (GET_LWP (lp->ptid), &status, 0);
- if (pid == -1 && errno == ECHILD)
+ /* Make sure SIGCHLD is blocked for sigsuspend avoiding a race below. */
+ block_child_signals (&prev_mask);
+
+ for (;;)
{
- pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
- if (pid == -1 && errno == ECHILD)
+ /* Bugs 10970, 12702.
+ Thread group leader may have exited in which case we'll lock up in
+ waitpid if there are other threads, even if they are all zombies too.
+ Basically, we're not supposed to use waitpid this way.
+ __WCLONE is not applicable for the leader so we can't use that.
+ LINUX_NAT_THREAD_ALIVE cannot be used here as it requires a STOPPED
+ process; it gets ESRCH both for the zombie and for running processes.
+
+ As a workaround, check if we're waiting for the thread group leader and
+ if it's a zombie, and avoid calling waitpid if it is.
+
+ This is racy, what if the tgl becomes a zombie right after we check?
+ Therefore always use WNOHANG with sigsuspend - it is equivalent to
+ waiting waitpid but the linux_lwp_is_zombie is safe this way. */
+
+ if (is_lwp (lp->ptid) && GET_PID (lp->ptid) == GET_LWP (lp->ptid)
+ && linux_lwp_is_zombie (GET_LWP (lp->ptid)))
{
- /* The thread has previously exited. We need to delete it
- now because, for some vendor 2.4 kernels with NPTL
- support backported, there won't be an exit event unless
- it is the main thread. 2.6 kernels will report an exit
- event for each thread that exits, as expected. */
thread_dead = 1;
if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n",
+ fprintf_unfiltered (gdb_stdlog,
+ "WL: Thread group leader %s vanished.\n",
target_pid_to_str (lp->ptid));
+ break;
}
+
+ /* If my_waitpid returns 0 it means the __WCLONE vs. non-__WCLONE kind
+ was right and we should just call sigsuspend. */
+
+ pid = my_waitpid (GET_LWP (lp->ptid), &status, WNOHANG);
+ if (pid != 0 && (pid == -1 && errno == ECHILD))
+ pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE | WNOHANG);
+ if (pid != 0)
+ break;
+
+ /* Wait for next SIGCHLD and try again. This may let SIGCHLD handlers
+ get invoked despite our caller had them intentionally blocked by
+ block_child_signals. This is sensitive only to the loop of
+ linux_nat_wait_1 and there if we get called my_waitpid gets called
+ again before it gets to sigsuspend so we can safely let the handlers
+ get executed here. */
+
+ sigsuspend (&suspend_mask);
+ }
+
+ restore_child_signals_mask (&prev_mask);
+
+ if (pid == -1 && errno == ECHILD)
+ {
+ /* The thread has previously exited. We need to delete it
+ now because, for some vendor 2.4 kernels with NPTL
+ support backported, there won't be an exit event unless
+ it is the main thread. 2.6 kernels will report an exit
+ event for each thread that exits, as expected. */
+ thread_dead = 1;
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n",
+ target_pid_to_str (lp->ptid));
}
if (!thread_dead)
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/leader-exit.c
@@ -0,0 +1,49 @@
+/* Clean exit of the thread group leader should not break GDB.
+
+ Copyright 2007, 2011 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <assert.h>
+#include <unistd.h>
+
+static volatile pthread_t main_thread;
+
+static void *
+start (void *arg)
+{
+ int i;
+
+ i = pthread_join (main_thread, NULL);
+ assert (i == 0);
+
+ return arg; /* break-here */
+}
+
+int
+main (void)
+{
+ pthread_t thread;
+ int i;
+
+ main_thread = pthread_self ();
+
+ i = pthread_create (&thread, NULL, start, NULL);
+ assert (i == 0);
+
+ pthread_exit (NULL);
+ /* NOTREACHED */
+ return 0;
+}
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/leader-exit.exp
@@ -0,0 +1,38 @@
+# Copyright (C) 2007, 2011 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Exit of the thread group leader should not break GDB.
+
+set testfile "leader-exit"
+set srcfile ${testfile}.c
+set executable ${testfile}
+set binfile ${objdir}/${subdir}/${executable}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
+ return -1
+}
+
+clean_restart ${executable}
+
+if ![runto_main] {
+ return -1
+}
+
+gdb_breakpoint [gdb_get_line_number "break-here"]
+gdb_continue_to_breakpoint "break-here" ".* break-here .*"
+
+gdb_test "info threads" \
+ "\r\n\[ \t\]*Id\[ \t\]+Target\[ \t\]+Id\[ \t\]+Frame\[ \t\]*\r\n\\* 2 *Thread \[^\r\n\]* at \[^\r\n\]*" \
+ "Single thread has been left"
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 17:13 ` Jan Kratochvil
@ 2011-05-18 18:17 ` Pedro Alves
2011-05-18 18:53 ` Jan Kratochvil
0 siblings, 1 reply; 10+ messages in thread
From: Pedro Alves @ 2011-05-18 18:17 UTC (permalink / raw)
To: gdb-patches; +Cc: Jan Kratochvil, Doug Evans
On Wednesday 18 May 2011 18:13:11, Jan Kratochvil wrote:
> + if (is_lwp (lp->ptid) && GET_PID (lp->ptid) == GET_LWP (lp->ptid)
> + && linux_lwp_is_zombie (GET_LWP (lp->ptid)))
When can is_lwp(lp->ptid) be false?
> + if (pid != 0 && (pid == -1 && errno == ECHILD))
pid != 0 check looks redundant as is. Is it there to
try to make this more readable? IMO, it isn't because
whenever I read this, I'll stop and ponder whether
there's a bug here, due to the redundancy. :-)
FWIW, this looks good to me, but I wonder whether
checking for zombieness after the waitpids wouldn't
avoid a few /proc/ reads in the common case.
The inferior exit code reported to the core/user will be the
exit code of the last LWP that exits, instead of the
leader's exit code. We've stepped out of C realm when
the leader exited, so I'm not sure that matters (or how
to make it be != 0, even).
--
Pedro Alves
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 18:17 ` Pedro Alves
@ 2011-05-18 18:53 ` Jan Kratochvil
2011-05-18 19:17 ` Pedro Alves
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Jan Kratochvil @ 2011-05-18 18:53 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches, Doug Evans
On Wed, 18 May 2011 20:16:59 +0200, Pedro Alves wrote:
> On Wednesday 18 May 2011 18:13:11, Jan Kratochvil wrote:
> > + if (is_lwp (lp->ptid) && GET_PID (lp->ptid) == GET_LWP (lp->ptid)
> > + && linux_lwp_is_zombie (GET_LWP (lp->ptid)))
>
> When can is_lwp(lp->ptid) be false?
Removed. IIRC it could be false before (2007) which I agree is irrelevant
now.
> > + if (pid != 0 && (pid == -1 && errno == ECHILD))
>
> pid != 0 check looks redundant as is. Is it there to
> try to make this more readable?
No, more a coding thinko, removed.
> FWIW, this looks good to me, but I wonder whether
> checking for zombieness after the waitpids wouldn't
> avoid a few /proc/ reads in the common case.
OK, changed the order which does not matter there; still the syscalls
efficiency of linux-nat is far from perfect.
> The inferior exit code reported to the core/user will be the
> exit code of the last LWP that exits, instead of the
> leader's exit code. We've stepped out of C realm when
> the leader exited, so I'm not sure that matters (or how
> to make it be != 0, even).
I do not understand this comment much. That != 0 is here for PID from waitpid
which is unrelated to the exit code (present in STATUS).
No regressions on {x86_64,x86_64-m32,i686}-fedora15-linux-gnu.
I will check it in if no futher comments appear.
Thanks for the review,
Jan
gdb/
2011-05-18 Jan Kratochvil <jan.kratochvil@redhat.com>
Doug Evans <dje@google.com>
Fix PR 10970, PR 12702.
* linux-nat.c (linux_lwp_is_zombie): New function.
(wait_lwp): Initialize status. New variable prev_mask. Block signals.
Check for linux_lwp_is_zombie. Use WNOHANG and sigsuspend.
testsuite/
2011-05-18 Jan Kratochvil <jan.kratochvil@redhat.com>
* gdb.threads/leader-exit.c: New file.
* gdb.threads/leader-exit.exp: New file.
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2356,6 +2356,33 @@ linux_handle_extended_wait (struct lwp_info *lp, int status,
_("unknown ptrace event %d"), event);
}
+/* Return non-zero if LWP is a zombie. */
+
+static int
+linux_lwp_is_zombie (long lwp)
+{
+ char buffer[MAXPATHLEN];
+ FILE *procfile;
+ int retval = 0;
+
+ sprintf (buffer, "/proc/%ld/status", lwp);
+ procfile = fopen (buffer, "r");
+ if (procfile == NULL)
+ {
+ warning (_("unable to open /proc file '%s'"), buffer);
+ return 0;
+ }
+ while (fgets (buffer, sizeof (buffer), procfile) != NULL)
+ if (strcmp (buffer, "State:\tZ (zombie)\n") == 0)
+ {
+ retval = 1;
+ break;
+ }
+ fclose (procfile);
+
+ return retval;
+}
+
/* Wait for LP to stop. Returns the wait status, or 0 if the LWP has
exited. */
@@ -2363,28 +2390,76 @@ static int
wait_lwp (struct lwp_info *lp)
{
pid_t pid;
- int status;
+ int status = 0;
int thread_dead = 0;
+ sigset_t prev_mask;
gdb_assert (!lp->stopped);
gdb_assert (lp->status == 0);
- pid = my_waitpid (GET_LWP (lp->ptid), &status, 0);
- if (pid == -1 && errno == ECHILD)
+ /* Make sure SIGCHLD is blocked for sigsuspend avoiding a race below. */
+ block_child_signals (&prev_mask);
+
+ for (;;)
{
- pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
+ /* If my_waitpid returns 0 it means the __WCLONE vs. non-__WCLONE kind
+ was right and we should just call sigsuspend. */
+
+ pid = my_waitpid (GET_LWP (lp->ptid), &status, WNOHANG);
if (pid == -1 && errno == ECHILD)
+ pid = my_waitpid (GET_LWP (lp->ptid), &status, __WCLONE | WNOHANG);
+ if (pid != 0)
+ break;
+
+ /* Bugs 10970, 12702.
+ Thread group leader may have exited in which case we'll lock up in
+ waitpid if there are other threads, even if they are all zombies too.
+ Basically, we're not supposed to use waitpid this way.
+ __WCLONE is not applicable for the leader so we can't use that.
+ LINUX_NAT_THREAD_ALIVE cannot be used here as it requires a STOPPED
+ process; it gets ESRCH both for the zombie and for running processes.
+
+ As a workaround, check if we're waiting for the thread group leader and
+ if it's a zombie, and avoid calling waitpid if it is.
+
+ This is racy, what if the tgl becomes a zombie right after we check?
+ Therefore always use WNOHANG with sigsuspend - it is equivalent to
+ waiting waitpid but the linux_lwp_is_zombie is safe this way. */
+
+ if (GET_PID (lp->ptid) == GET_LWP (lp->ptid)
+ && linux_lwp_is_zombie (GET_LWP (lp->ptid)))
{
- /* The thread has previously exited. We need to delete it
- now because, for some vendor 2.4 kernels with NPTL
- support backported, there won't be an exit event unless
- it is the main thread. 2.6 kernels will report an exit
- event for each thread that exits, as expected. */
thread_dead = 1;
if (debug_linux_nat)
- fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n",
+ fprintf_unfiltered (gdb_stdlog,
+ "WL: Thread group leader %s vanished.\n",
target_pid_to_str (lp->ptid));
+ break;
}
+
+ /* Wait for next SIGCHLD and try again. This may let SIGCHLD handlers
+ get invoked despite our caller had them intentionally blocked by
+ block_child_signals. This is sensitive only to the loop of
+ linux_nat_wait_1 and there if we get called my_waitpid gets called
+ again before it gets to sigsuspend so we can safely let the handlers
+ get executed here. */
+
+ sigsuspend (&suspend_mask);
+ }
+
+ restore_child_signals_mask (&prev_mask);
+
+ if (pid == -1 && errno == ECHILD)
+ {
+ /* The thread has previously exited. We need to delete it
+ now because, for some vendor 2.4 kernels with NPTL
+ support backported, there won't be an exit event unless
+ it is the main thread. 2.6 kernels will report an exit
+ event for each thread that exits, as expected. */
+ thread_dead = 1;
+ if (debug_linux_nat)
+ fprintf_unfiltered (gdb_stdlog, "WL: %s vanished.\n",
+ target_pid_to_str (lp->ptid));
}
if (!thread_dead)
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/leader-exit.c
@@ -0,0 +1,49 @@
+/* Clean exit of the thread group leader should not break GDB.
+
+ Copyright 2007, 2011 Free Software Foundation, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>. */
+
+#include <pthread.h>
+#include <assert.h>
+#include <unistd.h>
+
+static volatile pthread_t main_thread;
+
+static void *
+start (void *arg)
+{
+ int i;
+
+ i = pthread_join (main_thread, NULL);
+ assert (i == 0);
+
+ return arg; /* break-here */
+}
+
+int
+main (void)
+{
+ pthread_t thread;
+ int i;
+
+ main_thread = pthread_self ();
+
+ i = pthread_create (&thread, NULL, start, NULL);
+ assert (i == 0);
+
+ pthread_exit (NULL);
+ /* NOTREACHED */
+ return 0;
+}
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/leader-exit.exp
@@ -0,0 +1,38 @@
+# Copyright (C) 2007, 2011 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+# Exit of the thread group leader should not break GDB.
+
+set testfile "leader-exit"
+set srcfile ${testfile}.c
+set executable ${testfile}
+set binfile ${objdir}/${subdir}/${executable}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
+ return -1
+}
+
+clean_restart ${executable}
+
+if ![runto_main] {
+ return -1
+}
+
+gdb_breakpoint [gdb_get_line_number "break-here"]
+gdb_continue_to_breakpoint "break-here" ".* break-here .*"
+
+gdb_test "info threads" \
+ "\r\n\[ \t\]*Id\[ \t\]+Target\[ \t\]+Id\[ \t\]+Frame\[ \t\]*\r\n\\* 2 *Thread \[^\r\n\]* at \[^\r\n\]*" \
+ "Single thread has been left"
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 18:53 ` Jan Kratochvil
@ 2011-05-18 19:17 ` Pedro Alves
2011-05-18 19:32 ` Jan Kratochvil
2011-05-18 22:21 ` Doug Evans
2011-05-27 17:04 ` Jan Kratochvil
2 siblings, 1 reply; 10+ messages in thread
From: Pedro Alves @ 2011-05-18 19:17 UTC (permalink / raw)
To: Jan Kratochvil; +Cc: gdb-patches, Doug Evans
On Wednesday 18 May 2011 19:53:02, Jan Kratochvil wrote:
> > The inferior exit code reported to the core/user will be the
> > exit code of the last LWP that exits, instead of the
> > leader's exit code. We've stepped out of C realm when
> > the leader exited, so I'm not sure that matters (or how
> > to make it be != 0, even).
>
> I do not understand this comment much. That != 0 is here for PID from waitpid
> which is unrelated to the exit code (present in STATUS).
I was talking about the exit code of the last cloned
lwp to exit ending up as exit code of the inferior
(TARGET_WAITKIND_EXITED's value.integer), not the
return of that waitpid call.
linux_nat_filter_event
...
if (num_lwps (GET_PID (lp->ptid)) > 1)
{
/* If there is at least one more LWP, then the exit signal
was not the end of the debugged application and should be
ignored. */
exit_lwp (lp);
return NULL;
}
...
linux_nat_wait_1
store_waitstatus (ourstatus, status);
When the last lwp exits, we'll report it's exit code as the
inferior exit code to infrun, instead of the leader's exit
code.
>
>
> No regressions on {x86_64,x86_64-m32,i686}-fedora15-linux-gnu.
>
> I will check it in if no futher comments appear.
--
Pedro Alves
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 19:17 ` Pedro Alves
@ 2011-05-18 19:32 ` Jan Kratochvil
2011-05-18 19:53 ` Pedro Alves
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kratochvil @ 2011-05-18 19:32 UTC (permalink / raw)
To: Pedro Alves; +Cc: gdb-patches, Doug Evans
On Wed, 18 May 2011 21:17:21 +0200, Pedro Alves wrote:
> When the last lwp exits, we'll report it's exit code as the
> inferior exit code to infrun, instead of the leader's exit
> code.
Which is correct, isn't it?
$ ./groupexit; echo $?
5
(gdb) run
Starting program: /home/jkratoch/t/groupexit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7822700 (LWP 18087)]
[Thread 0x7ffff7822700 (LWP 18087) exited]
[Inferior 1 (process 18084) exited with code 05]
Thanks,
Jan
#include <pthread.h>
#include <assert.h>
#include <unistd.h>
static void *start (void *arg)
{
sleep (1);
_exit (5);
return arg;
}
int main (void)
{
pthread_t thread1;
int i;
i = pthread_create (&thread1, NULL, start, NULL);
assert (i == 0);
pthread_exit (NULL);
assert (0);
return 0;
}
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 19:32 ` Jan Kratochvil
@ 2011-05-18 19:53 ` Pedro Alves
0 siblings, 0 replies; 10+ messages in thread
From: Pedro Alves @ 2011-05-18 19:53 UTC (permalink / raw)
To: Jan Kratochvil; +Cc: gdb-patches, Doug Evans
On Wednesday 18 May 2011 20:32:26, Jan Kratochvil wrote:
> On Wed, 18 May 2011 21:17:21 +0200, Pedro Alves wrote:
> > When the last lwp exits, we'll report it's exit code as the
> > inferior exit code to infrun, instead of the leader's exit
> > code.
>
> Which is correct, isn't it?
>
> $ ./groupexit; echo $?
> 5
Ah, good. I didn't think of _exit... Thanks.
>
> (gdb) run
> Starting program: /home/jkratoch/t/groupexit
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x7ffff7822700 (LWP 18087)]
> [Thread 0x7ffff7822700 (LWP 18087) exited]
> [Inferior 1 (process 18084) exited with code 05]
>
>
> Thanks,
> Jan
>
>
> #include <pthread.h>
> #include <assert.h>
> #include <unistd.h>
> static void *start (void *arg)
> {
> sleep (1);
> _exit (5);
> return arg;
> }
> int main (void)
> {
> pthread_t thread1;
> int i;
> i = pthread_create (&thread1, NULL, start, NULL);
> assert (i == 0);
> pthread_exit (NULL);
> assert (0);
> return 0;
> }
>
--
Pedro Alves
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 18:53 ` Jan Kratochvil
2011-05-18 19:17 ` Pedro Alves
@ 2011-05-18 22:21 ` Doug Evans
2011-05-27 17:04 ` Jan Kratochvil
2 siblings, 0 replies; 10+ messages in thread
From: Doug Evans @ 2011-05-18 22:21 UTC (permalink / raw)
To: Jan Kratochvil; +Cc: gdb-patches
On Wed, May 18, 2011 at 11:53 AM, Jan Kratochvil
<jan.kratochvil@redhat.com> wrote:
> No regressions on {x86_64,x86_64-m32,i686}-fedora15-linux-gnu.
>
> I will check it in if no futher comments appear.
Thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [patch] Workaround for 10970, 12702, avoid calling waitpid
2011-05-18 18:53 ` Jan Kratochvil
2011-05-18 19:17 ` Pedro Alves
2011-05-18 22:21 ` Doug Evans
@ 2011-05-27 17:04 ` Jan Kratochvil
2 siblings, 0 replies; 10+ messages in thread
From: Jan Kratochvil @ 2011-05-27 17:04 UTC (permalink / raw)
To: gdb-patches; +Cc: Doug Evans, Pedro Alves
On Wed, 18 May 2011 20:53:02 +0200, Jan Kratochvil wrote:
> gdb/
> 2011-05-18 Jan Kratochvil <jan.kratochvil@redhat.com>
> Doug Evans <dje@google.com>
>
> Fix PR 10970, PR 12702.
> * linux-nat.c (linux_lwp_is_zombie): New function.
> (wait_lwp): Initialize status. New variable prev_mask. Block signals.
> Check for linux_lwp_is_zombie. Use WNOHANG and sigsuspend.
>
> testsuite/
> 2011-05-18 Jan Kratochvil <jan.kratochvil@redhat.com>
>
> * gdb.threads/leader-exit.c: New file.
> * gdb.threads/leader-exit.exp: New file.
Checked in:
http://sourceware.org/ml/gdb-cvs/2011-05/msg00228.html
Thanks,
Jan
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-05-27 17:04 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-17 18:19 [patch] Workaround for 10970, 12702, avoid calling waitpid Doug Evans
2011-05-17 22:08 ` Jan Kratochvil
2011-05-18 17:13 ` Jan Kratochvil
2011-05-18 18:17 ` Pedro Alves
2011-05-18 18:53 ` Jan Kratochvil
2011-05-18 19:17 ` Pedro Alves
2011-05-18 19:32 ` Jan Kratochvil
2011-05-18 19:53 ` Pedro Alves
2011-05-18 22:21 ` Doug Evans
2011-05-27 17:04 ` Jan Kratochvil
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox