From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 20373 invoked by alias); 14 Aug 2013 18:50:19 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 20355 invoked by uid 89); 14 Aug 2013 18:50:18 -0000 X-Spam-SWARE-Status: No, score=-4.6 required=5.0 tests=AWL,BAYES_00,KHOP_RCVD_UNTRUST,KHOP_THREADED,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL autolearn=ham version=3.3.2 Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Wed, 14 Aug 2013 18:50:17 +0000 Received: from svr-orw-exc-10.mgc.mentorg.com ([147.34.98.58]) by relay1.mentorg.com with esmtp id 1V9g99-00024O-0o from Luis_Gustavo@mentor.com ; Wed, 14 Aug 2013 11:50:15 -0700 Received: from NA1-MAIL.mgc.mentorg.com ([147.34.98.181]) by SVR-ORW-EXC-10.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.4675); Wed, 14 Aug 2013 11:50:15 -0700 Received: from [172.30.2.104] ([172.30.2.104]) by NA1-MAIL.mgc.mentorg.com with Microsoft SMTPSVC(6.0.3790.3959); Wed, 14 Aug 2013 11:50:13 -0700 Message-ID: <520BD162.1020105@codesourcery.com> Date: Wed, 14 Aug 2013 18:50:00 -0000 From: Luis Machado Reply-To: lgustavo@codesourcery.com User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130623 Thunderbird/17.0.7 MIME-Version: 1.0 To: Pedro Alves CC: gdb-patches@sourceware.org Subject: Re: [PATCH] linux-nat.c: no need to block child signals so aggressively. References: <20130814182653.7136.72780.stgit@brno.lan> In-Reply-To: <20130814182653.7136.72780.stgit@brno.lan> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2013-08/txt/msg00387.txt.bz2 On 08/14/2013 03:26 PM, Pedro Alves wrote: > In http://sourceware.org/ml/gdb-patches/2013-08/msg00174.html , the > issue of child signal handling around ptrace option support discovery > being different between GDB and GDBserver came up. > > I recalled adding these block_child_signals calls, and the "We don't > want those ptrace calls to be interrupted" comment, but not exactly > why. So I looked into it. My first guess is that I got confused. > The patch that added this > rewrote > the linux native async support completely, and the old async support > code had the SIGCHLD handler itself do waitpid, so in places that we'd > want a blocking waitpid, we'd have to have the signal handler blocked. > That was probably the mindset I had at the time. Anyway, whatever the > case, looks like I was wrong on the need for this blocking. > Given GDBserver doesn't block like this, I investigated why this is > currently needed on GDB but not on GDBserver. > > I removed the block_child_signals (and restore) calls, and hacked > linux-nat.c to call linux_test_for_tracefork in a loop. Running the > resulting GDB, I then saw bad things happening. Specifically, I'd end > up with a bunch of zombies, and eventually, the machine would grind to > a half with insufficient resources. > > The issue is that linux_test_for_tracefork test forks, and has the > child fork again. If we don't block SIGCHLD on entry to the function, > the children will inherit SIGCHLD's action/disposition (meaning, > SIGCHLD will be unblocked in the child). When the first child forks > again a second child, and that child exits, the first child gets a > SIGCHLD. Now, when we try to wrap up for the whole options test, we > kill the first child, and collect the waitstatus. Here, when SIGCHLD > isn't blocked, GDB will first see the child reporting a stop with > SIGCHLD. gdbserver's option test does a PTRACE_KILL loop at the end, > which catches the SIGCHLD, and retries the kill. The GDB version did > not do that. So the GDB version would proceed, leaving the child > zombie, as nothing collected its final waitstatus. > > So this patch makes the GDB version of linux_test_for_tracefork do the > exact same as the GDBserver version, removes all this unnecessary > blocking throughout, and adds a couple comments at places that do need > it -- namely: places where we'll use sleep with sigsuspend; and > linux_async_pipe, as that destroys the pipe the signal handler > touches. > > Tested on x86_64 Fedora 17, sync and async. > > gdb/ > 2013-08-14 Pedro Alves > > * linux-nat.c (linux_test_for_tracefork) > (linux_test_for_tracesysgood, linux_child_follow_fork) > (lin_lwp_attach_lwp, linux_nat_resume): Don't block child signals. > (linux_nat_wait_1): Extend comment. > (linux_async_pipe): Add comment. > --- > gdb/linux-nat.c | 59 +++++++++++++------------------------------------------ > 1 file changed, 14 insertions(+), 45 deletions(-) > > diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c > index 45a6e5f..db23433 100644 > --- a/gdb/linux-nat.c > +++ b/gdb/linux-nat.c > @@ -395,20 +395,13 @@ linux_test_for_tracefork (int original_pid) > { > int child_pid, ret, status; > long second_pid; > - sigset_t prev_mask; > - > - /* We don't want those ptrace calls to be interrupted. */ > - block_child_signals (&prev_mask); > > linux_supports_tracefork_flag = 0; > linux_supports_tracevforkdone_flag = 0; > > ret = ptrace (PTRACE_SETOPTIONS, original_pid, 0, PTRACE_O_TRACEFORK); > if (ret != 0) > - { > - restore_child_signals_mask (&prev_mask); > - return; > - } > + return; > > child_pid = fork (); > if (child_pid == -1) > @@ -433,7 +426,6 @@ linux_test_for_tracefork (int original_pid) > if (ret != 0) > { > warning (_("linux_test_for_tracefork: failed to kill child")); > - restore_child_signals_mask (&prev_mask); > return; > } > > @@ -445,7 +437,6 @@ linux_test_for_tracefork (int original_pid) > warning (_("linux_test_for_tracefork: unexpected " > "wait status 0x%x from killed child"), status); > > - restore_child_signals_mask (&prev_mask); > return; > } > > @@ -482,12 +473,14 @@ linux_test_for_tracefork (int original_pid) > warning (_("linux_test_for_tracefork: unexpected result from waitpid " > "(%d, status 0x%x)"), ret, status); > > - ret = ptrace (PTRACE_KILL, child_pid, 0, 0); > - if (ret != 0) > - warning (_("linux_test_for_tracefork: failed to kill child")); > - my_waitpid (child_pid, &status, 0); > - > - restore_child_signals_mask (&prev_mask); > + do > + { > + ret = ptrace (PTRACE_KILL, child_pid, 0, 0); > + if (ret != 0) > + warning ("linux_test_for_tracefork: failed to kill child"); > + my_waitpid (child_pid, &status, 0); > + } > + while (WIFSTOPPED (status)); > } > > /* Determine if PTRACE_O_TRACESYSGOOD can be used to follow syscalls. > @@ -500,20 +493,14 @@ static void > linux_test_for_tracesysgood (int original_pid) > { > int ret; > - sigset_t prev_mask; > - > - /* We don't want those ptrace calls to be interrupted. */ > - block_child_signals (&prev_mask); > > linux_supports_tracesysgood_flag = 0; > > ret = ptrace (PTRACE_SETOPTIONS, original_pid, 0, PTRACE_O_TRACESYSGOOD); > if (ret != 0) > - goto out; > + return; > > linux_supports_tracesysgood_flag = 1; > -out: > - restore_child_signals_mask (&prev_mask); > } > > /* Determine wether we support PTRACE_O_TRACESYSGOOD option available. > @@ -630,12 +617,9 @@ delete_lwp_cleanup (void *lp_voidp) > static int > linux_child_follow_fork (struct target_ops *ops, int follow_child) > { > - sigset_t prev_mask; > int has_vforked; > int parent_pid, child_pid; > > - block_child_signals (&prev_mask); > - > has_vforked = (inferior_thread ()->pending_follow.kind > == TARGET_WAITKIND_VFORKED); > parent_pid = ptid_get_lwp (inferior_ptid); > @@ -954,7 +938,6 @@ holding the child stopped. Try \"set detach-on-fork\" or \ > check_for_thread_db (); > } > > - restore_child_signals_mask (&prev_mask); > return 0; > } > > @@ -1427,13 +1410,10 @@ int > lin_lwp_attach_lwp (ptid_t ptid) > { > struct lwp_info *lp; > - sigset_t prev_mask; > int lwpid; > > gdb_assert (is_lwp (ptid)); > > - block_child_signals (&prev_mask); > - > lp = find_lwp_pid (ptid); > lwpid = GET_LWP (ptid); > > @@ -1461,7 +1441,6 @@ lin_lwp_attach_lwp (ptid_t ptid) > /* We've already seen this thread stop, but we > haven't seen the PTRACE_EVENT_CLONE extended > event yet. */ > - restore_child_signals_mask (&prev_mask); > return 0; > } > else > @@ -1478,8 +1457,6 @@ lin_lwp_attach_lwp (ptid_t ptid) > { > if (WIFSTOPPED (status)) > add_to_pid_list (&stopped_pids, lwpid, status); > - > - restore_child_signals_mask (&prev_mask); > return 1; > } > } > @@ -1492,7 +1469,6 @@ lin_lwp_attach_lwp (ptid_t ptid) > to create them. */ > warning (_("Can't attach %s: %s"), target_pid_to_str (ptid), > safe_strerror (errno)); > - restore_child_signals_mask (&prev_mask); > return -1; > } > > @@ -1503,10 +1479,7 @@ lin_lwp_attach_lwp (ptid_t ptid) > > status = linux_nat_post_attach_wait (ptid, 0, &cloned, &signalled); > if (!WIFSTOPPED (status)) > - { > - restore_child_signals_mask (&prev_mask); > - return 1; > - } > + return 1; > > lp = add_lwp (ptid); > lp->stopped = 1; > @@ -1542,7 +1515,6 @@ lin_lwp_attach_lwp (ptid_t ptid) > } > > lp->last_resume_kind = resume_stop; > - restore_child_signals_mask (&prev_mask); > return 0; > } > > @@ -1975,7 +1947,6 @@ static void > linux_nat_resume (struct target_ops *ops, > ptid_t ptid, int step, enum gdb_signal signo) > { > - sigset_t prev_mask; > struct lwp_info *lp; > int resume_many; > > @@ -1988,8 +1959,6 @@ linux_nat_resume (struct target_ops *ops, > ? strsignal (gdb_signal_to_host (signo)) : "0"), > target_pid_to_str (inferior_ptid)); > > - block_child_signals (&prev_mask); > - > /* A specific PTID means `step only this process id'. */ > resume_many = (ptid_equal (minus_one_ptid, ptid) > || ptid_is_pid (ptid)); > @@ -2047,7 +2016,6 @@ linux_nat_resume (struct target_ops *ops, > "LLR: Short circuiting for status 0x%x\n", > lp->status); > > - restore_child_signals_mask (&prev_mask); > if (target_can_async_p ()) > { > target_async (inferior_event_handler, 0); > @@ -2080,7 +2048,6 @@ linux_nat_resume (struct target_ops *ops, > (signo != GDB_SIGNAL_0 > ? strsignal (gdb_signal_to_host (signo)) : "0")); > > - restore_child_signals_mask (&prev_mask); > if (target_can_async_p ()) > target_async (inferior_event_handler, 0); > } > @@ -3477,7 +3444,7 @@ linux_nat_wait_1 (struct target_ops *ops, > lp->resumed = 1; > } > > - /* Make sure SIGCHLD is blocked. */ > + /* Make sure SIGCHLD is blocked until the sigsuspend below. */ > block_child_signals (&prev_mask); > > retry: > @@ -4907,6 +4874,8 @@ linux_async_pipe (int enable) > { > sigset_t prev_mask; > > + /* Block child signals while we create/destroy the pipe, as > + their handler writes to it. */ > block_child_signals (&prev_mask); > > if (enable) > > > Based on the explanation, it makes sense to me. Thanks, Luis