From: Pedro Alves
To: Paul Pluzhnikov
Cc: gdb-patches@sourceware.org
Subject: Re: [patch] Fix for internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid && WIFSTOPPED (status)' failed.
Date: Tue, 13 Oct 2009 20:53:00 -0000
Message-Id: <200910132153.51171.pedro@codesourcery.com>
In-Reply-To: <20091013184120.30A5776761@ppluzhnikov.mtv.corp.google.com>

On Tuesday 13 October 2009 19:41:20, Paul Pluzhnikov wrote:
> warning: Can't attach LWP 15338: No such process
> ../../src/gdb/linux-nat.c:1341: internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid && WIFSTOPPED (status)' failed.
>
> When assertion fails, status == 0.
>
>
> 2009-10-13  Paul Pluzhnikov
>
>         * linux-nat.c (linux_nat_post_attach_wait): Adjust assert.
>
> Index: linux-nat.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/linux-nat.c,v
> retrieving revision 1.151
> diff -u -p -u -r1.151 linux-nat.c
> --- linux-nat.c  9 Oct 2009 01:57:12 -0000      1.151
> +++ linux-nat.c  13 Oct 2009 18:18:37 -0000
> @@ -1338,16 +1338,22 @@ linux_nat_post_attach_wait (ptid_t ptid,
>            *cloned = 1;
>          }
>
> -  gdb_assert (pid == new_pid && WIFSTOPPED (status));
> +  gdb_assert (pid == new_pid);
>
> -  if (WSTOPSIG (status) != SIGSTOP)
> +  if (WIFSTOPPED (status))
>      {
> -      *signalled = 1;
> -      if (debug_linux_nat)
> -        fprintf_unfiltered (gdb_stdlog,
> -                            "LNPAW: Received %s after attaching\n",
> -                            status_to_str (status));
> +      if (WSTOPSIG (status) != SIGSTOP)
> +        {
> +          *signalled = 1;
> +          if (debug_linux_nat)
> +            fprintf_unfiltered (gdb_stdlog,
> +                                "LNPAW: Received %s after attaching\n",
> +                                status_to_str (status));
> +        }
>      }
> +  else
> +    /* We could have been notified about LWP exit. */
> +    gdb_assert (*cloned);
>
>    return status;
>  }
>

Sorry, but this isn't correct. If you look at linux_nat_post_attach_wait's callers, you'll see that they assume the LWP is stopped but still alive. E.g., note how lin_lwp_attach_lwp uses an unconditional `WSTOPSIG (status)'. The assertion guarded us against doing such undefined things. Without the assertion, we have to handle that scenario ourselves.

With the patch, the already-gone LWP is still left in the LWP list. If you _don't_ issue an "info threads" right after "attach", then a "continue" will make linux_nat_resume try to resume this already-gone thread along with the other live LWPs. That will fail with a hard and confusing error. To see this for yourself, try "attach; c" a few times. Sometimes you won't see any problem, but other times you should see errors.
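The point about callers can be made concrete with a small standalone sketch. It is not GDB code: `enum attach_result' and `classify_attach_status' are hypothetical names, and it assumes only the standard <sys/wait.h> macros. It sorts a post-attach wait status into three cases: "stopped by our SIGSTOP", "stopped by something else" and "gone". It consults WSTOPSIG only after WIFSTOPPED is known to hold. The status == 0 from the report lands in the "gone" case, because it decodes as a normal exit with exit code 0, and WSTOPSIG has no meaning for such a status.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical classification of the waitpid status seen for a freshly
   attached LWP.  Illustrative only; not part of GDB.  */
enum attach_result
{
  ATTACH_STOPPED_SIGSTOP, /* Expected: the LWP stopped with our SIGSTOP.  */
  ATTACH_STOPPED_OTHER,   /* Stopped, but some other signal came first.  */
  ATTACH_LWP_GONE         /* Exited or was killed before we saw a stop.  */
};

enum attach_result
classify_attach_status (int status)
{
  /* WSTOPSIG is only meaningful when WIFSTOPPED is true, so check that
     first instead of applying WSTOPSIG unconditionally.  */
  if (WIFSTOPPED (status))
    return (WSTOPSIG (status) == SIGSTOP
            ? ATTACH_STOPPED_SIGSTOP : ATTACH_STOPPED_OTHER);

  /* Without WCONTINUED, the only remaining cases are a normal exit
     (status == 0 is exit code 0) or death by a signal.  */
  assert (WIFEXITED (status) || WIFSIGNALED (status));
  return ATTACH_LWP_GONE;
}
```

A caller would then branch on ATTACH_LWP_GONE, for example by dropping the LWP, instead of assuming the LWP is stopped.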
Even if we assumed you'd not get that error on resume, there is a second problem. If the LWP happens to exit with any other exit code (or is killed by a signal), linux_nat_post_attach_wait's callers would store that `status' as a pending status (the lwp->status = status assignments). This means that "attach; c" would make linux_nat_wait_1 notice the pending LWP exit event, because pending events are handled before going into waitpid to fetch more events. That event would be confused with a whole-process exit and reported to infrun.c as TARGET_WAITKIND_EXITED. That is, a thread exit would make the core of GDB think the whole process exited.

I think what we should do is just get rid of the new LWP that we found exiting right after attaching. Pretend we never saw it, as if we had attached to the process just after this LWP exited.

I believe we can also see the main LWP exit right after attach (in linux_nat_attach). I'm not sure what the best thing to do UI-wise is in that case. If we want to store the event as pending to report later, we'll have to use the lwp->waitstatus field, not lwp->status, because lwp->status == 0 is ambiguous with "no stored pending event" (see status_callback). The simple alternative is, again, to pretend that the process had exited before we managed to attach to it, get rid of it, and error out as we would if the process hadn't existed at all when we tried to attach.

-- 
Pedro Alves
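The suggested handling, forgetting an LWP that is seen exiting right after attach, can be sketched as a small standalone C fragment. This is not GDB's implementation. `struct lwp_sketch', `forget_lwp', `has_pending_status' and `handle_post_attach' are hypothetical names, and the LWP list is a bare singly linked list. The sketch assumes, as with lwp->status, that a zero status means "no pending event". It shows why an exit status must not be stored in that field: the exiting LWP is unlinked instead, so neither a later resume nor the pending-event check ever sees it.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/wait.h>

/* Hypothetical, simplified stand-in for per-LWP bookkeeping.  */
struct lwp_sketch
{
  int pid;
  int status;               /* Raw waitpid status; 0 means "nothing pending".  */
  struct lwp_sketch *next;
};

/* A zero status reads as "no pending event", so an exit with code 0
   stored here would be invisible.  */
bool
has_pending_status (const struct lwp_sketch *lp)
{
  return lp->status != 0;
}

/* Unlink and return the LWP with the given PID, or NULL if absent.  */
struct lwp_sketch *
forget_lwp (struct lwp_sketch **list, int pid)
{
  for (struct lwp_sketch **pp = list; *pp != NULL; pp = &(*pp)->next)
    if ((*pp)->pid == pid)
      {
        struct lwp_sketch *gone = *pp;
        *pp = gone->next;
        gone->next = NULL;
        return gone;
      }
  return NULL;
}

/* Handle the status from waiting on a newly attached LWP.  Returns true
   if the LWP is still alive and listed.  */
bool
handle_post_attach (struct lwp_sketch **list, struct lwp_sketch *lp,
                    int status)
{
  assert (lp != NULL);
  if (!WIFSTOPPED (status))
    {
      /* The LWP exited or was killed right after we attached.  Do not
         record `status' as pending.  Pretend we never saw the LWP.  */
      forget_lwp (list, lp->pid);
      return false;
    }
  /* Only a stop by something other than our own SIGSTOP is kept to be
     reported later.  */
  if (WSTOPSIG (status) != SIGSTOP)
    lp->status = status;
  return true;
}
```

If the exit did need to be reported later, it would have to go in a separate, tagged field (the role suggested here for lwp->waitstatus), because a raw status of 0 cannot be told apart from "nothing pending".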