From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 29804 invoked by alias); 14 Oct 2009 19:08:04 -0000 Received: (qmail 29791 invoked by uid 22791); 14 Oct 2009 19:08:02 -0000 X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00,SPF_PASS X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 14 Oct 2009 19:07:59 +0000 Received: (qmail 20632 invoked from network); 14 Oct 2009 19:07:57 -0000 Received: from unknown (HELO orlando) (pedro@127.0.0.2) by mail.codesourcery.com with ESMTPA; 14 Oct 2009 19:07:57 -0000 From: Pedro Alves To: gdb-patches@sourceware.org Subject: Re: [patch] Fix for internal-error: linux_nat_post_attach_wait: Assertion `pid == new_pid && WIFSTOPPED (status)' failed. Date: Wed, 14 Oct 2009 19:08:00 -0000 User-Agent: KMail/1.9.10 Cc: Paul Pluzhnikov References: <20091013184120.30A5776761@ppluzhnikov.mtv.corp.google.com> <200910132153.51171.pedro@codesourcery.com> <8ac60eac0910141121r59573032na0ccee77f049366f@mail.gmail.com> In-Reply-To: <8ac60eac0910141121r59573032na0ccee77f049366f@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200910142007.59636.pedro@codesourcery.com> X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2009-10/txt/msg00312.txt.bz2 On Wednesday 14 October 2009 19:21:03, Paul Pluzhnikov wrote: > It's very hard to test, since it only happens once in about 50 > attaches, but I did catch this several times, and GDB appears to > have done the right thing. Ugh. Happened much more often for me, like 1 in 5... On Wednesday 14 October 2009 19:21:03, Paul Pluzhnikov wrote: > /* Wait for the LWP specified by LP, which we have just attached to. > - Returns a wait status for that LWP, to cache. */ > + Returns 1 if LWP has been stopped, else 0. */ > > static int > -linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned, > - int *signalled) > +linux_nat_post_attach_wait (ptid_t ptid, int first, int *status, > + int *cloned, int *signalled) > { I don't think you needed the interface change. The new interface has two ways of saying the LWP exited, the 0 return, and the !WIFSTOPPED(status), which looks strange. With such an interface, I would expect the return code to indicate a real failure, but an exit indication isn't really a failure. Any reason you'd not just do ... > - status = linux_nat_post_attach_wait (ptid, 0, &cloned, &signalled); > + if (!linux_nat_post_attach_wait (ptid, 0, &status, &cloned, &signalled)) > + return -1; > + status = linux_nat_post_attach_wait (ptid, 0, &cloned, &signalled); if (!WIFSTOPPED (status)) return -1; ? Anyway, I was just surprised. Either way is fine really. > > I would believe that we can also see the main LWP exit right > > after attach (in linux_nat_attach). I'm not sure what exactly > > is the best to do UI wise in that case. If we want to store > > the event pending to report later, we'll have to use > > the lwp->waitstatus field, not lwp->status, due to the > > fact that lwp->status == 0 is ambiguous with > > "no-stored-pending-event" (see status_callback). > > I believe I did this now. Not sure whether lp->resumed should also > be set here. Yes, it should. Otherwise, the pending event is ignored (see status_callback), and worse, in this case, you'll hit this assertion: linux_nat_wait_1: /* Make sure there is at least one LWP that has been resumed. */ gdb_assert (iterate_over_lwps (ptid, resumed_callback, NULL)); Oh, my bad. I now took a look at what infcmd.c:attach_command|attach_command_post_wait would do in the case of the target_wait right after attach returning a process exit, and it seems it would misbehave, and try to poke at the inferior's memory... Best not to go there for such a rare use case. So, leaving the exit event pending ends up fixing nothing afterall... > > > The simple > > alternative is to again just pretend that the process had > > exited before we managed to attach to it, get rid of it, > > and error out like we would if the process didn't exist > > at all when we tried to attach. > > This appears harder to do, since to_attach doesn't return anything. Back to the "simple" variant: The inferior just added is always inferior_ptid/current_inferior(). If you look at fork-child.c:startup_inferior, you'll see bits of code doing exactly what you'd need to do. E.g.: target_terminal_ours (); target_mourn_inferior (); if (WIFEXITED (status)) error (_("During startup program exited with code %d."), WIFEXITCODE (status)); else if (WIFSIGNALED (status)) error (_("During startup program exited with signal ..."), ...); But if this looks too much trouble for what it's worth, that's probably right :-). I'd be quite fine to ignore that use case for now. -- Pedro Alves