From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-67689-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 29804 invoked by alias); 14 Oct 2009 19:08:04 -0000
Received: (qmail 29791 invoked by uid 22791); 14 Oct 2009 19:08:02 -0000
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 	tests=AWL,BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 14 Oct 2009 19:07:59 +0000
Received: (qmail 20632 invoked from network); 14 Oct 2009 19:07:57 -0000
Received: from unknown (HELO orlando) (pedro@127.0.0.2)   by mail.codesourcery.com with ESMTPA; 14 Oct 2009 19:07:57 -0000
From: Pedro Alves <pedro@codesourcery.com>
To: gdb-patches@sourceware.org
Subject: Re: [patch] Fix for internal-error: linux_nat_post_attach_wait:  	Assertion `pid == new_pid && WIFSTOPPED (status)' failed.
Date: Wed, 14 Oct 2009 19:08:00 -0000
User-Agent: KMail/1.9.10
Cc: Paul Pluzhnikov <ppluzhnikov@google.com>
References: <20091013184120.30A5776761@ppluzhnikov.mtv.corp.google.com> <200910132153.51171.pedro@codesourcery.com> <8ac60eac0910141121r59573032na0ccee77f049366f@mail.gmail.com>
In-Reply-To: <8ac60eac0910141121r59573032na0ccee77f049366f@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain;   charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200910142007.59636.pedro@codesourcery.com>
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-10/txt/msg00312.txt.bz2

On Wednesday 14 October 2009 19:21:03, Paul Pluzhnikov wrote:

> It's very hard to test, since it only happens once in about 50
> attaches, but I did catch this several times, and GDB appears to
> have done the right thing.

Ugh.  Happened much more often for me, like 1 in 5...

On Wednesday 14 October 2009 19:21:03, Paul Pluzhnikov wrote:
>  /* Wait for the LWP specified by LP, which we have just attached to.
> -   Returns a wait status for that LWP, to cache.  */
> +   Returns 1 if LWP has been stopped, else 0.  */
>  
>  static int
> -linux_nat_post_attach_wait (ptid_t ptid, int first, int *cloned,
> -                           int *signalled)
> +linux_nat_post_attach_wait (ptid_t ptid, int first, int *status,
> +                           int *cloned, int *signalled)
>  {

I don't think you needed the interface change.  The new interface
has two ways of saying the LWP exited, the 0 return, and
the !WIFSTOPPED(status), which looks strange.  With such an
interface, I would expect the return code to indicate a real failure,
but an exit indication isn't really a failure.  Any reason you'd not
just do ...

> -      status = linux_nat_post_attach_wait (ptid, 0, &cloned, &signalled);
> +      if (!linux_nat_post_attach_wait (ptid, 0, &status, &cloned, &signalled))
> +       return -1;
> +

      status = linux_nat_post_attach_wait (ptid, 0, &cloned, &signalled);
      if (!WIFSTOPPED (status))
        return -1;

?  Anyway, I was just surprised.  Either way is fine really.

> > I would believe that we can also see the main LWP exit right
> > after attach (in linux_nat_attach).  I'm not sure what exactly
> > is the best to do UI wise in that case.  If we want to store
> > the event pending to report later, we'll have to use
> > the lwp->waitstatus field, not lwp->status, due to the
> > fact that lwp->status == 0 is ambiguous with
> > "no-stored-pending-event" (see status_callback).
> 
> I believe I did this now. Not sure whether lp->resumed should also
> be set here.

Yes, it should.  Otherwise, the pending event is ignored (see
status_callback), and worse, in this case, you'll hit this
assertion:

linux_nat_wait_1:

  /* Make sure there is at least one LWP that has been resumed.  */
  gdb_assert (iterate_over_lwps (ptid, resumed_callback, NULL));


Oh, my bad.  I now took a look at what
infcmd.c:attach_command|attach_command_post_wait would do
in the case of the target_wait right after attach returning a
process exit, and it seems it would misbehave, and try to poke
at the inferior's memory...  Best not to go there for such a rare 
use case.  So, leaving the exit event pending ends up
fixing nothing afterall...

> 
> > The simple
> > alternative is to again just pretend that the process had
> > exited before we managed to attach to it, get rid of it,
> > and error out like we would if the process didn't exist
> > at all when we tried to attach.
> 
> This appears harder to do, since to_attach doesn't return anything.

Back to the "simple" variant: The inferior just added is
always inferior_ptid/current_inferior().  If you look
at fork-child.c:startup_inferior, you'll see bits of
code doing exactly what you'd need to do.  E.g.:

	    target_terminal_ours ();
	    target_mourn_inferior ();
            if (WIFEXITED (status))
              error (_("During startup program exited with code %d."),
   	             WIFEXITCODE (status));
            else if (WIFSIGNALED (status))
              error (_("During startup program exited with signal ..."),
                     ...);

But if this looks too much trouble for what it's worth, that's
probably right :-).  I'd be quite fine to ignore that use case
for now.

-- 
Pedro Alves