A Friday 14 March 2008 21:16:46, Daniel Jacobowitz wrote:
> On Fri, Mar 14, 2008 at 08:10:22AM +0000, Pedro Alves wrote:
> > To enable/disable async mode, there's a new set/show command
> > "set linux-async 1" enables it.  I left it on in this patch, but
> > if/when this goes in, it should be turned off by default until
> > we fix at least the defines.exp problems.
>
> Could you move this to "maint set"?  Users shouldn't go twiddling it.
> When everything is together, I hope we can remove it again.
>

Done.

> Until then, it has to go in the manual; Eli went to a lot of trouble
> to fix all the undocumented commands :-)  The debug command needs
> to also, unless you want to lump it in with debug_linux_nat.
>


The debug output is too noisy to be merged yet.  But is is invaluable to
catch problems.  Added docs to the patch.

> > @@ -516,6 +707,12 @@ linux_child_follow_fork (struct target_o
> >  	 target_detach (which does other necessary cleanup).  */
> >
> >        push_target (ops);
> > +
> > +      if (async_was_enabled && !linux_nat_async_enabled)
> > +	/* target_detach may disable async depending on multi-threaded
> > +	   enabled or not.  Reenable it if needed.  */
> > +	linux_nat_enable_async ();
> > +
> >        linux_nat_switch_fork (inferior_ptid);
> >        check_for_thread_db ();
>
> This will drain the queue, losing any pending events.  In non-stop
> mode I can imagine that causing trouble.  Maybe we need to avoid
> target_detach now.
>

Yeah.  Here's how I fixed it:  target_detach, when the target is
multi-threaded, would call thread_db_detach, which calls
target_mourn_inferior.  The mourn inferior call (although most targets
don't mourn on detach), is there AFAICT mostly to reset
using_thread_db.  There's nothing in linux_nat_mourn_inferior that
we'd miss.  Notice that linux_nat_detach doesn't mourn, so a
single-threaded inferior doesn't get mourned.

So, I split linux_nat_detach in two (linux_nat_detach_inferior, and 
linux_nat_detach proper which calls linux_nat_detach_inferior), and
added a re_check_for_thread_db, to be used after following the child,
so thread_db knows about the new current inferior.

> > @@ -966,6 +1173,17 @@ linux_nat_attach (char *args, int from_t
> >    pid_t pid;
> >    int status;
> >    int cloned = 0;
> > +  sigset_t mask, prev_mask;
> > +
> > +  /* Make sure SIGCHLD is blocked.  The sync mode SIGCHLD handler
> > +     accesses the current_target, which is pushed by
> > +     linux_ops->to_attach below.  This stops the race.  It also
> > +     prevents the async mode to lose any SIGCHLD events, as the async
> > +     handler is only registered at function exit.  */
> > +  sigemptyset (&mask);
> > +  sigaddset (&mask, SIGCHLD);
> > +  sigprocmask (SIG_SETMASK, NULL, &prev_mask);
> > +  sigprocmask (SIG_BLOCK, &mask, NULL);
>
> That's the same as sigprocmask (SIG_BLOCK, &mask, &prev_mask).
>

Indeeed.  I've reworked and cleaned the patch, so these
sigprocmask handlings are mostly unneeded.  I think this version
looks much better.

> We can get a SIGCHLD pretty much any time, even when we think
> the target is stopped (e.g. if we detached from something
> that ran on the same terminal, and it finally exited).  Do we
> reach common code that might change the target stack while
> SIGCHLD is unblocked?  If it's just for the new assert, adding
> the assert may not be worth the trouble.
>

Well, the assert helped me a lot of times, I'd like to
leave it in.  I've just removed it's dependency on the
target_can_async_p call, so it's always safe now.

> >    /* FIXME: We should probably accept a list of process id's, and
> >       attach all of them.  */
> > @@ -994,9 +1212,36 @@ linux_nat_attach (char *args, int from_t
> >
> >    lp->stopped = 1;
> >
> > -  /* Fake the SIGSTOP that core GDB expects.  */
> > -  lp->status = W_STOPCODE (SIGSTOP);
> > -  lp->resumed = 1;
> > +  if (target_can_async_p ())
> > +    /* This needs to before the above, so the current target is
> > +       already set when the user didn't specify an exec file, and the
> > +       async SIGCHLD handler doesn't mess with the waitpid calls
> > +       above.  */
> > +    linux_nat_enable_async ();
>
> "This needs to before the above" has wrong grammar, and I don't know
> what it means; do you mean after?
>

Ah, stalled comments.  Would be "to go after the above".  The current
patch has this much better abstracted, so the comment is gone.

> > @@ -1199,6 +1456,13 @@ linux_nat_resume (ptid_t ptid, int step,
> >  			    "LLR: Short circuiting for status 0x%x\n",
> >  			    lp->status);
> >
> > +      if (target_can_async_p () && linux_nat_async_enabled)
> > +	{
> > +	  target_async (inferior_event_handler, 0);
> > +
> > +	  /* Wake event loop with special token, to get to WFI.  */
> > +	  linux_nat_event_pipe_push (-1, -1, -1);
> > +	}
> >        return;
> >      }
>
> If target_can_async_p, doesn't that imply linux_nat_async_enabled?

In that version it didn't.  target_can_async_p needed to return true,
even before async mode was really kicked in.  That's because the
fork_child path is still synchronous.  We need to make that async,
probably by inventing a new stop_reason (similarly to how we do with
attaching).  I was only enabling async in post_create_inferior,
so I needed those checks.  This version of the patch does it the other
way around.  It enables async in linux_nat_create_inferior,
but immediatelly masks it.  This way, I don't need those
linux_nat_async_enabled spread in several places.

Another change, is that I've found that an exception would unmask
the SIGCHLD blocking if an error was thrown while we start the
inferior.  I've fixed this by blocking SIGCHLD very early, in
_initialize_linux_nat.

I've also added comments to those state variables, and a few other
places.

>
> >  	  if (options & __WCLONE)
> > -	    sigsuspend (&suspend_mask);
> > +	    {
> > +	      if (target_can_async_p () && linux_nat_async_enabled)
> > +		{
> > +		  /* No interesting event.  */
> > +		  ourstatus->kind = TARGET_WAITKIND_IGNORE;
> > +
> > +		  /* Get ready for the next event.  */
> > +		  target_async (inferior_event_handler, 0);
> > +
> > +		  if (debug_linux_nat_async)
> > +		    fprintf_unfiltered (gdb_stdlog, "LLW: exit (ignore)\n");
> > +
> > +		  return minus_one_ptid;
>
> If we just return from here, we miss clear_sigio_trap and
> clear_sigint_trap.  But we keep calling set_sigint_trap on each
> entrance.  I think we'll eventually end up with them saving and
> restoring the wrong signal handlers.
>

Fixed.

> > +/* target_can_async_p implementation.  */
> > +
> > +static int
> > +linux_nat_can_async_p (void)
> > +{
> > +  /* NOTE: palves 2008-03-11: We're only async when the user requests
> > +     it explicitly with the "set linux-async" command.  Someday, linux
> > +     will always be async.  */
> > +  if (!linux_nat_async_permitted)
> > +    return 0;
> > +
> > +  /* See target.h/target_async_mask.  */
> > +  return linux_nat_async_mask_value;
> > +}
>
> I hope we can get rid of target_async_mask someday.  Would just a
> wait call suffice?  if (target_executing) reenter the event
> loop or something like that?
>

I had talked about it with Vlad a bit, but I'm afraid someone will
have to think about it some more.  The infcall.c code that needs
asynching is marked with a funny "SNIP - SNIP - SNIP - " for
everyone to see. :-)

> > +/* Disable async mode.  */
> > +
> > +static void
> > +linux_nat_disable_async (void)
> > +{
> > +  /* Unregister from event loop.  Don't go through the target stack
> > +     (target_async), as linux-nat may have been poped off already.  */
>
> popped
>

I've removed this call from linux_nat_{enabled|disable} async, so the
comment is gone.  But thanks, I was making that error a lot.  :-)

> > +  linux_nat_async (NULL, 0);
> > +
> > +  linux_nat_async_enabled = 0;
> > +
> > +  sigaction (SIGCHLD, &async_old_action, NULL);
> > +
> > +  /* Restore the original signal mask.  */
> > +  sigprocmask (SIG_SETMASK, &normal_mask, NULL);
> > +  sigemptyset (&blocked_mask);
> > +
> > +  clear_cached_waitpid_queue ();
> > +  linux_nat_async_enabled = 0;
> > +  linux_nat_num_queued_events = 0;
>
> Sets linux_nat_async_enabled twice.
>

Fixed.

Nick, I've also removed that extra unneeded line you pointed out.

New patch attached.  No regressions in async mode other than
the same defines.exp regressions, and no regressions when async
mode is disabled.  --- tested on x86-64-unknown-linux-gnu.

-- 
Pedro Alves