From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-56865-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 15022 invoked by alias); 26 Jun 2008 15:01:13 -0000
Received: (qmail 14965 invoked by uid 22791); 26 Jun 2008 15:01:06 -0000
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 26 Jun 2008 15:00:49 +0000
Received: (qmail 23424 invoked from network); 26 Jun 2008 15:00:46 -0000
Received: from unknown (HELO wind.local) (vladimir@127.0.0.2)   by mail.codesourcery.com with ESMTPA; 26 Jun 2008 15:00:46 -0000
From: Vladimir Prus <vladimir@codesourcery.com>
To: Daniel Jacobowitz <drow@false.org>
Subject: Re: [RFA] Add comments to linux-nat.c
Date: Thu, 26 Jun 2008 15:56:00 -0000
User-Agent: KMail/1.9.9
Cc: gdb-patches@sources.redhat.com
References: <200806141829.05646.vladimir@codesourcery.com> <20080626135149.GF22726@caradoc.them.org>
In-Reply-To: <20080626135149.GF22726@caradoc.them.org>
MIME-Version: 1.0
Content-Type: Multipart/Mixed;   boundary="Boundary-00=_Z86YIa6AGeS/5yg"
Message-Id: <200806261900.41555.vladimir@codesourcery.com>
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2008-06/txt/msg00479.txt.bz2


--Boundary-00=_Z86YIa6AGeS/5yg
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Content-length: 1119

On Thursday 26 June 2008 17:51:49 Daniel Jacobowitz wrote:
> On Sat, Jun 14, 2008 at 06:29:05PM +0400, Vladimir Prus wrote:
> > 
> > With the introduction of async mode, linux-nat.c became even more complex that it
> > was, and it became apparent that some high-level comments are needed. So, I've grabbed
> > Pedro and Dan on IRC and extracted the knowledge from their heads into a text file.
> > Here's the result. OK?
> 
> When you get a chance, could you repost this with Eli's comments
> addressed?  Here's a few more but the content looks fine.
> 
> > +/* This comments documents high-level logic of this file. 
> 
> This comment documents the, or These comments document the.
> 
> > +blocked, the signal becomes pending and sigsuspend, presumably, immediately
> > +noticed it and returns.
> 
> s/presumably, //.  You're correct.  Also noticed -> notices.
> 
> > +The main design point is that every time GDB is outside linux-nat.c, we have a
> > +SIGCLD handler installed that is called when something happens to the target
> 
> SIGCHLD

Here's a version with comments from Eli and you addressed. OK?

- Volodya

--Boundary-00=_Z86YIa6AGeS/5yg
Content-Type: text/x-diff;
  charset="iso-8859-1";
  name="linux-nat.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="linux-nat.diff"
Content-length: 8568

commit 5c19806acfcff19bf6bd18a7c3bb707475574b4a
Author: Vladimir Prus <vladimir@codesourcery.com>
Date:   Sat Jun 14 18:21:36 2008 +0400

    Add comments to linux-nat.c

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index a914c61..fad17d1 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -50,29 +50,100 @@
 #include "event-loop.h"
 #include "event-top.h"
 
-/* Note on this file's use of signals:
-
-   We stop threads by sending a SIGSTOP.  The use of SIGSTOP instead
-   of another signal is not entirely significant; we just need for a
-   signal to be delivered, so that we can intercept it.  SIGSTOP's
-   advantage is that it can not be blocked.  A disadvantage is that it
-   is not a real-time signal, so it can only be queued once; we do not
-   keep track of other sources of SIGSTOP.
-
-   Two other signals that can't be blocked are SIGCONT and SIGKILL.
-   But we can't use them, because they have special behavior when the
-   signal is generated - not when it is delivered.  SIGCONT resumes
-   the entire thread group and SIGKILL kills the entire thread group.
-
-   A delivered SIGSTOP would stop the entire thread group, not just the
-   thread we tkill'd.  But we never let the SIGSTOP deliver; we always
-   intercept and cancel it (by PTRACE_CONT without passing SIGSTOP).
-
-   We could use a real-time signal instead.  This would solve those
-   problems; we could use PTRACE_GETSIGINFO to locate the specific
-   stop signals sent by GDB.  But we would still have to have some
-   support for SIGSTOP, since PTRACE_ATTACH generates it, and there
-   are races with trying to find a signal that is not blocked.  */
+/* This comment documents high-level logic of this file. 
+
+Waiting for events in sync mode
+===============================
+
+When waiting for an event in a specific thread, we just use waitpid, passing
+the specific pid, and not passing WNOHANG.
+
+When waiting for an event in all threads, waitpid is not quite good. Prior to
+version 2.4, Linux can either wait for event in main thread, or in secondary
+threads. (2.4 has the __WALL flag).  So, if we use blocking waitpid, we might
+miss an event.  The solution is to use non-blocking waitpid, together with
+sigsuspend.  First, we use non-blocking waitpid to get an event in the main 
+process, if any. Second, we use non-blocking waitpid with the __WCLONED
+flag to check for events in cloned processes.  If nothing is found, we use
+sigsuspend to wait for SIGCHLD.  When SIGCHLD arrives, it means something
+happened to a child process -- and SIGCHLD will be delivered both for events
+in main debugged process and in cloned processes.  As soon as we know there's
+an event, we get back to calling nonblocking waitpid with and without __WCLONED.
+
+Note that SIGCHLD should be blocked between waitpid and sigsuspend calls,
+so that we don't miss a signal. If SIGCHLD arrives in between, when it's
+blocked, the signal becomes pending and sigsuspend immediately
+notices it and returns.
+
+Waiting for events in async mode
+================================
+
+In async mode, GDB should always be ready to handle both user input and target
+events, so neither blocking waitpid nor sigsuspend are viable
+options. Instead, we should notify the GDB main event loop whenever there's
+unprocessed event from the target.  The only way to notify this event loop is
+to make it wait on input from a pipe, and write something to the pipe whenever
+there's event. Obviously, if we fail to notify the event loop if there's
+target event, it's bad.  If we notify the event loop when there's no event
+from target, linux-nat.c will detect that there's no event, actually, and
+report event of type TARGET_WAITKIND_IGNORE, but it will waste time and
+better avoided.
+
+The main design point is that every time GDB is outside linux-nat.c, we have a
+SIGHCLD handler installed that is called when something happens to the target
+and notifies the GDB event loop. Also, the event is extracted from the target
+using waitpid and stored for future use.  Whenever GDB core decides to handle
+the event, and calls into linux-nat.c, we disable SIGCHLD and process things
+as in sync mode, except that before waitpid call we check if there are any
+previously read events.
+
+It could happen that during event processing, we'll try to get more events
+than there are events in the local queue, which will result to waitpid call.
+Those waitpid calls, while blocking, are guarantied to always have
+something for waitpid to return.  E.g., stopping a thread with SIGSTOP, and
+waiting for the lwp to stop.
+
+The event loop is notified about new events using a pipe. SIGCHLD handler does
+waitpid and writes the results in to a pipe. GDB event loop has the other end
+of the pipe among the sources. When event loop starts to process the event
+and calls a function in linux-nat.c, all events from the pipe are transferred
+into a local queue and SIGCHLD is blocked. Further processing goes as in sync
+mode. Before we return from linux_nat_wait, we transfer all unprocessed events
+from local queue back to the pipe, so that when we get back to event loop,
+event loop will notice there's something more to do.
+
+SIGCHLD is blocked when we're inside target_wait, so that should we actually
+want to wait for some more events, SIGCHLD handler does not steal them from
+us. Technically, it would be possible to add new events to the local queue but
+it's about the same amount of work as blocking SIGCHLD.
+
+This moving of events from pipe into local queue and back into pipe when we
+enter/leave linux-nat.c is somewhat ugly. Unfortunately, GDB event loop is
+home-grown and incapable to wait on any queue.
+
+Use of signals
+==============
+
+We stop threads by sending a SIGSTOP.  The use of SIGSTOP instead of another
+signal is not entirely significant; we just need for a signal to be delivered,
+so that we can intercept it.  SIGSTOP's advantage is that it can not be
+blocked.  A disadvantage is that it is not a real-time signal, so it can only
+be queued once; we do not keep track of other sources of SIGSTOP.
+
+Two other signals that can't be blocked are SIGCONT and SIGKILL.  But we can't
+use them, because they have special behavior when the signal is generated -
+not when it is delivered.  SIGCONT resumes the entire thread group and SIGKILL
+kills the entire thread group.
+
+A delivered SIGSTOP would stop the entire thread group, not just the thread we
+tkill'd.  But we never let the SIGSTOP be delivered; we always intercept and 
+cancel it (by PTRACE_CONT without passing SIGSTOP).
+
+We could use a real-time signal instead.  This would solve those problems; we
+could use PTRACE_GETSIGINFO to locate the specific stop signals sent by GDB.
+But we would still have to have some support for SIGSTOP, since PTRACE_ATTACH
+generates it, and there are races with trying to find a signal that is not
+blocked.  */
 
 #ifndef O_LARGEFILE
 #define O_LARGEFILE 0
@@ -171,17 +242,6 @@ static int linux_supports_tracevforkdone_flag = -1;
 
 /* Async mode support */
 
-/* To listen to target events asynchronously, we install a SIGCHLD
-   handler whose duty is to call waitpid (-1, ..., WNOHANG) to get all
-   the pending events into a pipe.  Whenever we're ready to handle
-   events asynchronously, this pipe is registered as the waitable file
-   handle in the event loop.  When we get to entry target points
-   coming out of the common code (target_wait, target_resume, ...),
-   that are going to call waitpid, we block SIGCHLD signals, and
-   remove all the events placed in the pipe into a local queue.  All
-   the subsequent calls to my_waitpid (a waitpid wrapper) check this
-   local queue first.  */
-
 /* True if async mode is currently on.  */
 static int linux_nat_async_enabled;
 
@@ -799,17 +859,6 @@ struct lwp_info *lwp_list;
 static int num_lwps;
 
 
-/* Since we cannot wait (in linux_nat_wait) for the initial process and
-   any cloned processes with a single call to waitpid, we have to use
-   the WNOHANG flag and call waitpid in a loop.  To optimize
-   things a bit we use `sigsuspend' to wake us up when a process has
-   something to report (it will send us a SIGCHLD if it has).  To make
-   this work we have to juggle with the signal mask.  We save the
-   original signal mask such that we can restore it before creating a
-   new process in order to avoid blocking certain signals in the
-   inferior.  We then block SIGCHLD during the waitpid/sigsuspend
-   loop.  */
-
 /* Original signal mask.  */
 static sigset_t normal_mask;
 

--Boundary-00=_Z86YIa6AGeS/5yg--