From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 15022 invoked by alias); 26 Jun 2008 15:01:13 -0000 Received: (qmail 14965 invoked by uid 22791); 26 Jun 2008 15:01:06 -0000 X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4) by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 26 Jun 2008 15:00:49 +0000 Received: (qmail 23424 invoked from network); 26 Jun 2008 15:00:46 -0000 Received: from unknown (HELO wind.local) (vladimir@127.0.0.2) by mail.codesourcery.com with ESMTPA; 26 Jun 2008 15:00:46 -0000 From: Vladimir Prus To: Daniel Jacobowitz Subject: Re: [RFA] Add comments to linux-nat.c Date: Thu, 26 Jun 2008 15:56:00 -0000 User-Agent: KMail/1.9.9 Cc: gdb-patches@sources.redhat.com References: <200806141829.05646.vladimir@codesourcery.com> <20080626135149.GF22726@caradoc.them.org> In-Reply-To: <20080626135149.GF22726@caradoc.them.org> MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_Z86YIa6AGeS/5yg" Message-Id: <200806261900.41555.vladimir@codesourcery.com> Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2008-06/txt/msg00479.txt.bz2 --Boundary-00=_Z86YIa6AGeS/5yg Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Content-length: 1119 On Thursday 26 June 2008 17:51:49 Daniel Jacobowitz wrote: > On Sat, Jun 14, 2008 at 06:29:05PM +0400, Vladimir Prus wrote: > > > > With the introduction of async mode, linux-nat.c became even more complex that it > > was, and it became apparent that some high-level comments are needed. So, I've grabbed > > Pedro and Dan on IRC and extracted the knowledge from their heads into a text file. > > Here's the result. OK? > > When you get a chance, could you repost this with Eli's comments > addressed? Here's a few more but the content looks fine. > > > +/* This comments documents high-level logic of this file. > > This comment documents the, or These comments document the. > > > +blocked, the signal becomes pending and sigsuspend, presumably, immediately > > +noticed it and returns. > > s/presumably, //. You're correct. Also noticed -> notices. > > > +The main design point is that every time GDB is outside linux-nat.c, we have a > > +SIGCLD handler installed that is called when something happens to the target > > SIGCHLD Here's a version with comments from Eli and you addressed. OK? - Volodya --Boundary-00=_Z86YIa6AGeS/5yg Content-Type: text/x-diff; charset="iso-8859-1"; name="linux-nat.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="linux-nat.diff" Content-length: 8568 commit 5c19806acfcff19bf6bd18a7c3bb707475574b4a Author: Vladimir Prus Date: Sat Jun 14 18:21:36 2008 +0400 Add comments to linux-nat.c diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c index a914c61..fad17d1 100644 --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -50,29 +50,100 @@ #include "event-loop.h" #include "event-top.h" -/* Note on this file's use of signals: - - We stop threads by sending a SIGSTOP. The use of SIGSTOP instead - of another signal is not entirely significant; we just need for a - signal to be delivered, so that we can intercept it. SIGSTOP's - advantage is that it can not be blocked. A disadvantage is that it - is not a real-time signal, so it can only be queued once; we do not - keep track of other sources of SIGSTOP. - - Two other signals that can't be blocked are SIGCONT and SIGKILL. - But we can't use them, because they have special behavior when the - signal is generated - not when it is delivered. SIGCONT resumes - the entire thread group and SIGKILL kills the entire thread group. - - A delivered SIGSTOP would stop the entire thread group, not just the - thread we tkill'd. But we never let the SIGSTOP deliver; we always - intercept and cancel it (by PTRACE_CONT without passing SIGSTOP). - - We could use a real-time signal instead. This would solve those - problems; we could use PTRACE_GETSIGINFO to locate the specific - stop signals sent by GDB. But we would still have to have some - support for SIGSTOP, since PTRACE_ATTACH generates it, and there - are races with trying to find a signal that is not blocked. */ +/* This comment documents high-level logic of this file. + +Waiting for events in sync mode +=============================== + +When waiting for an event in a specific thread, we just use waitpid, passing +the specific pid, and not passing WNOHANG. + +When waiting for an event in all threads, waitpid is not quite good. Prior to +version 2.4, Linux can either wait for event in main thread, or in secondary +threads. (2.4 has the __WALL flag). So, if we use blocking waitpid, we might +miss an event. The solution is to use non-blocking waitpid, together with +sigsuspend. First, we use non-blocking waitpid to get an event in the main +process, if any. Second, we use non-blocking waitpid with the __WCLONED +flag to check for events in cloned processes. If nothing is found, we use +sigsuspend to wait for SIGCHLD. When SIGCHLD arrives, it means something +happened to a child process -- and SIGCHLD will be delivered both for events +in main debugged process and in cloned processes. As soon as we know there's +an event, we get back to calling nonblocking waitpid with and without __WCLONED. + +Note that SIGCHLD should be blocked between waitpid and sigsuspend calls, +so that we don't miss a signal. If SIGCHLD arrives in between, when it's +blocked, the signal becomes pending and sigsuspend immediately +notices it and returns. + +Waiting for events in async mode +================================ + +In async mode, GDB should always be ready to handle both user input and target +events, so neither blocking waitpid nor sigsuspend are viable +options. Instead, we should notify the GDB main event loop whenever there's +unprocessed event from the target. The only way to notify this event loop is +to make it wait on input from a pipe, and write something to the pipe whenever +there's event. Obviously, if we fail to notify the event loop if there's +target event, it's bad. If we notify the event loop when there's no event +from target, linux-nat.c will detect that there's no event, actually, and +report event of type TARGET_WAITKIND_IGNORE, but it will waste time and +better avoided. + +The main design point is that every time GDB is outside linux-nat.c, we have a +SIGHCLD handler installed that is called when something happens to the target +and notifies the GDB event loop. Also, the event is extracted from the target +using waitpid and stored for future use. Whenever GDB core decides to handle +the event, and calls into linux-nat.c, we disable SIGCHLD and process things +as in sync mode, except that before waitpid call we check if there are any +previously read events. + +It could happen that during event processing, we'll try to get more events +than there are events in the local queue, which will result to waitpid call. +Those waitpid calls, while blocking, are guarantied to always have +something for waitpid to return. E.g., stopping a thread with SIGSTOP, and +waiting for the lwp to stop. + +The event loop is notified about new events using a pipe. SIGCHLD handler does +waitpid and writes the results in to a pipe. GDB event loop has the other end +of the pipe among the sources. When event loop starts to process the event +and calls a function in linux-nat.c, all events from the pipe are transferred +into a local queue and SIGCHLD is blocked. Further processing goes as in sync +mode. Before we return from linux_nat_wait, we transfer all unprocessed events +from local queue back to the pipe, so that when we get back to event loop, +event loop will notice there's something more to do. + +SIGCHLD is blocked when we're inside target_wait, so that should we actually +want to wait for some more events, SIGCHLD handler does not steal them from +us. Technically, it would be possible to add new events to the local queue but +it's about the same amount of work as blocking SIGCHLD. + +This moving of events from pipe into local queue and back into pipe when we +enter/leave linux-nat.c is somewhat ugly. Unfortunately, GDB event loop is +home-grown and incapable to wait on any queue. + +Use of signals +============== + +We stop threads by sending a SIGSTOP. The use of SIGSTOP instead of another +signal is not entirely significant; we just need for a signal to be delivered, +so that we can intercept it. SIGSTOP's advantage is that it can not be +blocked. A disadvantage is that it is not a real-time signal, so it can only +be queued once; we do not keep track of other sources of SIGSTOP. + +Two other signals that can't be blocked are SIGCONT and SIGKILL. But we can't +use them, because they have special behavior when the signal is generated - +not when it is delivered. SIGCONT resumes the entire thread group and SIGKILL +kills the entire thread group. + +A delivered SIGSTOP would stop the entire thread group, not just the thread we +tkill'd. But we never let the SIGSTOP be delivered; we always intercept and +cancel it (by PTRACE_CONT without passing SIGSTOP). + +We could use a real-time signal instead. This would solve those problems; we +could use PTRACE_GETSIGINFO to locate the specific stop signals sent by GDB. +But we would still have to have some support for SIGSTOP, since PTRACE_ATTACH +generates it, and there are races with trying to find a signal that is not +blocked. */ #ifndef O_LARGEFILE #define O_LARGEFILE 0 @@ -171,17 +242,6 @@ static int linux_supports_tracevforkdone_flag = -1; /* Async mode support */ -/* To listen to target events asynchronously, we install a SIGCHLD - handler whose duty is to call waitpid (-1, ..., WNOHANG) to get all - the pending events into a pipe. Whenever we're ready to handle - events asynchronously, this pipe is registered as the waitable file - handle in the event loop. When we get to entry target points - coming out of the common code (target_wait, target_resume, ...), - that are going to call waitpid, we block SIGCHLD signals, and - remove all the events placed in the pipe into a local queue. All - the subsequent calls to my_waitpid (a waitpid wrapper) check this - local queue first. */ - /* True if async mode is currently on. */ static int linux_nat_async_enabled; @@ -799,17 +859,6 @@ struct lwp_info *lwp_list; static int num_lwps; -/* Since we cannot wait (in linux_nat_wait) for the initial process and - any cloned processes with a single call to waitpid, we have to use - the WNOHANG flag and call waitpid in a loop. To optimize - things a bit we use `sigsuspend' to wake us up when a process has - something to report (it will send us a SIGCHLD if it has). To make - this work we have to juggle with the signal mask. We save the - original signal mask such that we can restore it before creating a - new process in order to avoid blocking certain signals in the - inferior. We then block SIGCHLD during the waitpid/sigsuspend - loop. */ - /* Original signal mask. */ static sigset_t normal_mask; --Boundary-00=_Z86YIa6AGeS/5yg--