From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 15908 invoked by alias); 27 Jul 2008 21:13:42 -0000 Received: (qmail 15897 invoked by uid 22791); 27 Jul 2008 21:13:40 -0000 X-Spam-Check-By: sourceware.org Received: from NaN.false.org (HELO nan.false.org) (208.75.86.248) by sourceware.org (qpsmtpd/0.31) with ESMTP; Sun, 27 Jul 2008 21:13:20 +0000 Received: from nan.false.org (localhost [127.0.0.1]) by nan.false.org (Postfix) with ESMTP id F197198215 for ; Sun, 27 Jul 2008 21:13:18 +0000 (GMT) Received: from caradoc.them.org (22.svnf5.xdsl.nauticom.net [209.195.183.55]) by nan.false.org (Postfix) with ESMTP id BA38F9813A for ; Sun, 27 Jul 2008 21:13:18 +0000 (GMT) Received: from drow by caradoc.them.org with local (Exim 4.69) (envelope-from ) id 1KNDYM-0005iH-8J for gdb-patches@sourceware.org; Sun, 27 Jul 2008 17:13:18 -0400 Date: Sun, 27 Jul 2008 21:13:00 -0000 From: Daniel Jacobowitz To: gdb-patches@sourceware.org Subject: Re: [rfa/linux] Make SIGINT handling more robust Message-ID: <20080727211318.GE19234@caradoc.them.org> Mail-Followup-To: gdb-patches@sourceware.org References: <20050404042131.GA8509@nevyn.them.org> <20050404041520.GA7722@nevyn.them.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20050404042131.GA8509@nevyn.them.org> <20050404041520.GA7722@nevyn.them.org> User-Agent: Mutt/1.5.17 (2008-05-11) X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2008-07/txt/msg00510.txt.bz2 Two more patches from April 2005. Updated, retested on mips-linux, and checked in (together, since one is a test for the other). This simplifies Control-C handling for Linux native. It should not have much effect on NPTL targets, but my home mips-linux test system still uses LinuxThreads. On Mon, Apr 04, 2005 at 12:15:20AM -0400, Daniel Jacobowitz wrote: > This patch, which I've had kicking around for a year or so, makes SIGINT > handling much more reliable. > > The problem being solved by this code (both the current version, and the > replacement below) is this: in a LinuxThreads program, every thread is a > process. When a signal is sent to the process group, every thread will > receive it. If the user hits C-c at the console, the signal will get sent > to every thread; the user will be presented with an indeterminate number of > SIGINT reports instead of one. This could apply to other signals, too, but > SIGINT was the only one handled by the old code. > > Previously, we had flush_callback. This function resumed the inferior when > it saw a pending SIGINT, in order to force the SIGINT to be delivered. This > method is unreliable; it had the potential to lose other signals, and often > generated a failed assertion for "lp->status == 0". > > This patch relies completely on recording the pending signal, and discarding > it once it is delivered. An unexpected SIGINT will cause a flag to be set > on every thread. Then, as long as /proc indicates that the signal is > pending, we will leave the flag set. If a SIGINT is received while the flag > is set, it is discarded. At various points we check to see if the signal > is still pending; if it isn't, then we assume it was delivered to some other > thread (the POSIX and NPTL semantics) and clear the flag. > > Tested on i686-pc-linux-gnu, with both LinuxThreads and NPTL. With > LinuxThreads, this removes an intermittent failure in watchthreads.exp > (improved testcase coming soon). Also tested with my previous patch on > mips-unknown-linux-gnu, where results for schedlock.exp go from abyssmal > to fairly good. On Mon, Apr 04, 2005 at 12:21:31AM -0400, Daniel Jacobowitz wrote: > The intermittent failure of this test using LinuxThreads puzzled me for a > while. It looked like this: > > (gdb) continue > Continuing. > [expect sends ^C] > (gdb) Quit > (gdb) FAIL: > > Now where'd that prompt come from? Turns out using after for delays isn't > such a good idea. We lose inferior output while sleeping. Same thing > for "exec sleep 1". But if we use gdb_expect, we can see what's going on: > a second copy of the previous SIGINT arrived after the continue command. > So GDB was already stopped when the ^C was sent, thus the Quit message. > > This patch updates the testcase so that the log shows what's going on, and > adds an explicit test that we don't get duplicate signals. With my previous > patch from today, the test passes. -- Daniel Jacobowitz CodeSourcery 2008-07-27 Daniel Jacobowitz * linux-nat.c (resume_callback): Add more debugging output. (linux_nat_has_pending_sigint): New function, based on linux_nat_has_pending. (set_ignore_sigint, maybe_clear_ignore_sigint): New functions. (stop_wait_callback): Remove flush_mask handling. Honor ignore_sigint. Call maybe_clear_ignore_sigint. Pass NULL to recursive calls. (linux_nat_has_pending, flush_callback): Remove. (linux_nat_filter_event): Check for ignore_sigint. (linux_nat_wait): Remove flush_mask support and call to flush_callback. Use set_ignore_sigint and maybe_clear_ignore_sigint. * linux-nat.h (struct lwp_info): Add ignore_sigint field. 2008-07-27 Daniel Jacobowitz * gdb.threads/manythreads.exp: Use remote_expect instead of after. Add a test for duplicated SIGINTs. Index: gdb-only/gdb/linux-nat.c =================================================================== --- gdb-only.orig/gdb/linux-nat.c 2008-07-26 22:16:18.000000000 -0400 +++ gdb-only/gdb/linux-nat.c 2008-07-26 22:38:54.000000000 -0400 @@ -1603,6 +1603,12 @@ resume_callback (struct lwp_info *lp, vo lp->step = 0; memset (&lp->siginfo, 0, sizeof (lp->siginfo)); } + else if (lp->stopped && debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, "RC: Not resuming sibling %s (has pending)\n", + target_pid_to_str (lp->ptid)); + else if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, "RC: Not resuming sibling %s (not stopped)\n", + target_pid_to_str (lp->ptid)); return 0; } @@ -2036,14 +2042,66 @@ stop_callback (struct lwp_info *lp, void return 0; } -/* Wait until LP is stopped. If DATA is non-null it is interpreted as - a pointer to a set of signals to be flushed immediately. */ +/* Return non-zero if LWP PID has a pending SIGINT. */ static int -stop_wait_callback (struct lwp_info *lp, void *data) +linux_nat_has_pending_sigint (int pid) { - sigset_t *flush_mask = data; + sigset_t pending, blocked, ignored; + int i; + linux_proc_pending_signals (pid, &pending, &blocked, &ignored); + + if (sigismember (&pending, SIGINT) + && !sigismember (&ignored, SIGINT)) + return 1; + + return 0; +} + +/* Set a flag in LP indicating that we should ignore its next SIGINT. */ + +static int +set_ignore_sigint (struct lwp_info *lp, void *data) +{ + /* If a thread has a pending SIGINT, consume it; otherwise, set a + flag to consume the next one. */ + if (lp->stopped && lp->status != 0 && WIFSTOPPED (lp->status) + && WSTOPSIG (lp->status) == SIGINT) + lp->status = 0; + else + lp->ignore_sigint = 1; + + return 0; +} + +/* If LP does not have a SIGINT pending, then clear the ignore_sigint flag. + This function is called after we know the LWP has stopped; if the LWP + stopped before the expected SIGINT was delivered, then it will never have + arrived. Also, if the signal was delivered to a shared queue and consumed + by a different thread, it will never be delivered to this LWP. */ + +static void +maybe_clear_ignore_sigint (struct lwp_info *lp) +{ + if (!lp->ignore_sigint) + return; + + if (!linux_nat_has_pending_sigint (GET_LWP (lp->ptid))) + { + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, + "MCIS: Clearing bogus flag for %s\n", + target_pid_to_str (lp->ptid)); + lp->ignore_sigint = 0; + } +} + +/* Wait until LP is stopped. */ + +static int +stop_wait_callback (struct lwp_info *lp, void *data) +{ if (!lp->stopped) { int status; @@ -2052,26 +2110,24 @@ stop_wait_callback (struct lwp_info *lp, if (status == 0) return 0; - /* Ignore any signals in FLUSH_MASK. */ - if (flush_mask && sigismember (flush_mask, WSTOPSIG (status))) + if (lp->ignore_sigint && WIFSTOPPED (status) + && WSTOPSIG (status) == SIGINT) { - if (!lp->signalled) - { - lp->stopped = 1; - return 0; - } + lp->ignore_sigint = 0; errno = 0; ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); if (debug_linux_nat) fprintf_unfiltered (gdb_stdlog, - "PTRACE_CONT %s, 0, 0 (%s)\n", + "PTRACE_CONT %s, 0, 0 (%s) (discarding SIGINT)\n", target_pid_to_str (lp->ptid), errno ? safe_strerror (errno) : "OK"); - return stop_wait_callback (lp, flush_mask); + return stop_wait_callback (lp, NULL); } + maybe_clear_ignore_sigint (lp); + if (WSTOPSIG (status) != SIGSTOP) { if (WSTOPSIG (status) == SIGTRAP) @@ -2108,7 +2164,7 @@ stop_wait_callback (struct lwp_info *lp, } /* Hold this event/waitstatus while we check to see if there are any more (we still want to get that SIGSTOP). */ - stop_wait_callback (lp, data); + stop_wait_callback (lp, NULL); if (target_can_async_p ()) { @@ -2169,7 +2225,7 @@ stop_wait_callback (struct lwp_info *lp, /* Hold this event/waitstatus while we check to see if there are any more (we still want to get that SIGSTOP). */ - stop_wait_callback (lp, data); + stop_wait_callback (lp, NULL); /* If the lp->status field is still empty, use it to hold this event. If not, then this event must be @@ -2202,96 +2258,6 @@ stop_wait_callback (struct lwp_info *lp, return 0; } -/* Check whether PID has any pending signals in FLUSH_MASK. If so set - the appropriate bits in PENDING, and return 1 - otherwise return 0. */ - -static int -linux_nat_has_pending (int pid, sigset_t *pending, sigset_t *flush_mask) -{ - sigset_t blocked, ignored; - int i; - - linux_proc_pending_signals (pid, pending, &blocked, &ignored); - - if (!flush_mask) - return 0; - - for (i = 1; i < NSIG; i++) - if (sigismember (pending, i)) - if (!sigismember (flush_mask, i) - || sigismember (&blocked, i) - || sigismember (&ignored, i)) - sigdelset (pending, i); - - if (sigisemptyset (pending)) - return 0; - - return 1; -} - -/* DATA is interpreted as a mask of signals to flush. If LP has - signals pending, and they are all in the flush mask, then arrange - to flush them. LP should be stopped, as should all other threads - it might share a signal queue with. */ - -static int -flush_callback (struct lwp_info *lp, void *data) -{ - sigset_t *flush_mask = data; - sigset_t pending, intersection, blocked, ignored; - int pid, status; - - /* Normally, when an LWP exits, it is removed from the LWP list. The - last LWP isn't removed till later, however. So if there is only - one LWP on the list, make sure it's alive. */ - if (lwp_list == lp && lp->next == NULL) - if (!linux_nat_thread_alive (lp->ptid)) - return 0; - - /* Just because the LWP is stopped doesn't mean that new signals - can't arrive from outside, so this function must be careful of - race conditions. However, because all threads are stopped, we - can assume that the pending mask will not shrink unless we resume - the LWP, and that it will then get another signal. We can't - control which one, however. */ - - if (lp->status) - { - if (debug_linux_nat) - printf_unfiltered (_("FC: LP has pending status %06x\n"), lp->status); - if (WIFSTOPPED (lp->status) && sigismember (flush_mask, WSTOPSIG (lp->status))) - lp->status = 0; - } - - /* While there is a pending signal we would like to flush, continue - the inferior and collect another signal. But if there's already - a saved status that we don't want to flush, we can't resume the - inferior - if it stopped for some other reason we wouldn't have - anywhere to save the new status. In that case, we must leave the - signal unflushed (and possibly generate an extra SIGINT stop). - That's much less bad than losing a signal. */ - while (lp->status == 0 - && linux_nat_has_pending (GET_LWP (lp->ptid), &pending, flush_mask)) - { - int ret; - - errno = 0; - ret = ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); - if (debug_linux_nat) - fprintf_unfiltered (gdb_stderr, - "FC: Sent PTRACE_CONT, ret %d %d\n", ret, errno); - - lp->stopped = 0; - stop_wait_callback (lp, flush_mask); - if (debug_linux_nat) - fprintf_unfiltered (gdb_stderr, - "FC: Wait finished; saved status is %d\n", - lp->status); - } - - return 0; -} - /* Return non-zero if LP has a wait status pending. */ static int @@ -2657,6 +2623,36 @@ linux_nat_filter_event (int lwpid, int s return NULL; } + /* Make sure we don't report a SIGINT that we have already displayed + for another thread. */ + if (lp->ignore_sigint + && WIFSTOPPED (status) && WSTOPSIG (status) == SIGINT) + { + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, + "LLW: Delayed SIGINT caught for %s.\n", + target_pid_to_str (lp->ptid)); + + /* This is a delayed SIGINT. */ + lp->ignore_sigint = 0; + + registers_changed (); + linux_ops->to_resume (pid_to_ptid (GET_LWP (lp->ptid)), + lp->step, TARGET_SIGNAL_0); + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, + "LLW: %s %s, 0, 0 (discard SIGINT)\n", + lp->step ? + "PTRACE_SINGLESTEP" : "PTRACE_CONT", + target_pid_to_str (lp->ptid)); + + lp->stopped = 0; + gdb_assert (lp->resumed); + + /* Discard the event. */ + return NULL; + } + /* An interesting event. */ gdb_assert (lp); return lp; @@ -2715,7 +2711,6 @@ linux_nat_wait (ptid_t ptid, struct targ int options = 0; int status = 0; pid_t pid = PIDGET (ptid); - sigset_t flush_mask; if (debug_linux_nat_async) fprintf_unfiltered (gdb_stdlog, "LLW: enter\n"); @@ -2737,8 +2732,6 @@ linux_nat_wait (ptid_t ptid, struct targ set_executing (lp->ptid, 1); } - sigemptyset (&flush_mask); - /* Block events while we're here. */ linux_nat_async_events (sigchld_sync); @@ -2948,11 +2941,15 @@ retry: if (signo == TARGET_SIGNAL_INT && signal_pass_state (signo) == 0) { /* If ^C/BREAK is typed at the tty/console, SIGINT gets - forwarded to the entire process group, that is, all LWP's - will receive it. Since we only want to report it once, - we try to flush it from all LWPs except this one. */ - sigaddset (&flush_mask, SIGINT); + forwarded to the entire process group, that is, all LWPs + will receive it - unless they're using CLONE_THREAD to + share signals. Since we only want to report it once, we + mark it as ignored for all LWPs except this one. */ + iterate_over_lwps (set_ignore_sigint, NULL); + lp->ignore_sigint = 0; } + else + maybe_clear_ignore_sigint (lp); } /* This LWP is stopped now. */ @@ -2969,8 +2966,7 @@ retry: /* ... and wait until all of them have reported back that they're no longer running. */ - iterate_over_lwps (stop_wait_callback, &flush_mask); - iterate_over_lwps (flush_callback, &flush_mask); + iterate_over_lwps (stop_wait_callback, NULL); /* If we're not waiting for a specific LWP, choose an event LWP from among those that have had events. Giving equal priority Index: gdb-only/gdb/linux-nat.h =================================================================== --- gdb-only.orig/gdb/linux-nat.h 2008-07-26 13:48:56.000000000 -0400 +++ gdb-only/gdb/linux-nat.h 2008-07-26 22:26:30.000000000 -0400 @@ -62,6 +62,9 @@ struct lwp_info be the address of a hardware watchpoint. */ struct siginfo siginfo; + /* Non-zero if we expect a duplicated SIGINT. */ + int ignore_sigint; + /* If WAITSTATUS->KIND != TARGET_WAITKIND_SPURIOUS, the waitstatus for this LWP's last event. This may correspond to STATUS above, or to a local variable in lin_lwp_wait. */ Index: gdb-only/gdb/testsuite/gdb.threads/manythreads.exp =================================================================== --- gdb-only.orig/gdb/testsuite/gdb.threads/manythreads.exp 2008-06-28 11:41:11.000000000 -0400 +++ gdb-only/gdb/testsuite/gdb.threads/manythreads.exp 2008-07-26 23:02:13.000000000 -0400 @@ -58,8 +58,11 @@ gdb_test_multiple "continue" "first cont } } +# Wait one second. This is better than the TCL "after" command, because +# we don't lose GDB's output while we do it. +remote_expect host 1 { timeout { } } + # Send a Ctrl-C and verify that we can do info threads and continue -after 1000 send_gdb "\003" set message "stop threads 1" gdb_test_multiple "" "stop threads 1" { @@ -110,8 +113,35 @@ gdb_test_multiple "continue" "second con } } +# Wait another second. If the program stops on its own, GDB has failed +# to handle duplicate SIGINTs sent to multiple threads. +set failed 0 +remote_expect host 1 { + -re "\\\[New \[^\]\]*\\\]\r\n" { + exp_continue -continue_timer + } + -re "\\\[\[^\]\]* exited\\\]\r\n" { + exp_continue -continue_timer + } + -re "Thread \[^\n\]* executing\r\n" { + exp_continue -continue_timer + } + -re "Program received signal SIGINT.*$gdb_prompt $" { + if { $failed == 0 } { + fail "check for duplicate SIGINT" + } + send_gdb "continue\n" + set failed 1 + exp_continue + } + timeout { + if { $failed == 0 } { + pass "check for duplicate SIGINT" + } + } +} + # Send another Ctrl-C and verify that we can do info threads and quit -after 1000 send_gdb "\003" set message "stop threads 2" gdb_test_multiple "" "stop threads 2" {