From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 2158 invoked by alias); 24 Jan 2014 12:44:16 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 2147 invoked by uid 89); 24 Jan 2014 12:44:15 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 24 Jan 2014 12:44:14 +0000 Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s0OCiC9c020141 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 24 Jan 2014 07:44:12 -0500 Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id s0OCiAv0027515; Fri, 24 Jan 2014 07:44:11 -0500 Message-ID: <52E2601A.6080202@redhat.com> Date: Fri, 24 Jan 2014 12:44:00 -0000 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7 MIME-Version: 1.0 To: Sterling Augustine CC: gdb-patches Subject: Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread References: <52DFCFCF.4030101@redhat.com> <52E16518.1020800@redhat.com> In-Reply-To: <52E16518.1020800@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-SW-Source: 2014-01/txt/msg00937.txt.bz2 On 01/23/2014 06:53 PM, Pedro Alves wrote: > I "rejected" it in the sense that that loses signals as mentioned in the > second patch. > >> The second patch you made breaks things by bypassing the >> step-range fix up in process_event_stop_test. > > Of course, for hardware step targets, it's essentially a revertion > of the first patch. It's unfinished! It reverted the first patch, > and then fixed things only for software step targets. It's needs > finishing for hardware step targets... So I thought about this some more, and I think I now recall/realize why I reverted the hardware step path to just do keep_going like before. It's actually the same reason I failed to create a reproducer that oversteps before. Assuming a well behaved target backend (and no bugs in the core infrun code), it shouldn't really be possible to overstep here. Consider, e.g. a thread T1 is stopped at xxx0000, below. [ADDR ] xxx0000 <<< T1 stopped here xxx0001 xxx0002 xxx0003 GDB tells this thread to single-step. Another thread (T2) hits some other event (say SIGPROF). The backend starts stopping all threads in order to report that event to the core, and while doing that, either T1 manages to finish the step, or the event in SIGPROF was reported so fast, that T1 didn't really get a chance to be scheduled by the kernel, and so it reports a SIGSTOP (or some other async signal). If T1 does finish the step and reports a SIGTRAP, then the target backend prefers reporting that event first over the SIGPROF in T2 -- see linux-nat.c:select_event_lwp's "Give preference to any LWP that is being single-stepped.". When that path is taken, SIGPROF is left pending to report later pre-emptively on the next resume attempt -- see linux_nat_resume's "LLR: Short circuiting for status". Even if GDB didn't give preference to the stepping thread's event, say, it reported the SIGPROF event for T2 first, leaving the SIGTRAP for T1 pending, that'd be okay in terms of overstepping concerns, because then what we have would be, before: xxx0000 <<< T1 stopped here xxx0001 xxx0002 xxx0003 after single-step request, T2 hits SIGPROF, and T1 stops at xxx0001 with SIGTRAP: xxx0000 xxx0001 <<< T1 stopped here w/SIGTRAP xxx0002 xxx0003 and then GDB reports T2's SIGPROF to the core, leaving T1's SIGTRAP pending. Assuming SIGPROF is set to "handle SIGPROF nostop ...", infrun's switch_back_to_stepping_thread would switch back to T1, and tell it to single-step. This is where I was blurry and worrying about overstepping, because T1 is _already_ at xxx0001 but the core wasn't told about it. So I was mistakenly worried that that new step request would make the thread step to xxx0002. But, it's not really an issue, because T1 still has the SIGTRAP for previous finished step at xxx0001 pending, and won't really be resumed (again, that's linux_nat_resume's "LLR: Short circuiting for status".). Now, the reason the backend prefers reporting the stepping thread's SIGTRAPs first, is because if T1 has a SIGTRAP pending for a previous step request, and now GDB issues a continue on T1 instead of a step, and T1 reports the SIGTRAP, the core will be confused over that SIGTRAP (because from the core's perpective, the thread wasn't told to step), and report it to the user as a spurious SIGTRAP. An alternative way to handle this would be for the backend to keep track of the fact that the pending SIGTRAP was for a step request, and seeing a continue request, discard the pending SIGTRAP and indeed continue the thread free. (gdbserver's linux-low.c used to do this at some point, and then I switched it to do what gdb's linux-nat.c does, IIRC.) -- Pedro Alves