From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-109735-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 2158 invoked by alias); 24 Jan 2014 12:44:16 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 2147 invoked by uid 89); 24 Jan 2014 12:44:15 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 24 Jan 2014 12:44:14 +0000
Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s0OCiC9c020141	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Fri, 24 Jan 2014 07:44:12 -0500
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11])	by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id s0OCiAv0027515;	Fri, 24 Jan 2014 07:44:11 -0500
Message-ID: <52E2601A.6080202@redhat.com>
Date: Fri, 24 Jan 2014 12:44:00 -0000
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7
MIME-Version: 1.0
To: Sterling Augustine <saugustine@google.com>
CC: gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
References: <CAEG7qUzQ2YR+89he=hMa7eNZb9fMxp1o20ActJvXx10aeVOySw@mail.gmail.com> <52DFCFCF.4030101@redhat.com> <CAEG7qUzG1K08CLgeXcv6U7gZ+ryVuXfS9+9R9e6NBzK7EAEcgg@mail.gmail.com> <52E16518.1020800@redhat.com>
In-Reply-To: <52E16518.1020800@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2014-01/txt/msg00937.txt.bz2

On 01/23/2014 06:53 PM, Pedro Alves wrote:

> I "rejected" it in the sense that that loses signals as mentioned in the
> second patch.
> 
>> The second patch you made breaks things by bypassing the
>> step-range fix up in process_event_stop_test.
> 
> Of course, for hardware step targets, it's essentially a revertion
> of the first patch.  It's unfinished!  It reverted the first patch,
> and then fixed things only for software step targets.  It's needs
> finishing for hardware step targets...

So I thought about this some more, and I think I now
recall/realize why I reverted the hardware step path to just do
keep_going like before.  It's actually the same reason I failed to
create a reproducer that oversteps before.  Assuming a well behaved
target backend (and no bugs in the core infrun code), it shouldn't
really be possible to overstep here.  Consider, e.g. a thread
T1 is stopped at xxx0000, below.

 [ADDR ]
 xxx0000  <<< T1 stopped here
 xxx0001
 xxx0002
 xxx0003

GDB tells this thread to single-step.  Another thread (T2)
hits some other event (say SIGPROF).  The backend starts
stopping all threads in order to report that event to the
core, and while doing that, either T1 manages to finish
the step, or the event in SIGPROF was reported so fast,
that T1 didn't really get a chance to be scheduled by the
kernel, and so it reports a SIGSTOP (or some other async
signal).  If T1 does finish the step and reports a SIGTRAP, then
the target backend prefers reporting that event first over
the SIGPROF in T2 -- see linux-nat.c:select_event_lwp's
"Give preference to any LWP that is being single-stepped.".
When that path is taken, SIGPROF is left pending to report
later pre-emptively on the next resume attempt -- see
linux_nat_resume's "LLR: Short circuiting for status".

Even if GDB didn't give preference to the stepping thread's
event, say, it reported the SIGPROF event for T2 first, leaving the
SIGTRAP for T1 pending, that'd be okay in terms of overstepping
concerns, because then what we have would be, before:

 xxx0000  <<< T1 stopped here
 xxx0001
 xxx0002
 xxx0003

after single-step request, T2 hits SIGPROF, and T1
stops at xxx0001 with SIGTRAP:

 xxx0000
 xxx0001  <<< T1 stopped here w/SIGTRAP
 xxx0002
 xxx0003

and then GDB reports T2's SIGPROF to the core, leaving T1's
SIGTRAP pending.  Assuming SIGPROF is set to
"handle SIGPROF nostop ...", infrun's switch_back_to_stepping_thread
would switch back to T1, and tell it to single-step.  This is where
I was blurry and worrying about overstepping, because T1 is _already_
at xxx0001 but the core wasn't told about it.  So I was mistakenly worried
that that new step request would make the thread step to xxx0002.  But,
it's not really an issue, because T1 still has the SIGTRAP for previous
finished step at xxx0001 pending, and won't really be resumed (again,
that's linux_nat_resume's  "LLR: Short circuiting for status".).

Now, the reason the backend prefers reporting the stepping thread's
SIGTRAPs first, is because if T1 has a SIGTRAP pending for a previous
step request, and now GDB issues a continue on T1 instead of a step,
and T1 reports the SIGTRAP, the core will be confused over that SIGTRAP
(because from the core's perpective, the thread wasn't told to step),
and report it to the user as a spurious SIGTRAP.  An alternative way
to handle this would be for the backend to keep track of the fact
that the pending SIGTRAP was for a step request, and seeing a
continue request, discard the pending SIGTRAP and indeed continue
the thread free.  (gdbserver's linux-low.c used to do this at
some point, and then I switched it to do what gdb's linux-nat.c
does, IIRC.)

-- 
Pedro Alves