From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-109656-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 22160 invoked by alias); 22 Jan 2014 14:04:07 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 22144 invoked by uid 89); 22 Jan 2014 14:04:06 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 22 Jan 2014 14:04:05 +0000
Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s0ME416s011836	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Wed, 22 Jan 2014 09:04:02 -0500
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11])	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id s0ME3xKJ015560;	Wed, 22 Jan 2014 09:04:00 -0500
Message-ID: <52DFCFCF.4030101@redhat.com>
Date: Wed, 22 Jan 2014 14:04:00 -0000
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7
MIME-Version: 1.0
To: Sterling Augustine <saugustine@google.com>
CC: gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
References: <CAEG7qUzQ2YR+89he=hMa7eNZb9fMxp1o20ActJvXx10aeVOySw@mail.gmail.com>
In-Reply-To: <CAEG7qUzQ2YR+89he=hMa7eNZb9fMxp1o20ActJvXx10aeVOySw@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-SW-Source: 2014-01/txt/msg00858.txt.bz2

On 01/21/2014 10:17 PM, Sterling Augustine wrote:

> But the problem is that the thread out of the step range is not the
> currently stepped thread. 

Then I'm quite confused.  Each thread has its own step range.  How come
the thread that is not stepped has a step range at all then?

> Process_event_stop_test calls
> switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
> the extra logic in process_event_stop_test to fix up the step range,
> and leading to the assertion error.

The issue seems to me, as previously discussed, not really about missing
the "fix up" of the step range, but rather that we overstep the thread
by mistake.  Running the thread through process_event_stop_test makes us
detect that the step finished (before we ever get to fix up the step
range).  That is, we switch back to the stepping thread,
and re-step it, ignoring the possibility that that thread might have
already moved past the step range, but not have had a chance to report
that trap to the core yet (because events are serialized).

The thing missing is a testcase clearly showing that that's indeed
the issue in question.  I spent a few days trying to write one from
scratch a while ago, but failed, because linux-nat.c always gives
preference to reporting the stepping/SIGTRAP thread if there are multiple
simultaneous events, and it seemed like another signal needs to be
involved to trigger this.
Perhaps we could confirm all this already in a log produced by
your extra debug outputs ran against your big app?

> But I don't see any reason to assume that only the current thread
> would need the additional cleanup found in process_event_stop_test. In
> fact, switch_back_to_stepped_thread has a special case (hidden in
> currently_stepping_or_nexting_callback), to _prevent_ it from
> restarting the current thread, presumably so it that thread can get
> the additional cleanup.
> 
> The enclosed patch does two things:
> 
> 1. Adds a large amount of tracing, which helped me diagnose the problem.
> 2. Changes switch_back_to_stepped_thread to still switch back to a
> stepped thread, but to avoid restarting it, allowing the additional
> checks in process_event_stop_test to work their magic.

I'm not convinced the first two branches in switch_back_to_stepped_thread
should be changed at all.  So without those that reduces to exactly the
original patch I had shown you originally:

 https://github.com/palves/gdb/commit/b6b55ba610f8db5d89ec7405c93013a10d9a1c20

Does that alone fix things for you?


In that branch, I then later rewrote that fix differently:

 https://github.com/palves/gdb/commit/1d56ddf439b6f7e7fa9759cf1f8e02106eea6af5

The idea of that "better fix" was to handle the case mentioned in
this comment:

+       There might be some cases where this loses signal
+       information, if a signal has arrived at exactly the same
+       time that the PC changed, but this is the best we can do
+       with the information available.

by setting a breakpoint at the current PC, and re-resuming the thread.
That means that if there was indeed some other signal/event pending,
we'd collect it first.  But that's unfinished, and breaks hardware-step
targets again in the process, for it only handles software-step targets.
The thing preventing moving this forward is a testcase (or a log showing
clearly that the problem is what I say above it is, which should show
the steps needed to construct a testcase).

-- 
Pedro Alves