From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-35822-listarch-gdb-patches=sources.redhat.com@sources.redhat.com>
Received: (qmail 2924 invoked by alias); 15 Sep 2004 18:21:55 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 2914 invoked from network); 15 Sep 2004 18:21:53 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17)
  by sourceware.org with SMTP; 15 Sep 2004 18:21:53 -0000
Received: from drow by nevyn.them.org with local (Exim 4.34 #1 (Debian))
	id 1C7ePh-000614-A8; Wed, 15 Sep 2004 14:21:53 -0400
Date: Wed, 15 Sep 2004 18:21:00 -0000
From: Daniel Jacobowitz <drow@false.org>
To: Jeff Johnston <jjohnstn@redhat.com>
Cc: gdb-patches@sources.redhat.com
Subject: Re: [RFC]: Ugly thread step situation
Message-ID: <20040915182153.GA10009@nevyn.them.org>
Mail-Followup-To: Jeff Johnston <jjohnstn@redhat.com>,
	gdb-patches@sources.redhat.com
References: <Pine.LNX.4.44.0409141729550.8039-200000@tooth.toronto.redhat.com> <20040914232505.GA16818@nevyn.them.org> <41487D8D.7060800@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <41487D8D.7060800@redhat.com>
User-Agent: Mutt/1.5.5.1+cvs20040105i
X-SW-Source: 2004-09/txt/msg00260.txt.bz2

On Wed, Sep 15, 2004 at 01:36:13PM -0400, Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Tue, Sep 14, 2004 at 05:44:47PM -0400, jjohnstn wrote:
> >
> >>I recently tracked down a problem with gdb on RHEL3 Linux regarding 
> >>stepping threads.  What happens is that in some instances, lin-lwp.c is 
> >>asked to step the thread of interest.  We then wait on all threads.  Due 
> >>to some form of race condition, the wait does not get back the trap from 
> >>the stepped thread.  If we have a number of waiting events (e.g. thread 
> >>create events, other breakpoints), lin-lwp picks one of them.
> >
> >
> >Could you explain this bit a little more?  What comes back instead for
> >the thread that was stepping?  Do we stop it with a SIGSTOP?
> >
> >Is there a testcase?
> >
> 
> Attached.  This was the test-case given for Red Hat Bugzilla bug 130896.

Given Andrew's caveat, I will just look at the trace instead.  I see
that the bug isn't publicly visible.

> LLR: PTRACE_SINGLESTEP process 10066, 0 (resume event thread)
> LLW: waitpid 10062 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 10062, 0, 0 (OK)
> LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 10062.
> SC:  kill LWP 10068 **<SIGSTOP>**
> SC:  lwp kill 0 ERRNO-OK
> SC:  kill LWP 10067 **<SIGSTOP>**
> SC:  lwp kill 0 ERRNO-OK
> SC:  kill LWP 10066 **<SIGSTOP>**
> SC:  lwp kill 0 ERRNO-OK
> WL: waitpid LWP 10068 received Stopped (signal) (stopped)
> WL: waitpid LWP 10067 received Trace/breakpoint trap (stopped)
> PTRACE_CONT LWP 10067, 0, 0 (OK)
> SWC: Candidate SIGTRAP event in LWP 10067
> WL: waitpid LWP 10067 received Stopped (signal) (stopped)
> WL: waitpid LWP 10066 received Stopped (signal) (stopped)
> FC: LP has pending status 00057f
> SEL: Found 2 SIGTRAP events, selecting #0  <=== should not happen
> CBC: Push back breakpoint for LWP 10062
> LLW: trap_ptid is LWP 10067.
> 
> Program received signal SIGTRAP, Trace/breakpoint trap.
> [Switching to Thread -1231361104 (LWP 10067)]
> 0x080489cb in synchronize (tid=3063606192) at gdbtest.C:18
> 18	    pthread_mutex_lock(&mutex);

In this trace, we single-step process 10066 [we resumed all the other
threads first - I snipped too much].  Then we wait.  We receive an
event from 10062.  We stop all other threads.  We receive another
SIGTRAP event from 10067, and SIGSTOPs from 10066 and 10068.  So 10066
had not yet been scheduled when we sent it the SIGSTOP, and it hasn't
done the single step yet.

If I am interpreting your patch correctly, it will cause us to use
waitpid without WNOHANG on LWP 10066, i.e. a blocking wait.  If 10066
is stepping over a blocking syscall, and another thread hits a
breakpoint, we want to display the breakpoint rather than sit waiting
for 10066.  Otherwise we'll get the wrong behavior in the
non-race-condition situation where one thread goes to sleep.
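
To make the blocking concern concrete, here is a minimal standalone
sketch.  It is not lin-lwp.c code and involves no ptrace; the helper
names are made up for illustration.  A forked child that sits in
pause() stands in for an LWP asleep in a blocking syscall.  Polling it
with WNOHANG returns immediately with "nothing yet", whereas the same
waitpid call with options 0 would not return until the child changed
state.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that blocks forever in pause(),
   standing in for a thread asleep in a blocking syscall.  Returns the
   child's pid in the parent, or -1 if fork failed.  */
pid_t spawn_sleeping_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    return pid;
}

/* Non-blocking poll.  Returns 1 if the child reported a status,
   0 if it has nothing to report yet, -1 on error.  */
int poll_child(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);
    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}

/* Clean up: kill the child, then do a blocking wait to reap it.
   Returns 1 if the child was reaped after dying from a signal.  */
int reap_child(pid_t pid)
{
    int status = 0;
    if (kill(pid, SIGKILL) != 0)
        return 0;
    return waitpid(pid, &status, 0) == pid && WIFSIGNALED(status);
}
```

Calling poll_child() right after spawn_sleeping_child() returns 0, so
the caller stays free to look at events from other children.  A
blocking waitpid at that point would stay stuck until something woke
the sleeper.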

Infrun needs to verify that the thread which had an event was the
thread which stepped.  If it isn't, then hopefully the single-step has
not happened - this will be true using x86 and PTRACE_SINGLESTEP, but
may require some other changes for software singlestep that I think I
have queued somewhere.
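
A rough sketch of that check follows, with hypothetical names
throughout (this is not infrun's actual code or data structures).  It
assumes the x86 PTRACE_SINGLESTEP case described above, where an event
from some other LWP means the requested step has not happened yet.

```c
#include <sys/types.h>

/* Hypothetical verdicts for a stop reported while a step is
   outstanding.  */
enum step_verdict {
    STEP_FINISHED,      /* the stepped LWP itself reported the event */
    STEP_STILL_PENDING  /* another LWP reported; the step must be redone */
};

/* Compare the LWP that was asked to single-step with the LWP whose
   event was selected for reporting.  Only a match means the step can
   be treated as done.  */
enum step_verdict classify_stop(pid_t stepped_lwp, pid_t event_lwp)
{
    return event_lwp == stepped_lwp ? STEP_FINISHED : STEP_STILL_PENDING;
}
```

With the numbers from the trace, classify_stop(10066, 10067) gives
STEP_STILL_PENDING: the reported SIGTRAP belongs to 10067, so the step
requested for 10066 still has to be issued again later.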


-- 
Daniel Jacobowitz