From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 2924 invoked by alias); 15 Sep 2004 18:21:55 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 2914 invoked from network); 15 Sep 2004 18:21:53 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17)
  by sourceware.org with SMTP; 15 Sep 2004 18:21:53 -0000
Received: from drow by nevyn.them.org with local (Exim 4.34 #1 (Debian))
  id 1C7ePh-000614-A8; Wed, 15 Sep 2004 14:21:53 -0400
Date: Wed, 15 Sep 2004 18:21:00 -0000
From: Daniel Jacobowitz
To: Jeff Johnston
Cc: gdb-patches@sources.redhat.com
Subject: Re: [RFC]: Ugly thread step situation
Message-ID: <20040915182153.GA10009@nevyn.them.org>
Mail-Followup-To: Jeff Johnston , gdb-patches@sources.redhat.com
References: <20040914232505.GA16818@nevyn.them.org> <41487D8D.7060800@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <41487D8D.7060800@redhat.com>
User-Agent: Mutt/1.5.5.1+cvs20040105i
X-SW-Source: 2004-09/txt/msg00260.txt.bz2

On Wed, Sep 15, 2004 at 01:36:13PM -0400, Jeff Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Tue, Sep 14, 2004 at 05:44:47PM -0400, jjohnstn wrote:
> >
> >>I recently tracked down a problem with gdb on RHEL3 Linux regarding
> >>stepping threads.  What happens is that in some instances, lin-lwp.c is
> >>asked to step the thread of interest.  We then wait on all threads.  Due
> >>to some form of race condition, the wait does not get back the trap from
> >>the stepped thread.  If we have a number of waiting events (e.g. thread
> >>create events, other breakpoints), lin-lwp picks one of them.
> >
> >
> >Could you explain this bit a little more?  What comes back instead for
> >the thread that was stepping?  Do we stop it with a SIGSTOP?
> >
> >Is there a testcase?
> >
>
> Attached.
> This was the test-case given for Red Hat Bugzilla bug 130896.

Given Andrew's caveat, I will just look at the trace instead.  I see
that bug isn't publicly visible.

> LLR: PTRACE_SINGLESTEP process 10066, 0 (resume event thread)
> LLW: waitpid 10062 received Trace/breakpoint trap (stopped)
> LLTA: PTRACE_PEEKUSER LWP 10062, 0, 0 (OK)
> LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 10062.
> SC: kill LWP 10068 ****
> SC: lwp kill 0 ERRNO-OK
> SC: kill LWP 10067 ****
> SC: lwp kill 0 ERRNO-OK
> SC: kill LWP 10066 ****
> SC: lwp kill 0 ERRNO-OK
> WL: waitpid LWP 10068 received Stopped (signal) (stopped)
> WL: waitpid LWP 10067 received Trace/breakpoint trap (stopped)
> PTRACE_CONT LWP 10067, 0, 0 (OK)
> SWC: Candidate SIGTRAP event in LWP 10067
> WL: waitpid LWP 10067 received Stopped (signal) (stopped)
> WL: waitpid LWP 10066 received Stopped (signal) (stopped)
> FC: LP has pending status 00057f
> SEL: Found 2 SIGTRAP events, selecting #0 <=== should not happen
> CBC: Push back breakpoint for LWP 10062
> LLW: trap_ptid is LWP 10067.
>
> Program received signal SIGTRAP, Trace/breakpoint trap.
> [Switching to Thread -1231361104 (LWP 10067)]
> 0x080489cb in synchronize (tid=3063606192) at gdbtest.C:18
> 18        pthread_mutex_lock(&mutex);

In this trace, we single-step process 10066 [we resumed all the other
threads first - I snipped too much].  Then we wait.  We receive an
event from 10062.  We stop all other threads.  We receive another event
from 10067, and a SIGSTOP from each of 10066 and 10068.  So LWP 10066
had not yet been scheduled when we sent it the SIGSTOP, and it hasn't
done the single step yet.

If I am interpreting your patch correctly, it will cause us to use
waitpid without WNOHANG on LWP 10066.  If 10066 is stepping over a
blocking syscall, and another thread hits a breakpoint, we want to
display the breakpoint.  Otherwise we'll get the wrong behavior in the
non-race-condition situation where one thread goes to sleep.
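To make the WNOHANG point concrete, here is a minimal sketch, not GDB
code.  The helper names (poll_child, status_is_sigtrap_stop,
demo_poll_blocked_child) are hypothetical, and the child is a plain
forked process rather than a ptraced LWP.  It shows that a WNOHANG poll
returns immediately with "no event" while the child sits in a blocking
syscall, which is the case where a blocking waitpid would stall.  It
also decodes a wait status the way the "pending status 00057f" line in
the trace reads on Linux: low byte 0x7f means stopped, and the next
byte (5) is SIGTRAP.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from lin-lwp.c): poll one child without
   blocking.  Returns 1 and fills *STATUS if an event was pending,
   0 if the child has not changed state yet, -1 on error.  */
static int
poll_child (pid_t pid, int *status)
{
  pid_t ret = waitpid (pid, status, WNOHANG);

  if (ret == pid)
    return 1;
  return ret == 0 ? 0 : -1;
}

/* Hypothetical helper: does STATUS describe a stop caused by SIGTRAP?
   On Linux, 0x057f does: WIFSTOPPED is true and WSTOPSIG is 5.  */
static int
status_is_sigtrap_stop (int status)
{
  return WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP;
}

/* Fork a child that sleeps in a blocking syscall, poll it once with
   WNOHANG, then kill and reap it.  Returns what the poll reported
   while the child was still blocked (or not yet scheduled).  */
static int
demo_poll_blocked_child (void)
{
  int status = 0;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;
  if (pid == 0)
    {
      pause ();		/* Blocks until a signal arrives.  */
      _exit (0);
    }

  /* The child has not exited or stopped, so this returns 0 at once.
     A waitpid without WNOHANG would block here instead.  */
  int polled = poll_child (pid, &status);

  /* Clean up: now a blocking wait is safe, an event is guaranteed.  */
  kill (pid, SIGKILL);
  pid_t reaped = waitpid (pid, &status, 0);
  assert (reaped == pid);

  return polled;
}
```

Calling demo_poll_blocked_child () from a small main should report 0
for the poll, and status_is_sigtrap_stop (0x57f) should be true on
Linux.  The sketch leaves out ptrace, __WCLONE and multiple LWPs
entirely, so it only illustrates the blocking versus non-blocking
distinction, not the event selection in lin-lwp.c.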

Infrun needs to verify that the thread which had an event was the
thread which stepped.  If it isn't, then hopefully the single-step has
not happened - this will be true using x86 and PTRACE_SINGLESTEP, but
may require some other changes for software singlestep that I think I
have queued somewhere.

-- 
Daniel Jacobowitz