Date: Tue, 14 Sep 2004 23:25:00 -0000
From: Daniel Jacobowitz
To: jjohnstn
Cc: gdb-patches@sources.redhat.com
Subject: Re: [RFC]: Ugly thread step situation
Message-ID: <20040914232505.GA16818@nevyn.them.org>

On Tue, Sep 14, 2004 at 05:44:47PM -0400, jjohnstn wrote:
> I recently tracked down a problem with gdb on RHEL3 Linux regarding
> stepping threads. What happens is that in some instances, lin-lwp.c is
> asked to step the thread of interest. We then wait on all threads. Due
> to some form of race condition, the wait does not get back the trap from
> the stepped thread. If we have a number of waiting events (e.g. thread
> create events, other breakpoints), lin-lwp picks one of them.

Could you explain this bit a little more? What comes back instead for
the thread that was stepping? Do we stop it with a SIGSTOP? Is there a
testcase?

> Now it gets interesting. Infrun.c thinks the current thread is being
> stepped and isn't ready for a breakpoint coming back. On x86, it makes a
> miscalculation of the pc value (for a breakpoint it should back up 1, for
> a step it doesn't have to). We end up pointing at an invalid pc (we
> didn't back up 1) and everything falls apart from there.
>
> To fix this quickly, I added the accompanying patch to lin-lwp.c. What it
> does is ensure that we wait on any currently stepping lwp. In truth, this
> isn't as bad as it sounds. The lin-lwp code later on is set up to pick
> the stepping lwp over all other events. This just keeps the scenario
> above from occurring.
>
> Obviously, this doesn't solve everything. Perhaps the decrement of the pc
> needs to be done once we have established whether the thread has changed
> underneath us. We also could use a hook to run the lwp list and find out
> if the current lwp was stepping or encountered a breakpoint.
>
> Anyway, if the consensus is that the patch is helpful in the short-term, I
> am more than happy to check it in.
>
> -- Jeff J.
>
> 2004-09-14 Jeff Johnston
>
> * lin-lwp.c (find_singlestep_lwp_callback): New static function.
> (lin_lwp_wait): Change code to specifically wait on any LWP
> that is currently stepping.

This sounds sort of like a problem I debugged on MIPS and hppa, but
never managed to reproduce. I had tabled the patch until I had more
time to look at it - always a mistake. The same patch may help here.

Could you tell me what resume_ptid is before the call to target_resume,
in resume? The call in which we request the single-step, I mean.

-- 
Daniel Jacobowitz