From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 6325 invoked by alias); 21 Feb 2006 21:53:08 -0000
Received: (qmail 6305 invoked by uid 22791); 21 Feb 2006 21:53:06 -0000
X-Spam-Check-By: sourceware.org
Received: from nevyn.them.org (HELO nevyn.them.org) (66.93.172.17) by sourceware.org (qpsmtpd/0.31.1) with ESMTP; Tue, 21 Feb 2006 21:53:04 +0000
Received: from drow by nevyn.them.org with local (Exim 4.54) id 1FBfQe-0000E0-Ie; Tue, 21 Feb 2006 16:52:16 -0500
Date: Tue, 21 Feb 2006 22:59:00 -0000
From: Daniel Jacobowitz
To: Mark Kettenis
Cc: sjackman@gmail.com, gdb-patches@sourceware.org
Subject: Re: Fix a crash when stepping and unwinding fails
Message-ID: <20060221215216.GA440@nevyn.them.org>
Mail-Followup-To: Mark Kettenis, sjackman@gmail.com, gdb-patches@sourceware.org
References: <20060220220331.GA29363@nevyn.them.org> <200602212015.k1LKFGrj005090@elgar.sibelius.xs4all.nl> <20060221202833.GA30161@nevyn.them.org> <200602212050.k1LKowmP012208@elgar.sibelius.xs4all.nl> <20060221205748.GA31483@nevyn.them.org> <200602212134.k1LLY3Sq028067@elgar.sibelius.xs4all.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200602212134.k1LLY3Sq028067@elgar.sibelius.xs4all.nl>
User-Agent: Mutt/1.5.8i
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2006-02/txt/msg00413.txt.bz2

On Tue, Feb 21, 2006 at 10:34:03PM +0100, Mark Kettenis wrote:
> > It's this assumption I don't think is right. I have plenty of
> > anecdotal evidence from yesterday that it's not right, in fact.
> > If the prologue analyzer can't handle the code at $pc, then
> > what do you expect it to put into the frame ID? Or if it thinks we
> > are in the outermost frame?
>
> But get_current_frame() should be the innermost frame when we execute
> this code.
> So the prologue analyzer can't be involved here.

Remember the off-by-one distinction between prev_register and this_id?
The sentinel frame and its unwinder provide the current frame's
registers via their prev_register method. But the current frame's
unwinder is needed to compute the current frame's ID, and that's the
prologue analyzer in this case...

> yes, it seems that step_frame_id can end up as null_frame_id, if
> get_current_frame() is also the outermost frame at the same time.

... so the outermost frame will generally have null_frame_id. Not sure
if that's true when we stop because we encounter main, but it's
definitely true when we run out of unwindable frames.

> > > How can we hit the last frame? If we're hitting the last frame, where
> > > did we come from?
> > >
> > > It may very well be that there are GDB bugs that make step_frame_id
> > > equal to null_frame_id. If we can't trace those bugs right now, we
> > > should probably sprinkle a few gdb_assert()'s around and try to solve
> > > the issues when we hit those.
> >
> > We use the null frame ID to represent the outermost frame. If we can't
> > find another frame outer to this one, then we assume this one is the
> > outermost.
>
> Yes, it seems there are issues here. The frame ID is supposed to be
> unique for a particular frame, yet there's a possibility that two
> distinct frames both end up with the null frame ID.

Are there? I think there's only one - the outermost. We've only got
that and the sentinel frame, and the sentinel frame I think doesn't
have an ID.

Perhaps part of the problem here is that step_frame_id is set to
null_frame_id when it is invalid; maybe we should keep that separate.
I don't think it would help me though.

Perhaps the real problem is the use of null_frame_id for both the
outermost frame and completely unknown frames.
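To illustrate that last point, here is a minimal sketch of what keeping
the two cases apart could look like. This is not GDB's code: the toy_
names, the kind field and the struct layout are all hypothetical, made
up only to show the idea of a three-way distinction (unknown,
outermost, ordinary frame).

```c
#include <stdbool.h>

/* Hypothetical stand-in for a frame ID; NOT GDB's struct frame_id.  */
enum toy_id_kind
{
  TOY_ID_INVALID,    /* Unwinding failed; nothing is known.  */
  TOY_ID_OUTERMOST,  /* A real frame with no caller.  */
  TOY_ID_NORMAL      /* An ordinary frame, identified by two addresses.  */
};

struct toy_frame_id
{
  enum toy_id_kind kind;
  unsigned long stack_addr;
  unsigned long code_addr;
};

static const struct toy_frame_id toy_invalid_id = { TOY_ID_INVALID, 0, 0 };
static const struct toy_frame_id toy_outermost_id = { TOY_ID_OUTERMOST, 0, 0 };

/* Compare two IDs.  An invalid ID never compares equal to anything,
   not even to another invalid ID; the outermost ID equals only
   itself.  */
static bool
toy_frame_id_eq (struct toy_frame_id l, struct toy_frame_id r)
{
  if (l.kind == TOY_ID_INVALID || r.kind == TOY_ID_INVALID)
    return false;
  if (l.kind != r.kind)
    return false;
  if (l.kind == TOY_ID_OUTERMOST)
    return true;
  return l.stack_addr == r.stack_addr && l.code_addr == r.code_addr;
}
```

With a representation along these lines, a caller could check for
TOY_ID_INVALID explicitly before acting on the result of a comparison,
instead of having to guess what a null ID was meant to say.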
It would be nice if we could tell here:

      if (frame_id_eq (frame_unwind_id (get_current_frame ()),
                       step_frame_id))

that frame_unwind_id has returned something completely invalid instead
of the outermost frame. One way to do that in our current
representation would be to check that the frame ID for the current
frame is not null_frame_id.

> > Just to sketch out my example a bit more: the embedded OS I'm debugging
> > lives in ROM. The application I've supplied to GDB lives in RAM. In
> > some later stage of the project, hopefully, I will have GDB magically
> > load some other ELF files (that I don't have yet) to cover the ROM
> > code; but right now I can't do that and there's no guarantee I'll have
> > debug info covering all of it anyway. So we're executing code way
> > out in the boondocks. GDB doesn't have any way on this platform
> > (ARM Thumb) to guess where the start of a function is if it doesn't
> > have a symbol table; so it can't be sure that we've really reached the
> > first instruction of a function, so it has no idea whether $lr is valid
> > or not.
>
> But that really means that we shouldn't be messing with step-resume
> breakpoints here. The whole notion of functions that can be stepped
> into isn't there.

Yes, it is. I've executed "step" in a place where I do have symbol
information (and working unwinders). It's taken me into a place where
I don't (a DLL in ROM). Since I no longer have debug information, GDB
would like to step back out to the call site, except it fails because
we've moved out of its known area.

-- 
Daniel Jacobowitz
CodeSourcery