From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 22 Feb 2006 22:00:00 -0000
Message-Id: <200602222154.k1MLsOxa008001@elgar.sibelius.xs4all.nl>
From: Mark Kettenis
To: drow@false.org
CC: sjackman@gmail.com, gdb-patches@sourceware.org
In-reply-to: <20060221215216.GA440@nevyn.them.org> (message from Daniel
	Jacobowitz on Tue, 21 Feb 2006 16:52:16 -0500)
Subject: Re: Fix a crash when stepping and unwinding fails
References: <20060220220331.GA29363@nevyn.them.org>
	<200602212015.k1LKFGrj005090@elgar.sibelius.xs4all.nl>
	<20060221202833.GA30161@nevyn.them.org>
	<200602212050.k1LKowmP012208@elgar.sibelius.xs4all.nl>
	<20060221205748.GA31483@nevyn.them.org>
	<200602212134.k1LLY3Sq028067@elgar.sibelius.xs4all.nl>
	<20060221215216.GA440@nevyn.them.org>
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
X-SW-Source: 2006-02/txt/msg00422.txt.bz2

> Date: Tue, 21 Feb 2006 16:52:16 -0500
> From: Daniel Jacobowitz
>
> > yes, it seems that step_frame_id can end up as null_frame_id, if
> > get_current_frame() is also the outermost frame at the same time.
>
> ... so the outermost frame will generally have null_frame_id. Not sure
> if that's true when we stop because we encounter main.

If we can unwind from main, its frame ID will not be null_frame_id.

> > > > How can we hit the last frame? If we're hitting the last frame, where
> > > > did we come from?
> > > >
> > > > It may very well be that there are GDB bugs that make step_frame_id
> > > > equal to null_frame_id. If we can't trace those bugs right now, we
> > > > should probably sprinkle a few gdb_assert()'s around and try to solve
> > > > the issues when we hit those.
> > >
> > > We use the null frame ID to represent the outermost frame. If we can't
> > > find another frame outer to this one, then we assume this one is the
> > > outermost.
> >
> > Yes, it seems there are issues here. The frame ID is supposed to be
> > unique for a particular frame, yet there's a possibility that two
> > distinct frames both end up with the null frame ID.
>
> Are there? I think there's only one - the outermost. We've only got
> that and the sentinel frame, and the sentinel frame I think doesn't
> have an ID.

If we can't unwind from a frame for some reason, then its frame ID
will also be null_frame_id. And in multi-threaded programs there can
be multiple outermost frames.

> Perhaps part of the problem here is that step_frame_id is set to
> null_frame_id when it is invalid; maybe we should keep that separate.

Not sure; it might not be necessary if we generate a proper frame ID
for the outermost frame.

> I don't think it would help me though. Perhaps the real problem is the
> use of null_frame_id for both the outermost frame and completely
> unknown frames. It would be nice if we could tell here:
>
>       if (frame_id_eq (frame_unwind_id (get_current_frame ()), step_frame_id))
>
> that frame_unwind_id has returned something completely invalid instead
> of the outermost frame.

Indeed.

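To make the distinction concrete, here is a minimal sketch in C of a
frame ID that keeps "outermost" and "completely unknown" apart. It is a
toy model, not GDB's struct frame_id: every toy_* name below is
hypothetical, and giving the outermost frame an ID built from its stack
address is only one possible way to generate a proper frame ID for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names exist in GDB.  */
enum toy_id_kind
{
  TOY_ID_INVALID,    /* unwinding failed, nothing is known */
  TOY_ID_OUTERMOST,  /* a real frame that has no caller */
  TOY_ID_NORMAL      /* a real frame with a caller */
};

struct toy_frame_id
{
  enum toy_id_kind kind;
  uint64_t stack_addr;
  uint64_t code_addr;
};

/* The single "unknown" value, kept separate from any outermost frame.  */
static const struct toy_frame_id toy_invalid_id = { TOY_ID_INVALID, 0, 0 };

/* Outermost frames still carry a stack address, so the outermost
   frames of two threads get distinct IDs.  */
static struct toy_frame_id
toy_outermost_id (uint64_t stack_addr)
{
  struct toy_frame_id id = { TOY_ID_OUTERMOST, stack_addr, 0 };
  return id;
}

static struct toy_frame_id
toy_normal_id (uint64_t stack_addr, uint64_t code_addr)
{
  struct toy_frame_id id = { TOY_ID_NORMAL, stack_addr, code_addr };
  return id;
}

/* An invalid ID never compares equal to anything, itself included.  */
static bool
toy_frame_id_eq (struct toy_frame_id a, struct toy_frame_id b)
{
  if (a.kind == TOY_ID_INVALID || b.kind == TOY_ID_INVALID)
    return false;
  return (a.kind == b.kind
          && a.stack_addr == b.stack_addr
          && a.code_addr == b.code_addr);
}
```

With a representation along these lines, a comparison against
step_frame_id could not succeed by accident merely because both sides
happen to be "null".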
> One way to do that in our current representation would be to check that
> the frame ID for the current frame is not null_frame_id.

Yes, I think it makes sense to punt on inserting a step-resume
breakpoint earlier than you do in your patch.

> > > Just to sketch out my example a bit more: the embedded OS I'm debugging
> > > lives in ROM. The application I've supplied to GDB lives in RAM. In
> > > some later stage of the project, hopefully, I will have GDB magically
> > > load some other ELF files (that I don't have yet) to cover the ROM
> > > code; but right now I can't do that and there's no guarantee I'll have
> > > debug info covering all of it anyway. So we're executing code way
> > > out in the boondocks. GDB doesn't have any way on this platform
> > > (ARM Thumb) to guess where the start of a function is if it doesn't
> > > have a symbol table; so it can't be sure that we've really reached the
> > > first instruction of a function, so it has no idea whether $lr is valid
> > > or not.
> >
> > But that really means that we shouldn't be messing with step-resume
> > breakpoints here. The whole notion of functions that can be stepped
> > into isn't there.
>
> Yes, it is. I've executed "step" in a place where I do have symbol
> information (and working unwinders). It's taken me into a place where
> I don't (a DLL in ROM). Since I don't have debug information any more
> GDB would like to step back out to the call site, except it fails
> because we've moved out of its known area.

Ah, ok. But that means that step_frame_id really shouldn't be equal
to null_frame_id. It will certainly not be null_frame_id if it isn't
the outermost frame. And whether the place where you issued "step" is
the outermost frame or not should not influence GDB's behaviour here.

Mark
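
As a rough illustration of "punting early", the check could be ordered
as in the sketch below. This is a self-contained toy in C, not code
from the patch under discussion: struct toy_unwind_result and
toy_can_insert_step_resume are hypothetical names, and frame IDs are
reduced to plain integers.

```c
#include <stdbool.h>

/* Hypothetical result of unwinding the current frame to find its
   caller.  Validity is tracked explicitly instead of being folded
   into a special ID value.  */
struct toy_unwind_result
{
  bool valid;        /* false when the unwinder could not find a caller */
  unsigned long id;  /* caller's frame ID, meaningful only if valid */
};

/* Decide whether a step-resume breakpoint at the caller makes sense.
   The validity test comes first, so a failed unwind can never match
   the frame in which "step" was issued.  */
static bool
toy_can_insert_step_resume (const struct toy_unwind_result *caller,
                            unsigned long step_frame)
{
  if (!caller->valid)
    return false;  /* punt: we do not know where we came from */
  return caller->id == step_frame;
}
```

The point of the ordering is that the answer no longer depends on
whether the frame where "step" was issued happens to be the outermost
one.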