Date: Sat, 13 May 2006 16:49:00 -0000
Message-Id: <200605131642.k4DGgiqa018273@elgar.sibelius.xs4all.nl>
From: Mark Kettenis
To: drow@false.org
CC: gdb-patches@sourceware.org
In-reply-to: <20060513151338.GB3721@nevyn.them.org> (message from Daniel Jacobowitz on Sat, 13 May 2006 11:13:38 -0400)
Subject: Re: [RFC] Move the frame zero PC check earlier
References: <20060510180312.GA12606@nevyn.them.org> <200605130946.k4D9kZ2M001331@elgar.sibelius.xs4all.nl> <20060513151338.GB3721@nevyn.them.org>

> Date: Sat, 13 May 2006 11:13:38 -0400
> From: Daniel Jacobowitz
>
> On Sat, May 13, 2006 at 11:46:35AM +0200, Mark Kettenis wrote:
> > > Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS,
> > > where it gives much nicer looking backtraces.
> >
> > Our goal shouldn't be nicer looking backtraces.
> > It should be providing the user with all information needed to fix
> > bugs in their programs. Your patch is removing such a bit of
> > information, and therefore unacceptable to me. Sorry :(.
>
> Sorry, Mark, I completely disagree with you on this issue. Let's at
> least discuss it, please?

No problem.

> You said that removing the 0x00000000 frame removed information. I
> disagree. It's not a valid frame, "up"'ing into it isn't going to give
> you anything sensible for saved registers unless the return address was
> the only thing on the stack that got clobbered (fairly rare).

Sure, and I wasn't arguing that the frame itself was of any use. But
the fact that it gets printed in the backtrace is useful, since it
indicates that GDB fell off the stack while doing the backtrace.

> Instead, with the patch, the backtrace will appear to just suddenly
> stop.

Yes, and that's exactly my problem. It will be much more difficult to
spot that GDB just fell off the stack. Another problem is that this
makes the PC == 0 case even more special than the PC != 0 case, where
we still will print the bogus frame in the backtrace.

> If the function at the bottom of the backtrace isn't an entry
> point, the fact that the backtrace has just suddenly stopped is a
> pretty big clue that the stack is horked.

Sure, but you won't notice until you start actually looking at the
function names in the backtrace. At first sight the backtrace will
look perfectly ok.

> Explanatory output ("why did that backtrace stop?") is available in
> "set debug frame 1". If you think it's routinely useful, then we can
> make it available in some prettier form, perhaps in "info frame" for
> the outermost frame.

If we can reliably tell that a frame is the outermost frame, we might
indeed print that as part of "info frame".

> Also, I don't think that "gdb is confused" errors are as desirable as
> you think they are.
> This extra frame has been reported to me as a bug at least three
> times that I can think of (twice for RTOSes and once for Linux KGDB).

I can imagine you'd like to get these people off your back. And
perhaps they're right that the extra frame is caused by a bug in GDB.
But that bug is not the printing of the extra frame itself. The bug
is GDB not being able to determine that it is at the end of the stack,
which might actually be a bug in the compiler or system libraries
they're using.

> Such messages upset users when their stack is _not_ horked. For
> example, when GDB's prologue unwinder can't handle a prologue for a
> non-leaf function on the stack, often you'll get this "friendly"
> message:
>
>     error (_("Previous frame identical to this frame (corrupt stack?)"));
>
> I've had users come up to me and say that they wasted hours looking for
> the stack corruption GDB was complaining about and in fact it was just
> a weakness in the unwinder.

Then we should improve the unwinder. If we didn't error out with that
error, the backtrace would never end.

> And Joel recently reported that Ada tasking generates this message
> on at least one platform, and users are unhappy about that, too.

IIRC this is a case where the outermost frame wasn't marked properly,
or at least not detected as such by GDB. That's the problem that
needs to be fixed.

> I think that determining the end of stack cleanly is one of the more
> important things for GDB to get right.

Yes indeed. And one of your other mails (to which I didn't reply yet)
tries to address that, and we certainly should do something like you
wrote there. But the patch we're discussing here is just papering
over the problems.

> And when we've run out of useful information, the stack appears to
> end, and we're quite justified in reporting that the stack ended.
> It's quite complex enough already without reporting "but the end of
> the stack looks a little funny to me...".
No, if a stack doesn't end properly on a platform where it should end
properly, that's useful information that should be reported to the
user.

Mark