[RFC] Move the frame zero PC check earlier

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [RFC] Move the frame zero PC check earlier
@ 2006-05-10 18:03 Daniel Jacobowitz
  2006-05-11 10:42 ` Andrew STUBBS
                   ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-10 18:03 UTC (permalink / raw)
  To: gdb-patches

In this patch:

  http://sourceware.org/ml/gdb-patches/2004-12/msg00328.html

Andrew added a generic check for two "normal" frames in a row where the
older one has a saved PC of zero.  This is pretty well understood as a
convention for terminating backtraces - either intentionally or
unintentionally.

The problem is, given where that check takes place, it is in my opinion one
frame off from the actual problem.  You get backtraces that look like this:

(gdb) bt
#0  catcher (sig=11) at /space/fsf/commit/src/gdb/testsuite/gdb.base/savedregs.c:43
#1  0x00002ac148ec6e90 in killpg () from /lib/libc.so.6
#2  0x0000000000000000 in ?? ()

On the one hand, that third frame is a nice marker for this case in that it
explains noisily that GDB is confused.  In this case, if I point GDB at
.debug_frame data for my C library (conveniently found by default in
/usr/lib/debug) it successfully backtraces through to main, so that's good.

On the other hand, that third frame is ugly and generally useless.  The
attached patch moves the check one frame earlier, so that we only get this
backtrace:

#0  catcher (sig=11) at /space/fsf/commit/src/gdb/testsuite/gdb.base/savedregs.c:43
#1  0x00002ac148ec6e90 in killpg () from /lib/libc.so.6

Benefits: cleaner looking backtraces; the check is only done once per frame
instead of on every call to get_prev_frame.  Disadvantages: for all frames
at level > 0, it causes this check to be done when we look at THIS frame
instead of when we unwind to the PREV frame, which forces us to locate the
correct sniffer earlier.  If you have a sigtramp sniffer which reads memory,
this might cause an extra memory read.  However, it won't happen in the
critical stepping path - we don't need this check when "unwinding frame 0
from the sentinel frame - so I think this is an acceptable tradeoff.  For
any frame other than the top frame, the thing you're most likely to do with
it is backtrace right through it.

Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS, where it gives
much nicer looking backtraces.

Any comments?

-- 
Daniel Jacobowitz
CodeSourcery

2006-05-10  Daniel Jacobowitz  <dan@codesourcery.com>

	* frame.c (get_prev_frame): Move check for pc == 0 ...
	(get_prev_frame_1): ... to here.

Index: frame.c
===================================================================
RCS file: /cvs/src/src/gdb/frame.c,v
retrieving revision 1.211
diff -u -p -r1.211 frame.c
--- frame.c	17 Dec 2005 22:33:59 -0000	1.211
+++ frame.c	10 May 2006 17:48:32 -0000
@@ -1123,6 +1123,26 @@ get_prev_frame_1 (struct frame_info *thi
   this_frame->prev = prev_frame;
   prev_frame->next = this_frame;

+  /* Now that the frame chain is in a consistant state, check whether
+     this frame is useful.  If it is not, unlink it.  Its storage will
+     be reclaimed the next time the frame cache is flushed, and we
+     will not try to unwind THIS_FRAME again.  */
+
+  /* Assume that the only way to get a zero PC is through something
+     like a SIGSEGV or a dummy frame, and hence that NORMAL frames
+     will never unwind a zero PC.  This will look up the unwinder
+     for the newly created frame, to determine its type.  */
+  if (prev_frame->level > 0
+      && get_frame_type (prev_frame) == NORMAL_FRAME
+      && get_frame_type (this_frame) == NORMAL_FRAME
+      && get_frame_pc (prev_frame) == 0)
+    {
+      if (frame_debug)
+	fprintf_unfiltered (gdb_stdlog, "-> // zero PC}\n");
+      this_frame->prev = NULL;
+      return NULL;
+    }
+
   if (frame_debug)
     {
       fprintf_unfiltered (gdb_stdlog, "-> ");
@@ -1300,18 +1320,6 @@ get_prev_frame (struct frame_info *this_
       return NULL;
     }

-  /* Assume that the only way to get a zero PC is through something
-     like a SIGSEGV or a dummy frame, and hence that NORMAL frames
-     will never unwind a zero PC.  */
-  if (this_frame->level > 0
-      && get_frame_type (this_frame) == NORMAL_FRAME
-      && get_frame_type (get_next_frame (this_frame)) == NORMAL_FRAME
-      && get_frame_pc (this_frame) == 0)
-    {
-      frame_debug_got_null_frame (gdb_stdlog, this_frame, "zero PC");
-      return NULL;
-    }
-
   return get_prev_frame_1 (this_frame);
 }

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-10 18:03 [RFC] Move the frame zero PC check earlier Daniel Jacobowitz
@ 2006-05-11 10:42 ` Andrew STUBBS
  2006-05-11 22:24 ` Jim Blandy
  2006-05-13 10:14 ` Mark Kettenis
  2 siblings, 0 replies; 38+ messages in thread
From: Andrew STUBBS @ 2006-05-11 10:42 UTC (permalink / raw)
  To: gdb-patches

Daniel Jacobowitz wrote:
> Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS, where it gives
> much nicer looking backtraces.
> 
> Any comments?

Those frames have been bothering me for ages. This patch fixes the 
problem for sh-elf with ST's OS21 thread backtrace beautifully.

Thanks

Andrew


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-10 18:03 [RFC] Move the frame zero PC check earlier Daniel Jacobowitz
  2006-05-11 10:42 ` Andrew STUBBS
@ 2006-05-11 22:24 ` Jim Blandy
  2006-05-11 22:32   ` Daniel Jacobowitz
  2006-05-13 10:14 ` Mark Kettenis
  2 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-11 22:24 UTC (permalink / raw)
  To: gdb-patches


It looks good to me.  I'm just curious why we even bother looking up
the frame type for the older frame whose PC is zero.  (I understand
this test is inherited from the existing code.)

Daniel Jacobowitz <drow@false.org> writes:

> In this patch:
>
>   http://sourceware.org/ml/gdb-patches/2004-12/msg00328.html
>
> Andrew added a generic check for two "normal" frames in a row where the
> older one has a saved PC of zero.  This is pretty well understood as a
> convention for terminating backtraces - either intentionally or
> unintentionally.
>
> The problem is, given where that check takes place, it is in my opinion one
> frame off from the actual problem.  You get backtraces that look like this:
>
> (gdb) bt
> #0  catcher (sig=11) at /space/fsf/commit/src/gdb/testsuite/gdb.base/savedregs.c:43
> #1  0x00002ac148ec6e90 in killpg () from /lib/libc.so.6
> #2  0x0000000000000000 in ?? ()
>
> On the one hand, that third frame is a nice marker for this case in that it
> explains noisily that GDB is confused.  In this case, if I point GDB at
> .debug_frame data for my C library (conveniently found by default in
> /usr/lib/debug) it successfully backtraces through to main, so that's good.
>
> On the other hand, that third frame is ugly and generally useless.  The
> attached patch moves the check one frame earlier, so that we only get this
> backtrace:
>
> #0  catcher (sig=11) at /space/fsf/commit/src/gdb/testsuite/gdb.base/savedregs.c:43
> #1  0x00002ac148ec6e90 in killpg () from /lib/libc.so.6
>
> Benefits: cleaner looking backtraces; the check is only done once per frame
> instead of on every call to get_prev_frame.  Disadvantages: for all frames
> at level > 0, it causes this check to be done when we look at THIS frame
> instead of when we unwind to the PREV frame, which forces us to locate the
> correct sniffer earlier.  If you have a sigtramp sniffer which reads memory,
> this might cause an extra memory read.  However, it won't happen in the
> critical stepping path - we don't need this check when "unwinding frame 0
> from the sentinel frame - so I think this is an acceptable tradeoff.  For
> any frame other than the top frame, the thing you're most likely to do with
> it is backtrace right through it.
>
> Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS, where it gives
> much nicer looking backtraces.
>
> Any comments?
>
> -- 
> Daniel Jacobowitz
> CodeSourcery
>
> 2006-05-10  Daniel Jacobowitz  <dan@codesourcery.com>
>
> 	* frame.c (get_prev_frame): Move check for pc == 0 ...
> 	(get_prev_frame_1): ... to here.
>
> Index: frame.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/frame.c,v
> retrieving revision 1.211
> diff -u -p -r1.211 frame.c
> --- frame.c	17 Dec 2005 22:33:59 -0000	1.211
> +++ frame.c	10 May 2006 17:48:32 -0000
> @@ -1123,6 +1123,26 @@ get_prev_frame_1 (struct frame_info *thi
>    this_frame->prev = prev_frame;
>    prev_frame->next = this_frame;
>  
> +  /* Now that the frame chain is in a consistant state, check whether
> +     this frame is useful.  If it is not, unlink it.  Its storage will
> +     be reclaimed the next time the frame cache is flushed, and we
> +     will not try to unwind THIS_FRAME again.  */
> +
> +  /* Assume that the only way to get a zero PC is through something
> +     like a SIGSEGV or a dummy frame, and hence that NORMAL frames
> +     will never unwind a zero PC.  This will look up the unwinder
> +     for the newly created frame, to determine its type.  */
> +  if (prev_frame->level > 0
> +      && get_frame_type (prev_frame) == NORMAL_FRAME
> +      && get_frame_type (this_frame) == NORMAL_FRAME
> +      && get_frame_pc (prev_frame) == 0)
> +    {
> +      if (frame_debug)
> +	fprintf_unfiltered (gdb_stdlog, "-> // zero PC}\n");
> +      this_frame->prev = NULL;
> +      return NULL;
> +    }
> +
>    if (frame_debug)
>      {
>        fprintf_unfiltered (gdb_stdlog, "-> ");
> @@ -1300,18 +1320,6 @@ get_prev_frame (struct frame_info *this_
>        return NULL;
>      }
>  
> -  /* Assume that the only way to get a zero PC is through something
> -     like a SIGSEGV or a dummy frame, and hence that NORMAL frames
> -     will never unwind a zero PC.  */
> -  if (this_frame->level > 0
> -      && get_frame_type (this_frame) == NORMAL_FRAME
> -      && get_frame_type (get_next_frame (this_frame)) == NORMAL_FRAME
> -      && get_frame_pc (this_frame) == 0)
> -    {
> -      frame_debug_got_null_frame (gdb_stdlog, this_frame, "zero PC");
> -      return NULL;
> -    }
> -
>    return get_prev_frame_1 (this_frame);
>  }
>  


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-11 22:24 ` Jim Blandy
@ 2006-05-11 22:32   ` Daniel Jacobowitz
  2006-05-12  6:21     ` Jim Blandy
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-11 22:32 UTC (permalink / raw)
  To: Jim Blandy; +Cc: gdb-patches

On Thu, May 11, 2006 at 03:24:31PM -0700, Jim Blandy wrote:
> 
> It looks good to me.  I'm just curious why we even bother looking up
> the frame type for the older frame whose PC is zero.  (I understand
> this test is inherited from the existing code.)

You've gotta.  It looks like this:

#0 somefunc ()
#1 <signal handler called>
#2 0x00000000 in Nothing At All
#3 function_which_calls_its_arg (arg = (void (*)()) 0x00000000)
#4 main ()

There was a check for this at some point in the distant past - I
believe it was committed, not just proposed - which didn't do this.
Ergo signull.exp.

Er... wait a second!  You're talking about a different frame than I am,
aren't you?  Do you mean this?

+  if (prev_frame->level > 0
+      && get_frame_type (this_frame) == NORMAL_FRAME
+      && get_frame_pc (prev_frame) == 0)

In which case we could do this even earlier.  Can there be sigtramp or
dummy frames which have pc == 0?

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-11 22:32   ` Daniel Jacobowitz
@ 2006-05-12  6:21     ` Jim Blandy
  2006-05-12 12:46       ` Daniel Jacobowitz
  0 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-12  6:21 UTC (permalink / raw)
  To: gdb-patches


Daniel Jacobowitz <drow@false.org> writes:
> On Thu, May 11, 2006 at 03:24:31PM -0700, Jim Blandy wrote:
>> 
>> It looks good to me.  I'm just curious why we even bother looking up
>> the frame type for the older frame whose PC is zero.  (I understand
>> this test is inherited from the existing code.)
>
> You've gotta.  It looks like this:
>
> #0 somefunc ()
> #1 <signal handler called>
> #2 0x00000000 in Nothing At All
> #3 function_which_calls_its_arg (arg = (void (*)()) 0x00000000)
> #4 main ()
>
> There was a check for this at some point in the distant past - I
> believe it was committed, not just proposed - which didn't do this.
> Ergo signull.exp.
>
> Er... wait a second!  You're talking about a different frame than I am,
> aren't you?  Do you mean this?
>
> +  if (prev_frame->level > 0
> +      && get_frame_type (this_frame) == NORMAL_FRAME
> +      && get_frame_pc (prev_frame) == 0)

I think so.  Using the backtrace above as an example, I understood why
we need to know frame #1's type, but I didn't see the point in
checking frame #2's type.

But I think I do now.  If CALL_DUMMY_LOCATION is AT_ENTRY_POINT, and
the entry point is at address zero, then the test as written above
would truncate backtraces at dummy frames.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-12  6:21     ` Jim Blandy
@ 2006-05-12 12:46       ` Daniel Jacobowitz
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-12 12:46 UTC (permalink / raw)
  To: gdb-patches

On Thu, May 11, 2006 at 11:20:03PM -0700, Jim Blandy wrote:
> I think so.  Using the backtrace above as an example, I understood why
> we need to know frame #1's type, but I didn't see the point in
> checking frame #2's type.
> 
> But I think I do now.  If CALL_DUMMY_LOCATION is AT_ENTRY_POINT, and
> the entry point is at address zero, then the test as written above
> would truncate backtraces at dummy frames.

Yeah, that's the only case I could think of.  It's a shame to do extra
work for it everywhere - but probably safer.  I can reorder the test to
only check the frame type if the PC is zero, though.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-10 18:03 [RFC] Move the frame zero PC check earlier Daniel Jacobowitz
  2006-05-11 10:42 ` Andrew STUBBS
  2006-05-11 22:24 ` Jim Blandy
@ 2006-05-13 10:14 ` Mark Kettenis
  2006-05-13 15:17   ` Daniel Jacobowitz
  2006-05-15 13:57   ` Andrew STUBBS
  2 siblings, 2 replies; 38+ messages in thread
From: Mark Kettenis @ 2006-05-13 10:14 UTC (permalink / raw)
  To: drow; +Cc: gdb-patches

> Date: Wed, 10 May 2006 14:03:12 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> In this patch:
> 
>   http://sourceware.org/ml/gdb-patches/2004-12/msg00328.html
> 
> Andrew added a generic check for two "normal" frames in a row where the
> older one has a saved PC of zero.  This is pretty well understood as a
> convention for terminating backtraces - either intentionally or
> unintentionally.
> 
> The problem is, given where that check takes place, it is in my opinion one
> frame off from the actual problem.  You get backtraces that look like this:
> 
> (gdb) bt
> #0  catcher (sig=11) at /space/fsf/commit/src/gdb/testsuite/gdb.base/savedregs.c:43
> #1  0x00002ac148ec6e90 in killpg () from /lib/libc.so.6
> #2  0x0000000000000000 in ?? ()
> 
> On the one hand, that third frame is a nice marker for this case in that it
> explains noisily that GDB is confused.  In this case, if I point GDB at
> .debug_frame data for my C library (conveniently found by default in
> /usr/lib/debug) it successfully backtraces through to main, so that's good.

And this is exactly the reason why things are done the way they are
done.  People should accept that the unwinder can fail, and we should
provide a way to indicate this.  There are many ways in which an
unwinder can fail and they're not always detectable.  One very
important scemario is where your program is thrashing the stack,
overwriting the return address.  We absulutely need to provide the
user some indication that something is wrong.  Currently this is the
extra frame we're printing.

> On the other hand, that third frame is ugly and generally useless

While the third frame itself doesn't provide any useful information,
its presence is very useful, since it indicates that GDB has lost
track..

> The attached patch moves the check one frame earlier, so that we
> only get this backtrace:
> 
> #0  catcher (sig=11) at /space/fsf/commit/src/gdb/testsuite/gdb.base/savedregs.c:43
> #1  0x00002ac148ec6e90 in killpg () from /lib/libc.so.6
> 
> Benefits: cleaner looking backtraces; the check is only done once per frame
> instead of on every call to get_prev_frame.  Disadvantages: for all frames
> at level > 0, it causes this check to be done when we look at THIS frame
> instead of when we unwind to the PREV frame, which forces us to locate the
> correct sniffer earlier.  If you have a sigtramp sniffer which reads memory,
> this might cause an extra memory read.  However, it won't happen in the
> critical stepping path - we don't need this check when "unwinding frame 0
> from the sentinel frame - so I think this is an acceptable tradeoff.  For
> any frame other than the top frame, the thing you're most likely to do with
> it is backtrace right through it.
> 
> Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS, where it gives
> much nicer looking backtraces.

Our goal shouldn't be nicer looking backtraces.  It should be
providing the user with all information needed to fix bugs in their
programs.  Your patch is removing such a bit of information, and
therefore unacceptable to me.  Sorry :(.

Mark


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 10:14 ` Mark Kettenis
@ 2006-05-13 15:17   ` Daniel Jacobowitz
  2006-05-13 15:46     ` Daniel Jacobowitz
  2006-05-13 16:49     ` Mark Kettenis
  2006-05-15 13:57   ` Andrew STUBBS
  1 sibling, 2 replies; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-13 15:17 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: gdb-patches

On Sat, May 13, 2006 at 11:46:35AM +0200, Mark Kettenis wrote:
> > Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS, where it gives
> > much nicer looking backtraces.
> 
> Our goal shouldn't be nicer looking backtraces.  It should be
> providing the user with all information needed to fix bugs in their
> programs.  Your patch is removing such a bit of information, and
> therefore unacceptable to me.  Sorry :(.

Sorry, Mark, I completely disagree with you on this issue.  Let's at
least discuss it, please?

You said that removing the 0x00000000 frame removed information.  I
disagree.  It's not a valid frame, "up"'ing into it isn't going to give
you anything sensible for saved registers unless the return address was
the only thing on the stack that got clobbered (fairly rare).  Instead,
with the patch, the backtrace will appear to just suddenly stop.  If
the function at the bottom of the backtrace isn't an entry point, the
fact that the backtrace has just suddenly stopped is a pretty big clue
that the stack is horked.

Explanatory output ("why did that backtrace stop?") is available in
"set debug frame 1".  If you think it's routinely useful, then we can
make it available in some prettier form, perhaps in "info frame" for
the outermost frame.

Also, I don't think that "gdb is confused" errors are as desirable as
you think they are.  This extra frame has been reported to me as a bug
at least three times that I can think of (twice for RTOSes and once for
Linux KGDB).

Such messages upset users when their stack is _not_ horked.  For
example, when GDB's prologue unwinder can't handle a prologue for a
non-leaf function on the stack, often you'll get this "friendly"
message:

  error (_("Previous frame identical to this frame (corrupt stack?)"));

I've had users come up to me and say that they wasted hours looking for
the stack corruption GDB was complaining about and in fact it was just
a weakness in the unwinder.  And Joel recently reported that Ada
tasking generates this message on at least one platform, and users are
unhappy about that, too.

I think that determining the end of stack cleanly is one of the more
important things for GDB to get right.  And when we've run out of
useful information, the stack appears to end, and we're quite justified
in reporting that the stack ended.  It's quite complex enough already
without reporting "but the end of the stack looks a little funny to
me...".

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 15:17   ` Daniel Jacobowitz
@ 2006-05-13 15:46     ` Daniel Jacobowitz
  2006-05-13 17:08       ` Mark Kettenis
  2006-05-13 16:49     ` Mark Kettenis
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-13 15:46 UTC (permalink / raw)
  To: Mark Kettenis, gdb-patches

On Sat, May 13, 2006 at 11:13:38AM -0400, Daniel Jacobowitz wrote:
> I think that determining the end of stack cleanly is one of the more
> important things for GDB to get right.  And when we've run out of
> useful information, the stack appears to end, and we're quite justified
> in reporting that the stack ended.  It's quite complex enough already
> without reporting "but the end of the stack looks a little funny to
> me...".

By the way, there's plenty of precedent for this in GDB, including some
you've written yourself.  If the saved value of %ebp on the stack gets
clobbered, when the i386 prologue analyzer is involved, we'll
gracefully report that there are no more frames (cache->base == 0).
Why should this be different?

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 15:17   ` Daniel Jacobowitz
  2006-05-13 15:46     ` Daniel Jacobowitz
@ 2006-05-13 16:49     ` Mark Kettenis
  2006-05-13 18:53       ` Daniel Jacobowitz
  2006-05-16 21:38       ` Daniel Jacobowitz
  1 sibling, 2 replies; 38+ messages in thread
From: Mark Kettenis @ 2006-05-13 16:49 UTC (permalink / raw)
  To: drow; +Cc: gdb-patches

> Date: Sat, 13 May 2006 11:13:38 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> On Sat, May 13, 2006 at 11:46:35AM +0200, Mark Kettenis wrote:
> > > Tested on x86_64-pc-linux-gnu, and by hand against SymbianOS,
> > > where it gives much nicer looking backtraces.
> > 
> > Our goal shouldn't be nicer looking backtraces.  It should be
> > providing the user with all information needed to fix bugs in their
> > programs.  Your patch is removing such a bit of information, and
> > therefore unacceptable to me.  Sorry :(.
> 
> Sorry, Mark, I completely disagree with you on this issue.  Let's at
> least discuss it, please?

No problem.

> You said that removing the 0x00000000 frame removed information.  I
> disagree.  It's not a valid frame, "up"'ing into it isn't going to give
> you anything sensible for saved registers unless the return address was
> the only thing on the stack that got clobbered (fairly rare).

Sure, and I wasn't arguing that the frame itself was of any use.  But
the fact that it gets printed in the backtrace is useful, since it
indicates that GDB fell of the stack while doing the backtrace.

> Instead, with the patch, the backtrace will appear to just suddenly
> stop.

Yes, and that's exactly my problem.  It will be much more difficult to
spot that GDB just fell off the stack.  Another problem is that this
makes the PC == 0 case even more special than the PC != 0 case, where
we still will print the bogus frame in the backtrace.

> If the function at the bottom of the backtrace isn't an entry
> point, the fact that the backtrace has just suddenly stopped is a
> pretty big clue that the stack is horked.

Sure, but you won't notice until you start actually looking at the
function names in the backtrace.  At first sight the backtrace will
look perfectly ok.

> Explanatory output ("why did that backtrace stop?") is available in
> "set debug frame 1".  If you think it's routinely useful, then we can
> make it available in some prettier form, perhaps in "info frame" for
> the outermost frame.

If we can reliably tell that a frame is the outermost frame, we might
indeed print that as part of "info frame".

> Also, I don't think that "gdb is confused" errors are as desirable as
> you think they are.  This extra frame has been reported to me as a bug
> at least three times that I can think of (twice for RTOSes and once for
> Linux KGDB).

I can imagine you'd like to get these people off your back.  And
perhaps they're right that the extra frame is caused by a bug in GDB.
But that bug is not the printing of the extra frame itself.  The bug
is GDB not being able to determine that it is at the end of the stack,
which might actually be a bug in the compiler or system libraries
they're using.

> Such messages upset users when their stack is _not_ horked.  For
> example, when GDB's prologue unwinder can't handle a prologue for a
> non-leaf function on the stack, often you'll get this "friendly"
> message:
> 
>   error (_("Previous frame identical to this frame (corrupt stack?)"));
> 
> I've had users come up to me and say that they wasted hours looking for
> the stack corruption GDB was complaining about and in fact it was just
> a weakness in the unwinder.

Then we should improve the unwinder.  If we didn't error out with that
error, the backtrace would never end.

> And Joel recently reported that Ada tasking generates this message
> on at least one platform, and users are unhappy about that, too.

IIRC this is a case where the outermost frame wasn't marked properly,
or at least not detected as such by GDB.  That's the problem that
needs to be fixed.

> I think that determining the end of stack cleanly is one of the more
> important things for GDB to get right.

Yes indeed.  And one of your other mails (to which I didn't reply yet)
tries to address that, and we certainly should do something like you
wrote there.  But the patch we're discussing here is just papering
over the problems.

> And when we've run out of useful information, the stack appears to
> end, and we're quite justified in reporting that the stack ended.
> It's quite complex enough already without reporting "but the end of
> the stack looks a little funny to me...".

No, if a stack doesn't end properly on a platform where it should end
properly, that's useful information that should be reported to the
user.

Mark

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 15:46     ` Daniel Jacobowitz
@ 2006-05-13 17:08       ` Mark Kettenis
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Kettenis @ 2006-05-13 17:08 UTC (permalink / raw)
  To: drow; +Cc: gdb-patches

> Date: Sat, 13 May 2006 11:17:48 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> On Sat, May 13, 2006 at 11:13:38AM -0400, Daniel Jacobowitz wrote:
> > I think that determining the end of stack cleanly is one of the more
> > important things for GDB to get right.  And when we've run out of
> > useful information, the stack appears to end, and we're quite justified
> > in reporting that the stack ended.  It's quite complex enough already
> > without reporting "but the end of the stack looks a little funny to
> > me...".
> 
> By the way, there's plenty of precedent for this in GDB, including some
> you've written yourself.  If the saved value of %ebp on the stack gets
> clobbered, when the i386 prologue analyzer is involved, we'll
> gracefully report that there are no more frames (cache->base == 0).
> Why should this be different?

Because once upon a time, when GCC still exclusively used %ebp as a
frame pointer, %ebp == 0 was the standard way to mark the outermost
frame.  I should probably remove that check.

Mark


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 16:49     ` Mark Kettenis
@ 2006-05-13 18:53       ` Daniel Jacobowitz
  2006-05-16 21:38       ` Daniel Jacobowitz
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-13 18:53 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: gdb-patches

I'll respond to this in more depth later; I still disagree with much of
it :-)  But I don't have time this moment, and I wanted to answer one
particular thing:

On Sat, May 13, 2006 at 06:42:44PM +0200, Mark Kettenis wrote:
> > And when we've run out of useful information, the stack appears to
> > end, and we're quite justified in reporting that the stack ended.
> > It's quite complex enough already without reporting "but the end of
> > the stack looks a little funny to me...".
> 
> No, if a stack doesn't end properly on a platform where it should end
> properly, that's useful information that should be reported to the
> user.

Now you've gotten me really confused.  PC == 0 is the "proper" end of
the stack in lots of platform ABIs; that's why the check was added to
frame.c in the first place instead of leaving it in the PA-RISC
specific file where it was just before it was added.  That's exactly
why I want to detect it: because it signals a clean end to the stack.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 10:14 ` Mark Kettenis
  2006-05-13 15:17   ` Daniel Jacobowitz
@ 2006-05-15 13:57   ` Andrew STUBBS
  1 sibling, 0 replies; 38+ messages in thread
From: Andrew STUBBS @ 2006-05-15 13:57 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: drow, gdb-patches

Mark Kettenis wrote:
> And this is exactly the reason why things are done the way they are
> done.  People should accept that the unwinder can fail, and we should
> provide a way to indicate this.  There are many ways in which an
> unwinder can fail and they're not always detectable.  One very
> important scemario is where your program is thrashing the stack,
> overwriting the return address.  We absulutely need to provide the
> user some indication that something is wrong.  Currently this is the
> extra frame we're printing.

Except that I get this when there is nothing wrong. I.e. It is crying wolf.

The problem I have seems to be that the function that launches a thread 
has dwarf debug information and GDB can't accept that the first function 
on the stack can have this. It unwinds all through the stack till it 
reaches the end of the dwarf functions and then tries the machine 
specific unwinder. This MUST return at least one frame, even though it 
knows there isn't one. If it returns NULL, like the dwarf unwinder does, 
  then you get an internal error message, so it returns a null frame ID 
and causes a null frame in the display.

Daniel's patch fixes my problem.

> Our goal shouldn't be nicer looking backtraces.  It should be
> providing the user with all information needed to fix bugs in their
> programs.  Your patch is removing such a bit of information, and
> therefore unacceptable to me.  Sorry :(.

It shouldn't be saying there are problems where there are none.

Andrew

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-13 16:49     ` Mark Kettenis
  2006-05-13 18:53       ` Daniel Jacobowitz
@ 2006-05-16 21:38       ` Daniel Jacobowitz
  2006-05-16 22:19         ` Mark Kettenis
  1 sibling, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-16 21:38 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: gdb-patches

I won't reply paragraph-by-paragraph; there's getting to be a lot of
paragraphs.  Please, let me know if I've cut something that I shouldn't
have.

On Sat, May 13, 2006 at 06:42:44PM +0200, Mark Kettenis wrote:
> > Explanatory output ("why did that backtrace stop?") is available in
> > "set debug frame 1".  If you think it's routinely useful, then we can
> > make it available in some prettier form, perhaps in "info frame" for
> > the outermost frame.
> 
> If we can reliably tell that a frame is the outermost frame, we might
> indeed print that as part of "info frame".

That's just about the opposite of what I'm suggesting.  I think that
"the stack ended jaggedly" might be useful in "info frame".

> > Also, I don't think that "gdb is confused" errors are as desirable as
> > you think they are.  This extra frame has been reported to me as a bug
> > at least three times that I can think of (twice for RTOSes and once for
> > Linux KGDB).
> 
> I can imagine you'd like to get these people off your back.  And
> perhaps they're right that the extra frame is caused by a bug in GDB.
> But that bug is not the printing of the extra frame itself.  The bug
> is GDB not being able to determine that it is at the end of the stack,
> which might actually be a bug in the compiler or system libraries
> they're using.

No!  No no no.

First of all, I'm not just trying to get them off my back.  I think
they're right and it shouldn't be displayed.  Second, this _is_ GDB
being able to determine that it's at the end of the stack.

A return address of zero is a fairly common convention for this.
It's natural, if you think about it.  On architectures with a
well entrenched frame pointer (exhibit A, our earlier conversation
about cache->base and %ebp on x86) then that can be initialized to zero
either before calling a generic higher-level function or else by
handwritten startup code.  The same is true for the return address; it
can be set to return to nowhere.  Neither "makes sense", so they are
useful markers.  About the only other option is the stack pointer, and
you can't do that unless you're calling handwritten startup code that
also knows where the stack is supposed to go - pretty rare in modern
systems where that code is being called by anything other than a reset
vector.

> Then we should improve the unwinder.  If we didn't error out with that
> error, the backtrace would never end.

As you well know, in many cases it is either impractical or downright
impossible to improve the prologue unwinder, e.g. when OS vendors ship
system libraries with neither unwind information nor symbols.

> > And Joel recently reported that Ada tasking generates this message
> > on at least one platform, and users are unhappy about that, too.
> 
> IIRC this is a case where the outermost frame wasn't marked properly,
> or at least not detected as such by GDB.  That's the problem that
> needs to be fixed.

I guess that depends what you mean by "marked properly" and what you
mean by "fixed".

The problem was that a routine in the system libraries was called
directly from new threads.  The name of the routine is OS-dependant
and maybe even OS-version-dependant.  Changing the system libraries is
out of the question here; all the world isn't free software and I
believe the system in question was either HP-UX or OSF.  Recognizing it
by name would, I suppose, work - but you'd have to be careful of the
user reusing a fairly generic function name!  Or else limit the check
to a specific shared object, and ditch caring about static linking.

Seems kind of sketchy to me.

> > And when we've run out of useful information, the stack appears to
> > end, and we're quite justified in reporting that the stack ended.
> > It's quite complex enough already without reporting "but the end of
> > the stack looks a little funny to me...".
> 
> No, if a stack doesn't end properly on a platform where it should end
> properly, that's useful information that should be reported to the
> user.

I answered this one in another message, and it's basically the same as
my first bit above in this message.  I think it's precisely proper to
report this as an end of stack condition.

We're reading the stack frame from memory, so we're automatically
vulnerable to displaying corrupted information if the user has
scribbled on the stack.  But I don't think that merits displaying
something that we know is garbage.  A non-zero unknown PC might be
garbage or it might be code we don't have symbols for; we display it as
if it were code we didn't have a symbol for.  I posit that there's a
difference in kind between that and a zero PC, which might be garbage
or might be the end of the stack; and by analogy, I'm suggesting we
display it as if it is the end of the stack.

Anyway, that's my opinion.  I have no idea how to proceed on this;
I don't really expect any of that to change your mind, and you probably
don't use any system where this extra frame is a serious annoyance.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-16 21:38       ` Daniel Jacobowitz
@ 2006-05-16 22:19         ` Mark Kettenis
  2006-05-16 22:46           ` Daniel Jacobowitz
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Kettenis @ 2006-05-16 22:19 UTC (permalink / raw)
  To: drow; +Cc: gdb-patches

> Date: Tue, 16 May 2006 16:45:03 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> I won't reply paragraph-by-paragraph; there's getting to be a lot of
> paragraphs.  Please, let me know if I've cut something that I shouldn't
> have.
> 
> On Sat, May 13, 2006 at 06:42:44PM +0200, Mark Kettenis wrote:
> > > Explanatory output ("why did that backtrace stop?") is available in
> > > "set debug frame 1".  If you think it's routinely useful, then we can
> > > make it available in some prettier form, perhaps in "info frame" for
> > > the outermost frame.
> > 
> > If we can reliably tell that a frame is the outermost frame, we might
> > indeed print that as part of "info frame".
> 
> That's just about the opposite of what I'm suggesting.  I think that
> "the stack ended jaggedly" might be useful in "info frame".

So we disagree.

> > > Also, I don't think that "gdb is confused" errors are as desirable as
> > > you think they are.  This extra frame has been reported to me as a bug
> > > at least three times that I can think of (twice for RTOSes and once for
> > > Linux KGDB).
> > 
> > I can imagine you'd like to get these people off your back.  And
> > perhaps they're right that the extra frame is caused by a bug in GDB.
> > But that bug is not the printing of the extra frame itself.  The bug
> > is GDB not being able to determine that it is at the end of the stack,
> > which might actually be a bug in the compiler or system libraries
> > they're using.
> 
> No!  No no no.
> 
> First of all, I'm not just trying to get them off my back.  I think
> they're right and it shouldn't be displayed.  Second, this _is_ GDB
> being able to determine that it's at the end of the stack.

Yes, if GDB is being able to detect that the stack has ended, it
should defenitely not print another, bugus frame.

> A return address of zero is a fairly common convention for this.

Fairly common, perhaps, but not universal.

> It's natural, if you think about it.  On architectures with a
> well entrenched frame pointer (exhibit A, our earlier conversation
> about cache->base and %ebp on x86) then that can be initialized to zero
> either before calling a generic higher-level function or else by
> handwritten startup code.  The same is true for the return address; it
> can be set to return to nowhere.  Neither "makes sense", so they are
> useful markers.  About the only other option is the stack pointer, and
> you can't do that unless you're calling handwritten startup code that
> also knows where the stack is supposed to go - pretty rare in modern
> systems where that code is being called by anything other than a reset
> vector.

So there are several conventions, and these make sense for a specific
ISA or perhaps even a specific OS.

> > Then we should improve the unwinder.  If we didn't error out with that
> > error, the backtrace would never end.
> 
> As you well know, in many cases it is either impractical or downright
> impossible to improve the prologue unwinder, e.g. when OS vendors ship
> system libraries with neither unwind information nor symbols.

And this is exactly the case where I think the jagged end of the
backtrace is important.  It indicates that GDB lost track somewhere
and that the backtrace can't be trusted.

> > > And Joel recently reported that Ada tasking generates this message
> > > on at least one platform, and users are unhappy about that, too.
> > 
> > IIRC this is a case where the outermost frame wasn't marked properly,
> > or at least not detected as such by GDB.  That's the problem that
> > needs to be fixed.
> 
> I guess that depends what you mean by "marked properly" and what you
> mean by "fixed".
> 
> The problem was that a routine in the system libraries was called
> directly from new threads.  The name of the routine is OS-dependant
> and maybe even OS-version-dependant.  Changing the system libraries is
> out of the question here; all the world isn't free software and I
> believe the system in question was either HP-UX or OSF.  Recognizing it
> by name would, I suppose, work - but you'd have to be careful of the
> user reusing a fairly generic function name!  Or else limit the check
> to a specific shared object, and ditch caring about static linking.

If this is on a proprietary system and the system vendor did not
provide a reliable way to determine we're falling off the stack,
there's not much we can do.  That's one of the reasons why we should
encourage people to use free software, that can be changed to do the
right thing.

> > > And when we've run out of useful information, the stack appears to
> > > end, and we're quite justified in reporting that the stack ended.
> > > It's quite complex enough already without reporting "but the end of
> > > the stack looks a little funny to me...".
> > 
> > No, if a stack doesn't end properly on a platform where it should end
> > properly, that's useful information that should be reported to the
> > user.
> 
> I answered this one in another message, and it's basically the same as
> my first bit above in this message.  I think it's precisely proper to
> report this as an end of stack condition.
> 
> We're reading the stack frame from memory, so we're automatically
> vulnerable to displaying corrupted information if the user has
> scribbled on the stack.  But I don't think that merits displaying
> something that we know is garbage.  A non-zero unknown PC might be
> garbage or it might be code we don't have symbols for; we display it as
> if it were code we didn't have a symbol for.  I posit that there's a
> difference in kind between that and a zero PC, which might be garbage
> or might be the end of the stack; and by analogy, I'm suggesting we
> display it as if it is the end of the stack.

But the zero PC is *not* universal.  Therefore it should be treated
the same as the non-zero garbage PC.

> Anyway, that's my opinion.  I have no idea how to proceed on this;
> I don't really expect any of that to change your mind, and you probably
> don't use any system where this extra frame is a serious annoyance.

I cannot imagine that a single extra frame to be a serious annoyance.
I can see that the extra frame looses its signalling function on
systems where it's seen a lot in cases where the stack actually ends
that way.

Mark


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-16 22:19         ` Mark Kettenis
@ 2006-05-16 22:46           ` Daniel Jacobowitz
  2006-05-16 23:53             ` PAUL GILLIAM
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-16 22:46 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: gdb-patches

On Tue, May 16, 2006 at 11:37:35PM +0200, Mark Kettenis wrote:
> So there are several conventions, and these make sense for a specific
> ISA or perhaps even a specific OS.

So... would you be happier if I set a gdbarch flag to control this?

I still think this is the wrong choice, and I would prefer not to do it
without opinions from a few of GDB's other maintainers, but a
compromise is better than nothing at all.

It's this statement in particular that I am disagreeing with:

> But the zero PC is *not* universal.  Therefore it should be treated
> the same as the non-zero garbage PC.

Not every system uses it, but no one overloads it to mean anything else
during correct operation and it is unambiguous.  I think that does make
it universal.  And you can not draw clear target lines on where it is
and is not used; for instance, I think at least one Linux kernel
debugging stub ends backtraces this way.  That might not be current
information; we made up a GDB convention for dwarf2 unwind terminators
and I've been trying to persuade the kernel developers to use it.

But even ignoring the issues of non-free operating systems (which I
have a different opinion on supporting, as you know), I think that
adapting existing runtimes to the debugger is putting the cart before
the horse, and free operating systems can use this convention too,
and there's a lot of them out there.

By the way, if you go ahead and remove that %ebp check, you're going to
introduce exactly this problem for current glibc on IA-32; suddenly
you'll get a bogus frame below thread_start after clone in every
thread.

> And this is exactly the case where I think the jagged end of the
> backtrace is important.  It indicates that GDB lost track somewhere
> and that the backtrace can't be trusted.

I find that the backtrace stopping at a random function is clear
enough, personally.

> I cannot imagine that a single extra frame to be a serious annoyance.
> I can see that the extra frame looses its signalling function on
> systems where it's seen a lot in cases where the stack actually ends
> that way.

For you, or for me, it is not a serious annoyance.  For users a lot
further out from the system, on top of an IDE and a development
environment, it is confusing.  For the folks who have to document those
IDEs and answer customer support questions about them when their
customers get confused, it becomes a real problem.  It's something ugly
and incorrect on their backtrace window that needs an apologetic
paragraph in the manual.  That's my benchmark for cosmetic bugs that
are worth fixing; if I was answering "why is it that way" and the best
reason I could give would be an apology, then it needs to be fixed.

That's where the lion's share of GDB use is coming to be - on top of
GUIs like KDevelop and Eclipse and probably a dozen others, for users
with less low-level experience.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-16 22:46           ` Daniel Jacobowitz
@ 2006-05-16 23:53             ` PAUL GILLIAM
  2006-05-18  1:35               ` Joel Brobecker
  0 siblings, 1 reply; 38+ messages in thread
From: PAUL GILLIAM @ 2006-05-16 23:53 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Mark Kettenis, gdb-patches

On Tue, 2006-05-16 at 18:18 -0400, Daniel Jacobowitz wrote:
> On Tue, May 16, 2006 at 11:37:35PM +0200, Mark Kettenis wrote:
. . .
> > And this is exactly the case where I think the jagged end of the
> > backtrace is important.  It indicates that GDB lost track somewhere
> > and that the backtrace can't be trusted.
> 
> I find that the backtrace stopping at a random function is clear
> enough, personally.
> 

The stack, up to where it hit the jagged edge, *can* be trusted.

The trick is to let the user know that there might be more to the
stack, but what GDB has shown so far is valid.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-16 23:53             ` PAUL GILLIAM
@ 2006-05-18  1:35               ` Joel Brobecker
  2006-05-18  9:31                 ` Jim Blandy
  0 siblings, 1 reply; 38+ messages in thread
From: Joel Brobecker @ 2006-05-18  1:35 UTC (permalink / raw)
  To: PAUL GILLIAM; +Cc: Daniel Jacobowitz, Mark Kettenis, gdb-patches

FWIW, I read the whole thread a couple of times, and I agree with
Daniel. Something that I have also noticed is that these bogus
frames actually cause the average users to lose confidence in the
entire backtrace.

I think that stopping at PC == 0 for all architecture is a good
approach, and his justification for it is convincinng.

-- 
Joel

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18  1:35               ` Joel Brobecker
@ 2006-05-18  9:31                 ` Jim Blandy
  2006-05-18 10:09                   ` Andrew STUBBS
  0 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-18  9:31 UTC (permalink / raw)
  To: Joel Brobecker
  Cc: PAUL GILLIAM, Daniel Jacobowitz, Mark Kettenis, gdb-patches

Joel Brobecker <brobecker@adacore.com> writes:
> FWIW, I read the whole thread a couple of times, and I agree with
> Daniel. Something that I have also noticed is that these bogus
> frames actually cause the average users to lose confidence in the
> entire backtrace.
>
> I think that stopping at PC == 0 for all architecture is a good
> approach, and his justification for it is convincinng.

For the record, at the top of this thread I said I thought it was
fine, too.  I've run into these often enough due to deliberate
attempts by runtimes to terminate the stack that I think it outweighs
the (minor, to my mind) value of seeing a 0x00000000 frame that
indicates an actual error.

GDB should be honest with the user about what it finds, but I don't
think we can be a multi-platform debugger and be that picky about
confining each bit of logic to exactly the platforms that promise to
uphold it.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18  9:31                 ` Jim Blandy
@ 2006-05-18 10:09                   ` Andrew STUBBS
  2006-05-18 17:36                     ` Jim Blandy
  0 siblings, 1 reply; 38+ messages in thread
From: Andrew STUBBS @ 2006-05-18 10:09 UTC (permalink / raw)
  To: Jim Blandy
  Cc: Joel Brobecker, PAUL GILLIAM, Daniel Jacobowitz, Mark Kettenis,
	gdb-patches

Jim Blandy wrote:
> For the record, at the top of this thread I said I thought it was
> fine, too.  I've run into these often enough due to deliberate
> attempts by runtimes to terminate the stack that I think it outweighs
> the (minor, to my mind) value of seeing a 0x00000000 frame that
> indicates an actual error.
> 
> GDB should be honest with the user about what it finds, but I don't
> think we can be a multi-platform debugger and be that picky about
> confining each bit of logic to exactly the platforms that promise to
> uphold it.

How about adding a command:

set backtrace terminate-on-zero-pc on|off

and let the user decide. Set it to 'on' by default on the principle 
that, if they aren't aware of the possibility, users don't want to see 
zero frames they don't understand.

Just a thought.

Andrew


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 10:09                   ` Andrew STUBBS
@ 2006-05-18 17:36                     ` Jim Blandy
  2006-05-18 18:09                       ` PAUL GILLIAM
  0 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-18 17:36 UTC (permalink / raw)
  To: Andrew STUBBS
  Cc: Joel Brobecker, PAUL GILLIAM, Daniel Jacobowitz, Mark Kettenis,
	gdb-patches


Andrew STUBBS <andrew.stubbs@st.com> writes:
> Jim Blandy wrote:
>> For the record, at the top of this thread I said I thought it was
>> fine, too.  I've run into these often enough due to deliberate
>> attempts by runtimes to terminate the stack that I think it outweighs
>> the (minor, to my mind) value of seeing a 0x00000000 frame that
>> indicates an actual error.
>> GDB should be honest with the user about what it finds, but I don't
>> think we can be a multi-platform debugger and be that picky about
>> confining each bit of logic to exactly the platforms that promise to
>> uphold it.
>
> How about adding a command:
>
> set backtrace terminate-on-zero-pc on|off
>
> and let the user decide. Set it to 'on' by default on the principle
> that, if they aren't aware of the possibility, users don't want to see
> zero frames they don't understand.
>
> Just a thought.

Well, that would lift the burden of the decision from our shoulders
(that is, GDB developers', not restricted to the folks here) to the
users'.  I think we're probably in a better position to make it.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 17:36                     ` Jim Blandy
@ 2006-05-18 18:09                       ` PAUL GILLIAM
  2006-05-18 20:04                         ` Jim Blandy
  0 siblings, 1 reply; 38+ messages in thread
From: PAUL GILLIAM @ 2006-05-18 18:09 UTC (permalink / raw)
  To: Jim Blandy
  Cc: Andrew STUBBS, Joel Brobecker, Daniel Jacobowitz, Mark Kettenis,
	gdb-patches

On Thu, 2006-05-18 at 09:53 -0700, Jim Blandy wrote:
> Andrew STUBBS <andrew.stubbs@st.com> writes:
> > Jim Blandy wrote:
> >> For the record, at the top of this thread I said I thought it was
> >> fine, too.  I've run into these often enough due to deliberate
> >> attempts by runtimes to terminate the stack that I think it outweighs
> >> the (minor, to my mind) value of seeing a 0x00000000 frame that
> >> indicates an actual error.
> >> GDB should be honest with the user about what it finds, but I don't
> >> think we can be a multi-platform debugger and be that picky about
> >> confining each bit of logic to exactly the platforms that promise to
> >> uphold it.
> >
> > How about adding a command:
> >
> > set backtrace terminate-on-zero-pc on|off
> >
> > and let the user decide. Set it to 'on' by default on the principle
> > that, if they aren't aware of the possibility, users don't want to see
> > zero frames they don't understand.
> >
> > Just a thought.
> 
> Well, that would lift the burden of the decision from our shoulders
> (that is, GDB developers', not restricted to the folks here) to the
> users'.  I think we're probably in a better position to make it.

No, that lifts the burden of deciding for *all* users from our
shoulders, allowing those users who understand the issues to override
our decision.

On the other hand, it would add one more user tweek that most users
would not understand or care about.

It's a hard call.

I'd vote for the new command and classify it as obscure.

-=# Paul #=-


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 18:09                       ` PAUL GILLIAM
@ 2006-05-18 20:04                         ` Jim Blandy
  2006-05-18 20:43                           ` Mark Kettenis
  0 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-18 20:04 UTC (permalink / raw)
  To: pgilliam
  Cc: Andrew STUBBS, Joel Brobecker, Daniel Jacobowitz, Mark Kettenis,
	gdb-patches

PAUL GILLIAM <pgilliam@us.ibm.com> writes:
> On Thu, 2006-05-18 at 09:53 -0700, Jim Blandy wrote:
>> Andrew STUBBS <andrew.stubbs@st.com> writes:
>> > Jim Blandy wrote:
>> >> For the record, at the top of this thread I said I thought it was
>> >> fine, too.  I've run into these often enough due to deliberate
>> >> attempts by runtimes to terminate the stack that I think it outweighs
>> >> the (minor, to my mind) value of seeing a 0x00000000 frame that
>> >> indicates an actual error.
>> >> GDB should be honest with the user about what it finds, but I don't
>> >> think we can be a multi-platform debugger and be that picky about
>> >> confining each bit of logic to exactly the platforms that promise to
>> >> uphold it.
>> >
>> > How about adding a command:
>> >
>> > set backtrace terminate-on-zero-pc on|off
>> >
>> > and let the user decide. Set it to 'on' by default on the principle
>> > that, if they aren't aware of the possibility, users don't want to see
>> > zero frames they don't understand.
>> >
>> > Just a thought.
>> 
>> Well, that would lift the burden of the decision from our shoulders
>> (that is, GDB developers', not restricted to the folks here) to the
>> users'.  I think we're probably in a better position to make it.
>
> No, that lifts the burden of deciding for *all* users from our
> shoulders, allowing those users who understand the issues to override
> our decision.
>
> On the other hand, it would add one more user tweek that most users
> would not understand or care about.
>
> It's a hard call.
>
> I'd vote for the new command and classify it as obscure.

Okay --- I should be less equivocal.  It is a bad idea to add this
command, obscure or otherwise.  Engineers do this all the time when
they're having difficulty settling on a solution: they figure, hey,
let's let the users decide!  It's presented as "giving the users more
control".  Nobody wants to argue with that: every day we run into
something we wish we could tweak.  Today I wished I could control the
'From' address vixie cron uses in its mail messages.  But it's a
mistake to treat this as an indication of the general desireability of
more switches and knobs.  And half the time the knobs are just
workarounds for some other problem --- for example, I think the real
problem with my cron is that outgoing mail on my machine is
misconfigured.

Nobody has written us saying they want to choose whether GDB treats a
zero return address as indicating the end of the stack.  Rather, many
users have written us complaining that GDB displays extra frames at
the end of well-formed, non-corrupt stacks.  And over the course of
the what seems like dozens of embedded GDB ports I've debugged since
1997, I've come across the same behavior many times myself.

The only reason presented in this thread for displaying those frames
at all is that it can be an indication of a bug in GDB.  This is based
on the (sound) general argument that GDB should be picky about the
state it examines, and report anomalies.  But that heuristic just
doesn't make sense when the particular "anomaly" at hand is, in fact,
a valid and deliberate end-of-stack indicator on many systems.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 20:04                         ` Jim Blandy
@ 2006-05-18 20:43                           ` Mark Kettenis
  2006-05-18 23:31                             ` Jim Blandy
                                               ` (2 more replies)
  0 siblings, 3 replies; 38+ messages in thread
From: Mark Kettenis @ 2006-05-18 20:43 UTC (permalink / raw)
  To: jimb; +Cc: pgilliam, andrew.stubbs, brobecker, drow, mark.kettenis, gdb-patches

> From: Jim Blandy <jimb@codesourcery.com>
> Date: Thu, 18 May 2006 11:09:33 -0700
> 
> Okay --- I should be less equivocal.  It is a bad idea to add this
> command, obscure or otherwise.  Engineers do this all the time when
> they're having difficulty settling on a solution: they figure, hey,
> let's let the users decide!  It's presented as "giving the users more
> control".  Nobody wants to argue with that: every day we run into
> something we wish we could tweak.  Today I wished I could control the
> 'From' address vixie cron uses in its mail messages.  But it's a
> mistake to treat this as an indication of the general desireability of
> more switches and knobs.  And half the time the knobs are just
> workarounds for some other problem --- for example, I think the real
> problem with my cron is that outgoing mail on my machine is
> misconfigured.

I 100% agree.  Knobs are bad.  Besides we'd be quibbling about the
default setting for that knob ;-).

> Nobody has written us saying they want to choose whether GDB treats a
> zero return address as indicating the end of the stack.  Rather, many
> users have written us complaining that GDB displays extra frames at
> the end of well-formed, non-corrupt stacks.  And over the course of
> the what seems like dozens of embedded GDB ports I've debugged since
> 1997, I've come across the same behavior many times myself.

If we're sure that zero return address actually signals the end of the
stack, then indeed we should not print the extra frame.  I'm not
arguing with that.  But that's defenitely 

> The only reason presented in this thread for displaying those frames
> at all is that it can be an indication of a bug in GDB.

No.  There are two reasons why we should print those frames, and I
consider both of them not to be an indication of a bug in GDB.

1. Because of a bug in the program you're debugging, it has
   overwritten the return address on the stack.  Currently this causes
   the extra frame to be printed signalling the user that something is
   wrong.  Daniel's patch changes this, but only if the return address
   is overwritten with zero.

2. It may be fundamentally impossible to unwind code produced by an
   optimizing compiler without additional debug info.  We can't
   consider the fact that GDB gets the return address wrong if the
   debug info is missing a bug in GDB.  Again the extra frame signals
   the user that something is wrong.

Reason #2 is exactly the example Daniel gave in the message that
started this thread.  He noticed that something was wrong and fixed
the problem by provide the necessary debug info.  Now Daniel would
probably have noticed the problem even if the extra frame had not been
there.  But it almost certainly would have taken him a bit more time,
and a less experienced user might not have noticed it.

> This is based on the (sound) general argument that GDB should be
> picky about the state it examines, and report anomalies.  But that
> heuristic just doesn't make sense when the particular "anomaly" at
> hand is, in fact, a valid and deliberate end-of-stack indicator on
> many systems.

Many systems, but certainly not all systems.  At least i386, amd64,
sparc and sparc64 don't use this convention.  I'm sure there are
others.  From the traffic on the lists it sure seems as if 90% of the
people are using gdb to remotely debug an ARM target from a Windows
system, but there must be some sort of selection effect going on here,
since more than 10% of our users must be using GDB on i386/amd64
GNU/Linux systems.

Mark

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 20:43                           ` Mark Kettenis
@ 2006-05-18 23:31                             ` Jim Blandy
  2006-05-20 22:26                               ` Mark Kettenis
  2006-05-19  3:32                             ` Daniel Jacobowitz
  2006-05-19 12:26                             ` Eli Zaretskii
  2 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-18 23:31 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: pgilliam, andrew.stubbs, brobecker, drow, gdb-patches


Mark Kettenis <mark.kettenis@xs4all.nl> writes:
>> Nobody has written us saying they want to choose whether GDB treats a
>> zero return address as indicating the end of the stack.  Rather, many
>> users have written us complaining that GDB displays extra frames at
>> the end of well-formed, non-corrupt stacks.  And over the course of
>> the what seems like dozens of embedded GDB ports I've debugged since
>> 1997, I've come across the same behavior many times myself.
>
> If we're sure that zero return address actually signals the end of the
> stack, then indeed we should not print the extra frame.  I'm not
> arguing with that.  But that's defenitely 

You've said a few times that you agree GDB should support this
convention where it is followed.  Dan's patch accomplishes that, but
in a way you don't like.  Do you have a suggestion on how it should be
done?  Dan reluctantly suggested a gdbarch flag; what do you think of
that?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 20:43                           ` Mark Kettenis
  2006-05-18 23:31                             ` Jim Blandy
@ 2006-05-19  3:32                             ` Daniel Jacobowitz
  2006-05-20 21:30                               ` Mark Kettenis
  2006-05-19 12:26                             ` Eli Zaretskii
  2 siblings, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-19  3:32 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: jimb, pgilliam, andrew.stubbs, brobecker, gdb-patches

On Thu, May 18, 2006 at 10:04:09PM +0200, Mark Kettenis wrote:
> If we're sure that zero return address actually signals the end of the
> stack, then indeed we should not print the extra frame.  I'm not
> arguing with that.  But that's defenitely 

Incomplete sentence?  But, I think there was enough context.

> Many systems, but certainly not all systems.  At least i386, amd64,
> sparc and sparc64 don't use this convention.

I hate to break it to you but... that's not 100% true.  Most psABI
documents don't cover clean stack ending, so operating systems often
have to pick their own (or sometimes they do specify it and OS's go off
and do their own thing anyway, I expect).  I've just checked, and
sparc-vxworks definitely does end backtraces for kernel mode tasks by
setting the return address to 0 before it spawns a new task.

i386-vxworks sets %ebp to zero to indicate the end of the stack, I
believe.

I checked RTEMS too, but the results were somewhat inconclusive; I'm
not sure it deliberately initializes all registers.  That was an
educational dive through RTOS source, though.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 20:43                           ` Mark Kettenis
  2006-05-18 23:31                             ` Jim Blandy
  2006-05-19  3:32                             ` Daniel Jacobowitz
@ 2006-05-19 12:26                             ` Eli Zaretskii
  2006-05-19 18:12                               ` Jim Blandy
  2 siblings, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2006-05-19 12:26 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: gdb-patches

> Date: Thu, 18 May 2006 22:04:09 +0200 (CEST)
> From: Mark Kettenis <mark.kettenis@xs4all.nl>
> CC: pgilliam@us.ibm.com, andrew.stubbs@st.com, brobecker@adacore.com,         drow@false.org, mark.kettenis@xs4all.nl, gdb-patches@sourceware.org
> 
> > The only reason presented in this thread for displaying those frames
> > at all is that it can be an indication of a bug in GDB.
> 
> No.  There are two reasons why we should print those frames, and I
> consider both of them not to be an indication of a bug in GDB.
> 
> 1. Because of a bug in the program you're debugging, it has
>    overwritten the return address on the stack.  Currently this causes
>    the extra frame to be printed signalling the user that something is
>    wrong.  Daniel's patch changes this, but only if the return address
>    is overwritten with zero.
> 
> 2. It may be fundamentally impossible to unwind code produced by an
>    optimizing compiler without additional debug info.  We can't
>    consider the fact that GDB gets the return address wrong if the
>    debug info is missing a bug in GDB.  Again the extra frame signals
>    the user that something is wrong.

I think it was already suggested in this lengthy thread to display
some kind of message to alert the user.  For example:

  (Backtrace terminated due to zero return address.)

Would this make everybody fairly happy to zero in on a solution?


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-19 12:26                             ` Eli Zaretskii
@ 2006-05-19 18:12                               ` Jim Blandy
  2006-05-19 18:53                                 ` Eli Zaretskii
  0 siblings, 1 reply; 38+ messages in thread
From: Jim Blandy @ 2006-05-19 18:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mark Kettenis, gdb-patches

Eli Zaretskii <eliz@gnu.org> writes:
>> Date: Thu, 18 May 2006 22:04:09 +0200 (CEST)
>> From: Mark Kettenis <mark.kettenis@xs4all.nl>
>> CC: pgilliam@us.ibm.com, andrew.stubbs@st.com, brobecker@adacore.com,         drow@false.org, mark.kettenis@xs4all.nl, gdb-patches@sourceware.org
>> 
>> > The only reason presented in this thread for displaying those frames
>> > at all is that it can be an indication of a bug in GDB.
>> 
>> No.  There are two reasons why we should print those frames, and I
>> consider both of them not to be an indication of a bug in GDB.
>> 
>> 1. Because of a bug in the program you're debugging, it has
>>    overwritten the return address on the stack.  Currently this causes
>>    the extra frame to be printed signalling the user that something is
>>    wrong.  Daniel's patch changes this, but only if the return address
>>    is overwritten with zero.
>> 
>> 2. It may be fundamentally impossible to unwind code produced by an
>>    optimizing compiler without additional debug info.  We can't
>>    consider the fact that GDB gets the return address wrong if the
>>    debug info is missing a bug in GDB.  Again the extra frame signals
>>    the user that something is wrong.
>
> I think it was already suggested in this lengthy thread to display
> some kind of message to alert the user.  For example:
>
>   (Backtrace terminated due to zero return address.)
>
> Would this make everybody fairly happy to zero in on a solution?

(UNFAIR UNHAPPINESS ABOUT NON-ZERO SOLUTIONS FOR EVERYBODY!!!)

Well, no: the stacks we'd like to display are healthy and well-formed,
according to the conventions of the system; there's nothing
non-standard about them at all.  So they ought to display as normal
stacks --- on those systems.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-19 18:12                               ` Jim Blandy
@ 2006-05-19 18:53                                 ` Eli Zaretskii
  2006-05-22 23:15                                   ` Jim Blandy
  0 siblings, 1 reply; 38+ messages in thread
From: Eli Zaretskii @ 2006-05-19 18:53 UTC (permalink / raw)
  To: Jim Blandy; +Cc: mark.kettenis, gdb-patches

> Cc: Mark Kettenis <mark.kettenis@xs4all.nl>,  gdb-patches@sourceware.org
> From: Jim Blandy <jimb@codesourcery.com>
> Date: Fri, 19 May 2006 10:38:36 -0700
> 
> > I think it was already suggested in this lengthy thread to display
> > some kind of message to alert the user.  For example:
> >
> >   (Backtrace terminated due to zero return address.)
> >
> > Would this make everybody fairly happy to zero in on a solution?
> 
> (UNFAIR UNHAPPINESS ABOUT NON-ZERO SOLUTIONS FOR EVERYBODY!!!)
> 
> Well, no: the stacks we'd like to display are healthy and well-formed,
> according to the conventions of the system; there's nothing
> non-standard about them at all.  So they ought to display as normal
> stacks --- on those systems.

Sorry, I'm too dumb today to see what's humor here and what's for
real.  It sounds like you want to see no message at all, and OTOH, you
also objected to having a user option for turning the message on and
off.  That leaves us at an impasse.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-19  3:32                             ` Daniel Jacobowitz
@ 2006-05-20 21:30                               ` Mark Kettenis
  0 siblings, 0 replies; 38+ messages in thread
From: Mark Kettenis @ 2006-05-20 21:30 UTC (permalink / raw)
  To: drow; +Cc: jimb, pgilliam, andrew.stubbs, brobecker, gdb-patches

> Date: Thu, 18 May 2006 19:30:17 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> On Thu, May 18, 2006 at 10:04:09PM +0200, Mark Kettenis wrote:
> > If we're sure that zero return address actually signals the end of the
> > stack, then indeed we should not print the extra frame.  I'm not
> > arguing with that.  But that's defenitely 
> 
> Incomplete sentence?  But, I think there was enough context.

Think I wanted to say something like:

...not the only case where we see a zero pc.

> > Many systems, but certainly not all systems.  At least i386, amd64,
> > sparc and sparc64 don't use this convention.
> 
> I hate to break it to you but... that's not 100% true.  Most psABI
> documents don't cover clean stack ending, so operating systems often
> have to pick their own (or sometimes they do specify it and OS's go off
> and do their own thing anyway, I expect).  I've just checked, and
> sparc-vxworks definitely does end backtraces for kernel mode tasks by
> setting the return address to 0 before it spawns a new task.

Interesting because sparc is one of the few architectures that has a
framepointer register that you can't use for something else, such that
the a "zero framepointer" convention would actually work reliably.
Also not that on sparc we can actually unwind past a zero return
address as long as the frame pointer chain is still intact.

> i386-vxworks sets %ebp to zero to indicate the end of the stack, I
> believe.

This is the traditional way to signal the end of the stack.
Unfortunately GCC's move towards generating framepointer-less code,
makes this convention pretty much useless.  On i386, we should
probably move towards setting both %eip and %ebp to zero to indicate
the end of the stack.

> I checked RTEMS too, but the results were somewhat inconclusive; I'm
> not sure it deliberately initializes all registers.  That was an
> educational dive through RTOS source, though.

What really matters, is whether the threads library marks the end of
stack properly.  For the main thread, we usually end the backtrace at
main() (or its equivalent) and people debugging the startup code that
runs before main() probably now what they're doing.

Mark

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-18 23:31                             ` Jim Blandy
@ 2006-05-20 22:26                               ` Mark Kettenis
  2006-05-21  2:12                                 ` Daniel Jacobowitz
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Kettenis @ 2006-05-20 22:26 UTC (permalink / raw)
  To: jimb; +Cc: pgilliam, andrew.stubbs, brobecker, drow, gdb-patches

> From: Jim Blandy <jimb@codesourcery.com>
> Date: Thu, 18 May 2006 14:17:54 -0700
> 
> Mark Kettenis <mark.kettenis@xs4all.nl> writes:
> >> Nobody has written us saying they want to choose whether GDB treats a
> >> zero return address as indicating the end of the stack.  Rather, many
> >> users have written us complaining that GDB displays extra frames at
> >> the end of well-formed, non-corrupt stacks.  And over the course of
> >> the what seems like dozens of embedded GDB ports I've debugged since
> >> 1997, I've come across the same behavior many times myself.
> >
> > If we're sure that zero return address actually signals the end of the
> > stack, then indeed we should not print the extra frame.  I'm not
> > arguing with that.  But that's defenitely 
> 
> You've said a few times that you agree GDB should support this
> convention where it is followed.  Dan's patch accomplishes that, but
> in a way you don't like.  Do you have a suggestion on how it should be
> done?  Dan reluctantly suggested a gdbarch flag; what do you think of
> that?

I actually think that something like that is the way to go.  It's
closely related to what Dan wrote about in:

http://sourceware.org/ml/gdb/2006-05/msg00109.html

and I'd like to have a go at implementing option #2 in that mail.
Unfortunately I'm leaving for a a four-week trip tomorrow.  I won't be
able to read my mail for most of the time between now and june 17.

Mark


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-20 22:26                               ` Mark Kettenis
@ 2006-05-21  2:12                                 ` Daniel Jacobowitz
  2006-07-21 15:52                                   ` Andrew STUBBS
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-05-21  2:12 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: jimb, pgilliam, andrew.stubbs, brobecker, gdb-patches

On Sat, May 20, 2006 at 11:29:04PM +0200, Mark Kettenis wrote:
> I actually think that something like that is the way to go.  It's
> closely related to what Dan wrote about in:
> 
> http://sourceware.org/ml/gdb/2006-05/msg00109.html
> 
> and I'd like to have a go at implementing option #2 in that mail.

That (the frame unwinder end-of-stack method) wouldn't actually help
with this problem; that's why I sent the two separately (they were
originally the same message when I was writing it).  The architecture
unwinder could report a saved pc of zero as terminating the stack, but
in all the cases I'm interested in, the DWARF-2 unwinder is actually
used for the bottom frame.

> Unfortunately I'm leaving for a a four-week trip tomorrow.  I won't be
> able to read my mail for most of the time between now and june 17.

Well, enjoy your trip.  I have a thousand different things that need
merging, and I'll be working on them over the coming weeks; but it
won't do me any harm to table this one until we have time to discuss it
further, as long as we've deliberately postponed it rather than let it
fall through :-)

So, I shall not pursue this patch right now, and we'll discuss it later
(probably not until July; the GCC Summit at the end of June is going to
wreck my schedule around when you get back).  Thanks for all the
comments!

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-19 18:53                                 ` Eli Zaretskii
@ 2006-05-22 23:15                                   ` Jim Blandy
  0 siblings, 0 replies; 38+ messages in thread
From: Jim Blandy @ 2006-05-22 23:15 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mark.kettenis, gdb-patches

Eli Zaretskii <eliz@gnu.org> writes:
>> Cc: Mark Kettenis <mark.kettenis@xs4all.nl>,  gdb-patches@sourceware.org
>> From: Jim Blandy <jimb@codesourcery.com>
>> Date: Fri, 19 May 2006 10:38:36 -0700
>> 
>> > I think it was already suggested in this lengthy thread to display
>> > some kind of message to alert the user.  For example:
>> >
>> >   (Backtrace terminated due to zero return address.)
>> >
>> > Would this make everybody fairly happy to zero in on a solution?
>> 
>> (UNFAIR UNHAPPINESS ABOUT NON-ZERO SOLUTIONS FOR EVERYBODY!!!)
>> 
>> Well, no: the stacks we'd like to display are healthy and well-formed,
>> according to the conventions of the system; there's nothing
>> non-standard about them at all.  So they ought to display as normal
>> stacks --- on those systems.
>
> Sorry, I'm too dumb today to see what's humor here and what's for
> real.  It sounds like you want to see no message at all, and OTOH, you
> also objected to having a user option for turning the message on and
> off.  That leaves us at an impasse.

The first sentence was meant to be funny.  The following paragraph was
completely serious.  I don't think we're at an impasse.

Right now, on systems that terminate their stacks with zero return
addresses, GDB displays well-formed stacks incorrectly, showing an
extra frame after the oldest real frame.  Everyone agrees, including
Mark, that this behavior is wrong, and should be fixed.

Daniel's original proposal, which Joel and I think is fine, was to
make GDB treat a null return address as a proper end-of-stack on all
systems.  This would be a simple, localized change.  It would fix the
bug observed on zero-return-addr-stack-end systems.

The drawback Mark pointed out is that, on a system that does not use a
zero return address to indicate the end of the stack, if a stack has
been corrupted by having some return address overwritten with a zero,
GDB will display that stack as ending normally.  That is, because
Daniel's proposed change would affect all systems, not just systems
which intentionally use zero return addresses, it could make certain
kinds of corruption somewhat less apparent: instead of ending with a
weird frame with a 0x00000000 PC, the backtrace will simply end at the
prior frame.  You'll still have a stack that ends before you'd expect,
but it won't end in a mess.

The compromise is to do something which makes the interpretation of
zero return addresses specific to those ABI's that use it.  Daniel
suggested a gdbarch flag, but Mark wants to pursue having the
unwinders on such ABI's make the decision.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-05-21  2:12                                 ` Daniel Jacobowitz
@ 2006-07-21 15:52                                   ` Andrew STUBBS
  2006-07-22 11:23                                     ` Mark Kettenis
  0 siblings, 1 reply; 38+ messages in thread
From: Andrew STUBBS @ 2006-07-21 15:52 UTC (permalink / raw)
  To: gdb-patches

Did anything more happen to this since May?

This is a feature I would like to see.

Daniel Jacobowitz wrote:
> On Sat, May 20, 2006 at 11:29:04PM +0200, Mark Kettenis wrote:
>> I actually think that something like that is the way to go.  It's
>> closely related to what Dan wrote about in:
>>
>> http://sourceware.org/ml/gdb/2006-05/msg00109.html
>>
>> and I'd like to have a go at implementing option #2 in that mail.
> 
> That (the frame unwinder end-of-stack method) wouldn't actually help
> with this problem; that's why I sent the two separately (they were
> originally the same message when I was writing it).  The architecture
> unwinder could report a saved pc of zero as terminating the stack, but
> in all the cases I'm interested in, the DWARF-2 unwinder is actually
> used for the bottom frame.
> 
>> Unfortunately I'm leaving for a a four-week trip tomorrow.  I won't be
>> able to read my mail for most of the time between now and june 17.
> 
> Well, enjoy your trip.  I have a thousand different things that need
> merging, and I'll be working on them over the coming weeks; but it
> won't do me any harm to table this one until we have time to discuss it
> further, as long as we've deliberately postponed it rather than let it
> fall through :-)
> 
> So, I shall not pursue this patch right now, and we'll discuss it later
> (probably not until July; the GCC Summit at the end of June is going to
> wreck my schedule around when you get back).  Thanks for all the
> comments!
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-07-21 15:52                                   ` Andrew STUBBS
@ 2006-07-22 11:23                                     ` Mark Kettenis
  2006-07-24 19:32                                       ` Daniel Jacobowitz
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Kettenis @ 2006-07-22 11:23 UTC (permalink / raw)
  To: andrew.stubbs; +Cc: gdb-patches

> Date: Fri, 21 Jul 2006 16:52:08 +0100
> From: Andrew STUBBS <andrew.stubbs@st.com>
> 
> This is a feature I would like to see.

Thanks Andrew, for reminding us.

> Daniel Jacobowitz wrote:
> > On Sat, May 20, 2006 at 11:29:04PM +0200, Mark Kettenis wrote:
> >> I actually think that something like that is the way to go.  It's
> >> closely related to what Dan wrote about in:
> >>
> >> http://sourceware.org/ml/gdb/2006-05/msg00109.html
> >>
> >> and I'd like to have a go at implementing option #2 in that mail.
> > 
> > That (the frame unwinder end-of-stack method) wouldn't actually help
> > with this problem; that's why I sent the two separately (they were
> > originally the same message when I was writing it).  The architecture
> > unwinder could report a saved pc of zero as terminating the stack, but
> > in all the cases I'm interested in, the DWARF-2 unwinder is actually
> > used for the bottom frame.

But the "having a saved pc of zero" is only one of the conventions
used for terminating the frame chain.  So the DWARF-2 unwinder would
still fail to do the right thing for ISA/ABI's that use a different
convention.  So I think we need an ISA/ABI-specefic callback to
determine whether a frame is the outermost frame, just like we already
have for signal trampolines.  This actually matches the following idea
I had pretty well:

The outermost frame is special, just like sigtramp and dummy frames.
Why not introduce a new frame type OUTERMOST_FRAME and make
get_prev_frame() return NULL if that's the type of THIS_FRAME's?  This
would require some changes to the current frame unwinder
infrastructure, since the type is currently hardcoded into the
unwinder.  That would have the additional benefit that we could get
rid of the bogosity that we have multiple frame unwinder definitions
in the DWARF-2 unwinder just to handle signal trampolines.

Thoughts?

Mark

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-07-22 11:23                                     ` Mark Kettenis
@ 2006-07-24 19:32                                       ` Daniel Jacobowitz
  2006-07-26 22:16                                         ` Mark Kettenis
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-07-24 19:32 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: andrew.stubbs, gdb-patches

On Sat, Jul 22, 2006 at 01:22:53PM +0200, Mark Kettenis wrote:
> But the "having a saved pc of zero" is only one of the conventions
> used for terminating the frame chain.  So the DWARF-2 unwinder would
> still fail to do the right thing for ISA/ABI's that use a different
> convention.  So I think we need an ISA/ABI-specefic callback to
> determine whether a frame is the outermost frame, just like we already
> have for signal trampolines.

I'm fine with this; it seems easy enough to work with.  Have you got
any other conventions in mind?  For functions which might use the DWARF
unwinder, I've only seen the return address undefined convention (which
we basically made up), and the PC == 0 convention.  We could implement
e.g. %ebp == 0, but you've pointed out that it can misfire with
-fomit-frame-pointer.

Where would the PC==0 test be applied?  I and several others in
this thread have said that it makes sense in every case for every
platform - that's why I wanted it in frame.c, rather than in each
individual unwinder.  I can't tell from this paragraph whether we've
convinced you, or whether you would push it out to the individual
architectures to decide.  I'm unhappy with individual architecture
knobs for basically the same reason we've agreed it shouldn't be a user
knob.

> The outermost frame is special, just like sigtramp and dummy frames.
> Why not introduce a new frame type OUTERMOST_FRAME and make
> get_prev_frame() return NULL if that's the type of THIS_FRAME's?  This
> would require some changes to the current frame unwinder
> infrastructure, since the type is currently hardcoded into the
> unwinder.  That would have the additional benefit that we could get
> rid of the bogosity that we have multiple frame unwinder definitions
> in the DWARF-2 unwinder just to handle signal trampolines.

I suspect that OUTERMOST_FRAME would be a lot of trouble.  There's a
lot of checks on frame types that look for "abnormal" frames via
->type != NORMAL_FRAME; but a function called by an OUTERMOST_FRAME was
a normal call, just like a function called by a NORMAL_FRAME.  That's
why I suggested a new unwinder method in struct frame_unwind.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-07-24 19:32                                       ` Daniel Jacobowitz
@ 2006-07-26 22:16                                         ` Mark Kettenis
  2006-07-26 22:25                                           ` Daniel Jacobowitz
  0 siblings, 1 reply; 38+ messages in thread
From: Mark Kettenis @ 2006-07-26 22:16 UTC (permalink / raw)
  To: drow; +Cc: andrew.stubbs, gdb-patches

> Date: Mon, 24 Jul 2006 15:32:45 -0400
> From: Daniel Jacobowitz <drow@false.org>
> 
> On Sat, Jul 22, 2006 at 01:22:53PM +0200, Mark Kettenis wrote:
> > But the "having a saved pc of zero" is only one of the conventions
> > used for terminating the frame chain.  So the DWARF-2 unwinder would
> > still fail to do the right thing for ISA/ABI's that use a different
> > convention.  So I think we need an ISA/ABI-specefic callback to
> > determine whether a frame is the outermost frame, just like we already
> > have for signal trampolines.
> 
> I'm fine with this; it seems easy enough to work with.  Have you got
> any other conventions in mind?  For functions which might use the DWARF
> unwinder, I've only seen the return address undefined convention (which
> we basically made up), and the PC == 0 convention.  We could implement
> e.g. %ebp == 0, but you've pointed out that it can misfire with
> -fomit-frame-pointer.

On ISA/ABI's with a "hard" frame pointer, FP == 0 is a widely used
convention, i.e. %fp == 0 on sparc and sparc64.  For i386, where %ebp
== 0 no longer works because the compiler people realized it isn't a
"fard" frame pointer, a condition like %eip == 0 && %ebp == 0 might
actually be workable.  For some operating systems we might even want
to look the function name.

> Where would the PC==0 test be applied?  I and several others in
> this thread have said that it makes sense in every case for every
> platform - that's why I wanted it in frame.c, rather than in each
> individual unwinder.  I can't tell from this paragraph whether we've
> convinced you, or whether you would push it out to the individual
> architectures to decide.  I'm unhappy with individual architecture
> knobs for basically the same reason we've agreed it shouldn't be a user
> knob.

I though I have been very clear: PC == 0 is unacceptable as a generic
condition for terminating stack frames.  So yes, this will be pushed
into the individual architectures.  That's good since it forces people
to think about the proper termination condition for their ISA/ABI.

> > The outermost frame is special, just like sigtramp and dummy frames.
> > Why not introduce a new frame type OUTERMOST_FRAME and make
> > get_prev_frame() return NULL if that's the type of THIS_FRAME's?  This
> > would require some changes to the current frame unwinder
> > infrastructure, since the type is currently hardcoded into the
> > unwinder.  That would have the additional benefit that we could get
> > rid of the bogosity that we have multiple frame unwinder definitions
> > in the DWARF-2 unwinder just to handle signal trampolines.
> 
> I suspect that OUTERMOST_FRAME would be a lot of trouble.  There's a
> lot of checks on frame types that look for "abnormal" frames via
> ->type != NORMAL_FRAME; but a function called by an OUTERMOST_FRAME was
> a normal call, just like a function called by a NORMAL_FRAME.  That's
> why I suggested a new unwinder method in struct frame_unwind.

Actually there is only a limited number of such tests in our tree: a
couple of them in frame.c, and two in s390-tdep.c.  Many of the checks
in frame.c actually deal with stuff that we might want to do
differently for OUTERMOST_FRAME.  The checks in s390-tdep.c both have
this FIXME on them:

  /* FIXME: cagney/2004-05-01: This sanity check shouldn't be needed,
     instead the code should simpliy rely on its analysis.  */

Mark


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] Move the frame zero PC check earlier
  2006-07-26 22:16                                         ` Mark Kettenis
@ 2006-07-26 22:25                                           ` Daniel Jacobowitz
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Jacobowitz @ 2006-07-26 22:25 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: andrew.stubbs, gdb-patches

On Thu, Jul 27, 2006 at 12:15:29AM +0200, Mark Kettenis wrote:
> > Where would the PC==0 test be applied?  I and several others in
> > this thread have said that it makes sense in every case for every
> > platform - that's why I wanted it in frame.c, rather than in each
> > individual unwinder.  I can't tell from this paragraph whether we've
> > convinced you, or whether you would push it out to the individual
> > architectures to decide.  I'm unhappy with individual architecture
> > knobs for basically the same reason we've agreed it shouldn't be a user
> > knob.
> 
> I though I have been very clear: PC == 0 is unacceptable as a generic
> condition for terminating stack frames.  So yes, this will be pushed
> into the individual architectures.  That's good since it forces people
> to think about the proper termination condition for their ISA/ABI.

I guess I was hoping we'd convinced you.  I demonstrated that at least
one of the architectures you listed as not having use for this
convention, under at least one operating system, actually did use it.
Jim and Joel and Andrew all piped up to support using it universally.

But if you're dead set against it, I don't see any way forward except
to concede.  This means I'm going to keep getting bug reports with the
same solutions on different architectures for a while :-(  Oh well.

> > > The outermost frame is special, just like sigtramp and dummy frames.
> > > Why not introduce a new frame type OUTERMOST_FRAME and make
> > > get_prev_frame() return NULL if that's the type of THIS_FRAME's?  This
> > > would require some changes to the current frame unwinder
> > > infrastructure, since the type is currently hardcoded into the
> > > unwinder.  That would have the additional benefit that we could get
> > > rid of the bogosity that we have multiple frame unwinder definitions
> > > in the DWARF-2 unwinder just to handle signal trampolines.
> > 
> > I suspect that OUTERMOST_FRAME would be a lot of trouble.  There's a
> > lot of checks on frame types that look for "abnormal" frames via
> > ->type != NORMAL_FRAME; but a function called by an OUTERMOST_FRAME was
> > a normal call, just like a function called by a NORMAL_FRAME.  That's
> > why I suggested a new unwinder method in struct frame_unwind.
> 
> Actually there is only a limited number of such tests in our tree: a
> couple of them in frame.c, and two in s390-tdep.c.  Many of the checks
> in frame.c actually deal with stuff that we might want to do
> differently for OUTERMOST_FRAME.  The checks in s390-tdep.c both have
> this FIXME on them:

You're right - I thought there were far more than there actually are.
I guess this would work.

Maybe we need a few predicate functions for these types...

>   /* FIXME: cagney/2004-05-01: This sanity check shouldn't be needed,
>      instead the code should simpliy rely on its analysis.  */

Andrew and I argued about this comment several times.  I disagree with
it, and in fact have various patches floating around to do similar
tests on more architectures.  It's an extremely useful technique, done
carefully.

-- 
Daniel Jacobowitz
CodeSourcery


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2006-07-26 22:25 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-10 18:03 [RFC] Move the frame zero PC check earlier Daniel Jacobowitz
2006-05-11 10:42 ` Andrew STUBBS
2006-05-11 22:24 ` Jim Blandy
2006-05-11 22:32   ` Daniel Jacobowitz
2006-05-12  6:21     ` Jim Blandy
2006-05-12 12:46       ` Daniel Jacobowitz
2006-05-13 10:14 ` Mark Kettenis
2006-05-13 15:17   ` Daniel Jacobowitz
2006-05-13 15:46     ` Daniel Jacobowitz
2006-05-13 17:08       ` Mark Kettenis
2006-05-13 16:49     ` Mark Kettenis
2006-05-13 18:53       ` Daniel Jacobowitz
2006-05-16 21:38       ` Daniel Jacobowitz
2006-05-16 22:19         ` Mark Kettenis
2006-05-16 22:46           ` Daniel Jacobowitz
2006-05-16 23:53             ` PAUL GILLIAM
2006-05-18  1:35               ` Joel Brobecker
2006-05-18  9:31                 ` Jim Blandy
2006-05-18 10:09                   ` Andrew STUBBS
2006-05-18 17:36                     ` Jim Blandy
2006-05-18 18:09                       ` PAUL GILLIAM
2006-05-18 20:04                         ` Jim Blandy
2006-05-18 20:43                           ` Mark Kettenis
2006-05-18 23:31                             ` Jim Blandy
2006-05-20 22:26                               ` Mark Kettenis
2006-05-21  2:12                                 ` Daniel Jacobowitz
2006-07-21 15:52                                   ` Andrew STUBBS
2006-07-22 11:23                                     ` Mark Kettenis
2006-07-24 19:32                                       ` Daniel Jacobowitz
2006-07-26 22:16                                         ` Mark Kettenis
2006-07-26 22:25                                           ` Daniel Jacobowitz
2006-05-19  3:32                             ` Daniel Jacobowitz
2006-05-20 21:30                               ` Mark Kettenis
2006-05-19 12:26                             ` Eli Zaretskii
2006-05-19 18:12                               ` Jim Blandy
2006-05-19 18:53                                 ` Eli Zaretskii
2006-05-22 23:15                                   ` Jim Blandy
2006-05-15 13:57   ` Andrew STUBBS

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox