More info on PR/9711 (quadratic slowdown for deep stack traces)

Mirror of the gdb mailing list
 help / color / mirror / Atom feed

* More info on PR/9711 (quadratic slowdown for deep stack traces)
@ 2009-09-01 20:48 Joel Brobecker
  2009-09-02  3:42 ` Paul Pluzhnikov
  2009-09-02 11:43 ` Daniel Jacobowitz
  0 siblings, 2 replies; 7+ messages in thread
From: Joel Brobecker @ 2009-09-01 20:48 UTC (permalink / raw)
  To: gdb

Hello,

I think I understand a little better what's causing the debugger
to be really slow in the testcase provided with the PR. I logged
the following entry in the PR yesterday:

> The slowdown comes from printing the frame corresponding to "main",
> because argv is stored in %rsi. The following transcript shows a more
> isolated way of reproducing the issue:
> 
> (gdb) set print frame-arguments none
> (gdb) run 5000
> Starting program: /[...]/deep 5000
> 
> Program received signal SIGABRT, Aborted.
> 0x00007ffff7a9efb5 in raise () from /lib/libc.so.6
> (gdb) frame 5004
> #5004 0x00000000004005a2 in main (argc=..., argv=...) at deep.c:34
> 34       return visit (&n);
> (gdb) p /x $rsi
> $1 = 0x748b
> 
> The "p /x $rsi" take a long time, while the rest is pretty much
> instantaneous.  That's as much as I can do for today, but look at it
> some more tomorrow.

I reconstructed the part that takes time:

value_contents_all (...):
  => value_fetch_lazy:
     | while (VALUE_LVAL (new_val) == lval_register
     | {      && value_lazy (new_val))
     | { frame = frame_find_by_id (VALUE_FRAME_ID (new_val))
     | {   [ for (frame = get_current_frame (); ; frame = prev_frame)
     | {   [ | if (frame_id_eq (id, this))
     | {   [ |   return frame
     | {   [ | prev_frame = get_prev_frame (frame)
     | {   [ |   { if ([...] && inside_main_func (this_frame))
     | {   [ |   {   return NULL
     | {   [ |   { if ([...] && inside_entry_func (this_frame))
     | {   [ |   {   return NULL
     | {   [ |   { if ([...] && get_frame_pc (this_frame) == 0)
     | {   [ |   {   return NULL
     | {   [ |   { return get_prev_frame_1 (this_frame);
     | {   [ |   {   | if (this_frame->prev_p)
     | {   [ |   {   |   return this_frame->prev;
     | {   [ | if (!prev_frame)
     | {   [ |   return NULL
     | {   [ | if (get_frame_type (frame) == NORMAL_FRAME
     | {   [ |     && !frame_id_inner ()
     | {   [ |     && !frame_id_inner ())
     | {   [ |   return NULL
     | { new_val = get_frame_register_value (frame, regnum);
     | {   [ return frame_unwind_register_value (frame->next, regnum)
     | {   [   |  value = frame->unwind->prev_register (frame, [...], regnum);
     | {   [   |    { case DWARF2_FRAME_REG_UNSPECIFIED:
     | {   [   |    {   return frame_unwind_got_register (this_frame, regnum, regnum);
     | {   [   |    {    [  return return value_of_register_lazy (frame, new_regnum);

It might be a little cryptic, but it gives a good overall idea of what
gets called and why.

The first interesting piece of information is that the time is spent
trying to get the the contents of our lval_register value
(value_contents_all), and that it corresponds to a while loop.
This loop does two things:

  1. Find the value's frame, using the frame ID.
     This turns out to be a non-trivial operation!
        (a) we start at frame #0
        (b) for every frame:
            + if frame->id does not match:
                -> get the previous frame, which as you can see above
                   involves a lot of checking (main func, entry point,
                   null pc, etc)

  2. Call get_frame_register_value (frame, regnum)
     This, as far as I can tell from code inspection, should return
     pretty promptly a *lazy* register lval for the same register
     number, but with the next frame.

The while loop combined with the frame_find_by_id is O(n^2), because
they both walk the entire frame chain. But what makes it really slow,
I think, is the fact that each call to get_prev_frame is really
expensive because of the various checks.

One improvement I'm looking at is the fact that get_prev_frame is
doing redundant checks when the previous frame has already been
computed. So I'm currently testing the idea of adding:

  if (this_frame->prev_p)
    return this_frame->prev;

(before doing the various checks and then calling get_prev_frame_1).
Does anyone see why this would be a problem? It does not change the
fact that the current algorithm is still O(n^2), but it does make
the frame_find_by_id loop a lot lighter. With the example above and
10_000 iterations, my debug traces showed a reduction from 40+ secs
down to about 13secs.

There might be some other micro optimizations that we could do,
I haven't looked further. But to break the quadratic behavior,
I suspect we need to avoid the call to frame_find_by_id, which means
that we would net to have prev_register routine return the frame in
addition to the value - I think that we would have to be very careful
with what we do with the frame.  Another option that might be more workable
is to have the prev_register routine always return a non-lazy value,
going down the call stack as far as necessary until finally getting
the actual register value.

Any other suggestions, perhaps?

Taking a step back, it does not feel like this is actually a super
common issue. For this to happen, we need a program such that we get
a really deep stack (with 1000 iterations, it takes 1.5 secs before
my change, about 0.5 sec after IIRC), and a register that does not
get saved throughout the entire stack. Since we still end up printing
the correct value at the end, is it worth enhancing the design to
handle this case?

-- 
Joel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More info on PR/9711 (quadratic slowdown for deep stack traces)
  2009-09-01 20:48 More info on PR/9711 (quadratic slowdown for deep stack traces) Joel Brobecker
@ 2009-09-02  3:42 ` Paul Pluzhnikov
  2009-09-02  4:52   ` Joel Brobecker
  2009-09-02 11:43 ` Daniel Jacobowitz
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Pluzhnikov @ 2009-09-02  3:42 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb

On Tue, Sep 1, 2009 at 1:48 PM, Joel Brobecker<brobecker@adacore.com> wrote:

> Taking a step back, it does not feel like this is actually a super
> common issue. For this to happen, we need a program such that we get
> a really deep stack (with 1000 iterations, it takes 1.5 secs before
> my change, about 0.5 sec after IIRC), and a register that does not
> get saved throughout the entire stack. Since we still end up printing
> the correct value at the end, is it worth enhancing the design to
> handle this case?

It's been almost a year since I filed this PR, so am hazy on details :-(

AFAIR, the real problem showed up while debugging GDB itself, when I
made it go into infinite recursion loop. Making programs spin into the
ground via infinite recursion is not that uncommon (IMHO) and when
that happens, you do get 100_000 or more frames, and usually you only
care about the outermost 10 of so. It is quite annoying if GDB takes
several minutes to tell you what these 10 interesting frames are.

Cheers,
-- 
Paul Pluzhnikov

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More info on PR/9711 (quadratic slowdown for deep stack traces)
  2009-09-02  3:42 ` Paul Pluzhnikov
@ 2009-09-02  4:52   ` Joel Brobecker
  2009-09-02  4:57     ` Paul Pluzhnikov
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Brobecker @ 2009-09-02  4:52 UTC (permalink / raw)
  To: Paul Pluzhnikov; +Cc: gdb

> AFAIR, the real problem showed up while debugging GDB itself, when I
> made it go into infinite recursion loop. Making programs spin into the
> ground via infinite recursion is not that uncommon (IMHO) and when
> that happens, you do get 100_000 or more frames, and usually you only
> care about the outermost 10 of so. It is quite annoying if GDB takes
> several minutes to tell you what these 10 interesting frames are.

I should probably say that I am not contesting the fact that the problem
can happen in real life. I did assume that, given the requirements
for it to happen, the problem was not that common, and perhaps I was
mistaken. It's always hard to say how common an issue is.

That being said, here are the current parameters:

  - I will submit a patch tomorrow that implements the first idea
    that I floated. Namely, if the previous frame has already been
    computed, then have get_prev_frame return that.  This cuts down
    most of the time spent during the register value computation
    (roughly 60% with 10_000 frames).

  - I don't see how, right now, we could get rid of the quadratic
    behavior. It's embedded in the current design: We now get register
    values, and values cannot store the frame directly, it has to be
    the frame ID. This means a frame lookup from ID, which is the
    second loop causing the n^2 behavior.

I am hoping that Daniel, who has more experience than I do in the
area of unwinding, might be able to suggest something that would
help us get rid of the double loop. But, assuming that my patch
is approved, do you think that we should delay the release in order
to get this changed into an O(n) behavior?

-- 
Joel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More info on PR/9711 (quadratic slowdown for deep stack traces)
  2009-09-02  4:52   ` Joel Brobecker
@ 2009-09-02  4:57     ` Paul Pluzhnikov
  0 siblings, 0 replies; 7+ messages in thread
From: Paul Pluzhnikov @ 2009-09-02  4:57 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb

On Tue, Sep 1, 2009 at 9:52 PM, Joel Brobecker<brobecker@adacore.com> wrote:

> do you think that we should delay the release in order
> to get this changed into an O(n) behavior?

No, not me: I have in fact not encountered this problem in all of last
year, so perhaps it is quite uncommon :-)

-- 
Paul Pluzhnikov


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More info on PR/9711 (quadratic slowdown for deep stack traces)
  2009-09-01 20:48 More info on PR/9711 (quadratic slowdown for deep stack traces) Joel Brobecker
  2009-09-02  3:42 ` Paul Pluzhnikov
@ 2009-09-02 11:43 ` Daniel Jacobowitz
  2009-09-02 16:24   ` Joel Brobecker
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Jacobowitz @ 2009-09-02 11:43 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb

On Tue, Sep 01, 2009 at 01:48:15PM -0700, Joel Brobecker wrote:
> One improvement I'm looking at is the fact that get_prev_frame is
> doing redundant checks when the previous frame has already been
> computed. So I'm currently testing the idea of adding:
> 
>   if (this_frame->prev_p)
>     return this_frame->prev;
> 
> (before doing the various checks and then calling get_prev_frame_1).
> Does anyone see why this would be a problem? It does not change the

This almost, almost works.  I can see two problems:

* Can the results of those checks ever change?  For instance, does
"set backtrace past-main off" flush the stack frame?  If not, it won't
take effect right away after this change.

* There are places that call get_prev_frame_1 directly.  So one frame
past main may be in the chain, but we usually don't want it.

Now, the results of the checks could be cached in the frame object...

> There might be some other micro optimizations that we could do,
> I haven't looked further. But to break the quadratic behavior,
> I suspect we need to avoid the call to frame_find_by_id, which means
> that we would net to have prev_register routine return the frame in
> addition to the value - I think that we would have to be very careful
> with what we do with the frame.

Have frame_find_by_id keep a cache with the same lifetime as
current_frame.  If it saves the last frame it returned, we can try
that frame and the previous frame before going on to searching from
the current frame.

Also, have frame_find_by_id use get_prev_frame_1?  I'm not sure why we
don't.  You'd have to look at the callers, but I can't see the problem.

-- 
Daniel Jacobowitz
CodeSourcery

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More info on PR/9711 (quadratic slowdown for deep stack traces)
  2009-09-02 11:43 ` Daniel Jacobowitz
@ 2009-09-02 16:24   ` Joel Brobecker
  2009-09-03 18:40     ` Joel Brobecker
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Brobecker @ 2009-09-02 16:24 UTC (permalink / raw)
  To: gdb

> This almost, almost works.  I can see two problems:
> 
> * Can the results of those checks ever change?  For instance, does
> "set backtrace past-main off" flush the stack frame?  If not, it won't
> take effect right away after this change.

Yes - that's one of the issues I knew about. It wasn't making sense
to redo the checks in the case of finding the frame from its ID,
but I suppose it would make sense for "bt" to stop after we "set
"backtrace past-main off".

I suppose we could immediately flush the extra frames when the setting
changes to "off", but on the other hand, I think that the right thing
to do in frame_find_by_id is to call get_prev_frame_1 anyway. I will
test that next.

> Have frame_find_by_id keep a cache with the same lifetime as
> current_frame.  If it saves the last frame it returned, we can try
> that frame and the previous frame before going on to searching from
> the current frame.

That's a good idea. It seems relatively easy to do, since I can see
that the current_frame is always reset to NULL through
reinit_frame_cache, so it should be easy to have the same lifetime
as the frame chain. I will see if I can implement that too.

-- 
Joel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: More info on PR/9711 (quadratic slowdown for deep stack traces)
  2009-09-02 16:24   ` Joel Brobecker
@ 2009-09-03 18:40     ` Joel Brobecker
  0 siblings, 0 replies; 7+ messages in thread
From: Joel Brobecker @ 2009-09-03 18:40 UTC (permalink / raw)
  To: gdb

> I suppose we could immediately flush the extra frames when the setting
> changes to "off", but on the other hand, I think that the right thing
> to do in frame_find_by_id is to call get_prev_frame_1 anyway. I will
> test that next.

Just for the record, this turned out to be a bad idea.  In the case
when the previous frames had not been computed, we might end up
trying to unwind past main.  This causes a few regressions in one
of gdb.mi testcases. I didn't investigate precisely what was happening
since the suggestion became pretty much irrelevant when I implemented
the cache. See:
   http://www.sourceware.org/ml/gdb-patches/2009-09/msg00085.html

-- 
Joel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-09-03 18:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-01 20:48 More info on PR/9711 (quadratic slowdown for deep stack traces) Joel Brobecker
2009-09-02  3:42 ` Paul Pluzhnikov
2009-09-02  4:52   ` Joel Brobecker
2009-09-02  4:57     ` Paul Pluzhnikov
2009-09-02 11:43 ` Daniel Jacobowitz
2009-09-02 16:24   ` Joel Brobecker
2009-09-03 18:40     ` Joel Brobecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox