[RFA] use frame IDs to detect function calls while stepping

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* [RFA] use frame IDs to detect function calls while stepping
@ 2004-02-05  4:41 Joel Brobecker
  2004-02-05 17:13 ` Joel Brobecker
  0 siblings, 1 reply; 23+ messages in thread
From: Joel Brobecker @ 2004-02-05  4:41 UTC (permalink / raw)
  To: gdb-patches

Hello,

This is a followup on the discussion that took place in the following
thread:

    [RFA] OSF/1 - "next" over prologueless function call
    http://sources.redhat.com/ml/gdb-patches/2003-12/msg00037.html

During the discussion, it appeared that it was better to rely on
frame IDs to detect cases when we stepped inside a function rather
than further adjusting the test that checks whether we landed at
the begining of a function or not.

After a bit of testing on various platforms, I found that only relying
on a test that checks the ID of frame #1 against the step_frame_id was
not sufficient, unfortunately. The sparc-solaris testing revealed 2
regressions.

After careful analysis of the regressions and a bit of dicussion
with Andrew, here is what we found:

  1. We sometimes step levels of function calls down from the point
     where we started stepping. This is to get past the dynsym
     resolve code. So once we get more than one level deep, the
     frame ID test can no longer work.

     That was regression #1.

  2. We have a testcase where we try to "next" our way out of a function
     for which we have no line number information. The expected output
     was to run until the end of the program. But instead we stopped
     before.

     It turned out that we were landing inside a shared library
     trampoline after having left the function we were in, so again
     the frame ID check didn't kick in. We didn't know what to do,
     simply stopped there.

     That was regression #2.

Given the current implementation, and the analysis of the regressions,
we determined that the logic should be something like this:

  . If we landed in undebuggable code (identified by lack of symbol
    name), and we're stepping over this kind of code, then:

        Run our way out of the function.

  . If we are in a shlib call trampoline, then:

        Likewise.
        (This test was already part of the previous check, BTW)

  . If we are in a function called from the function where we started
    stepping, as identified by frame ID unwind, then:

        Likewise.

I tried it and no longer had any regression.

2004-02-05  J. Brobecker  <brobecker@gnat.com>

        * infrun.c (handle_inferior_event): Rewrite the checks detecting
        cases when we stepped inside a function.

Tested on alpha-tru64, pa-hpux, sparc-solaris and x86-linux.
And it also fixes the problem I originally reported.

OK to apply?

Thanks,
-- 
Joel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05  4:41 [RFA] use frame IDs to detect function calls while stepping Joel Brobecker
@ 2004-02-05 17:13 ` Joel Brobecker
  2004-02-05 18:54   ` Elena Zannoni
  2004-02-05 19:01   ` Daniel Jacobowitz
  0 siblings, 2 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-02-05 17:13 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]

[aghaaaaa .... With the patch this time, with my thanks to Elena and Daniel]

Hello,

This is a followup on the discussion that took place in the following
thread:

    [RFA] OSF/1 - "next" over prologueless function call
    http://sources.redhat.com/ml/gdb-patches/2003-12/msg00037.html

During the discussion, it appeared that it was better to rely on
frame IDs to detect cases when we stepped inside a function rather
than further adjusting the test that checks whether we landed at
the begining of a function or not.

After a bit of testing on various platforms, I found that only relying
on a test that checks the ID of frame #1 against the step_frame_id was
not sufficient, unfortunately. The sparc-solaris testing revealed 2
regressions.

After careful analysis of the regressions and a bit of dicussion
with Andrew, here is what we found:

  1. We sometimes step levels of function calls down from the point
     where we started stepping. This is to get past the dynsym
     resolve code. So once we get more than one level deep, the
     frame ID test can no longer work.
     
     That was regression #1.

  2. We have a testcase where we try to "next" our way out of a function
     for which we have no line number information. The expected output
     was to run until the end of the program. But instead we stopped
     before.

     It turned out that we were landing inside a shared library
     trampoline after having left the function we were in, so again
     the frame ID check didn't kick in. We didn't know what to do,
     simply stopped there.

     That was regression #2.

Given the current implementation, and the analysis of the regressions,
we determined that the logic should be something like this:

  . If we landed in undebuggable code (identified by lack of symbol
    name), and we're stepping over this kind of code, then:
    
        Run our way out of the function.
  
  . If we are in a shlib call trampoline, then:

        Likewise.
        (This test was already part of the previous check, BTW)

  . If we are in a function called from the function where we started
    stepping, as identified by frame ID unwind, then:

        Likewise.

I tried it and no longer had any regression.

2004-02-05  J. Brobecker  <brobecker@gnat.com>

        * infrun.c (handle_inferior_event): Rewrite the checks detecting
        cases when we stepped inside a function.

Tested on alpha-tru64, pa-hpux, sparc-solaris and x86-linux.
And it also fixes the problem I originally reported.

OK to apply?

Thanks,
-- 
Joel

[-- Attachment #2: infrun.c.diff --]
[-- Type: text/plain, Size: 2945 bytes --]

Index: infrun.c
===================================================================
RCS file: /cvs/src/src/gdb/infrun.c,v
retrieving revision 1.134
diff -u -p -r1.134 infrun.c
--- infrun.c	1 Feb 2004 18:05:09 -0000	1.134
+++ infrun.c	4 Feb 2004 17:35:39 -0000
@@ -2482,6 +2482,17 @@ process_event_stop_test:
       return;
     }
 
+  if (ecs->stop_func_name == NULL
+      && step_over_calls == STEP_OVER_UNDEBUGGABLE)
+    {
+      /* We couldn't determine where we stopped, so we just stepped
+         inside undebuggable code.  Since we want to step over this
+         kind of code, we keep going until the inferior returns from
+         the current function.  */
+      handle_step_into_function (ecs);
+      return;
+    }
+
   /* We can't update step_sp every time through the loop, because
      reading the stack pointer would slow down stepping too much.
      But we can update it every time we leave the step range.  */
@@ -2571,15 +2582,44 @@ process_event_stop_test:
       return;
     }
 
-  if (((stop_pc == ecs->stop_func_start	/* Quick test */
-	|| in_prologue (stop_pc, ecs->stop_func_start))
-       && !IN_SOLIB_RETURN_TRAMPOLINE (stop_pc, ecs->stop_func_name))
-      || IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name)
-      || ecs->stop_func_name == 0)
+  if (legacy_frame_p (current_gdbarch))
     {
-      /* It's a subroutine call.  */
-      handle_step_into_function (ecs);
-      return;
+      /* FIXME: brobecker/2004-02-04: The current architecture is still
+         using the legacy frame code, so we prefer not to rely on frame IDs
+         to check whether we just stepped into a function or not.  Some
+         experiments conducted on sparc-solaris before it was converted
+         to the new frame code showed that it could introduce some
+         severe problems.  Once all targets have transitioned to the new
+         frame code, this block can be deleted.  */
+      if (((stop_pc == ecs->stop_func_start	/* Quick test */
+            || in_prologue (stop_pc, ecs->stop_func_start))
+           && !IN_SOLIB_RETURN_TRAMPOLINE (stop_pc, ecs->stop_func_name))
+          || IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name)
+          || ecs->stop_func_name == 0)
+        {
+          /* It's a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }
+    }
+  else
+    {
+      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
+        {
+          /* We landed in a shared library call trampoline, so it
+             is a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }
+
+      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
+                       step_frame_id))
+        {
+          /* It's a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }
+
     }
 
   /* We've wandered out of the step range.  */

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05 17:13 ` Joel Brobecker
@ 2004-02-05 18:54   ` Elena Zannoni
  2004-02-07  4:01     ` Joel Brobecker
  2004-02-05 19:01   ` Daniel Jacobowitz
  1 sibling, 1 reply; 23+ messages in thread
From: Elena Zannoni @ 2004-02-05 18:54 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches

Joel Brobecker writes:
 > [aghaaaaa .... With the patch this time, with my thanks to Elena and Daniel]
 > 
 > Hello,
 > 
 > This is a followup on the discussion that took place in the following
 > thread:
 > 
 >     [RFA] OSF/1 - "next" over prologueless function call
 >     http://sources.redhat.com/ml/gdb-patches/2003-12/msg00037.html
 > 
 > During the discussion, it appeared that it was better to rely on
 > frame IDs to detect cases when we stepped inside a function rather
 > than further adjusting the test that checks whether we landed at
 > the begining of a function or not.
 > 
 > After a bit of testing on various platforms, I found that only relying
 > on a test that checks the ID of frame #1 against the step_frame_id was
 > not sufficient, unfortunately. The sparc-solaris testing revealed 2
 > regressions.
 > 
 > After careful analysis of the regressions and a bit of dicussion
 > with Andrew, here is what we found:
 > 
 >   1. We sometimes step levels of function calls down from the point
 >      where we started stepping. This is to get past the dynsym
 >      resolve code. So once we get more than one level deep, the
 >      frame ID test can no longer work.
 >      
 >      That was regression #1.
 > 
 >   2. We have a testcase where we try to "next" our way out of a function
 >      for which we have no line number information. The expected output
 >      was to run until the end of the program. But instead we stopped
 >      before.
 > 
 >      It turned out that we were landing inside a shared library
 >      trampoline after having left the function we were in, so again
 >      the frame ID check didn't kick in. We didn't know what to do,
 >      simply stopped there.
 > 
 >      That was regression #2.
 > 
 > Given the current implementation, and the analysis of the regressions,
 > we determined that the logic should be something like this:
 > 
 >   . If we landed in undebuggable code (identified by lack of symbol
 >     name), and we're stepping over this kind of code, then:
 >     
 >         Run our way out of the function.
 >   

Could this kind of logic be handled inside handle_step_into_function
(which already checks for UNDEBUGGABLE)?  I.e. is this case caught by
the complex condition in the old frame case which makes you call
handle_step_into_function? For the new frame case, this is regression
#1,  did you have this bug/regression with the old frame code?


 >   . If we are in a shlib call trampoline, then:
 > 
 >         Likewise.
 >         (This test was already part of the previous check, BTW)
 > 
 >   . If we are in a function called from the function where we started
 >     stepping, as identified by frame ID unwind, then:
 > 
 >         Likewise.
 > 
 > I tried it and no longer had any regression.
 > 

I think the explanations above should go in the function as comments.


 > +  else
 > +    {
 > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
 > +        {
 > +          /* We landed in a shared library call trampoline, so it
 > +             is a subroutine call.  */
 > +          handle_step_into_function (ecs);
 > +          return;
 > +        }


I am not sure I understand why the case above is not covered by the
test below.  Is this to fix regression #1? I.e multiple frames? 

 > +
 > +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
 > +                       step_frame_id))
 > +        {
 > +          /* It's a subroutine call.  */
 > +          handle_step_into_function (ecs);
 > +          return;
 > +        }
 > +
 >      }
 >  
 >    /* We've wandered out of the step range.  */


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05 17:13 ` Joel Brobecker
  2004-02-05 18:54   ` Elena Zannoni
@ 2004-02-05 19:01   ` Daniel Jacobowitz
  2004-02-05 19:23     ` Elena Zannoni
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2004-02-05 19:01 UTC (permalink / raw)
  To: gdb-patches

On Thu, Feb 05, 2004 at 09:13:24PM +0400, Joel Brobecker wrote:
> +  if (ecs->stop_func_name == NULL
> +      && step_over_calls == STEP_OVER_UNDEBUGGABLE)
> +    {
> +      /* We couldn't determine where we stopped, so we just stepped
> +         inside undebuggable code.  Since we want to step over this
> +         kind of code, we keep going until the inferior returns from
> +         the current function.  */

The test and the comment don't seem to match.  Code with only minimal
symbols will still set stop_func_name.

> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> +        {
> +          /* We landed in a shared library call trampoline, so it
> +             is a subroutine call.  */
> +          handle_step_into_function (ecs);
> +          return;
> +        }
> +
> +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
> +                       step_frame_id))
> +        {
> +          /* It's a subroutine call.  */
> +          handle_step_into_function (ecs);
> +          return;
> +        }
> +

get_prev_frame can return NULL.  In fact, it generally does in main.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05 19:01   ` Daniel Jacobowitz
@ 2004-02-05 19:23     ` Elena Zannoni
  2004-02-05 19:49       ` Daniel Jacobowitz
  0 siblings, 1 reply; 23+ messages in thread
From: Elena Zannoni @ 2004-02-05 19:23 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz writes:

 > > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
 > > +        {
 > > +          /* We landed in a shared library call trampoline, so it
 > > +             is a subroutine call.  */
 > > +          handle_step_into_function (ecs);
 > > +          return;
 > > +        }
 > > +
 > > +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
 > > +                       step_frame_id))
 > > +        {
 > > +          /* It's a subroutine call.  */
 > > +          handle_step_into_function (ecs);
 > > +          return;
 > > +        }
 > > +
 > 
 > get_prev_frame can return NULL.  In fact, it generally does in main.
 > 

I don't think it matters, frame_id_eq will handle a comparison with
null_frame_id.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05 19:23     ` Elena Zannoni
@ 2004-02-05 19:49       ` Daniel Jacobowitz
  2004-02-09 19:21         ` Andrew Cagney
  0 siblings, 1 reply; 23+ messages in thread
From: Daniel Jacobowitz @ 2004-02-05 19:49 UTC (permalink / raw)
  To: gdb-patches

On Thu, Feb 05, 2004 at 02:20:10PM -0500, Elena Zannoni wrote:
> Daniel Jacobowitz writes:
> 
>  > > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
>  > > +        {
>  > > +          /* We landed in a shared library call trampoline, so it
>  > > +             is a subroutine call.  */
>  > > +          handle_step_into_function (ecs);
>  > > +          return;
>  > > +        }
>  > > +
>  > > +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
>  > > +                       step_frame_id))
>  > > +        {
>  > > +          /* It's a subroutine call.  */
>  > > +          handle_step_into_function (ecs);
>  > > +          return;
>  > > +        }
>  > > +
>  > 
>  > get_prev_frame can return NULL.  In fact, it generally does in main.
>  > 
> 
> I don't think it matters, frame_id_eq will handle a comparison with
> null_frame_id.

Oh, you're right - I was expecting get_frame_id (NULL) to be a problem,
but it isn't.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05 18:54   ` Elena Zannoni
@ 2004-02-07  4:01     ` Joel Brobecker
  2004-02-27 15:23       ` Andrew Cagney
  0 siblings, 1 reply; 23+ messages in thread
From: Joel Brobecker @ 2004-02-07  4:01 UTC (permalink / raw)
  To: Elena Zannoni; +Cc: gdb-patches

Elena, Daniel,

> Could this kind of logic be handled inside handle_step_into_function
> (which already checks for UNDEBUGGABLE)?

Hmmm, I realize I forgot to mention that I'd like to eventually split
handle_step_into_function() into several smaller and simpler functions.
That is: Now that I have divided a large condition into smaller ones,
we might be able to write a smaller function dedicated to each case
the smaller tests handle.

This is only a wish, as I haven't had time to look more closely into
this. But I wanted to keep this option open, which is why I used the
UNDEBUGGABLE check in addition to the stop_func_name one.

> I.e. is this case caught by the complex condition in the old frame
> case which makes you call handle_step_into_function? For the new frame
> case, this is regression #1,  did you have this bug/regression with
> the old frame code?

With the old frame code (and hence using the complex condition), the
test identified by regression #1 was PASSing. The reason why it was
passing was because ecs->stop_func_name == NULL is part of the complex
condition.

This also emphasizes the fact that I slightly modified the handling of
inferior events when I added the STEP_OVER_UNDEBUGGABLE check. This
check wasn't there before (in the complex condition), and perhaps
we would all feel less nervous if I removed it, and then moved the
stop_func_name check next to the other new checks.

> > +  if (ecs->stop_func_name == NULL
> > +      && step_over_calls == STEP_OVER_UNDEBUGGABLE)
> > +    {
> > +      /* We couldn't determine where we stopped, so we just stepped
> > +         inside undebuggable code.  Since we want to step over this
> > +         kind of code, we keep going until the inferior returns from
> > +         the current function.  */
> 
> The test and the comment don't seem to match.  Code with only minimal
> Symbols will still set stop_func_name.

Perhaps the comment is a bit awkward, because I really meant that we
don't have any symbol (not even minimal) to determine where we landed.

        /* There is no symbol, not even a minimal symbol, corresponding
           to the address where we just stopped.  So we just stepped
           inside undebuggable code.  Since we want to step over this
           kind of code, we keep going until the inferior returns from
           the current function.  */

>  >   . If we are in a shlib call trampoline, then:
>  > 
>  >         Likewise.
>  >         (This test was already part of the previous check, BTW)
>  > 
>  >   . If we are in a function called from the function where we started
>  >     stepping, as identified by frame ID unwind, then:
>  > 
>  >         Likewise.
>  > 
>  > I tried it and no longer had any regression.
>  > 
> 
> I think the explanations above should go in the function as comments.

That's what I tried to do in the comments I added. Which part would
you like to see added, and where would you add it?

>  > +  else
>  > +    {
>  > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
>  > +        {
>  > +          /* We landed in a shared library call trampoline, so it
>  > +             is a subroutine call.  */
>  > +          handle_step_into_function (ecs);
>  > +          return;
>  > +        }
> 
> 
> I am not sure I understand why the case above is not covered by the
> test below.  Is this to fix regression #1? I.e multiple frames? 

Hmmm, good question...

Although it does fix regression #2, it is not the only reason why we
added this test. It was also based on logic (see "After ... here is
what we found", in my previous message).

I should admit that in the case at hand (regression #2), the unwinder
is having a hard time unwinding out of this code, and causes the
frame ID check to fail. I don't remember seeing several levels of
function call.

However, I still thought that this test was necessary because we could
just as well have reached this trampoline one or more levels of function
call down, just as we end up stepping into undebuggable code in
regression #1.

-- 
Joel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-05 19:49       ` Daniel Jacobowitz
@ 2004-02-09 19:21         ` Andrew Cagney
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Cagney @ 2004-02-09 19:21 UTC (permalink / raw)
  To: Daniel Jacobowitz, Joel Brobecker, Elena Zannoni; +Cc: gdb-patches

> On Thu, Feb 05, 2004 at 02:20:10PM -0500, Elena Zannoni wrote:
> 
>> Daniel Jacobowitz writes:
>> 
>>  > > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
>>  > > +        {
>>  > > +          /* We landed in a shared library call trampoline, so it
>>  > > +             is a subroutine call.  */
>>  > > +          handle_step_into_function (ecs);
>>  > > +          return;
>>  > > +        }
>>  > > +
>>  > > +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
>>  > > +                       step_frame_id))
>>  > > +        {
>>  > > +          /* It's a subroutine call.  */
>>  > > +          handle_step_into_function (ecs);
>>  > > +          return;
>>  > > +        }
>>  > > +
>>  > 
>>  > get_prev_frame can return NULL.  In fact, it generally does in main.
>>  > 
>> 
>> I don't think it matters, frame_id_eq will handle a comparison with
>> null_frame_id.
> 
> 
> Oh, you're right - I was expecting get_frame_id (NULL) to be a problem,
> but it isn't.

Hmm, I suspect there's a latent bug here.  In this context, when at 
main, get_prev_frame should unconditionally return main's caller.  Even 
when backtrace-beyond-main is false.

This suggests that some of the conditions found in get_prev_frame 
(namely in_entry_func and in_main_func) should be separated out.

However, that can be sorted later.

Andrew



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-07  4:01     ` Joel Brobecker
@ 2004-02-27 15:23       ` Andrew Cagney
  2004-03-19  0:09         ` Joel Brobecker
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cagney @ 2004-02-27 15:23 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: Elena Zannoni, gdb-patches


>>>  > +  else
>>>  > +    {
>>>  > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
>>>  > +        {
>>>  > +          /* We landed in a shared library call trampoline, so it
>>>  > +             is a subroutine call.  */
>>>  > +          handle_step_into_function (ecs);
>>>  > +          return;
>>>  > +        }
>>> 
>>> 
>>> I am not sure I understand why the case above is not covered by the
>>> test below.  Is this to fix regression #1? I.e multiple frames? 
> 
> 
> Hmmm, good question...
> 
> Although it does fix regression #2, it is not the only reason why we
> added this test. It was also based on logic (see "After ... here is
> what we found", in my previous message).
> 
> I should admit that in the case at hand (regression #2), the unwinder
> is having a hard time unwinding out of this code, and causes the
> frame ID check to fail. I don't remember seeing several levels of
> function call.
> 
> However, I still thought that this test was necessary because we could
> just as well have reached this trampoline one or more levels of function
> call down, just as we end up stepping into undebuggable code in
> regression #1.

I'd not noticed this issue.  Hmm, if GDB's single stepping then the 
second test should cover this case.  It's when GDB is free running that 
we might find ourselves stopped IN_SOLIB_CALL_TRAMPOLINE.  If it is the 
latter case then I'm not sure that silently single stepping away from 
where the program stopped is being helpful.

Can you try the testsuite without that check?  If the results are ok 
then (with other changes) commit it.  If its not we need to re-think 
whats happening :-(  Yes, this will mean it goes into 6.1.

Andrew



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09         ` Joel Brobecker
@ 2004-03-01 19:48           ` Joel Brobecker
  2004-03-19  0:09           ` Joel Brobecker
  1 sibling, 0 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-01 19:48 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Elena Zannoni, gdb-patches

> >>> > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> >>> > +        {
> >>> > +          /* We landed in a shared library call trampoline, so it
> >>> > +             is a subroutine call.  */
> >>> > +          handle_step_into_function (ecs);
> >>> > +          return;
> >>> > +        }
> >>>
> >>>
> >>>I am not sure I understand why the case above is not covered by the
> >>>test below.  Is this to fix regression #1? I.e multiple frames? 
> >
> >
> >Hmmm, good question...
> >
> >Although it does fix regression #2, it is not the only reason why we
> >added this test. It was also based on logic (see "After ... here is
> >what we found", in my previous message).
> >
> >I should admit that in the case at hand (regression #2), the unwinder
> >is having a hard time unwinding out of this code, and causes the
> >frame ID check to fail. I don't remember seeing several levels of
> >function call.
> >
> >However, I still thought that this test was necessary because we could
> >just as well have reached this trampoline one or more levels of function
> >call down, just as we end up stepping into undebuggable code in
> >regression #1.
> 
> I'd not noticed this issue.  Hmm, if GDB's single stepping then the 
> second test should cover this case.  It's when GDB is free running that 
> we might find ourselves stopped IN_SOLIB_CALL_TRAMPOLINE.  If it is the 
> latter case then I'm not sure that silently single stepping away from 
> where the program stopped is being helpful.
> 
> Can you try the testsuite without that check?  If the results are ok 
> then (with other changes) commit it.  If its not we need to re-think 
> whats happening :-(  Yes, this will mean it goes into 6.1.

Yes, there is a regression (regression #2 in my previous message) if
we leave that test out. It's been a while since I posted that patch,
so I don't remember the details anymore :-/. I'll dig again later today.

I did rerun the testsuite without it to double-check that my
recollection was right. And I also ran the testsuite with the frame_id
patch you recently posted, hoping that it might solve the extra
regression. Unfortunately, I am sorry to report that it actually
introduced another 5 or 6 regressions...

On the other hand, as soon as I add the check back, we're down to
zero regression (that is, with your frame_id patch as well).

More details later today about the failing test when the test above
is removed.

-- 
Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09           ` Joel Brobecker
@ 2004-03-01 23:52             ` Joel Brobecker
  2004-03-02  6:16             ` Joel Brobecker
  1 sibling, 0 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-01 23:52 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Elena Zannoni, gdb-patches

Ok, I'm dark red with shame.

> Yes, there is a regression (regression #2 in my previous message) if
> we leave that test out. It's been a while since I posted that patch,
> so I don't remember the details anymore :-/. I'll dig again later today.

It looks like my initial analysis of the cause of regression #2 was
wrong. I will simply say that I was mislead by the fact that GDB
reported that we stopped in "_PROCEDURE_LINKAGE_TABLE_" while we
were actually inside function "exit", and let it go at that.

Here is a description of the regression which appears if we don't
include the following test:

+      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
+        {
+          /* We landed in a shared library call trampoline, so it
+             is a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }

The problem occurs with ending-run:

        % gdb ending-run
        (gdb) b ending-run.c:33
        Breakpoint 1 at 0x10828: file gdb.base/ending-run.c, line 33.
        (gdb) run
        Starting program: /[...]/gdb.base/ending-run 
        -1 2 7 14 23 34 47 62 79  Goodbye!

        Breakpoint 1, main () at gdb.base/ending-run.c:33
        33      }
        (gdb) n
        0x000105ec in _start ()
        (gdb) n
        Single stepping until exit from function _start, 
        which has no line number information.
        0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()

The expected behavior for the last "next" command is for GDB
to run until the inferior exits:

        (gdb) n
        Single stepping until exit from function _start, 
        which has no line number information.

        Program exited normally.

Unfortunately, here is what happens. At 0x000105ec, before we do
our second "next" command, we are about to execute the following
code:

        0x000105ec <_start+100>:        call  0x20950 <exit>
        0x000105f0 <_start+104>:        nop 

After two iterations (one for the call insn, and one for the delay
slot), GDB lands at the begining of function "exit" at 0x00020950,
which is:

        0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
        0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
        0x00020958 <exit+8>:    nop 

So at this point, the registers window has not been rotated.
I don't know if this is the cause for this problem, but at this
point GDB is unable to unwind the call stack:

        (gdb) bt
        #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()

(And gets the wrong procedure name as well, but that's a separate
issue - although "x /i" does report what I believe is the correct
name, strange!).

I am looking into the sparc unwinder code right now, to try to
understand a bit better the source of the problem.

-- 
Joel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09           ` Joel Brobecker
  2004-03-01 23:52             ` Joel Brobecker
@ 2004-03-02  6:16             ` Joel Brobecker
  2004-03-03 21:12               ` Mark Kettenis
                                 ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-02  6:16 UTC (permalink / raw)
  To: Andrew Cagney, kettenis; +Cc: Elena Zannoni, gdb-patches

> Here is a description of the regression which appears if we don't
> include the following test:
> 
> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> +        {
> +          /* We landed in a shared library call trampoline, so it
> +             is a subroutine call.  */
> +          handle_step_into_function (ecs);
> +          return;
> +        }
> 
> The problem occurs with ending-run:
> 
>         % gdb ending-run
>         (gdb) b ending-run.c:33
>         Breakpoint 1 at 0x10828: file gdb.base/ending-run.c, line 33.
>         (gdb) run
>         Starting program: /[...]/gdb.base/ending-run 
>         -1 2 7 14 23 34 47 62 79  Goodbye!
>         
>         Breakpoint 1, main () at gdb.base/ending-run.c:33
>         33      }
>         (gdb) n
>         0x000105ec in _start ()
>         (gdb) n
>         Single stepping until exit from function _start, 
>         which has no line number information.
>         0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
> 
> The expected behavior for the last "next" command is for GDB
> to run until the inferior exits:
> 
>         (gdb) n
>         Single stepping until exit from function _start, 
>         which has no line number information.
>         
>         Program exited normally.
> 
> Unfortunately, here is what happens. At 0x000105ec, before we do
> our second "next" command, we are about to execute the following
> code:
> 
>         0x000105ec <_start+100>:        call  0x20950 <exit>
>         0x000105f0 <_start+104>:        nop 
> 
> After two iterations (one for the call insn, and one for the delay
> slot), GDB lands at the begining of function "exit" at 0x00020950,
> which is:
> 
>         0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
>         0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
>         0x00020958 <exit+8>:    nop 
> 
> So at this point, the registers window has not been rotated.
> I don't know if this is the cause for this problem, but at this
> point GDB is unable to unwind the call stack:
> 
>         (gdb) bt
>         #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
> 
> (And gets the wrong procedure name as well, but that's a separate
> issue - although "x /i" does report what I believe is the correct
> name, strange!).
> 
> I am looking into the sparc unwinder code right now, to try to
> understand a bit better the source of the problem.

I think I found the source of the glitch. I may have the solution
to fix it, but my little finger is telling that it might be a bit
too extreme... Maybe MarkK has some comments about this?

What happens is that, at the point when we reach function "exit",
the FP register is null:

        (gdb) p /x $fp
        $2 = 0x0

The sparc unwinder in sparc_frame_cache() detects this, thinks there is
something wrong, and aborts early. So, we never unwind the "_start"
frame, and hence the following frame ID check doesn't notice the
function call, as it should have in this case:

+      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
+                       step_frame_id))
+        {
+          /* It's a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }

With this example in mind, it seemed to me that the assertion that %fp
register is not null is unfortunately incorrect. Given that the rest of
the code in sparc_frame_cache() wasn't using the value of that register,
I commented out the assertion, and retried.

<<
@@ -620,8 +620,8 @@ sparc_frame_cache (struct frame_info *ne
      frame.  */
 
   cache->base = frame_unwind_register_unsigned (next_frame, SPARC_FP_REGNUM);
-  if (cache->base == 0)
-    return cache;
+  // if (cache->base == 0)
+  //   return cache;
 
   cache->pc = frame_func_unwind (next_frame);
   if (cache->pc != 0)
>>

That causes the problem above to disappear. I even went as far as
to run the testsuite: No change (apart from fixing the regression
I observed).

sparc_frame_cache() seems well designed to handle null %fp registers.
It doesn't use its value when scanning the prologue, and then knows
to use %sp in its place as the frame base. So the frame should be
correctly unwound.

Comments?
-- 
Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09               ` Andrew Cagney
@ 2004-03-02 15:48                 ` Andrew Cagney
  2004-03-19  0:09                 ` Joel Brobecker
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Cagney @ 2004-03-02 15:48 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: kettenis, Elena Zannoni, gdb-patches

>>After two iterations (one for the call insn, and one for the delay
>>> slot), GDB lands at the begining of function "exit" at 0x00020950,
>>> which is:
>>> 
>>>         0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
>>>         0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
>>>         0x00020958 <exit+8>:    nop 
>>> 
>>> So at this point, the registers window has not been rotated.
>>> I don't know if this is the cause for this problem, but at this
>>> point GDB is unable to unwind the call stack:
>>> 
>>>         (gdb) bt
>>>         #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
>>> 
>>> (And gets the wrong procedure name as well, but that's a separate
>>> issue - although "x /i" does report what I believe is the correct
>>> name, strange!).
>>> 
>>> I am looking into the sparc unwinder code right now, to try to
>>> understand a bit better the source of the problem.
> 
> 
> I think I found the source of the glitch. I may have the solution
> to fix it, but my little finger is telling that it might be a bit
> too extreme... Maybe MarkK has some comments about this?

How about this as a compromize:

- in 6.1
your original patch (but with a comment saying that the
+      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
is a hack and shouldn't be included in the mainline)

- in mainline:
Assuming Mark's ok with the sparc changes, your patch without that part?

This way 6.1 is robust regardless of which SPARC architecture code is in 
place.

Andrew



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09                 ` Joel Brobecker
@ 2004-03-02 22:07                   ` Joel Brobecker
  2004-03-06  0:15                   ` Andrew Cagney
  1 sibling, 0 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-02 22:07 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: kettenis, Elena Zannoni, gdb-patches

> How about this as a compromize:
> 
> - in 6.1
> your original patch (but with a comment saying that the
> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> is a hack and shouldn't be included in the mainline)
> 
> - in mainline:
> Assuming Mark's ok with the sparc changes, your patch without that part?
> 
> This way 6.1 is robust regardless of which SPARC architecture code is in 
> place.

it looks reasonable, but it'd be nice to have Mark's opinion on the
unwinder problem. How about we let it sit for another day or two before
we commit anything?

Also, do we need this change in 6.1? It seemed like a pretty minor bug
that I managed to produce on Tru64 only. Or were there other occurrences
of this bug?

-- 
Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-02  6:16             ` Joel Brobecker
@ 2004-03-03 21:12               ` Mark Kettenis
  2004-03-19  0:09                 ` Mark Kettenis
  2004-03-19  0:09               ` Andrew Cagney
  2004-03-19  0:09               ` Joel Brobecker
  2 siblings, 1 reply; 23+ messages in thread
From: Mark Kettenis @ 2004-03-03 21:12 UTC (permalink / raw)
  To: brobecker; +Cc: cagney, ezannoni, gdb-patches

   Date: Mon, 1 Mar 2004 22:16:42 -0800
   From: Joel Brobecker <brobecker@gnat.com>

   > The expected behavior for the last "next" command is for GDB
   > to run until the inferior exits:
   > 
   >         (gdb) n
   >         Single stepping until exit from function _start, 
   >         which has no line number information.
   >         
   >         Program exited normally.
   > 
   > Unfortunately, here is what happens. At 0x000105ec, before we do
   > our second "next" command, we are about to execute the following
   > code:
   > 
   >         0x000105ec <_start+100>:        call  0x20950 <exit>
   >         0x000105f0 <_start+104>:        nop 
   > 
   > After two iterations (one for the call insn, and one for the delay
   > slot), GDB lands at the begining of function "exit" at 0x00020950,
   > which is:
   > 
   >         0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
   >         0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
   >         0x00020958 <exit+8>:    nop 
   > 
   > So at this point, the registers window has not been rotated.
   > I don't know if this is the cause for this problem, but at this
   > point GDB is unable to unwind the call stack:
   > 
   >         (gdb) bt
   >         #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
   > 
   > (And gets the wrong procedure name as well, but that's a separate
   > issue - although "x /i" does report what I believe is the correct
   > name, strange!).

Actually, it doesn't get the name wrong.  The function "exit" lives in
a shared library.  Since this is the first time you're calling exit(),
you don't end up in the function itself, but in the PLT entry for
"exit".  This PLT entry will invoke the dynamic linker to direct the
program to the real "exit" function [1].  The PLT entry function is
part of the Procedure Linkage Table (PLT) which is usally named
_PROCEDURE_LINKAGE_TABLE.

   > I am looking into the sparc unwinder code right now, to try to
   > understand a bit better the source of the problem.

   I think I found the source of the glitch. I may have the solution
   to fix it, but my little finger is telling that it might be a bit
   too extreme... Maybe MarkK has some comments about this?

It's not "extreme" at all.  Read on...

   What happens is that, at the point when we reach function "exit",
   the FP register is null:

	   (gdb) p /x $fp
	   $2 = 0x0

   The sparc unwinder in sparc_frame_cache() detects this, thinks there is
   something wrong, and aborts early. So, we never unwind the "_start"
   frame, and hence the following frame ID check doesn't notice the
   function call, as it should have in this case:

   +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
   +                       step_frame_id))
   +        {
   +          /* It's a subroutine call.  */
   +          handle_step_into_function (ecs);
   +          return;
   +        }

   With this example in mind, it seemed to me that the assertion that %fp
   register is not null is unfortunately incorrect. Given that the rest of
   the code in sparc_frame_cache() wasn't using the value of that register,
   I commented out the assertion, and retried.

OK.  Here %fp == 0 is marking the outermost frame.  So what we're
having is that we're effectively calling a frameless function (the PLT
entry) from the outermost frame.  Since this function is frameless, we
shouldn't be looking at %fp, but at %sp.  However, we bail out before
we do so.  So the check is bogus!  

I'm currently testing the attached patch.  If I see no regressions,
I'll check it in.  Is it needed on the branch too?

Mark

[1] It'll also patch things up such that any future calls to exit()
will go directly to the real function.  Pointless of course for
exit(), but this mechanism is used for all calls to functions living
in a shared library.

Index: ChangeLog
from  Mark Kettenis  <kettenis@gnu.org>

	* sparc-tdep.c (sparc_frame_cache): Don't bail out if %fp is zero.
	Reorganize code a bit.

Index: sparc-tdep.c
===================================================================
RCS file: /cvs/src/src/gdb/sparc-tdep.c,v
retrieving revision 1.149
diff -u -p -r1.149 sparc-tdep.c
--- sparc-tdep.c 7 Feb 2004 20:40:35 -0000 1.149
+++ sparc-tdep.c 3 Mar 2004 21:11:29 -0000
@@ -615,14 +615,6 @@ sparc_frame_cache (struct frame_info *ne
   cache = sparc_alloc_frame_cache ();
   *this_cache = cache;

-  /* In priciple, for normal frames, %fp (%i6) holds the frame
-     pointer, which holds the base address for the current stack
-     frame.  */
-
-  cache->base = frame_unwind_register_unsigned (next_frame, SPARC_FP_REGNUM);
-  if (cache->base == 0)
-    return cache;
-
   cache->pc = frame_func_unwind (next_frame);
   if (cache->pc != 0)
     {
@@ -632,10 +624,18 @@ sparc_frame_cache (struct frame_info *ne

   if (cache->frameless_p)
     {
-      /* We didn't find a valid frame, which means that CACHE->base
-	 currently holds the frame pointer for our calling frame.  */
-      cache->base = frame_unwind_register_unsigned (next_frame,
-						    SPARC_SP_REGNUM);
+      /* This function is frameless, so %fp (%i6) holds the frame
+         pointer for our calling frame.  Use %sp (%o6) as this frame's
+         base address.  */
+      cache->base =
+	frame_unwind_register_unsigned (next_frame, SPARC_SP_REGNUM);
+    }
+  else
+    {
+      /* For normal frames, %fp (%i6) holds the frame pointer, the
+         base address for the current stack frame.  */
+      cache->base =
+	frame_unwind_register_unsigned (next_frame, SPARC_FP_REGNUM);
     }

   return cache;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09                 ` Joel Brobecker
  2004-03-02 22:07                   ` Joel Brobecker
@ 2004-03-06  0:15                   ` Andrew Cagney
  2004-03-19  0:09                     ` Andrew Cagney
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Cagney @ 2004-03-06  0:15 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: kettenis, Elena Zannoni, gdb-patches

>>How about this as a compromize:
>>> 
>>> - in 6.1
>>> your original patch (but with a comment saying that the
>>> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
>>> is a hack and shouldn't be included in the mainline)
>>> 
>>> - in mainline:
>>> Assuming Mark's ok with the sparc changes, your patch without that part?
>>> 
>>> This way 6.1 is robust regardless of which SPARC architecture code is in 
>>> place.
> 
> 
> it looks reasonable, but it'd be nice to have Mark's opinion on the
> unwinder problem. How about we let it sit for another day or two before
> we commit anything?
> 
> Also, do we need this change in 6.1? It seemed like a pretty minor bug
> that I managed to produce on Tru64 only. Or were there other occurrences
> of this bug?

So mainline only it is .... (For safety, I was testing my frame 
unwinders with this applied).

Andrew



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-02  6:16             ` Joel Brobecker
  2004-03-03 21:12               ` Mark Kettenis
@ 2004-03-19  0:09               ` Andrew Cagney
  2004-03-02 15:48                 ` Andrew Cagney
  2004-03-19  0:09                 ` Joel Brobecker
  2004-03-19  0:09               ` Joel Brobecker
  2 siblings, 2 replies; 23+ messages in thread
From: Andrew Cagney @ 2004-03-19  0:09 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: kettenis, Elena Zannoni, gdb-patches

>>After two iterations (one for the call insn, and one for the delay
>>> slot), GDB lands at the begining of function "exit" at 0x00020950,
>>> which is:
>>> 
>>>         0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
>>>         0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
>>>         0x00020958 <exit+8>:    nop 
>>> 
>>> So at this point, the registers window has not been rotated.
>>> I don't know if this is the cause for this problem, but at this
>>> point GDB is unable to unwind the call stack:
>>> 
>>>         (gdb) bt
>>>         #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
>>> 
>>> (And gets the wrong procedure name as well, but that's a separate
>>> issue - although "x /i" does report what I believe is the correct
>>> name, strange!).
>>> 
>>> I am looking into the sparc unwinder code right now, to try to
>>> understand a bit better the source of the problem.
> 
> 
> I think I found the source of the glitch. I may have the solution
> to fix it, but my little finger is telling that it might be a bit
> too extreme... Maybe MarkK has some comments about this?

How about this as a compromize:

- in 6.1
your original patch (but with a comment saying that the
+      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
is a hack and shouldn't be included in the mainline)

- in mainline:
Assuming Mark's ok with the sparc changes, your patch without that part?

This way 6.1 is robust regardless of which SPARC architecture code is in 
place.

Andrew



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-02  6:16             ` Joel Brobecker
  2004-03-03 21:12               ` Mark Kettenis
  2004-03-19  0:09               ` Andrew Cagney
@ 2004-03-19  0:09               ` Joel Brobecker
  2 siblings, 0 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-19  0:09 UTC (permalink / raw)
  To: Andrew Cagney, kettenis; +Cc: Elena Zannoni, gdb-patches

> Here is a description of the regression which appears if we don't
> include the following test:
> 
> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> +        {
> +          /* We landed in a shared library call trampoline, so it
> +             is a subroutine call.  */
> +          handle_step_into_function (ecs);
> +          return;
> +        }
> 
> The problem occurs with ending-run:
> 
>         % gdb ending-run
>         (gdb) b ending-run.c:33
>         Breakpoint 1 at 0x10828: file gdb.base/ending-run.c, line 33.
>         (gdb) run
>         Starting program: /[...]/gdb.base/ending-run 
>         -1 2 7 14 23 34 47 62 79  Goodbye!
>         
>         Breakpoint 1, main () at gdb.base/ending-run.c:33
>         33      }
>         (gdb) n
>         0x000105ec in _start ()
>         (gdb) n
>         Single stepping until exit from function _start, 
>         which has no line number information.
>         0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
> 
> The expected behavior for the last "next" command is for GDB
> to run until the inferior exits:
> 
>         (gdb) n
>         Single stepping until exit from function _start, 
>         which has no line number information.
>         
>         Program exited normally.
> 
> Unfortunately, here is what happens. At 0x000105ec, before we do
> our second "next" command, we are about to execute the following
> code:
> 
>         0x000105ec <_start+100>:        call  0x20950 <exit>
>         0x000105f0 <_start+104>:        nop 
> 
> After two iterations (one for the call insn, and one for the delay
> slot), GDB lands at the begining of function "exit" at 0x00020950,
> which is:
> 
>         0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
>         0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
>         0x00020958 <exit+8>:    nop 
> 
> So at this point, the registers window has not been rotated.
> I don't know if this is the cause for this problem, but at this
> point GDB is unable to unwind the call stack:
> 
>         (gdb) bt
>         #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
> 
> (And gets the wrong procedure name as well, but that's a separate
> issue - although "x /i" does report what I believe is the correct
> name, strange!).
> 
> I am looking into the sparc unwinder code right now, to try to
> understand a bit better the source of the problem.

I think I found the source of the glitch. I may have the solution
to fix it, but my little finger is telling that it might be a bit
too extreme... Maybe MarkK has some comments about this?

What happens is that, at the point when we reach function "exit",
the FP register is null:

        (gdb) p /x $fp
        $2 = 0x0

The sparc unwinder in sparc_frame_cache() detects this, thinks there is
something wrong, and aborts early. So, we never unwind the "_start"
frame, and hence the following frame ID check doesn't notice the
function call, as it should have in this case:

+      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
+                       step_frame_id))
+        {
+          /* It's a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }

With this example in mind, it seemed to me that the assertion that %fp
register is not null is unfortunately incorrect. Given that the rest of
the code in sparc_frame_cache() wasn't using the value of that register,
I commented out the assertion, and retried.

<<
@@ -620,8 +620,8 @@ sparc_frame_cache (struct frame_info *ne
      frame.  */
 
   cache->base = frame_unwind_register_unsigned (next_frame, SPARC_FP_REGNUM);
-  if (cache->base == 0)
-    return cache;
+  // if (cache->base == 0)
+  //   return cache;
 
   cache->pc = frame_func_unwind (next_frame);
   if (cache->pc != 0)
>>

That causes the problem above to disappear. I even went as far as
to run the testsuite: No change (apart from fixing the regression
I observed).

sparc_frame_cache() seems well designed to handle null %fp registers.
It doesn't use its value when scanning the prologue, and then knows
to use %sp in its place as the frame base. So the frame should be
correctly unwound.

Comments?
-- 
Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09         ` Joel Brobecker
  2004-03-01 19:48           ` Joel Brobecker
@ 2004-03-19  0:09           ` Joel Brobecker
  2004-03-01 23:52             ` Joel Brobecker
  2004-03-02  6:16             ` Joel Brobecker
  1 sibling, 2 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-19  0:09 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Elena Zannoni, gdb-patches

Ok, I'm dark red with shame.

> Yes, there is a regression (regression #2 in my previous message) if
> we leave that test out. It's been a while since I posted that patch,
> so I don't remember the details anymore :-/. I'll dig again later today.

It looks like my initial analysis of the cause of regression #2 was
wrong. I will simply say that I was mislead by the fact that GDB
reported that we stopped in "_PROCEDURE_LINKAGE_TABLE_" while we
were actually inside function "exit", and let it go at that.

Here is a description of the regression which appears if we don't
include the following test:

+      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
+        {
+          /* We landed in a shared library call trampoline, so it
+             is a subroutine call.  */
+          handle_step_into_function (ecs);
+          return;
+        }

The problem occurs with ending-run:

        % gdb ending-run
        (gdb) b ending-run.c:33
        Breakpoint 1 at 0x10828: file gdb.base/ending-run.c, line 33.
        (gdb) run
        Starting program: /[...]/gdb.base/ending-run 
        -1 2 7 14 23 34 47 62 79  Goodbye!

        Breakpoint 1, main () at gdb.base/ending-run.c:33
        33      }
        (gdb) n
        0x000105ec in _start ()
        (gdb) n
        Single stepping until exit from function _start, 
        which has no line number information.
        0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()

The expected behavior for the last "next" command is for GDB
to run until the inferior exits:

        (gdb) n
        Single stepping until exit from function _start, 
        which has no line number information.

        Program exited normally.

Unfortunately, here is what happens. At 0x000105ec, before we do
our second "next" command, we are about to execute the following
code:

        0x000105ec <_start+100>:        call  0x20950 <exit>
        0x000105f0 <_start+104>:        nop 

After two iterations (one for the call insn, and one for the delay
slot), GDB lands at the begining of function "exit" at 0x00020950,
which is:

        0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
        0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
        0x00020958 <exit+8>:    nop 

So at this point, the registers window has not been rotated.
I don't know if this is the cause for this problem, but at this
point GDB is unable to unwind the call stack:

        (gdb) bt
        #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()

(And gets the wrong procedure name as well, but that's a separate
issue - although "x /i" does report what I believe is the correct
name, strange!).

I am looking into the sparc unwinder code right now, to try to
understand a bit better the source of the problem.

-- 
Joel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-19  0:09               ` Andrew Cagney
  2004-03-02 15:48                 ` Andrew Cagney
@ 2004-03-19  0:09                 ` Joel Brobecker
  2004-03-02 22:07                   ` Joel Brobecker
  2004-03-06  0:15                   ` Andrew Cagney
  1 sibling, 2 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-19  0:09 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: kettenis, Elena Zannoni, gdb-patches

> How about this as a compromize:
> 
> - in 6.1
> your original patch (but with a comment saying that the
> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> is a hack and shouldn't be included in the mainline)
> 
> - in mainline:
> Assuming Mark's ok with the sparc changes, your patch without that part?
> 
> This way 6.1 is robust regardless of which SPARC architecture code is in 
> place.

it looks reasonable, but it'd be nice to have Mark's opinion on the
unwinder problem. How about we let it sit for another day or two before
we commit anything?

Also, do we need this change in 6.1? It seemed like a pretty minor bug
that I managed to produce on Tru64 only. Or were there other occurrences
of this bug?

-- 
Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-02-27 15:23       ` Andrew Cagney
@ 2004-03-19  0:09         ` Joel Brobecker
  2004-03-01 19:48           ` Joel Brobecker
  2004-03-19  0:09           ` Joel Brobecker
  0 siblings, 2 replies; 23+ messages in thread
From: Joel Brobecker @ 2004-03-19  0:09 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Elena Zannoni, gdb-patches

> >>> > +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
> >>> > +        {
> >>> > +          /* We landed in a shared library call trampoline, so it
> >>> > +             is a subroutine call.  */
> >>> > +          handle_step_into_function (ecs);
> >>> > +          return;
> >>> > +        }
> >>>
> >>>
> >>>I am not sure I understand why the case above is not covered by the
> >>>test below.  Is this to fix regression #1? I.e multiple frames? 
> >
> >
> >Hmmm, good question...
> >
> >Although it does fix regression #2, it is not the only reason why we
> >added this test. It was also based on logic (see "After ... here is
> >what we found", in my previous message).
> >
> >I should admit that in the case at hand (regression #2), the unwinder
> >is having a hard time unwinding out of this code, and causes the
> >frame ID check to fail. I don't remember seeing several levels of
> >function call.
> >
> >However, I still thought that this test was necessary because we could
> >just as well have reached this trampoline one or more levels of function
> >call down, just as we end up stepping into undebuggable code in
> >regression #1.
> 
> I'd not noticed this issue.  Hmm, if GDB's single stepping then the 
> second test should cover this case.  It's when GDB is free running that 
> we might find ourselves stopped IN_SOLIB_CALL_TRAMPOLINE.  If it is the 
> latter case then I'm not sure that silently single stepping away from 
> where the program stopped is being helpful.
> 
> Can you try the testsuite without that check?  If the results are ok 
> then (with other changes) commit it.  If its not we need to re-think 
> whats happening :-(  Yes, this will mean it goes into 6.1.

Yes, there is a regression (regression #2 in my previous message) if
we leave that test out. It's been a while since I posted that patch,
so I don't remember the details anymore :-/. I'll dig again later today.

I did rerun the testsuite without it to double-check that my
recollection was right. And I also ran the testsuite with the frame_id
patch you recently posted, hoping that it might solve the extra
regression. Unfortunately, I am sorry to report that it actually
introduced another 5 or 6 regressions...

On the other hand, as soon as I add the check back, we're down to
zero regression (that is, with your frame_id patch as well).

More details later today about the failing test when the test above
is removed.

-- 
Joel


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-03 21:12               ` Mark Kettenis
@ 2004-03-19  0:09                 ` Mark Kettenis
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Kettenis @ 2004-03-19  0:09 UTC (permalink / raw)
  To: brobecker; +Cc: cagney, ezannoni, gdb-patches

   Date: Mon, 1 Mar 2004 22:16:42 -0800
   From: Joel Brobecker <brobecker@gnat.com>

   > The expected behavior for the last "next" command is for GDB
   > to run until the inferior exits:
   > 
   >         (gdb) n
   >         Single stepping until exit from function _start, 
   >         which has no line number information.
   >         
   >         Program exited normally.
   > 
   > Unfortunately, here is what happens. At 0x000105ec, before we do
   > our second "next" command, we are about to execute the following
   > code:
   > 
   >         0x000105ec <_start+100>:        call  0x20950 <exit>
   >         0x000105f0 <_start+104>:        nop 
   > 
   > After two iterations (one for the call insn, and one for the delay
   > slot), GDB lands at the begining of function "exit" at 0x00020950,
   > which is:
   > 
   >         0x00020950 <exit+0>:    sethi  %hi(0xf000), %g1
   >         0x00020954 <exit+4>:    b,a   0x20914 <_PROCEDURE_LINKAGE_TABLE_>
   >         0x00020958 <exit+8>:    nop 
   > 
   > So at this point, the registers window has not been rotated.
   > I don't know if this is the cause for this problem, but at this
   > point GDB is unable to unwind the call stack:
   > 
   >         (gdb) bt
   >         #0  0x00020950 in _PROCEDURE_LINKAGE_TABLE_ ()
   > 
   > (And gets the wrong procedure name as well, but that's a separate
   > issue - although "x /i" does report what I believe is the correct
   > name, strange!).

Actually, it doesn't get the name wrong.  The function "exit" lives in
a shared library.  Since this is the first time you're calling exit(),
you don't end up in the function itself, but in the PLT entry for
"exit".  This PLT entry will invoke the dynamic linker to direct the
program to the real "exit" function [1].  The PLT entry function is
part of the Procedure Linkage Table (PLT) which is usally named
_PROCEDURE_LINKAGE_TABLE.

   > I am looking into the sparc unwinder code right now, to try to
   > understand a bit better the source of the problem.

   I think I found the source of the glitch. I may have the solution
   to fix it, but my little finger is telling that it might be a bit
   too extreme... Maybe MarkK has some comments about this?

It's not "extreme" at all.  Read on...

   What happens is that, at the point when we reach function "exit",
   the FP register is null:

	   (gdb) p /x $fp
	   $2 = 0x0

   The sparc unwinder in sparc_frame_cache() detects this, thinks there is
   something wrong, and aborts early. So, we never unwind the "_start"
   frame, and hence the following frame ID check doesn't notice the
   function call, as it should have in this case:

   +      if (frame_id_eq (get_frame_id (get_prev_frame (get_current_frame ())),
   +                       step_frame_id))
   +        {
   +          /* It's a subroutine call.  */
   +          handle_step_into_function (ecs);
   +          return;
   +        }

   With this example in mind, it seemed to me that the assertion that %fp
   register is not null is unfortunately incorrect. Given that the rest of
   the code in sparc_frame_cache() wasn't using the value of that register,
   I commented out the assertion, and retried.

OK.  Here %fp == 0 is marking the outermost frame.  So what we're
having is that we're effectively calling a frameless function (the PLT
entry) from the outermost frame.  Since this function is frameless, we
shouldn't be looking at %fp, but at %sp.  However, we bail out before
we do so.  So the check is bogus!  

I'm currently testing the attached patch.  If I see no regressions,
I'll check it in.  Is it needed on the branch too?

Mark

[1] It'll also patch things up such that any future calls to exit()
will go directly to the real function.  Pointless of course for
exit(), but this mechanism is used for all calls to functions living
in a shared library.

Index: ChangeLog
from  Mark Kettenis  <kettenis@gnu.org>

	* sparc-tdep.c (sparc_frame_cache): Don't bail out if %fp is zero.
	Reorganize code a bit.

Index: sparc-tdep.c
===================================================================
RCS file: /cvs/src/src/gdb/sparc-tdep.c,v
retrieving revision 1.149
diff -u -p -r1.149 sparc-tdep.c
--- sparc-tdep.c 7 Feb 2004 20:40:35 -0000 1.149
+++ sparc-tdep.c 3 Mar 2004 21:11:29 -0000
@@ -615,14 +615,6 @@ sparc_frame_cache (struct frame_info *ne
   cache = sparc_alloc_frame_cache ();
   *this_cache = cache;

-  /* In priciple, for normal frames, %fp (%i6) holds the frame
-     pointer, which holds the base address for the current stack
-     frame.  */
-
-  cache->base = frame_unwind_register_unsigned (next_frame, SPARC_FP_REGNUM);
-  if (cache->base == 0)
-    return cache;
-
   cache->pc = frame_func_unwind (next_frame);
   if (cache->pc != 0)
     {
@@ -632,10 +624,18 @@ sparc_frame_cache (struct frame_info *ne

   if (cache->frameless_p)
     {
-      /* We didn't find a valid frame, which means that CACHE->base
-	 currently holds the frame pointer for our calling frame.  */
-      cache->base = frame_unwind_register_unsigned (next_frame,
-						    SPARC_SP_REGNUM);
+      /* This function is frameless, so %fp (%i6) holds the frame
+         pointer for our calling frame.  Use %sp (%o6) as this frame's
+         base address.  */
+      cache->base =
+	frame_unwind_register_unsigned (next_frame, SPARC_SP_REGNUM);
+    }
+  else
+    {
+      /* For normal frames, %fp (%i6) holds the frame pointer, the
+         base address for the current stack frame.  */
+      cache->base =
+	frame_unwind_register_unsigned (next_frame, SPARC_FP_REGNUM);
     }

   return cache;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFA] use frame IDs to detect function calls while stepping
  2004-03-06  0:15                   ` Andrew Cagney
@ 2004-03-19  0:09                     ` Andrew Cagney
  0 siblings, 0 replies; 23+ messages in thread
From: Andrew Cagney @ 2004-03-19  0:09 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: kettenis, Elena Zannoni, gdb-patches

>>How about this as a compromize:
>>> 
>>> - in 6.1
>>> your original patch (but with a comment saying that the
>>> +      if (IN_SOLIB_CALL_TRAMPOLINE (stop_pc, ecs->stop_func_name))
>>> is a hack and shouldn't be included in the mainline)
>>> 
>>> - in mainline:
>>> Assuming Mark's ok with the sparc changes, your patch without that part?
>>> 
>>> This way 6.1 is robust regardless of which SPARC architecture code is in 
>>> place.
> 
> 
> it looks reasonable, but it'd be nice to have Mark's opinion on the
> unwinder problem. How about we let it sit for another day or two before
> we commit anything?
> 
> Also, do we need this change in 6.1? It seemed like a pretty minor bug
> that I managed to produce on Tru64 only. Or were there other occurrences
> of this bug?

So mainline only it is .... (For safety, I was testing my frame 
unwinders with this applied).

Andrew



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2004-03-06  0:15 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-02-05  4:41 [RFA] use frame IDs to detect function calls while stepping Joel Brobecker
2004-02-05 17:13 ` Joel Brobecker
2004-02-05 18:54   ` Elena Zannoni
2004-02-07  4:01     ` Joel Brobecker
2004-02-27 15:23       ` Andrew Cagney
2004-03-19  0:09         ` Joel Brobecker
2004-03-01 19:48           ` Joel Brobecker
2004-03-19  0:09           ` Joel Brobecker
2004-03-01 23:52             ` Joel Brobecker
2004-03-02  6:16             ` Joel Brobecker
2004-03-03 21:12               ` Mark Kettenis
2004-03-19  0:09                 ` Mark Kettenis
2004-03-19  0:09               ` Andrew Cagney
2004-03-02 15:48                 ` Andrew Cagney
2004-03-19  0:09                 ` Joel Brobecker
2004-03-02 22:07                   ` Joel Brobecker
2004-03-06  0:15                   ` Andrew Cagney
2004-03-19  0:09                     ` Andrew Cagney
2004-03-19  0:09               ` Joel Brobecker
2004-02-05 19:01   ` Daniel Jacobowitz
2004-02-05 19:23     ` Elena Zannoni
2004-02-05 19:49       ` Daniel Jacobowitz
2004-02-09 19:21         ` Andrew Cagney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox