Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
* [RFA] Tru64 stack unwinding fix
@ 2002-06-14 12:12 Joel Brobecker
  2002-06-14 12:22 ` Joel Brobecker
  2002-06-18 16:52 ` Andrew Cagney
  0 siblings, 2 replies; 4+ messages in thread
From: Joel Brobecker @ 2002-06-14 12:12 UTC (permalink / raw)
  To: gdb-patches

[-- Attachment #1: Type: text/plain, Size: 7062 bytes --]

This problem happens on both Tru64 4.0f and 5.1/5.1A.

To reproduce the problem, you can use any Ada program using tasks,
and try to perform a task switch:
<<
(gdb) info tasks
  ID       TID P-ID Pri Stack  % State                  Name
   1 140025400    0  30  Unknown Child Termination Wait main_task
*  2 140049000    1  30   48K 10 Accepting RV with 3    maitre_d
   3 140049c00    1  30   48K  7 Waiting on entry call  dijkstra
   4 140050000    1  30   48K  7 Waiting on entry call  stroustrup
   5 140054800    1  30   48K  7 Waiting on entry call  anderson
   6 140055400    1  30   48K  7 Runnable               ichbiah
   7 140061000    1  30   48K  7 Waiting on entry call  taft
(gdb) task 4
warning: Hit heuristic-fence-post without finding         <<<---  OOPS!
warning: enclosing function for address 0x1400433e0
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable).  In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
[Switching to task 4]
#0  0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
>>

The real problem is not the task switching operation per se, but
rather the fact that GDB fails to unwind the stack correctly. This
problem also appears if we try to use "bt" after the task switch:
<<
(gdb) bt
#0  0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1  0x3ff8056e8e4 in __osTransferContext () from /usr/shlib/libpthread.so
#2  0x3ff80560c30 in __dspDispatch () from /usr/shlib/libpthread.so
warning: Hit heuristic-fence-post without finding     <<<---   Same oops!
warning: enclosing function for address 0x1400433e0
>>

I'm sorry the test case involves Ada, but that is the only way
I found to reproduce it at the time when I looked at this problem.

The problem happens on all non-active tasks, but otherwise GDB works
fine with the active one. So why does it always fail on the non-active
tasks, and no in the active one? The answer lies in the fact that all
non-active threads call the same procedure to perform a context switch,
and end up being stopped at the same code location, somewhere inside
__hstTransferRegisters. This means that the the stack trace for all
non-current tasks always starts like this:

 #0  0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
 #1  0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
 #2  0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
 #3  0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so

GDB is able to unwind up to __dspTransferContext, and then always fails.
I found that on alpha, GDB uses a heuristic algorithm to compute the
stack (that is actually not surprising given the warning message): For
each frame, it actually computes the "proc_desc" by reading the
instructions in the function prologue.

I later found that GDB was computing an incorrect return address for 
frame #2. Using this incorrect return address, GDB naturally could not
locate the function which called __dspTransferContext, and therefore
reported an error. The answer to this failure lied in the
computation of the proc_desc of frame #1, but we need to look at its
code to understand why:

0x3ff805af250 <__osTransferContext>:    ldah    gp,16321(t12)
0x3ff805af254 <__osTransferContext+4>:  unop
0x3ff805af258 <__osTransferContext+8>:  lda     gp,-3312(gp)
0x3ff805af25c <__osTransferContext+12>: unop
0x3ff805af260 <__osTransferContext+16>: lda     sp,-64(sp)
0x3ff805af264 <__osTransferContext+20>: stq     ra,0(sp)
0x3ff805af268 <__osTransferContext+24>: stq     s0,8(sp)
0x3ff805af26c <__osTransferContext+28>: stq     s1,16(sp)
0x3ff805af270 <__osTransferContext+32>: stq     s2,24(sp)
0x3ff805af274 <__osTransferContext+36>: stq     s3,32(sp)
0x3ff805af278 <__osTransferContext+40>: stq     s4,40(sp)
0x3ff805af27c <__osTransferContext+44>: unop
0x3ff805af280 <__osTransferContext+48>: stq     fp,48(sp)
0x3ff805af284 <__osTransferContext+52>: mov     sp,fp
[...]
0x3ff805af304 <__osTransferContext+180>:        lda     sp,-336(sp)
[...]

As we can see with the lda instruction at +16, the frame size is 64
bytes.  And instruction at +20 shows that the return address is saved
at 0(sp). At first, GDB gets the correct frame size, and finds that
ra is at 0(sp) and therefore computes the adress where ra is saved by
adding 0 to the current value of sp. This is where things start to go
wrong.

I have also pasted one very important instruction at +180, which expands
the size of the current frame (typical of a frame which does some
alloca)! So 0(sp) does not point to the saved $ra anymore, Ouch!
Fortunately, the instruction at +52 shows that $sp has been saved into
$fp, so $ra (and all other saved registers) can be retrieved using $fp
as the base address instead of $sp. This is the main point of my change:
when the $fp register is used in the frame, then use it, rather than
using $sp.

Another error in the current code (somewhat related really) is that the
heuristic algorithm was accumulating all stack allocations detected in
the function being inspected. In our case, it found two "lda sp,
nnn(sp)" instructions, one at +16, and one at +180, and therefore
considered the size of the frame to be 64+336=400 bytes. Oups. I also
fixed the code to only take into account the first stack allocation to
get the frame size.

With both fixes, the backtrace started working in this case as well:
#0  0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1  0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
#2  0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
#3  0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so
#4  0x3ff805a01f4 in __cvWaitPrim () from /usr/shlib/libpthread.so
#5  0x3ff8059da1c in __pthread_cond_wait () from /usr/shlib/libpthread.so
#6  0x12003dedc in system.tasking.entry_calls.wait_for_completion ()
    at s-taenca.adb:6
#7  0x120047b10 in system.tasking.rendezvous.call_synchronous ()
    at s-tasren.adb:6
#8  0x120043acc in system.tasking.rendezvous.call_simple () at s-tasren.adb:6
#9  0x12002f824 in phil.philosopher (<_task>=0x140042ee0) at phil.adb:49
#10 0x120049b40 in system.tasking.stages.task_wrapper () at s-tassta.adb:6
#11 0x3ff805bca7c in __thdBase () from /usr/shlib/libpthread.so

Joy!

Here is the ChangeLog:
2002-06-14  Joel Brobecker  <brobecker@gnat.com>
        * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
        current frame using only the first stack size adjustment. All
        subsequent size adjustments are not considered to be part of
        the "static" part of the current frame.
        Compute the address of the saved registers relative to the
        Frame Pointer ($fp) instead of the Stack Pointer if $fp is
        in use in this frame.

Ok to commit?

Thanks,
-- 
Joel

[-- Attachment #2: alpha-tdep.c.diff --]
[-- Type: text/plain, Size: 4470 bytes --]

Index: alpha-tdep.c
===================================================================
RCS file: /cvs/src/src/gdb/alpha-tdep.c,v
retrieving revision 1.34
diff -c -3 -p -r1.34 alpha-tdep.c
*** alpha-tdep.c	21 May 2002 15:36:02 -0000	1.34
--- alpha-tdep.c	14 Jun 2002 19:01:15 -0000
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 649,659 ****
--- 649,661 ----
  		     struct frame_info *next_frame)
  {
    CORE_ADDR sp = read_next_frame_reg (next_frame, SP_REGNUM);
+   CORE_ADDR vfp = sp;
    CORE_ADDR cur_pc;
    int frame_size;
    int has_frame_reg = 0;
    unsigned long reg_mask = 0;
    int pcreg = -1;
+   int regno;
  
    if (start_pc == 0)
      return NULL;
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 678,684 ****
        if ((word & 0xffff0000) == 0x23de0000)	/* lda $sp,n($sp) */
  	{
  	  if (word & 0x8000)
! 	    frame_size += (-word) & 0xffff;
  	  else
  	    /* Exit loop if a positive stack adjustment is found, which
  	       usually means that the stack cleanup code in the function
--- 680,691 ----
        if ((word & 0xffff0000) == 0x23de0000)	/* lda $sp,n($sp) */
  	{
  	  if (word & 0x8000)
!           {
!             /* Consider only the first stack allocation instruction
!                to contain the static size of the frame. */
!             if (frame_size == 0)
! 	        frame_size += (-word) & 0xffff;
!           }
  	  else
  	    /* Exit loop if a positive stack adjustment is found, which
  	       usually means that the stack cleanup code in the function
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 690,696 ****
  	{
  	  int reg = (word & 0x03e00000) >> 21;
  	  reg_mask |= 1 << reg;
! 	  temp_saved_regs[reg] = sp + (short) word;
  
  	  /* Starting with OSF/1-3.2C, the system libraries are shipped
  	     without local symbols, but they still contain procedure
--- 697,712 ----
  	{
  	  int reg = (word & 0x03e00000) >> 21;
  	  reg_mask |= 1 << reg;
! 
!           /* Do not compute the address where the register was saved yet,
!              because we don't know yet if the offset will need to be
!              relative to $sp or $fp (we can not compute the address relative
!              to $sp if $sp is updated during the execution of the current
!              subroutine, for instance when doing some alloca). So just store
!              the offset for the moment, and compute the address later
!              when we know whether this frame has a frame pointer or not.
!            */
!           temp_saved_regs[reg] = (short) word;
  
  	  /* Starting with OSF/1-3.2C, the system libraries are shipped
  	     without local symbols, but they still contain procedure
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 719,726 ****
  	}
        else if ((word & 0xffe0ffff) == 0x6be08001)	/* ret zero,reg,1 */
  	pcreg = (word >> 16) & 0x1f;
!       else if (word == 0x47de040f)	/* bis sp,sp fp */
! 	  has_frame_reg = 1;
      }
    if (pcreg == -1)
      {
--- 735,749 ----
  	}
        else if ((word & 0xffe0ffff) == 0x6be08001)	/* ret zero,reg,1 */
  	pcreg = (word >> 16) & 0x1f;
!       else if (word == 0x47de040f || word == 0x47fe040f) /* bis sp,sp fp */
!         {
!           /* ??? I am not sure what instruction is 0x47fe040f, and I
!              am suspecting that there was a typo and should have been
!              0x47fe040f. I'm keeping it in the test above until further
!              investigation */
! 	    has_frame_reg = 1;
!           vfp = read_next_frame_reg (next_frame, ALPHA_GCC_FP_REGNUM);
!         }
      }
    if (pcreg == -1)
      {
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 759,764 ****
--- 782,799 ----
      PROC_FRAME_REG (&temp_proc_desc) = ALPHA_GCC_FP_REGNUM;
    else
      PROC_FRAME_REG (&temp_proc_desc) = SP_REGNUM;
+ 
+   /* At this point, we know which of the Stack Pointer of the Frame Pointer
+      to use as the reference address to compute the saved registers address.
+      But in both cases, the processing above has set vfp to this reference
+      address, so just need to increment the offset of each saved register
+      by this address. */
+   for (regno = 0; regno < NUM_REGS; regno++)
+     {
+       if (reg_mask & 1 << regno)
+ 	temp_saved_regs[regno] += vfp;
+     }
+ 
    PROC_FRAME_OFFSET (&temp_proc_desc) = frame_size;
    PROC_REG_MASK (&temp_proc_desc) = reg_mask;
    PROC_PC_REG (&temp_proc_desc) = (pcreg == -1) ? ALPHA_RA_REGNUM : pcreg;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFA] Tru64 stack unwinding fix
  2002-06-14 12:12 [RFA] Tru64 stack unwinding fix Joel Brobecker
@ 2002-06-14 12:22 ` Joel Brobecker
  2002-06-18 16:52 ` Andrew Cagney
  1 sibling, 0 replies; 4+ messages in thread
From: Joel Brobecker @ 2002-06-14 12:22 UTC (permalink / raw)
  To: gdb-patches

> Here is the ChangeLog:
> 2002-06-14  Joel Brobecker  <brobecker@gnat.com>
>         * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
>         current frame using only the first stack size adjustment. All
>         subsequent size adjustments are not considered to be part of
>         the "static" part of the current frame.
>         Compute the address of the saved registers relative to the
>         Frame Pointer ($fp) instead of the Stack Pointer if $fp is
>         in use in this frame.

I forgot to say: no regression introduced by this change.

-- 
Joel


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFA] Tru64 stack unwinding fix
  2002-06-14 12:12 [RFA] Tru64 stack unwinding fix Joel Brobecker
  2002-06-14 12:22 ` Joel Brobecker
@ 2002-06-18 16:52 ` Andrew Cagney
  2002-06-18 18:20   ` Joel Brobecker
  1 sibling, 1 reply; 4+ messages in thread
From: Andrew Cagney @ 2002-06-18 16:52 UTC (permalink / raw)
  To: Joel Brobecker; +Cc: gdb-patches


> I'm sorry the test case involves Ada, but that is the only way
> I found to reproduce it at the time when I looked at this problem.

I suspect a non-ada aware GDB and a breakpoint on 
*__hstTransferRegistersPC would do the trick :-)


> Here is the ChangeLog:
> 2002-06-14  Joel Brobecker  <brobecker@gnat.com>
>         * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
>         current frame using only the first stack size adjustment. All
>         subsequent size adjustments are not considered to be part of
>         the "static" part of the current frame.
>         Compute the address of the saved registers relative to the
>         Frame Pointer ($fp) instead of the Stack Pointer if $fp is
>         in use in this frame.
> 
> Ok to commit?

Yes (you're in a better position then most to test it).  The change to 
handle FP and alloca() is very important and something often missed.

 > +   /* At this point, we know which of the Stack Pointer of the Frame 
Pointer

  .. or the ..

enjoy,
Andrew

PS: Some things unrelated to your patch I noticed:

> !           vfp = read_next_frame_reg (next_frame, ALPHA_GCC_FP_REGNUM);

I think someone should re-implement read_next_frame_reg() using:

/* Unwind the stack frame so that the value of REGNUM, in the previous
    frame is returned.  If VALUEP is NULL, don't fetch/compute the
    value.  Instead just return the location of the value.  */

extern void frame_register_unwind (struct frame_info *frame, int regnum,
                                    int *optimizedp, enum lval_type *lvalp,
                                    CORE_ADDR *addrp, int *realnump,
                                    void *valuep);

as this will work with CFI and generic dummy frames.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFA] Tru64 stack unwinding fix
  2002-06-18 16:52 ` Andrew Cagney
@ 2002-06-18 18:20   ` Joel Brobecker
  0 siblings, 0 replies; 4+ messages in thread
From: Joel Brobecker @ 2002-06-18 18:20 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: gdb-patches

> > Ok to commit?
> 
> Yes (you're in a better position then most to test it).  The change to 
> handle FP and alloca() is very important and something often missed.

Thanks. This is now in (I also corrected the little typo you found
in one of the comments).

> I suspect a non-ada aware GDB and a breakpoint on 
> *__hstTransferRegistersPC would do the trick :-)

Yes it did, silly me :-).

Just for the record, here is the output before my patch:
<<
(gdb) b __hstTransferRegistersPC
Breakpoint 2 at 0x3ff8057d43c
(gdb) c
Continuing.

Breakpoint 2, 0x000003ff8057d43c in __hstTransferRegistersPC ()
   from /usr/shlib/libpthread.so
(gdb) bt
#0  0x000003ff8057d43c in __hstTransferRegistersPC ()
   from /usr/shlib/libpthread.so
#1  0x000003ff8056e8e4 in __osTransferContext () from /usr/shlib/libpthread.so
warning: Hit beginning of text section without finding
warning: enclosing function for address 0x13
>>

> I think someone should re-implement read_next_frame_reg() using:
> 
> /* Unwind the stack frame so that the value of REGNUM, in the previous
>     frame is returned.  If VALUEP is NULL, don't fetch/compute the
>     value.  Instead just return the location of the value.  */
> 
> extern void frame_register_unwind (struct frame_info *frame, int regnum,
>                                     int *optimizedp, enum lval_type *lvalp,
>                                     CORE_ADDR *addrp, int *realnump,
>                                     void *valuep);
> 
> as this will work with CFI and generic dummy frames.

Ack, I'll try to do this when I have a spare moment...

Thanks for the quick review.
-- 
Joel


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2002-06-19  1:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-06-14 12:12 [RFA] Tru64 stack unwinding fix Joel Brobecker
2002-06-14 12:22 ` Joel Brobecker
2002-06-18 16:52 ` Andrew Cagney
2002-06-18 18:20   ` Joel Brobecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox