* [RFA] Tru64 stack unwinding fix
@ 2002-06-14 12:12 Joel Brobecker
2002-06-14 12:22 ` Joel Brobecker
2002-06-18 16:52 ` Andrew Cagney
0 siblings, 2 replies; 4+ messages in thread
From: Joel Brobecker @ 2002-06-14 12:12 UTC (permalink / raw)
To: gdb-patches
[-- Attachment #1: Type: text/plain, Size: 7062 bytes --]
This problem happens on both Tru64 4.0f and 5.1/5.1A.
To reproduce the problem, you can use any Ada program using tasks,
and try to perform a task switch:
<<
(gdb) info tasks
ID TID P-ID Pri Stack % State Name
1 140025400 0 30 Unknown Child Termination Wait main_task
* 2 140049000 1 30 48K 10 Accepting RV with 3 maitre_d
3 140049c00 1 30 48K 7 Waiting on entry call dijkstra
4 140050000 1 30 48K 7 Waiting on entry call stroustrup
5 140054800 1 30 48K 7 Waiting on entry call anderson
6 140055400 1 30 48K 7 Runnable ichbiah
7 140061000 1 30 48K 7 Waiting on entry call taft
(gdb) task 4
warning: Hit heuristic-fence-post without finding <<<--- OOPS!
warning: enclosing function for address 0x1400433e0
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable). In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.
Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
[Switching to task 4]
#0 0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
>>
The real problem is not the task switching operation per se, but
rather the fact that GDB fails to unwind the stack correctly. This
problem also appears if we try to use "bt" after the task switch:
<<
(gdb) bt
#0 0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1 0x3ff8056e8e4 in __osTransferContext () from /usr/shlib/libpthread.so
#2 0x3ff80560c30 in __dspDispatch () from /usr/shlib/libpthread.so
warning: Hit heuristic-fence-post without finding <<<--- Same oops!
warning: enclosing function for address 0x1400433e0
>>
I'm sorry the test case involves Ada, but that is the only way
I found to reproduce it at the time when I looked at this problem.
The problem happens on all non-active tasks, but otherwise GDB works
fine with the active one. So why does it always fail on the non-active
tasks, and no in the active one? The answer lies in the fact that all
non-active threads call the same procedure to perform a context switch,
and end up being stopped at the same code location, somewhere inside
__hstTransferRegisters. This means that the the stack trace for all
non-current tasks always starts like this:
#0 0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1 0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
#2 0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
#3 0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so
GDB is able to unwind up to __dspTransferContext, and then always fails.
I found that on alpha, GDB uses a heuristic algorithm to compute the
stack (that is actually not surprising given the warning message): For
each frame, it actually computes the "proc_desc" by reading the
instructions in the function prologue.
I later found that GDB was computing an incorrect return address for
frame #2. Using this incorrect return address, GDB naturally could not
locate the function which called __dspTransferContext, and therefore
reported an error. The answer to this failure lied in the
computation of the proc_desc of frame #1, but we need to look at its
code to understand why:
0x3ff805af250 <__osTransferContext>: ldah gp,16321(t12)
0x3ff805af254 <__osTransferContext+4>: unop
0x3ff805af258 <__osTransferContext+8>: lda gp,-3312(gp)
0x3ff805af25c <__osTransferContext+12>: unop
0x3ff805af260 <__osTransferContext+16>: lda sp,-64(sp)
0x3ff805af264 <__osTransferContext+20>: stq ra,0(sp)
0x3ff805af268 <__osTransferContext+24>: stq s0,8(sp)
0x3ff805af26c <__osTransferContext+28>: stq s1,16(sp)
0x3ff805af270 <__osTransferContext+32>: stq s2,24(sp)
0x3ff805af274 <__osTransferContext+36>: stq s3,32(sp)
0x3ff805af278 <__osTransferContext+40>: stq s4,40(sp)
0x3ff805af27c <__osTransferContext+44>: unop
0x3ff805af280 <__osTransferContext+48>: stq fp,48(sp)
0x3ff805af284 <__osTransferContext+52>: mov sp,fp
[...]
0x3ff805af304 <__osTransferContext+180>: lda sp,-336(sp)
[...]
As we can see with the lda instruction at +16, the frame size is 64
bytes. And instruction at +20 shows that the return address is saved
at 0(sp). At first, GDB gets the correct frame size, and finds that
ra is at 0(sp) and therefore computes the adress where ra is saved by
adding 0 to the current value of sp. This is where things start to go
wrong.
I have also pasted one very important instruction at +180, which expands
the size of the current frame (typical of a frame which does some
alloca)! So 0(sp) does not point to the saved $ra anymore, Ouch!
Fortunately, the instruction at +52 shows that $sp has been saved into
$fp, so $ra (and all other saved registers) can be retrieved using $fp
as the base address instead of $sp. This is the main point of my change:
when the $fp register is used in the frame, then use it, rather than
using $sp.
Another error in the current code (somewhat related really) is that the
heuristic algorithm was accumulating all stack allocations detected in
the function being inspected. In our case, it found two "lda sp,
nnn(sp)" instructions, one at +16, and one at +180, and therefore
considered the size of the frame to be 64+336=400 bytes. Oups. I also
fixed the code to only take into account the first stack allocation to
get the frame size.
With both fixes, the backtrace started working in this case as well:
#0 0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1 0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
#2 0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
#3 0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so
#4 0x3ff805a01f4 in __cvWaitPrim () from /usr/shlib/libpthread.so
#5 0x3ff8059da1c in __pthread_cond_wait () from /usr/shlib/libpthread.so
#6 0x12003dedc in system.tasking.entry_calls.wait_for_completion ()
at s-taenca.adb:6
#7 0x120047b10 in system.tasking.rendezvous.call_synchronous ()
at s-tasren.adb:6
#8 0x120043acc in system.tasking.rendezvous.call_simple () at s-tasren.adb:6
#9 0x12002f824 in phil.philosopher (<_task>=0x140042ee0) at phil.adb:49
#10 0x120049b40 in system.tasking.stages.task_wrapper () at s-tassta.adb:6
#11 0x3ff805bca7c in __thdBase () from /usr/shlib/libpthread.so
Joy!
Here is the ChangeLog:
2002-06-14 Joel Brobecker <brobecker@gnat.com>
* alpha-tdep.c (heuristic_proc_desc): Compute the size of the
current frame using only the first stack size adjustment. All
subsequent size adjustments are not considered to be part of
the "static" part of the current frame.
Compute the address of the saved registers relative to the
Frame Pointer ($fp) instead of the Stack Pointer if $fp is
in use in this frame.
Ok to commit?
Thanks,
--
Joel
[-- Attachment #2: alpha-tdep.c.diff --]
[-- Type: text/plain, Size: 4470 bytes --]
Index: alpha-tdep.c
===================================================================
RCS file: /cvs/src/src/gdb/alpha-tdep.c,v
retrieving revision 1.34
diff -c -3 -p -r1.34 alpha-tdep.c
*** alpha-tdep.c 21 May 2002 15:36:02 -0000 1.34
--- alpha-tdep.c 14 Jun 2002 19:01:15 -0000
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 649,659 ****
--- 649,661 ----
struct frame_info *next_frame)
{
CORE_ADDR sp = read_next_frame_reg (next_frame, SP_REGNUM);
+ CORE_ADDR vfp = sp;
CORE_ADDR cur_pc;
int frame_size;
int has_frame_reg = 0;
unsigned long reg_mask = 0;
int pcreg = -1;
+ int regno;
if (start_pc == 0)
return NULL;
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 678,684 ****
if ((word & 0xffff0000) == 0x23de0000) /* lda $sp,n($sp) */
{
if (word & 0x8000)
! frame_size += (-word) & 0xffff;
else
/* Exit loop if a positive stack adjustment is found, which
usually means that the stack cleanup code in the function
--- 680,691 ----
if ((word & 0xffff0000) == 0x23de0000) /* lda $sp,n($sp) */
{
if (word & 0x8000)
! {
! /* Consider only the first stack allocation instruction
! to contain the static size of the frame. */
! if (frame_size == 0)
! frame_size += (-word) & 0xffff;
! }
else
/* Exit loop if a positive stack adjustment is found, which
usually means that the stack cleanup code in the function
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 690,696 ****
{
int reg = (word & 0x03e00000) >> 21;
reg_mask |= 1 << reg;
! temp_saved_regs[reg] = sp + (short) word;
/* Starting with OSF/1-3.2C, the system libraries are shipped
without local symbols, but they still contain procedure
--- 697,712 ----
{
int reg = (word & 0x03e00000) >> 21;
reg_mask |= 1 << reg;
!
! /* Do not compute the address where the register was saved yet,
! because we don't know yet if the offset will need to be
! relative to $sp or $fp (we can not compute the address relative
! to $sp if $sp is updated during the execution of the current
! subroutine, for instance when doing some alloca). So just store
! the offset for the moment, and compute the address later
! when we know whether this frame has a frame pointer or not.
! */
! temp_saved_regs[reg] = (short) word;
/* Starting with OSF/1-3.2C, the system libraries are shipped
without local symbols, but they still contain procedure
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 719,726 ****
}
else if ((word & 0xffe0ffff) == 0x6be08001) /* ret zero,reg,1 */
pcreg = (word >> 16) & 0x1f;
! else if (word == 0x47de040f) /* bis sp,sp fp */
! has_frame_reg = 1;
}
if (pcreg == -1)
{
--- 735,749 ----
}
else if ((word & 0xffe0ffff) == 0x6be08001) /* ret zero,reg,1 */
pcreg = (word >> 16) & 0x1f;
! else if (word == 0x47de040f || word == 0x47fe040f) /* bis sp,sp fp */
! {
! /* ??? I am not sure what instruction is 0x47fe040f, and I
! am suspecting that there was a typo and should have been
! 0x47fe040f. I'm keeping it in the test above until further
! investigation */
! has_frame_reg = 1;
! vfp = read_next_frame_reg (next_frame, ALPHA_GCC_FP_REGNUM);
! }
}
if (pcreg == -1)
{
*************** heuristic_proc_desc (CORE_ADDR start_pc,
*** 759,764 ****
--- 782,799 ----
PROC_FRAME_REG (&temp_proc_desc) = ALPHA_GCC_FP_REGNUM;
else
PROC_FRAME_REG (&temp_proc_desc) = SP_REGNUM;
+
+ /* At this point, we know which of the Stack Pointer of the Frame Pointer
+ to use as the reference address to compute the saved registers address.
+ But in both cases, the processing above has set vfp to this reference
+ address, so just need to increment the offset of each saved register
+ by this address. */
+ for (regno = 0; regno < NUM_REGS; regno++)
+ {
+ if (reg_mask & 1 << regno)
+ temp_saved_regs[regno] += vfp;
+ }
+
PROC_FRAME_OFFSET (&temp_proc_desc) = frame_size;
PROC_REG_MASK (&temp_proc_desc) = reg_mask;
PROC_PC_REG (&temp_proc_desc) = (pcreg == -1) ? ALPHA_RA_REGNUM : pcreg;
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFA] Tru64 stack unwinding fix
2002-06-14 12:12 [RFA] Tru64 stack unwinding fix Joel Brobecker
@ 2002-06-14 12:22 ` Joel Brobecker
2002-06-18 16:52 ` Andrew Cagney
1 sibling, 0 replies; 4+ messages in thread
From: Joel Brobecker @ 2002-06-14 12:22 UTC (permalink / raw)
To: gdb-patches
> Here is the ChangeLog:
> 2002-06-14 Joel Brobecker <brobecker@gnat.com>
> * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
> current frame using only the first stack size adjustment. All
> subsequent size adjustments are not considered to be part of
> the "static" part of the current frame.
> Compute the address of the saved registers relative to the
> Frame Pointer ($fp) instead of the Stack Pointer if $fp is
> in use in this frame.
I forgot to say: no regression introduced by this change.
--
Joel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFA] Tru64 stack unwinding fix
2002-06-14 12:12 [RFA] Tru64 stack unwinding fix Joel Brobecker
2002-06-14 12:22 ` Joel Brobecker
@ 2002-06-18 16:52 ` Andrew Cagney
2002-06-18 18:20 ` Joel Brobecker
1 sibling, 1 reply; 4+ messages in thread
From: Andrew Cagney @ 2002-06-18 16:52 UTC (permalink / raw)
To: Joel Brobecker; +Cc: gdb-patches
> I'm sorry the test case involves Ada, but that is the only way
> I found to reproduce it at the time when I looked at this problem.
I suspect a non-ada aware GDB and a breakpoint on
*__hstTransferRegistersPC would do the trick :-)
> Here is the ChangeLog:
> 2002-06-14 Joel Brobecker <brobecker@gnat.com>
> * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
> current frame using only the first stack size adjustment. All
> subsequent size adjustments are not considered to be part of
> the "static" part of the current frame.
> Compute the address of the saved registers relative to the
> Frame Pointer ($fp) instead of the Stack Pointer if $fp is
> in use in this frame.
>
> Ok to commit?
Yes (you're in a better position then most to test it). The change to
handle FP and alloca() is very important and something often missed.
> + /* At this point, we know which of the Stack Pointer of the Frame
Pointer
.. or the ..
enjoy,
Andrew
PS: Some things unrelated to your patch I noticed:
> ! vfp = read_next_frame_reg (next_frame, ALPHA_GCC_FP_REGNUM);
I think someone should re-implement read_next_frame_reg() using:
/* Unwind the stack frame so that the value of REGNUM, in the previous
frame is returned. If VALUEP is NULL, don't fetch/compute the
value. Instead just return the location of the value. */
extern void frame_register_unwind (struct frame_info *frame, int regnum,
int *optimizedp, enum lval_type *lvalp,
CORE_ADDR *addrp, int *realnump,
void *valuep);
as this will work with CFI and generic dummy frames.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [RFA] Tru64 stack unwinding fix
2002-06-18 16:52 ` Andrew Cagney
@ 2002-06-18 18:20 ` Joel Brobecker
0 siblings, 0 replies; 4+ messages in thread
From: Joel Brobecker @ 2002-06-18 18:20 UTC (permalink / raw)
To: Andrew Cagney; +Cc: gdb-patches
> > Ok to commit?
>
> Yes (you're in a better position then most to test it). The change to
> handle FP and alloca() is very important and something often missed.
Thanks. This is now in (I also corrected the little typo you found
in one of the comments).
> I suspect a non-ada aware GDB and a breakpoint on
> *__hstTransferRegistersPC would do the trick :-)
Yes it did, silly me :-).
Just for the record, here is the output before my patch:
<<
(gdb) b __hstTransferRegistersPC
Breakpoint 2 at 0x3ff8057d43c
(gdb) c
Continuing.
Breakpoint 2, 0x000003ff8057d43c in __hstTransferRegistersPC ()
from /usr/shlib/libpthread.so
(gdb) bt
#0 0x000003ff8057d43c in __hstTransferRegistersPC ()
from /usr/shlib/libpthread.so
#1 0x000003ff8056e8e4 in __osTransferContext () from /usr/shlib/libpthread.so
warning: Hit beginning of text section without finding
warning: enclosing function for address 0x13
>>
> I think someone should re-implement read_next_frame_reg() using:
>
> /* Unwind the stack frame so that the value of REGNUM, in the previous
> frame is returned. If VALUEP is NULL, don't fetch/compute the
> value. Instead just return the location of the value. */
>
> extern void frame_register_unwind (struct frame_info *frame, int regnum,
> int *optimizedp, enum lval_type *lvalp,
> CORE_ADDR *addrp, int *realnump,
> void *valuep);
>
> as this will work with CFI and generic dummy frames.
Ack, I'll try to do this when I have a spare moment...
Thanks for the quick review.
--
Joel
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2002-06-19 1:20 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-06-14 12:12 [RFA] Tru64 stack unwinding fix Joel Brobecker
2002-06-14 12:22 ` Joel Brobecker
2002-06-18 16:52 ` Andrew Cagney
2002-06-18 18:20 ` Joel Brobecker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox