Tracing another stack

Mirror of the gdb mailing list

* Tracing another stack
@ 2015-11-28  6:03 Celelibi
  2015-11-28 13:37 ` Duane Ellis
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Celelibi @ 2015-11-28  6:03 UTC
  To: gdb

Hello,

I use gdb with the gdb-stub of qemu to debug a boot loader. When a
memory fault occurs, a message is printed with the content of most
registers and a new stack is created to run the handler that never
terminates.

Can I tell gdb to examine the stack given the content of the stack
pointer, stack base and program counter of a stack that is not the
current one?

I tried setting $rsp and $rip to the values I got from the printed
message, but that confuses gdb. The "bt" command shows the
right first stack frame, but the following ones are those of the
interrupt handler.

Thanks in advance.
Celelibi


* Re: Tracing another stack
  2015-11-28  6:03 Tracing another stack Celelibi
@ 2015-11-28 13:37 ` Duane Ellis
  2015-12-01  8:46   ` Celelibi
  2015-11-30 16:28 ` Sterling Augustine
       [not found] ` <CAEG7qUxk2qKo4RM9syqco26EtQkeiviP3GOrHkqyJJViwAX3dQ@mail.gmail.com>
  2 siblings, 1 reply; 6+ messages in thread
From: Duane Ellis @ 2015-11-28 13:37 UTC
  To: Celelibi; +Cc: gdb


> On Nov 27, 2015, at 10:01 PM, Celelibi <celelibi@gmail.com> wrote:
> 
> Hello,
> 
> I use gdb with the gdb-stub of qemu to debug a boot loader. When a
> memory fault occurs, a message is printed with the content of most
> registers and a new stack is created to run the handler that never
> terminates.
> 
> Can I tell gdb to examine the stack given the content of the stack
> pointer, stack base and program counter of a stack that is not the
> current one?
> 
> I tried setting $rsp and $rip to the values I got from the printed
> message, but it turns out it confuses gdb. The "bt" commands shows the
> right first stack frame, but the next ones are those of the interrupt
> handler.
> 
> 
> Thanks in advance.
> Celelibi
> 

What is your target? (arm? x86? mips?)

What I do in these situations is this:

Step 1: I create a global ‘volatile’ variable that is set to zero

Step 2: The code loops on that variable until it is non-zero.
So in the normal (non-debugger-attached) case the system hangs, and a watchdog reset occurs.

But when I have the debugger attached, I set a breakpoint on that endless loop so I get a breakpoint hit.
Then, using the debugger, I set that global variable to 1.

Step 3: I can now step out of this code :-) and back through the exception return
	Which will eventually land me back in the offending location.

Depending upon the target (e.g., ARM vs. x86) you might want to make this exception handler return to the PREV or NEXT instruction instead of the instruction that failed.

At that point you have the location where the error occurs.

Another approach is this:
	If you know the offending address… you can often set a hardware *read* or *write* breakpoint (a watchpoint, in gdb terms) on that location
	
-Duane



* Re: Tracing another stack
  2015-11-28  6:03 Tracing another stack Celelibi
  2015-11-28 13:37 ` Duane Ellis
@ 2015-11-30 16:28 ` Sterling Augustine
       [not found] ` <CAEG7qUxk2qKo4RM9syqco26EtQkeiviP3GOrHkqyJJViwAX3dQ@mail.gmail.com>
  2 siblings, 0 replies; 6+ messages in thread
From: Sterling Augustine @ 2015-11-30 16:28 UTC
  To: Celelibi; +Cc: gdb

On Fri, Nov 27, 2015 at 10:01 PM, Celelibi <celelibi@gmail.com> wrote:
> Hello,
>
> I use gdb with the gdb-stub of qemu to debug a boot loader. When a
> memory fault occurs, a message is printed with the content of most
> registers and a new stack is created to run the handler that never
> terminates.
>
> Can I tell gdb to examine the stack given the content of the stack
> pointer, stack base and program counter of a stack that is not the
> current one?
>
> I tried setting $rsp and $rip to the values I got from the printed
> message, but it turns out it confuses gdb. The "bt" commands shows the
> right first stack frame, but the next ones are those of the interrupt
> handler.

If you have a reasonably mature gdb-stub, you can use the following commands:

# print a list of all threads known to gdb, with numbers
info threads

# switch to a thread numbered X from the above list
thread X

You can now get the back trace for that particular thread with "bt"

You could also do:

thread apply all backtrace

To get a back trace of every thread.

This may not work with certain immature stubs, but it should work with most.



* Re: Tracing another stack
  2015-11-28 13:37 ` Duane Ellis
@ 2015-12-01  8:46   ` Celelibi
       [not found]     ` <863D4E7B-2D4E-448B-8B41-EE97612A3BA3@duaneellis.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Celelibi @ 2015-12-01  8:46 UTC
  To: Duane Ellis; +Cc: gdb

2015-11-28 14:37 UTC+01:00, Duane Ellis <duane@duaneellis.com>:
>
>> On Nov 27, 2015, at 10:01 PM, Celelibi <celelibi@gmail.com> wrote:
>>
>> Hello,
>>
>> I use gdb with the gdb-stub of qemu to debug a boot loader. When a
>> memory fault occurs, a message is printed with the content of most
>> registers and a new stack is created to run the handler that never
>> terminates.
>>
>> Can I tell gdb to examine the stack given the content of the stack
>> pointer, stack base and program counter of a stack that is not the
>> current one?
>>
>> I tried setting $rsp and $rip to the values I got from the printed
>> message, but it turns out it confuses gdb. The "bt" commands shows the
>> right first stack frame, but the next ones are those of the interrupt
>> handler.
>>
>>
>> Thanks in advance.
>> Celelibi
>>
>
> What is your target? (arm? x86? mips?)

My target is x86_64 with OVMF as UEFI firmware.

>
> What I do in these situations is this:
>
> Step 1: I create a global ‘volatile’ variable that is set to zero
>
> Step 2: The code - loops on that variable until it is non-zero
> So in the normal (non-debugger-attached) case the system hangs, and a watch
> dog reset occurs.
>
> But - when I have the debugger attached I set a breakpoint on that endless
> loop so I get a breakpoint hit.
> And using the debugger i set that global variable to 1

Attaching the debugger soon enough isn't a problem. The problem is
that when an interrupt occurs (like a division by zero), the code that
gets executed isn't mine. And this code sets up a new stack and goes
into a fancy "while (1) {}".

I just found a solution that consists in setting a breakpoint directly
in the interrupt handler, before the stack is modified. But this is
definitely not as generic as examining another stack given its
address.

Maybe what I did by setting $rip and $rsp was good but gdb had a cache
of the stack frame?

>
> Step 3: I can now step out of this code :-) and back through the exception
> return
> 	Which will eventually land me back in the offending location.

What do you call "the exception return"?

>
> Depending upon the target (i.e.: ARM vrs X86) you might want to make this
> exception handler return to the PREV or NEXT instruction instead of the
> instruction that failed

Intel has already decided whether the stacked instruction
pointer is the address of the faulting instruction or of the next one.
I'm not sure what you mean then.

> Another approach is this:
> 	If you know the offending address… you can often set a hardware *read* or
> *write* breakpoint on that location

That's what I'd do if I knew what instructions trigger the bugs. :)

My goal is to set some memory protections (pages not executable or not
writable) and catch the memory errors of a complex piece of software
(the boot loader syslinux). So I have pretty much no clue about where
the offending addresses will be.

Best regards,
Celelibi


* Re: Tracing another stack
       [not found] ` <CAEG7qUxk2qKo4RM9syqco26EtQkeiviP3GOrHkqyJJViwAX3dQ@mail.gmail.com>
@ 2015-12-01  8:57   ` Celelibi
  0 siblings, 0 replies; 6+ messages in thread
From: Celelibi @ 2015-12-01  8:57 UTC
  To: Sterling Augustine; +Cc: gdb

2015-11-30 17:27 UTC+01:00, Sterling Augustine <saugustine@google.com>:
> On Fri, Nov 27, 2015 at 10:01 PM, Celelibi <celelibi@gmail.com> wrote:
>
>> Hello,
>>
>> I use gdb with the gdb-stub of qemu to debug a boot loader. When a
>> memory fault occurs, a message is printed with the content of most
>> registers and a new stack is created to run the handler that never
>> terminates.
>>
>> Can I tell gdb to examine the stack given the content of the stack
>> pointer, stack base and program counter of a stack that is not the
>> current one?
>>
>> I tried setting $rsp and $rip to the values I got from the printed
>> message, but it turns out it confuses gdb. The "bt" commands shows the
>> right first stack frame, but the next ones are those of the interrupt
>> handler.
>>
>
> If you have a reasonably mature gdb-stub, you can use the following
> commands:
>
> # print a list of all threads known to gdb, with numbers
> info threads
>
> # switch to a thread numbered X from the above list
> thread X
>
> You can now get the back trace for that particular thread with "bt"
>
> You could also do:
>
> thread apply all backtrace
>
> To get a back trace of every thread.
>
> This may not work with certain immature stubs, but it should work with
> most.
>

Well, I think you missed two important pieces of information. First,
the stub I use is QEMU's, and its threads map to the available CPUs.
Second, there is a single thread; the new stack is created by an
interrupt handler.

Having a single thread isn't incompatible with having several stacks.
Asynchronous events can run some code in a new and completely
different stack designed specifically for event handlers. AFAIK,
signal handlers in Linux can use a specific stack with the SA_ONSTACK
option of sigaction(2).
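
As a rough illustration of the "one thread, several stacks" point, here
is a minimal sketch for a POSIX userland process (not the UEFI
environment discussed in this thread). It registers an alternate stack
with sigaltstack(2), installs a handler with SA_ONSTACK, and checks
that a local variable inside the handler lives in the alternate stack.
The names handler_used_alt_stack and on_usr1, and the 64 KiB size, are
made up for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* exposes sigaltstack() and SA_ONSTACK */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Address of a local variable observed inside the signal handler. */
static volatile uintptr_t handler_local_addr;

static void on_usr1(int signo)
{
    int marker = signo;     /* lives on whatever stack the handler runs on */
    handler_local_addr = (uintptr_t)&marker;
}

/* Returns 1 if the handler ran on the alternate stack, 0 if it did not,
 * and -1 if the setup failed. Hypothetical helper for illustration. */
int handler_used_alt_stack(void)
{
    const size_t size = 64 * 1024;          /* assumed to be large enough */
    unsigned char *mem = malloc(size);
    stack_t alt, off;
    struct sigaction sa, old_sa;
    int inside = -1;

    if (mem == NULL)
        return -1;

    memset(&alt, 0, sizeof alt);
    alt.ss_sp = mem;
    alt.ss_size = size;
    memset(&off, 0, sizeof off);
    off.ss_flags = SS_DISABLE;              /* used later to unregister */

    if (sigaltstack(&alt, NULL) != 0) {
        free(mem);
        return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sa.sa_flags = SA_ONSTACK;               /* run the handler on 'alt' */
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGUSR1, &sa, &old_sa) == 0) {
        handler_local_addr = 0;
        raise(SIGUSR1);                     /* handler runs before raise() returns */
        assert(handler_local_addr != 0);    /* the handler must have run */
        sigaction(SIGUSR1, &old_sa, NULL);

        uintptr_t lo = (uintptr_t)mem;
        inside = handler_local_addr >= lo && handler_local_addr < lo + size;
    }

    sigaltstack(&off, NULL);                /* unregister before freeing */
    free(mem);
    return inside;
}
```

A debugger stopped inside on_usr1 would see a stack pointer inside the
malloc'ed block, while the interrupted code's frames remain on the
original stack. That is the same situation as the interrupt handler
here, only in a setting where it is easy to reproduce.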


Best regards,
Celelibi



* Re: Tracing another stack
       [not found]     ` <863D4E7B-2D4E-448B-8B41-EE97612A3BA3@duaneellis.com>
@ 2015-12-05 18:33       ` Celelibi
  0 siblings, 0 replies; 6+ messages in thread
From: Celelibi @ 2015-12-05 18:33 UTC
  To: Duane Ellis; +Cc: gdb

2015-12-01 14:24 UTC+01:00, Duane Ellis <duane@duaneellis.com>:
>
>>> But - when I have the debugger attached I set a breakpoint on that
>>> endless
>>> loop so I get a breakpoint hit.
>>> And using the debugger i set that global variable to 1
>>
>> Attaching the debugger soon enough isn't a problem. The problem is
>> that when an interrupt occurs (like a division by zero), the code that
>> gets executed isn't mine. And this code sets up a new stack and goes
>> into a fancy "while (1) {}".
>
>
> You are building UEFI … So why can’t you build replacement library (or
> object file) that you can use for the purposes of debug? Then remove this
> later in production.
>
>
>>
>> I just found a solution that consists in setting a breakpoint directly
>> in the interrupt handler, before the stack is modified. But this is
>> definitely not as generic as examining another stack given its
>> address.
>>
>> Maybe what I did by setting $rip and $rsp was good but gdb had a cache
>> of the stack frame?

Turns out I just needed to set $rbp as well. However, this technique
doesn't restore the values of the other registers. But those are
printed by QEMU when it generates some exceptions like "Divide Error".
So it should be possible to restore all the registers that gdb allows
me to access (unfortunately, very few special registers).
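
A plausible explanation for why $rbp matters, assuming the code was
built with frame pointers and gdb has no better unwind information for
it: the innermost frame can be identified from $rip alone, but each
caller is found by following the saved-rbp chain, where [rbp] holds the
caller's rbp and [rbp + 8] holds the return address. The sketch below
walks such a chain over a captured stack image. It only models the
idea and is not how gdb is implemented; struct stack_image, read_word
and walk_frames are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A captured piece of stack memory; 'base' is the address of words[0]. */
struct stack_image {
    uint64_t base;
    const uint64_t *words;
    size_t count;
};

/* Read one 8-byte word at 'addr'; returns 0 if it is outside the image. */
static int read_word(const struct stack_image *img, uint64_t addr,
                     uint64_t *out)
{
    if (addr < img->base || (addr - img->base) % 8 != 0)
        return 0;
    size_t idx = (size_t)((addr - img->base) / 8);
    if (idx >= img->count)
        return 0;
    *out = img->words[idx];
    return 1;
}

/* Collect up to 'max' program counters, innermost frame first.
 * Frame 0 comes from 'rip'; every further frame needs a valid 'rbp'. */
size_t walk_frames(const struct stack_image *img, uint64_t rip,
                   uint64_t rbp, uint64_t *pcs, size_t max)
{
    size_t n = 0;

    assert(pcs != NULL || max == 0);
    if (max == 0)
        return 0;
    pcs[n++] = rip;

    while (n < max && rbp != 0) {
        uint64_t caller_rbp, return_addr;

        if (!read_word(img, rbp, &caller_rbp) ||
            !read_word(img, rbp + 8, &return_addr))
            break;                      /* rbp does not point into this stack */
        pcs[n++] = return_addr;

        /* The stack grows down, so callers sit at higher addresses.
         * Anything else means a corrupt or looping chain. */
        if (caller_rbp != 0 && caller_rbp <= rbp)
            break;
        rbp = caller_rbp;
    }
    return n;
}
```

With the faulting context's rip and rbp, the walk yields the whole
call chain. With the right rip but an rbp that belongs to another
stack, only the first frame is correct, which matches the symptom
described at the start of this thread.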

>>
>>>
>>> Step 3: I can now step out of this code :-) and back through the
>>> exception
>>> return
>>> 	Which will eventually land me back in the offending location.
>>
>> What do you call "the exception return"?
>
> The structure of an exception handler is normally this:
>
> Entry:
> 	Step 1: Save special registers.
> 	Step 2: Establish *new* stack
> 	Step 3: Possibly get an exception code or reason number (e.g., IRQ number or
> TRAP number)
> 	Step 4: Call some handler function using the standard C calling protocol
> 	Step 5: Cleanup after function call
> 	Step 6: Go back to the original stack
> 	Step 7: Restore special registers
> 	Step 8: Perform the exception return instruction
>
> In your case, the “some handler” is effectively a “while(1) {}” loop - (step
> 4) and you have a breakpoint there.
>
> Your DIV0 or NULL POINTER access occurs and you hit the breakpoint
> 	
> My earlier example the while() loop would become:
>
> 1: volatile int variable;
> 2: void some_handler(void)
> 3: {
> 4:	variable = 1;
> 5:	while( variable != 0 ){    }
> 6: }
>
> In my example, I would set the variable to 0, then continue to step,
> eventually I would step past line 6 and execute the function exit sequence.

Maybe the infinite loop in OVMF wasn't written in a fancy way for
nothing after all.

VOID
EFIAPI
CpuDeadLoop (
  VOID
  )
{
  volatile UINTN  Index;

  for (Index = 0; Index == 0;);
}

Index is volatile, so I should be able to apply your technique as
well. I just tried and it works. Returning to the user code is just
not as trivial as it could be. I guess I could just put a breakpoint
on the iret instruction and automatically perform a "ni" command at
that point.

>
> Important: Write the offending address down, and keep repeating the
> experiment. Often you will see a common address or address range.
> If you do find a pattern then set up the hardware rd/wr breakpoint on that
> memory range.  It is no exact science, and you might have to do this a few
> times.

Well, in QEMU, my bugs are usually very repeatable.


Anyway, thanks for all the help.


Best regards,
Celelibi


