Date: Wed, 08 Jun 2005 16:58:00 -0000
From: Jason Molenda <jason-swarelist@molenda.com>
To: gdb-patches@sources.redhat.com
Cc: Mark Kettenis <kettenis@jive.nl>
Subject: Re: The gdb x86 function prologue parser
Message-ID: <20050608095805.A67988@molenda.com>
References: <85C775AE-3B05-431E-96D2-49EA9D1413E6@apple.com> <20050608132431.GA4970@nevyn.them.org>
In-Reply-To: <20050608132431.GA4970@nevyn.them.org>; from drow@false.org on Wed, Jun 08, 2005 at 09:24:31AM -0400
X-SW-Source: 2005-06/txt/msg00064.txt.bz2

Hi Daniel, thanks for the comments.

On Wed, Jun 08, 2005 at 09:24:31AM -0400, Daniel Jacobowitz wrote:

> I looked at your table.
> 
> (A) You've added jump instructions to it.  Assuming that I'm following
> what you're doing with this table correctly, I'm not real comfortable
> with that without special cases checking the targets of the jumps.

These showed up in a couple of functions that had hand-written
assembly at the start of the function.  For instance, one has a
special agreement with its caller about the contents of EDX: it
compares EDX to a value and then jumps to an alternate location if
they match.  The analyzer only steps over those table entries if the
PC is already past the jmp instruction, so I wasn't too concerned
about it -- SOMEHOW we got past the jmp and back again.
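A minimal sketch of why the jump entries are harmless, assuming a
table-driven scanner.  This is not the code from the patch:
insn_pattern, insn_table, match_insn() and scan_prologue() are
hypothetical names, and the table holds only a handful of i386
encodings.  The scanner never matches an instruction that extends past
the stopped pc, so a cmp/je pair is consumed only once the pc is
already beyond it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: an i386 instruction encoding plus a mask.
   A mask byte of 0x00 means "any value" (e.g. an immediate operand).  */
struct insn_pattern
{
  const char *name;
  size_t len;
  unsigned char bytes[8];
  unsigned char mask[8];
};

static const struct insn_pattern insn_table[] =
{
  { "push %ebp",       1, { 0x55 },             { 0xff } },
  { "mov %esp,%ebp",   2, { 0x89, 0xe5 },       { 0xff, 0xff } },
  { "sub $imm8,%esp",  3, { 0x83, 0xec, 0x00 }, { 0xff, 0xff, 0x00 } },
  { "cmp $imm32,%edx", 6, { 0x81, 0xfa },       { 0xff, 0xff } },
  { "je rel8",         2, { 0x74, 0x00 },       { 0xff, 0x00 } },
};

/* Return the table entry matching CODE, using at most AVAIL bytes,
   or NULL if nothing matches.  */
static const struct insn_pattern *
match_insn (const unsigned char *code, size_t avail)
{
  for (size_t i = 0; i < sizeof insn_table / sizeof insn_table[0]; i++)
    {
      const struct insn_pattern *p = &insn_table[i];
      size_t j;

      if (p->len > avail)
        continue;               /* Would run past the stopped pc.  */
      for (j = 0; j < p->len; j++)
        if ((code[j] & p->mask[j]) != p->bytes[j])
          break;
      if (j == p->len)
        return p;
    }
  return NULL;
}

/* Scan from the function start up to, but not past, CURRENT_OFFSET
   (the stopped pc relative to the function start).  Returns the number
   of bytes recognized before the pc or before an unknown instruction.  */
static size_t
scan_prologue (const unsigned char *code, size_t current_offset)
{
  size_t off = 0;

  assert (code != NULL || current_offset == 0);
  while (off < current_offset)
    {
      const struct insn_pattern *p
        = match_insn (code + off, current_offset - off);
      if (p == NULL)
        break;                  /* Unknown instruction: give up.  */
      off += p->len;
    }
  return off;
}
```

For the bytes 55 89 e5 81 fa 78 56 34 12 74 05, stopping at offset 9 or
10 (on the je) recognizes 9 bytes; the je is only stepped over once the
pc reaches offset 11.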

> (B) Can't you do this using the opcodes library?  

Yeah, that's the right way to go, I just got too far into this
scheme to switch for the release we made.  I'm a little cautious
about the opcodes disassembler being fast enough (the prologue parser
can be a fairly hot piece of code) but I don't have any evidence
one way or the other.

> > and a script that transforms the patterns into a test program and a  
> > Dejagnu expect script.  So you can ensure that you don't regress the  
> > prologue parser.  

> Interested?  Hell yes.  And the scripts, too.  This could be seriously
> useful.

Yeah, I didn't mean to be dangling it tauntingly or anything; I just
didn't have time last night to get it together to send.  This weekend
and next week I'll be able to get through a lot of this stuff more
easily.  The script is nothing fancy, but then the whole point is that
the patterns list is the really important bit, and it can be
transformed in any way you'd like in the future.  Right now the test
program calls a function containing the pattern, and that function in
turn calls a leaf function.  I think gdb puts a breakpoint in the leaf
function, finishes back out to main(), then continues to the next
breakpoint, doing backtraces along the way.
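The expect-script half of such a generator could look roughly like
this.  It is a hypothetical sketch, not the actual script:
emit_expect_case() and the leaf_NAME naming are made up, and it
assumes the GDB testsuite procs gdb_breakpoint,
gdb_continue_to_breakpoint and gdb_test from gdb.exp.  For one pattern
it emits: stop in the leaf, backtrace, finish back into the pattern
function, backtrace again, then finish out to main().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical generator for one DejaGnu test case.  Assumes the
   generated C program has, for each pattern NAME, a function NAME whose
   prologue is the pattern and which calls leaf_NAME(), with main()
   calling NAME().  Returns the snprintf() result, or -1 if NAME is not
   a plain C identifier.  */
static int
emit_expect_case (char *buf, size_t size, const char *name)
{
  static const char ident_chars[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

  assert (name != NULL);
  if (name[0] == '\0' || strspn (name, ident_chars) != strlen (name))
    return -1;

  return snprintf (buf, size,
                   "gdb_breakpoint \"leaf_%s\"\n"
                   "gdb_continue_to_breakpoint \"leaf_%s\"\n"
                   "gdb_test \"backtrace\" \"#0 .*leaf_%s.*#1 .*%s.*#2 .*main.*\" \"bt in leaf_%s\"\n"
                   "gdb_test \"finish\" \".*\" \"finish out of leaf_%s\"\n"
                   "gdb_test \"backtrace\" \"#0 .*%s.*#1 .*main.*\" \"bt in %s\"\n"
                   "gdb_test \"finish\" \".*\" \"finish out of %s\"\n",
                   name, name, name, name, name, name, name, name, name);
}
```

Called once per entry in the patterns list, the fragments concatenate
into a single .exp file; keeping the list itself as plain data is what
allows it to be retargeted later.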

> 
> > And for goodness sakes, if we can't figure out anything  
> > about a function that's not at the top of the stack, don't you think  
> > it'd be reasonable to assume that the function has set up a stack  
> > frame and saved the caller's EBP?  
> 
> Because there is a GDB policy to determine information about the frame
> based on the current frame, not based on where it lies on the stack.
> I've experimented with this before; this change can have some weird
> consequences... for instance, in any case where we can backtrace
> through "foo" only because of the addition of this case, we won't be
> able to backtrace through "foo" if it is on top of the stack.

I'd say that's expected behavior, but yes, it's true that this
can happen.  It'd be great if the prologue analyzer never got confused
and could always figure out how to find a function's caller's saved
fp/pc, but even if we switch to using the opcodes disassembler so
we never get stuck on an unrecognized instruction again, on MacOS X
we can have functions up the stack, in libraries, that have no
symbols whatsoever.  We have no idea where such a function begins;
all we know is a saved address in the middle of it.  In such a
situation, is it preferable that we can't backtrace past tricky
functions like these?
After a month of working on the x86 port, I got so frustrated I wrote
a user command that could backtrace --

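# Walk the saved-EBP chain by hand.  Assumes 32-bit x86 code that
# keeps a frame pointer: 0(%ebp) holds the caller's EBP and 4(%ebp)
# the return address.  Stops when the saved EBP is 0 (outermost frame).
# "unsigned int" is 32 bits on x86 and, unlike uint32_t, doesn't
# depend on the program's debug info defining the typedef.
define x86-bt
  set $frameno = 1
  set $cur_ebp = $ebp
  printf "frame 0 EBP: 0x%08x EIP: 0x%08x\n", $ebp, $eip
  x/1i $eip
  set $prev_ebp = *((unsigned int *) $cur_ebp)
  set $prev_eip = *((unsigned int *) ($cur_ebp + 4))
  while $prev_ebp != 0
    printf "frame %d EBP: 0x%08x EIP: 0x%08x\n", $frameno, $prev_ebp, $prev_eip
    x/1i $prev_eip
    set $cur_ebp = $prev_ebp
    set $prev_ebp = *((unsigned int *) $cur_ebp)
    set $prev_eip = *((unsigned int *) ($cur_ebp + 4))
    set $frameno = $frameno + 1
  end
end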

because I was having to do backtraces by manually walking the stack
so often.  That's when I said, "enough is enough, this is stupid that
gdb can't do this."
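The same frame-pointer walk that x86-bt does, as a self-contained C
sketch.  Target memory is faked with an array (struct fake_stack and
read_word() are hypothetical), and the check that each saved EBP lies
higher on the stack than the current one is an extra safeguard that
x86-bt does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for target memory: 32-bit words whose first
   word lives at address BASE.  A real debugger reads the inferior.  */
struct fake_stack
{
  uint32_t base;
  const uint32_t *words;
  size_t nwords;
};

static int
read_word (const struct fake_stack *s, uint32_t addr, uint32_t *out)
{
  if (addr < s->base || (addr - s->base) % 4 != 0)
    return 0;
  size_t idx = (addr - s->base) / 4;
  if (idx >= s->nwords)
    return 0;
  *out = s->words[idx];
  return 1;
}

/* Walk the saved-EBP chain like x86-bt: PCS[0] is the current EIP and
   each later entry is the return address stored at 4(%ebp).  Stops at
   a saved EBP of 0, an unreadable word, or a chain that fails to move
   up the stack.  Returns the number of pcs stored.  */
static size_t
walk_frame_chain (const struct fake_stack *s, uint32_t ebp, uint32_t eip,
                  uint32_t *pcs, size_t max)
{
  size_t n = 0;
  uint32_t cur = ebp;

  assert (pcs != NULL || max == 0);
  if (max == 0)
    return 0;
  pcs[n++] = eip;

  while (n < max)
    {
      uint32_t saved_ebp, ret_pc;

      if (!read_word (s, cur, &saved_ebp) || !read_word (s, cur + 4, &ret_pc))
        break;
      /* The stack grows down, so a caller's frame must be higher.  */
      if (saved_ebp == 0 || saved_ebp <= cur)
        break;
      pcs[n++] = ret_pc;
      cur = saved_ebp;
    }
  return n;
}
```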


> You can find more information about this in the list archives, in
> plenty of places; most recently Mark pulled together an implementation
> of "set i386 trust-frame-pointer".

Yeah, I couldn't comment at the time.  Mark's change was wrong.  
He said himself,

  You probably want to reset it to 0 before continuing your program
  since I found out that bad things happen with some of the tests
  in the gdb testsuite with this turned on.
	http://sourceware.org/ml/gdb/2005-04/msg00177.html

That's neither necessary nor acceptable.  Mark's initial
reading of the Sleep() vs. SleepEx() case was, IMO, not correct.
	http://sourceware.org/ml/gdb/2005-04/msg00156.html

Sleep() sets up a stack frame, then jumps to SleepEx().
SleepEx doesn't set up a stack frame, but that's fine --
Sleep() did.  This is another instance that bolsters my
"if the function MUST have stored the caller's pc/fp, assume
it did" method -- if you analyze SleepEx() up to where the PC
is, you'll see a frameless function.  But it's in the middle of
the stack; it can't be frameless.

(I was jumping in my seat while that whole conversation was
going on ;)
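The "if the function MUST have stored the caller's pc/fp, assume it
did" rule, reduced to a sketch.  struct saved_offsets and
choose_saved_offsets() are hypothetical and much simpler than GDB's
frame cache; the offsets roughly follow the snippet further down
(EIP at offset 4 from the frame base, -1 for "not saved"), with a
framed function's EBP saved at offset 0.

```c
/* Hypothetical, simplified model of the i386 frame cache: offsets of
   the saved EBP and EIP relative to the frame base; -1 = not saved.  */
struct saved_offsets
{
  int ebp;
  int eip;
};

/* FOUND_FRAME_SETUP: prologue analysis saw push %ebp / mov %esp,%ebp.
   POTENTIALLY_FRAMELESS: the frame is innermost or was interrupted
   (signal, dummy frame), so it may legitimately have no frame yet.
   Any other frame has made a call, so under the proposed rule it is
   assumed to have saved EBP even when analysis could not tell.  */
static struct saved_offsets
choose_saved_offsets (int found_frame_setup, int potentially_frameless)
{
  struct saved_offsets o = { -1, 4 };   /* frameless: return address only */

  if (found_frame_setup || !potentially_frameless)
    o.ebp = 0;
  return o;
}
```

In the SleepEx() case the analysis finds no frame setup, but because
SleepEx() is not the innermost frame the sketch still reports a saved
EBP, which is really the frame Sleep() built before jumping.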

> That said, it may still be better than nothing.  I am still undecided.

Well, there's my thinking on the issue.

> > +  potentially_frameless = frame_relative_level (next_frame) == -1 
> > +                       || get_frame_type (next_frame) == SIGTRAMP_FRAME;
> 
> You want != NORMAL_FRAME.  You've ignored the dummy frame case.

Oh yeah, thanks.
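Daniel's correction, modeled in isolation.  The enum below is a
hypothetical stand-in for GDB's frame types, not the real declaration.
The reason != matters: when GDB makes an inferior function call, the
frame that was executing sits under a dummy frame, and like a frame
interrupted by a signal it can be stopped anywhere, even halfway
through its prologue.

```c
/* Hypothetical stand-in for GDB's frame types; only the values used
   here are modeled.  */
enum model_frame_type
{
  MODEL_NORMAL_FRAME,
  MODEL_DUMMY_FRAME,
  MODEL_SIGTRAMP_FRAME
};

/* NEXT_LEVEL and NEXT_TYPE describe the next (younger) frame.  Level -1
   is the sentinel, meaning this frame is the innermost one.  Anything
   other than a normal call from a younger frame means this frame may
   have been stopped before it set up its own frame.  */
static int
potentially_frameless (int next_level, enum model_frame_type next_type)
{
  return next_level == -1 || next_type != MODEL_NORMAL_FRAME;
}
```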

> > +      /* We found a function-start address, 
> > +         or $pc is at 0x0 (someone jmp'ed thru NULL ptr).  */
> > +  if ((cache->pc != 0 || current_pc == 0)
> 
> No way that's right.  A jump through 0x0 is no different from a jump
> through any other unmapped, non-code address.  Normally one uses a
> different frame unwinder for that case.

OK, I didn't know the right practice.  Right now it goes through
i386_frame_cache().  It's "frameless", of course, but we don't have
a function symbol for it, so cache->pc (WTF is up with that structure
member name, anyway?  It means the start address of the function for
this frame) is 0 (i.e. unset).  While I was porting my changes to
my FC2 system yesterday I had to add this bit --

  /* If we have no idea where this function began (so we can't analyze
     the prologue in any way), what should we assume?  Frameless or not?  
     It's a tough call.  On Linux systems, a call to a system library
     involves a couple of trampoline jumps that have no symbols,
     so to work correctly on Linux we MUST assume frameless.  */
  if (potentially_frameless && cache->pc == 0)
    {
      cache->saved_regs[I386_EIP_REGNUM] = 4;
      cache->saved_regs[I386_EBP_REGNUM] = -1;
    }

Which should catch the pc==0x0 case.  I didn't need this code on
MacOS X, which is why I special-cased pc==0x0 there instead.  I'll
have to test this, of course, but I can probably rely on this added
code to do the right thing.
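For comparison, Daniel's alternative (a separate unwinder chosen
before any prologue analysis) might be modeled like this.  GDB's real
mechanism, a list of frame unwinders each with a sniffer, is richer;
struct unwinder, pick_unwinder() and the code-range table are
hypothetical.  The bad-pc unwinder claims any pc outside known code,
not just 0x0, which is the point Daniel was making.

```c
#include <stddef.h>

/* Hypothetical table of address ranges known to contain code.  */
struct pc_range
{
  unsigned long lo, hi;   /* code occupies [lo, hi) */
};

/* Hypothetical unwinder descriptor: the first one whose sniffer
   accepts the frame is used, much as GDB tries unwinders in order.  */
struct unwinder
{
  const char *name;
  int (*sniff) (unsigned long pc, const struct pc_range *code, size_t ncode);
};

static int
pc_in_code (unsigned long pc, const struct pc_range *code, size_t ncode)
{
  for (size_t i = 0; i < ncode; i++)
    if (pc >= code[i].lo && pc < code[i].hi)
      return 1;
  return 0;
}

/* Claims frames whose pc is outside every code range (a jump through
   0x0 or any other bad pointer).  After a call through a bad function
   pointer, such an unwinder could take the caller's pc from the word
   at %esp, since the call pushed it before transferring control.  */
static int
sniff_bad_pc (unsigned long pc, const struct pc_range *code, size_t ncode)
{
  return !pc_in_code (pc, code, ncode);
}

/* Fallback: the ordinary prologue-analyzing unwinder accepts anything.  */
static int
sniff_prologue (unsigned long pc, const struct pc_range *code, size_t ncode)
{
  (void) pc;
  (void) code;
  (void) ncode;
  return 1;
}

static const struct unwinder unwinders[] =
{
  { "bad-pc", sniff_bad_pc },
  { "prologue", sniff_prologue },
};

static const char *
pick_unwinder (unsigned long pc, const struct pc_range *code, size_t ncode)
{
  for (size_t i = 0; i < sizeof unwinders / sizeof unwinders[0]; i++)
    if (unwinders[i].sniff (pc, code, ncode))
      return unwinders[i].name;
  return NULL;
}
```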

J