From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26153 invoked by alias); 8 Jun 2005 16:58:35 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 24943 invoked from network); 8 Jun 2005 16:58:05 -0000
Received: from unknown (192.220.74.81) by sourceware.org with QMTP; 8 Jun 2005 16:58:05 -0000
Received: (qmail 83799 invoked by uid 19025); 8 Jun 2005 16:58:05 -0000
Date: Wed, 08 Jun 2005 16:58:00 -0000
From: Jason Molenda
To: gdb-patches@sources.redhat.com
Cc: Mark Kettenis
Subject: Re: The gdb x86 function prologue parser
Message-ID: <20050608095805.A67988@molenda.com>
References: <85C775AE-3B05-431E-96D2-49EA9D1413E6@apple.com> <20050608132431.GA4970@nevyn.them.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <20050608132431.GA4970@nevyn.them.org>; from drow@false.org on Wed, Jun 08, 2005 at 09:24:31AM -0400
X-SW-Source: 2005-06/txt/msg00064.txt.bz2

Hi Daniel, thanks for the comments.

On Wed, Jun 08, 2005 at 09:24:31AM -0400, Daniel Jacobowitz wrote:

> I looked at your table.
>
> (A) You've added jump instructions to it.  Assuming that I'm following
> what you're doing with this table correctly, I'm not real comfortable
> with that without special cases checking the targets of the jumps.

These showed up in a couple of functions that had hand-written
assembly at the start of the function.  Like there's one who has a
special agreement with its caller about the contents of EDX, and it'd
compare EDX to a value and then jump to an alternate location if it
matched.  The only way this code is executed is if the PC is past the
jmp instruction, so I wasn't too concerned about it -- SOMEHOW we got
past the jmp and back again.

> (B) Can't you do this using the opcodes library?

Yeah, that's the right way to go, I just got too far into this scheme
to switch for the release we made.  I'm a little cautious about the
opcodes disassembler being fast enough (the prologue parser can be a
fairly hot piece of code) but I don't have any evidence one way or the
other.

> > and a script that transforms the patterns into a test program and a
> > Dejagnu expect script.  So you can ensure that you don't regress the
> > prologue parser.
> Interested?  Hell yes.  And the scripts, too.  This could be seriously
> useful.

Yeah, I didn't mean to be dangling it tauntingly or anything; I just
didn't have time last night to get it together to send.  This weekend
and next week I'll be able to get through a lot of this stuff more
easily.

The script is nothing fancy, but then the whole point is that the
patterns list is the really important bit and it can be transformed in
any way you'd like in the future.  Right now it calls a function with
the pattern, and the pattern calls a function itself.  I think gdb
puts a breakpoint in the leaf function, finishes out back to main(),
then continues to the next breakpoint, doing backtraces along the way.

> > And for goodness sakes, if we can't figure out anything
> > about a function that's not at the top of the stack, don't you think
> > it'd be reasonable to assume that the function has set up a stack
> > frame and saved the caller's EBP?
>
> Because there is a GDB policy to determine information about the frame
> based on the current frame, not based on where it lies on the stack.
> I've experimented with this before; this change can have some weird
> consequences... for instance, in any case where we can backtrace
> through "foo" only because of the addition of this case, we won't be
> able to backtrace through "foo" if it is on top of the stack.

I'd say that's expected behavior, but yes, it's true that this can
happen.
It'd be great if the prologue analyzer never got confused and could
always figure out how to find a function's caller's saved fp/pc, but
even if we switch to using the opcodes disassembler so we never lose
on another instruction, on MacOS X we can have libraries where the
functions up the stack have no symbols whatsoever.  We have no idea
where the function might begin--all we know is a saved address in the
middle of a function.  In such a situation, is it preferable that we
can't backtrace past tricky functions like these?

After a month of working on the x86 port, I got so frustrated I wrote
a user command that could backtrace --

define x86-bt
  set $frameno = 1
  set $cur_ebp = $ebp
  printf "frame 0 EBP: 0x%08x EIP: 0x%08x\n", $ebp, $eip
  x/1i $eip
  set $prev_ebp = *((uint32_t *) $cur_ebp)
  set $prev_eip = *((uint32_t *) ($cur_ebp + 4))
  while $prev_ebp != 0
    printf "frame %d EBP: 0x%08x EIP: 0x%08x\n", $frameno, $prev_ebp, $prev_eip
    x/1i $prev_eip
    set $cur_ebp = $prev_ebp
    set $prev_ebp = *((uint32_t *) $cur_ebp)
    set $prev_eip = *((uint32_t *) ($cur_ebp + 4))
    set $frameno = $frameno + 1
  end
end

because I was having to do backtraces by manually walking the stack so
often.  That's when I said, "enough is enough, this is stupid that gdb
can't do this."

> You can find more information about this in the list archives, in
> plenty of places; most recently Mark pulled together an implementation
> of "set i386 trust-frame-pointer".

Yeah, I couldn't comment at the time.  Mark's change was wrong.  He
said himself,

   You probably want to reset it to 0 before continuing your program
   since I found out that bad things happen with some of the tests in
   the gdb testsuite with this turned on.

   http://sourceware.org/ml/gdb/2005-04/msg00177.html

That's neither necessary nor acceptable.

Mark's initial reading of the Sleep() vs SleepEx() case was IMO not
correct.

   http://sourceware.org/ml/gdb/2005-04/msg00156.html

Sleep() sets up a stack frame, then jumps to SleepEx().
SleepEx doesn't set up a stack frame, but that's fine -- Sleep() did.
This is another instance that bolsters my "if the function MUST have
stored the caller's pc/fp, assume it did" method -- if you try to
analyze SleepEx() where the PC is, you'll see a frameless function.
But it's in the middle of the stack; it can't be frameless.

(I was jumping in my seat while that whole conversation was going on ;)

> That said, it may still be better than nothing.  I am still undecided.

Well, there's my thinking on the issue.

> > +  potentially_frameless = frame_relative_level (next_frame) == -1
> > +                       || get_frame_type (next_frame) == SIGTRAMP_FRAME;
>
> You want != NORMAL_FRAME.  You've ignored the dummy frame case.

Oh yeah, thanks.

> > +  /* We found a function-start address,
> > +     or $pc is at 0x0 (someone jmp'ed thru NULL ptr).  */
> > +  if ((cache->pc != 0 || current_pc == 0)
>
> No way that's right.  A jump through 0x0 is no different from a jump
> through any other unmapped, non-code address.  Normally one uses a
> different frame unwinder for that case.

OK, I didn't know the right practice.  Right now it goes through
i386_cache_frame.  It's "frameless", of course, but we don't have a
function symbol for it so cache->pc (WTF is up with that structure
variable name, anyway--it means the start address of the function for
this frame) is 0 (i.e. unset).

While I was porting my changes to my FC2 system yesterday I had to add
this bit --

  /* If we have no idea where this function began (so we can't analyze
     the prologue in any way), what should we assume?  Frameless or not?
     It's a tough call.  On Linux systems, a call to a system library
     involves a couple of trampoline jumps that have no symbols, so to
     work correctly on Linux we MUST assume frameless.  */
  if (potentially_frameless && cache->pc == 0)
    {
      cache->saved_regs[I386_EIP_REGNUM] = 4;
      cache->saved_regs[I386_EBP_REGNUM] = -1;
    }

Which should catch the pc==0x0 case.  I didn't need this code on
MacOS X so I had to special case the pc==0x0.
I'll have to test this, of course, but I can probably rely on this
added code to do the right thing.

J
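
For readers who want to try the frame-pointer walk from the x86-bt
command outside of gdb, here is a minimal C sketch of the same loop
run over a mock 32-bit stack.  It is an illustration only, not gdb
code: the names read_word, walk_frames and demo_walk, the STACK_BASE
address, and all the stack contents are hypothetical.  It assumes
every function on the stack saved the caller's EBP at [EBP] with the
return address at [EBP + 4], which is exactly the assumption debated
in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_BASE  0x1000u  /* hypothetical address of mock_stack[0] */
#define STACK_WORDS 16u      /* size of the mock stack in 32-bit words */

/* Read one 32-bit word from the mock stack.  Unaligned or out-of-range
   addresses read as 0, which also terminates the walk below.  */
static uint32_t
read_word (const uint32_t *stack, uint32_t addr)
{
  if (addr < STACK_BASE || addr % 4 != 0)
    return 0;
  uint32_t idx = (addr - STACK_BASE) / 4;
  return idx < STACK_WORDS ? stack[idx] : 0;
}

/* Follow the saved-EBP chain the way x86-bt does: [ebp] holds the
   caller's EBP and [ebp + 4] holds the return address.  Prints one
   line per frame and returns the number of frames printed.  */
static int
walk_frames (const uint32_t *stack, uint32_t ebp, uint32_t eip)
{
  assert (stack != NULL);

  int frameno = 0;
  printf ("frame %d EBP: 0x%08x EIP: 0x%08x\n",
          frameno, (unsigned) ebp, (unsigned) eip);

  uint32_t prev_ebp = read_word (stack, ebp);
  uint32_t prev_eip = read_word (stack, ebp + 4);
  while (prev_ebp != 0)
    {
      frameno++;
      printf ("frame %d EBP: 0x%08x EIP: 0x%08x\n",
              frameno, (unsigned) prev_ebp, (unsigned) prev_eip);
      ebp = prev_ebp;
      prev_ebp = read_word (stack, ebp);
      prev_eip = read_word (stack, ebp + 4);
    }
  return frameno + 1;
}

/* Build a three-frame mock stack (leaf -> middle -> outermost) and
   walk it.  All addresses are made up for the example.  */
static int
demo_walk (void)
{
  uint32_t mock_stack[STACK_WORDS] = { 0 };

  mock_stack[2]  = 0x1018;      /* leaf frame: saved EBP of middle  */
  mock_stack[3]  = 0x08048111;  /* leaf frame: return address       */
  mock_stack[6]  = 0x1028;      /* middle frame: saved EBP of outer */
  mock_stack[7]  = 0x08048222;  /* middle frame: return address     */
  mock_stack[10] = 0;           /* outermost frame: chain ends here */
  mock_stack[11] = 0x08048333;  /* outermost frame: return address  */

  /* Current EBP points at the leaf frame; EIP is inside the leaf.  */
  return walk_frames (mock_stack, 0x1008, 0x08048000);
}
```

Calling demo_walk() from a main() prints three frames and returns 3.
Note that the walk never looks at any function's prologue, so it keeps
going through functions with no symbols, and it goes wrong as soon as
one function in the chain really is frameless.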