From: Eli Zaretskii
To: Jim Blandy
CC: gdb-patches@sourceware.org
Date: Sat, 15 Oct 2005 12:12:00 -0000
Subject: Re: RFA: general prologue analysis framework
In-reply-to: (message from Jim Blandy on Thu, 06 Oct 2005 16:51:11 -0700)
Reply-to: Eli Zaretskii
X-SW-Source: 2005-10/txt/msg00131.txt.bz2

> From: Jim Blandy
> Date: Thu, 06 Oct 2005 16:51:11 -0700
>
> + /* When we analyze a prologue, we're really doing 'abstract
> +    interpretation' or 'pseudo-evaluation': running the function's code
> +    in simulation, but using conservative approximations of the values
> +    it would have when it actually runs.  For example, if our function
> +    starts with the instruction:
> +
> +       addi r1, 42     # add 42 to r1
> +
> +    we don't know exactly what value will be in r1 after executing this
> +    instruction, but we do know it'll be 42 greater than its original
> +    value.
> +
> +    If we then see an instruction like:
> +
> +       addi r1, 22     # add 22 to r1
> +
> +    we still don't know what r1's value is, but again, we can say it is
> +    now 64 greater than its original value.
> +
> +    If the next instruction were:
> +
> +       mov r2, r1      # set r2 to r1's value
> +
> +    then we can say that r2's value is now the original value of r1
> +    plus 64.
> +
> +    It's common for prologues to save registers on the stack, so we'll
> +    need to track the values of stack frame slots, as well as the
> +    registers.  So after an instruction like this:
> +
> +       mov (fp+4), r2
> +
> +    we'd know that the stack slot four bytes above the frame
> +    pointer holds the original value of r1 plus 64.
> +
> +    And so on.
> +
> +    Of course, this can only go so far before it gets unreasonable.  If
> +    we wanted to be able to say anything about the value of r1 after
> +    the instruction:
> +
> +       xor r1, r3      # exclusive-or r1 and r3, place result in r1
> +
> +    then things would get pretty complex.  But remember, we're just
> +    doing a conservative approximation; if exclusive-or instructions
> +    aren't relevant to prologues, we can just say r1's value is now
> +    'unknown'.  We can ignore things that are too complex, if that loss
> +    of information is acceptable for our application.
> +
> +    So when I say "conservative approximation" here, what I mean is an
> +    approximation that is either accurate, or marked "unknown", but
> +    never inaccurate.
> +
> +    Once you've reached the current PC, or an instruction that you
> +    don't know how to simulate, you stop.  Now you can examine the
> +    state of the registers and stack slots you've kept track of.
> +
> +    - To see how large your stack frame is, just check the value of the
> +      stack pointer register; if it's the original value of the SP
> +      minus a constant, then that constant is the stack frame's size.
> +      If the SP's value has been marked as 'unknown', then that means
> +      the prologue has done something too complex for us to track, and
> +      we don't know the frame size.
> +
> +    - To see where we've saved the previous frame's registers, we just
> +      search the values we've tracked --- stack slots, usually, but
> +      registers, too, if you want --- for something equal to the
> +      register's original value.
> +      If the ABI suggests a standard place
> +      to save a given register, then we can check there first, but
> +      really, anything that will get us back the original value will
> +      probably work.
> +
> +    Sure, this takes some work.  But prologue analyzers aren't
> +    quick-and-simple pattern matching to recognize a few fixed prologue
> +    forms any more; they're big, hairy functions.  Along with inferior
> +    function calls, prologue analysis accounts for a substantial
> +    portion of the time needed to stabilize a GDB port.  So I think
> +    it's worthwhile to look for an approach that will be easier to
> +    understand and maintain.  In the approach used here:
> +
> +    - It's easier to see that the analyzer is correct: you just see
> +      whether the analyzer properly (albeit conservatively) simulates
> +      the effect of each instruction.
> +
> +    - It's easier to extend the analyzer: you can add support for new
> +      instructions, and know that you haven't broken anything that
> +      wasn't already broken before.
> +
> +    - It's orthogonal: to gather new information, you don't need to
> +      complicate the code for each instruction.  As long as your domain
> +      of conservative values is already detailed enough to tell you
> +      what you need, then all the existing instruction simulations are
> +      already gathering the right data for you.
> +
> +    A 'struct prologue_value' is a conservative approximation of the
> +    real value the register or stack slot will have.  */

Jim, I'd be thrilled to see this text in gdbint.texinfo (if and when
the patch is committed), perhaps with a few more general words about
prologue analysis, which is currently completely undocumented.
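
For readers who want something concrete, the bookkeeping described in
the quoted comment could be sketched roughly as follows.  This is only
an illustration under my own assumptions: every name in it
(sketch_value, sketch_addi, and so on) is hypothetical and is not taken
from the patch, and the three-way domain of "unknown", "constant", and
"original register value plus offset" is my reading of what the comment
implies, not a description of the actual 'struct prologue_value'.

```c
#include <assert.h>
#include <stdbool.h>

/* All names here are hypothetical, chosen for this sketch only.  */

enum sketch_kind
{
  SK_UNKNOWN,    /* Too complex to track; we know nothing.  */
  SK_CONSTANT,   /* A known constant K.  */
  SK_REGISTER    /* The original value of register REG, plus K.  */
};

struct sketch_value
{
  enum sketch_kind kind;
  int reg;       /* Meaningful only for SK_REGISTER.  */
  long k;        /* The constant, or the offset from REG.  */
};

enum { SK_R1, SK_R2, SK_R3, SK_NUM_REGS };

/* On entry, each register holds its own original value plus zero.  */
void
sketch_init (struct sketch_value regs[SK_NUM_REGS])
{
  for (int i = 0; i < SK_NUM_REGS; i++)
    regs[i] = (struct sketch_value) { SK_REGISTER, i, 0 };
}

/* Simulate 'addi REG, K'.  Adding a constant keeps both constants and
   register-plus-offset values exact; 'unknown' stays unknown.  */
void
sketch_addi (struct sketch_value regs[SK_NUM_REGS], int reg, long k)
{
  assert (reg >= 0 && reg < SK_NUM_REGS);
  if (regs[reg].kind != SK_UNKNOWN)
    regs[reg].k += k;
}

/* Simulate 'mov DST, SRC': DST now approximates whatever SRC did.  */
void
sketch_mov (struct sketch_value regs[SK_NUM_REGS], int dst, int src)
{
  assert (dst >= 0 && dst < SK_NUM_REGS);
  assert (src >= 0 && src < SK_NUM_REGS);
  regs[dst] = regs[src];
}

/* Simulate an instruction we choose not to model (such as 'xor'):
   the destination becomes 'unknown' rather than possibly wrong.  */
void
sketch_clobber (struct sketch_value regs[SK_NUM_REGS], int reg)
{
  assert (reg >= 0 && reg < SK_NUM_REGS);
  regs[reg] = (struct sketch_value) { SK_UNKNOWN, 0, 0 };
}

/* Return true if V is exactly the original value of REG plus K.  */
bool
sketch_is_register_k (struct sketch_value v, int reg, long k)
{
  return v.kind == SK_REGISTER && v.reg == reg && v.k == k;
}
```

Running the comment's instruction sequence through these helpers
(addi r1, 42; addi r1, 22; mov r2, r1) leaves r2 as "original r1 plus
64", and a later xor on r1 marks only r1 as unknown.  Searching for a
saved register then amounts to calling sketch_is_register_k with an
offset of zero on each tracked register or stack slot.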