From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 14804 invoked by alias); 28 Mar 2006 18:29:42 -0000 Received: (qmail 14794 invoked by uid 22791); 28 Mar 2006 18:29:41 -0000 X-Spam-Check-By: sourceware.org Received: from xproxy.gmail.com (HELO xproxy.gmail.com) (66.249.82.204) by sourceware.org (qpsmtpd/0.31) with ESMTP; Tue, 28 Mar 2006 18:29:39 +0000 Received: by xproxy.gmail.com with SMTP id h29so1361563wxd for ; Tue, 28 Mar 2006 10:29:37 -0800 (PST) Received: by 10.70.77.15 with SMTP id z15mr7023943wxa; Tue, 28 Mar 2006 10:29:37 -0800 (PST) Received: by 10.70.126.16 with HTTP; Tue, 28 Mar 2006 10:29:37 -0800 (PST) Message-ID: <8f2776cb0603281029y35c11fb9v2ad31f4a0445d6b3@mail.gmail.com> Date: Tue, 28 Mar 2006 19:15:00 -0000 From: "Jim Blandy" To: "Eli Zaretskii" Subject: Re: RFA: prologue value modules Cc: "Jim Blandy" , gdb-patches@sources.redhat.com In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline References: X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2006-03/txt/msg00319.txt.bz2 On 3/28/06, Eli Zaretskii wrote: > > +Modern versions of GCC emit Dwarf call frame information (``CFI'') > > +that gives @value{GDBN} all the information it needs to do this. > > AFAICS, this is the first time the manual mentions "CFI", so I think > at least an index entry is in order, if not a short explanation of > what it is (perhaps in a footnote for now, pending some real > documentation in the future), or a pointer to some doc on the net. I had an entry for "call frame information"; I've added an entry for "CFI" next to it. The preceding paragraph is an explanation of what call frame information does; I've changed the text to make that a bit clearer. > > +and fragile. 
Keeping the prologue analyzers working as GCC (and the > > +ISA's themselves) evolved became a substantial task. > > Ditto for "ISA". I just wrote out "instruction set". I was being cryptic, for no good reason. > > +@cindex @file{prologue-value.c} > > +@cindex abstract interpretation of function prologues > > +@cindex pseudo-evaluation of function prologues > > +To try to address this problem, the code in > > +@file{gdb/prologue-value.h} and @file{gdb/prologue-value.c} provide a > > Should these file names include the leading "gdb/" directory? I'm not > sure it's really required; OTOH, having too long strings in @file{} > might produce ugly printed version, because TeX does not know how to > hyphenate inside @file{}. Good point; I've dropped the "gdb/". > > +@example > > + mov r2, r1 # set r2 to r1's value > > +@end example > > This @example is indented differently than the rest. Any reason? Oversight. > > +@example > > +mov (fp+4), r2 > > +@end example > > +@noindent > > +Then we'd know that the stack slot four bytes above the frame pointer > ^^^^ > This should be a lowercase "then", since it doesn't start a sentence. Right you are. > > +register's original value. If the ABI suggests a standard place > > We have a section about the ABI, so I think a cross-reference there > will be a good idea. "ABI" is a bad term here, because it includes all sorts of stuff that isn't relevant to prologue analysis at all. I've replaced it with "calling conventions". > > So I think it's > > +worthwhile to look for an approach that will be easier to understand > > +and maintain. In the approach used here: > > I think we need to reword ``I'' and ``here'', so that they look > natural in the context of the manual (as opposed to a mail message or > a source file). Sorry --- I'd meant to fix the first-person references. I changed "here" to "above", which is more writerly. > > +The file @file{prologue-value.h} contains detailed comments explaining > > +the framework and how to use it. 
> > Would it be a good idea to have the listing of the API in the manual, > with short explanations? I'm not saying it is necessarily required, > but please give it a thought. I'd rather have the API described in the header file. I don't think we, as a project, can afford to maintain two copies of that sort of documentation, and I think the header files are the best place for detailed, function-by-function information. How does this look? Index: src/gdb/doc/gdbint.texinfo =================================================================== --- src.orig/gdb/doc/gdbint.texinfo +++ src/gdb/doc/gdbint.texinfo @@ -287,6 +287,175 @@ used to create a new @value{GDBN} frame @code{DEPRECATED_INIT_EXTRA_FRAME_INFO} and @code{DEPRECATED_INIT_FRAME_PC} will be called for the new frame. +@section Prologue Analysis + +@cindex prologue analysis +@cindex call frame information +@cindex CFI (call frame information) +To produce a backtrace and allow the user to manipulate older frames' +variables and arguments, @value{GDBN} needs to find the base addresses +of older frames, and discover where those frames' registers have been +saved. Since a frame's ``callee-saves'' registers get saved by +younger frames if and when they're reused, a frame's registers may be +scattered unpredictably across younger frames. This means that +changing the value of a register-allocated variable in an older frame +may actually entail writing to a save slot in some younger frame. + +Modern versions of GCC emit Dwarf call frame information (``CFI''), +which describes how to find frame base addresses and saved registers. +But CFI is not always available, so as a fallback @value{GDBN} uses a +technique called @dfn{prologue analysis} to find frame sizes and saved +registers. 
A prologue analyzer disassembles the function's machine +code starting from its entry point, and looks for instructions that +allocate frame space, save the stack pointer in a frame pointer +register, save registers, and so on. Obviously, this can't be done +accurately in general, but it's tractable to do well enough to be very +helpful. Prologue analysis predates the GNU toolchain's support for +CFI; at one time, prologue analysis was the only mechanism +@value{GDBN} used for stack unwinding at all, when the function +calling conventions didn't specify a fixed frame layout. + +In the olden days, function prologues were generated by hand-written, +target-specific code in GCC, and treated as opaque and untouchable by +optimizers. Looking at this code, it was usually straightforward to +write a prologue analyzer for @value{GDBN} that would accurately +understand all the prologues GCC would generate. However, over time +GCC became more aggressive about instruction scheduling, and began to +understand more about the semantics of the prologue instructions +themselves; in response, @value{GDBN}'s analyzers became more complex +and fragile. Keeping the prologue analyzers working as GCC (and the +instruction sets themselves) evolved became a substantial task. + +@cindex @file{prologue-value.c} +@cindex abstract interpretation of function prologues +@cindex pseudo-evaluation of function prologues +To try to address this problem, the code in @file{prologue-value.h} +and @file{prologue-value.c} provides a general framework for writing +prologue analyzers that are simpler and more robust than ad-hoc +analyzers. When we analyze a prologue using the prologue-value +framework, we're really doing ``abstract interpretation'' or +``pseudo-evaluation'': running the function's code in simulation, but +using conservative approximations of the values registers and memory +would hold when the code actually runs. 
For example, if our function +starts with the instruction: + +@example +addi r1, 42 # add 42 to r1 +@end example +@noindent +we don't know exactly what value will be in @code{r1} after executing +this instruction, but we do know it'll be 42 greater than its original +value. + +If we then see an instruction like: + +@example +addi r1, 22 # add 22 to r1 +@end example +@noindent +we still don't know what @code{r1}'s value is, but again, we can say +it is now 64 greater than its original value. + +If the next instruction were: + +@example +mov r2, r1 # set r2 to r1's value +@end example +@noindent +then we can say that @code{r2}'s value is now the original value of +@code{r1} plus 64. + +It's common for prologues to save registers on the stack, so we'll +need to track the values of stack frame slots, as well as the +registers. So after an instruction like this: + +@example +mov (fp+4), r2 +@end example +@noindent +then we'd know that the stack slot four bytes above the frame pointer +holds the original value of @code{r1} plus 64. + +And so on. + +Of course, this can only go so far before it gets unreasonable. If we +wanted to be able to say anything about the value of @code{r1} after +the instruction: + +@example +xor r1, r3 # exclusive-or r1 and r3, place result in r1 +@end example +@noindent +then things would get pretty complex. But remember, we're just doing +a conservative approximation; if exclusive-or instructions aren't +relevant to prologues, we can just say @code{r1}'s value is now +``unknown''. We can ignore things that are too complex, if that loss of +information is acceptable for our application. + +So when we say ``conservative approximation'' here, what we mean is an +approximation that is either accurate, or marked ``unknown'', but +never inaccurate. + +Using this framework, a prologue analyzer is simply an interpreter for +machine code, but one that uses conservative approximations for the +contents of registers and memory instead of actual values. 
Starting +from the function's entry point, you simulate instructions up to the +current PC, or an instruction that you don't know how to simulate. +Now you can examine the state of the registers and stack slots you've +kept track of. + +@itemize @bullet + +@item +To see how large your stack frame is, just check the value of the +stack pointer register; if it's the original value of the SP +minus a constant, then that constant is the stack frame's size. +If the SP's value has been marked as ``unknown'', then that means +the prologue has done something too complex for us to track, and +we don't know the frame size. + +@item +To see where we've saved the previous frame's registers, we just +search the values we've tracked --- stack slots, usually, but +registers, too, if you want --- for something equal to the register's +original value. If the calling conventions suggest a standard place +to save a given register, then we can check there first, but really, +anything that will get us back the original value will probably work. +@end itemize + +This does take some work. But prologue analyzers aren't +quick-and-simple pattern matching to recognize a few fixed prologue +forms any more; they're big, hairy functions. Along with inferior +function calls, prologue analysis accounts for a substantial portion +of the time needed to stabilize a @value{GDBN} port. So it's +worthwhile to look for an approach that will be easier to understand +and maintain. In the approach described above: + +@itemize @bullet + +@item +It's easier to see that the analyzer is correct: you just see +whether the analyzer properly (albeit conservatively) simulates +the effect of each instruction. + +@item +It's easier to extend the analyzer: you can add support for new +instructions, and know that you haven't broken anything that +wasn't already broken before. + +@item +It's orthogonal: to gather new information, you don't need to +complicate the code for each instruction. 
As long as your domain +of conservative values is already detailed enough to tell you +what you need, then all the existing instruction simulations are +already gathering the right data for you. + +@end itemize + +The file @file{prologue-value.h} contains detailed comments explaining +the framework and how to use it. + + @section Breakpoint Handling @cindex breakpoints
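
For anyone who wants to see the pseudo-evaluation idea from the patch text in concrete form, here is a minimal sketch in Python. It is not the code in prologue-value.c and does not mirror its API: the names PV, State, addi, mov, store, xor, frame_size and find_saved are all hypothetical, invented for this illustration. It also assumes a toy machine whose stack grows downward and whose stack slots are all the same size and never partially overlap.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PV:
    """Conservative value: original value of `reg` plus `offset`.

    reg=None means "unknown". (Hypothetical type, for illustration only.)
    """
    reg: Optional[str]
    offset: int = 0


UNKNOWN = PV(None)


class State:
    """Tracked registers and stack slots for a toy instruction set."""

    def __init__(self, regs):
        # On entry, every register holds its own original value.
        self.regs: Dict[str, PV] = {r: PV(r, 0) for r in regs}
        # Slot address (base register's original value, offset) -> value.
        self.stack: Dict[Tuple[str, int], PV] = {}

    def addi(self, reg, k):
        v = self.regs[reg]
        self.regs[reg] = UNKNOWN if v.reg is None else PV(v.reg, v.offset + k)

    def mov(self, dst, src):
        self.regs[dst] = self.regs[src]

    def xor(self, dst, src):
        # Too complex to track: mark the result unknown rather than guess.
        self.regs[dst] = UNKNOWN

    def store(self, base, disp, src):
        addr = self.regs[base]
        if addr.reg is None:
            # Unknown address could overwrite anything we were tracking.
            self.stack.clear()
            return
        # Slots addressed from a different base might alias; forget them.
        self.stack = {k: v for k, v in self.stack.items() if k[0] == addr.reg}
        self.stack[(addr.reg, addr.offset + disp)] = self.regs[src]

    def frame_size(self, sp="sp"):
        """Frame size if SP is 'original SP minus a constant', else None."""
        v = self.regs[sp]
        if v.reg != sp or v.offset > 0:
            return None
        return -v.offset

    def find_saved(self, reg):
        """Address of a slot holding `reg`'s original value, else None."""
        for addr, v in self.stack.items():
            if v == PV(reg, 0):
                return addr
        return None


# The instruction sequence walked through in the patch text.
s = State(["r1", "r2", "r3", "sp", "fp"])
s.addi("r1", 42)        # r1 = original r1 + 42
s.addi("r1", 22)        # r1 = original r1 + 64
s.mov("r2", "r1")       # r2 = original r1 + 64
s.store("fp", 4, "r2")  # slot fp+4 = original r1 + 64
s.xor("r1", "r3")       # r1 = unknown
```

The two queries from the bulleted list map onto frame_size and find_saved: after simulating "addi sp, -16" and a store of r3 at sp+12, frame_size reports 16 and find_saved("r3") reports the slot at original SP minus 4. A real analyzer would need more care than this sketch takes, for instance with slot sizes and with stores through registers other than the stack or frame pointer.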