From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <8f2776cb0603281029y35c11fb9v2ad31f4a0445d6b3@mail.gmail.com>
Date: Tue, 28 Mar 2006 19:15:00 -0000
From: "Jim Blandy" <jimb@red-bean.com>
To: "Eli Zaretskii" <eliz@gnu.org>
Subject: Re: RFA: prologue value modules
Cc: "Jim Blandy" <jimb@codesourcery.com>, gdb-patches@sources.redhat.com
In-Reply-To: <uy7yubyn7.fsf@gnu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
References: <vt2ek0trd33.fsf@theseus.home.> <uhd5o5hua.fsf@gnu.org> 	 <vt2ek0r1rqq.fsf@theseus.home.> <vt21wwntp5m.fsf@theseus.home.> 	 <uy7yubyn7.fsf@gnu.org>
X-SW-Source: 2006-03/txt/msg00319.txt.bz2

On 3/28/06, Eli Zaretskii <eliz@gnu.org> wrote:
> > +Modern versions of GCC emit Dwarf call frame information (``CFI'')
> > +that gives @value{GDBN} all the information it needs to do this.
>
> AFAICS, this is the first time the manual mentions "CFI", so I think
> at least an index entry is in order, if not a short explanation of
> what it is (perhaps in a footnote for now, pending some real
> documentation in the future), or a pointer to some doc on the net.

I had an entry for "call frame information"; I've added an entry for
"CFI" next to it.  The preceding paragraph is an explanation of what
call frame information does; I've changed the text to make that a bit
clearer.

> > +and fragile.  Keeping the prologue analyzers working as GCC (and the
> > +ISA's themselves) evolved became a substantial task.
>
> Ditto for "ISA".

I just wrote out "instruction set".  I was being cryptic, for no good reason.

> > +@cindex @file{prologue-value.c}
> > +@cindex abstract interpretation of function prologues
> > +@cindex pseudo-evaluation of function prologues
> > +To try to address this problem, the code in
> > +@file{gdb/prologue-value.h} and @file{gdb/prologue-value.c} provide a
>
> Should these file names include the leading "gdb/" directory?  I'm not
> sure it's really required; OTOH, having too long strings in @file{}
> might produce ugly printed version, because TeX does not know how to
> hyphenate inside @file{}.

Good point; I've dropped the "gdb/".

> > +@example
> > +   mov r2, r1      # set r2 to r1's value
> > +@end example
>
> This @example is indented differently than the rest.  Any reason?

Oversight.

> > +@example
> > +mov (fp+4), r2
> > +@end example
> > +@noindent
> > +Then we'd know that the stack slot four bytes above the frame pointer
>    ^^^^
> This should be a lowercase "then", since it doesn't start a sentence.

Right you are.

> > +register's original value.  If the ABI suggests a standard place
>
> We have a section about the ABI, so I think a cross-reference there
> will be a good idea.

"ABI" is a bad term here, because it includes all sorts of stuff that
isn't relevant to prologue analysis at all.  I've replaced it with
"calling conventions".

> >                                                      So I think it's
> > +worthwhile to look for an approach that will be easier to understand
> > +and maintain.  In the approach used here:
>
> I think we need to reword ``I'' and ``here'', so that they look
> natural in the context of the manual (as opposed to a mail message or
> a source file).

Sorry --- I'd meant to fix the first-person references.  I changed
"here" to "above", which is more writerly.

> > +The file @file{prologue-value.h} contains detailed comments explaining
> > +the framework and how to use it.
>
> Would it be a good idea to have the listing of the API in the manual,
> with short explanations?  I'm not saying it is necessarily required,
> but please give it a thought.

I'd rather have the API described in the header file.  I don't think
we, as a project, can afford to maintain two copies of that sort of
documentation, and I think the header files are the best place for
detailed, function-by-function information.

How does this look?

Index: src/gdb/doc/gdbint.texinfo
===================================================================
--- src.orig/gdb/doc/gdbint.texinfo
+++ src/gdb/doc/gdbint.texinfo
@@ -287,6 +287,175 @@ used to create a new @value{GDBN} frame
 @code{DEPRECATED_INIT_EXTRA_FRAME_INFO} and
 @code{DEPRECATED_INIT_FRAME_PC} will be called for the new frame.

+@section Prologue Analysis
+
+@cindex prologue analysis
+@cindex call frame information
+@cindex CFI (call frame information)
+To produce a backtrace and allow the user to manipulate older frames'
+variables and arguments, @value{GDBN} needs to find the base addresses
+of older frames, and discover where those frames' registers have been
+saved.  Since a frame's ``callee-saves'' registers get saved by
+younger frames if and when they're reused, a frame's registers may be
+scattered unpredictably across younger frames.  This means that
+changing the value of a register-allocated variable in an older frame
+may actually entail writing to a save slot in some younger frame.
+
+Modern versions of GCC emit Dwarf call frame information (``CFI''),
+which describes how to find frame base addresses and saved registers.
+But CFI is not always available, so as a fallback @value{GDBN} uses a
+technique called @dfn{prologue analysis} to find frame sizes and saved
+registers.  A prologue analyzer disassembles the function's machine
+code starting from its entry point, and looks for instructions that
+allocate frame space, save the stack pointer in a frame pointer
+register, save registers, and so on.  Obviously, this can't be done
+accurately in general, but it's tractable to do well enough to be very
+helpful.  Prologue analysis predates the GNU toolchain's support for
+CFI; at one time, prologue analysis was the only mechanism
+@value{GDBN} used for stack unwinding at all, when the function
+calling conventions didn't specify a fixed frame layout.
+
+In the olden days, function prologues were generated by hand-written,
+target-specific code in GCC, and treated as opaque and untouchable by
+optimizers.  Looking at this code, it was usually straightforward to
+write a prologue analyzer for @value{GDBN} that would accurately
+understand all the prologues GCC would generate.  However, over time
+GCC became more aggressive about instruction scheduling, and began to
+understand more about the semantics of the prologue instructions
+themselves; in response, @value{GDBN}'s analyzers became more complex
+and fragile.  Keeping the prologue analyzers working as GCC (and the
+instruction sets themselves) evolved became a substantial task.
+
+@cindex @file{prologue-value.c}
+@cindex abstract interpretation of function prologues
+@cindex pseudo-evaluation of function prologues
+To try to address this problem, the code in @file{prologue-value.h}
+and @file{prologue-value.c} provides a general framework for writing
+prologue analyzers that are simpler and more robust than ad-hoc
+analyzers.  When we analyze a prologue using the prologue-value
+framework, we're really doing ``abstract interpretation'' or
+``pseudo-evaluation'': running the function's code in simulation, but
+using conservative approximations of the values registers and memory
+would hold when the code actually runs.  For example, if our function
+starts with the instruction:
+
+@example
+addi r1, 42     # add 42 to r1
+@end example
+@noindent
+we don't know exactly what value will be in @code{r1} after executing
+this instruction, but we do know it'll be 42 greater than its original
+value.
+
+If we then see an instruction like:
+
+@example
+addi r1, 22     # add 22 to r1
+@end example
+@noindent
+we still don't know what @code{r1}'s value is, but again, we can say
+it is now 64 greater than its original value.
+
+If the next instruction were:
+
+@example
+mov r2, r1      # set r2 to r1's value
+@end example
+@noindent
+then we can say that @code{r2}'s value is now the original value of
+@code{r1} plus 64.
+
+It's common for prologues to save registers on the stack, so we'll
+need to track the values of stack frame slots, as well as the
+registers.  So after an instruction like this:
+
+@example
+mov (fp+4), r2  # store r2 in the stack slot at fp+4
+@end example
+@noindent
+then we'd know that the stack slot four bytes above the frame pointer
+holds the original value of @code{r1} plus 64.
+
+And so on.
+
+Of course, this can only go so far before it gets unreasonable.  If we
+wanted to be able to say anything about the value of @code{r1} after
+the instruction:
+
+@example
+xor r1, r3      # exclusive-or r1 and r3, place result in r1
+@end example
+@noindent
+then things would get pretty complex.  But remember, we're just doing
+a conservative approximation; if exclusive-or instructions aren't
+relevant to prologues, we can just say @code{r1}'s value is now
+``unknown''.  We can ignore things that are too complex, if that loss of
+information is acceptable for our application.
+
+So when we say ``conservative approximation'' here, what we mean is an
+approximation that is either accurate, or marked ``unknown'', but
+never inaccurate.
+
+Using this framework, a prologue analyzer is simply an interpreter for
+machine code, but one that uses conservative approximations for the
+contents of registers and memory instead of actual values.  Starting
+from the function's entry point, you simulate instructions up to the
+current PC, or an instruction that you don't know how to simulate.
+Now you can examine the state of the registers and stack slots you've
+kept track of.
+
+@itemize @bullet
+
+@item
+To see how large your stack frame is, just check the value of the
+stack pointer register; if it's the original value of the SP
+minus a constant, then that constant is the stack frame's size.
+If the SP's value has been marked as ``unknown'', then that means
+the prologue has done something too complex for us to track, and
+we don't know the frame size.
+
+@item
+To see where we've saved the previous frame's registers, we just
+search the values we've tracked --- stack slots, usually, but
+registers, too, if you want --- for something equal to the register's
+original value.  If the calling conventions suggest a standard place
+to save a given register, then we can check there first, but really,
+anything that will get us back the original value will probably work.
+@end itemize
+
+This does take some work.  But prologue analyzers aren't
+quick-and-simple pattern matching to recognize a few fixed prologue
+forms any more; they're big, hairy functions.  Along with inferior
+function calls, prologue analysis accounts for a substantial portion
+of the time needed to stabilize a @value{GDBN} port.  So it's
+worthwhile to look for an approach that will be easier to understand
+and maintain.  In the approach described above:
+
+@itemize @bullet
+
+@item
+It's easier to see that the analyzer is correct: you just see
+whether the analyzer properly (albeit conservatively) simulates
+the effect of each instruction.
+
+@item
+It's easier to extend the analyzer: you can add support for new
+instructions, and know that you haven't broken anything that
+wasn't already broken before.
+
+@item
+It's orthogonal: to gather new information, you don't need to
+complicate the code for each instruction.  As long as your domain
+of conservative values is already detailed enough to tell you
+what you need, then all the existing instruction simulations are
+already gathering the right data for you.
+
+@end itemize
+
+The file @file{prologue-value.h} contains detailed comments explaining
+the framework and how to use it.
+
+
 @section Breakpoint Handling

 @cindex breakpoints