From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-41044-listarch-gdb-patches=sources.redhat.com@sources.redhat.com>
Received: (qmail 8271 invoked by alias); 8 Oct 2005 07:01:37 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-patches-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sources.redhat.com>
List-Help: <mailto:gdb-patches-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 8245 invoked by uid 22791); 8 Oct 2005 07:01:34 -0000
Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31)
    by sourceware.org (qpsmtpd/0.30-dev) with ESMTP; Sat, 08 Oct 2005 07:01:34 +0000
Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254])
	by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j9871WJA010699;
	Sat, 8 Oct 2005 03:01:32 -0400
Received: from devserv.devel.redhat.com (devserv.devel.redhat.com [172.16.58.1])
	by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j9871MV27957;
	Sat, 8 Oct 2005 03:01:26 -0400
Received: from theseus.home..redhat.com (vpn26-2.sfbay.redhat.com [172.16.26.2])
	by devserv.devel.redhat.com (8.12.11/8.12.11) with ESMTP id j9871If1021218;
	Sat, 8 Oct 2005 03:01:19 -0400
To: "Nathan J. Williams" <nathanw@wasabisystems.com>
Cc: gdb-patches@sourceware.org
Subject: Re: RFA: general prologue analysis framework
References: <m3y8568as0.fsf@alligator.red-bean.com>
	<mtu8xx5rped.fsf@scrubbing-bubbles.mit.edu>
From: Jim Blandy <jimb@redhat.com>
Date: Sat, 08 Oct 2005 07:01:00 -0000
In-Reply-To: <mtu8xx5rped.fsf@scrubbing-bubbles.mit.edu> (Nathan J.
 Williams's message of "07 Oct 2005 17:24:58 -0400")
Message-ID: <vt23bncmr0i.fsf@theseus.home.>
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-SW-Source: 2005-10/txt/msg00066.txt.bz2


"Nathan J. Williams" <nathanw@wasabisystems.com> writes:
> Short form: "What about branches?"

Unconditional forward branches you just follow.  The s390 prologue
analyzer does this, because s390 prologues typically include forward
branches.  (Or they did the last time I played with the toolchain,
anyway.)

For conditional forward branches:

- If the value that the condition depends upon is known, then you can
  just follow the branch or not as appropriate.

- If the condition isn't known, you have to do both: take the branch
  and don't take the branch --- it's backtracking.

For unconditional backward branches, now you've got loops, so things
get interesting.

- When you hit an instruction you've reached before, compare the
  values you have now with the values you had the last time.

  + The registers and stack slots that are still the same, you can
    leave alone.

  + For registers or stack slots that have a different value this time
    around, you've got to make a judgement about how detailed you want
    your approximation of their values to be.  One possibility would
    be just to mark them as unknown; that is, your analysis would say
    that any register that had different values at different
    executions of an instruction was unknown at that instruction.

  If *all* the values are the same as they were last time, you can
  stop interpreting: you're not learning anything by continuing,
  because you've got the same state you had last time.

- Now iterate that whole process until you don't get any new
  information: no registers or stack slots change.  Each iteration, if
  it does anything, makes your values less precise, so you're
  iterating until you don't lose anything.

Conditional backward branches you can treat as if they were a forward
conditional over a backward unconditional.

What it all adds up to is doing flow analysis, but on machine code
instead of source code.  Abstract interpretation (that is, running the
program with values like pv_t that only approximate the real values
the program would be operating on) is just a really nice way of
looking at flow analysis.  You know your analysis is true to your
language because you're just writing an interpreter for the language;
all the weirdness is isolated in the values and the branches.

The trick to all this is that your approximate values (that is, pv_t)
have to be limited.  You can't have a fancy pv_t that holds
arbitrary-sized sets of values: if you did, then if at every iteration
something changes, you never reach the point where you're not learning
anything, and your analysis never stops.  Your "lattice" --- the
hierarchy of pv_t values going from most specific to least specific
(pvk_unknown) --- has to have a finite height, and the higher it is,
the more times your analysis might have to go around a loop, and the
longer your analysis will take to finish.

(What I like about my pv_t is that it's really simple, but it still
gets you exactly what you want for prologue analysis.  Not that it
really matters, because I've never handled anything but forward
branches.)


One wrench in the works is that alloca and C99 variable-sized arrays
can give you functions where the sp just isn't known at compile time.
As far as prologue analysis is concerned, I think those functions
would always need to use an fp, if they ever return at all.  But for
flow analysis, if you have an ever-expanding set of values you're
tracking, like stack slots, that's just as bad as having a pv_t
carrying an unbounded amount of detail.

I'm not sure this has any practical value, but it's fun to think
about.  I wonder if there are cool things we could do if we had such
detailed information about the machine code we were looking at.  Could
we step through heavily reordered code better?

(Reviewers: NONE OF THIS IS NECESSARY to make effective use of
prologue.[ch]; of the three analyzers I've written this way, only one
of them even handles forward unconditional branches.)