From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 8271 invoked by alias); 8 Oct 2005 07:01:37 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 8245 invoked by uid 22791); 8 Oct 2005 07:01:34 -0000 Received: from mx1.redhat.com (HELO mx1.redhat.com) (66.187.233.31) by sourceware.org (qpsmtpd/0.30-dev) with ESMTP; Sat, 08 Oct 2005 07:01:34 +0000 Received: from int-mx1.corp.redhat.com (int-mx1.corp.redhat.com [172.16.52.254]) by mx1.redhat.com (8.12.11/8.12.11) with ESMTP id j9871WJA010699; Sat, 8 Oct 2005 03:01:32 -0400 Received: from devserv.devel.redhat.com (devserv.devel.redhat.com [172.16.58.1]) by int-mx1.corp.redhat.com (8.11.6/8.11.6) with ESMTP id j9871MV27957; Sat, 8 Oct 2005 03:01:26 -0400 Received: from theseus.home..redhat.com (vpn26-2.sfbay.redhat.com [172.16.26.2]) by devserv.devel.redhat.com (8.12.11/8.12.11) with ESMTP id j9871If1021218; Sat, 8 Oct 2005 03:01:19 -0400 To: "Nathan J. Williams" Cc: gdb-patches@sourceware.org Subject: Re: RFA: general prologue analysis framework References: From: Jim Blandy Date: Sat, 08 Oct 2005 07:01:00 -0000 In-Reply-To: (Nathan J. Williams's message of "07 Oct 2005 17:24:58 -0400") Message-ID: User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-SW-Source: 2005-10/txt/msg00066.txt.bz2 "Nathan J. Williams" writes: > Short form: "What about branches?" Unconditional forward branches you just follow. The s390 prologue analyzer does this, because s390 prologues typically include forward branches. (Or they did the last time I played with the toolchain, anyway.) For conditional forward branches: - If the value that the condition depends upon is known, then you can just follow the branch or not as appropriate. - If the condition isn't known, you have to do both: take the branch and don't take the branch --- it's backtracking. For unconditional backward branches, now you've got loops, so things get interesting. - When you hit an instruction you've reached before, compare the values you have now with the values you had the last time. + The registers and stack slots that are still the same, you can leave alone. + For registers or stack slots that have a different value this time around, you've got to make a judgement about how detailed you want your approximation of their values to be. One possibility would be just to mark them as unknown; that is, your analysis would say that any register that had different values at different executions of an instruction was unknown at that instruction. If *all* the values are the same as they were last time, you can stop interpreting: you're not learning anything by continuing, because you've got the same state you had last time. - Now iterate that whole process until you don't get any new information: no registers or stack slots change. Each iteration, if it does anything, makes your values less precise, so you're iterating until you don't lose anything. Conditional backward branches you can treat as if they were a forward conditional over a backward unconditional. What it all adds up to is doing flow analysis, but on machine code instead of source code. Abstract interpretation (that is, running the program with values like pv_t that only approximate the real values the program would be operating on) is just a really nice way of looking at flow analysis. You know your analysis is true to your language because you're just writing an interpreter for the language; all the weirdness is isolated in the values and the branches. The trick to all this is that your approximate values (that is, pv_t) have to be limited. You can't have a fancy pv_t that holds arbitrary-sized sets of values: if you did, then if at every iteration something changes, you never reach the point where you're not learning anything, and your analysis never stops. Your "lattice" --- the hierarchy of pv_t values going from most specific to least specific (pvk_unknown) --- has to have a finite height, and the higher it is, the more times your analysis might have to go around a loop, and the longer your analysis will take to finish. (What I like about my pv_t is that it's really simple, but it still gets you exactly what you want for prologue analysis. Not that it really matters, because I've never handled anything but forward branches.) One wrench in the works is that alloca and C99 variable-sized arrays can give you functions where the sp just isn't known at compile time. As far as prologue analysis is concerned, I think those functions would always need to use an fp, if they ever return at all. But for flow analysis, if you have an ever-expanding set of values you're tracking, like stack slots, that's just as bad as having a pv_t carrying an unbounded amount of detail. I'm not sure this has any practical value, but it's fun to think about. I wonder if there are cool things we could do if we had such detailed information about the machine code we were looking at. Could we step through heavily reordered code better? (Reviewers: NONE OF THIS IS NECESSARY to make effective use of prologue.[ch]; of the three analyzers I've written this way, only one of them even handles forward unconditional branches.)