From: Stan Shebs
To: Michael Snyder
CC: gdb@sources.redhat.com
Date: Wed, 16 Nov 2005 00:37:00 -0000
Subject: Re: [RFC] a prototype checkpoint-restart using core files
Message-ID: <437A7F5E.6090307@apple.com>
In-Reply-To: <43696953.9090601@cisco.com>

Michael Snyder wrote:

> Folks, this isn't for commit, just for discussion.
>
> Attached is an experimental patch that adds a command
> "restore-core-file" or "rcore", which is the inverse of
> "generate-core-file" (gcore). Instead of copying the
> memory and register state of a process into a file,
> it takes an existing corefile, and copies its memory
> and register state into the child process.

My prototype is even lamer :-) I use target read/write operations
to collect state - but it can step backwards. An improved version
in the works uses vfork() to make core images more cheaply.
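To make the "target read/write" approach concrete, here is a minimal sketch, in Python rather than GDB's C. Everything in it is hypothetical: `ToyTarget` stands in for an inferior with memory regions and registers, and `take_checkpoint` / `restore_checkpoint` are illustrative names, not GDB functions. It captures user state only, which is the limitation discussed in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTarget:
    """Hypothetical stand-in for an inferior: user-visible state only."""
    memory: dict = field(default_factory=dict)     # region start -> bytearray
    registers: dict = field(default_factory=dict)  # register name -> value

    def _locate(self, addr, length):
        # Find the mapped region that fully contains [addr, addr + length).
        for start, region in self.memory.items():
            if start <= addr and addr + length <= start + len(region):
                return start, addr - start
        raise ValueError(f"unmapped range at {addr:#x}")

    def read_memory(self, addr, length):
        start, offset = self._locate(addr, length)
        return bytes(self.memory[start][offset:offset + length])

    def write_memory(self, addr, data):
        start, offset = self._locate(addr, len(data))
        self.memory[start][offset:offset + len(data)] = data


def take_checkpoint(target):
    """Copy every mapped region and all registers (a gcore-like snapshot)."""
    regions = {start: target.read_memory(start, len(region))
               for start, region in target.memory.items()}
    return {"memory": regions, "registers": dict(target.registers)}


def restore_checkpoint(target, checkpoint):
    """Write the saved state back into the live target (an rcore-like restore)."""
    for start, data in checkpoint["memory"].items():
        target.write_memory(start, data)
    target.registers.update(checkpoint["registers"])


target = ToyTarget(memory={0x1000: bytearray(b"hello")},
                   registers={"pc": 0x400})
saved = take_checkpoint(target)

# Run "forward": the program changes memory and advances the PC.
target.write_memory(0x1000, b"HELLO")
target.registers["pc"] = 0x420

# Step "backwards" by restoring the earlier snapshot.
restore_checkpoint(target, saved)
print(target.read_memory(0x1000, 5), hex(target.registers["pc"]))
```

The sketch assumes the memory layout is unchanged between checkpoint and restore. A real restore would also have to cope with regions mapped or unmapped in the meantime.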
>
> The idea was to experiment with the concept of doing
> checkpoint and restore, by using a corefile as the
> checkpoint file. Obviously it has limitations --
> it doesn't save any kernel state, I/O state etc.
> Just user state.
>
> But it turns out that if you avoid those limitations,
> it works! As a conservative rule of thumb, you can
> go back to an earlier state so long as you don't cross
> a system call. And in practice there are lots of
> system calls that can be regarded as "stateless",
> or that change only user state -- so you can cross
> those.

One idea I've considered is getting the OS to set up some kind of
notification at system calls, and then use it to warn the user who
tries to resume the inferior after rolling back. In addition to
obvious corruption issues, you can also get some funky Heisenbugs,
for instance if the code of interest is inside
"if (!file_exists()) {", but a forewarned user can then decide
whether to press on or just rerun.

On shared memory, there's an old Mark Linton paper (1988 debug
workshop I think) where they deal with shared memory and replay
by using the compiler to instrument all memory refs that might
be to shmem, basically adding a test to see if the address is
in shmem and if so, updating the shared memory bits from a saved
version. A hairy solution to a hairy problem...

Stan
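
The system-call warning idea above could look roughly like the following toy model in Python. All names here (`RollbackGuard`, `on_syscall`, `warnings_before_resume`) are hypothetical, not GDB or OS interfaces, and the set of "stateless" calls is only an assumption for illustration; which calls are really safe to cross is a judgment call.

```python
# Assumed classification, for illustration only.
STATELESS_SYSCALLS = {"getpid", "getuid", "gettimeofday"}


class RollbackGuard:
    """Tracks system calls made since a checkpoint (toy model)."""

    def __init__(self):
        self.crossed = []
        self.rolled_back = False

    def checkpoint(self):
        # Start a fresh record at each checkpoint.
        self.crossed.clear()
        self.rolled_back = False

    def on_syscall(self, name):
        # In a real debugger this would be driven by an OS notification.
        self.crossed.append(name)

    def rollback(self):
        # User state is restored elsewhere; kernel state is not.
        self.rolled_back = True

    def warnings_before_resume(self):
        """Return the crossed calls that may leave kernel state inconsistent."""
        if not self.rolled_back:
            return []
        return [name for name in self.crossed
                if name not in STATELESS_SYSCALLS]


guard = RollbackGuard()
guard.checkpoint()
guard.on_syscall("getpid")   # harmless to cross
guard.on_syscall("unlink")   # e.g. the file tested by file_exists() is gone
guard.rollback()

for name in guard.warnings_before_resume():
    print(f"warning: rolled back across {name}(); kernel state not restored")
```

With a warning like this, the user can decide whether to press on or just rerun, without the debugger having to refuse the resume outright.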