From mboxrd@z Thu Jan  1 00:00:00 1970
To: Michael Snyder <msnyder@cygnus.com>
cc: gdb@sources.redhat.com
Subject: Re: [RFC] New gdb command 'gcore'
Reply-To: James Cownie <jcownie@etnus.com>
Date: Thu, 13 Dec 2001 05:56:00 -0000
From: James Cownie <jcownie@etnus.com>
Message-ID: <16EWL3-0eH-00@etnus.com>
X-SW-Source: 2001-12/txt/msg00123.txt.bz2


> The holy grail, of course, would be to then give gdb the ability to
> restart the process from the core file state.  That would give us a
> checkpoint-and-restart capability that very few debuggers have ever
> had.  But that's down the line...

Unfortunately, in general a normal core file does not contain enough
information to allow a process to be restarted, since it omits much of
the kernel-side information which forms part of the process' state.

There are many "fun" issues which arise when trying to implement
checkpointing, such as

1) Open files. What fds are open ? What's the seek position of each ?
   What about pipes ?

2) process id. Does it change between the original process and its
   reincarnation ?

3) parent process id (same question).

4) relationship with child processes (if any). Do you checkpoint the
   whole process group ?

5) network connections. Can you reconstruct them ? What about the
   state of the other end ?

6) time. When the process is reincarnated does it see time passing
   while it was only a checkpoint ?

7) signal handling state. What signal handlers are set up ? What
   signals are blocked ? 

8) State of any timers. Suppose a thread was in a sleep(); when should
   the sleep complete ?

9) State of other potentially long system calls. An accept(), for
   instance, or a read from something which isn't ready.

10) All the other things which didn't come to mind in the three minutes
   it's taken to type this.
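
To make a few of these concrete, here is a rough sketch (Python on a
POSIX system, purely for illustration; nothing here is gdb code, and
snapshot_kernel_state is a made-up name) of asking the kernel for some
of the state in items 1, 2, 3 and 7. None of these values lives in the
process' own memory image, so each has to be queried explicitly.

```python
import os
import signal

def snapshot_kernel_state(fds):
    """Record a few pieces of per-process kernel state (hypothetical helper)."""
    offsets = {}
    for fd in fds:
        try:
            # Ask the kernel for the current seek position without moving it.
            offsets[fd] = os.lseek(fd, 0, os.SEEK_CUR)
        except OSError:
            # Unseekable (pipe, socket) or invalid descriptors have no
            # position to record.
            offsets[fd] = None
    return {
        "pid": os.getpid(),
        "ppid": os.getppid(),
        "offsets": offsets,
        # Blocking an empty set changes nothing and returns the current mask.
        "blocked": signal.pthread_sigmask(signal.SIG_BLOCK, set()),
    }
```

Note that this only records the values; restoring them (getting the
same pid back, say, or reconnecting a pipe) is the hard part and is
not attempted here.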

Of course it's possible to add restrictions to the state a process
must be in before it can be checkpointed. Unfortunately, if you want to
do the checkpoint from gdb it's going to be hard to know whether the
restrictions are satisfied, since you can arbitrarily invoke gcore
between any two machine instructions.
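
As an example of what one such restriction might look like, here is a
sketch (again Python on POSIX, with a made-up function name) that
enforces a hypothetical "no pipes or sockets open" rule. It inspects
the calling process' own descriptors; a debugger would have to obtain
the same facts about the debuggee from outside, which is an assumption
this sketch does not address.

```python
import os
import stat

def checkpoint_blockers(fds):
    """Return (fd, reason) pairs for descriptors that break the rule."""
    reasons = []
    for fd in fds:
        # fstat() reports the file type the kernel holds for this descriptor.
        mode = os.fstat(fd).st_mode
        if stat.S_ISFIFO(mode):
            reasons.append((fd, "pipe or FIFO"))
        elif stat.S_ISSOCK(mode):
            reasons.append((fd, "socket"))
    return reasons
```

An empty result only says this one rule holds at the instant of the
check; it says nothing about timers, signals or a system call that
happens to be in progress.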

It's a nice idea, but I think it's hard :-( (and to do it portably is
_very_ hard).

-- Jim 

James Cownie	<jcownie@etnus.com>
Etnus, LLC.     +44 117 9071438
http://www.etnus.com