Mirror of the gdb mailing list
* Checkpoint-restart with different code
@ 2005-12-09 21:58 Greg Bronevetsky
  2005-12-09 22:08 ` Daniel Jacobowitz
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Bronevetsky @ 2005-12-09 21:58 UTC
  To: gdb

I see that there's been some discussion on this list on checkpointing 
techniques that may be included in gdb. My research group at Cornell is 
working on a number of such checkpointers for both sequential and 
parallel programs and we recently decided to try a more challenging 
variant of checkpointing where the user can take a checkpoint of their 
program, modify their source code a bit (add or remove stack variables,
move function calls around a bit, and a few other things) and then resume
computation using the modified code. This seems to be very useful for
debugging long-running applications, since the user would be able to work
around the bug without losing a week's or month's worth of results (which
can happen in high-performance computing). Similarly, it's useful for
situations where your execution is in some particularly buggy corner 
case and you want to keep making modifications and trying them out 
without having to guide the program's execution back into that corner 
case after every code change.
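
One way to picture why resuming under modified code needs more than a raw memory image: the saved state has to be matched to the new code by name rather than by stack offset. The following is only a minimal sketch of that idea in Python, not anything described in this thread; the function names, the JSON file format, and the `expected`/`defaults` parameters are all assumptions made for illustration.

```python
import json


def take_checkpoint(frame_vars, path):
    """Save local variables keyed by name (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(frame_vars, f)


def restore_checkpoint(path, expected, defaults):
    """Rebuild the locals for the *modified* code (hypothetical helper).

    expected: variable names the new code declares.
    defaults: initial values for names the old code did not have.
    Names saved by the old code but missing from `expected` are dropped.
    """
    with open(path) as f:
        saved = json.load(f)
    restored = {}
    for name in expected:
        if name in saved:
            restored[name] = saved[name]        # variable survived the edit
        elif name in defaults:
            restored[name] = defaults[name]     # variable added by the edit
        else:
            raise KeyError(f"no saved or default value for {name!r}")
    return restored
```

Here a variable removed from the source simply disappears on restore, and a newly added one starts from its default. A real system would also have to map the saved program counter to a location in the new code, which this sketch does not attempt.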

My question is, has anybody heard of anything that can do this? 
Obviously, this kind of checkpointing would require compiler support, so 
gdb wouldn't have done this, but have you heard of any systems/research 
that has addressed this question? Thanks.

-- 
Greg Bronevetsky
490 Rhodes Hall
Cornell University



* Re: Checkpoint-restart with different code
  2005-12-09 21:58 Checkpoint-restart with different code Greg Bronevetsky
@ 2005-12-09 22:08 ` Daniel Jacobowitz
  2005-12-10  1:36   ` Randolph Chung
  2005-12-10 20:34   ` Greg Bronevetsky
  0 siblings, 2 replies; 6+ messages in thread
From: Daniel Jacobowitz @ 2005-12-09 22:08 UTC
  To: Greg Bronevetsky; +Cc: gdb

On Fri, Dec 09, 2005 at 04:57:57PM -0500, Greg Bronevetsky wrote:
> My question is, has anybody heard of anything that can do this? 
> Obviously, this kind of checkpointing would require compiler support, so 
> gdb wouldn't have done this, but have you heard of any systems/research 
> that has addressed this question? Thanks.

Better: I know at least one production debug environment which supports
this - Apple's Xcode.  The option is called fix-and-continue.  I don't
think they combine it with checkpointing, though, only as an action on
a running process.  It's partly compiler-based and partly in their
debug environment.

Merging that with Michael's fork-based code would be fairly
straightforward, I expect.
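
For readers unfamiliar with the fork-based approach, here is a rough Python sketch of the general idea: fork a copy of the process, freeze it, and wake it later to resume from that point. This is not Michael's actual code, which is not shown in this thread; the names `checkpoint`, `restore`, and `discard` are assumptions for illustration, and the sketch relies on POSIX `fork` and job-control signals.

```python
import os
import signal


def checkpoint():
    """Fork a frozen copy of this process.

    Returns the child's pid in the parent. The child stops itself
    immediately; if it is later resumed with SIGCONT, it returns 0.
    """
    pid = os.fork()
    if pid == 0:
        os.kill(os.getpid(), signal.SIGSTOP)  # sleep until restored
        return 0
    os.waitpid(pid, os.WUNTRACED)  # wait until the snapshot has stopped
    return pid


def restore(pid):
    """Wake the snapshot so execution continues from the checkpoint."""
    os.kill(pid, signal.SIGCONT)


def discard(pid):
    """Throw the snapshot away without ever running it."""
    os.kill(pid, signal.SIGKILL)
    os.waitpid(pid, 0)
```

The snapshot carries a copy-on-write image of the parent's memory as of the fork, so any changes the parent makes afterwards are invisible to it. Open files, sockets, and other kernel-side state are shared rather than snapshotted, which is one limitation of the technique.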

-- 
Daniel Jacobowitz
CodeSourcery, LLC



* Re: Checkpoint-restart with different code
  2005-12-09 22:08 ` Daniel Jacobowitz
@ 2005-12-10  1:36   ` Randolph Chung
  2005-12-10 20:34   ` Greg Bronevetsky
  1 sibling, 0 replies; 6+ messages in thread
From: Randolph Chung @ 2005-12-10  1:36 UTC
  To: gdb

>> My question is, has anybody heard of anything that can do this? 
>> Obviously, this kind of checkpointing would require compiler support, so 
>> gdb wouldn't have done this, but have you heard of any systems/research 
>> that has addressed this question? Thanks.
> 
> Better: I know at least one production debug environment which supports
> this - Apple's Xcode.  The option is called fix-and-continue.  I don't
> think they combine it with checkpointing, though, only as an action on
> a running process.  It's partly compiler-based and partly in their
> debug environment.
> 
> Merging that with Michael's fork-based code would be fairly
> straightforward, I expect.

HP's wdb (which is derived from gdb) claims to support this with HP
compilers. I have not tried it though.

http://www.hp.com/go/wdb

randolph



* Re: Checkpoint-restart with different code
  2005-12-09 22:08 ` Daniel Jacobowitz
  2005-12-10  1:36   ` Randolph Chung
@ 2005-12-10 20:34   ` Greg Bronevetsky
  2005-12-11  3:53     ` Daniel Jacobowitz
  1 sibling, 1 reply; 6+ messages in thread
From: Greg Bronevetsky @ 2005-12-10 20:34 UTC
  Cc: gdb

Thanks for your help! I've looked around some more and it looks like a 
number of debuggers provide some form of this functionality. However, 
they all seem to be binary rewriters. Does anybody know of work at the 
compiler level? I can imagine that it would be possible to allow for 
more flexible editing of the code if it could be recompiled from the source.

-- 
                             Greg Bronevetsky




* Re: Checkpoint-restart with different code
  2005-12-10 20:34   ` Greg Bronevetsky
@ 2005-12-11  3:53     ` Daniel Jacobowitz
  0 siblings, 0 replies; 6+ messages in thread
From: Daniel Jacobowitz @ 2005-12-11  3:53 UTC
  To: Greg Bronevetsky; +Cc: gdb

On Sat, Dec 10, 2005 at 03:34:47PM -0500, Greg Bronevetsky wrote:
> Thanks for your help! I've looked around some more and it looks like a 
> number of debuggers provide some form of this functionality. However, 
> they all seem to be binary rewriters. Does anybody know of work at the 
> compiler level? I can imagine that it would be possible to allow for 
> more flexible editing of the code if it could be recompiled from the source.

That's what fix-and-continue is.  The debugger inserts trampolines in
the old function which redirect to the new function.
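
As an analogy only, the redirection can be mimicked in Python by swapping a function's code object, so that every existing reference to the old function runs the new body. A native debugger would instead patch a jump at the old function's entry point; the function names below are hypothetical and this is not how Xcode or gdb implement it.

```python
def compute(x):
    return x * 2          # old version, assumed to contain the bug


def compute_fixed(x):
    return x * 2 + 1      # replacement built from the edited source


def fix_and_continue(old, new):
    """Make all existing references to `old` execute `new`'s code.

    Works only when both functions have the same free variables
    (neither is a closure here).
    """
    old.__code__ = new.__code__
```

After `fix_and_continue(compute, compute_fixed)`, callers that stored a reference to `compute` before the fix pick up the new behaviour on their next call. Invocations that are already executing the old body keep running it until they return.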


-- 
Daniel Jacobowitz
CodeSourcery, LLC



* RE: Checkpoint-restart with different code
@ 2005-12-13 19:58 Prateek Saxena
  0 siblings, 0 replies; 6+ messages in thread
From: Prateek Saxena @ 2005-12-13 19:58 UTC
  To: greg; +Cc: gdb, drow

Hi,

I have a stable kernel patch for Linux 2.6.14.3 and Linux 2.4.31 that
allows a process to mmap any region of the tracee's address space
through the /proc/[pid]/mem file.

It allows the debugger/tracer to read and write all parts of the
traced process's memory. You could use this interface to modify the
code memory being read by the traced process, by simply mmapping the
code/stack/data page into the checkpointer's memory and writing to
it.
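
For comparison, here is a minimal Python sketch of reaching process memory through /proc/[pid]/mem with plain seek, read, and write calls. It does not use the mmap support described above, which comes from the patch and is not assumed to exist on an unpatched kernel. The helper names `peek` and `poke` are made up for this example; accessing another process this way generally requires ptrace-level permission over it, and the sketch is Linux-only.

```python
def peek(pid, address, length):
    """Read `length` bytes at `address` from a process's memory."""
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)


def poke(pid, address, data):
    """Overwrite the bytes at `address` in a process's memory."""
    with open(f"/proc/{pid}/mem", "r+b", buffering=0) as mem:
        mem.seek(address)
        mem.write(data)
```

The simplest way to try it is on the calling process itself, for example with the address of a `ctypes` buffer and `os.getpid()`. The appeal of an mmap-based interface is that the tracer could update pages in place instead of issuing a system call for every read or write.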

I have tested some large tracers with this, and the fsx test suite.
The patches should be available for download in the next week or so. I
can send the link as soon as I upload them.

-- Prateek Saxena
   Department of Computer Science,
   Stony Brook University.



