Message-ID: <4A368A89.8020706@vmware.com>
Date: Mon, 15 Jun 2009 17:52:00 -0000
From: Michael Snyder
To: Hui Zhu
CC: Eli Zaretskii, "gdb-patches@sourceware.org"
Subject: Re: [Precord RFA/RFC] Check Linux sys_brk release memory in process record and replay.
References: <83tz3ycchv.fsf@gnu.org> <83k54td2el.fsf@gnu.org> <4A342EBB.4020601@vmware.com> <4A3536B8.1090508@vmware.com>

Hui Zhu wrote:
> On Mon, Jun 15, 2009 at 01:43, Michael Snyder wrote:
>> Hui Zhu wrote:
>>> On Sun, Jun 14, 2009 at 06:56, Michael Snyder wrote:
>>>> Hui Zhu wrote:
>>>>> Ping.
>>>> OK, my bad for taking so long to get to this... please allow me
>>>> to summarize the problem, to check my own understanding
>>>> (tell me if I'm wrong).
>>>>
>>> For that "nice people" words.
>>> I just want to make a joke. :)
>>>
>>>> Currently linux-record.c does not know how to "undo" a sys_brk
>>>> system call. You (teawater) are concerned because if the child
>>>> process calls sys_brk to free some memory, we cannot un-free it
>>>> and therefore we may get into trouble by writing to the freed
>>>> memory during replay. Something like this:
>>>>
>>>> 1) child allocates memory X
>>>> 2) child writes to memory X
>>>> 3) child frees memory X
>>>> 4) user asks for reverse-continue
>>>> 5) gdb tries to revert the write that happened in step #2,
>>>> gets SIGSEGV because location has been freed.
>>>>
>>>> So far so good?
>>>>
>>>> Now, your proposal is that during the record mode, we will
>>>> detect any sys_brk call that frees memory, and query the
>>>> user whether to continue or give up.
>>>>
>>>> I'm not too crazy about that solution. I think it's
>>>> awkward, and drastic for a situation that may only be
>>>> a problem later on (or not at all). Let me throw out
>>>> some other ideas:
>>>>
>>>> A) Is it possible to actually "reverse" a sys_brk call?
>>>> Suppose we record the arguments, and when we want to reverse
>>>> it, we just change an increase into a decrease and vice versa?
>>>>
>>>> B) Suppose we wait until an actual memory error occurs
>>>> during replay, and THEN inform the user? It will avoid
>>>> warning him about something that may never happen.
>>>>
>>>> We could use catch_errors to trap the SIGSEGV, and then
>>>> check to see if the error was caused by a write to memory
>>>> above the BRK boundary. You will still need to keep track
>>>> of the BRK boundary, but you won't have that awkward early
>>>> query to deal with.
>>> The sys_brk just can increase and decrease data segment size. The
>>> decrease behavior is very hard to replay.
>> I admit my ignorance in this area, but why is it difficult?
>> In my simple-minded view, if we need to reverse over a sys_brk
>> decrease call, we just make an increase call in the same amount.
>>
>> Please tell me what I am missing.
>>
>
> I think about it again. Maybe let it increase and decrease can handle
> this issue. But call function when infrun is running is a very hard
> job. Because call_function_by_hand will clear a lot of thing of
> inferior.
> We can do it. But it must need a long time to check in.
> So maybe we can give user a warning first.

OK, I can see how it is difficult. Maybe save for later.

For now, maybe we think about which short-term solution is better:
1) warn when sys_brk(<0), or
2) warn when cannot undo a memory write.

Further comment below.

>> During record phase is too early for that notification.
>> The actual failure may never be encountered, especially if
>> the user never tries to reverse past this point in the recording.
>>
>> What I think is that we should wait until we are replaying, and
>> we actually experience a failure to modify freed memory. At that
>> point we tell the user what has happened, and explain that we
>> cannot go back any earlier in the recording.
>>
>> So something like this:
>>
>> 1) Remember the BRK boundary at start (as you do in this patch).
>> 2) Remember the new BRK boundary whenever it changes (as you do).
>> 3) During replay, compare every memory write against the BRK
>> boundary. If the memory write will fail because it is above the
>> BRK boundary, stop and inform the user that we cannot go back
>> any further.
>
> It will make user lost the record entry that before this memory operation.
> And when this thing happen (sys_brk), user doesn't get any alert.

I think that warning when sys_brk(<0) is too early. Nothing is
really wrong at that time -- it only means that something MIGHT
be wrong later, during reverse.

I think the right time to warn is when we actually encounter
a memory write that we cannot undo. At that point, there is
no way to reverse any further, but there is really nothing
we can do about it.
If we warn the user earlier (during record), there is still nothing that he can do about the problem.
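
To make the three quoted steps (remember the BRK boundary at start, update
it on every change, check each write during replay) concrete, here is a
rough sketch in plain C. It is only an illustration under assumptions, not
code from the patch or from linux-record.c: the struct and the three
function names are hypothetical, and it additionally assumes we remember the
highest boundary seen while recording, so that a write "above the BRK
boundary" can be told apart from a write to the stack or some other mapping
that is still valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping; not a structure from linux-record.c.  */
struct brk_tracker
{
  uint64_t current_brk;   /* Program break as last seen while recording.  */
  uint64_t highest_brk;   /* Highest program break seen while recording.  */
};

/* Step 1: remember the BRK boundary when recording starts.  */
static void
brk_tracker_init (struct brk_tracker *t, uint64_t initial_brk)
{
  assert (t != NULL);
  t->current_brk = initial_brk;
  t->highest_brk = initial_brk;
}

/* Step 2: remember the new BRK boundary whenever sys_brk changes it.
   Returns true if the call released memory (the boundary moved down).  */
static bool
brk_tracker_update (struct brk_tracker *t, uint64_t new_brk)
{
  assert (t != NULL);
  bool released = new_brk < t->current_brk;

  t->current_brk = new_brk;
  if (new_brk > t->highest_brk)
    t->highest_brk = new_brk;
  return released;
}

/* Step 3: during replay, decide whether a recorded write of LEN bytes
   at ADDR can still be reverted.  It cannot if it touches the range
   [current_brk, highest_brk), which was heap memory at some point in
   the recording but has since been released.  Address overflow of
   ADDR + LEN is not handled in this sketch.  */
static bool
brk_tracker_write_is_undoable (const struct brk_tracker *t,
                               uint64_t addr, uint64_t len)
{
  assert (t != NULL);
  if (len == 0)
    return true;

  uint64_t end = addr + len;
  bool touches_released = addr < t->highest_brk && end > t->current_brk;

  return !touches_released;
}
```

With this, the five-step scenario from earlier in the thread plays out as
follows: the tracker starts at some boundary, the child raises it (allocates
X), writes inside X, then lowers it again (frees X). When reverse execution
reaches the recorded write, brk_tracker_write_is_undoable returns false, and
that would be the point to stop and tell the user that the recording cannot
be reversed any further. The boolean returned by brk_tracker_update is what
option 1 (warn when sys_brk(<0)) would key off instead.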