question: python gc doesn't collect buffer allocated by read_memory()

Mirror of the gdb-patches mailing list

* question: python gc doesn't collect buffer allocated by read_memory()
@ 2012-03-19  6:17 HATAYAMA Daisuke
  2012-03-20 20:59 ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-19  6:17 UTC (permalink / raw)
  To: gdb-patches

Hello,

I want to use read_memory() to read the memory of a core file of a
qemu-kvm process that contains KVM guest memory. The problem is that
the garbage collector never collects the memory allocated during
read_memory() processing. So, for a large qemu-kvm core file, doing
this leads to OOM.

Here is an example that is essentially the same as the issue I faced.

1. First, create a core file for testing. The core file has a
4096-byte buffer, which will be read many times afterward.

# cat testpro.c
char buf[4096];

int main(int argc, char **argv)
{
        *(char *)0 = 1;

        return 0;
}

# gcc -g ./testpro.c -o testpro
# ulimit -c unlimited
# ./testpro
Segmentation fault (core dumped)
# ls core.*
core.4301

2. Second, open it with gdb.

# gdb ./testpro ./core.4301
<cut>

3. Run a Python script that reads buf many times, and so allocates
a lot of memory during processing.

(gdb) shell cat testpro.py
import gdb
import gc

i = gdb.inferiors()[0]

buf = gdb.parse_and_eval('buf')

count = 100000
while count >= 0:
    i.read_memory(buf.address, buf.type.sizeof)
    count -= 1
(gdb) shell ps aux | head -n 1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728  0.1  0.7 132560 14752 pts/3    S+   Mar19   0:00 gdb ./testpro ./core.4301
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728 24.8 22.1 542500 424900 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301

So, we see that virtual memory (VSZ) increased from 132560 KB to
542500 KB, and resident memory (RSS) from 14752 KB to 424900 KB.
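
The figures can also be pulled out of the two ps lines mechanically. This is a minimal sketch, assuming the standard `ps aux` column order shown in the header line (VSZ and RSS in the fifth and sixth columns, in KB); the helper name vsz_rss is made up for illustration.

```python
# Sample lines copied from the ps output above.
BEFORE = "root      4728  0.1  0.7 132560 14752 pts/3    S+   Mar19   0:00 gdb ./testpro ./core.4301"
AFTER = "root      4728 24.8 22.1 542500 424900 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301"


def vsz_rss(ps_line):
    """Return (VSZ, RSS) in KB from one line of `ps aux` output."""
    fields = ps_line.split(None, 10)  # the COMMAND column may contain spaces
    return int(fields[4]), int(fields[5])


vsz_before, rss_before = vsz_rss(BEFORE)
vsz_after, rss_after = vsz_rss(AFTER)
vsz_growth = vsz_after - vsz_before
rss_growth = rss_after - rss_before

# The loop runs 100001 times (count goes from 100000 down to 0),
# reading 4096 bytes each time.
data_read_kb = 100001 * 4096 // 1024

print("VSZ growth: %d KB" % vsz_growth)
print("RSS growth: %d KB" % rss_growth)
print("data read:  %d KB" % data_read_kb)
```

Both growth figures come out close to the total amount of data read, which suggests that nearly every buffer is being retained.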

4. Try to trigger the garbage collector, but it appears to have no effect.

(gdb) python gc.collect()
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728 23.1 22.1 542500 424904 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728 22.3 22.1 542500 424904 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728 22.3 22.1 542500 424904 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728 22.2 22.1 542500 424904 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root      4728 22.0 22.1 542500 424904 pts/3   S+   Mar19   0:37 gdb ./testpro ./core.4301

5. Looking at referrers of the buffer returned by read_memory(), they
are all empty [], so it looks OK to me if garbage collector collects
the memory...

(gdb) shell cat testpro2.py
import gdb
import gc

i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')
print gc.get_referrers(i.read_memory(buf.address, buf.type.sizeof))
(gdb) source testpro2.py
[]

So, could anyone point out where I'm wrong?

If there's no alternative, I would like another variant of read_memory()
that receives a buffer as the 3rd argument, like this:

  read_memory(address, length, buffer)
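
To make the request concrete: the calling convention I have in mind resembles readinto() on Python file objects, where the caller owns one buffer and reuses it. The sketch below uses io.BytesIO as a stand-in for the inferior's memory. It is only an analogy; the three-argument read_memory() above is a proposal, not an existing gdb call.

```python
import io

PAGE = 4096
source = io.BytesIO(b"\x00" * PAGE)  # stand-in for the 4096-byte buf
page = bytearray(PAGE)               # allocated once by the caller

total = 0
for _ in range(1000):
    source.seek(0)
    total += source.readinto(page)   # fills the existing buffer in place

print(total)
```

With this style no new buffer object is created per read, so nothing depends on the collector at all.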

Thanks.
HATAYAMA, Daisuke


* Re: question: python gc doesn't collect buffer allocated by read_memory()
  2012-03-19  6:17 question: python gc doesn't collect buffer allocated by read_memory() HATAYAMA Daisuke
@ 2012-03-20 20:59 ` Tom Tromey
  2012-03-21  4:16   ` HATAYAMA Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2012-03-20 20:59 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: gdb-patches

>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:

>> count = 100000
>> while count >= 0:
>>     i.read_memory(buf.address, buf.type.sizeof)
>>     count -= 1

You don't say what version of gdb you are using.
What you are reporting sounds like PR 12533, which was fixed in CVS back
in January.

>> 5. Looking at referrers of the buffer returned by read_memory(), they
>> are all empty [], so it looks OK to me if garbage collector collects
>> the memory...

In 12533 the problem was that intermediate values weren't properly
deallocated.

You could test for this problem by hoisting 'buf.address' out of the
loop and seeing if that has an effect.

Tom



* Re: question: python gc doesn't collect buffer allocated by read_memory()
  2012-03-20 20:59 ` Tom Tromey
@ 2012-03-21  4:16   ` HATAYAMA Daisuke
  2012-03-26 18:56     ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-21  4:16 UTC (permalink / raw)
  To: tromey; +Cc: gdb-patches

Hello Tom,

From: Tom Tromey <tromey@redhat.com>
Subject: Re: question: python gc doesn't collect buffer allocated by read_memory()
Date: Tue, 20 Mar 2012 14:58:46 -0600

>>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
> 
>>> count = 100000
>>> while count >= 0:
>>>     i.read_memory(buf.address, buf.type.sizeof)
>>>     count -= 1
> 
> You don't say what version of gdb you are using.

Sorry for omitting that. I first found this on gdb-7.2-48.el6.x86_64. I
used 7.4 for the demonstration in my first mail.

> What you are reporting sounds like PR 12533, which was fixed in CVS back
> in January.
> 
>>> 5. Looking at referrers of the buffer returned by read_memory(), they
>>> are all empty [], so it looks OK to me if garbage collector collects
>>> the memory...
> 
> In 12533 the problem was that intermediate values weren't properly
> deallocated.
> 
> You could test for this problem by hoisting 'buf.address' out of the
> loop and seeing if that has an effect.
> 
> Tom
> 

I tried today's daily snapshot, but the situation didn't change.

It seems to me that the issue in PR 12533 is different from the issue
here. In the issue here, the increase in memory size is almost identical
to the size of the data read by inferior.read_memory().

Next is a function that reads memory from the core file one page at a
time. Here size is 4096 bytes (4 kB).

(gdb) python
>def read_pages(n):
>  while n >= 0:
>    i.read_memory(addr, size)
>    n -= 1
>end

In the initial state, gdb uses 15600 kB of memory (RSS).

# ps aux | grep testpro | grep -v grep
root     29655  0.0  0.8 234104 15600 pts/2    Ss+  21:57   0:00 /usr/bin/gdb --annotate=3 testpro

After executing python read_pages(100), it changes to 16004 kB, an increase of about 400 kB.

# ps aux | grep testpro | grep -v grep
root     29655  0.0  0.8 234524 16004 pts/2    Ss+  21:57   0:00 /usr/bin/gdb --annotate=3 testpro

Then, after executing python read_pages(1000), it changes to 20060 kB, an increase of about 4000 kB.

# ps aux | grep testpro | grep -v grep
root     29655  0.0  1.0 238624 20060 pts/2    Ss+  21:57   0:00 /usr/bin/gdb --annotate=3 testpro
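
Dividing the growth by the number of reads makes the pattern explicit. This is a small sketch using the RSS values from the ps output above; the helper per_read_kb is invented for illustration, and it relies on read_pages(n) performing n + 1 reads because the loop condition is n >= 0.

```python
# RSS samples in kB, copied from the ps output above.
rss_initial = 15600
rss_after_100 = 16004
rss_after_1000 = 20060


def per_read_kb(rss_before, rss_after, n):
    """Average RSS growth per read; read_pages(n) performs n + 1 reads."""
    return float(rss_after - rss_before) / (n + 1)


first = per_read_kb(rss_initial, rss_after_100, 100)
second = per_read_kb(rss_after_100, rss_after_1000, 1000)

print("%.2f kB per read" % first)
print("%.2f kB per read" % second)
```

Both runs come out at roughly 4 kB per read, i.e. about one page-sized buffer retained per call.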

I'm beginning to think that I don't understand the memory management
principles of gdb's Python support well enough. The behaviour I'm
expecting is as follows.

Assume there's a file file_4KB.txt that contains 4 KB of data, and the
simple script below.

# cat ./endlessread.py
with open('file_4KB.txt') as f:
    while True:
        f.read(4096)
        f.seek(0)

Even while this script is running, the virtual memory it uses never
grows without bound.

# ps aux | grep endless | grep -v grep
root     29831 97.3  0.2 160796  5700 pts/1    R    22:13   0:16 python ./endlessread.py
# ps aux | grep endless | grep -v grep
root     29831 94.9  0.2 160796  5700 pts/1    R    22:13   0:18 python ./endlessread.py
# ps aux | grep endless | grep -v grep
root     29831 94.4  0.2 160796  5700 pts/1    R    22:13   0:18 python ./endlessread.py
# ps aux | grep endless | grep -v grep
root     29831 98.0  0.2 160796  5700 pts/1    R    22:13   0:19 python ./endlessread.py

This is because the string returned by f.read(4096) is not referred to
by any other object, so the periodically invoked gc.collect() collects
each one entirely.

On the other hand, it appears to me that buffer objects returned by
inferior.read_memory() are never collected by gc.collect().

Thanks.
HATAYAMA, Daisuke


* Re: question: python gc doesn't collect buffer allocated by read_memory()
  2012-03-21  4:16   ` HATAYAMA Daisuke
@ 2012-03-26 18:56     ` Tom Tromey
  2012-03-27  1:52       ` HATAYAMA Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2012-03-26 18:56 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: gdb-patches

>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:

>> Sorry for omitting that. I first found this on gdb-7.2-48.el6.x86_64. I
>> used 7.4 for the demonstration in my first mail.

Thanks.

>> On the other hand, it appears to me that buffer objects returned by
>> inferior.read_memory() are never collected by gc.collect().

I think I found the problem.

The issue is that PyBuffer_FromReadWriteObject acquires a reference to
the base object -- but the code in gdb assumed that it stole a reference.
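
The effect of such an unbalanced reference can be reproduced from plain Python. This is only a sketch, assuming CPython and its ctypes module: Membuf, read_leaky and read_fixed are made-up stand-ins, and Py_IncRef here merely simulates a reference that native code acquires on the base object. It is not the gdb code itself.

```python
import ctypes
import gc
import weakref

incref = ctypes.pythonapi.Py_IncRef
decref = ctypes.pythonapi.Py_DecRef
for fn in (incref, decref):
    fn.argtypes = [ctypes.py_object]
    fn.restype = None


class Membuf(object):
    """Stand-in for the object that owns the 4096-byte buffer."""
    def __init__(self, size):
        self.buffer = bytearray(size)


def read_leaky(size):
    obj = Membuf(size)
    incref(obj)               # extra reference that is never released
    return weakref.ref(obj)   # lets us observe whether obj was freed


def read_fixed(size):
    obj = Membuf(size)
    incref(obj)
    decref(obj)               # balanced, like the added Py_DECREF
    return weakref.ref(obj)


leaky = read_leaky(4096)
fixed = read_fixed(4096)
gc.collect()

print(fixed() is None)   # balanced object has been freed
print(leaky() is None)   # leaked object is still alive
```

The leaked object stays alive even after gc.collect(), because the extra count is not held by any Python object the collector can see. That would also be consistent with gc.get_referrers() returning an empty list.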

Could you try the appended patch?
It works for me; if it works for you, I will put it in.
If you can't try it, I'll just assume it is ok and go ahead...

Tom

diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index 339a221..0f5a6a3 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -405,7 +405,7 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
   CORE_ADDR addr, length;
   void *buffer = NULL;
   membuf_object *membuf_obj;
-  PyObject *addr_obj, *length_obj;
+  PyObject *addr_obj, *length_obj, *result;
   struct cleanup *cleanups;
   volatile struct gdb_exception except;
   static char *keywords[] = { "address", "length", NULL };
@@ -457,8 +457,10 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
   membuf_obj->addr = addr;
   membuf_obj->length = length;
 
-  return PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
-				       Py_END_OF_BUFFER);
+  result = PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
+					 Py_END_OF_BUFFER);
+  Py_DECREF (membuf_obj);
+  return result;
 }
 
 /* Implementation of gdb.write_memory (address, buffer [, length]).



* Re: question: python gc doesn't collect buffer allocated by read_memory()
  2012-03-26 18:56     ` Tom Tromey
@ 2012-03-27  1:52       ` HATAYAMA Daisuke
  2012-03-28 17:37         ` Tom Tromey
  0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-27  1:52 UTC (permalink / raw)
  To: tromey; +Cc: gdb-patches

Hello Tom,

From: Tom Tromey <tromey@redhat.com>
Subject: Re: question: python gc doesn't collect buffer allocated by read_memory()
Date: Mon, 26 Mar 2012 10:58:22 -0600

>>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
> 
>>> Sorry for omitting that. I first found this on gdb-7.2-48.el6.x86_64. I
>>> used 7.4 for the demonstration in my first mail.
> 
> Thanks.
> 
>>> On the other hand, it appears to me that buffer objects returned by
>>> inferior.read_memory() are never collected by gc.collect().
> 
> I think I found the problem.
> 
> The issue is that PyBuffer_FromReadWriteObject acquires a reference to
> the base object -- but the code in gdb assumed that it stole a reference.
> 
> Could you try the appended patch?
> It works for me; if it works for you, I will put it in.
> If you can't try it, I'll just assume it is ok and go ahead...
> 
> Tom
> 

Thanks! I was also looking into py-inferior.c, but I had started by
studying the Python gc first, so it would have taken me more time.

I tried your patch using today's daily snapshot of gdb and the same
script, shown below.

import gdb
import gc

i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')

count = 100000
while count >= 0:
    i.read_memory(buf.address, buf.type.sizeof)
    count -= 1
gc.collect()

I ran ps aux to check VSZ/RSS before and after each execution of the
script above, which reads a 4 KB buffer 100000 times, about 390 MB in
total.

[Before]

$ ps aux | head -1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps aux | grep gdb | grep -v grep
hat      28438  1.0  0.3  81936 12340 pts/1    S+   10:15   0:00 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
$ ps aux | grep gdb | grep -v grep
hat      28438  9.3 10.7 491876 422296 pts/1   S+   10:15   0:01 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403

An increase of about 400 MB.

[After]

$ ps aux | head -1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
$ ps aux | grep gdb | grep -v grep
hat      28446  8.6  0.4  86164 16756 pts/1    S+   10:15   0:00 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403
$ ps aux | grep gdb | grep -v grep
hat      28446  1.5  0.5  90928 21444 pts/1    S+   10:15   0:01 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403

An increase of about 4 MB.

So, it seems to me that buffers allocated in infpy_read_memory() are
now collected sanely. I confirm the issue I reported has been fixed.


BTW, there is still an increase of about 4 MB. I ran the script two
more times and saw the same constant increase each time.

$ ps aux | grep gdb | grep -v grep
hat      28446  2.1  0.6  95548 26132 pts/1    S+   10:15   0:02 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403
$ ps aux | grep gdb | grep -v grep
hat      28446  0.7  0.7 100304 30820 pts/1    S+   10:15   0:02 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403

And the increment can be reproduced by simplifying the script to
i.read_memory() only.

(gdb) python
>count=100000
>while count >= 0:
>  i.read_memory(addr, size)
>  count -= 1
>gc.collect()
>end
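
The RSS figures from the fixed gdb can be turned into a per-call number. This is a rough sketch using the values from the ps output above; it only shows the arithmetic and says nothing by itself about which object remains.

```python
# RSS in kB of the patched gdb: before the first run, then after runs 1, 2, 3.
rss_kb = [16756, 21444, 26132, 30820]
READS_PER_RUN = 100001  # count goes from 100000 down to 0

# Growth per run, then average growth per read_memory() call in bytes.
deltas_kb = [after - before for before, after in zip(rss_kb, rss_kb[1:])]
bytes_per_read = [d * 1024.0 / READS_PER_RUN for d in deltas_kb]

print(deltas_kb)
print(["%.1f" % b for b in bytes_per_read])
```

The growth is identical for every run and works out to a few dozen bytes per call, which would be consistent with one small fixed-size allocation remaining per read_memory() call.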

I suspect that some other allocated objects remain uncollected.

Thanks.
HATAYAMA, Daisuke



* Re: question: python gc doesn't collect buffer allocated by read_memory()
  2012-03-27  1:52       ` HATAYAMA Daisuke
@ 2012-03-28 17:37         ` Tom Tromey
  2012-03-29  0:49           ` HATAYAMA Daisuke
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2012-03-28 17:37 UTC (permalink / raw)
  To: HATAYAMA Daisuke; +Cc: gdb-patches

>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:

>> I suspect that some other allocated objects remain uncollected.

Thanks, you are correct.

I used valgrind --tool=massif to find the problem.

TRY_CATCH clears the cleanup chain, so making a cleanup in a TRY_CATCH
and then trying to run or discard it outside the TRY_CATCH will not
work; instead it leaks the cleanup.
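
A toy model may make this easier to picture. The Python below is only an illustration built from the description above: the function names mirror the gdb ones, but the bodies are hypothetical stand-ins, and try_catch simply swaps in an empty chain and restores the saved one on exit.

```python
from contextlib import contextmanager

cleanup_chain = []   # toy stand-in for the global cleanup chain
freed = []           # records which resources were actually released


def make_cleanup(func, arg):
    """Register a cleanup; return a marker for the chain position before it."""
    marker = len(cleanup_chain)
    cleanup_chain.append((func, arg))
    return marker


def do_cleanups(marker):
    """Run and remove every cleanup registered at or after the marker."""
    while len(cleanup_chain) > marker:
        func, arg = cleanup_chain.pop()
        func(arg)


@contextmanager
def try_catch():
    """Model of a block that starts with an empty chain and restores the old one."""
    global cleanup_chain
    saved = cleanup_chain
    cleanup_chain = []
    try:
        yield
    finally:
        cleanup_chain = saved  # cleanups made inside are dropped without running


def null_cleanup(arg):
    pass


# Pattern from the old code: marker made outside, buffer cleanup made inside.
marker = make_cleanup(null_cleanup, None)
with try_catch():
    make_cleanup(freed.append, "buffer")
do_cleanups(marker)
inside_result = list(freed)

# For contrast: a cleanup registered outside the block does run.
marker = make_cleanup(null_cleanup, None)
make_cleanup(freed.append, "buffer")
do_cleanups(marker)
outside_result = list(freed)

print(inside_result)
print(outside_result)
```

In this model the cleanup registered inside the block is never run by the later do_cleanups call, because it was never on the chain that call walks.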

I am checking in the appended patch.  It fixes both leaks.  Verified
with massif.

Tom

2012-03-28  Tom Tromey  <tromey@redhat.com>

	* python/py-inferior.c (infpy_read_memory): Remove cleanups and
	explicitly free 'buffer' on exit paths.  Decref 'membuf_object'
	before returning.

diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index 339a221..06d3272 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -405,8 +405,7 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
   CORE_ADDR addr, length;
   void *buffer = NULL;
   membuf_object *membuf_obj;
-  PyObject *addr_obj, *length_obj;
-  struct cleanup *cleanups;
+  PyObject *addr_obj, *length_obj, *result;
   volatile struct gdb_exception except;
   static char *keywords[] = { "address", "length", NULL };
 
@@ -414,8 +413,6 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
 				     &addr_obj, &length_obj))
     return NULL;
 
-  cleanups = make_cleanup (null_cleanup, NULL);
-
   TRY_CATCH (except, RETURN_MASK_ALL)
     {
       if (!get_addr_from_python (addr_obj, &addr)
@@ -426,39 +423,38 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
 	}
 
       buffer = xmalloc (length);
-      make_cleanup (xfree, buffer);
 
       read_memory (addr, buffer, length);
     }
   if (except.reason < 0)
     {
-      do_cleanups (cleanups);
+      xfree (buffer);
       GDB_PY_HANDLE_EXCEPTION (except);
     }
 
   if (error)
     {
-      do_cleanups (cleanups);
+      xfree (buffer);
       return NULL;
     }
 
   membuf_obj = PyObject_New (membuf_object, &membuf_object_type);
   if (membuf_obj == NULL)
     {
+      xfree (buffer);
       PyErr_SetString (PyExc_MemoryError,
 		       _("Could not allocate memory buffer object."));
-      do_cleanups (cleanups);
       return NULL;
     }
 
-  discard_cleanups (cleanups);
-
   membuf_obj->buffer = buffer;
   membuf_obj->addr = addr;
   membuf_obj->length = length;
 
-  return PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
-				       Py_END_OF_BUFFER);
+  result = PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
+					 Py_END_OF_BUFFER);
+  Py_DECREF (membuf_obj);
+  return result;
 }
 
 /* Implementation of gdb.write_memory (address, buffer [, length]).



* Re: question: python gc doesn't collect buffer allocated by read_memory()
  2012-03-28 17:37         ` Tom Tromey
@ 2012-03-29  0:49           ` HATAYAMA Daisuke
  0 siblings, 0 replies; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-29  0:49 UTC (permalink / raw)
  To: tromey; +Cc: gdb-patches

From: Tom Tromey <tromey@redhat.com>
Subject: Re: question: python gc doesn't collect buffer allocated by read_memory()
Date: Wed, 28 Mar 2012 11:37:40 -0600

>>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
> 
>>> I suspect that some other allocated objects remain uncollected.
> 
> Thanks, you are correct.
> 
> I used valgrind --tool=massif to find the problem.
> 
> TRY_CATCH clears the cleanup chain, so making a cleanup in a TRY_CATCH
> and then trying to run or discard it outside the TRY_CATCH will not
> work; instead it leaks the cleanup.
> 
> I am checking in the appended patch.  It fixes both leaks.  Verified
> with massif.
> 
> Tom
> 
> 2012-03-28  Tom Tromey  <tromey@redhat.com>
> 
> 	* python/py-inferior.c (infpy_read_memory): Remove cleanups and
> 	explicitly free 'buffer' on exit paths.  Decref 'membuf_object'
> 	before returning.
> 

After applying this patch, I no longer see any memory leak.

Again, thanks for your help, Tom.

(gdb) shell cat ./testpro.py
import gdb
import gc

i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')

count = 100000
while count >= 0:
    i.read_memory(buf.address, buf.type.sizeof)
    count -= 1
gc.collect()
(gdb) shell ps aux | head -n 1
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
(gdb) shell ps aux | grep gdb | grep -v grep
hat      28071  0.6  0.3  81964 12364 pts/0    S+   09:41   0:00 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
hat      28071  2.3  0.3  81968 12500 pts/0    S+   09:41   0:00 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
hat      28071  2.7  0.3  81968 12500 pts/0    S+   09:41   0:01 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
hat      28071  2.9  0.3  81968 12500 pts/0    S+   09:41   0:02 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403

Thanks.
HATAYAMA, Daisuke

