* question: python gc doesn't collect buffer allocated by read_memory()
@ 2012-03-19 6:17 HATAYAMA Daisuke
2012-03-20 20:59 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-19 6:17 UTC
To: gdb-patches
Hello,
I want to use read_memory() to read the memory of a core file of a
qemu-kvm process that contains KVM guest memory. The problem is that
the garbage collector never collects the memory allocated during
read_memory() processing. So, for a large qemu-kvm core file, doing
this leads to OOM.
Below is an example that is essentially the same as the issue I faced.
1. First, create a core file for the test. The program has a buffer of
4096 bytes, which is intended to be read many times afterward.
# cat testpro.c
char buf[4096];
int main(int argc, char **argv)
{
        *(char *)0 = 1;
        return 0;
}
# gcc -g ./testpro.c -o testpro
# ulimit -c unlimited
# ./testpro
Segmentation fault (core dumped)
# ls core.*
core.4301
2. Second, open it with gdb.
# gdb ./testpro ./core.4301
<cut>
3. Use a python script that reads buf a lot of times, and so allocates
a lot of memory during the processing.
(gdb) shell cat testpro.py
import gdb
import gc
i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')
count = 100000
while count >= 0:
    i.read_memory(buf.address, buf.type.sizeof)
    count -= 1
(gdb) shell ps aux | head -n 1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 0.1 0.7 132560 14752 pts/3 S+ Mar19 0:00 gdb ./testpro ./core.4301
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 24.8 22.1 542500 424900 pts/3 S+ Mar19 0:37 gdb ./testpro ./core.4301
So, we see that the virtual memory size (VSZ) increased from 132560 KB
to 542500 KB.
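The before/after comparison can also be scripted instead of being read off ps by hand. The following is a minimal sketch, assuming a Linux /proc filesystem; parse_rss_kb and current_rss_kb are hypothetical helper names and are not part of the gdb Python API.

```python
def parse_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     14752 kB".
            return int(line.split()[1])
    return None


def current_rss_kb():
    """Read the resident set size of the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return parse_rss_kb(f.read())
```

Because a sourced script runs inside the gdb process, calling current_rss_kb() before and after the read loop would report gdb's own RSS, which corresponds to the RSS column of ps.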
4. Try to trigger the garbage collector, but it appears to have no effect.
(gdb) python gc.collect()
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 23.1 22.1 542500 424904 pts/3 S+ Mar19 0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 22.3 22.1 542500 424904 pts/3 S+ Mar19 0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 22.3 22.1 542500 424904 pts/3 S+ Mar19 0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 22.2 22.1 542500 424904 pts/3 S+ Mar19 0:37 gdb ./testpro ./core.4301
(gdb) shell ps aux | grep gdb | grep -v grep
root 4728 22.0 22.1 542500 424904 pts/3 S+ Mar19 0:37 gdb ./testpro ./core.4301
5. The list of referrers of the buffer returned by read_memory() is
empty ([]), so it seems to me that the garbage collector should be able
to collect the memory...
(gdb) shell cat testpro2.py
import gdb
import gc
i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')
print gc.get_referrers(i.read_memory(buf.address, buf.type.sizeof))
(gdb) source testpro2.py
[]
So, could anyone point out where I'm wrong?
If there's no alternative, I would like another variant of read_memory()
that receives a buffer as the 3rd argument, like:
read_memory(address, length, buffer)
Thanks.
HATAYAMA, Daisuke
* Re: question: python gc doesn't collect buffer allocated by read_memory()
2012-03-19 6:17 question: python gc doesn't collect buffer allocated by read_memory() HATAYAMA Daisuke
@ 2012-03-20 20:59 ` Tom Tromey
2012-03-21 4:16 ` HATAYAMA Daisuke
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2012-03-20 20:59 UTC
To: HATAYAMA Daisuke; +Cc: gdb-patches
>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>> count = 100000
>> while count >= 0:
>>     i.read_memory(buf.address, buf.type.sizeof)
>>     count -= 1
You don't say what version of gdb you are using.
What you are reporting sounds like PR 12533, which was fixed in CVS back
in January.
>> 5. Looking at referrers of the buffer returned by read_memory(), they
>> are all empty [], so it looks OK to me if garbage collector collects
>> the memory...
In 12533 the problem was that intermediate values weren't properly
deallocated.
You could test for this problem by hoisting 'buf.address' out of the
loop and seeing if that has an effect.
Tom
* Re: question: python gc doesn't collect buffer allocated by read_memory()
2012-03-20 20:59 ` Tom Tromey
@ 2012-03-21 4:16 ` HATAYAMA Daisuke
2012-03-26 18:56 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-21 4:16 UTC
To: tromey; +Cc: gdb-patches
Hello Tom,
From: Tom Tromey <tromey@redhat.com>
Subject: Re: question: python gc doesn't collect buffer allocated by read_memory()
Date: Tue, 20 Mar 2012 14:58:46 -0600
>>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>
>>> count = 100000
>>> while count >= 0:
>>>     i.read_memory(buf.address, buf.type.sizeof)
>>>     count -= 1
>
> You don't say what version of gdb you are using.
Sorry for omitting that. I first found this on gdb-7.2-48.el6.x86_64. I
used 7.4 for the demonstration in the first mail.
> What you are reporting sounds like PR 12533, which was fixed in CVS back
> in January.
>
>>> 5. Looking at referrers of the buffer returned by read_memory(), they
>>> are all empty [], so it looks OK to me if garbage collector collects
>>> the memory...
>
> In 12533 the problem was that intermediate values weren't properly
> deallocated.
>
> You could test for this problem by hoisting 'buf.address' out of the
> loop and seeing if that has an effect.
>
> Tom
>
I tried today's daily update version, but the situation didn't change.
It seems to me that the issue in PR 12533 is different from the issue
here. Here, the increase in memory size is almost identical to the size
of the data read by inferior.read_memory().
The following function reads memory in pages from the core file. Here
size is 4096 bytes.
(gdb) python
>def read_pages(n):
>    while n >= 0:
>        i.read_memory(addr, size)
>        n -= 1
>end
In the initial state, gdb has an RSS of 15600 kB.
# ps aux | grep testpro | grep -v grep
root 29655 0.0 0.8 234104 15600 pts/2 Ss+ 21:57 0:00 /usr/bin/gdb --annotate=3 testpro
After executing python read_pages(100), it changes to 16004 kB. So, it
increased by about 400 kB.
# ps aux | grep testpro | grep -v grep
root 29655 0.0 0.8 234524 16004 pts/2 Ss+ 21:57 0:00 /usr/bin/gdb --annotate=3 testpro
Then, after executing python read_pages(1000), it changes to 20060 kB.
So, it increased by about 4000 kB.
# ps aux | grep testpro | grep -v grep
root 29655 0.0 1.0 238624 20060 pts/2 Ss+ 21:57 0:00 /usr/bin/gdb --annotate=3 testpro
I'm beginning to think that I don't understand the memory management
principles of gdb's Python support well enough.
The behaviour I'm expecting is as follows. Assume there's a file
file_4KB.txt that contains 4 KB of data, and the simple script below.
# cat ./endlessread.py
with open('file_4KB.txt') as f:
    while True:
        f.read(4096)
        f.seek(0)
Even while this script is running, the virtual memory it uses never
grows without limit.
# ps aux | grep endless | grep -v grep
root 29831 97.3 0.2 160796 5700 pts/1 R 22:13 0:16 python ./endlessread.py
# ps aux | grep endless | grep -v grep
root 29831 94.9 0.2 160796 5700 pts/1 R 22:13 0:18 python ./endlessread.py
# ps aux | grep endless | grep -v grep
root 29831 94.4 0.2 160796 5700 pts/1 R 22:13 0:18 python ./endlessread.py
# ps aux | grep endless | grep -v grep
root 29831 98.0 0.2 160796 5700 pts/1 R 22:13 0:19 python ./endlessread.py
This is because the string returned by f.read(4096) is not referred to
by any other object, so the periodically invoked gc.collect() collects
all of them.
On the other hand, it appears to me that the buffer objects returned by
inferior.read_memory() are never collected by gc.collect().
Thanks.
HATAYAMA, Daisuke
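As a side note on the expectation described for endlessread.py: in CPython specifically, an object with no remaining references is normally released by reference counting as soon as its last reference goes away, even before the cyclic collector behind gc.collect() runs. A minimal sketch of that behaviour, assuming CPython; Chunk and read_once are hypothetical names standing in for the data returned by a read call.

```python
import gc
import weakref


class Chunk(object):
    """Hypothetical stand-in for a 4 KB block returned by a read call."""

    def __init__(self, size):
        self.data = bytearray(size)


def read_once(size=4096):
    chunk = Chunk(size)
    # Only a weak reference escapes, so the last strong reference
    # disappears when this function returns.
    return weakref.ref(chunk)


gc.disable()  # rule out the cyclic collector entirely
try:
    ref = read_once()
    freed_without_gc = ref() is None
finally:
    gc.enable()

print("freed without gc.collect():", freed_without_gc)
```

If the same check reported that the object was still alive, something would be holding a reference that the Python-level code cannot see, which is the situation the read_memory() measurements point to.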
* Re: question: python gc doesn't collect buffer allocated by read_memory()
2012-03-21 4:16 ` HATAYAMA Daisuke
@ 2012-03-26 18:56 ` Tom Tromey
2012-03-27 1:52 ` HATAYAMA Daisuke
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2012-03-26 18:56 UTC
To: HATAYAMA Daisuke; +Cc: gdb-patches
>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>> Sorry for missing. I first found this on gdb-7.2-48.el6.x86_64. I used
>> 7.4 in the presentation of the first mail.
Thanks.
>> On the other hand, it appears to me that buffer objects returned by
>> inferior.read_memory() is never collected by gc.collect().
I think I found the problem.
The issue is that PyBuffer_FromReadWriteObject acquires a reference to
the base object -- but the code in gdb assumed that it stole a reference.
Could you try the appended patch?
It works for me; if it works for you, I will put it in.
If you can't try it, I'll just assume it is ok and go ahead...
Tom
diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index 339a221..0f5a6a3 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -405,7 +405,7 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
   CORE_ADDR addr, length;
   void *buffer = NULL;
   membuf_object *membuf_obj;
-  PyObject *addr_obj, *length_obj;
+  PyObject *addr_obj, *length_obj, *result;
   struct cleanup *cleanups;
   volatile struct gdb_exception except;
   static char *keywords[] = { "address", "length", NULL };
@@ -457,8 +457,10 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
   membuf_obj->addr = addr;
   membuf_obj->length = length;
 
-  return PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
-                                       Py_END_OF_BUFFER);
+  result = PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
+                                         Py_END_OF_BUFFER);
+  Py_DECREF (membuf_obj);
+  return result;
 }
 
 /* Implementation of gdb.write_memory (address, buffer [, length]).
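The effect of an extra, never-released reference can be reproduced outside gdb. The sketch below is an illustration under stated assumptions, not gdb code: it assumes CPython, uses ctypes.pythonapi to call Py_IncRef and Py_DecRef (the function forms of Py_INCREF and Py_DECREF), and Base and make_leaked are hypothetical names. It shows that gc.collect() does not free an object whose reference count was raised from C, and that the object is freed once the matching decref happens.

```python
import ctypes
import gc
import weakref

# Function forms of Py_INCREF / Py_DECREF, called through ctypes.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None
_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


class Base(object):
    """Hypothetical stand-in for the object that owns the data."""


def make_leaked():
    obj = Base()
    _incref(obj)              # extra reference that nobody releases
    return weakref.ref(obj)   # lets us observe whether obj is still alive


ref = make_leaked()
gc.collect()
still_alive = ref() is not None    # the extra reference keeps it alive

# Release the extra reference, analogous to the Py_DECREF in the patch.
obj = ref()
_decref(obj)
del obj
freed_after_decref = ref() is None

print("alive after gc.collect():", still_alive)
print("freed after decref:", freed_after_decref)
```

Since the extra reference is held by no Python container, this kind of leak is not something a referrer list would be expected to reveal.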
* Re: question: python gc doesn't collect buffer allocated by read_memory()
2012-03-26 18:56 ` Tom Tromey
@ 2012-03-27 1:52 ` HATAYAMA Daisuke
2012-03-28 17:37 ` Tom Tromey
0 siblings, 1 reply; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-27 1:52 UTC
To: tromey; +Cc: gdb-patches
Hello Tom,
From: Tom Tromey <tromey@redhat.com>
Subject: Re: question: python gc doesn't collect buffer allocated by read_memory()
Date: Mon, 26 Mar 2012 10:58:22 -0600
>>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>
>>> Sorry for missing. I first found this on gdb-7.2-48.el6.x86_64. I used
>>> 7.4 in the presentation of the first mail.
>
> Thanks.
>
>>> On the other hand, it appears to me that buffer objects returned by
>>> inferior.read_memory() is never collected by gc.collect().
>
> I think I found the problem.
>
> The issue is that PyBuffer_FromReadWriteObject acquires a reference to
> the base object -- but the code in gdb assumed that it stole a reference.
>
> Could you try the appended patch?
> It works for me; if it works for you, I will put it in.
> If you can't try it, I'll just assume it is ok and go ahead...
>
> Tom
>
Thanks! I was also looking into py-inferior.c, but I started by
studying Python's gc first, so it would have taken me more time.
I tried your patch using gdb built from today's daily update and the
same script, shown below.
import gdb
import gc
i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')
count = 100000
while count >= 0:
    i.read_memory(buf.address, buf.type.sizeof)
    count -= 1
gc.collect()
I ran ps aux to check VSZ/RSS each time I executed the script above,
which reads a 4 KB buffer 100000 times, so about 390 MB in total.
[Before]
$ ps aux | head -1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$ ps aux | grep gdb | grep -v grep
hat 28438 1.0 0.3 81936 12340 pts/1 S+ 10:15 0:00 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
$ ps aux | grep gdb | grep -v grep
hat 28438 9.3 10.7 491876 422296 pts/1 S+ 10:15 0:01 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
It increased by about 400 MB.
[After]
$ ps aux | head -1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
$ ps aux | grep gdb | grep -v grep
hat 28446 8.6 0.4 86164 16756 pts/1 S+ 10:15 0:00 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403
$ ps aux | grep gdb | grep -v grep
hat 28446 1.5 0.5 90928 21444 pts/1 S+ 10:15 0:01 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403
It increased by about 4 MB.
So, it seems to me that buffers allocated in infpy_read_memory() are
now collected sanely. I confirm the issue I reported has been fixed.
BTW, there is still an increase of about 4 MB per run. I ran it two
more times and saw the same constant increase.
$ ps aux | grep gdb | grep -v grep
hat 28446 2.1 0.6 95548 26132 pts/1 S+ 10:15 0:02 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403
$ ps aux | grep gdb | grep -v grep
hat 28446 0.7 0.7 100304 30820 pts/1 S+ 10:15 0:02 /media/pub/repos/gdb/gdb.fixed ./testpro ./core.27403
The increase can also be reproduced with the script reduced to the
i.read_memory() call alone.
(gdb) python
>count=100000
>while count >= 0:
>    i.read_memory(addr, size)
>    count -= 1
>gc.collect()
>end
I suspect that some other allocated objects remain uncollected.
Thanks.
HATAYAMA, Daisuke
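The RSS figures in this message can be turned into an average growth per read_memory() call, which makes the two cases easier to compare. This is a small worked calculation using only the numbers quoted above; leak_per_call is a hypothetical helper name, and the iteration count of 100001 follows from "while count >= 0" starting at 100000.

```python
ITERATIONS = 100001  # "while count >= 0" starting at 100000


def leak_per_call(rss_before_kb, rss_after_kb, calls=ITERATIONS):
    """Average RSS growth per read_memory() call, in bytes."""
    return (rss_after_kb - rss_before_kb) * 1024.0 / calls


# RSS values taken from the [Before] and [After] ps output.
before_patch = leak_per_call(12340, 422296)
after_patch = leak_per_call(16756, 21444)

print("before patch: %.0f bytes per call" % before_patch)
print("after patch:  %.0f bytes per call" % after_patch)
```

The first figure comes out slightly above the 4096-byte buffer size, while the second is only a few tens of bytes, which is consistent with a small fixed-size allocation still being lost on each call rather than the buffer itself.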
* Re: question: python gc doesn't collect buffer allocated by read_memory()
2012-03-27 1:52 ` HATAYAMA Daisuke
@ 2012-03-28 17:37 ` Tom Tromey
2012-03-29 0:49 ` HATAYAMA Daisuke
0 siblings, 1 reply; 7+ messages in thread
From: Tom Tromey @ 2012-03-28 17:37 UTC
To: HATAYAMA Daisuke; +Cc: gdb-patches
>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>> I suspect another objects allocated remain while not collected.
Thanks, you are correct.
I used valgrind --tool=massif to find the problem.
TRY_CATCH clears the cleanup chain, so making a cleanup in a TRY_CATCH
and then trying to run or discard it outside the TRY_CATCH will not
work; instead it leaks the cleanup.
I am checking in the appended patch. It fixes both leaks. Verified
with massif.
Tom
2012-03-28  Tom Tromey  <tromey@redhat.com>
        * python/py-inferior.c (infpy_read_memory): Remove cleanups and
        explicitly free 'buffer' on exit paths.  Decref 'membuf_object'
        before returning.
diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index 339a221..06d3272 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -405,8 +405,7 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
   CORE_ADDR addr, length;
   void *buffer = NULL;
   membuf_object *membuf_obj;
-  PyObject *addr_obj, *length_obj;
-  struct cleanup *cleanups;
+  PyObject *addr_obj, *length_obj, *result;
   volatile struct gdb_exception except;
   static char *keywords[] = { "address", "length", NULL };
 
@@ -414,8 +413,6 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
                                      &addr_obj, &length_obj))
     return NULL;
 
-  cleanups = make_cleanup (null_cleanup, NULL);
-
   TRY_CATCH (except, RETURN_MASK_ALL)
     {
       if (!get_addr_from_python (addr_obj, &addr)
@@ -426,39 +423,38 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
         }
 
       buffer = xmalloc (length);
-      make_cleanup (xfree, buffer);
 
       read_memory (addr, buffer, length);
     }
   if (except.reason < 0)
     {
-      do_cleanups (cleanups);
+      xfree (buffer);
       GDB_PY_HANDLE_EXCEPTION (except);
     }
 
   if (error)
     {
-      do_cleanups (cleanups);
+      xfree (buffer);
       return NULL;
     }
 
   membuf_obj = PyObject_New (membuf_object, &membuf_object_type);
   if (membuf_obj == NULL)
     {
+      xfree (buffer);
       PyErr_SetString (PyExc_MemoryError,
                        _("Could not allocate memory buffer object."));
-      do_cleanups (cleanups);
       return NULL;
     }
 
-  discard_cleanups (cleanups);
-
   membuf_obj->buffer = buffer;
   membuf_obj->addr = addr;
   membuf_obj->length = length;
 
-  return PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
-                                       Py_END_OF_BUFFER);
+  result = PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj, 0,
+                                         Py_END_OF_BUFFER);
+  Py_DECREF (membuf_obj);
+  return result;
 }
 
 /* Implementation of gdb.write_memory (address, buffer [, length]).
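The cleanup-chain behaviour described in this message can be pictured with a toy model. The following Python sketch is not gdb's implementation; CleanupChain and its methods are hypothetical names chosen to mirror make_cleanup, do_cleanups and TRY_CATCH, and the model assumes only what the message states: the chain is cleared on entry to the block, so entries registered inside are no longer reachable from outside.

```python
from contextlib import contextmanager


class CleanupChain(object):
    """Toy model of a cleanup chain; not gdb's actual implementation."""

    def __init__(self):
        self.chain = []   # registered cleanups, newest last
        self.lost = []    # cleanups dropped without being run or discarded

    def make_cleanup(self, func, arg):
        marker = len(self.chain)       # position to unwind back to
        self.chain.append((func, arg))
        return marker

    def do_cleanups(self, marker):
        # Run and remove every cleanup registered after the marker.
        while len(self.chain) > marker:
            func, arg = self.chain.pop()
            func(arg)

    @contextmanager
    def try_catch(self):
        # Assumption: the block starts with an empty chain and the outer
        # chain is put back afterwards, so inner entries are forgotten.
        saved, self.chain = self.chain, []
        try:
            yield
        finally:
            self.lost.extend(self.chain)
            self.chain = saved


freed = []
chain = CleanupChain()
outer = chain.make_cleanup(lambda arg: None, None)   # like null_cleanup
with chain.try_catch():
    chain.make_cleanup(freed.append, "buffer")        # like (xfree, buffer)
chain.do_cleanups(outer)                              # runs the outer one only

print("freed:", freed)
print("lost cleanups:", len(chain.lost))
```

In this model the cleanup registered inside the block is neither run nor discarded by the outer do_cleanups call, which matches the description of a cleanup record being leaked on every call.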
* Re: question: python gc doesn't collect buffer allocated by read_memory()
2012-03-28 17:37 ` Tom Tromey
@ 2012-03-29 0:49 ` HATAYAMA Daisuke
0 siblings, 0 replies; 7+ messages in thread
From: HATAYAMA Daisuke @ 2012-03-29 0:49 UTC
To: tromey; +Cc: gdb-patches
From: Tom Tromey <tromey@redhat.com>
Subject: Re: question: python gc doesn't collect buffer allocated by read_memory()
Date: Wed, 28 Mar 2012 11:37:40 -0600
>>>>>> ">" == HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> writes:
>
>>> I suspect another objects allocated remain while not collected.
>
> Thanks, you are correct.
>
> I used valgrind --tool=massif to find the problem.
>
> TRY_CATCH clears the cleanup chain, so making a cleanup in a TRY_CATCH
> and then trying to run or discard it outside the TRY_CATCH will not
> work; instead it leaks the cleanup.
>
> I am checking in the appended patch. It fixes both leaks. Verified
> with massif.
>
> Tom
>
> 2012-03-28 Tom Tromey <tromey@redhat.com>
>
> * python/py-inferior.c (infpy_read_memory): Remove cleanups and
> explicitly free 'buffer' on exit paths. Decref 'membuf_object'
> before returning.
>
After applying this patch, I no longer see any memory leak. Again,
thanks for your help, Tom.
(gdb) shell cat ./testpro.py
import gdb
import gc
i = gdb.inferiors()[0]
buf = gdb.parse_and_eval('buf')
count = 100000
while count >= 0:
    i.read_memory(buf.address, buf.type.sizeof)
    count -= 1
gc.collect()
(gdb) shell ps aux | head -n 1
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
(gdb) shell ps aux | grep gdb | grep -v grep
hat 28071 0.6 0.3 81964 12364 pts/0 S+ 09:41 0:00 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
hat 28071 2.3 0.3 81968 12500 pts/0 S+ 09:41 0:00 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
hat 28071 2.7 0.3 81968 12500 pts/0 S+ 09:41 0:01 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
(gdb) source ./testpro.py
(gdb) shell ps aux | grep gdb | grep -v grep
hat 28071 2.9 0.3 81968 12500 pts/0 S+ 09:41 0:02 /media/pub/repos/gdb/gdb/gdb ./testpro ./core.27403
Thanks.
HATAYAMA, Daisuke