* how could gdb handle truncated core files?
@ 2008-08-28 13:55 Jean-Marc Saffroy
2008-08-29 1:55 ` Paul Koning
2008-08-29 16:30 ` Paul Pluzhnikov
0 siblings, 2 replies; 4+ messages in thread
From: Jean-Marc Saffroy @ 2008-08-28 13:55 UTC
To: gdb
Hi,
For now, gdb does not seem to be able to do anything useful with a
truncated core file on Linux (ie. what you get when your process dies and
the core size limit is not 0 but less than the size of the process).
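As a rough illustration of what "truncated" means here, the sketch below compares the file extent of each PT_LOAD segment (p_offset + p_filesz) in a core's ELF program headers with the actual size of the file. It assumes a little-endian ELF64 core (as on x86_64) whose headers were written in full before the size limit was hit. The function name `truncated_segments` is made up for this example, and this is not a description of how gdb itself performs the check.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def truncated_segments(core: bytes):
    """Return (vaddr, missing_bytes) for each PT_LOAD segment cut short."""
    # e_ident: magic, then EI_CLASS == 2 (64-bit), EI_DATA == 1 (little-endian)
    if core[:4] != b"\x7fELF" or core[4] != 2 or core[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    ehdr = EHDR.unpack_from(core, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]

    short = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, _memsz, _align) = PHDR.unpack_from(
            core, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        if p_offset + p_filesz > len(core):
            # Bytes of this segment that did make it into the file.
            present = max(0, len(core) - p_offset)
            short.append((p_vaddr, p_filesz - present))
    return short
```

An empty result means every loadable segment is fully present. A non-empty one lists the virtual addresses whose contents are partly or wholly missing from the file. For large cores only the headers need to be read, so a real tool would not load the whole file into memory as this sketch does.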
In a number of cases, I think it would be nice to be able to at least get
a stack trace, and examine local variables. This could require a limited
amount of data to be dumped by the kernel.
I'm curious what could be done to improve this situation, because I see
two potential use cases:
- embedded systems developers: sometimes it's hard to find enough space
to write your core file (eg. the application uses 80% of your RAM, and
your only writable filesystem is a tiny temporary RAM disk)
- parallel application developers on large clusters: sometimes you use a
huge amount of RAM in a bunch of processes (eg. an MPI parallel program),
and dumping all that on your home directory will fill your disk quota
and/or keep your file server busy for a very long time
In search of a solution, I patched my Linux kernel so that dumping a core
would start with the segments that hold a stack (assuming user stack
pointers are valid): thus these segments have a chance of being dumped
before the core limit is reached.
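The ordering idea can be sketched in user space as follows. This is only a model of the approach described above, not the kernel patch itself: `Segment`, `order_for_dump` and `plan_dump` are hypothetical names, and the model ignores the space taken by ELF headers and notes.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # first virtual address of the mapping
    end: int    # one past the last address of the mapping


def order_for_dump(segments, stack_pointers):
    """Return the segments with those holding a stack pointer first."""
    def holds_stack(seg):
        return any(seg.start <= sp < seg.end for sp in stack_pointers)
    # sorted() is stable, so the original order is kept within each group.
    return sorted(segments, key=lambda seg: not holds_stack(seg))


def plan_dump(segments, stack_pointers, limit):
    """Return (segment, bytes_to_write) pairs that fit within `limit` bytes."""
    plan = []
    remaining = limit
    for seg in order_for_dump(segments, stack_pointers):
        if remaining <= 0:
            break
        size = min(seg.end - seg.start, remaining)
        plan.append((seg, size))
        remaining -= size
    return plan
```

With one stack pointer per thread in `stack_pointers`, every thread's stack is planned before any other mapping, so the stacks are the last thing to be cut off by the limit.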
This approach gives interesting results with a (very simple) single
threaded process. However, my attempts with a multithreaded process
failed, like this:
$ gdb <binary> <core>
GNU gdb 6.8
<snip>
This GDB was configured as "x86_64-unknown-linux-gnu"...
Cannot access memory at address 0x2aaaaabc29c8
(gdb) bt
#0 0x00002aaaaabc9345 in ?? ()
#1 0x00000000400179f0 in ?? ()
#2 0x0000000000000000 in ?? ()
That is:
- gdb does not load symbols from binaries
- as a result, gdb does not detect threads (because IIRC libthread_db
would be loaded when some libpthread.so symbols are detected in the
process)
- the backtrace seems incorrect: if I have a "full" core dump, gdb shows
the following stack trace:
(gdb) bt
#0 0x00002aaaaabc9345 in pthread_create@@GLIBC_2.2.5 ()
from /lib/libpthread.so.0
#1 0x00000000004005c8 in main (argc=<value optimized out>,
argv=<value optimized out>) at thrcore.c:24
So, I have the following questions to the community:
- what can I do (eg. in my kernel patch) to have gdb load symbols from
binaries?
- do you have any comment on my approach? (eg. I *think* I've seen AIX
produce small dumps, but I have no idea how they do it, if it's a special
file format, etc.)
Thanks for your comments!
Cheers,
Jean-Marc
--
saffroy@gmail.com
* Re: how could gdb handle truncated core files?
From: Paul Koning @ 2008-08-29 1:55 UTC
To: saffroy; +Cc: gdb
>>>>> "Jean-Marc" == Jean-Marc Saffroy <saffroy@gmail.com> writes:
Jean-Marc> Hi, For now, gdb does not seem to be able to do anything
Jean-Marc> useful with a truncated core file on Linux (ie. what you
Jean-Marc> get when your process dies and the core size limit is not
Jean-Marc> 0 but less than the size of the process).
Jean-Marc> In a number of cases, I think it would be nice to be able
Jean-Marc> to at least get a stack trace, and examine local
Jean-Marc> variables. This could require a limited amount of data to
Jean-Marc> be dumped by the kernel.
I've generally had good success (on a different OS) with partial
corefiles. As you said, the issue isn't in GDB; it is that the
partial corefile has to have the right subset of data in it.
Jean-Marc> ...In search of a solution, I patched my Linux kernel so that
Jean-Marc> dumping a core would start with the segments that hold a
Jean-Marc> stack (assuming user stack pointers are valid): thus these
Jean-Marc> segments have a chance of being dumped before the core
Jean-Marc> limit is reached.
Jean-Marc> This approach gives interesting results with a (very
Jean-Marc> simple) single threaded process. However, my attempts with
Jean-Marc> a multithreaded process failed, like this:
Jean-Marc> $ gdb <binary> <core>
Jean-Marc> GNU gdb 6.8
Jean-Marc> <snip>
Jean-Marc> This GDB was configured as "x86_64-unknown-linux-gnu"...
Jean-Marc> Cannot access memory at address 0x2aaaaabc29c8
Jean-Marc> (gdb) bt
Jean-Marc> #0 0x00002aaaaabc9345 in ?? ()
Jean-Marc> #1 0x00000000400179f0 in ?? ()
Jean-Marc> #2 0x0000000000000000 in ?? ()
Jean-Marc> That is:
Jean-Marc> - gdb does not load symbols from binaries
Jean-Marc> - as a result, gdb does not detect threads (because IIRC
Jean-Marc> libthread_db would be loaded when some libpthread.so
Jean-Marc> symbols are detected in the process)
Jean-Marc> - the backtrace seems incorrect: if I have a "full" core
Jean-Marc> dump, gdb shows the following stack trace:
I'm not particularly familiar with how shared library support works in
Linux. The address that's giving you trouble is a shared library
address, not an address in your main binary (or its data space). As a
guess, the problem is that there's an additional bit of critical data
that needs to be in your corefile: the tables that tell GDB what
shared libraries are loaded by the process, and to what addresses.
paul
* Re: how could gdb handle truncated core files?
From: Paul Pluzhnikov @ 2008-08-29 16:30 UTC
To: Jean-Marc Saffroy; +Cc: gdb
On Wed, Aug 27, 2008 at 8:21 AM, Jean-Marc Saffroy <saffroy@gmail.com> wrote:
> For now, gdb does not seem to be able to do anything useful with a truncated
> core file on Linux (ie. what you get when your process dies and the core
> size limit is not 0 but less than the size of the process).
>
> In a number of cases, I think it would be nice to be able to at least get a
> stack trace, and examine local variables. This could require a limited
> amount of data to be dumped by the kernel.
...
> In search of a solution, I patched my Linux kernel so that dumping a core
> would start with the segments that hold a stack (assuming user stack
> pointers are valid): thus these segments have a chance of being dumped
> before the core limit is reached.
You may also want to look at Google user-space coredumper:
http://code.google.com/p/google-coredumper/
It is often easier to play with than to boot custom kernels,
and it already has support for prioritisation of what is dumped,
as well as compression of the core (core files are often *extremely*
compressible).
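To give a feel for the compressibility claim, here is a small sketch that measures a compression ratio on a synthetic memory image. zlib is used only as a convenient stand-in (nothing here says which compressor coredumper uses), and `compression_ratio` and `image` are names invented for the example. Real cores will compress less well than this mostly-zero buffer.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return the original size divided by the zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


# Synthetic stand-in for process memory: a small patterned "working set"
# followed by a large zero-filled area (e.g. allocated but untouched pages).
image = bytes(range(256)) * 16 + bytes(1 << 20)
ratio = compression_ratio(image)
```

Zero-filled and repetitive pages are what make the ratio large. A process whose memory holds mostly high-entropy data (already compressed or encrypted buffers, say) would see far less benefit.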
> This approach gives interesting results with a (very simple) single threaded
> process. However, my attempts with a multithreaded process failed, like
> this:
>
> $ gdb <binary> <core>
> GNU gdb 6.8
> <snip>
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> Cannot access memory at address 0x2aaaaabc29c8
> (gdb) bt
> #0 0x00002aaaaabc9345 in ?? ()
> #1 0x00000000400179f0 in ?? ()
> #2 0x0000000000000000 in ?? ()
>
> That is:
> - gdb does not load symbols from binaries
The problem here most likely is that _r_debug.r_map was not found
in the (truncated) core. Without it, GDB can't know which libraries
were loaded, hence can't load unwind info for libpthread, hence
can't produce correct stack trace.
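The dependency chain described above can be modelled in a few lines. The sketch below is a toy, not GDB's code: `CoreMemory`, `MissingMemory` and `loaded_libraries` are names invented here, the struct layouts are the x86_64 glibc ones from <link.h> (r_version, padding, r_map; then l_addr, l_name, l_ld, l_next, l_prev), and the address of _r_debug is assumed to be known already.

```python
import struct


class MissingMemory(Exception):
    """Raised when an address is not backed by data in the core."""


class CoreMemory:
    """Toy model of a core: {start_address: bytes} for the dumped ranges."""

    def __init__(self, ranges):
        self.ranges = ranges

    def read(self, addr, size):
        for start, data in self.ranges.items():
            if start <= addr and addr + size <= start + len(data):
                offset = addr - start
                return data[offset:offset + size]
        raise MissingMemory(hex(addr))

    def read_cstring(self, addr, limit=4096):
        out = bytearray()
        while len(out) < limit:
            byte = self.read(addr + len(out), 1)
            if byte == b"\0":
                break
            out += byte
        return out.decode()


def loaded_libraries(mem, r_debug_addr):
    """Walk r_debug.r_map and return (load_address, name) pairs."""
    # struct r_debug: int r_version, 4 bytes of padding, then r_map.
    _version, r_map = struct.unpack("<i4xQ", mem.read(r_debug_addr, 16))
    libs = []
    link = r_map
    while link:
        # struct link_map: l_addr, l_name, l_ld, l_next, l_prev (8 bytes each).
        l_addr, l_name, _l_ld, l_next, _l_prev = struct.unpack(
            "<5Q", mem.read(link, 40))
        libs.append((l_addr, mem.read_cstring(l_name)))
        link = l_next
    return libs
```

If any range holding r_debug, a link_map node, or a name string is absent from the core, the walk stops with `MissingMemory`. That is the toy equivalent of the "Cannot access memory" message quoted above, after which no library symbols or unwind info can be loaded.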
> So, I have the following questions to the community:
> - what can I do (eg. in my kernel patch) to have gdb load symbols from
> binaries?
You might get better mileage if you dump at least the beginning of
the initial data segment.
> - do you have any comment on my approach? (eg. I *think* I've seen AIX
> produce small dumps, but I have no idea how they do it, if it's a special
> file format, etc.)
I don't believe AIX has "small" dumps.
AFAIK, they have "regular" dumps (similar to Linux) and "full" dumps,
where full dump includes all the shared libraries, and thus allows
one to examine the core on a developer machine (which may not have
the same version of shared libs as the one used at runtime).
Cheers,
--
Paul Pluzhnikov
* Re: how could gdb handle truncated core files?
From: Jean-Marc Saffroy @ 2008-08-29 17:46 UTC
To: Paul Pluzhnikov; +Cc: gdb
On Wed, 27 Aug 2008, Paul Pluzhnikov wrote:
> You may also want to look at Google user-space coredumper:
> http://code.google.com/p/google-coredumper/
Cool, this project seems to do what I need, with a limited memory
footprint! :)
> It is often easier to play with than to boot custom kernels,
I'm not fond of custom kernels either. Should a clean kernel patch have
sufficed, I would have pushed for its inclusion in the mainline.
> and it already has support for prioritisation of what is dumped,
> as well as compression of the core (core files are often *extremely*
> compressible).
This prioritisation seems to be a simple and efficient way of reducing the
core size to something usable in the use cases I mentioned.
>> - gdb does not load symbols from binaries
>
> The problem here most likely is that _r_debug.r_map was not found
> in the (truncated) core. Without it, GDB can't know which libraries
> were loaded, hence can't load unwind info for libpthread, hence
> can't produce correct stack trace.
Indeed, that's certainly the problem! Thanks for pointing that out. It seems
that coredumper's prioritisation works well enough that it does not need
to care about this level of detail directly.
Maybe the kernel could use the same approach, or a separate program could
trim full core dumps on the fly (see "Piping core dumps" in
http://lwn.net/Articles/280959/ ), so that linking all applications with
libcoredumper could be avoided.
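The plumbing for such a separate program could look like the sketch below. On Linux, a core_pattern that starts with "|" makes the kernel pipe the core image to the named program's standard input. The helper name `copy_limited` is hypothetical, and this version only caps the number of bytes written. A real trimmer would parse the ELF headers from the stream and choose which segments to keep.

```python
def copy_limited(src, dst, limit, chunk_size=65536):
    """Copy at most `limit` bytes from src to dst; return the bytes written."""
    written = 0
    while written < limit:
        chunk = src.read(min(chunk_size, limit - written))
        if not chunk:
            break  # end of the incoming core image
        dst.write(chunk)
        written += len(chunk)
    return written
```

A handler script would call this with `sys.stdin.buffer` as `src` and an output file opened in binary mode as `dst`. Because the core arrives as a stream, anything smarter than a byte cap has to decide what to keep as the data goes by, or buffer it somewhere first.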
But I'm going off-topic for this list. ;)
Cheers,
Jean-Marc
--
saffroy@gmail.com