how could gdb handle truncated core files?

Mirror of the gdb mailing list

* how could gdb handle truncated core files?
@ 2008-08-28 13:55 Jean-Marc Saffroy
  2008-08-29  1:55 ` Paul Koning
  2008-08-29 16:30 ` Paul Pluzhnikov
  0 siblings, 2 replies; 4+ messages in thread
From: Jean-Marc Saffroy @ 2008-08-28 13:55 UTC
  To: gdb

Hi,

For now, gdb does not seem to be able to do anything useful with a 
truncated core file on Linux (ie. what you get when your process dies and 
the core size limit is not 0 but less than the size of the process).

In a number of cases, I think it would be nice to be able to at least get 
a stack trace, and examine local variables. This could require a limited 
amount of data to be dumped by the kernel.

I'm curious what could be done to improve this situation, because I see 
two potential use cases:
  - embedded systems developers: sometimes it's hard to find enough space
to write your core file (eg. the application uses 80% of your RAM, and
your only writable filesystem is a tiny temporary RAM disk)
  - parallel application developers on large clusters: sometimes you use a
huge amount of RAM in a bunch of processes (eg. an MPI parallel program),
and dumping all that on your home directory will fill your disk quota
and/or keep your file server busy for a very long time

In search of a solution, I patched my Linux kernel so that dumping a core 
would start with the segments that hold a stack (assuming user stack 
pointers are valid): thus these segments have a chance of being dumped 
before the core limit is reached.
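
To illustrate the idea of dumping stack segments first, here is a minimal
sketch in Python. It is not the kernel patch itself: the Segment type, the
plan_dump function and the segment names are hypothetical, and the model
ignores the ELF headers and notes that also count against the limit.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str       # hypothetical label, e.g. "stack" or "heap"
    size: int       # size in bytes
    is_stack: bool  # True if the segment holds a thread stack


def plan_dump(segments, limit):
    """Return the names of segments that fit entirely within `limit` bytes.

    Stack segments are moved to the front; the sort is stable, so the
    original order is kept within each group.
    """
    ordered = sorted(segments, key=lambda s: not s.is_stack)
    written = []
    budget = limit
    for seg in ordered:
        if seg.size > budget:
            break  # the dump is truncated from this point on
        written.append(seg.name)
        budget -= seg.size
    return written
```

With a 200-byte limit and segments heap (1000 bytes), stack (100 bytes) and
data (50 bytes) in that order, this keeps only the stack; dumping in the
original order would keep nothing, since the heap alone exceeds the limit.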

This approach gives interesting results with a (very simple) single 
threaded process. However, my attempts with a multithreaded process 
failed, like this:

$ gdb <binary> <core>
GNU gdb 6.8
<snip>
This GDB was configured as "x86_64-unknown-linux-gnu"...
Cannot access memory at address 0x2aaaaabc29c8
(gdb) bt
#0  0x00002aaaaabc9345 in ?? ()
#1  0x00000000400179f0 in ?? ()
#2  0x0000000000000000 in ?? ()

That is:
  - gdb does not load symbols from binaries
  - as a result, gdb does not detect threads (because IIRC libthread_db 
would be loaded when some libpthread.so symbols are detected in the 
process)
  - the backtrace seems incorrect: if I have a "full" core dump, gdb shows 
the following stack trace:

(gdb) bt
#0  0x00002aaaaabc9345 in pthread_create@@GLIBC_2.2.5 ()
    from /lib/libpthread.so.0
#1  0x00000000004005c8 in main (argc=<value optimized out>,
     argv=<value optimized out>) at thrcore.c:24

So, I have the following questions to the community:
  - what can I do (eg. in my kernel patch) to have gdb load symbols from 
binaries?
  - do you have any comment on my approach? (eg. I *think* I've seen AIX 
produce small dumps, but I have no idea how they do it, if it's a special 
file format, etc.)

Thanks for your comments!


Cheers,
Jean-Marc

-- 
saffroy@gmail.com



* Re: how could gdb handle truncated core files?
  2008-08-28 13:55 how could gdb handle truncated core files? Jean-Marc Saffroy
@ 2008-08-29  1:55 ` Paul Koning
  2008-08-29 16:30 ` Paul Pluzhnikov
  1 sibling, 0 replies; 4+ messages in thread
From: Paul Koning @ 2008-08-29  1:55 UTC
  To: saffroy; +Cc: gdb

>>>>> "Jean-Marc" == Jean-Marc Saffroy <saffroy@gmail.com> writes:

 Jean-Marc> Hi, For now, gdb does not seem to be able to do anything
 Jean-Marc> useful with a truncated core file on Linux (ie. what you
 Jean-Marc> get when your process dies and the core size limit is not
 Jean-Marc> 0 but less than the size of the process).

 Jean-Marc> In a number of cases, I think it would be nice to be able
 Jean-Marc> to at least get a stack trace, and examine local
 Jean-Marc> variables. This could require a limited amount of data to
 Jean-Marc> be dumped by the kernel.

I've generally had good success (on a different OS) with partial
corefiles.  As you said, the issue isn't in GDB, the issue is that the
partial corefile has to have the right subset of data in it.

 Jean-Marc> ...In search of a solution, I patched my Linux kernel so that
 Jean-Marc> dumping a core would start with the segments that hold a
 Jean-Marc> stack (assuming user stack pointers are valid): thus these
 Jean-Marc> segments have a chance of being dumped before the core
 Jean-Marc> limit is reached.

 Jean-Marc> This approach gives interesting results with a (very
 Jean-Marc> simple) single threaded process. However, my attempts with
 Jean-Marc> a multithreaded process failed, like this:

 Jean-Marc> $ gdb <binary> <core>
 Jean-Marc> GNU gdb 6.8
 Jean-Marc> <snip>
 Jean-Marc> This GDB was configured as "x86_64-unknown-linux-gnu"...
 Jean-Marc> Cannot access memory at address 0x2aaaaabc29c8
 Jean-Marc> (gdb) bt
 Jean-Marc> #0  0x00002aaaaabc9345 in ?? ()
 Jean-Marc> #1  0x00000000400179f0 in ?? ()
 Jean-Marc> #2  0x0000000000000000 in ?? ()

 Jean-Marc> That is:
 Jean-Marc>   - gdb does not load symbols from binaries
 Jean-Marc>   - as a result, gdb does not detect threads (because IIRC
 Jean-Marc> libthread_db would be loaded when some libpthread.so
 Jean-Marc> symbols are detected in the process)
 Jean-Marc>   - the backtrace seems incorrect: if I have a "full" core
 Jean-Marc> dump, gdb shows the following stack trace:

I'm not particularly familiar with how shared library support works in
Linux.  The address that's giving you trouble is a shared library
address, not an address in your main binary (or its data space).  As a
guess, the problem is that there's an additional bit of critical data
that needs to be in your corefile: the tables that tell GDB what
shared libraries are loaded by the process, and to what addresses.

       paul


* Re: how could gdb handle truncated core files?
  2008-08-28 13:55 how could gdb handle truncated core files? Jean-Marc Saffroy
  2008-08-29  1:55 ` Paul Koning
@ 2008-08-29 16:30 ` Paul Pluzhnikov
  2008-08-29 17:46   ` Jean-Marc Saffroy
  1 sibling, 1 reply; 4+ messages in thread
From: Paul Pluzhnikov @ 2008-08-29 16:30 UTC
  To: Jean-Marc Saffroy; +Cc: gdb

On Wed, Aug 27, 2008 at 8:21 AM, Jean-Marc Saffroy <saffroy@gmail.com> wrote:

> For now, gdb does not seem to be able to do anything useful with a truncated
> core file on Linux (ie. what you get when your process dies and the core
> size limit is not 0 but less than the size of the process).
>
> In a number of cases, I think it would be nice to be able to at least get a
> stack trace, and examine local variables. This could require a limited
> amount of data to be dumped by the kernel.
...
> In search of a solution, I patched my Linux kernel so that dumping a core
> would start with the segments that hold a stack (assuming user stack
> pointers are valid): thus these segments have a chance of being dumped
> before the core limit is reached.

You may also want to look at Google user-space coredumper:
  http://code.google.com/p/google-coredumper/

It is often easier to play with than to boot custom kernels,
and it already has support for prioritisation of what is dumped,
as well as compression of the core (core files are often *extremely*
compressible).
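
As a rough illustration of why cores can compress so well, the sketch below
compresses a synthetic buffer made of one page of random bytes followed by
255 zero-filled pages. This is an assumption-laden stand-in, not a real core
file and not coredumper's own compression code; actual ratios depend on the
process being dumped.

```python
import os
import zlib

PAGE = 4096

# Synthetic stand-in for a core image (an assumption for illustration):
# one page of incompressible data, then 255 untouched, zero-filled pages.
image = os.urandom(PAGE) + bytes(255 * PAGE)

packed = zlib.compress(image)
ratio = len(image) / len(packed)
print(f"{len(image)} bytes -> {len(packed)} bytes (about {ratio:.0f}x smaller)")
```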

> This approach gives interesting results with a (very simple) single threaded
> process. However, my attempts with a multithreaded process failed, like
> this:
>
> $ gdb <binary> <core>
> GNU gdb 6.8
> <snip>
> This GDB was configured as "x86_64-unknown-linux-gnu"...
> Cannot access memory at address 0x2aaaaabc29c8
> (gdb) bt
> #0  0x00002aaaaabc9345 in ?? ()
> #1  0x00000000400179f0 in ?? ()
> #2  0x0000000000000000 in ?? ()
>
> That is:
>  - gdb does not load symbols from binaries

The problem here most likely is that _r_debug.r_map was not found
in the (truncated) core. Without it, GDB can't know which libraries
were loaded, hence can't load unwind info for libpthread, hence
can't produce correct stack trace.
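
One way to see which memory ranges a truncated core has lost is to compare
each PT_LOAD program header against the actual file size. The sketch below
does that in Python. It assumes a little-endian ELF64 core and ignores the
extended program header count (PN_XNUM); the function name
truncated_segments is hypothetical, and this is not how GDB itself performs
the check.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, little-endian
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header
PT_LOAD = 1


def truncated_segments(head: bytes, file_size: int):
    """Return (vaddr, filesz) for each PT_LOAD segment cut short by truncation.

    `head` must hold the ELF header and all program headers; `file_size`
    is the size of the core file on disk.
    """
    fields = EHDR.unpack_from(head, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    missing = []
    for i in range(phnum):
        p_type, _flags, offset, vaddr, _paddr, filesz, _memsz, _align = \
            PHDR.unpack_from(head, phoff + i * phentsize)
        # A segment is incomplete if its file data extends past end of file.
        if p_type == PT_LOAD and offset + filesz > file_size:
            missing.append((vaddr, filesz))
    return missing
```

In use, one would read the start of the core file into head, pass the size
reported by os.path.getsize, and check whether an address GDB complains
about falls inside one of the returned ranges.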

> So, I have the following questions to the community:
>  - what can I do (eg. in my kernel patch) to have gdb load symbols from
> binaries?

You might get better mileage if you dump at least the beginning of
the initial data segment.

>  - do you have any comment on my approach? (eg. I *think* I've seen AIX
> produce small dumps, but I have no idea how they do it, if it's a special
> file format, etc.)

I don't believe AIX has "small" dumps.

AFAIK, they have "regular" dumps (similar to Linux) and "full" dumps,
where full dump includes all the shared libraries, and thus allows
one to examine the core on a developer machine (which may not have
the same version of shared libs as the one used at runtime).

Cheers,
-- 
Paul Pluzhnikov


* Re: how could gdb handle truncated core files?
  2008-08-29 16:30 ` Paul Pluzhnikov
@ 2008-08-29 17:46   ` Jean-Marc Saffroy
  0 siblings, 0 replies; 4+ messages in thread
From: Jean-Marc Saffroy @ 2008-08-29 17:46 UTC
  To: Paul Pluzhnikov; +Cc: gdb

On Wed, 27 Aug 2008, Paul Pluzhnikov wrote:

> You may also want to look at Google user-space coredumper:
>  http://code.google.com/p/google-coredumper/

Cool, this project seems to do what I need, with a limited memory 
footprint! :)

> It is often easier to play with than to boot custom kernels,

I'm not fond of custom kernels either. Should a clean kernel patch have 
sufficed, I would have pushed for its inclusion in the mainline.

> and it already has support for prioritisation of what is dumped,
> as well as compression of the core (core files are often *extremely*
> compressible).

This prioritisation seems to be a simple and efficient way of reducing the 
core size to something usable in the use cases I mentioned.

>>  - gdb does not load symbols from binaries
>
> The problem here most likely is that _r_debug.r_map was not found
> in the (truncated) core. Without it, GDB can't know which libraries
> were loaded, hence can't load unwind info for libpthread, hence
> can't produce correct stack trace.

Indeed, that's certainly the problem! Thanks for pointing that out. It seems
that coredumper's prioritisation works well enough that it does not need
to care about this level of detail directly.

Maybe the kernel could use the same approach, or a separate program could 
trim full core dumps on the fly (see "Piping core dumps" in 
http://lwn.net/Articles/280959/ ), so that linking all applications with 
libcoredumper could be avoided.

But I'm going off-topic for this list. ;)

Cheers,
Jean-Marc

-- 
saffroy@gmail.com
