Mirror of the gdb mailing list
* Partial cores using Linux "pipe" core_pattern
@ 2009-05-18  1:23 Paul Smith
  2009-05-18  6:05 ` Paul Pluzhnikov
  2009-05-18  7:25 ` Andi Kleen
From: Paul Smith @ 2009-05-18  1:23 UTC
  To: gdb

I'm not sure this is the best list for this question; if anyone has a
better suggestion for where to ask, please let me know.

I'm having problems debugging some cores being generated on a
distributed system.  The "client" (where the cores are being dumped) is
running on a cut-down GNU/Linux system, running out of a ramdisk (no
local disk).  To preserve cores I have set up NFS and automount, and I'm
dumping cores over the network to a host.  In order to make this as
efficient as possible I am using the Linux (I'm running 2.6.27) kernel's
pipe capability in the core_pattern and piping it to my own program to
write compressed output using gzopen()/etc.  I have some other locking,
etc. to do myself which is why I have my own program instead of just
piping to gzip.
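
The handler described above boils down to a read-until-EOF loop feeding a
compressor. The following is a minimal sketch of that loop, written in
Python with the standard gzip module rather than the C program using
gzopen() that the post describes. The function name copy_compressed, the
CHUNK_SIZE value, and the returned counters are illustrative assumptions,
not part of the original program; the locking mentioned in the post is
omitted.

```python
import gzip

CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, not from the original post


def copy_compressed(src, dst_path, chunk_size=CHUNK_SIZE):
    """Copy a binary stream into a gzip file.

    Returns (bytes_read, read_calls) so the caller can log exactly how much
    data arrived before end-of-file was seen.
    """
    total = 0
    calls = 0
    with gzip.open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                # An empty read means EOF: the writing side closed the pipe.
                # A short core looks exactly like this, with no error raised.
                break
            total += len(chunk)
            calls += 1
            dst.write(chunk)
    return total, calls
```

In a real handler, src would be sys.stdin.buffer, since the kernel supplies
the core image on the handler's standard input. Logging the returned byte
count gives a figure that can be compared against the size the core ought
to have.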

Most of the time this works great; the core appears on the host and I
can decompress it and debug it and it's very nice.

But sometimes the core is truncated and can't be debugged.  It contains
the first part of the core file, intact (I've seen truncated sizes of
both 64K(!) and about 65M), but with the rest of the core missing you
can't even get a backtrace.  The output is still a valid compressed
file (it decompresses just fine), so it's not a network error.  After
some experimentation I've determined that the generated core file does
contain all the data that was read from the kernel; in this situation,
it appears, the kernel simply doesn't give me all the data needed to
construct the core.
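
One way to confirm that a given core is short, and by how much, is to
compare its actual size with the size implied by its own ELF program
headers, which sit at the start of the file and so survive truncation.
The sketch below assumes a standard ELF32 or ELF64 core; the function name
expected_core_size is hypothetical, and the extended-numbering case for
cores with more than 65534 segments is not handled.

```python
import struct


def expected_core_size(header_bytes):
    """Return the minimum file size implied by ELF program headers.

    header_bytes must contain the ELF header and the complete program
    header table; struct.error is raised if it is too short.
    """
    if header_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header_bytes[4], header_bytes[5]
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian

    if ei_class == 2:  # ELFCLASS64
        (e_phoff,) = struct.unpack_from(endian + "Q", header_bytes, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", header_bytes, 54)
        word, off_pos, size_pos = "Q", 8, 32   # p_offset, p_filesz positions
    elif ei_class == 1:  # ELFCLASS32
        (e_phoff,) = struct.unpack_from(endian + "I", header_bytes, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", header_bytes, 42)
        word, off_pos, size_pos = "I", 4, 16   # p_offset, p_filesz positions
    else:
        raise ValueError("unknown ELF class")

    # The file must at least hold the program header table itself.
    end = e_phoff + e_phentsize * e_phnum
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_offset,) = struct.unpack_from(endian + word, header_bytes, base + off_pos)
        (p_filesz,) = struct.unpack_from(endian + word, header_bytes, base + size_pos)
        # Each segment occupies [p_offset, p_offset + p_filesz) in the file.
        end = max(end, p_offset + p_filesz)
    return end
```

If the decompressed core is smaller than the value returned here, the
difference is the amount of data that never reached the handler.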

I've instrumented every single function to check for errors and log any
problems to syslog (including informational messages so I know the
logging works), and no errors are logged.  The amount of core data I
get from read(2)'ing stdin is simply short, but read(2) never fails or
reports an error!

Does anyone have any thoughts about where I can look next to try to
figure out what's going on?  Ideas or knowledge about limitations of the
kernel's core_pattern pipe capability, such as timing issues etc., that
might be leaving me with short cores?

I'm pretty stumped here!

-- 
-------------------------------------------------------------------------------
 Paul D. Smith <psmith@gnu.org>          Find some GNU make tips at:
 http://www.gnu.org                      http://make.mad-scientist.us
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist



Thread overview: 9+ messages
2009-05-18  1:23 Partial cores using Linux "pipe" core_pattern Paul Smith
2009-05-18  6:05 ` Paul Pluzhnikov
2009-05-18 13:22   ` Paul Smith
2009-05-18  7:25 ` Andi Kleen
2009-05-18 13:29   ` Paul Smith
2009-05-18 13:49     ` Andreas Schwab
2009-05-18 14:32       ` Paul Smith
2009-05-21 16:32       ` Paul Smith
2009-05-26 19:26         ` Paul Smith
