Mirror of the gdb mailing list
From: Paul Smith <psmith@gnu.org>
To: Andreas Schwab <schwab@linux-m68k.org>
Cc: Andi Kleen <andi@firstfloor.org>, gdb@sourceware.org
Subject: Re: Partial cores using Linux "pipe" core_pattern
Date: Thu, 21 May 2009 16:32:00 -0000	[thread overview]
Message-ID: <1242923544.29250.134.camel@psmith-ubeta.netezza.com> (raw)
In-Reply-To: <m27i0ejzbi.fsf@igel.home>

On Mon, 2009-05-18 at 15:49 +0200, Andreas Schwab wrote:
> Apparently the ELF core dumper cannot handle short writes (see
> dump_write in fs/binfmt_elf.c).  You should probably use a read buffer
> of at least a page, which is the most the kernel tries to write at
> once.
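
A copy loop tolerant of both short reads and short writes, along the lines Andreas suggests, might look like this (a sketch, not my actual saver; the function name and the 64K buffer size are my own choices):

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything from infd to outfd, tolerating short reads and
 * short writes.  The kernel's ELF dumper hands the pipe at most one
 * page per write, so the read buffer is sized well above a page.
 * Returns total bytes copied, or -1 on error. */
long copy_core(int infd, int outfd) {
    char buf[65536];                    /* well over one page (4K on x86) */
    long total = 0;
    for (;;) {
        ssize_t n = read(infd, buf, sizeof buf);
        if (n == 0)
            return total;               /* EOF: dump complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* retry interrupted reads */
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {               /* handle short writes too */
            ssize_t w = write(outfd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```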

Sorry for the delay; I lost my repro case and it took me a while to find
one.  And now when I dump cores over NFS, the bonding driver is causing
a kernel panic so there's that *sigh*.  I reconfigured my interfaces to
use a single non-bonded interface to avoid that issue and concentrate on
this one... I'll worry about that tomorrow.

I still need to do more investigation but I have more clarity around
when I see these short cores vs. "good" cores.  My system has a single
process and when a request for work comes in it forks (but not execs) a
number of helper copies of itself (typically 8).

In my test, all copies run the same code and so all will segv at around
the same time (I just added code to do an invalid pointer access in
different areas of the program when certain test files exist).
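
A stripped-down sketch of that setup (hypothetical, not my real code): fork several workers without exec'ing and have each one fault at roughly the same time:

```c
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical repro sketch: fork n workers (no exec) that all fault
 * at roughly the same time, like the helpers described above.
 * Returns how many children were killed by SIGSEGV. */
int spawn_faulting_workers(int n) {
    for (int i = 0; i < n; i++) {
        if (fork() == 0) {
            volatile char *bad = (char *)1;  /* invalid pointer */
            *bad = 0;                        /* child segfaults here */
            _exit(0);                        /* not reached */
        }
    }
    int faulted = 0, status;
    for (int i = 0; i < n; i++) {
        wait(&status);                       /* reap each child */
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
            faulted++;
    }
    return faulted;
}
```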

Some areas of the code consider a segv or similar to be unrecoverable.
In those situations I have a signal handler that stops the other
processes in the process group, dumps a single core, then those other
processes do NOT dump core and the whole thing exits.  The cores I get in
this situation are fine.

Other areas of the code consider a segv or similar to be recoverable.
In this case, each worker is left to dump core (or not) on its own, and
the system overall stays up.  When I force a segv in these areas, I get
the short cores.  Note that I am serializing my core dumping program
(the one cores are piped to) via flock() on a file on the local disk, and
this serialization (based on messages to syslog) does seem to be
working.  What I see are 6-8 core dump messages from the kernel, then my
core saver runs on the first one and dumps about 50M of the 1G process
space (about 188 reads of 256K buffers plus some change).  Then that
exits and the second one starts and it dumps a 64K core (1 read), then
the next also dumps 64K etc.
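
The serialization is nothing fancier than an exclusive flock() taken when the saver starts, roughly like this (a sketch; the lock path and function name are illustrative, and the real helper is whatever core_pattern pipes to, e.g. "|/usr/local/bin/core_saver %p"):

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Minimal sketch of serializing concurrent core savers with flock(2).
 * The lock file path is illustrative.  Returns the lock fd, which the
 * caller must keep open for the duration of the dump, or -1 on error. */
int lock_saver(const char *lockpath) {
    int fd = open(lockpath, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {   /* blocks until other savers finish */
        close(fd);
        return -1;
    }
    return fd;                      /* lock released on close/exit */
}
```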

It _feels_ to me like there's some kind of COW or similar mismanagement
of the VM for these forked processes such that they interfere and we
can't get a full and complete core dump when all of them are dumping at
the same time.

I'm going to do more investigation but maybe this rings some bells with
someone.


