* Partial cores using Linux "pipe" core_pattern
@ 2009-05-18 1:23 Paul Smith
2009-05-18 6:05 ` Paul Pluzhnikov
2009-05-18 7:25 ` Andi Kleen
0 siblings, 2 replies; 9+ messages in thread
From: Paul Smith @ 2009-05-18 1:23 UTC (permalink / raw)
To: gdb
I'm not sure this is the best list for this question; if anyone has
thoughts on a better place to ask, please let me know.
I'm having problems debugging some cores being generated on a
distributed system. The "client" (where the cores are being dumped) is
running on a cut-down GNU/Linux system, running out of a ramdisk (no
local disk). To preserve cores I have set up NFS and automount, and I'm
dumping cores over the network to a host. In order to make this as
efficient as possible I am using the Linux (I'm running 2.6.27) kernel's
pipe capability in the core_pattern and piping it to my own program to
write compressed output using gzopen()/etc. I have some other locking,
etc. to do myself which is why I have my own program instead of just
piping to gzip.
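For context, wiring up a pipe handler like this looks roughly as follows
(the handler path is hypothetical; see core(5) for the full list of
%-specifiers):

```shell
# Illustrative only: the leading '|' tells the kernel to exec the named
# program and feed the core to its stdin instead of writing a file.
# /usr/local/bin/savecore is a made-up path; %e is the executable name,
# %p the PID of the dumping process.
echo '|/usr/local/bin/savecore %e %p' > /proc/sys/kernel/core_pattern
```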
Most of the time this works great; the core appears on the host and I
can decompress it and debug it and it's very nice.
But sometimes the core is truncated and can't be debugged. The file
contains the first part of the core intact (I've seen sizes of both
64K(!) and about 65M), but the whole last part of the core is missing,
so you can't even get a backtrace. However, it's still a valid
compressed file (it decompresses just fine), so it's not a network
error. After some experimentation I can determine that the generated
core file contains all the data that was read from the kernel; in this
situation, it appears, the kernel simply doesn't give me all the data
needed to construct the core.
I've instrumented every single function with checking for errors and
writing issues to syslog (including informational messages so I know the
logging works) and no errors are printed. The size of the core that I
get from read(2)'ing stdin is just short, but read(2) never fails or
shows any errors!
Does anyone have any thoughts about where I can look next to try to
figure out what's going on? Ideas or knowledge about limitations of the
kernel's core_pattern pipe capability, such as timing issues etc., that
might be leaving me with short cores?
I'm pretty stumped here!
--
-------------------------------------------------------------------------------
Paul D. Smith <psmith@gnu.org> Find some GNU make tips at:
http://www.gnu.org http://make.mad-scientist.us
"Please remain calm...I may be mad, but I am a professional." --Mad Scientist
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 1:23 Partial cores using Linux "pipe" core_pattern Paul Smith
@ 2009-05-18 6:05 ` Paul Pluzhnikov
2009-05-18 13:22 ` Paul Smith
2009-05-18 7:25 ` Andi Kleen
1 sibling, 1 reply; 9+ messages in thread
From: Paul Pluzhnikov @ 2009-05-18 6:05 UTC (permalink / raw)
To: psmith; +Cc: gdb
On Sun, May 17, 2009 at 6:22 PM, Paul Smith <psmith@gnu.org> wrote:
> I've instrumented every single function with checking for errors and
> writing issues to syslog (including informational messages so I know the
> logging works) and no errors are printed. The size of the core that I
> get from read(2)'ing stdin is just short, but read(2) never fails or
> shows any errors!
You can't be read(2)ing stdin, since read(2) doesn't take FILE*.
Are you doing read(0, ...) or are you doing fread(..., stdin) ?
If the latter, are you handling the feof() vs. ferror() difference
correctly? A small code sample might help.
Cheers,
--
Paul Pluzhnikov
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 1:23 Partial cores using Linux "pipe" core_pattern Paul Smith
2009-05-18 6:05 ` Paul Pluzhnikov
@ 2009-05-18 7:25 ` Andi Kleen
2009-05-18 13:29 ` Paul Smith
1 sibling, 1 reply; 9+ messages in thread
From: Andi Kleen @ 2009-05-18 7:25 UTC (permalink / raw)
To: psmith; +Cc: gdb
Paul Smith <psmith@gnu.org> writes:
>
> Does anyone have any thoughts about where I can look next to try to
> figure out what's going on? Ideas or knowledge about limitations of the
> kernel's core_pattern pipe capability, such as timing issues etc., that
> might be leaving me with short cores?
Sounds more like a kernel problem to me. It would be good if you
could find out in what way the cores are truncated, then do a proper
kernel bug report.
Or add printks to the kernel coredumper and try to figure out this
way where/why it stops.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 6:05 ` Paul Pluzhnikov
@ 2009-05-18 13:22 ` Paul Smith
0 siblings, 0 replies; 9+ messages in thread
From: Paul Smith @ 2009-05-18 13:22 UTC (permalink / raw)
To: Paul Pluzhnikov; +Cc: gdb
On Sun, 2009-05-17 at 23:04 -0700, Paul Pluzhnikov wrote:
> On Sun, May 17, 2009 at 6:22 PM, Paul Smith <psmith@gnu.org> wrote:
>
> > I've instrumented every single function with checking for errors and
> > writing issues to syslog (including informational messages so I know the
> > logging works) and no errors are printed. The size of the core that I
> > get from read(2)'ing stdin is just short, but read(2) never fails or
> > shows any errors!
>
> You can't be read(2)ing stdin, since read(2) doesn't take FILE*.
> Are you doing read(0, ...) or are you doing fread(..., stdin) ?
>
> If the latter, are you handling the feof() vs. ferror() difference
> correctly? A small code sample might help.
Oh sorry; I should have been explicit. I use fileno(stdin) to get an
FD. I'm ONLY using FD-based calls; I don't use FILE*-based calls at all
(on stdin).
Here's my main loop. I've hacked it a bit for ridiculous safety (for
example, the errno log used to be inside the if (l < 0) branch, but I
moved it out of a surfeit of caution). When I get a short core, the
syslog entry at the end shows the actual short length, not the "full"
core length--so basically that's all the kernel is giving me.
------------------------------------------------------------
coref = gzopen(core_path, "wb");
if (!coref) {
    _vlog(LOG_ERR, "gzopen(%s): %s\n", core_path, _zerr());
    exit(1);
}

ssize_t tot = 0;
while (true) {
    errno = 0;
    ssize_t l = read(infd, buf, bufsize);
    if (errno)
        _vlog(LOG_ERR, "read(stdin): %s\n", strerror(errno));

    // Did we read it all?
    if (l == 0)
        break;

    if (caughtsig) {
        _vlog(LOG_ERR, "savecore got signal %d--continuing\n", caughtsig);
        caughtsig = 0;
    }

    // Did we get an error?
    if (l < 0) {
        if (errno == EINTR || errno == EAGAIN)
            continue;
        _clean_and_fail();
    }

    // Write what we got so far
    tot += l;
    char *p = buf;
    while (l > 0) {
        int w = gzwrite(coref, p, l);
        if (w <= 0) {
            _vlog(LOG_ERR, "gzwrite(%s): %s\n", core_path, _zerr());
            _clean_and_fail();
        }
        // Some amount of the write succeeded; write the rest
        p += w;
        l -= w;
    }
}

if (gzclose(coref) < 0) {
    _vlog(LOG_ERR, "gzclose(%s): %s\n", core_path, _zerr());
    _clean_and_fail();
}

// Give some information on the core (%zd, since tot is ssize_t)
_vlog(LOG_ERR, "%s(%s): coredump complete: core.%s (%zd B)\n",
      argv[av_CMDNAME], argv[av_PID], core_suffix, tot);
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 7:25 ` Andi Kleen
@ 2009-05-18 13:29 ` Paul Smith
2009-05-18 13:49 ` Andreas Schwab
0 siblings, 1 reply; 9+ messages in thread
From: Paul Smith @ 2009-05-18 13:29 UTC (permalink / raw)
To: Andi Kleen; +Cc: gdb
On Mon, 2009-05-18 at 09:25 +0200, Andi Kleen wrote:
> Paul Smith <psmith@gnu.org> writes:
> >
> > Does anyone have any thoughts about where I can look next to try to
> > figure out what's going on? Ideas or knowledge about limitations of the
> > kernel's core_pattern pipe capability, such as timing issues etc., that
> > might be leaving me with short cores?
>
> Sounds more like a kernel problem to me. It would be good if you
> could find out in what way the cores are truncated, then do a proper
> kernel bug report.
>
> Or add printks to the kernel coredumper and try to figure out this
> way where/why it stops.
Heh, I was afraid you'd say that :-). OK, I'll dig into the kernel
coredumper and add some printks etc.
I do have problems getting a readily reproducible test case. It seems
timing related; I tried replacing my gzopen()/etc. calls with
uncompressed open()/etc. and that seemed to help; no more short cores.
And yet at other times I can dump the core through gzopen() no problem.
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 13:29 ` Paul Smith
@ 2009-05-18 13:49 ` Andreas Schwab
2009-05-18 14:32 ` Paul Smith
2009-05-21 16:32 ` Paul Smith
0 siblings, 2 replies; 9+ messages in thread
From: Andreas Schwab @ 2009-05-18 13:49 UTC (permalink / raw)
To: psmith; +Cc: Andi Kleen, gdb
Paul Smith <psmith@gnu.org> writes:
> On Mon, 2009-05-18 at 09:25 +0200, Andi Kleen wrote:
>> Paul Smith <psmith@gnu.org> writes:
>> >
>> > Does anyone have any thoughts about where I can look next to try to
>> > figure out what's going on? Ideas or knowledge about limitations of the
>> > kernel's core_pattern pipe capability, such as timing issues etc., that
>> > might be leaving me with short cores?
>>
>> Sounds more like a kernel problem to me. It would be good if you
>> could find out in what way the cores are truncated, then do a proper
>> kernel bug report.
>>
>> Or add printks to the kernel coredumper and try to figure out this
>> way where/why it stops.
>
> Heh, I was afraid you'd say that :-). OK, I'll dig into the kernel
> coredumper and add some printks etc.
Apparently the ELF core dumper cannot handle short writes (see
dump_write in fs/binfmt_elf.c). You should probably use a read buffer
of at least a page, which is the most the kernel tries to write at once.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 13:49 ` Andreas Schwab
@ 2009-05-18 14:32 ` Paul Smith
2009-05-21 16:32 ` Paul Smith
1 sibling, 0 replies; 9+ messages in thread
From: Paul Smith @ 2009-05-18 14:32 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Andi Kleen, gdb
On Mon, 2009-05-18 at 15:49 +0200, Andreas Schwab wrote:
> Apparently the ELF core dumper cannot handle short writes (see
> dump_write in fs/binfmt_elf.c). You should probably use a read buffer
> of at least a page, which is the most the kernel tries to write at once.
This is good info; thanks. But, my read buffer is 256K:
const int bufsize = 1024 * 256;
char *buf = (char *)malloc(bufsize);
if (!buf) {
    _vlog(LOG_ERR, "Failed to allocate read buffer size %d\n", bufsize);
    exit(1);
}
(this is compiled with g++ for hysterical raisins, in case that wasn't
obvious). I didn't check to see what the "normal" read sizes are.
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-18 13:49 ` Andreas Schwab
2009-05-18 14:32 ` Paul Smith
@ 2009-05-21 16:32 ` Paul Smith
2009-05-26 19:26 ` Paul Smith
1 sibling, 1 reply; 9+ messages in thread
From: Paul Smith @ 2009-05-21 16:32 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Andi Kleen, gdb
On Mon, 2009-05-18 at 15:49 +0200, Andreas Schwab wrote:
> Apparently the ELF core dumper cannot handle short writes (see
> dump_write in fs/binfmt_elf.c). You should probably use a read buffer
> of at least a page, which is the most the kernel tries to write at
> once.
Sorry for the delay; I lost my repro case and it took me a while to find
one. And now when I dump cores over NFS, the bonding driver is causing
a kernel panic so there's that *sigh*. I reconfigured my interfaces to
use a single non-bonded interface to avoid that issue and concentrate on
this one... I'll worry about that tomorrow.
I still need to do more investigation but I have more clarity around
when I see these short cores vs. "good" cores. My system has a single
process and when a request for work comes in it forks (but not execs) a
number of helper copies of itself (typically 8).
In my test, all copies run the same code and so all will segv at around
the same time (I just added code to do an invalid pointer access at
different points in the program when certain test files exist).
Some areas of the code consider a segv or similar to be unrecoverable.
In those situations I have a signal handler that stops the other
processes in the process group and dumps a single core; the other
processes do NOT dump core, and the whole thing exits. The cores I get
in this situation are fine.
Other areas of the code consider a segv or similar to be recoverable.
In this case, each worker is left to dump core (or not) on its own, and
the system overall stays up. When I force a segv in these areas, I get
the short cores. Note that I am serializing my core dumping program
(the one cores are piped to) via an flock() file on the local disk, and
this serialization (based on messages to syslog) does seem to be
working. What I see are 6-8 core dump messages from the kernel, then my
core saver runs on the first one and dumps about 50M of the 1G process
space (about 188 reads of 256K buffers plus some change). Then that
exits and the second one starts and it dumps a 64K core (1 read), then
the next also dumps 64K etc.
It _feels_ to me like there's some kind of COW or similar mismanagement
of the VM for these forked processes such that they interfere and we
can't get a full and complete core dump when all of them are dumping at
the same time.
I'm going to do more investigation but maybe this rings some bells with
someone.
* Re: Partial cores using Linux "pipe" core_pattern
2009-05-21 16:32 ` Paul Smith
@ 2009-05-26 19:26 ` Paul Smith
0 siblings, 0 replies; 9+ messages in thread
From: Paul Smith @ 2009-05-26 19:26 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Andi Kleen, gdb
On Thu, 2009-05-21 at 12:32 -0400, Paul Smith wrote:
> It _feels_ to me like there's some kind of COW or similar mismanagement
> of the VM for these forked processes such that they interfere and we
> can't get a full and complete core dump when all of them are dumping at
> the same time.
Well, my feelings were way off. I did more investigation and it turns
out that what's happening is that the TIF_SIGPENDING flag is being set
during the core dump. This causes the write to the pipe to stop, and
the core dumping code makes no effort to manage errors or partial
writes. Here's the function in binfmt_elf.c that does the write:
static int dump_write(struct file *file, const void *addr, int nr)
{
    return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
}
If we get back anything other than exactly the number of bytes we tried
to write, we give up and return false (0).
This definitely returns false when I see the short cores, and never when
I see "normal" cores. I modified it to see what it's getting back, and
file->f_op->write() is returning -ERESTARTSYS. So I annotated
fs/pipe.c:pipe_write() and I'm definitely getting it from this code, at
line 550 or so:
if (signal_pending(current)) {
    if (!ret)
        ret = -ERESTARTSYS;
    break;
}
I've been posting on the linux-kernel mailing list, so this is really
just an FYI to anyone interested in following the progress; you can find
the current end of the thread here:
http://marc.info/?l=linux-kernel&m=124336093401443&w=2
So far I've failed to gain any interest from anyone on that list, but
hopefully someone who can help me figure out what to do next will
respond.
Cheers!