* Partial cores using Linux "pipe" core_pattern
From: Paul Smith @ 2009-05-18  1:23 UTC
  To: gdb

I'm not sure this is the best list for this question; if anyone has
thoughts on a better place to ask, please let me know.

I'm having problems debugging some cores generated on a distributed
system.  The "client" (where the cores are dumped) is a cut-down
GNU/Linux system running out of a ramdisk (no local disk).  To preserve
cores I have set up NFS and automount, and I dump cores over the
network to a host.

To make this as efficient as possible I am using the kernel's (I'm
running 2.6.27) pipe capability in core_pattern, piping the core to my
own program that writes compressed output using gzopen() etc.  I have
some other locking etc. to do myself, which is why I have my own
program instead of just piping to gzip.

Most of the time this works great: the core appears on the host and I
can decompress it and debug it, and it's very nice.  But sometimes the
core is truncated and can't be debugged.  Basically it has the first
part of the core file without error (I've seen sizes of both 64K(!)
and about 65M), but the whole last part of the core is missing, so you
can't even get a backtrace.  However, it's still a valid compressed
file (it decompresses just fine), so it's not a network error.

After some experimentation I can determine that the generated core
file does contain all the data that was read from the kernel; in this
situation, it appears, the kernel simply doesn't give me all the data
needed to construct the core.  I've instrumented every single function
with error checking, writing issues to syslog (including informational
messages, so I know the logging works), and no errors are printed.
The size of the core that I get from read(2)'ing stdin is just short,
but read(2) never fails or shows any errors!

Does anyone have any thoughts about where I can look next to try to
figure out what's going on?  Ideas or knowledge about limitations of
the kernel's core_pattern pipe capability, such as timing issues etc.,
that might be leaving me with short cores?  I'm pretty stumped here!

-- 
-------------------------------------------------------------------------------
 Paul D. Smith <psmith@gnu.org>          Find some GNU make tips at:
 http://www.gnu.org                      http://make.mad-scientist.us
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
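For reference, the read side of such a core_pattern pipe helper can be
sketched minimally as below.  This is a hypothetical version, not the
program under discussion; the name drain_fd is invented.  The key
points are that EOF on the pipe is a zero-length read, and that EINTR
must be retried rather than treated as an error.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Drain every byte from 'fd' until EOF, retrying on EINTR.  Returns
 * the total bytes consumed, or -1 on a real read error. */
static ssize_t drain_fd(int fd)
{
    char buf[65536];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)                /* EOF: the writer closed the pipe */
            return total;
        if (n < 0) {
            if (errno == EINTR)    /* interrupted system call: retry */
                continue;
            return -1;             /* genuine error */
        }
        total += n;                /* a real helper would compress/write buf here */
    }
}
```

A helper invoked via core_pattern receives the core image on its
standard input, so it would typically be called as drain_fd(0).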
* Re: Partial cores using Linux "pipe" core_pattern
From: Paul Pluzhnikov @ 2009-05-18  6:05 UTC
  To: psmith; +Cc: gdb

On Sun, May 17, 2009 at 6:22 PM, Paul Smith <psmith@gnu.org> wrote:

> I've instrumented every single function with checking for errors and
> writing issues to syslog (including informational messages so I know
> the logging works) and no errors are printed.  The size of the core
> that I get from read(2)'ing stdin is just short, but read(2) never
> fails or shows any errors!

You can't be read(2)'ing stdin, since read(2) doesn't take a FILE*.
Are you doing read(0, ...) or are you doing fread(..., stdin)?

If the latter, are you handling the feof() vs. ferror() difference
correctly?  A small code sample might help.

Cheers,
-- 
Paul Pluzhnikov
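The distinction being asked about here can be illustrated with a small
sketch (hypothetical code, assuming stdio buffered reads were being
used; the name drain_stream is invented).  fread() returning less than
requested can mean either clean end-of-file or a stream error, and
only ferror()/feof() distinguish the two:

```c
#include <assert.h>
#include <stdio.h>

/* Drain a stdio stream and distinguish the two ways fread() can come
 * up short: clean end-of-file vs. a stream error.  Returns 0 on clean
 * EOF, -1 on error; *total receives the byte count. */
static int drain_stream(FILE *fp, size_t *total)
{
    char buf[4096];
    size_t n;

    *total = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        *total += n;

    if (ferror(fp))     /* the short fread was caused by an error */
        return -1;
    return 0;           /* feof(fp) is set: genuine end of input */
}
```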
* Re: Partial cores using Linux "pipe" core_pattern
From: Paul Smith @ 2009-05-18 13:22 UTC
  To: Paul Pluzhnikov; +Cc: gdb

On Sun, 2009-05-17 at 23:04 -0700, Paul Pluzhnikov wrote:
> You can't be read(2)'ing stdin, since read(2) doesn't take a FILE*.
> Are you doing read(0, ...) or are you doing fread(..., stdin)?
>
> If the latter, are you handling the feof() vs. ferror() difference
> correctly?  A small code sample might help.

Oh, sorry; I should have been explicit.  I use fileno(stdin) to get an
FD.  I'm ONLY using FD-based calls; I don't use FILE*-based calls at
all (on stdin).

Here's my main loop.  I've hacked it a bit for ridiculous safety (for
example, I used to have the errno log inside the "if (l < 0)" but
moved it out of a surfeit of caution).  When I get a short core, the
syslog entry at the end shows the actual short core length, not the
"full" core length--so basically that's all the kernel is giving me.

------------------------------------------------------------
    coref = gzopen(core_path, "wb");
    if (!coref) {
        _vlog(LOG_ERR, "gzopen(%s): %s\n", core_path, _zerr());
        exit(1);
    }

    ssize_t tot = 0;
    while (true) {
        errno = 0;
        ssize_t l = read(infd, buf, bufsize);
        if (errno)
            _vlog(LOG_ERR, "read(stdin): %s\n", strerror(errno));

        // Did we read it all?
        if (l == 0)
            break;

        if (caughtsig) {
            _vlog(LOG_ERR, "savecore got signal %d--continuing\n",
                  caughtsig);
            caughtsig = 0;
        }

        // Did we get an error?
        if (l < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            _clean_and_fail();
        }

        // Write what we got so far
        tot += l;
        char *p = buf;
        while (l > 0) {
            int w = gzwrite(coref, p, l);
            if (w <= 0) {
                _vlog(LOG_ERR, "gzwrite(%s): %s\n", core_path, _zerr());
                _clean_and_fail();
            }
            // Some amount of write succeeded; write the rest
            p += w;
            l -= w;
        }
    }

    if (gzclose(coref) < 0) {
        _vlog(LOG_ERR, "gzclose(%s): %s\n", core_path, _zerr());
        _clean_and_fail();
    }

    // Give some information on the core
    _vlog(LOG_ERR, "%s(%s): coredump complete: core.%s (%lu B)\n",
          argv[av_CMDNAME], argv[av_PID], core_suffix, tot);
------------------------------------------------------------
* Re: Partial cores using Linux "pipe" core_pattern
From: Andi Kleen @ 2009-05-18  7:25 UTC
  To: psmith; +Cc: gdb

Paul Smith <psmith@gnu.org> writes:
> Does anyone have any thoughts about where I can look next to try to
> figure out what's going on?  Ideas or knowledge about limitations of
> the kernel's core_pattern pipe capability, such as timing issues
> etc., that might be leaving me with short cores?

Sounds more like a kernel problem to me.  It would be good if you
could find out in what way the cores are truncated, then file a proper
kernel bug report.

Or add printks to the kernel coredumper and try to figure out that way
where/why it stops.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.
* Re: Partial cores using Linux "pipe" core_pattern
From: Paul Smith @ 2009-05-18 13:29 UTC
  To: Andi Kleen; +Cc: gdb

On Mon, 2009-05-18 at 09:25 +0200, Andi Kleen wrote:
> Sounds more like a kernel problem to me.  It would be good if you
> could find out in what way the cores are truncated, then do a proper
> kernel bug report.
>
> Or add printks to the kernel coredumper and try to figure out this
> way where/why it stops.

Heh, I was afraid you'd say that :-).  OK, I'll dig into the kernel
coredumper and add some printks etc.

I do have problems getting a readily reproducible test case.  It seems
timing related: I tried replacing my gzopen() etc. calls with
uncompressed open() etc. and that seemed to help; no more short cores.
And yet at other times I can dump the core through gzopen() with no
problem.
* Re: Partial cores using Linux "pipe" core_pattern
From: Andreas Schwab @ 2009-05-18 13:49 UTC
  To: psmith; +Cc: Andi Kleen, gdb

Paul Smith <psmith@gnu.org> writes:
> Heh, I was afraid you'd say that :-).  OK, I'll dig into the kernel
> coredumper and add some printks etc.

Apparently the ELF core dumper cannot handle short writes (see
dump_write in fs/binfmt_elf.c).  You should probably use a read buffer
of at least a page, which is the most the kernel tries to write at
once.

Andreas.
-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
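Andreas's point turns on the semantics of read(2) on a pipe: a single
read returns only what is currently buffered, which can be far less
than requested, and such a short read is neither EOF nor an error.  A
tiny self-contained demonstration (hypothetical code; the name
short_read_demo is invented):

```c
#include <assert.h>
#include <unistd.h>

/* Show that a single read(2) from a pipe can return far less than
 * requested even though no error occurred: it returns only what the
 * writer has put there so far (here, one 10-byte chunk). */
static ssize_t short_read_demo(void)
{
    int p[2];
    char buf[65536];

    if (pipe(p) < 0)
        return -1;
    if (write(p[1], "0123456789", 10) != 10)
        return -1;

    /* Ask for 64K; only 10 bytes are available, so we get 10. */
    ssize_t n = read(p[0], buf, sizeof buf);
    close(p[0]);
    close(p[1]);
    return n;
}
```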
* Re: Partial cores using Linux "pipe" core_pattern
From: Paul Smith @ 2009-05-18 14:32 UTC
  To: Andreas Schwab; +Cc: Andi Kleen, gdb

On Mon, 2009-05-18 at 15:49 +0200, Andreas Schwab wrote:
> Apparently the ELF core dumper cannot handle short writes (see
> dump_write in fs/binfmt_elf.c).  You should probably use a read
> buffer of at least a page, which is the most the kernel tries to
> write at once.

This is good info; thanks.  But my read buffer is 256K:

    const int bufsize = 1024 * 256;
    char *buf = (char *)malloc(bufsize);
    if (!buf) {
        _vlog(LOG_ERR, "Failed to allocate read buffer size %d\n",
              bufsize);
        exit(1);
    }

(This is compiled with g++, for hysterical raisins, in case that
wasn't obvious.)  I didn't check to see what the "normal" read sizes
are.
* Re: Partial cores using Linux "pipe" core_pattern
From: Paul Smith @ 2009-05-21 16:32 UTC
  To: Andreas Schwab; +Cc: Andi Kleen, gdb

On Mon, 2009-05-18 at 15:49 +0200, Andreas Schwab wrote:
> Apparently the ELF core dumper cannot handle short writes (see
> dump_write in fs/binfmt_elf.c).  You should probably use a read
> buffer of at least a page, which is the most the kernel tries to
> write at once.

Sorry for the delay; I lost my repro case and it took me a while to
find one.  And now when I dump cores over NFS, the bonding driver is
causing a kernel panic, so there's that *sigh*.  I reconfigured my
interfaces to use a single non-bonded interface to avoid that issue
and concentrate on this one... I'll worry about the other tomorrow.

I still need to do more investigation, but I have more clarity about
when I see these short cores vs. "good" cores.  My system has a single
process, and when a request for work comes in it forks (but does not
exec) a number of helper copies of itself (typically 8).  In my test,
all copies run the same code and so all will segv at around the same
time (I just added code to do an invalid pointer access at different
areas of the program when certain test files exist).

Some areas of the code consider a segv or similar to be unrecoverable.
In those situations I have a signal handler that stops the other
processes in the process group and dumps a single core; the other
processes do NOT dump core, and the whole thing exits.  The cores I
get in this situation are fine.

Other areas of the code consider a segv or similar to be recoverable.
In this case, each worker is left to dump core (or not) on its own,
and the system overall stays up.  When I force a segv in these areas,
I get the short cores.

Note that I am serializing my core dumping program (the one cores are
piped to) via an flock() file on the local disk, and this
serialization (based on messages to syslog) does seem to be working.
What I see are 6-8 core dump messages from the kernel; then my core
saver runs on the first one and dumps about 50M of the 1G process
space (about 188 reads of 256K buffers plus some change).  Then that
exits, the second one starts and dumps a 64K core (one read), then the
next also dumps 64K, etc.

It _feels_ to me like there's some kind of COW or similar
mismanagement of the VM for these forked processes, such that they
interfere and we can't get a full and complete core dump when all of
them are dumping at the same time.  I'm going to do more
investigation, but maybe this rings some bells with someone.
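The flock() serialization described above can be sketched like this
(a hypothetical helper, not the author's code; flock() locks are
advisory and conflict between separate opens of the same file, which
is what lets independent core-saver processes queue up):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Serialize concurrent core savers on a local lock file.  Blocks
 * until the exclusive lock is free; the lock is released
 * automatically when the process exits or closes the fd. */
static int take_core_lock(const char *path)
{
    int fd = open(path, O_CREAT | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```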
* Re: Partial cores using Linux "pipe" core_pattern
From: Paul Smith @ 2009-05-26 19:26 UTC
  To: Andreas Schwab; +Cc: Andi Kleen, gdb

On Thu, 2009-05-21 at 12:32 -0400, Paul Smith wrote:
> It _feels_ to me like there's some kind of COW or similar
> mismanagement of the VM for these forked processes such that they
> interfere and we can't get a full and complete core dump when all of
> them are dumping at the same time.

Well, my feelings were way off.  I did more investigation, and it
turns out that what's happening is that the TIF_SIGPENDING flag is
being set during the core dump.  This causes the write to the pipe to
stop, and the core dumping code makes no effort to handle errors or
partial writes.  Here's the function in binfmt_elf.c that does the
write:

    static int dump_write(struct file *file, const void *addr, int nr)
    {
        return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
    }

If we get back anything other than exactly the number of bytes we
tried to write, we give up and return false (0).  This definitely
returns false when I see the short cores, and never when I see
"normal" cores.

I modified it to see what it's getting back, and file->f_op->write()
is returning -ERESTARTSYS.  So I annotated fs/pipe.c:pipe_write() and
I'm definitely getting it from this code, at line 550 or so:

    if (signal_pending(current)) {
        if (!ret)
            ret = -ERESTARTSYS;
        break;
    }

I've been posting on the linux-kernel mailing list, so this is really
just an FYI to anyone interested in following the progress; you can
find the current end of the thread here:

    http://marc.info/?l=linux-kernel&m=124336093401443&w=2

So far I've failed to gain any interest from anyone on that list, but
hopefully someone will respond who can help me figure out what to do
next.  Cheers!
end of thread, other threads:[~2009-05-26 19:26 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-18  1:23 Partial cores using Linux "pipe" core_pattern Paul Smith
2009-05-18  6:05 ` Paul Pluzhnikov
2009-05-18 13:22   ` Paul Smith
2009-05-18  7:25 ` Andi Kleen
2009-05-18 13:29   ` Paul Smith
2009-05-18 13:49     ` Andreas Schwab
2009-05-18 14:32       ` Paul Smith
2009-05-21 16:32       ` Paul Smith
2009-05-26 19:26         ` Paul Smith