Subject: Partial cores using Linux "pipe" core_pattern
From: Paul Smith
Reply-To: psmith@gnu.org
To: gdb@sourceware.org
Date: Mon, 18 May 2009 01:23:00 -0000
Message-Id: <1242609756.2800.135.camel@homebase.localnet>

I'm not sure this is the best list for this question; if anyone has other ideas about where to ask, please let me know.

I'm having problems debugging some cores generated on a distributed system. The "client" (where the cores are dumped) runs a cut-down GNU/Linux system out of a ramdisk (no local disk).
To preserve the cores I've set up NFS and automount, and I'm dumping cores over the network to a host. To make this as efficient as possible I'm using the pipe capability of the Linux kernel's core_pattern (I'm running 2.6.27) and piping the core to my own program, which writes compressed output using gzopen() etc. I have some other locking etc. to do myself, which is why I use my own program instead of just piping to gzip.

Most of the time this works great: the core appears on the host, and I can decompress it and debug it with no trouble.

But sometimes the core is truncated and can't be debugged. The file contains the first part of the core intact (I've seen sizes of both 64K(!) and about 65M), but the whole last part is missing, so I can't even get a backtrace. However, it's still a valid compressed file (it decompresses just fine), so it's not a network error.

After some experimentation I've determined that the generated core file does contain all the data that was read from the kernel. In this situation, it appears, the kernel simply doesn't give me all the data needed to construct the core. I've instrumented every function call with error checking and logging to syslog (including informational messages, so I know the logging works), and no errors are reported. The amount of data I get by read(2)ing stdin is simply short, but read(2) never fails or reports an error!

Does anyone have any thoughts about where I can look next to figure out what's going on? Any ideas or knowledge about limitations of the kernel's core_pattern pipe capability (timing issues, etc.) that might leave me with short cores? I'm pretty stumped here!

-- 
-------------------------------------------------------------------------------
 Paul D. Smith          Find some GNU make tips at:
 http://www.gnu.org                      http://make.mad-scientist.us
 "Please remain calm...I may be mad, but I am a professional." --Mad Scientist
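One way to pin down where the data goes missing is to have the pipe handler compare two numbers: how many bytes it actually received before read(2) returned end-of-file, and how large the core's own ELF headers say the file should be. The C sketch below is not the program described in this message. It is an illustration under stated assumptions: a 64-bit ELF core checked on the machine that produced it (same byte order), glibc's <elf.h>, and plain read(2)/write(2) to a file descriptor in place of the gzopen()/gzwrite() output. The function names copy_stream() and expected_core_size() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <elf.h>        /* Elf64_Ehdr, Elf64_Phdr, ELFMAG (glibc/Linux) */
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy in_fd to out_fd until read(2) returns 0 (EOF).
 * Short reads are normal on a pipe, so we loop; EINTR is retried.
 * Returns 0 at EOF, -1 on a read/write error (errno left as set).
 * *total receives the byte count so the caller can log it. */
static int copy_stream(int in_fd, int out_fd, unsigned long long *total)
{
    char buf[65536];
    *total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return 0;                   /* writer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        *total += (unsigned long long)n;
        /* write(2) may be partial too, so push out the whole chunk. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Hypothetical helper: given the first `len` bytes of a core, return the
 * size the complete file should have, i.e. the largest p_offset + p_filesz
 * over all program headers. ASSUMES an ELF64 core in host byte order and
 * no extended numbering (e_phnum == PN_XNUM is not handled).
 * Returns 0 if the header is unusable or the table is not fully in `head`. */
static uint64_t expected_core_size(const unsigned char *head, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return 0;
    memcpy(&eh, head, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_type != ET_CORE ||
        eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > len)
        return 0;
    uint64_t end = 0;
    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t at = eh.e_phoff + (uint64_t)i * sizeof(Elf64_Phdr);
        if (at > len || len - at < sizeof(Elf64_Phdr))
            return 0;
        Elf64_Phdr ph;
        memcpy(&ph, head + at, sizeof ph);
        if (ph.p_offset + ph.p_filesz > end)
            end = ph.p_offset + ph.p_filesz;
    }
    return end;
}
```

In a core_pattern handler, main() could copy from STDIN_FILENO, keep the first chunk it reads, and syslog() both the byte total and expected_core_size() of that chunk. A total smaller than the expected size, with no read(2) error along the way, would show that the handler saw end-of-file before all the data the core's own headers describe.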