From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-32928-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 9199 invoked by alias); 27 Aug 2008 15:22:00 -0000
Received: (qmail 9181 invoked by uid 22791); 27 Aug 2008 15:21:58 -0000
X-Spam-Check-By: sourceware.org
Received: from pas38-1-82-67-71-117.fbx.proxad.net (HELO siegfried.gbfo.org) (82.67.71.117)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Wed, 27 Aug 2008 15:21:26 +0000
Received: from erda.mds (localhost [127.0.0.1]) 	by siegfried.gbfo.org (8.13.6/8.13.6) with ESMTP id m7RFLMnc005508 	for <gdb@sourceware.org>; Wed, 27 Aug 2008 17:21:22 +0200
Received: from localhost (saffroy@localhost) 	by erda.mds (8.13.6/8.13.6/Submit) with ESMTP id m7RFLMKF005505 	for <gdb@sourceware.org>; Wed, 27 Aug 2008 17:21:22 +0200
Date: Thu, 28 Aug 2008 13:55:00 -0000
From: Jean-Marc Saffroy <saffroy@gmail.com>
To: gdb@sourceware.org
Subject: how could gdb handle truncated core files?
Message-ID: <Pine.LNX.4.64.0808261803230.5290@erda.mds>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-IsSubscribed: yes
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2008-08/txt/msg00281.txt.bz2

Hi,

For now, gdb does not seem to be able to do anything useful with a 
truncated core file on Linux (i.e. what you get when your process dies and 
the core size limit is not 0 but less than the size of a full dump).
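
To make "truncated" concrete, here is a minimal sketch of how one might 
check which segments of a core file were cut short, by comparing each 
program header's file extent with the actual file size. It is only an 
illustration, not something gdb does: the function name 
truncated_segments is hypothetical, and the code assumes a little-endian 
ELF64 core (as on x86_64) with fewer than 0xffff program headers.

```python
import struct

# ELF64 file header and program header layouts (little-endian).
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # 64 bytes
ELF64_PHDR = struct.Struct("<IIQQQQQQ")           # 56 bytes


def truncated_segments(data):
    """Return (vaddr, bytes_expected, bytes_present) for every segment
    whose contents are not fully present in `data`."""
    ehdr = ELF64_EHDR.unpack_from(data, 0)
    ident, phoff, phentsize, phnum = ehdr[0], ehdr[5], ehdr[9], ehdr[10]
    # Assumption: ELFCLASS64 (2) and ELFDATA2LSB (1) only.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")

    cut = []
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr,
         p_paddr, p_filesz, p_memsz, p_align) = ELF64_PHDR.unpack_from(
            data, phoff + i * phentsize)
        # How many of the p_filesz bytes actually made it into the file.
        present = max(0, min(p_filesz, len(data) - p_offset))
        if present < p_filesz:
            cut.append((p_vaddr, p_filesz, present))
    return cut
```

Calling truncated_segments(open(path, "rb").read()) on a core file would 
list the address ranges whose contents are missing, which are the ranges 
a debugger cannot read back.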

In a number of cases, I think it would be nice to be able to at least get 
a stack trace, and examine local variables. This should require only a 
limited amount of data to be dumped by the kernel.

I'm curious what could be done to improve this situation, because I see 
two potential use cases:
  - embedded systems developers: sometimes it's hard to find enough space 
to write your core file (e.g. the application uses 80% of your RAM, and 
your only writable filesystem is a tiny temporary RAM disk)
  - parallel application developers on large clusters: sometimes you use a 
huge amount of RAM in a bunch of processes (e.g. an MPI parallel program), 
and dumping all that to your home directory will fill your disk quota 
and/or keep your file server busy for a very long time

In search of a solution, I patched my Linux kernel so that dumping a core 
would start with the segments that hold a stack (assuming user stack 
pointers are valid): thus these segments have a chance of being dumped 
before the core limit is reached.
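
The ordering idea can be modelled in user space as follows. This is a 
sketch under stated assumptions, not the kernel patch itself: the name 
plan_dump and its arguments are hypothetical, segments are plain 
(start, end) address ranges, and the space taken by the ELF header and 
notes is ignored.

```python
def plan_dump(segments, stack_pointers, limit):
    """Order segments so that those containing a stack pointer come
    first, then report how many bytes of each fit under `limit`.

    Returns a list of (start, end, bytes_written) in dump order.
    """
    def holds_stack(seg):
        start, end = seg
        return any(start <= sp < end for sp in stack_pointers)

    # sorted() is stable and False sorts before True, so stack segments
    # move to the front while the other segments keep their order.
    ordered = sorted(segments, key=lambda seg: not holds_stack(seg))

    plan = []
    remaining = limit
    for start, end in ordered:
        written = min(end - start, remaining)
        plan.append((start, end, written))
        remaining -= written
    return plan
```

With one stack pointer per thread in stack_pointers, every thread's stack 
segment is placed ahead of the heap and mapped files, so the stacks are 
the last thing to be lost when the limit is hit.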

This approach gives interesting results with a (very simple) 
single-threaded process. However, my attempts with a multithreaded process 
failed, like this:

$ gdb <binary> <core>
GNU gdb 6.8
<snip>
This GDB was configured as "x86_64-unknown-linux-gnu"...
Cannot access memory at address 0x2aaaaabc29c8
(gdb) bt
#0  0x00002aaaaabc9345 in ?? ()
#1  0x00000000400179f0 in ?? ()
#2  0x0000000000000000 in ?? ()

That is:
  - gdb does not load symbols from binaries
  - as a result, gdb does not detect threads (because IIRC libthread_db 
is only loaded once some libpthread.so symbols are detected in the 
process)
  - the backtrace seems incorrect: with a "full" core dump of the same 
program, gdb shows the following stack trace instead:

(gdb) bt
#0  0x00002aaaaabc9345 in pthread_create@@GLIBC_2.2.5 ()
    from /lib/libpthread.so.0
#1  0x00000000004005c8 in main (argc=<value optimized out>,
     argv=<value optimized out>) at thrcore.c:24

So, I have the following questions for the community:
  - what can I do (e.g. in my kernel patch) to have gdb load symbols from 
binaries?
  - do you have any comments on my approach? (e.g. I *think* I've seen AIX 
produce small dumps, but I have no idea how they do it, if it's a special 
file format, etc.)

Thanks for your comments!


Cheers,
Jean-Marc

-- 
saffroy@gmail.com