From: Michael Elizabeth Chastain
Date: Wed, 04 Dec 2002 20:51:00 -0000
To: fredrik@dolda2000.cjb.net, gdb@sources.redhat.com
Subject: Re: Checking function calls
Message-Id: <200212050451.gB54ppB31800@duracef.shout.net>

Hi Fredrik,

I'm throwing out a bunch of ideas here; take whatever looks useful and
discard the rest.

> Therefore, the failure has to be that a called
> function doesn't restore EBX correctly, on rare occasions, right?

I have seen this happen in a mixed programming environment, with a
Cygwin program that used a Windows DLL. The Windows DLL had subtly
different calling conventions: it did not preserve %ebx, %esi, and
%edi across function calls. Perhaps you have some kind of third-party
library in your program with a similar compatibility issue?

> My question is thus: Is there any way of debugging this with GDB? Can I
> make GDB check that EBX is the same before and after every function call
> from that frame in this thread to isolate the failing function? The
> frame never exits (until the program exits, that is), if that helps.

You could set a bunch of conditional breakpoints with
"break if $ebx != saved_ebx" (GDB spells the register $ebx, not %ebx),
where you add code to your program to initialize saved_ebx. Or you
could say "break if $ebx < 0x1000" or some other convenient constant.

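As a rough sketch of the "initialize saved_ebx" part: assuming, as in
your report, that the compiler keeps "next" in EBX, one way to get a
snapshot without inline assembly is to copy "next" into a global just
before the call. The names struct node, process and walk below are
made up for illustration; only saved_ebx comes from the suggestion
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; the real program's type is not shown in the thread. */
struct node {
    struct node *next;
    int value;
};

/* Global snapshot that a GDB breakpoint condition can read by name. */
volatile uintptr_t saved_ebx;

/* Stand-in for whatever function the real loop calls on each element. */
static void process(struct node *n)
{
    n->value += 1;
}

/* Walks the list and returns the number of nodes visited. */
int walk(struct node *head)
{
    int count = 0;
    struct node *cur, *next;

    for (cur = head; cur != NULL; cur = next) {
        next = cur->next;
        saved_ebx = (uintptr_t)next;   /* snapshot taken before the call */
        process(cur);
        /* Same comparison as the breakpoint condition, done in-program.
           saved_ebx is volatile, so the compiler must reload it here. */
        assert((uintptr_t)next == saved_ebx);
        count++;
    }
    return count;
}
```

With a -g build, a condition such as "break <location> if $ebx !=
saved_ebx", placed on the line right after the call, should stop only
when the register and the snapshot differ. Adding the store can itself
change register allocation, so it is worth checking the disassembly to
confirm that "next" still lives in EBX.
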
You could also try forcing your variable to be on the stack instead of
in a register. Remove the "register" attribute from the declaration of
"next" if you have one. Then add a "do_nothing(&next)" call to your
function, to force "next" to be on the stack instead of in a register.
If the symptoms go away, then it's more likely to really be a register
clobber. If the symptoms remain, then it's more likely to be a memory
clobber (or you have a really sick low-level function that clobbers
random words on the stack, but this does not feel like it).

> At first I was expecting that another thread somehow gets there and
> modifies the storage memory of next.

I still suspect this. It's more likely that memory gets clobbered
rather than a register value. Perhaps you need a function that locks
the whole list and walks it for a sanity check, without deleting
anything?

Here is another wild lead: if, somehow, a block gets freed and then you
read it, many implementations of malloc keep housekeeping information
in the first word or two of a freed block. That would explain why the
value is always 0x10 to 0x30 (that could be block size, especially if
it is rounded up to a multiple of 4 or 8) and why only 1-2 words are
clobbered. If you manage your blocks with malloc/free, you could try
turning on any malloc debugging facilities that you have.

Hope this helps,

Michael C
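
P.S. A minimal sketch of the kind of sanity-check walker I mean,
assuming a singly linked list guarded by one pthread mutex. The names
struct node, list_lock and list_check are hypothetical, and the 0x1000
threshold is just the convenient constant from the breakpoint idea
above, not a property of any particular platform.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type; only the link field matters for the check. */
struct node {
    struct node *next;
};

/* Assumed: one mutex protects the whole list. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pointers below this value are treated as clobbered (assumption). */
#define MIN_PLAUSIBLE_ADDR 0x1000u

/*
 * Walks the list read-only while holding the lock.
 * Returns -1 if every link looks sane. Otherwise returns the number of
 * nodes visited before an implausibly small link was found, or
 * max_nodes if the walk exceeded that bound (a possible cycle).
 */
long list_check(struct node *head, long max_nodes)
{
    long visited = 0;
    long bad = -1;
    struct node *link;

    pthread_mutex_lock(&list_lock);
    link = head;
    while (link != NULL) {
        /* Test the pointer before dereferencing it. */
        if ((uintptr_t)link < MIN_PLAUSIBLE_ADDR || visited >= max_nodes) {
            bad = visited;
            break;
        }
        link = link->next;
        visited++;
    }
    pthread_mutex_unlock(&list_lock);
    return bad;
}
```

Calling something like this before and after the suspect code could
narrow down when a link first turns into a small value such as 0x10.
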