Subject: Checking function calls
From: Fredrik Tolf
To: gdb@sources.redhat.com
Date: Wed, 04 Dec 2002 18:02:00 -0000
Message-Id: <1039053722.5629.22.camel@pc7>

I'm having a strange problem in a program that I'm writing (in C). The
background is essentially as follows:

The program is multithreaded, and in one thread I'm looping through a
linked list. Because elements may be freed inside the loop, I have an
extra variable to hold a pointer to the next element. I only use this
variable three times in total, like this:

    for(cur = list; cur != NULL; cur = next)
    {
        if((next = cur->next) != NULL)
            pthread_mutex_lock(&next->mutex);
        ... /* next is not mentioned anymore */
    }

There is a bug, which occurs extremely rarely (the program can run for
days without anything happening), that appears to change the contents of
the next variable, usually to something between 0x10 and 0x30. This, of
course, causes the thread to segfault in the next iteration.

At first I suspected that another thread somehow modifies the memory
where next is stored. I realized that this was extremely unlikely,
especially since it was this variable and nothing else that was being
changed, but I didn't have any other lead.
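
For reference, a minimal self-contained sketch of the loop pattern described above. Only the `next` and `mutex` members and the shape of the `for` loop come from the description; the `value` member, the helper `push_elem`, the function `drain_list`, the locking of the head before the loop, and the loop body that frees every element are assumptions made purely for illustration, not the actual program.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical element type: only `next` and `mutex` appear in the
 * original description; `value` is an illustrative payload. */
struct elem {
    struct elem *next;
    pthread_mutex_t mutex;
    int value;
};

/* Hypothetical helper: push a new element onto the front of the list. */
static struct elem *push_elem(struct elem *head, int value)
{
    struct elem *e = malloc(sizeof *e);
    if (e == NULL)
        return head;            /* allocation failed: list unchanged */
    e->next = head;
    e->value = value;
    pthread_mutex_init(&e->mutex, NULL);
    return e;
}

/* Walk the list, sum the payloads and free every element.
 * `next` is loaded (and its mutex locked) before `cur` is freed,
 * which is why the extra variable is needed at all. */
static int drain_list(struct elem *list)
{
    struct elem *cur, *next;
    int sum = 0;

    /* Assumption: the head is locked before entering the loop, so that
     * each element is locked by the time it becomes `cur`. */
    if (list != NULL)
        pthread_mutex_lock(&list->mutex);

    for (cur = list; cur != NULL; cur = next) {
        if ((next = cur->next) != NULL)
            pthread_mutex_lock(&next->mutex);

        /* Placeholder for the elided loop body; `next` is not
         * mentioned again until the loop increment. */
        sum += cur->value;

        pthread_mutex_unlock(&cur->mutex);
        pthread_mutex_destroy(&cur->mutex);
        free(cur);              /* cur->next must not be read after this */
    }
    return sum;
}
```

Called on a list built with three `push_elem` calls, `drain_list` returns the sum of the three payloads and leaves nothing allocated. In a loop of this shape, `next` is written once per iteration and read only in the loop increment, so any change to it in between has to come from outside the loop's own code.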

Recently, I debugged it a little and found that next is actually being
stored in a register (EBX, to be specific; I'm on an IA-32
architecture). At first I therefore suspected a miscompilation by gcc,
but after checking the assembly output I ruled out that possibility: EBX
was being used exactly as instructed. It is also not possible that the
next element of the list structure is changed and then loaded into the
next variable.

Therefore, the failure has to be that a called function, on rare
occasions, doesn't restore EBX correctly, right? If I'm not completely
mistaken, there is no other possibility.

My question is thus: Is there any way of debugging this with GDB? Can I
make GDB check that EBX is the same before and after every function call
made from that frame in this thread, to isolate the failing function?
The frame never exits (until the program exits, that is), if that helps.

I have been trying to solve this problem for months now, and I would be
eternally grateful if someone could help me with it.

Fredrik Tolf