Date: Wed, 01 Apr 2009 18:22:00 -0000
From: Joel Brobecker
To: gdb-patches@sourceware.org
Subject: [RFA] Fix crash on Linux 2.4 when threaded program exits
Message-ID: <20090401182220.GG16605@adacore.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="uAKRQypu60I7Lcqm"
Content-Disposition: inline
X-SW-Source: 2009-04/txt/msg00004.txt.bz2

--uAKRQypu60I7Lcqm
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

The debugger crashes when a threaded program being debugged exits:

    (gdb) run
    Starting program: /[...]/q
    [Thread debugging using libthread_db enabled]
    [New Thread 0xb748ebb0 (LWP 9340)]
    [New Thread 0xb728abb0 (LWP 9341)]
    Test2
    Test1
    [Thread 0xb748ebb0 (LWP 9340) exited]
    [Thread 0xb728abb0 (LWP 9341) exited]
    [Thread 0xb75d9b80 (LWP 9337) exited]
    Recursive internal problem.
    zsh: 9330 abort      gdb-head q

This appears to be specific to Linux 2.4 kernels and to the way NPTL
behaves on them: with 2.4, we only receive an "exited" notification
for the main thread, whereas with 2.6 we receive one for each and
every thread.

What happens in the 2.4 case is that we delete the lp structure for
the thread that exited and then still try to use it shortly after.
At that point, the memory has been freed and its contents have been
corrupted.  As a result, we hit an internal error, which in turn hits
another internal error, and that causes the abort.

The code in linux-nat.c:linux_nat_filter_event looks like this:

      if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
        {
          [delete threads that have vanished]

          exit_lwp (lp);

          /* If there is at least one more LWP, then the exit signal was
             not the end of the debugged application and should be
             ignored.  */
          if (num_lwps > 0)
            return NULL;
        }

As you can see, in the linux-2.4 case we end up deleting all the other
threads, and then call exit_lwp to delete the main thread.  Next we
check num_lwps, which is now zero, so we continue.  Shortly after
that, in the same routine, we already access lp (around line 2717,
"lp->ignore_sigint"), but the symptoms actually appear slightly later,
when we read the lp ptid in order to set inferior_ptid, which is then
used to get the associated inferior.

The fix is to delete the lp and return NULL if and only if other lwps
still exist.  When lp is the last one, it is left alone so that the
rest of the routine can keep using it.

2009-04-01  Joel Brobecker

        * linux-nat.c (linux_nat_filter_event): Do not delete the lwp
        if this is the last one.

Tested on x86-linux (with a 2.4.21 Linux kernel).  It fixes ~25 failures.
Tested on x86_64-linux (with a 2.6 kernel).  No regression.

Does this look correct?

Thanks,
-- 
Joel

--uAKRQypu60I7Lcqm
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="threads-24.diff"

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index be99ece..feca722 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2644,13 +2644,14 @@ linux_nat_filter_event (int lwpid, int status, int options)
                                 "LLW: %s exited.\n",
                                 target_pid_to_str (lp->ptid));
 
-          exit_lwp (lp);
-
-          /* If there is at least one more LWP, then the exit signal was
-             not the end of the debugged application and should be
-             ignored.  */
-          if (num_lwps > 0)
-            return NULL;
+          if (num_lwps > 1)
+            {
+              /* If there is at least one more LWP, then the exit signal
+                 was not the end of the debugged application and should be
+                 ignored.  */
+              exit_lwp (lp);
+              return NULL;
+            }
         }
 
       /* Check if the current LWP has previously exited.  In the nptl

--uAKRQypu60I7Lcqm--
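
The ordering issue described in the message can be modelled outside
GDB.  What follows is a minimal sketch in C and not GDB code: struct
toy_lwp, toy_list, toy_num_lwps and the toy_* functions are all
hypothetical stand-ins, and the real lwp_info and exit_lwp do
considerably more.  It only illustrates why the count has to be
checked before the structure is freed rather than after.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for GDB's per-LWP structure.  */
struct toy_lwp
{
  int pid;
  struct toy_lwp *next;
};

static struct toy_lwp *toy_list;   /* Head of the toy LWP list.  */
static int toy_num_lwps;           /* Number of entries in toy_list.  */

/* Allocate a new entry for PID and push it on the list.  */
static struct toy_lwp *
toy_add_lwp (int pid)
{
  struct toy_lwp *lp = malloc (sizeof *lp);

  if (lp == NULL)
    abort ();
  lp->pid = pid;
  lp->next = toy_list;
  toy_list = lp;
  toy_num_lwps++;
  return lp;
}

/* Unlink LP from the list and free it.  After this call, LP is a
   dangling pointer and must not be dereferenced.  */
static void
toy_exit_lwp (struct toy_lwp *lp)
{
  struct toy_lwp **p;

  for (p = &toy_list; *p != NULL; p = &(*p)->next)
    if (*p == lp)
      {
        *p = lp->next;
        free (lp);
        toy_num_lwps--;
        return;
      }
}

/* Ordering used by the patch: free LP only when other entries
   remain, and in that case return NULL so that LP is never touched
   again.  If LP is the last entry, it is kept and handed back.

   The problematic ordering would instead call toy_exit_lwp (lp)
   first, then test toy_num_lwps > 0, and fall through with a freed
   LP when the count has dropped to zero.  */
static struct toy_lwp *
toy_filter_exit (struct toy_lwp *lp)
{
  assert (lp != NULL);

  if (toy_num_lwps > 1)
    {
      toy_exit_lwp (lp);
      return NULL;
    }
  return lp;
}
```

With two entries in the list, toy_filter_exit on the second one frees
it and returns NULL; calling it afterwards on the remaining entry
returns that same pointer, still valid, which mirrors the "last lwp"
case that the patch is meant to protect.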