From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 6010 invoked by alias); 12 May 2005 19:08:05 -0000
Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sources.redhat.com
Received: (qmail 5304 invoked from network); 12 May 2005 19:07:57 -0000
Received: from unknown (HELO mtagate4.de.ibm.com) (195.212.29.153)
  by sourceware.org with SMTP; 12 May 2005 19:07:57 -0000
Received: from d12nrmr1607.megacenter.de.ibm.com (d12nrmr1607.megacenter.de.ibm.com [9.149.167.49])
  by mtagate4.de.ibm.com (8.12.10/8.12.10) with ESMTP id j4CJ7uqF051522
  for ; Thu, 12 May 2005 19:07:56 GMT
Received: from d12av02.megacenter.de.ibm.com (d12av02.megacenter.de.ibm.com [9.149.165.228])
  by d12nrmr1607.megacenter.de.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j4CJ6uRM288236
  for ; Thu, 12 May 2005 21:06:56 +0200
Received: from d12av02.megacenter.de.ibm.com (loopback [127.0.0.1])
  by d12av02.megacenter.de.ibm.com (8.12.11/8.13.3) with ESMTP id j4CJ6ukH000751
  for ; Thu, 12 May 2005 21:06:56 +0200
Received: from 53v30g15.boeblingen.de.ibm.com (53v30g15.boeblingen.de.ibm.com [9.152.26.155])
  by d12av02.megacenter.de.ibm.com (8.12.11/8.12.11) with ESMTP id j4CJ6usO000748
  for ; Thu, 12 May 2005 21:06:56 +0200
Received: from 53v30g15.boeblingen.de.ibm.com (localhost [127.0.0.1])
  by 53v30g15.boeblingen.de.ibm.com (8.12.10/8.12.10) with ESMTP id j4CJ6dql012898
  for ; Thu, 12 May 2005 21:06:39 +0200
Received: (from uweigand@localhost)
  by 53v30g15.boeblingen.de.ibm.com (8.12.10/8.12.10/Submit) id j4CJ6cSW012897
  for gdb-patches@sources.redhat.com; Thu, 12 May 2005 21:06:38 +0200
From: Ulrich Weigand
Message-Id: <200505121906.j4CJ6cSW012897@53v30g15.boeblingen.de.ibm.com>
Subject: [RFA] Fix internal error in wait_lwp (interrupted system call)
To: gdb-patches@sources.redhat.com
Date: Thu, 12 May 2005 19:18:00 -0000
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-SW-Source: 2005-05/txt/msg00301.txt.bz2

Hello,

we've had reports from our JVM/JIT development group that gdb 6.3
frequently fails for them with internal errors like:

  linux-nat.c:1152: internal-error: wait_lwp: Assertion `pid == GET_LWP (lp->ptid)' failed.

It turned out that this happens when a SIGCHLD arrives during
execution of the waitpid call.  This causes the signal handler to be
executed, and subsequently the system call returns -1 with errno set
to EINTR.  wait_lwp does not expect this, so the returned value of -1
trips the assertion.

Now, looking through the linux-nat.c file, it appears that this type
of problem has been addressed at various places in different ways.
In linux_handle_extended_wait, the waitpid call is wrapped into an
explicit

  do { } while (ret == -1 && errno == EINTR)

loop.  In linux_test_for_tracefork, this very loop is abstracted into
a my_waitpid routine.  In child_wait and linux_nat_wait, there are
larger loops that handle this situation as well.  Finally, in
lin_lwp_attach_lwp, SIGCHLD is actually blocked during the execution
of the waitpid call.

However, there remain some places where waitpid is called without any
such precaution, and wait_lwp is one of these.  When debugging a
process that makes very heavy use of threads, such as the JVM, this
can lead to the error shown above.

Now, as far as I can see, there is really *no* place where GDB
actually *wants* a system call to be interrupted by the SIGCHLD signal
handler.  Thus, I'd propose to fix the problem at its root by simply
installing the handler with the SA_RESTART flag, causing any
interrupted system call to be automatically restarted.

The patch below does this, and fixes all problems for the JVM team.
It also passes regression testing on s390-ibm-linux and
s390x-ibm-linux.

OK to commit?

Bye,
Ulrich


ChangeLog:

	* linux-nat.c (_initialize_linux_nat): Install SIGCHLD handler
	using the SA_RESTART flag.

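For illustration, the explicit retry idiom mentioned above for
linux_handle_extended_wait and my_waitpid can be sketched as a small
standalone helper.  This is only a sketch, not the code in
linux-nat.c: the names retry_waitpid and reap_child_exit_code are
hypothetical, and the second function exists purely to exercise the
first.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not the GDB routine): call waitpid again for
   as long as it fails only because a signal handler interrupted it.  */
static pid_t
retry_waitpid (pid_t pid, int *status, int flags)
{
  pid_t ret;

  do
    ret = waitpid (pid, status, flags);
  while (ret == -1 && errno == EINTR);

  return ret;
}

/* Usage sketch: fork a child that exits with EXIT_CODE, reap it via
   retry_waitpid, and return the exit code seen by the parent, or -1
   if the child could not be reaped or did not exit normally.  */
static int
reap_child_exit_code (int exit_code)
{
  int status;
  pid_t child = fork ();

  assert (child != -1);		/* Demo only: give up if fork fails.  */
  if (child == 0)
    _exit (exit_code);

  if (retry_waitpid (child, &status, 0) != child)
    return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

A wrapper like this has to be applied at every call site, which is
exactly what wait_lwp is missing; the SA_RESTART approach avoids that
by changing how the handler is installed instead.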
Index: gdb/linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.27
diff -c -p -r1.27 linux-nat.c
*** gdb/linux-nat.c	6 Mar 2005 16:42:20 -0000	1.27
--- gdb/linux-nat.c	12 May 2005 18:50:42 -0000
*************** Specify any of the following keywords fo
*** 3095,3101 ****
  
    action.sa_handler = sigchld_handler;
    sigemptyset (&action.sa_mask);
!   action.sa_flags = 0;
    sigaction (SIGCHLD, &action, NULL);
  
    /* Make sure we don't block SIGCHLD during a sigsuspend.  */
--- 3095,3101 ----
  
    action.sa_handler = sigchld_handler;
    sigemptyset (&action.sa_mask);
!   action.sa_flags = SA_RESTART;
    sigaction (SIGCHLD, &action, NULL);
  
    /* Make sure we don't block SIGCHLD during a sigsuspend.  */
-- 
  Dr. Ulrich Weigand
  Linux on zSeries Development
  Ulrich.Weigand@de.ibm.com
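
To see the effect of the one-line change outside of GDB, here is a
minimal standalone sketch.  It assumes a POSIX system; all names in
it (demo_sigchld_handler, spawn_sleeper, wait_with_flags) and the
delays are made up for the demonstration and do not appear in
linux-nat.c.  The parent waits for a slow child while a faster child
exits and raises SIGCHLD in the middle of the waitpid call.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler; it only has to exist so that SIGCHLD is delivered
   to a handler rather than ignored.  */
static void
demo_sigchld_handler (int signo)
{
  (void) signo;
}

/* Fork a child that sleeps for DELAY_MS milliseconds, then exits.  */
static pid_t
spawn_sleeper (long delay_ms)
{
  pid_t pid = fork ();

  if (pid == 0)
    {
      struct timespec ts;

      ts.tv_sec = delay_ms / 1000;
      ts.tv_nsec = (delay_ms % 1000) * 1000000L;
      nanosleep (&ts, NULL);
      _exit (0);
    }
  return pid;
}

/* Install the SIGCHLD handler with SA_FLAGS and wait for the slow
   child while the fast child exits.  Returns 0 if waitpid returned
   the slow child, otherwise the errno value it failed with.  */
static int
wait_with_flags (int sa_flags)
{
  struct sigaction action, old_action;
  pid_t fast, slow, ret;
  int saved_errno;

  memset (&action, 0, sizeof action);
  action.sa_handler = demo_sigchld_handler;
  sigemptyset (&action.sa_mask);
  action.sa_flags = sa_flags;
  if (sigaction (SIGCHLD, &action, &old_action) == -1)
    return errno;

  fast = spawn_sleeper (100);
  slow = spawn_sleeper (300);
  assert (fast > 0 && slow > 0);	/* Demo only: give up if fork fails.  */

  /* The fast child's SIGCHLD normally arrives while we block here.  */
  ret = waitpid (slow, NULL, 0);
  saved_errno = errno;

  /* Reap whatever is left and restore the previous disposition.  */
  while (waitpid (fast, NULL, 0) == -1 && errno == EINTR)
    ;
  if (ret != slow)
    while (waitpid (slow, NULL, 0) == -1 && errno == EINTR)
      ;
  sigaction (SIGCHLD, &old_action, NULL);

  return ret == slow ? 0 : saved_errno;
}
```

Calling wait_with_flags (SA_RESTART) should return 0, because the
interrupted waitpid is restarted after the handler runs.  Calling
wait_with_flags (0) would normally be expected to return EINTR, which
is the situation wait_lwp runs into, although that outcome depends on
the fast child exiting while the parent is already blocked in waitpid.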