Date: Sun, 27 Jul 2003 16:17:00 -0000
From: Daniel Jacobowitz
To: gdb@sources.redhat.com
Cc: mingo@redhat.com, roland@redhat.com
Subject: FYI: Increase in schedlock.exp failures in LinuxThreads on an RH kernel

I noticed recently, after switching kernels (I'm temporarily running a Debian system with a Red Hat kernel; don't ask), that schedlock.exp failures spiked.

I investigated; the problem appears to be SIGINT handling. When the test sends a control-C, the SIGINT is delivered to some arbitrary subset of the threads (the non-manager threads; this is LinuxThreads). Then, when we next resume, the remaining threads receive it. I think this is an old problem that different timing is just exposing.

The flush mask in stop_wait_callback is supposed to handle this. But since some of the extra SIGINTs arrive after the SIGSTOP, we can't flush them. This won't happen with NPTL, of course, where only one thread gets the SIGINT.

I tried something terribly clever: checking the SigPnd mask in /proc to see whether a SIGINT was pending. Unfortunately, this kernel (looks like...
2.4.20-18.9) introduced the shared pending queue but did not add ShdPnd to /proc//status, so there is absolutely no way to find out from userspace whether a SIGINT is pending. The SIGSTOP will always be delivered before a pending SIGINT, because we send the SIGSTOP with tkill, which puts it on the more specific (per-thread) queue. Any other signal we sent with tkill would have the same problem.

In other words, on a system with this kernel and LinuxThreads, a single SIGINT may stop the inferior an unpredictable number of times.

On 2.5 we'll be able to do something about it via ShdPnd; if Red Hat fixes their kernel, we'll be able to do the same on RH systems, but that's not my problem. If the problem still shows up the next time I'm running 2.5 (and I haven't switched to NPTL :), I'll put together the fix.

Ingo/Roland: you might want to export ShdPnd in the RH kernels...

--
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer
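A minimal sketch, in C, of the /proc check the message describes, assuming the status-file layout of kernels that do export both fields: a "SigPnd:" line for the thread's private pending queue and a "ShdPnd:" line for the shared queue, each a hex bitmask in which bit N-1 stands for signal N. The helper names (status_mask_has_signal, signal_pending_in_status, read_proc_status) are hypothetical and not GDB code. The point it illustrates is that on a kernel with a shared queue but no ShdPnd line, the answer can only be "unknown" (-1).

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: find the line "FIELD:<whitespace><hex mask>" in the
   text of a /proc/PID/status file and test the bit for SIGNO (bit N-1 is
   signal N).  Returns 1 if set, 0 if clear, -1 if FIELD is absent.  */
int
status_mask_has_signal (const char *status, const char *field, int signo)
{
  size_t len = strlen (field);
  const char *line = status;

  assert (signo >= 1 && signo <= 64);
  while (line != NULL && *line != '\0')
    {
      if (strncmp (line, field, len) == 0 && line[len] == ':')
        {
          /* strtoull skips the whitespace that follows the colon.  */
          unsigned long long mask = strtoull (line + len + 1, NULL, 16);
          return ((mask >> (signo - 1)) & 1ULL) ? 1 : 0;
        }
      line = strchr (line, '\n');
      if (line != NULL)
        line++;
    }
  return -1;
}

/* Hypothetical helper: is SIGNO pending on the thread's private queue
   (SigPnd) or on the shared queue (ShdPnd)?  Returns 1 or 0 when known,
   and -1 when it cannot be determined, e.g. a kernel that has a shared
   queue but no ShdPnd line in /proc.  */
int
signal_pending_in_status (const char *status, int signo)
{
  int private_q = status_mask_has_signal (status, "SigPnd", signo);
  int shared_q = status_mask_has_signal (status, "ShdPnd", signo);

  if (private_q == 1 || shared_q == 1)
    return 1;
  if (private_q == 0 && shared_q == 0)
    return 0;
  return -1;
}

/* Read /proc/PID/status into BUF as a NUL-terminated string.  Anything
   beyond SIZE - 1 bytes is silently truncated.  Returns 0 on success.  */
int
read_proc_status (pid_t pid, char *buf, size_t size)
{
  char path[64];
  FILE *f;
  size_t n;

  assert (size > 0);
  snprintf (path, sizeof path, "/proc/%ld/status", (long) pid);
  f = fopen (path, "r");
  if (f == NULL)
    return -1;
  n = fread (buf, 1, size - 1, f);
  fclose (f);
  buf[n] = '\0';
  return 0;
}
```

A debugger could, under these assumptions, read each stopped thread's status before deciding whether another SIGINT is still queued; a -1 result means it has no choice but to guess, which is exactly the gap a missing ShdPnd leaves.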