From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-14926-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 28834 invoked by alias); 27 Jul 2003 16:17:43 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 28827 invoked from network); 27 Jul 2003 16:17:42 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17)
  by sources.redhat.com with SMTP; 27 Jul 2003 16:17:42 -0000
Received: from drow by nevyn.them.org with local (Exim 4.20 #1 (Debian))
	id 19goDO-0006Df-5I; Sun, 27 Jul 2003 12:17:42 -0400
Date: Sun, 27 Jul 2003 16:17:00 -0000
From: Daniel Jacobowitz <drow@mvista.com>
To: gdb@sources.redhat.com
Cc: mingo@redhat.com, roland@redhat.com
Subject: FYI: Increase in schedlock.exp failures in LinuxThreads on an RH kernel
Message-ID: <20030727161737.GA23676@nevyn.them.org>
Mail-Followup-To: gdb@sources.redhat.com, mingo@redhat.com,
	roland@redhat.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.1i
X-SW-Source: 2003-07/txt/msg00317.txt.bz2

I noticed recently, after switching kernels (I'm temporarily running a
Debian system with a Red Hat kernel - don't ask), that schedlock.exp showed
a sharp increase in failures.  I investigated; the problem appears to be
SIGINT handling.  When the test sends a control-C, it is delivered to some
arbitrary subset of the threads (excluding the manager thread; this is
LinuxThreads).  Then, when we next resume, the remaining threads receive it.

I think this is an old problem, just being exposed by timing.  The flush
mask in stop_wait_callback is supposed to handle this.  But since some of
the extra SIGINT signals arrive after the SIGSTOP, we can't flush them. 
This won't happen in NPTL, of course, where just one thread will get the
SIGINT.

I tried something terribly clever: looking at the SigPnd mask in
/proc to see if a SIGINT was pending.  Unfortunately, this kernel
(looks like 2.4.20-18.9) introduced the shared pending queue but did not
add ShdPnd to /proc/<pid>/status.  Therefore there's absolutely no way to
find out from userspace whether a SIGINT is pending.  The SIGSTOP will always be
delivered before a pending SIGINT because we sent the SIGSTOP with tkill,
which puts it onto the more specific queue.  Any other signal we sent with
tkill would have the same problem.
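
For reference, the /proc check described above could look roughly like the
sketch below.  This is not the code I tried in GDB; the function names
signal_pending_in_status and signal_pending_for_pid are made up for
illustration.  It assumes the usual status layout, where each field starts
a line and the mask is a hex number with bit (signo - 1) set for a pending
signal.  A return of -1 means the field is missing, which is what asking
for "ShdPnd:" would give on a kernel that does not export it.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look for the line beginning with FIELD (e.g. "SigPnd:") in STATUS_TEXT
   and test whether SIGNO is set in its hex mask.  Returns 1 if pending,
   0 if not pending, -1 if the field is not present at all.  */
int
signal_pending_in_status (const char *status_text, const char *field,
                          int signo)
{
  size_t len = strlen (field);
  const char *p = status_text;

  while (p != NULL && *p != '\0')
    {
      if (strncmp (p, field, len) == 0)
        {
          /* strtoull skips the whitespace between the field and mask.  */
          unsigned long long mask = strtoull (p + len, NULL, 16);
          return (int) ((mask >> (signo - 1)) & 1);
        }
      /* Advance to the start of the next line, if any.  */
      p = strchr (p, '\n');
      if (p != NULL)
        p++;
    }
  return -1;
}

/* Read /proc/PID/status and apply the check above.  Returns -1 if the
   file cannot be read.  Assumes the signal fields fall within the first
   4095 bytes of the file.  */
int
signal_pending_for_pid (int pid, const char *field, int signo)
{
  char path[64];
  char buf[4096];
  FILE *f;
  size_t n;

  snprintf (path, sizeof path, "/proc/%d/status", pid);
  f = fopen (path, "r");
  if (f == NULL)
    return -1;
  n = fread (buf, 1, sizeof buf - 1, f);
  fclose (f);
  buf[n] = '\0';
  return signal_pending_in_status (buf, field, signo);
}
```

Checking "SigPnd:" only covers the per-thread queue.  A signal sitting on
the shared queue shows up only under "ShdPnd:", so on a kernel that has
the shared queue but no ShdPnd line this check cannot see it.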

In other words, on a system with this kernel and LinuxThreads, SIGINT may
stop the inferior multiple times unpredictably.  On 2.5 we'll be able to do
something about it via ShdPnd; if Red Hat fixes their kernel then we'll be
able to do something about it on RH systems, but that's not my problem.  If
the problem still manifests next time I'm in 2.5 (and I haven't switched to
NPTL :) I'll put together the fix.

Ingo/Roland - might want to export ShdPnd in the RH kernels...

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer