From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-67320-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 22867 invoked by alias); 29 Sep 2009 14:32:21 -0000
Received: (qmail 22856 invoked by uid 22791); 29 Sep 2009 14:32:20 -0000
X-SWARE-Spam-Status: No, hits=-2.3 required=5.0 	tests=AWL,BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Tue, 29 Sep 2009 14:32:14 +0000
Received: (qmail 9724 invoked from network); 29 Sep 2009 14:32:11 -0000
Received: from unknown (HELO orlando) (pedro@127.0.0.2)   by mail.codesourcery.com with ESMTPA; 29 Sep 2009 14:32:11 -0000
From: Pedro Alves <pedro@codesourcery.com>
To: Doug Evans <dje@google.com>
Subject: Re: [RFC] mask off is-syscall bit for TRAP_IS_SYSCALL
Date: Tue, 29 Sep 2009 14:32:00 -0000
User-Agent: KMail/1.9.10
Cc: sergiodj@linux.vnet.ibm.com,  gdb-patches@sourceware.org
References: <20090929124349.47188843A9@ruffy.mtv.corp.google.com>
In-Reply-To: <20090929124349.47188843A9@ruffy.mtv.corp.google.com>
MIME-Version: 1.0
Content-Type: text/plain;   charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200909291532.29075.pedro@codesourcery.com>
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-09/txt/msg00915.txt.bz2

(For those not reading the whole thread: catch syscall is
unfortunately broken with trivial multi-threading.)

On Tuesday 29 September 2009 13:43:48, Doug Evans wrote:

> I'm not sure the filtering is done too late though.
> [One *could* do it earlier, but it seems like it could be done
> later too.]

I don't see this 0x8x signal as being much different from
the extended wait statuses, and I don't see anywhere
other than linux_handle_extended_wait (or somewhere around it)
that would need to distinguish a syscall SIGTRAP from a regular
SIGTRAP, or from the other ptrace SIGTRAPs.  Take
cancel_breakpoint, for example: it seems to me that it should
already be ignoring any LWP that is stopped with a SIGTRAP due
to a ptrace event, like PTRACE_EVENT_VFORK|FORK|EXEC.

 cancel_breakpoint:

 -  if (lp->status != 0
 +  if (lp->waitstatus.kind != TARGET_WAITKIND_IGNORE
 +      && lp->status != 0    
       && WIFSTOPPED (lp->status) && WSTOPSIG (lp->status) == SIGTRAP

stop_wait_callback only needs to care that there's a SIGTRAP;
it doesn't need to know whether that SIGTRAP is a pending
PTRACE_EVENT_FORK event, just as it doesn't look like it needs
to treat TRAP_IS_SYSCALL specially.
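
To make the distinction between the three kinds of SIGTRAP concrete,
here's a rough sketch.  It is not GDB code: classify_trap, enum
trap_kind and SYSCALL_SIGTRAP are names made up for illustration, and
it assumes the usual Linux encoding, where a syscall stop shows up as
SIGTRAP | 0x80 (with PTRACE_O_TRACESYSGOOD set) and an extended ptrace
event puts its event code in the bits above bit 15 of the wait status.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical name: the stop signal reported for a syscall stop when
   PTRACE_O_TRACESYSGOOD is in effect (0x85 on Linux).  */
#define SYSCALL_SIGTRAP (SIGTRAP | 0x80)

/* Hypothetical classification, for illustration only.  */
enum trap_kind
{
  TRAP_NONE,          /* Not stopped, or stopped with another signal.  */
  TRAP_PLAIN,         /* Regular SIGTRAP (breakpoint, single-step, ...).  */
  TRAP_SYSCALL,       /* Syscall entry/exit stop.  */
  TRAP_PTRACE_EVENT   /* PTRACE_EVENT_FORK/VFORK/EXEC/... stop.  */
};

enum trap_kind
classify_trap (int status)
{
  if (!WIFSTOPPED (status))
    return TRAP_NONE;

  /* Extended events: SIGTRAP in the low 16 bits, event code above.  */
  if (WSTOPSIG (status) == SIGTRAP && (status >> 16) != 0)
    return TRAP_PTRACE_EVENT;

  if (WSTOPSIG (status) == SYSCALL_SIGTRAP)
    return TRAP_SYSCALL;

  if (WSTOPSIG (status) == SIGTRAP)
    return TRAP_PLAIN;

  return TRAP_NONE;
}
```

With something like this, a caller such as cancel_breakpoint would only
act on TRAP_PLAIN, and the other two kinds stay out of its way.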

I do see other broken calls into the core with an unfiltered 0x85,
e.g.:

get_pending_status:
	  signo = target_signal_from_host (WSTOPSIG (lp->status));

or cases of passing a 0x85 to ptrace/kernel, like detach_callback:

      if (ptrace (PTRACE_DETACH, GET_LWP (lp->ptid), 0,
		  WSTOPSIG (status)) < 0)

This one's quite concerning:

 Catchpoint 1 (returned from syscall 202), 0x00007ffff7bd113e in __lll_lock_wait_private () from /lib/libpthread.so.0
 (gdb) detach
 Can't detach Thread 0x43806950 (LWP 20939): Input/output error
 (gdb)
 (gdb) c
 Continuing.
 ../../src/gdb/linux-nat.c:1782: internal-error: linux_nat_resume: Assertion `lp != NULL' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) 

Irk!

It seems we'd want to handle syscall SIGTRAPs exactly like other
extended wait statuses --- they're all ptrace SIGTRAPs: don't ever
pass a ptrace SIGTRAP to the inferior.
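
A minimal sketch of that rule, assuming the same status encoding as
above (syscall stop is SIGTRAP | 0x80, extended events carry a code in
the bits above bit 15).  signal_to_deliver is a hypothetical helper,
not something that exists in linux-nat.c; the idea is just that the
value handed to PTRACE_DETACH or PTRACE_CONT goes through a filter
first, so that 0x85 never reaches the kernel as a signal number.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper, not a GDB function: pick the signal number to
   hand to ptrace for an LWP that is stopped with STATUS.  Returns 0
   (deliver nothing) for stops that ptrace itself generated.  */
int
signal_to_deliver (int status)
{
  /* Precondition: the caller only asks about stopped LWPs.  */
  assert (WIFSTOPPED (status));

  int sig = WSTOPSIG (status);

  /* Syscall stop: 0x85 is not a real signal, never pass it on.  */
  if (sig == (SIGTRAP | 0x80))
    return 0;

  /* Extended ptrace event (fork/vfork/exec/...): same treatment.  */
  if (sig == SIGTRAP && (status >> 16) != 0)
    return 0;

  /* Anything else is returned unchanged.  Whether a plain SIGTRAP
     should really be delivered is left to the caller.  */
  return sig;
}
```

In the detach_callback case above, the ptrace call would then get
signal_to_deliver (status) instead of the raw WSTOPSIG (status).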


Here's another somewhat related breakage:

Take the same simple multi-threaded app example as before, and do this:

 (gdb) r
 Starting program: /home/pedro/gdb/tests/trap_is_syscall
 [Thread debugging using libthread_db enabled]
 [New Thread 0x40800950 (LWP 18877)]
 [New Thread 0x41001950 (LWP 18878)]
 [New Thread 0x41802950 (LWP 18879)]
 [New Thread 0x42003950 (LWP 18880)]
 [New Thread 0x42804950 (LWP 18881)]
 [New Thread 0x43005950 (LWP 18882)]
 [New Thread 0x43806950 (LWP 18883)]
 [New Thread 0x44007950 (LWP 18884)]
 [New Thread 0x44808950 (LWP 18885)]
 [New Thread 0x45009950 (LWP 18886)]

<ctrl-c>

 Program received signal SIGINT, Interrupt.
 0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
 (gdb) catch syscall
 warning: Could not open "syscalls/amd64-linux.xml"
 warning: Could not load the syscall XML file `syscalls/amd64-linux.xml'.
 GDB will not be able to display syscall names.
 Catchpoint 1 (any syscall)
 (gdb) c
 Continuing.
 [Switching to Thread 0x45009950 (LWP 18886)]

 Catchpoint 1 (call to syscall 35), 0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6

<a couple more continues>

Now, delete the catchpoint, while some LWPs have a pending syscall event to report:

 (gdb) del 1
 (gdb) c
 Continuing.
 Program received signal SIGTRAP, Trace/breakpoint trap.
 [Switching to Thread 0x44808950 (LWP 18885)]
 0x00007ffff78ffb81 in nanosleep () from /lib/libc.so.6
 (gdb) 

This case does need to be filtered later, it seems to me.
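
Roughly what I mean by filtering later, as a sketch only.  The
function name and the catching_syscalls flag are made up for
illustration; in GDB the decision would depend on whatever state says
a syscall catchpoint is still inserted.  The point is that the check
has to happen when the pending status is about to be reported, not
when it was first collected, since the catchpoint can be deleted in
between.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical filter, not a GDB function.  STATUS is a pending wait
   status for some LWP; CATCHING_SYSCALLS says whether any syscall
   catchpoint is still active at the time of reporting.  Returns true
   if the status should still be reported to the core.  */
bool
should_report_pending (int status, bool catching_syscalls)
{
  /* A stale syscall stop is only interesting while someone is
     catching syscalls; otherwise it should be dropped quietly and
     the LWP resumed.  */
  if (WIFSTOPPED (status) && WSTOPSIG (status) == (SIGTRAP | 0x80))
    return catching_syscalls;

  /* Every other kind of status is unaffected by this filter.  */
  return true;
}
```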

> btw, I'm seeing lots of "syscall 0" (presumably restarted system calls).
> I hacked another patch to save the syscall number so that the user
> wouldn't see syscall 0, but maybe the thing to do is record the
> fact that the syscall got restarted and report that to the user?
> [I'm assuming the 0's I see are indeed restarted syscalls.]

Hmmm, haven't seen this one yet.

-- 
Pedro Alves