Date: Mon, 13 May 2013 13:28:00 -0000
From: Joel Brobecker
To: Pedro Alves
Cc: gdb-patches@sourceware.org
Subject: Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
Message-ID: <20130513132802.GA32222@adacore.com>
In-Reply-To: <5190CCF9.3020004@redhat.com>

Thanks for the comments, Pedro.
> > On ppc-lynx178, resuming the execution of a program after hitting
> > a breakpoint sometimes triggers a spurious SIG61 event:
>
> I'd like to understand this a little better.
>
> Could that mean the thread that gdbserver used for ptrace hadn't
> been ptrace stopped, or doesn't exist at all? "sometimes" makes
> me wonder about the latter.

My interpretation of the clues I have been able to gather is that
the LynxOS thread library implementation does not like it when we
mess with the program's scheduling. Lynx178 is derived from an old
version of LynxOS, which may explain why newer versions are a little
more robust in that respect. I tried to get more information directly
from the people I thought would know about this, but never managed
to make progress in that direction, so I gave up when I found this
solution.

> So does that mean scheduler locking doesn't work?
>
> E.g.,
>
> (gdb) thread 2
> (gdb) si
> (gdb) thread 1
> (gdb) c

Indeed, as expected, the same sort of symptom:

    (gdb) thread 1
    [Switching to thread 1 (Thread 30)]
    #0  0x1004ed94 in _trap_ ()
    (gdb) si
    0x1004ed98 in _trap_ ()
    (gdb) thread 2
    [Switching to thread 2 (Thread 36)]
    #0  task_switch.break_me () at task_switch.adb:42
    42          null;
    (gdb) cont
    Continuing.

    Program received signal SIG62, Real-time event 62.
    task_switch.break_me () at task_switch.adb:42
    42          null;

> BTW, vCont;c means "resume all threads", why is the current code just
> resuming one?

It's actually using a ptrace request that applies to the whole process
(either PTRACE_CONT or PTRACE_SINGLE_STEP). I never tried to implement
single-thread execution control (scheduler-locking on), as this is not
something we're interested in for this platform, at least for now...

> lynx_wait_1 ()
> ...
>     if (ptid_equal (ptid, minus_one_ptid))
>       pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>     else
>       pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
>
>   retry:
>
>     ret = lynx_waitpid (pid, &wstat);
>
> is suspicious also.

I understand...
It's a bit of a hybrid between trying to deal with thread-level
execution control, and process-level execution control.

> Doesn't that mean we're doing a waitpid on
> a possibly not-resumed current_inferior (that may not be the main task,
> if that matters)? Could _that_ be reason for that magic signal 61?

Given the above (we resume processes, rather than threads individually),
I do not think that this is the source of the problem itself. I blame
the thread library for not liking it when you potentially alter the
program's scheduling by resuming the non-active thread. This patch does
not prevent that from happening, but at least makes an effort to avoid
it in the usual situations.

--
Joel