Date: Mon, 13 May 2013 13:28:00 -0000
From: Joel Brobecker
To: Pedro Alves
Cc: gdb-patches@sourceware.org
Subject: Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
Message-ID: <20130513132802.GA32222@adacore.com>
In-Reply-To: <5190CCF9.3020004@redhat.com>

Thanks for the comments, Pedro.
> > On ppc-lynx178, resuming the execution of a program after hitting
> > a breakpoint sometimes triggers a spurious SIG61 event:
>
> I'd like to understand this a little better.
>
> Could that mean the thread that gdbserver used for ptrace hadn't
> been ptrace stopped, or doesn't exist at all? "sometimes" makes
> me wonder about the latter.

My interpretation of the clues I have been able to gather is that
the LynxOS thread library implementation does not like it when we
mess with the program's scheduling. Lynx178 is derived from an old
version of LynxOS, which may explain why newer versions are a little
more robust in that respect. I tried to get more information directly
from the people I thought would know about this, but never managed
to make progress in that direction, so I gave up when I found this
solution.

> So does that mean scheduler locking doesn't work?
>
> E.g.,
>
> (gdb) thread 2
> (gdb) si
> (gdb) thread 1
> (gdb) c

Indeed, as expected, the same sort of symptom:

    (gdb) thread 1
    [Switching to thread 1 (Thread 30)]
    #0  0x1004ed94 in _trap_ ()
    (gdb) si
    0x1004ed98 in _trap_ ()
    (gdb) thread 2
    [Switching to thread 2 (Thread 36)]
    #0  task_switch.break_me () at task_switch.adb:42
    42          null;
    (gdb) cont
    Continuing.

    Program received signal SIG62, Real-time event 62.
    task_switch.break_me () at task_switch.adb:42
    42          null;

> BTW, vCont;c means "resume all threads", why is the current code just
> resuming one?

It's actually using a ptrace request that applies to the whole process
(either PTRACE_CONT or PTRACE_SINGLE_STEP). I never tried to implement
single-thread execution control (scheduler-locking on), as this is not
something we're interested in for this platform, at least for now...

> lynx_wait_1 ()
> ...
>     if (ptid_equal (ptid, minus_one_ptid))
>       pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>     else
>       pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
>
>   retry:
>
>     ret = lynx_waitpid (pid, &wstat);
>
> is suspicious also.

I understand...
It's a bit of a hybrid between trying to deal with thread-level
execution control, and process-level execution control.

> Doesn't that mean we're doing a waitpid on
> a possibly not-resumed current_inferior (that may not be the main task,
> if that matters)? Could _that_ be reason for that magic signal 61?

Given the above (we resume processes, rather than threads individually),
I do not think that this is the source of the problem itself. I blame
the thread library for not liking it when you potentially alter the
program's scheduling by resuming the non-active thread. This patch does
not prevent that from happening, but at least makes an effort to avoid
it in the usual situations.

--
Joel