From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-101430-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 27543 invoked by alias); 13 May 2013 14:28:01 -0000
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
Received: (qmail 27532 invoked by uid 89); 13 May 2013 14:28:00 -0000
X-Spam-SWARE-Status: No, score=-8.0 required=5.0 tests=AWL,BAYES_00,KHOP_THREADED,RCVD_IN_HOSTKARMA_W,RCVD_IN_HOSTKARMA_WL,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.1
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Mon, 13 May 2013 14:27:59 +0000
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r4DERuvN030373	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Mon, 13 May 2013 10:27:56 -0400
Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11])	by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r4DERsvB030068;	Mon, 13 May 2013 10:27:55 -0400
Message-ID: <5190F869.3090408@redhat.com>
Date: Mon, 13 May 2013 14:28:00 -0000
From: Pedro Alves <palves@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4
MIME-Version: 1.0
To: Joel Brobecker <brobecker@adacore.com>
CC: gdb-patches@sourceware.org
Subject: Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
References: <1368441986-14478-1-git-send-email-brobecker@adacore.com> <5190CCF9.3020004@redhat.com> <20130513132802.GA32222@adacore.com>
In-Reply-To: <20130513132802.GA32222@adacore.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-SW-Source: 2013-05/txt/msg00436.txt.bz2

On 05/13/2013 02:28 PM, Joel Brobecker wrote:

> Lynx178 is derived from
> an old version of LynxOS, which can explain why newer versions
> are a little more robust in that respect.

Ah.  I really have no sense of whether 178 is old or recent.  ;-)

> 
> I tried to get more info directly from the people who I thought
> would know about this, but never managed to make progress in that
> direction, so I gave up when I found this solution.
> 
>> So does that mean scheduler locking doesn't work?
>>
>> E.g.,
>>
>> (gdb) thread 2
>> (gdb) si
>> (gdb) thread 1
>> (gdb) c 
> Indeed, as expected, same sort of symptom:
> 
>     (gdb) thread 1
>     [Switching to thread 1 (Thread 30)]
>     #0  0x1004ed94 in _trap_ ()
>     (gdb) si
>     0x1004ed98 in _trap_ ()
>     (gdb) thread 2
>     [Switching to thread 2 (Thread 36)]
>     #0  task_switch.break_me () at task_switch.adb:42
>     42            null;
>     (gdb) cont
>     Continuing.
> 
>     Program received signal SIG62, Real-time event 62.
>     task_switch.break_me () at task_switch.adb:42
>     42            null;
> 
>> BTW, vCont;c means "resume all threads", why is the current code just
>> resuming one?
> 
> It's actually using a ptrace request that applies to the process
> (either PTRACE_CONT or PTRACE_SINGLE_STEP).
> I never tried to implement single-thread control (scheduler-locking
> on), as this is not something we're interested in for this platform,
> at least for now...

Okay...  I see the file has a reference to PTRACE_CONT_ONE/PTRACE_SINGLE_STEP_ONE,
though they're not really being used.  As PTRACE_SINGLE_STEP resumes all
threads in the process, then when stepping over a breakpoint (which is
temporarily removed for the step), other threads may run past it and
miss it...

Old lynx-nat.c did:

http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/Attic/lynx-nat.c?rev=1.23&content-type=text/x-cvsweb-markup&cvsroot=src

  /* If pid == -1, then we want to step/continue all threads, else
     we only want to step/continue a single thread.  */
  if (pid == -1)
    {
      pid = PIDGET (inferior_ptid);
      func = step ? PTRACE_SINGLESTEP : PTRACE_CONT;
    }
  else
    func = step ? PTRACE_SINGLESTEP_ONE : PTRACE_CONT_ONE;


I'd like to believe that just doing that in gdbserver too
would fix the scheduler-locking example.  :-)

For the SIG61 issue, I wonder whether, for PTRACE_CONT, we should
always pass the main pid of the process instead of the last reported
thread id (which is what the old lynx-nat.c did too).  Did you try
that?

Sorry to be picky.  IMO, it's good to have all these
experimentation results archived, for when somebody proposes
removing/changing the "make sure to resume last reported" code
at some point...

> 
>> lynx_wait_1 ()
>> ...
>>   if (ptid_equal (ptid, minus_one_ptid))
>>     pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>>   else
>>     pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
>>
>> retry:
>>
>>   ret = lynx_waitpid (pid, &wstat);
>>
>>
>> is suspicious also.
> 
> I understand... It's a bit of a hybrid between trying to deal with
> thread-level execution control, and process-level execution control.

I actually misread this.  lynx_ptid_get_pid returns the main pid of the
process, whereas I had read it as returning the current_inferior's tid.

>> Doesn't that mean we're doing a waitpid on
>> a possibly not-resumed current_inferior (that may not be the main task,
>> if that matters)?  Could _that_ be reason for that magic signal 61?
> 
> Given the above (we resume processes, rather than threads individually),
> I do not think that this is the source of the problem itself. I blame
> the thread library for not liking it when you potentially alter the
> program scheduling by resuming the non-active thread. This patch does
> not prevent this from happening, but at least makes an effort to
> avoid it in the usual situations.

-- 
Pedro Alves