Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.


From: Pedro Alves <palves@redhat.com>
To: Joel Brobecker <brobecker@adacore.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
Date: Mon, 13 May 2013 14:28:00 -0000
Message-ID: <5190F869.3090408@redhat.com>
In-Reply-To: <20130513132802.GA32222@adacore.com>

On 05/13/2013 02:28 PM, Joel Brobecker wrote:

> Lynx178 is derived from
> an old version of LynxOS, which can explain why newer versions
> are a little more robust in that respect.

Ah.  I really have no sense of whether 178 is old or recent.  ;-)

> 
> I tried to get more info directly from the people who I thought
> would know about this, but never managed to make progress in that
> direction, so I gave up when I found this solution.
> 
>> So does that mean scheduler locking doesn't work?
>>
>> E.g.,
>>
>> (gdb) thread 2
>> (gdb) si
>> (gdb) thread 1
>> (gdb) c 
> Indeed, as expected, same sort of symptom:
> 
>     (gdb) thread 1
>     [Switching to thread 1 (Thread 30)]
>     #0  0x1004ed94 in _trap_ ()
>     (gdb) si
>     0x1004ed98 in _trap_ ()
>     (gdb) thread 2
>     [Switching to thread 2 (Thread 36)]
>     #0  task_switch.break_me () at task_switch.adb:42
>     42            null;
>     (gdb) cont
>     Continuing.
> 
>     Program received signal SIG62, Real-time event 62.
>     task_switch.break_me () at task_switch.adb:42
>     42            null;
> 
>> BTW, vCont;c means "resume all threads", why is the current code just
>> resuming one?
> 
> It's actually using a ptrace request that applies to the process
> (either PTRACE_CONT or PTRACE_SINGLE_STEP).
> I never tried to implement single-thread control (scheduler-locking
> on), as this is not something we're interested in for this platform,
> at least for now...

Okay...  I see the file references PTRACE_CONT_ONE/PTRACE_SINGLE_STEP_ONE,
though they're not actually used.  Since PTRACE_SINGLE_STEP resumes all
threads in the process, other threads may miss breakpoints while one
thread is being stepped over a breakpoint...

Old lynx-nat.c did:

http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/Attic/lynx-nat.c?rev=1.23&content-type=text/x-cvsweb-markup&cvsroot=src

  /* If pid == -1, then we want to step/continue all threads, else
     we only want to step/continue a single thread.  */
  if (pid == -1)
    {
      pid = PIDGET (inferior_ptid);
      func = step ? PTRACE_SINGLESTEP : PTRACE_CONT;
    }
  else
    func = step ? PTRACE_SINGLESTEP_ONE : PTRACE_CONT_ONE;

I'd like to believe that just doing that in gdbserver too
would fix the scheduler-locking example.  :-)

For the SIG61 issue, I wonder whether, for PTRACE_CONT, we should
always pass the main process pid rather than the last reported thread
id (that's what the old lynx-nat.c did too).  Did you try that?

Sorry to be picky.  IMO, it's good to have all these
experimentation results archived, for when somebody proposes
removing/changing the "make sure to resume last reported" code
at some point...

> 
>> lynx_wait_1 ()
>> ...
>>   if (ptid_equal (ptid, minus_one_ptid))
>>     pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>>   else
>>     pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
>>
>> retry:
>>
>>   ret = lynx_waitpid (pid, &wstat);
>>
>>
>> is suspicious also.
> 
> I understand... It's a bit of a hybrid between trying to deal with
> thread-level execution control, and process-level execution control.

I actually misread this.  lynx_ptid_get_pid returns the main pid of the
process, whereas I had read it as returning the current_inferior's tid.

>> Doesn't that mean we're doing a waitpid on
>> a possibly not-resumed current_inferior (that may not be the main task,
>> if that matters)?  Could _that_ be reason for that magic signal 61?
> 
> Given the above (we resume processes, rather than threads individually),
> I do not think that this is the source of the problem itself. I blame
> the thread library for not liking it when you potentially alter the
> program scheduling by resuming the non-active thread. This patch does
> not prevent this from happening, but at least makes an effort to
> avoid it in the usual situations.

-- 
Pedro Alves
