Message-ID: <5190F869.3090408@redhat.com>
Date: Mon, 13 May 2013 14:28:00 -0000
From: Pedro Alves
To: Joel Brobecker
CC: gdb-patches@sourceware.org
Subject: Re: [RFA] gdbserver/lynx178: spurious SIG61 signal when resuming inferior.
In-Reply-To: <20130513132802.GA32222@adacore.com>

On 05/13/2013 02:28 PM, Joel Brobecker wrote:
> Lynx178 is derived from
> an old version of LynxOS, which can explain why newer versions
> are a little more robust in that respect.

Ah.  I really have no sense of whether 178 is old or recent.  ;-)

>
> I tried to get more info directly from the people who I thought
> would know about this, but never managed to make progress in that
> direction, so I gave up when I found this solution.
>
>> So does that mean scheduler locking doesn't work?
>>
>> E.g.,
>>
>> (gdb) thread 2
>> (gdb) si
>> (gdb) thread 1
>> (gdb) c
>
> Indeed, as expected, same sort of symptom:
>
>     (gdb) thread 1
>     [Switching to thread 1 (Thread 30)]
>     #0  0x1004ed94 in _trap_ ()
>     (gdb) si
>     0x1004ed98 in _trap_ ()
>     (gdb) thread 2
>     [Switching to thread 2 (Thread 36)]
>     #0  task_switch.break_me () at task_switch.adb:42
>     42          null;
>     (gdb) cont
>     Continuing.
>
>     Program received signal SIG62, Real-time event 62.
>     task_switch.break_me () at task_switch.adb:42
>     42          null;
>
>> BTW, vCont;c means "resume all threads", why is the current code just
>> resuming one?
>
> It's actually using a ptrace request that applies to the process
> (either PTRACE_CONT or PTRACE_SINGLE_STEP).
> I never tried to implement single-thread control (scheduler-locking
> on), as this is not something we're interested in for this platform,
> at least for now...

Okay...  I see the file has a reference to PTRACE_CONT_ONE and
PTRACE_SINGLE_STEP_ONE, though they're not really being used.  Since
PTRACE_SINGLE_STEP resumes all threads in the process, other threads
may miss breakpoints while one thread is stepping over a breakpoint:
that breakpoint is lifted during the step, so any other thread that
runs through its address in the meantime won't stop there...
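
For illustration only, here is a minimal C sketch of the request selection being discussed, i.e. the same all-threads versus single-thread split the old lynx-nat.c made. It assumes a gdbserver-style resume that knows whether it targets the whole process (as for vCont;c) or a single thread (scheduler locking on), and whether it is stepping over a breakpoint. The enum values are hypothetical stand-ins for the LynxOS ptrace requests named in this thread (the real constants come from the LynxOS system headers), and choose_resume_request is a made-up helper, not the lynx-low.c code. Forcing the single-thread request during a breakpoint step-over is likewise an assumption about how one might address the missed-breakpoint concern.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the LynxOS ptrace requests discussed in
   this thread; the real values come from the LynxOS system headers.  */
enum resume_request
{
  REQ_CONT,            /* Like PTRACE_CONT: the whole process runs.  */
  REQ_SINGLE_STEP,     /* Like PTRACE_SINGLE_STEP: one thread steps,
                          but the other threads are resumed too.  */
  REQ_CONT_ONE,        /* Like PTRACE_CONT_ONE: only one thread runs.  */
  REQ_SINGLE_STEP_ONE  /* Like PTRACE_SINGLE_STEP_ONE: only one thread
                          steps; the others stay stopped.  */
};

/* Hypothetical helper: pick a request for a resume that targets either
   every thread (RESUME_ALL, as for vCont;c) or a single thread
   (scheduler locking on).  STEPPING_OVER_BP is true while one thread is
   being stepped past a breakpoint that has been temporarily removed.  */
static enum resume_request
choose_resume_request (bool resume_all, bool step, bool stepping_over_bp)
{
  /* A breakpoint step-over is always a single-step.  */
  assert (!stepping_over_bp || step);

  /* While the breakpoint is lifted, letting other threads run could make
     them sail past it, so only the stepping thread is resumed.  */
  if (stepping_over_bp)
    resume_all = false;

  if (resume_all)
    return step ? REQ_SINGLE_STEP : REQ_CONT;
  return step ? REQ_SINGLE_STEP_ONE : REQ_CONT_ONE;
}
```

For example, choose_resume_request (true, true, true) yields REQ_SINGLE_STEP_ONE even though the caller asked for an all-threads resume.  gdbserver's Linux backend gets a similar effect a different way, by stopping the other threads before stepping the one thread over the breakpoint.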

Old lynx-nat.c did:

http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/Attic/lynx-nat.c?rev=1.23&content-type=text/x-cvsweb-markup&cvsroot=src

  /* If pid == -1, then we want to step/continue all threads, else
     we only want to step/continue a single thread.  */
  if (pid == -1)
    {
      pid = PIDGET (inferior_ptid);
      func = step ? PTRACE_SINGLESTEP : PTRACE_CONT;
    }
  else
    func = step ? PTRACE_SINGLESTEP_ONE : PTRACE_CONT_ONE;

I'd like to believe that just doing that in gdbserver too would fix
the scheduler-locking example.  :-)

For the SIG61 issue, I wonder whether, for PTRACE_CONT, we should always
use the main pid of the process ("continue main pid process") instead of
the last reported thread id (that's what the old lynx-nat.c did too).
Did you try that?  Sorry to be picky.  IMO, it's good to have all these
experimentation results archived, for when somebody proposes removing or
changing the "make sure to resume last reported" code at some point...

>
>> lynx_wait_1 ()
>> ...
>>   if (ptid_equal (ptid, minus_one_ptid))
>>     pid = lynx_ptid_get_pid (thread_to_gdb_id (current_inferior));
>>   else
>>     pid = BUILDPID (lynx_ptid_get_pid (ptid), lynx_ptid_get_tid (ptid));
>>
>> retry:
>>
>>   ret = lynx_waitpid (pid, &wstat);
>>
>>
>> is suspicious also.
>
> I understand... It's a bit of a hybrid between trying to deal with
> thread-level execution control, and process-level execution control.

I actually misread this.  lynx_ptid_get_pid returns the main pid of
the process, while I read that as getting at the current_inferior's tid.

>> Doesn't that mean we're doing a waitpid on
>> a possibly not-resumed current_inferior (that may not be the main task,
>> if that matters)?  Could _that_ be the reason for that magic signal 61?
>
> Given the above (we resume processes, rather than threads individually),
> I do not think that this is the source of the problem itself. I blame
> the thread library for not liking it when you potentially alter the
> program scheduling by resuming the non-active thread.
> This patch does
> not prevent this from happening, but at least makes an effort to
> avoid it in the usual situations.

-- 
Pedro Alves