From: John Baldwin
To: Luis Machado
Cc: gdb-patches@sourceware.org, vd@freebsd.org
Subject: Re: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
Date: Fri, 06 Jan 2017 19:35:00 -0000

On Tuesday, December 27, 2016 01:03:27 PM John Baldwin wrote:
> On Tuesday, December 27, 2016 05:43:29 PM Vasil Dimov wrote:
> > On Fri, Dec 23, 2016 at 15:43:19 -0600, Luis Machado wrote:
> > > On 12/23/2016 03:28 PM, John Baldwin wrote:
> > > > When resuming a native FreeBSD process, ignore exited threads when
> > > > suspending/resuming individual threads prior to continuing the process.
> > > >
> > > > gdb/ChangeLog:
> > > >
> > > >     PR threads/20743
> > > >     * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads.
> > > >     (resume_all_threads_cb): Likewise.
> > > >     (fbsd_resume): Assert resuming thread has not exited.
> > > > ---
> > > >  gdb/ChangeLog  | 7 +++++++
> > > >  gdb/fbsd-nat.c | 7 +++++++
> > > >  2 files changed, 14 insertions(+)
> > > >
> > > > diff --git a/gdb/ChangeLog b/gdb/ChangeLog
> > > > index db6e913..4fb3732 100644
> > > > --- a/gdb/ChangeLog
> > > > +++ b/gdb/ChangeLog
> > > > @@ -1,3 +1,10 @@
> > > > +2016-12-23 John Baldwin
> > > > +
> > > > +    PR threads/20743
> > > > +    * fbsd-nat.c (resume_one_thread_cb): Ignore exited threads.
> > > > +    (resume_all_threads_cb): Likewise.
> > > > +    (fbsd_resume): Assert resuming thread has not exited.
> > > > +
> > > >  2016-12-22 Doug Evans
> > > >
> > > >      * infrun.c (set_step_over_info): Add comment.
> > > > diff --git a/gdb/fbsd-nat.c b/gdb/fbsd-nat.c
> > > > index ade62f1..7cd08c6 100644
> > > > --- a/gdb/fbsd-nat.c
> > > > +++ b/gdb/fbsd-nat.c
> > > > @@ -662,6 +662,9 @@ resume_one_thread_cb (struct thread_info *tp, void *data)
> > > >    if (ptid_get_pid (tp->ptid) != ptid_get_pid (*ptid))
> > > >      return 0;
> > > >
> > > > +  if (is_exited (tp->ptid))
> > > > +    return 0;
> > > > +
> > > >    if (ptid_get_lwp (tp->ptid) == ptid_get_lwp (*ptid))
> > > >      request = PT_RESUME;
> > > >    else
> > > > @@ -680,6 +683,9 @@ resume_all_threads_cb (struct thread_info *tp, void *data)
> > > >    if (!ptid_match (tp->ptid, *filter))
> > > >      return 0;
> > > >
> > > > +  if (is_exited (tp->ptid))
> > > > +    return 0;
> > > > +
> > > >    if (ptrace (PT_RESUME, ptid_get_lwp (tp->ptid), NULL, 0) == -1)
> > > >      perror_with_name (("ptrace"));
> > > >    return 0;
> > > > @@ -711,6 +717,7 @@ fbsd_resume (struct target_ops *ops,
> > > >    if (ptid_lwp_p (ptid))
> > > >      {
> > > >        /* If ptid is a specific LWP, suspend all other LWPs in the process. */
> > > > +      gdb_assert (!is_exited (ptid));
> > >
> > > If we're asserting on this (since supposedly it shouldn't happen), do we
> > > need to check for is_exited on the two functions above?
> > >
> > > Also, is there a reason why we're not detecting a thread that has
> > > exited? Aren't all threads stopped at this point (for all-stop mode at
> > > least)?
> > [...]
> >
> > Hello,
> >
> > I just nailed this down after it has been annoying me for some time,
> > fixed it with a similar patch as the one submitted by John, and came
> > here to report it.
> >
> > The reason that we are "not detecting" an exited thread (at least in the
> > scenario I hit) is in gdb/thread.c:
> >
> > --- cut ---
> > static void
> > delete_thread_1 (ptid_t ptid, int silent)
> > {
> > ...
> >   /* If this is the current thread, or there's code out there that
> >      relies on it existing (refcount > 0) we can't delete yet. Mark
> >      it as exited, and notify it.
> >      */
> >   if (tp->refcount > 0
> >       || ptid_equal (tp->ptid, inferior_ptid))
> >     {
> > ...
> >       /* Will be really deleted some other time. */
> >       printf_unfiltered ("========== Will be really deleted some other time %u\n", ptid);
> >       return;
> >     }
> > ...
> >   if (tpprev)
> >     tpprev->next = tp->next;
> >   else
> >     thread_list = tp->next;
> > --- cut ---
> >
> > In my scenario tp->refcount is 0, but
> > "ptid_equal (tp->ptid, inferior_ptid)" is true, so the thread's entry is
> > not removed from the global "thread_list".
> >
> > The gdb output (with "set debug fbsd-lwp" enabled):
> >
> > --- cut ---
> > FLWP: adding thread for LWP 102009
> > [New LWP 102009 of process 40304]
> > FLWP: fbsd_resume for ptid (-1, 0, 0)
> > FLWP: fbsd_resume for ptid (40304, 102009, 0)
> > FLWP: fbsd_resume for ptid (-1, 0, 0)
> > FLWP: fbsd_resume for ptid (40304, 102009, 0)
> > FLWP: fbsd_resume for ptid (-1, 0, 0)
> > FLWP: deleting thread for LWP 102009
> > [LWP 102009 of process 40304 exited]
> > ...
> > ptrace: No such process.
> > --- cut ---
> >
> > Hope this helps.
>
> In particular, the sequence of events is this:
>
> - an LWP (T1) reports a "normal" event (in the test case it is hitting a
>   breakpoint). This is reported to the core and sets the current thread
>   (and thus inferior_ptid) to T1.
> - the same LWP (T1) then exits and a thread exit event is reported via
>   ptrace() to the native target. The native target calls delete_thread(),
>   but the thread is not removed, just marked as exited, since it ==
>   inferior_ptid as Vasil noted. The native target just
>   continues the process explicitly via ptrace() without reporting any
>   event to the core aside from the call to delete_thread().
> - some other LWP (T2) reports an event (in the test case it is a
>   breakpoint).
> - the user continues, which invokes fbsd_resume(), which wants to resume
>   all threads.
>   Here iterate_over_threads() in fbsd_resume() will
>   encounter the exited thread for T1 since nothing has called
>   update_thread_list() (which would invoke delete_exited_threads() from
>   fbsd_update_thread_list()). Since the thread is exited, trying to
>   manipulate it via ptrace() results in an error.
>
> I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS
> instead of explicitly continuing the process, but that doesn't help, and it
> means that the ptid being returned is still T1 in that case.
>
> I'm not sure if I should explicitly be calling delete_exited_threads() in
> fbsd_resume() before calling iterate_over_threads()? Alternatively,
> fbsd_resume() could use ALL_NON_EXITED_THREADS() instead of
> iterate_over_threads() (it isn't clear to me which of these is preferred
> since both are in use).
>
> I added the assertion for my own sanity. I suspect gdb should never try to
> invoke target_resume() with a ptid of an exited thread, but if for some
> reason it did, the effect on FreeBSD would be a hang since we would suspend
> all the other threads and when the process was continued via PT_CONTINUE it
> would have nothing to do and would never return from wait(). I'd rather have
> gdb fail an assertion in that case than hang.

Luis (or anyone else), any thoughts on the above or if there is a better
way to solve this (e.g. calling delete_exited_threads() explicitly in
fbsd_resume() before iterate_over_threads() and/or using
ALL_NON_EXITED_THREADS() instead of iterate_over_threads())?

-- 
John Baldwin
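
For anyone following the thread without a FreeBSD box at hand, the failure
described above can be modelled outside of gdb. The following is only a
sketch under stated assumptions: every toy_* name is hypothetical and is not
gdb code, toy_pt_resume() merely stands in for ptrace (PT_RESUME, ...), and
LWP 102010 is an invented id for T2. It shows why walking a list that still
holds an exited entry produces an error, and how skipping exited entries (as
the patched callbacks do) avoids it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gdb's thread list; not gdb's real types.  */
enum toy_state { TOY_STOPPED, TOY_RUNNING, TOY_EXITED };

struct toy_thread
{
  int lwp;
  enum toy_state state;
  struct toy_thread *next;
};

/* Stand-in for ptrace (PT_RESUME, lwp, ...).  Returns -1 for an LWP that
   has already exited, modelling "ptrace: No such process".  */
int
toy_pt_resume (struct toy_thread *tp)
{
  if (tp->state == TOY_EXITED)
    return -1;
  tp->state = TOY_RUNNING;
  return 0;
}

/* Walk the list and resume each entry.  A nonzero SKIP_EXITED mirrors the
   is_exited() checks added by the patch.  Returns the number of failed
   resume requests.  */
int
toy_resume_all (struct toy_thread *list, int skip_exited)
{
  int failures = 0;

  for (struct toy_thread *tp = list; tp != NULL; tp = tp->next)
    {
      if (skip_exited && tp->state == TOY_EXITED)
        continue;
      if (toy_pt_resume (tp) == -1)
        failures++;
    }
  return failures;
}

/* The scenario from the thread: T1 (LWP 102009) has exited but is still
   on the list; T2 (hypothetical LWP 102010) is stopped.  */
int
toy_scenario (int skip_exited)
{
  struct toy_thread t2 = { 102010, TOY_STOPPED, NULL };
  struct toy_thread t1 = { 102009, TOY_EXITED, &t2 };
  int failures = toy_resume_all (&t1, skip_exited);

  /* T2 is resumed in both variants; only the stale T1 entry differs.  */
  assert (t2.state == TOY_RUNNING);
  return failures;
}
```

Calling toy_scenario (0) models the unpatched walk and reports one failed
request for the stale T1 entry; toy_scenario (1) models the patched walk
and reports none. Pruning the list before the walk, in the spirit of
delete_exited_threads(), would give the same result in this model.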