From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Sender: gdb-patches-owner@sourceware.org
Reply-To: Luis Machado
Subject: Re: [PATCH] PR threads/20743: Don't attempt to suspend or resume exited threads.
References: <20161223212842.42715-1-jhb@FreeBSD.org> <2893581.89CAWbS1EM@ralph.baldwin.cx> <20161228080707.GA4007@nitro> <1700771.1OUYESxIQe@ralph.baldwin.cx>
To: John Baldwin
From: Luis Machado
Date: Thu, 12 Jan 2017 16:29:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1
MIME-Version: 1.0
In-Reply-To: <1700771.1OUYESxIQe@ralph.baldwin.cx>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2017-01/txt/msg00243.txt.bz2

On 12/28/2016 11:37 AM, John Baldwin wrote:
> On Wednesday, December 28, 2016 09:07:07 AM Vasil Dimov wrote:
>> On Tue, Dec 27, 2016 at 13:03:27 -0800, John Baldwin wrote:
>> [...]
>>> I have tried changing fbsd_wait() to return a TARGET_WAITKIND_SPURIOUS
>>> instead of explicitly continuing the process, but that doesn't help, and it
>>> means that the ptid being returned is still T1 in that case.
>>>
>>> I'm not sure if I should explicitly be calling delete_exited_threads() in
>>> fbsd_resume() before calling iterate_threads()? Alternatively, fbsd_resume()
>>> could use ALL_NONEXITED_THREADS() instead of iterate_threads() (it isn't
>>> clear to me which of these is preferred since both are in use).
>>>
>>> I added the assertion for my own sanity. I suspect gdb should never try to
>>> invoke target_resume() with a ptid of an exited thread, but if for some
>>> reason it did the effect on FreeBSD would be a hang since we would suspend
>>> all the other threads and when the process was continued via PT_CONTINUE it
>>> would have nothing to do and would never return from wait(). I'd rather have
>>> gdb fail an assertion in that case rather than hang.
>> [...]
>>
>> Hi,
>>
>> I am not sure if this is related, but since I get a hang I would rather
>> mention it: with the John's patch (including the assert) gdb does not
>> emit the "ptrace: No such process" error, but when I attempt to quit,
>> it hangs:
>
> No, this is a separate bug in the kernel whereby a process doesn't
> treat PT_KILL as a detach-like event but incorrectly expects to keep
> getting PT_CONTINUE events for a while until it finally exits. I'm
> working on writing up regression/unit tests for PT_KILL and then
> fixing the bug.
>

I think the patch is mainly papering over a bigger problem. My guess is
that the native fbsd backend is not doing something it should.

I'd check how linux-nat.c does things and then try to confirm that the
fbsd behavior is sane.

For example, I noticed linux-nat.c has exit_lwp (...), which handles
deletion of both the thread information and the thread itself (lwp).
Even if it is the currently-selected thread, we *will* get the lwp
removed from the list of existing lwps.

It doesn't make sense to keep a thread that has already exited in the
list of threads we are manipulating.
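
To make the suggestion concrete, here is a minimal standalone sketch of the
idea: drop an LWP from the backend's list as soon as its exit is reported, so
a later resume pass never sees it. This is not GDB code. The toy_* names, the
struct layout, and the list handling are all hypothetical and only
illustrate the bookkeeping; the real exit_lwp in linux-nat.c also deals with
GDB's thread list, which is not modeled here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-LWP record; not GDB's struct lwp_info.  */
struct toy_lwp
{
  long lwpid;
  int resumed;
  struct toy_lwp *next;
};

/* Head of the list of LWPs the toy backend still manipulates.  */
static struct toy_lwp *toy_lwp_list;

/* Return the entry for LWPID, or NULL if it is not in the list.  */
static struct toy_lwp *
toy_find_lwp (long lwpid)
{
  struct toy_lwp *lp;

  for (lp = toy_lwp_list; lp != NULL; lp = lp->next)
    if (lp->lwpid == lwpid)
      return lp;
  return NULL;
}

/* Record a new LWP at the head of the list.  Returns NULL if the
   allocation fails.  */
static struct toy_lwp *
toy_add_lwp (long lwpid)
{
  struct toy_lwp *lp;

  /* An LWP must never be listed twice.  */
  assert (toy_find_lwp (lwpid) == NULL);

  lp = calloc (1, sizeof *lp);
  if (lp == NULL)
    return NULL;
  lp->lwpid = lwpid;
  lp->next = toy_lwp_list;
  toy_lwp_list = lp;
  return lp;
}

/* Unlink and free the entry for LWPID when its exit is reported.
   Returns 1 if an entry was removed, 0 if LWPID was not listed.  */
static int
toy_exit_lwp (long lwpid)
{
  struct toy_lwp **link;

  for (link = &toy_lwp_list; *link != NULL; link = &(*link)->next)
    if ((*link)->lwpid == lwpid)
      {
        struct toy_lwp *dead = *link;

        *link = dead->next;
        free (dead);
        return 1;
      }
  return 0;
}

/* Mark every listed LWP as resumed and return how many were touched.
   Exited LWPs are never visited because they are no longer listed.  */
static int
toy_resume_all (void)
{
  struct toy_lwp *lp;
  int count = 0;

  for (lp = toy_lwp_list; lp != NULL; lp = lp->next)
    {
      lp->resumed = 1;
      count++;
    }
  return count;
}
```

With this shape the resume path needs no "has this thread exited?" check of
its own; the list itself is the source of truth, which is the property I
would expect the fbsd backend to maintain as well.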