From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx2.freebsd.org (mx2.freebsd.org [96.47.72.81]) by sourceware.org (Postfix) with ESMTPS id EF60C385BF81 for ; Tue, 7 Jul 2020 23:17:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org EF60C385BF81 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=FreeBSD.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=jhb@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (4096 bits)) (Client CN "mx1.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx2.freebsd.org (Postfix) with ESMTPS id AB58F90106; Tue, 7 Jul 2020 23:17:20 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "Let's Encrypt Authority X3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4B1dcw2Sl4z4FJy; Tue, 7 Jul 2020 23:17:20 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from John-Baldwins-MacBook-Pro-274.local (unknown [IPv6:2601:648:8203:2990:98ed:1435:28a2:b475]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: jhb) by smtp.freebsd.org (Postfix) with ESMTPSA id D527610BD2; Tue, 7 Jul 2020 23:17:19 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Subject: Re: [PATCH 00/28] Decouple inferior_ptid/inferior_thread(); dup ptids in thread list (PR/25412) To: Pedro Alves , gdb-patches@sourceware.org References: <20200414175434.8047-1-palves@redhat.com> From: John Baldwin Autocrypt: addr=jhb@FreeBSD.org; keydata= mQGiBETQ+XcRBADMFybiq69u+fJRy/0wzqTNS8jFfWaBTs5/OfcV7wWezVmf9sgwn8TW0Dk0 c9MBl0pz+H01dA2ZSGZ5fXlmFIsee1WEzqeJzpiwd/pejPgSzXB9ijbLHZ2/E0jhGBcVy5Yo /Tw5+U/+laeYKu2xb0XPvM0zMNls1ah5OnP9a6Ql6wCgupaoMySb7DXm2LHD1Z9jTsHcAQMD /1jzh2BoHriy/Q2s4KzzjVp/mQO5DSm2z14BvbQRcXU48oAosHA1u3Wrov6LfPY+0U1tG47X 1BGfnQH+rNAaH0livoSBQ0IPI/8WfIW7ub4qV6HYwWKVqkDkqwcpmGNDbz3gfaDht6nsie5Z pcuCcul4M9CW7Md6zzyvktjnbz61BADGDCopfZC4of0Z3Ka0u8Wik6UJOuqShBt1WcFS8ya1 oB4rc4tXfSHyMF63aPUBMxHR5DXeH+EO2edoSwViDMqWk1jTnYza51rbGY+pebLQOVOxAY7k do5Ordl3wklBPMVEPWoZ61SdbcjhHVwaC5zfiskcxj5wwXd2E9qYlBqRg7QeSm9obiBCYWxk d2luIDxqaGJARnJlZUJTRC5vcmc+iGAEExECACAFAkTQ+awCGwMGCwkIBwMCBBUCCAMEFgID AQIeAQIXgAAKCRBy3lIGd+N/BI6RAJ9S97fvbME+3hxzE3JUyUZ6vTewDACdE1stFuSfqMvM jomvZdYxIYyTUpC5Ag0ERND5ghAIAPwsO0B7BL+bz8sLlLoQktGxXwXQfS5cInvL17Dsgnr3 1AKa94j9EnXQyPEj7u0d+LmEe6CGEGDh1OcGFTMVrof2ZzkSy4+FkZwMKJpTiqeaShMh+Goj XlwIMDxyADYvBIg3eN5YdFKaPQpfgSqhT+7El7w+wSZZD8pPQuLAnie5iz9C8iKy4/cMSOrH YUK/tO+Nhw8Jjlw94Ik0T80iEhI2t+XBVjwdfjbq3HrJ0ehqdBwukyeJRYKmbn298KOFQVHO EVbHA4rF/37jzaMadK43FgJ0SAhPPF5l4l89z5oPu0b/+5e2inA3b8J3iGZxywjM+Csq1tqz hltEc7Q+E08AAwUIAL+15XH8bPbjNJdVyg2CMl10JNW2wWg2Q6qdljeaRqeR6zFus7EZTwtX sNzs5bP8y51PSUDJbeiy2RNCNKWFMndM22TZnk3GNG45nQd4OwYK0RZVrikalmJY5Q6m7Z16 4yrZgIXFdKj2t8F+x613/SJW1lIr9/bDp4U9tw0V1g3l2dFtD3p3ZrQ3hpoDtoK70ioIAjjH aIXIAcm3FGZFXy503DOA0KaTWwvOVdYCFLm3zWuSOmrX/GsEc7ovasOWwjPn878qVjbUKWwx Q4QkF4OhUV9zPtf9tDSAZ3x7QSwoKbCoRCZ/xbyTUPyQ1VvNy/mYrBcYlzHodsaqUDjHuW+I SQQYEQIACQUCRND5ggIbDAAKCRBy3lIGd+N/BCO8AJ9j1dWVQWxw/YdTbEyrRKOY8YZNwwCf afMAg8QvmOWnHx3wl8WslCaXaE8= Message-ID: <935bd7c8-f107-f2d1-ade2-f6259dc1297c@FreeBSD.org> Date: Tue, 7 Jul 2020 16:16:19 -0700 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: <20200414175434.8047-1-palves@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2020 23:17:22 -0000 On 4/14/20 10:54 AM, Pedro Alves via Gdb-patches wrote: > In PR/25412, Simon noticed that after the multi-target series, the > tid-reuse.exp testcase manages to create a duplicate thread in the > thread list. Or rather, two threads with the same PTID. > > This in turn exposes a design problem in GDB. The inferior_thread() > function looks up the current thread based on inferior_ptid: > > struct thread_info* > inferior_thread (void) > { > struct thread_info *tp = find_thread_ptid (current_inferior (), inferior_ptid); > gdb_assert (tp); > return tp; > } > > But if there can be more than one thread in the thread list with the > same ptid_t, inferior_thread() may well return the wrong thread. > > This series fixes this by making the current thread be a global > thread_info pointer that is written to directly by switch_to_thread, > etc., and making inferior_thread() return that pointer, instead of > having inferior_thread() lookup up the inferior_ptid thread, by > ptid_t. You can look at this as a continuation of the effort of using > more thread_info pointers instead of ptids when possible. > > This change required auditing the whole codebase for places where we > were writing to inferior_ptid directly to change the current thread, > and change them to use switch_to_thread instead or one of its > siblings, because otherwise inferior_thread() and inferior_ptid would > get out of sync and inferior_thread() would return a thread unrelated > to the new inferior_ptid we wanted to switch to. That was all > (hopefully) done in all the patches leading up to the last one. > > After this, inferior_ptid still exists, but it is mostly read-only and > mainly used by target backend code. It is also relied on by a number > of target methods as a global input argument. E.g., the target_resume > interface and the memory reading routines -- we still need it there > because we need to be able to access memory off of processes for which > we don't have a corresponding inferior/thread object, like when > handling forks. Maybe we could pass down a context explicitly to > target_read_memory, etc. > > Most of the host-/native-specific code here is untested. I did my > best, but I won't be surprised if more tweaking is necessary. > > Testing on native x86_64 GNU/Linux is regression free for me. Testing > against gdbserver has regressed significantly in the past months and > is becoming difficult to run with a high number of long timeout > sequences; really looks like people aren't paying much attention to > that. I think this series doesn't regress gdbserver, but it's getting > hard to tell. :-/ This appears to have broken native debugging on FreeBSD for me in that when I run a process to completion it triggers an assertion failure: (gdb) r Starting program: /bin/echo Child process unexpectedly missing: No child processes. /home/john/work/git/gdb/gdb/inferior.c:293: internal-error: struct inferior *find_inferior_pid(process_stratum_target *, int): Assertion `pid != 0' failed. I tracked this down to this code in inf_ptrace::wait(): /* Ignore terminated detached child processes. */ if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ()) pid = -1; At this point, inferior_ptid() is all zeroes and the process has reported a non-stopped exit status (WIFEXITED) so this ignores the exit event and loops back around to call waitpid() again which then fails with ECHILD. Looks like we always clear the inferior thread (and thus inferior_ptid) in do_target_wait_1: /* We know that we are looking for an event in the target of inferior INF, but we don't know which thread the event might come from. As such we want to make sure that INFERIOR_PTID is reset so that none of the wait code relies on it - doing so is always a mistake. */ switch_to_inferior_no_thread (inf); Commenting out the check in inf_ptrace::wait() "fixes" the issue for me on FreeBSD, but I'm not sure that is the right fix. It seems to me that multiprocess probably needs to return events for not just the current inferior pid but for any valid pid for example, and though multiprocess is still broken for me on FreeBSD if I comment out the check against inferior_ptid, I can now see that I was getting an event for the "wrong" inferior that was previously discarded but with the check commented out now gets reported to the core. (The reason I get an event for the wrong process is that for some reason the core asks the native target to resume the wrong process: > ./gdb /bin/ls (gdb) set debug fbsd-nat (gdb) set debug fbsd-lwp (gdb) start Temporary breakpoint 1 at 0x20430a: file /usr/src/bin/ls/ls.c, line 236. Starting program: /bin/ls FNAT: stop for LWP 101518 event 1 flags 0x18 FLWP: using LWP 101518 for first thread FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x18 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 5 si_code 1 FNAT: sw breakpoint trap for LWP 101518 FLWP: fbsd_resume for ptid (70453, 101518, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 5 si_code 2 FNAT: trace trap for LWP 101518 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101518 event 1 flags 0x20 FNAT: si_signo 5 si_code 1 FNAT: sw breakpoint trap for LWP 101518 Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe710) at /usr/src/bin/ls/ls.c:236 236 const char *errstr = NULL; (gdb) add-inferior [New inferior 2] Added inferior 2 on connection 1 (native) (gdb) inferior 2 [Switching to inferior 2 [] ()] (gdb) file /bin/ls Reading symbols from /bin/ls... Reading symbols from /usr/lib/debug//bin/ls.debug... (gdb) start Temporary breakpoint 2 at 0x20430a: -qualified main. (2 locations) Starting program: /bin/ls FNAT: stop for LWP 101641 event 1 flags 0x18 FLWP: using LWP 101641 for first thread FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x24 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x24 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x20 FNAT: si_signo 20 si_code 1 FLWP: fbsd_resume for ptid (-1, 0, 0) FNAT: stop for LWP 101641 event 1 flags 0x18 FLWP: fbsd_resume for ptid (70453, 101518, 0) Program terminated with signal SIGTRAP, Trace/breakpoint trap. The program no longer exists. Here you can see that the last call to fbsd_resume() used the ptid from inferior 1 instead of inferior 2, and inferior 1 didn't discard it's pending SIGTRAP but instead was killed by it.) -- John Baldwin