Subject: Re: [PATCH 00/28] Decouple inferior_ptid/inferior_thread(); dup ptids in thread list (PR/25412)
From: Pedro Alves
To: John Baldwin, gdb-patches@sourceware.org
Date: Wed, 8 Jul 2020 00:53:41 +0100

On 7/8/20 12:16 AM, John Baldwin wrote:
> This appears to have broken native debugging on FreeBSD for me in that
> when I run a process to completion it triggers an assertion failure:
>
> (gdb) r
> Starting program: /bin/echo
>
> Child process unexpectedly missing: No child processes.
> /home/john/work/git/gdb/gdb/inferior.c:293: internal-error: struct inferior *find_inferior_pid(process_stratum_target *, int): Assertion `pid != 0' failed.
>
> I tracked this down to this code in inf_ptrace::wait():
>
>       /* Ignore terminated detached child processes.  */
>       if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
>         pid = -1;
>
> At this point, inferior_ptid() is all zeroes and the process
> has reported a non-stopped exit status (WIFEXITED) so this
> ignores the exit event and loops back around to call waitpid()
> again which then fails with ECHILD.
>
> Looks like we always clear the inferior thread (and thus
> inferior_ptid) in do_target_wait_1:
>
>   /* We know that we are looking for an event in the target of inferior
>      INF, but we don't know which thread the event might come from.  As
>      such we want to make sure that INFERIOR_PTID is reset so that none of
>      the wait code relies on it - doing so is always a mistake.  */
>   switch_to_inferior_no_thread (inf);
>
> Commenting out the check in inf_ptrace::wait() "fixes" the issue for
> me on FreeBSD, but I'm not sure that is the right fix.

linux-nat.c has similar logic, here in linux_nat_filter_event:

  /* Make sure we don't report an event for the exit of an LWP not in
     our list, i.e. not part of the current process.  This can happen
     if we detach from a program we originally forked and then it
     exits.  */
  if (!WIFSTOPPED (status) && !lp)
    return NULL;

For inf-ptrace.c, the right fix should be something along these lines:

diff --git c/gdb/inf-ptrace.c w/gdb/inf-ptrace.c
index d25d226abb..ae0b0f7ff0 100644
--- c/gdb/inf-ptrace.c
+++ w/gdb/inf-ptrace.c
@@ -347,7 +347,7 @@ inf_ptrace_target::wait (ptid_t ptid, struct target_waitstatus *ourstatus,
 	}
 
       /* Ignore terminated detached child processes.  */
-      if (!WIFSTOPPED (status) && pid != inferior_ptid.pid ())
+      if (!WIFSTOPPED (status) && find_inferior_pid (this, pid) == nullptr)
 	pid = -1;
     }
   while (pid == -1);

The inferior_ptid reference in the error path above that code, where
it reads "Claim it exited with unknown signal", is also wrong.  I'm not
sure what to do about that one.  It's not clear whether that path can
really happen.
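To make the failure mode concrete, here is a minimal standalone sketch (plain POSIX fork/waitpid, not GDB code) contrasting the old inferior_ptid-based check with the known-inferior check proposed in the diff. The set `known_pids` and the functions `keep_exit_old`, `keep_exit_new`, `wait_for_exit` and `spawn_exiting_child` are hypothetical names; the set merely stands in for GDB's inferior list and find_inferior_pid. Because waitpid has already reaped the exit status by the time the filter runs, discarding it leaves nothing to wait for, so the next waitpid fails with ECHILD, which is how the error path discussed above was reached in John's report.

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for GDB's inferior list (the role that
// find_inferior_pid plays in the patch): pids we are debugging.
static std::set<pid_t> known_pids;

// Old check: keep a termination event only if it is for the current
// inferior.  With no current inferior, CURRENT_PID is 0, so every exit
// event is discarded.
static bool
keep_exit_old (pid_t pid, pid_t current_pid)
{
  return pid == current_pid;
}

// New check: keep a termination event if PID belongs to any known
// inferior, regardless of which inferior (if any) is current.
static bool
keep_exit_new (pid_t pid, pid_t /* current_pid, unused */)
{
  return known_pids.count (pid) != 0;
}

// Simplified wait loop, loosely modeled on inf_ptrace_target::wait.
// Returns the pid whose event was kept, or -1 with errno set if
// waitpid itself fails (e.g. ECHILD once no children are left).
static pid_t
wait_for_exit (bool (*keep) (pid_t, pid_t), pid_t current_pid)
{
  pid_t pid;
  do
    {
      int status;
      pid = waitpid (-1, &status, 0);
      if (pid == -1)
        {
          if (errno == EINTR)
            continue;   // Interrupted: the loop condition retries.
          return -1;    // The "Child process unexpectedly missing" path.
        }

      // The exit status has already been reaped here; dropping it
      // means nobody will ever see it again.
      if (!WIFSTOPPED (status) && !keep (pid, current_pid))
        pid = -1;
    }
  while (pid == -1);
  return pid;
}

// Fork a child that exits immediately and register it as an inferior.
static pid_t
spawn_exiting_child ()
{
  pid_t child = fork ();
  if (child == 0)
    _exit (0);
  if (child > 0)
    known_pids.insert (child);
  return child;
}
```

With no current inferior (current pid 0), `wait_for_exit (keep_exit_old, 0)` reaps the child's exit status, drops it, and then fails with errno set to ECHILD; `wait_for_exit (keep_exit_new, 0)` reports the child's pid instead.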
linux-nat.c calls perror_with_name in some places if waitpid returns -1,
and in the main waitpid call it simply assumes that a -1 (error) return
never happens...

>
> It seems to me that multiprocess probably needs to return events for
> not just the current inferior pid but for any valid pid for example,

Yes.

> and though multiprocess is still broken for me on FreeBSD if I comment
> out the check against inferior_ptid, I can now see that I was getting
> an event for the "wrong" inferior that was previously discarded but
> with the check commented out now gets reported to the core.
>
> (The reason I get an event for the wrong process is that for some
> reason the core asks the native target to resume the wrong process:
>
>> ./gdb /bin/ls
> (gdb) set debug fbsd-nat
> (gdb) set debug fbsd-lwp
> (gdb) start
> Temporary breakpoint 1 at 0x20430a: file /usr/src/bin/ls/ls.c, line 236.
> Starting program: /bin/ls
> FNAT: stop for LWP 101518 event 1 flags 0x18
> FLWP: using LWP 101518 for first thread
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x18
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 5 si_code 1
> FNAT: sw breakpoint trap for LWP 101518
> FLWP: fbsd_resume for ptid (70453, 101518, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 5 si_code 2
> FNAT: trace trap for LWP 101518
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101518 event 1 flags 0x20
> FNAT: si_signo 5 si_code 1
> FNAT: sw breakpoint trap for LWP 101518
>
> Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe710)
>     at /usr/src/bin/ls/ls.c:236
> 236		const char *errstr = NULL;
> (gdb) add-inferior
> [New inferior 2]
> Added inferior 2 on connection 1 (native)
> (gdb) inferior 2
> [Switching to inferior 2 [] ()]
> (gdb) file /bin/ls
> Reading symbols from /bin/ls...
> Reading symbols from /usr/lib/debug//bin/ls.debug...
> (gdb) start
> Temporary breakpoint 2 at 0x20430a: -qualified main. (2 locations)
> Starting program: /bin/ls
> FNAT: stop for LWP 101641 event 1 flags 0x18
> FLWP: using LWP 101641 for first thread
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x24
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x24
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x20
> FNAT: si_signo 20 si_code 1
> FLWP: fbsd_resume for ptid (-1, 0, 0)
> FNAT: stop for LWP 101641 event 1 flags 0x18
> FLWP: fbsd_resume for ptid (70453, 101518, 0)
>
> Program terminated with signal SIGTRAP, Trace/breakpoint trap.
> The program no longer exists.
>
> Here you can see that the last call to fbsd_resume() used the ptid from
> inferior 1 instead of inferior 2, and inferior 1 didn't discard it's
> pending SIGTRAP but instead was killed by it.)

Running with "set debug infrun 1" will probably reveal what is happening.

Thanks,
Pedro Alves
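For readers following the trace quoted above: the ptids printed by fbsd_resume are (pid, lwp, tid) triples, and the ptid passed to a resume request acts as a filter over threads. The sketch below is a simplified model, loosely based on the semantics of GDB's ptid_t::matches rather than GDB's actual class; `ptid_model` and the constant names are hypothetical, and inferior 2's pid (70454) is a made-up placeholder because the log only shows its LWP (101641).

```cpp
#include <cassert>

// Simplified, hypothetical model of a GDB-style ptid: (pid, lwp, tid).
struct ptid_model
{
  long pid;
  long lwp;
  long tid;

  bool operator== (const ptid_model &other) const
  {
    return pid == other.pid && lwp == other.lwp && tid == other.tid;
  }

  // True if this thread's ptid is selected by the resume/wait FILTER.
  bool matches (const ptid_model &filter) const
  {
    // (-1, 0, 0) means "every thread of every process".
    if (filter == ptid_model{-1, 0, 0})
      return true;
    // (pid, 0, 0) means "every thread of process PID".
    if (filter.pid > 0 && filter.lwp == 0 && filter.tid == 0)
      return pid == filter.pid;
    // Otherwise the filter names exactly one thread.
    return *this == filter;
  }
};

// Filters seen in the fbsd_resume debug output above.
const ptid_model resume_all{-1, 0, 0};
const ptid_model resume_inf1_lwp{70453, 101518, 0};

// Threads of the two inferiors.  The log does not show inferior 2's
// pid, so 70454 is a hypothetical placeholder.
const ptid_model inf1_thread{70453, 101518, 0};
const ptid_model inf2_thread{70454, 101641, 0};
```

Under this model, (-1, 0, 0) selects the threads of both inferiors, while the final (70453, 101518, 0) request selects only inferior 1's LWP 101518 and nothing in inferior 2, consistent with the reading of the trace above.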