From: Thiago Jung Bauermann
To: Pedro Alves
Cc: gdb-patches@sourceware.org
Subject: Re: [RFC PATCH 3/3] gdb/nat/linux: Fix attaching to process when it has zombie threads
In-Reply-To: <9680e3cf-b8ad-4329-a51c-2aafb98d9476@palves.net> (Pedro Alves's message of "Wed, 17 Apr 2024 16:32:56 +0100")
References: <20240321231149.519549-1-thiago.bauermann@linaro.org> <20240321231149.519549-4-thiago.bauermann@linaro.org> <87msptgbey.fsf@linaro.org> <9680e3cf-b8ad-4329-a51c-2aafb98d9476@palves.net>
Date: Sat, 20 Apr 2024 02:00:39 -0300
Message-ID: <87jzksk4qw.fsf@linaro.org>

Hello Pedro,

Pedro Alves writes:

> On 2024-04-16 05:48, Thiago Jung Bauermann wrote:
>>
>> Pedro Alves writes:
>>
>>> Hmm.
>>> How about simply not restarting the loop if attach_lwp tries to attach to
>>> a zombie lwp (and silently fails)?
>>> Similar thing for gdbserver, of course.
>>
>> Thank you for the suggestion. I tried doing that, and in fact I attached
>> a patch with that change in comment #17 of PR 31312 when I was
>> investigating a fix¹. I called it a workaround because I also had to
>> increase the number of iterations in linux_proc_attach_tgid_threads from
>> 2 to 100, otherwise GDB gives up on waiting for new inferior threads too
>> early and the inferior dies with a SIGTRAP because some new unnoticed
>> thread tripped into the breakpoint.
>>
>> Because of the need to increase the number of iterations, I didn't
>> consider it a good solution and went with the approach in this patch
>> series instead. Now I finally understand why I had to increase the
>> number of iterations (though I still don't see a way around it other
>> than what this patch series does):
>>
>> With the approach in this patch series, even if a new thread becomes
>> zombie by the time GDB tries to attach to it, it still causes
>> new_threads_found to be set the first time GDB notices it, and the loop
>> in linux_proc_attach_tgid_threads starts over.
>>
>> With the approach above, a new thread that becomes zombie before GDB has
>> a chance to attach to it never causes the loop to start over, and so it
>> exits earlier.
>>
>> I think it's a matter of opinion whether one approach or the other can
>> be considered the better one.
>>
>> Even with this patch series, it's not guaranteed that two iterations
>> without finding new threads is enough to ensure that GDB has found all
>> threads in the inferior. I left the test running in a loop overnight
>> with the patch series applied and it failed after about 2500 runs.
>
> Thanks for all the investigations.  I hadn't seen the bugzilla before,
> I didn't notice there was one identified in the patch.
>
> Let me just start by saying "bleh"...

Agreed.
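To make the difference between the two approaches concrete, here is a toy
Python model of the scan loop. It is only a sketch and not GDB code: the
function name attach_all_threads, the snapshot lists standing in for
successive reads of /proc/PID/task, and the is_zombie callback are all
made up for illustration. The only things taken from the discussion are
the new_threads_found flag and the rule of stopping after two passes
that find nothing new.

```python
from typing import Callable, Iterable, Set, Tuple


def attach_all_threads(
    snapshots: Iterable[Set[int]],
    is_zombie: Callable[[int], bool],
    zombies_restart_loop: bool,
    quiet_passes: int = 2,
) -> Tuple[Set[int], int]:
    """Toy model of the attach loop (hypothetical, not the GDB function).

    Each element of `snapshots` stands for one pass over /proc/PID/task.
    The loop stops after `quiet_passes` consecutive passes that did not
    set new_threads_found. Returns (attached tids, passes performed).
    """
    attached: Set[int] = set()
    seen: Set[int] = set()
    quiet = 0
    passes = 0
    for tids in snapshots:
        passes += 1
        new_threads_found = False
        for tid in sorted(tids):
            if tid in seen:
                continue
            seen.add(tid)
            if is_zombie(tid):
                # Attaching to a zombie fails silently. The policies differ
                # only in whether this still counts as "found a new thread".
                if zombies_restart_loop:
                    new_threads_found = True
                continue
            attached.add(tid)
            new_threads_found = True
        if new_threads_found:
            quiet = 0
        else:
            quiet += 1
            if quiet >= quiet_passes:
                break
    return attached, passes


# Made-up scenario: threads 3 and 4 are already zombies when first seen,
# and thread 5 is a live thread that only shows up on the fourth pass.
SNAPSHOTS = [{1, 2}, {1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 2, 5}, {1, 2, 5}]
ZOMBIES = {3, 4}

series_result = attach_all_threads(SNAPSHOTS, ZOMBIES.__contains__, True)
workaround_result = attach_all_threads(SNAPSHOTS, ZOMBIES.__contains__, False)
print(series_result)      # ({1, 2, 5}, 6)
print(workaround_result)  # ({1, 2}, 3)
```

In this scenario the variant that ignores zombies sees two quiet passes
while zombies are still appearing and exits before thread 5 exists, which
is the kind of early exit described above. Neither variant is safe in
general: a thread created after the last quiet pass is missed either way.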
> This is all of course bad ptrace/kernel design...  The only way to plug this race
> completely is with kernel help, I believe.

Indeed. Having a ptrace request to tell the kernel to stop all threads
in the process, and having a "way to use ptrace without signals or wait"
(as mentioned in the LinuxKernelWishList wiki page) would be a big
improvement.

In the past few years the kernel has been introducing more and more
syscalls that accept a pidfd¹. Perhaps the time is ripe for a
pidfd_ptrace syscall?

> The only way that I know of today that we can make the kernel pause all threads for
> us, is with a group-stop.  I.e., pass a SIGSTOP to the inferior, and the kernel stops
> _all_ threads with SIGSTOP.
>
> I prototyped it today, and while far from perfect (would need to handle a corner
> scenarios like attaching and getting back something != SIGSTOP, and I still see
> some spurious SIGSTOPs), it kind of works ...

This is amazing. Thank you for this exploration and the patch.

> ... except for what I think is a deal breaker that I don't know how to work around:
>
> If you attach to a process that is running on a shell, you will see that the group-stop
> makes it go to the background.  GDB resumes the process, but it won't go back to the
> foreground without an explicit "fg" in the shell...  Like:
>
>  $ ./test_program
>  [1]+  Stopped                 ./test_program
>  $

Couldn't this be a shell behaviour rather than a kernel one? I don't see
anything that I could associate with this effect mentioned in the ptrace
man page (though I could have easily missed it). If it is, then the fix
would be in the shell too.

> Thankfully this is a contrived test, no real program should be spawning threads
> like this.  One would hope.

Yes, it's a particularly egregious program. :-)

> I will take a new look at your series.

Thank you!

--
Thiago

¹ https://lwn.net/Articles/794707/