From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id kPUiIG2ktmSKLSYAWB0awg (envelope-from ) for ; Tue, 18 Jul 2023 10:40:45 -0400 Authentication-Results: simark.ca; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=uUyD2be9; dkim-atps=neutral Received: by simark.ca (Postfix, from userid 112) id 7F4D21E0BD; Tue, 18 Jul 2023 10:40:45 -0400 (EDT) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id 54B441E00F for ; Tue, 18 Jul 2023 10:40:43 -0400 (EDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B074D3856DE8 for ; Tue, 18 Jul 2023 14:40:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B074D3856DE8 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1689691242; bh=0o/6JfzJ3DG2LYwkHv1sB4N/muELtgF7CvVgmsFyLlg=; h=To:Subject:In-Reply-To:References:Date:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=uUyD2be9xC2YVl49bf/hqh0bhFwQrTmVeccxlElMEp+kFE/r0F4RMmLVMtruZIeE0 2298EyYIvFvzuG2KZhqVktlrNIZFxZTy2zkwtRoWprwI3a1SYMDpKTj7j8a4lOL7vu s/Hpc+TkRhP4U+tSZlyYNz6daunqCcq1brBzaUg0= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 94161385773C for ; Tue, 18 Jul 2023 14:40:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 94161385773C Received: from mail-wr1-f69.google.com (mail-wr1-f69.google.com [209.85.221.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-399-PcNgZjfMMl6Yuacy52V7Vg-1; Tue, 18 Jul 2023 10:40:18 -0400 X-MC-Unique: PcNgZjfMMl6Yuacy52V7Vg-1 Received: by mail-wr1-f69.google.com with SMTP id ffacd0b85a97d-314275b653eso3227193f8f.1 for ; Tue, 18 Jul 2023 07:40:17 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689691216; x=1692283216; h=mime-version:message-id:date:references:in-reply-to:subject:to:from :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=0o/6JfzJ3DG2LYwkHv1sB4N/muELtgF7CvVgmsFyLlg=; b=WETdngOaugaXsqe36hx5QYbSRch51JHqnQhu6OU7x9EZ5S99IjgIHYnQj0aQD1Z9wT 2SfR/3du5bqkRMmwXNYLlyWoOA0kw6US/yyHYtcBozq7ASY1svNqFUP7PnmdlmBNZ429 Cp4OOoRL2oAIAMqoxtpy36JnUIDczPvzuQPPjoUkLhPmSlTBZCbwKD1/4drwHVE2Os8v /3qCG4bkyh1+nyaF+sHP4Nr7xIxVdTPJeXAKijC+3ssBtng5NmE58oYL0Mq341GQTutS YvX6jmf/nDarZJHhuxa3Qx/T/mSyM0fR6h6fwoM2BD9eDeh+TnRei81SYxtXTGhmhe2L fJlQ== X-Gm-Message-State: ABy/qLZ3VSoOPAAt+u4Jrm58Wlm+chLfpnsLfVsHk5me5W+BCzFJh6bq MQzfQVUj+NIlaLkFGQh8sHw9tSu89RdXipWdTo4+hZXvi1CF/TTfngy5iojHYVwpnhY5RG/yiGP +NrfGoxpa6AXmwnSjgHo2BT9rJJRh9fT+wqEBcfm+ygsPT7VkJR0hSzddGX7jR3RenuObqC9g4+ eAFTiq7g== X-Received: by 2002:adf:cf10:0:b0:314:25d:c8f4 with SMTP id o16-20020adfcf10000000b00314025dc8f4mr12624435wrj.2.1689691216260; Tue, 18 Jul 2023 07:40:16 -0700 (PDT) X-Google-Smtp-Source: APBJJlGKb0pFdOP7uQ1UExiq3NbqGPQC7WaFC0i495z+OBmu0ZBQ2YkTxlmENAQeKXQ/at1OkQ4tHA== X-Received: by 2002:adf:cf10:0:b0:314:25d:c8f4 with SMTP id o16-20020adfcf10000000b00314025dc8f4mr12624405wrj.2.1689691215652; Tue, 18 Jul 2023 07:40:15 -0700 (PDT) Received: from localhost (93.72.115.87.dyn.plus.net. [87.115.72.93]) by smtp.gmail.com with ESMTPSA id v7-20020a5d5907000000b003141e9e2f81sm2667445wrd.4.2023.07.18.07.40.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Jul 2023 07:40:15 -0700 (PDT) To: gdb-patches@sourceware.org Subject: Re: [PATCH 2/2] gdb: handle main thread exiting during detach In-Reply-To: <19e8fbcdf634198ed3ab5dd3ee2e2b36c2353e27.1689690655.git.aburgess@redhat.com> References: <19e8fbcdf634198ed3ab5dd3ee2e2b36c2353e27.1689690655.git.aburgess@redhat.com> Date: Tue, 18 Jul 2023 15:40:14 +0100 Message-ID: <87fs5lgu5t.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Andrew Burgess via Gdb-patches Reply-To: Andrew Burgess Errors-To: gdb-patches-bounces+public-inbox=simark.ca@sourceware.org Sender: "Gdb-patches" Andrew Burgess writes: > Overview > ======== > > In non-stop mode, the main thread is running while a second thread is > stopped. The user has the second thread selected as the current > thread and asks GDB to detach. At the exact moment of detach the main > thread exits. This situation causes crashes, assertion failures, and > unexpected errors to be reported from GDB for both native and remote > targets. > > This commit addresses this situation for native and remote targets, > but the fixes are all different. > > Native Linux Target > =================== > > This is caught in linux_nat_target::detach and the stop_wait_callback > notices that the thread has exited and cleans up. > > GDB then detaches from everything except the main thread by calling > detach_callback. > > After this the first problem is this assert: > > /* Only the initial process should be left right now. */ > gdb_assert (num_lwps (pid) == 1); > > The num_lwps call will return 0 as the main thread has exited and all > of the other threads have been detached. I fix this by changing the > assert too: > > gdb_assert (num_lwps (pid) <= 1); > > and improving the comment. > > The next problem is that we do: > > main_lwp = find_lwp_pid (ptid_t (pid)); > > and then proceed assuming that main_lwp is not nullptr, however, as > the main thread has exited (num_lwps(pid) == 0), main_lwp is nullptr, > and so we crash trying to dereference the nullptr. I fix this by > adding a nullptr check for main_lwp. I think this is fine because the > actions we are skipping all relate to detaching from the main thread, > but we'll not be doing this as the thread has already exited! > > For Remote Targets > ================== > > On remote targets there are two problems. The first is that when the > exit occurs during the early phase of the detach, we see the stop > notification arrive while GDB is removing the breakpoints ahead of the > detach. The 'set debug remote on' trace looks like this: > > [remote] Sending packet: $z0,7f1648fe0241,1#35 > [remote] Notification received: Stop:W0;process:2a0ac8 > # At this point an unpatched gdbserver segfaults, and the connection > # is broken. A patched gdbserver continues as below... > [remote] Packet received: E01 > [remote] Sending packet: $z0,7f1648ff00a8,1#68 > [remote] Packet received: E01 > [remote] Sending packet: $z0,7f1648ff132f,1#6b > [remote] Packet received: E01 > [remote] Sending packet: $D;2a0ac8#3e > [remote] Packet received: E01 > > I was originally running into Segmentation Faults, from within > gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This > function calls current_process() and then dereferences the result to > find the breakpoint list. > > However, in our case, the current process has already exited, and so > the current_process() call returns nullptr. > > It seems reasonable that in this case gdbserver should report a > failure to remove the breakpoint, and this is what I've done -- by > adding a nullptr check within find_gdb_breakpoint. > > The second problem I ran into was on the gdb side, as the process has > already exited, but GDB has not yet acknowledged the exit event, the > detach -- the 'D' packet in the above trace -- fails. This was being > reported to the user with a 'Can't detach process' error. As the test > actually calls detach from Python code, this error was then becoming a > Python exception. > > Though clearly the detach has returned an error, and so, maybe, having > GDB throw an error would be fine, I think in this case, there's a good > argument that the remote error should be ignored -- if GDB tries to > detach and gets back an error, and if there's a pending exit event for > the pid we tried to detach, then just ignore the error and pretend the > detach worked fine. > > Testing > ======= > > In order to test this issue I needed to ensure that the exit event > arrives at the same time as the detach call. The window of > opportunity for getting the exit to arrive is so small I've never > managed to trigger this in real use. > > However, if we trigger both the exit and the detach from a single > Python function then we never return to GDB's event loop, as such GDB > never processes the exit event, and so the first time GDB gets a > chance to see the exit is during the detach call. And so that is the > approach I've taken for testing this patch. > --- > gdb/linux-nat.c | 19 ++- > gdb/remote.c | 19 ++- > .../main-thread-exit-during-detach.c | 59 ++++++++ > .../main-thread-exit-during-detach.exp | 143 ++++++++++++++++++ > gdbserver/mem-break.cc | 7 + > 5 files changed, 239 insertions(+), 8 deletions(-) > create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c > create mode 100644 gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp > > diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c > index 73008a313c0..c3d62e0f85e 100644 > --- a/gdb/linux-nat.c > +++ b/gdb/linux-nat.c > @@ -1426,10 +1426,11 @@ linux_nat_target::detach (inferior *inf, int from_tty) > > iterate_over_lwps (ptid_t (pid), detach_callback); > > - /* Only the initial process should be left right now. */ > - gdb_assert (num_lwps (pid) == 1); > - > - main_lwp = find_lwp_pid (ptid_t (pid)); > + /* We have detached from everything except the main thread now, so > + should only have one thread left. However, in non-stop mode the > + main thread might have exited, in which case we'll have no threads > + left. */ > + gdb_assert (num_lwps (pid) <= 1); > > if (forks_exist_p ()) > { > @@ -1443,10 +1444,14 @@ linux_nat_target::detach (inferior *inf, int from_tty) > { > target_announce_detach (from_tty); > > - /* Pass on any pending signal for the last LWP. */ > - int signo = get_detach_signal (main_lwp); > + main_lwp = find_lwp_pid (ptid_t (pid)); > + if (main_lwp != nullptr) > + { > + /* Pass on any pending signal for the last LWP. */ > + int signo = get_detach_signal (main_lwp); > > - detach_one_lwp (main_lwp, &signo); > + detach_one_lwp (main_lwp, &signo); > + } > > detach_success (inf); > } > diff --git a/gdb/remote.c b/gdb/remote.c > index ff3d7e5cd32..f61035a26b0 100644 > --- a/gdb/remote.c > +++ b/gdb/remote.c > @@ -6126,7 +6126,24 @@ remote_target::remote_detach_pid (int pid) > else if (rs->buf[0] == '\0') > error (_("Remote doesn't know how to detach")); > else > - error (_("Can't detach process.")); > + { > + /* It is possible that we have an unprocessed exit event for this > + pid. If this is the case then we can ignore the failure to detach > + and just pretend that the detach worked, as far as the user is > + concerned, the process exited immediately after the detach. */ > + bool process_has_already_exited = false; > + struct remote_notif_state *rns = rs->notif_state; > + auto reply > + = (struct stop_reply *) rns->pending_event[notif_client_stop.id]; > + if (reply != nullptr && reply->ptid.pid () == pid) > + { > + if (reply->ws.kind () == TARGET_WAITKIND_EXITED) > + process_has_already_exited = true; > + } > + > + if (!process_has_already_exited) > + error (_("Can't detach process: %s"), (char *) rs->buf.data ()); > + } > } > > /* This detaches a program to which we previously attached, using > diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c > new file mode 100644 > index 00000000000..ecd0138c354 > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.c > @@ -0,0 +1,59 @@ > +/* This testcase is part of GDB, the GNU debugger. > + > + Copyright 2023 Free Software Foundation, Inc. > + > + This program is free software; you can redistribute it and/or modify > + it under the terms of the GNU General Public License as published by > + the Free Software Foundation; either version 3 of the License, or > + (at your option) any later version. > + > + This program is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + GNU General Public License for more details. > + > + You should have received a copy of the GNU General Public License > + along with this program. If not, see . */ > + > +#include > +#include > + > +/* This is set from GDB to allow the main thread to exit. */ > + > +volatile int dont_exit_just_yet = 1; > + > +/* Somewhere to place a breakpoint. */ > + > +void > +breakpt () > +{ > + /* Just spin. When the test is started under GDB we never enter the spin > + loop, but when we attach, the worker thread will be spinning here. */ > + while (dont_exit_just_yet) > + sleep (1); > +} > + > +/* Thread function, doesn't do anything but hit a breakpoint. */ > + > +void * > +thread_worker (void *arg) > +{ > + breakpt (); > + return NULL; > +} > + > +int > +main () > +{ > + pthread_t thr; > + This test program is missing an 'alarm (300);' call, which I've now added just here. Apologies for missing this before posting. Thanks, Andrew > + /* Create a thread. */ > + pthread_create (&thr, NULL, thread_worker, NULL); > + pthread_detach (thr); > + > + /* Spin until GDB releases us. */ > + while (dont_exit_just_yet) > + sleep (1); > + > + _exit (0); > +} > diff --git a/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp > new file mode 100644 > index 00000000000..97df3bd2783 > --- /dev/null > +++ b/gdb/testsuite/gdb.threads/main-thread-exit-during-detach.exp > @@ -0,0 +1,143 @@ > +# Copyright 2023 Free Software Foundation, Inc. > +# This program is free software; you can redistribute it and/or modify > +# it under the terms of the GNU General Public License as published by > +# the Free Software Foundation; either version 3 of the License, or > +# (at your option) any later version. > +# > +# This program is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > +# GNU General Public License for more details. > +# > +# You should have received a copy of the GNU General Public License > +# along with this program. If not, see . > + > +# Check for a race condition where in non-stop mode, the user might > +# have a thread other than the main (original) thread selected and use > +# the 'detach' command. > +# > +# As GDB tries to detach it is possible that the main thread might > +# exit, the main thread is still running due to non-stop mode. > +# > +# GDB used to assume that the main thread would always exist when > +# processing the detach, clearly this isn't the case, and this > +# assumption would lead to assertion failures and segfaults. > +# > +# Triggering the precise timing is pretty hard, we need the main > +# thread to exit after the user has entered the 'detach' command, but > +# before GDB enters the detach implementation and stops all threads, > +# the window of opportunity for this bug is actually tiny. > +# > +# However, we can trigger this bug 100% from Python, as GDB's > +# event-loop only kicks in once we return from a Python function. > +# Thus, if we have a single Python function that causes the main > +# thread to exit, and then calls detach GDB will not have a chance to > +# handle the main thread exiting before entering the detach code. > + > +standard_testfile > + > +require allow_python_tests > + > +if [build_executable "failed to prepare" $testfile $srcfile {debug pthreads}] { > + return -1 > +} > + > +# Run the test. When SPAWN_INFERIOR is true the inferior is started > +# as a separate process which GDB then attaches too. When > +# SPAWN_INFERIOR is false the inferior is started directly within GDB. > + > +proc run_test { spawn_inferior } { > + global srcfile > + global gdb_prompt > + global GDBFLAGS > + > + save_vars { GDBFLAGS } { > + append GDBFLAGS " -ex \"set non-stop on\"" > + clean_restart $::binfile > + } > + > + # Setup the inferior. When complete the main thread (#1) will > + # still be running (due to non-stop mode), while the worker thread > + # (#2) will be stopped. > + # > + # There are two setup modes, when SPAWN_INFERIOR is true we span a > + # separate process and attach to it, after the attach both threads > + # are stopped, so it is necessary to resume thread #1. > + # > + # When SPAWN_INFERIOR is false we just start the inferior within > + # GDB, in this case we place a breakpoint that will be hit by > + # thread #2. When the breakpoint is hit thread #1 will remain > + # running. > + if {$spawn_inferior} { > + set test_spawn_id [spawn_wait_for_attach $::binfile] > + set testpid [spawn_id_get_pid $test_spawn_id] > + > + set escapedbinfile [string_to_regexp $::binfile] > + gdb_test -no-prompt-anchor "attach $testpid" \ > + "Attaching to program.*`?$escapedbinfile'?, process $testpid.*" \ > + "attach to the inferior" > + > + # Attaching to a multi-threaded application in non-stop mode > + # can result in thread stops being reported after the prompt > + # is displayed. > + # > + # Send a simple command now just to resync the command prompt. > + gdb_test "p 1 + 2" " = 3" > + > + # Set thread 1 (the current thread) running again. > + gdb_test "continue&" > + } else { > + if {![runto_main]} { > + return -1 > + } > + > + gdb_breakpoint "breakpt" > + gdb_continue_to_breakpoint "run to breakpoint" > + } > + > + # Switch to thread 2. > + gdb_test "thread 2" \ > + [multi_line \ > + "Switching to thread 2\[^\r\n\]*" \ > + "#0\\s+.*"] > + > + # Create a Python function that sets a variable in the inferior and > + # then detaches. Setting the variable in the inferior will allow the > + # main thread to exit, we even sleep for a short while in order to > + # give the inferior a chance to exit. > + # > + # However, we don't want GDB to notice the exit before we call detach, > + # which is why we perform both these actions from a Python function. > + gdb_test_multiline "Create worker function" \ > + "python" "" \ > + "import time" "" \ > + "def set_and_detach():" "" \ > + " gdb.execute(\"set variable dont_exit_just_yet=0\")" "" \ > + " time.sleep(1)" "" \ > + " gdb.execute(\"detach\")" "" \ > + "end" "" > + > + # The Python function performs two actions, the first causes the > + # main thread to exit, while the second detaches from the inferior. > + # > + # In both cases the stop arrives while GDB is processing the > + # detach, however, for remote targets GDB doesn't report the stop, > + # while for local targets GDB does report the stop. > + if {![gdb_is_target_remote]} { > + set stop_re "\\\[Thread.*exited\\\]\r\n" > + } else { > + set stop_re "" > + } > + gdb_test "python set_and_detach()" \ > + "${stop_re}\\\[Inferior.*detached\\\]" > +} > + > +foreach_with_prefix spawn_inferior { true false } { > + if {$spawn_inferior && ![can_spawn_for_attach]} { > + # If spawning (and attaching too) a separate inferior is not > + # supported for the current board, then lets not try too. > + continue > + } > + > + run_test $spawn_inferior > +} > diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc > index 897b9a2273c..12143caf104 100644 > --- a/gdbserver/mem-break.cc > +++ b/gdbserver/mem-break.cc > @@ -976,6 +976,13 @@ static struct gdb_breakpoint * > find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind) > { > struct process_info *proc = current_process (); > + > + /* GDB might not yet be aware that the current process has exited and so > + still requests that we modify (delete) a breakpoint. Just claim we > + can't find the breakpoint. */ > + if (proc == nullptr) > + return nullptr; > + > struct breakpoint *bp; > enum bkpt_type type = Z_packet_to_bkpt_type (z_type); > > -- > 2.25.4