To: Pedro Alves, gdb-patches@sourceware.org
Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event
In-Reply-To: <20221212203101.1034916-4-pedro@palves.net>
References: <20221212203101.1034916-1-pedro@palves.net> <20221212203101.1034916-4-pedro@palves.net>
Date: Tue, 21 Mar 2023 14:50:36 +0000
Message-ID: <87ileucg5f.fsf@redhat.com>
From: Andrew Burgess via Gdb-patches
Reply-To: Andrew Burgess

Pedro Alves writes:

> I noticed that after a following patch ("Step over clone syscall w/
> breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the
> gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd
> end up with four new unexpected GDB core dumps:
>
>                 === gdb Summary ===
>
>  # of unexpected core files      4
>  # of expected passes            48
>
> That said patch is making the pre-existing
> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>
>  #1 - a non-leader thread execs
>  #2 - the post-exec program stops somewhere
>  #3 - you kill the inferior
>
> Instead of #3 directly, the testcase just returns, which ends up in
> gdb_exit, tearing down GDB, which kills the inferior, and is thus
> equivalent to #3 above.
>
> Vis:
>
>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>  ...
>  (top-gdb) r
>  ...
>  (gdb) b main
>  ...
>  (gdb) r
>  ...
>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>  69        argv0 = argv[0];
>  (gdb) c
>  Continuing.
>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>  Other going in exec.
>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>
>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>  28        foo ();
>  (gdb) k
>  ...
>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  393         return m_suspend.waitstatus_pending_p;
>  (top-gdb) bt
>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>  #3  0x0000555555a92a26 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>  #4  0x0000555555a92a51 in gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>  #5  0x0000555555a91f84 in gdb::function_view::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 ) at ../../src/gdb/linux-nat.c:3590
>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>  ...

It wasn't 100% clear to me whether the above session was supposed to
show a failure with GDB prior to *this* commit, or was a demonstration
of what would happen if this commit is skipped and the later commits
are applied.  I thought it was the second case, but I was unsure enough
that I tried the reproducer anyway.  Just in case I'm wrong: the above
example doesn't seem to fail prior to this commit.
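
For anyone reading the backtrace cold: frame #0 is a member function
called through a null thread_info pointer (this=0x0), reached because a
lookup for the LWP's thread found nothing.  Below is a minimal
stand-alone sketch of that lookup, not GDB code.  The types
thread_model and lwp_model, the thread_table alias, and the helper
find_thread are all hypothetical names invented for illustration, and
keying threads by LWP id alone is a simplifying assumption.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for GDB's thread_info and lwp_info; these are
// illustration-only types, not the real GDB structures.
struct thread_model
{
  bool waitstatus_pending = false;
};

struct lwp_model
{
  int pid;  // Thread group id (tgid).
  int lwp;  // Kernel thread id (tid).
};

// Assumption: threads are keyed by LWP id only, which is enough to
// show the failure mode.
using thread_table = std::map<int, thread_model>;

// Return the thread matching LP, or nullptr if there is none.  A
// caller that dereferences the result without checking it faults in
// the same way as frame #0 above.
const thread_model *
find_thread (const thread_table &threads, const lwp_model &lp)
{
  auto it = threads.find (lp.lwp);
  return it == threads.end () ? nullptr : &it->second;
}
```

With the IDs from the post-exec output later in this mail, a table that
only holds thread 1708883 gives a valid pointer for the LWP
1708883.1708883 and nullptr for the stale LWP 1708883.1708895.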

>
> The root of the problem is that when a non-leader LWP execs, it just
> changes its tid to the tgid, replacing the pre-exec leader thread,
> becoming the new leader.  There's no thread exit event for the execing
> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
> ptrace man page says:
>
>   "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>         Stop the tracee at the next execve(2).  A waitpid(2) by the
>         tracer will return a status value such that
>
>           status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>
>         If the execing thread is not a thread group leader, the thread
>         ID is reset to thread group leader's ID before this stop.
>         Since Linux 3.0, the former thread ID can be retrieved with
>         PTRACE_GETEVENTMSG."
>
> When the core of GDB processes an exec events, it deletes all the
> threads of the inferior.  But, that is too late -- deleting the thread
> does not delete the corresponding LWP, so we end leaving the pre-exec
> non-leader LWP stale in the LWP list.  That's what leads to the crash
> above -- linux_nat_target::kill iterates over all LWPs, and after the
> patch in question, that code will look for the corresponding
> thread_info for each LWP.  For the pre-exec non-leader LWP still
> listed, won't find one.
>
> This patch fixes it, by deleting the pre-exec non-leader LWP (and
> thread) from the LWP/thread lists as soon as we get an exec event out
> of ptrace.

Given that we don't have a test *right now* for this issue, and instead
rely on a future patch not failing, I wondered if there was any way
that we could trigger a failure.

So I was poking around looking for places where we iterate over the
all_lwps() list, wondering which of them we could trigger that might
cause a failure...

... and then I thought: why not just have GDB tell us that the
all_lwps() list is broken?

So I hacked up a new 'maint info linux-lwps' command.
It's not very interesting right now; here's the output in a
multi-threaded inferior prior to the exec:

  (gdb) maintenance info linux-lwps
  LWP Ptid          Thread ID
  1707218.1707239.0 2
  1707218.1707218.0 1

And in your failure case (after the exec):

  (gdb) maintenance info linux-lwps
  LWP Ptid          Thread ID
  1708883.1708895.0 None
  1708883.1708883.0 1

We can then check this from the test script, which gives us a test
that fails before this commit and passes afterwards.  In the future we
might find other information we want to add to the new maintenance
command.

What are your thoughts on including this, or something like this, with
this commit?  My patch, which applies on top of this commit, is
included at the end of this email.  Please feel free to take any
changes that you feel add value.

>
> GDBserver does not need an equivalent fix, because it is already doing
> this, as side effect of mourning the pre-exec process, in
> gdbserver/linux-low.cc:
>
>   else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>     {
> ...
>       /* Delete the execing process and all its threads.  */
>       mourn (proc);
>       switch_to_thread (nullptr);
>
> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
> ---
>  gdb/linux-nat.c                              | 15 +++++++++++++++
>  gdb/testsuite/gdb.threads/step-over-exec.exp |  6 ++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index 9b78fd1f8e8..5ee3227f1b9 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -1986,6 +1986,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status)
>          thread execs, it changes its tid to the tgid, and the old
>          tgid thread might have not been resumed.  */
>        lp->resumed = 1;
> +
> +      /* All other LWPs are gone now.  We'll have received a thread
> +         exit notification for all threads other the execing one.
> +         That one, if it wasn't the leader, just silently changes its
> +         tid to the tgid, and the previous leader vanishes.  Since
> +         Linux 3.0, the former thread ID can be retrieved with
> +         PTRACE_GETEVENTMSG, but since we support older kernels, don't
> +         bother with it, and just walk the LWP list.  Even with
> +         PTRACE_GETEVENTMSG, we'd still need to lookup the
> +         corresponding LWP object, and it would be an extra ptrace
> +         syscall, so this way may even be more efficient.  */
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +        if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ())
> +          exit_lwp (other_lp);
> +
>        return 0;
>      }
>
> diff --git a/gdb/testsuite/gdb.threads/step-over-exec.exp b/gdb/testsuite/gdb.threads/step-over-exec.exp
> index 783f865585c..a8b01f8aeda 100644
> --- a/gdb/testsuite/gdb.threads/step-over-exec.exp
> +++ b/gdb/testsuite/gdb.threads/step-over-exec.exp
> @@ -102,6 +102,12 @@ proc do_test { execr_thread different_text_segments displaced_stepping } {
>      gdb_breakpoint foo
>      gdb_test "continue" "Breakpoint $decimal, foo .*" \
>          "continue to foo"
> +
> +    # Test that GDB is able to kill the inferior.  This may fail if
> +    # e.g., GDB does not dispose of the pre-exec threads properly.
> +    gdb_test "with confirm off -- kill" \
> +        "\\\[Inferior 1 (.*) killed\\\]" \
> +        "kill inferior"
>  }
>

These changes all look good.

Reviewed-By: Andrew Burgess

Thanks,
Andrew

>  foreach_with_prefix displaced_stepping {auto off} {
> --
> 2.36.0

---

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 5f67bcbcb4f..9b1e071b5f6 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -4482,6 +4485,49 @@ current_lwp_ptid (void)
   return inferior_ptid;
 }
 
+/* Implement 'maintenance info linux-lwps'.  Displays some basic
+   information about all the current lwp_info objects.  */
+
+static void
+maintenance_info_lwps (const char *arg, int from_tty)
+{
+  if (all_lwps ().size () == 0)
+    {
+      gdb_printf ("No Linux LWPs\n");
+      return;
+    }
+
+  /* Start the width at 8 to match the column heading below, then figure
+     out the widest ptid string.  We'll use this to build our output table
+     below.  */
+  size_t ptid_width = 8;
+  for (lwp_info *lp : all_lwps ())
+    ptid_width = std::max (ptid_width, lp->ptid.to_string ().size ());
+
+  /* Setup the table headers.  */
+  struct ui_out *uiout = current_uiout;
+  ui_out_emit_table table_emitter (uiout, 2, -1, "linux-lwps");
+  uiout->table_header (ptid_width, ui_left, "lwp-ptid", _("LWP Ptid"));
+  uiout->table_header (9, ui_left, "thread-info", _("Thread ID"));
+  uiout->table_body ();
+
+  /* Display one table row for each lwp_info.  */
+  for (lwp_info *lp : all_lwps ())
+    {
+      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
+
+      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);
+
+      uiout->field_string ("lwp-ptid", lp->ptid.to_string ().c_str ());
+      if (th == nullptr)
+        uiout->field_string ("thread-info", "None");
+      else
+        uiout->field_string ("thread-info", print_thread_id (th));
+
+      uiout->message ("\n");
+    }
+}
+
 void _initialize_linux_nat ();
 void
 _initialize_linux_nat ()
@@ -4519,6 +4565,9 @@ Enables printf debugging output."),
   sigemptyset (&blocked_mask);
 
   lwp_lwpid_htab_create ();
+
+  add_cmd ("linux-lwps", class_maintenance, maintenance_info_lwps,
+           _("List the Linux LWPS."), &maintenanceinfolist);
 }
 
 
diff --git a/gdb/testsuite/gdb.threads/step-over-exec.exp b/gdb/testsuite/gdb.threads/step-over-exec.exp
index c9a067b23aa..8ab027f6f08 100644
--- a/gdb/testsuite/gdb.threads/step-over-exec.exp
+++ b/gdb/testsuite/gdb.threads/step-over-exec.exp
@@ -103,6 +103,49 @@ proc do_test { execr_thread different_text_segments displaced_stepping } {
     gdb_test "continue" "Breakpoint $decimal, foo .*" \
         "continue to foo"
 
+    # If we have a linux target then there used to be a bug that in
+    # some situations we'd leave an orphaned lwp object around.  Check
+    # the 'maint info linux-lwps' output to spot any orphans.
+    #
+    # If linux native support is not built in then we'll get an
+    # undefined maintenance command error, which is fine.  The bug
+    # we're checking for was in linux native code, so we know we're
+    # fine.
+    #
+    # Alternatively, linux native support might be built in, but we
+    # might be using an alternative target (e.g. a remote target), in
+    # this case we'll get a message about 'No Linux LWPs'.  Again
+    # there's nothing that needs testing in this case.
+    gdb_test_multiple "maint info linux-lwps" "" {
+        -re "^maint info linux-lwps\r\n" {
+            exp_continue
+        }
+
+        -re "^Undefined maintenance info command: \"linux-lwps\"\\.  Try \"help maintenance info\"\\.\r\n$::gdb_prompt $" {
+            unsupported $gdb_test_name
+        }
+
+        -re "^LWP Ptid\\s+Thread ID\\s*\r\n" {
+            exp_continue
+        }
+
+        -re "^\\d+\\.\\d+\\.\\d+\\s+\\d+(?:\\.\\d+)?\\s*\r\n" {
+            exp_continue
+        }
+
+        -re "^\\d+\\.\\d+\\.\\d+\\s+None\\s*\r\n" {
+            fail $gdb_test_name
+        }
+
+        -re "^No Linux LWPs\r\n$::gdb_prompt $" {
+            unsupported $gdb_test_name
+        }
+
+        -re "^$::gdb_prompt $" {
+            pass $gdb_test_name
+        }
+    }
+
     # Test that GDB is able to kill the inferior.  This may fail if
     # e.g., GDB does not dispose of the pre-exec threads properly.
     gdb_test "with confirm off -- kill" \
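
To make the relationship between the two patches concrete, here is a
small stand-alone model of what they do.  It is a sketch under stated
assumptions, not GDB code.  The names lwp_model, prune_on_exec and
count_orphans are hypothetical, the LWP list is a plain std::vector,
and the thread list is reduced to a set of LWP ids.  prune_on_exec
mirrors the loop added to linux_handle_extended_wait, except that it
compares LWP ids where the real patch compares lwp_info pointers.
count_orphans mirrors the "None" rows that the new maintenance command
prints.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for GDB's lwp_info.
struct lwp_model
{
  int pid;  // Thread group id (tgid).
  int lwp;  // Kernel thread id (tid).
};

// On an exec event, drop every LWP that belongs to EXECER's process
// other than EXECER itself.  EXECER is taken by value so that it stays
// valid while the vector is being compacted.
void
prune_on_exec (std::vector<lwp_model> &lwps, lwp_model execer)
{
  auto is_stale = [execer] (const lwp_model &other)
    {
      return other.pid == execer.pid && other.lwp != execer.lwp;
    };
  lwps.erase (std::remove_if (lwps.begin (), lwps.end (), is_stale),
              lwps.end ());
}

// Count the LWPs that have no corresponding thread.  A non-zero result
// corresponds to at least one "None" row in the table above.
std::size_t
count_orphans (const std::vector<lwp_model> &lwps,
               const std::set<int> &thread_lwps)
{
  auto has_no_thread = [&thread_lwps] (const lwp_model &lp)
    {
      return thread_lwps.count (lp.lwp) == 0;
    };
  return static_cast<std::size_t>
    (std::count_if (lwps.begin (), lwps.end (), has_no_thread));
}
```

Using the post-exec IDs shown earlier, a list holding 1708883.1708895
and 1708883.1708883 with only thread 1708883 alive has one orphan.
After prune_on_exec is called for the execing LWP 1708883.1708883, the
orphan count drops to zero, and LWPs of unrelated processes are left
alone.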