From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id dXFDKizBg2TH/SIAWB0awg (envelope-from ) for ; Fri, 09 Jun 2023 20:17:48 -0400 Received: by simark.ca (Postfix, from userid 112) id 999F11E124; Fri, 9 Jun 2023 20:17:48 -0400 (EDT) Authentication-Results: simark.ca; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=bcJVHftC; dkim-atps=neutral X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on simark.ca X-Spam-Level: X-Spam-Status: No, score=-8.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id D9DA11E0D4 for ; Fri, 9 Jun 2023 20:17:47 -0400 (EDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8FF903857019 for ; Sat, 10 Jun 2023 00:17:44 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8FF903857019 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1686356264; bh=ddp9ODJnddaB2msa3/OBJkGTVkm840KzgA8NUqO28PE=; h=Date:To:Cc:Subject:In-Reply-To:References:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From:Reply-To:From; b=bcJVHftCrDAlfbWj37CBqlluBoWsWjXY8PlcKa6CFtaAkV1azkTaOxuAkTRCsqlKd JoQCRlsmA9y0I/7bnsMcBbrmKvvFlz4ZtJ7CvViUqCpTvMlGMhkebqQnyrUYzeiklG UYsr20PyvhsZ7WPrjFRvNVsLkLp8lNupR1Tkaaec= Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 4EF953858D35 for ; Sat, 10 Jun 2023 00:16:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4EF953858D35 Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-318-yCL4KONwPUWOVblO0h3e1g-1; Fri, 09 Jun 2023 20:16:39 -0400 X-MC-Unique: yCL4KONwPUWOVblO0h3e1g-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 386A11C04B54 for ; Sat, 10 Jun 2023 00:16:39 +0000 (UTC) Received: from f38-zws-nv (unknown [10.22.32.62]) by smtp.corp.redhat.com (Postfix) with ESMTPS id D7B8220268C6; Sat, 10 Jun 2023 00:16:38 +0000 (UTC) Date: Fri, 9 Jun 2023 17:16:37 -0700 To: Andrew Burgess via Gdb-patches Cc: Andrew Burgess Subject: Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Message-ID: <20230609171637.0cfcd9c6@f38-zws-nv> In-Reply-To: <6cb39b064b3e1c9ed57964b29fd980f3a6d30a25.1685956034.git.aburgess@redhat.com> References: <6cb39b064b3e1c9ed57964b29fd980f3a6d30a25.1685956034.git.aburgess@redhat.com> Organization: Red Hat MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.4 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , From: Kevin Buettner via Gdb-patches Reply-To: Kevin Buettner Errors-To: gdb-patches-bounces+public-inbox=simark.ca@sourceware.org Sender: "Gdb-patches" On Mon, 5 Jun 2023 10:11:08 +0100 Andrew Burgess via Gdb-patches wrote: > This patch contains a test for this commit: > > commit c820c52a914cc9d7c63cb41ad396f4ddffff2196 > Date: Fri Aug 6 19:45:58 2010 +0000 > > * thread.c (add_thread_silent): Use null_ptid instead of > minus_one_ptid while getting rid of stale inferior_ptid. > > This is another test that has been carried in the Fedora GDB tree for > some time, and I thought that it would be worth merging to master. I > don't believe there is any test like this currently in the testsuite. > > The original issue was reported in this thread: > > https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/ > > The problem was that when GDB was used to open a vmcore (core file) > image generated by the Linux kernel GDB would (sometimes) crash with > an assertion failure: > > thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed. > > To understand what's going on we need some background; a vmcore file > represents each processor core in the same way that a standard > application core file represents threads. Thus, we might say, a > vmcore file represents cores as threads. > > When writing a vmcore file, the kernel will store the pid of the > process currently running on that core as the thread's lwpid. > > However, if a core is idle, with no process currently running on it, > then the lwpid for that thread is stored as 0 in the vmcore file. If > multiple cores are idle then multiple threads will have a lwpid of 0. > > Back in 2010, the original issue reported tried to change the kernel's > behaviour in this thread: > > https://lkml.org/lkml/2010/8/3/75 > > This change was rejected by the kernel team, the current > behaviour (lwpid of 0) was considered correct. I've checked the > source of a recent kernel. The code mentioned in the lkml.org posting > has moved, it's now in the function crash_save_cpu in the file > kernel/kexec_core.c, but the general behaviour is unchanged, an idle > core will have an lwpid of 0, so I think GDB still needs to be able to > handle this case. > > When GDB loads a vmcore file (which is handled just like any other > core file) the sections are processed in core_open to generate the > threads for the core file. The processing is done by calling > add_to_thread_list, a function which looks for sections named .reg/NN > where NN is the lwpid of the thread, GDB then builds a ptid_t for the > new thread and calls add_thread. > > Remember, in our case the lwpid is 0. Now for the first thread this > is fine, if a little weird, 0 isn't usually a valid lwpid, but that's > OK, GDB creates a thread with lwpid of 0 and carries on. > > When we find the next thread (core) with lwpid of 0, we attempt to > create another thread with an lwpid of 0. This of course clashes with > the previously created thread, they have the same ptid_t, so GDB tries > to delete the first thread. > > And it was within this thread delete code that we triggered a bug > which would then cause GDB to assert -- when deleting we tried to > switch to a thread with minus_one_ptid, this resulted in a call to > find_inferior_pid (passing in minus_one_ptid's pid, which is -1), the > find_inferior_pid call fails and returns NULL, which then triggered an > assert in switch_to_thread. > > The actual details of the why the assert triggered are really not > important. What's important (I think) is that a vmcore file might > have this interesting lwpid of 0 characteristic, which isn't something > we see in "normal" application core files, and it is this that I think > we should be testing. > > Now, you might be thinking: isn't deleting the first thread the wrong > thing to do? If the vmcore file has two threads that represent two > cores, and both have an lwpid of 0 (indicating both cores are idle), > then surely GDB should still represent this as two threads? You're > not wrong. This was mentioned by Pedro in the original GDB mailing > list thread here: > > https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/ > > This is indeed a problem, and this problem is still present in GDB > today. I plan to try and address this in a later commit, however, > this first commit is about getting a test in place to confirm that GDB > at a minimum doesn't crash when loading such a vmcore file. > > And so, finally, what's in this commit? > > This commit contains a new test. The test doesn't actually contain a > vmcore file. Instead I've created a standard application core file > that contains two threads, and then manually edited the core file to > set the lwpid of each thread to 0. > > To further reduce the size of the core file (as it will be stored in > git), I've zeroed all of the LOAD-able segments in the core file. > This test really doesn't care about that part of the core file, we > only really care about loading the register's, this is enough to > confirm that the GDB doesn't crash. > > Obviously as the core file is pre-generated, this test is architecture > specific. There are already a few tests in gdb.arch/ that include > pre-generate core files. Just as those existing tests do, I've > compressed the core file with bzip2, which reduces it to just 750 > bytes. I have structured the test so that if/when this patch is > merged I can add some additional core files for other architectures, > however, these are not included in this commit. > > The test simply expands the core file, and then loads it into GDB. > One interesting thing to note is that GDB reports the core file > loading like this: > > (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core > [New process 1] > [New process 1] > Failed to read a valid object file image from memory. > Core was generated by `./segv-mt'. > Program terminated with signal SIGSEGV, Segmentation fault. > The current thread has terminated > (gdb) > > There's two interesting things here: first, the repeated "New process > 1" message. This is caused because linux_core_pid_to_str reports > anything with an lwpid of 0 as a process, rather than an LWP. And > second, the "The current thread has terminated" message. This is > because the first thread in the core file is the current thread, but > when GDB loads the second thread (which also has lwpid 0) this causes > the first thread to be deleted, as a result GDB thinks that the > current (first) thread has terminated. > > As I said previously, both of these problems are a result of the lwpid > 0 aliasing, which is not being fixed in this commit -- this commit is > just confirming that GDB doesn't crash when loading this core file. Great explanation! :) Approved-by: Kevin Buettner