Date: Fri, 9 Jun 2023 17:36:01 -0700
From: Kevin Buettner via Gdb-patches
Reply-To: Kevin Buettner
To: Andrew Burgess via Gdb-patches
Cc: Andrew Burgess
Subject: Re: [PATCH 3/3] gdb: handle core files with .reg/0 section names
Message-ID: <20230609173601.7aa44af5@f38-zws-nv>
In-Reply-To: <889bdea18153917a87db6425e78e1f13a218ee63.1685956034.git.aburgess@redhat.com>
References: <889bdea18153917a87db6425e78e1f13a218ee63.1685956034.git.aburgess@redhat.com>
Organization: Red Hat

On Mon, 5 Jun 2023 10:11:09 +0100
Andrew Burgess via Gdb-patches wrote:

> The previous commit added the test gdb.arch/core-file-pid0.exp which
> tests GDB's ability to load a core file containing threads with an
> lwpid of 0, which is
> something GDB can encounter when loading a
> vmcore file -- a core file generated by the Linux kernel. The threads
> with an lwpid of 0 represent idle cores.
>
> While the previous commit added the test, which confirms GDB doesn't
> crash when confronted with such a core file, there are still some
> problems with GDB's handling of these core files. These problems all
> originate from the fact that the core file (once opened by bfd)
> contains multiple sections called .reg/0. These sections each
> represent a different thread (cpu cores in the original vmcore dump),
> but GDB gets confused and thinks all of these .reg/0 sections are
> referencing the same thread.
>
> Here is a GDB session on an x86-64 machine which loads the core file
> from the gdb.arch/core-file-pid0.exp test. This core file contains two
> threads, both of which have a pid of 0:
>
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
>   (gdb) core-file /tmp/x86_64-pid0-core.core
>   [New process 1]
>   [New process 1]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The current thread has terminated
>   (gdb) info threads
>     Id   Target Id   Frame
>     2    process 1   0x00000000004017c2 in ?? ()
>
>   The current thread has terminated. See `help thread'.
>   (gdb) maintenance info sections
>   Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
>    [0] 0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
>    [1] 0x00000000->0x000000d8 at 0x0000039c: .reg/0 HAS_CONTENTS
>    [2] 0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
>    [3] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/0 HAS_CONTENTS
>    [4] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
>    [5] 0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
>    [6] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/0 HAS_CONTENTS
>    [7] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
>    [8] 0x00000000->0x00000200 at 0x000007cc: .reg2/0 HAS_CONTENTS
>    [9] 0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
>    [10] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate/0 HAS_CONTENTS
>    [11] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
>    [12] 0x00000000->0x000000d8 at 0x00000ea4: .reg/0 HAS_CONTENTS
>    [13] 0x00000000->0x00000200 at 0x00000f98: .reg2/0 HAS_CONTENTS
>    [14] 0x00000000->0x00000440 at 0x000011ac: .reg-xstate/0 HAS_CONTENTS
>    [15] 0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
>    [16] 0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
>    [17] 0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
>    [18] 0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
>    [19] 0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
>    [20] 0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
>    [21] 0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
>    [22] 0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
>    [23] 0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
>    [24] 0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
>    [25] 0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
>    [26] 0xffffffffff600000->0xffffffffff601000 at 0x00859000:
>         load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
>   (gdb)
>
> Notice when the core file is first loaded we see two lines like:
>
>   [New process 1]
>
> And GDB reports:
>
>   The current thread has terminated
>
> Which isn't what we'd expect from a core file -- the core file should
> only contain threads that are live at the point of the crash, one of
> which should be the current thread. The above message is reported
> because GDB has deleted what we think is the current thread!
>
> And in the 'info threads' output we are only seeing a single thread;
> again, this is because GDB has deleted one of the threads.
>
> Finally, the 'maintenance info sections' output shows the cause of all
> our problems: two sections named .reg/0. When GDB sees the first of
> these it creates a new thread. But when GDB sees the second .reg/0 it
> tries to create another new thread, and this thread has the same
> ptid_t as the first thread, so GDB deletes the first thread and
> creates the second thread in its place.
>
> Because both these threads are created with an lwpid of 0, GDB reports
> them as 'New process NN' rather than 'New LWP NN', which is what we
> would normally expect.
>
> The previous commit includes a little more of the history of GDB
> support in this area, but these problems were discussed on the mailing
> list a while ago in this thread:
>
>   https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
>
> In this commit I propose a solution to these problems.
>
> What I propose is that GDB should spot when we have .reg/0 sections
> and, when these are found, should rename these sections using some
> unique non-zero lwpid.
>
> Note in the above output we also have sections like .reg2/0 and
> .reg-xstate/0. These are additional register sets, and this commit
> also renumbers these sections in line with their .reg section.
>
> The user is warned that some section renumbering has been performed.
>
> GDB takes care to ensure that the new numbers assigned are unique and
> don't clash with any of the pids that might already be in use --
> remember, in a real vmcore file, 0 is used to indicate an idle core;
> non-idle cores will have the pid of whichever process was running on
> that core, so we don't want GDB to assign an lwpid that clashes with
> an actual pid that is in use in the core file.
>
> After this commit here's the updated GDB session output:
>
>   $ ./gdb/gdb --data-directory ./gdb/data-directory/ -q
>   (gdb) core-file /tmp/x86_64-pid0-core.core
>   warning: found threads with pid 0, assigned replacement Target Ids: LWP 1, LWP 2
>   [New LWP 1]
>   [New LWP 2]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   #0 0x00000000004017c2 in ?? ()
>   [Current thread is 1 (LWP 1)]
>   (gdb) info threads
>     Id   Target Id   Frame
>   * 1    LWP 1       0x00000000004017c2 in ?? ()
>     2    LWP 2       0x000000000040dda5 in ?? ()
>   (gdb) maintenance info sections
>   Core file: `/tmp/x86_64-pid0-core.core', file type elf64-x86-64.
>    [0] 0x00000000->0x000012d4 at 0x00000318: note0 READONLY HAS_CONTENTS
>    [1] 0x00000000->0x000000d8 at 0x0000039c: .reg/1 HAS_CONTENTS
>    [2] 0x00000000->0x000000d8 at 0x0000039c: .reg HAS_CONTENTS
>    [3] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo/1 HAS_CONTENTS
>    [4] 0x00000000->0x00000080 at 0x0000052c: .note.linuxcore.siginfo HAS_CONTENTS
>    [5] 0x00000000->0x00000140 at 0x000005c0: .auxv HAS_CONTENTS
>    [6] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file/1 HAS_CONTENTS
>    [7] 0x00000000->0x000000a4 at 0x00000714: .note.linuxcore.file HAS_CONTENTS
>    [8] 0x00000000->0x00000200 at 0x000007cc: .reg2/1 HAS_CONTENTS
>    [9] 0x00000000->0x00000200 at 0x000007cc: .reg2 HAS_CONTENTS
>    [10] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate/1 HAS_CONTENTS
>    [11] 0x00000000->0x00000440 at 0x000009e0: .reg-xstate HAS_CONTENTS
>    [12] 0x00000000->0x000000d8 at 0x00000ea4: .reg/2 HAS_CONTENTS
>    [13] 0x00000000->0x00000200 at 0x00000f98: .reg2/2 HAS_CONTENTS
>    [14] 0x00000000->0x00000440 at 0x000011ac: .reg-xstate/2 HAS_CONTENTS
>    [15] 0x00400000->0x00401000 at 0x00002000: load1 ALLOC LOAD READONLY HAS_CONTENTS
>    [16] 0x00401000->0x004b9000 at 0x00003000: load2 ALLOC READONLY CODE
>    [17] 0x004b9000->0x004e5000 at 0x00003000: load3 ALLOC READONLY
>    [18] 0x004e6000->0x004ec000 at 0x00003000: load4 ALLOC LOAD HAS_CONTENTS
>    [19] 0x004ec000->0x004f2000 at 0x00009000: load5 ALLOC LOAD HAS_CONTENTS
>    [20] 0x012a8000->0x012cb000 at 0x0000f000: load6 ALLOC LOAD HAS_CONTENTS
>    [21] 0x7fda77736000->0x7fda77737000 at 0x00032000: load7 ALLOC READONLY
>    [22] 0x7fda77737000->0x7fda77f37000 at 0x00032000: load8 ALLOC LOAD HAS_CONTENTS
>    [23] 0x7ffd55f65000->0x7ffd55f86000 at 0x00832000: load9 ALLOC LOAD HAS_CONTENTS
>    [24] 0x7ffd55fc3000->0x7ffd55fc7000 at 0x00853000: load10 ALLOC LOAD READONLY HAS_CONTENTS
>    [25] 0x7ffd55fc7000->0x7ffd55fc9000 at 0x00857000: load11 ALLOC LOAD READONLY CODE HAS_CONTENTS
>    [26] 0xffffffffff600000->0xffffffffff601000 at 0x00859000:
>         load12 ALLOC LOAD READONLY CODE HAS_CONTENTS
>   (gdb)
>
> Notice the new warning which is issued when the core file is being
> loaded. The threads are announced as '[New LWP NN]', and we see two
> threads in the 'info threads' output. The 'maintenance info sections'
> output shows the result of the section renaming.
>
> The gdb.arch/core-file-pid0.exp test has been updated to check for the
> improved GDB output.
> ---
>  gdb/corelow.c                             | 150 ++++++++++++++++++++++
>  gdb/testsuite/gdb.arch/core-file-pid0.exp |  12 +-
>  2 files changed, 161 insertions(+), 1 deletion(-)

Another great explanation! LGTM.

Reviewed-by: Kevin Buettner
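
For anyone reading the thread without the patch in front of them, the
renumbering described in the quoted commit message can be sketched
roughly as below. This is a standalone illustration and not the code
added to gdb/corelow.c: the names parse_lwpid_suffix and
renumber_pid0_sections are hypothetical, the sketch works on a plain
list of section names rather than on bfd sections, and it assumes that
each .reg/0 section starts a new thread's group of sections, which is
how the sections are ordered in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of GDB): split "NAME/NN" into NAME and NN.
// Returns false when the name has no purely numeric "/NN" suffix.
static bool
parse_lwpid_suffix (const std::string &name, std::string &base, long &lwpid)
{
  std::size_t slash = name.rfind ('/');
  if (slash == std::string::npos || slash + 1 == name.size ())
    return false;

  const std::string digits = name.substr (slash + 1);
  if (digits.find_first_not_of ("0123456789") != std::string::npos)
    return false;

  base = name.substr (0, slash);
  lwpid = std::stol (digits);
  return true;
}

// Hypothetical sketch of the renumbering idea.  Every "NAME/0" section is
// renamed in place to "NAME/N", where N is a non-zero lwpid not used by any
// other section.  A ".reg/0" section is assumed to begin a new thread, so
// the "/0" sections that follow it share its replacement lwpid.  Returns
// the replacement lwpids in the order they were assigned.
static std::vector<long>
renumber_pid0_sections (std::vector<std::string> &names)
{
  std::string base;
  long lwpid = 0;

  // First pass: record every non-zero lwpid already present, so that a
  // replacement never clashes with the pid of a non-idle core.
  std::set<long> in_use;
  for (const std::string &name : names)
    if (parse_lwpid_suffix (name, base, lwpid) && lwpid != 0)
      in_use.insert (lwpid);

  // Second pass: hand out replacements.
  std::vector<long> assigned;
  long next = 1;     // Next candidate replacement lwpid.
  long current = 0;  // Replacement for the thread being processed.
  for (std::string &name : names)
    {
      if (!parse_lwpid_suffix (name, base, lwpid) || lwpid != 0)
        continue;

      if (base == ".reg")
        {
          // Skip over candidates that are real pids in the core file.
          while (in_use.count (next) != 0)
            ++next;
          assert (in_use.count (next) == 0);
          current = next++;
          assigned.push_back (current);
        }

      // A "/0" section seen before any ".reg/0" is left untouched.
      if (current != 0)
        name = base + "/" + std::to_string (current);
    }

  return assigned;
}
```

With the section names from the first listing, this sketch would rename
the first group to .reg/1, .reg2/1, and so on, and the second group to
.reg/2, .reg2/2, and so on. If a .reg/1 section for a busy core were
also present, the replacements would start at 2 instead.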