From: Kevin Buettner via Gdb-patches <gdb-patches@sourceware.org>
To: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
Cc: Andrew Burgess <aburgess@redhat.com>
Subject: Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
Date: Fri, 9 Jun 2023 17:16:37 -0700 [thread overview]
Message-ID: <20230609171637.0cfcd9c6@f38-zws-nv> (raw)
In-Reply-To: <6cb39b064b3e1c9ed57964b29fd980f3a6d30a25.1685956034.git.aburgess@redhat.com>
On Mon, 5 Jun 2023 10:11:08 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:
> This patch contains a test for this commit:
>
> commit c820c52a914cc9d7c63cb41ad396f4ddffff2196
> Date: Fri Aug 6 19:45:58 2010 +0000
>
> * thread.c (add_thread_silent): Use null_ptid instead of
> minus_one_ptid while getting rid of stale inferior_ptid.
>
> This is another test that has been carried in the Fedora GDB tree for
> some time, and I thought that it would be worth merging to master. I
> don't believe there is any test like this currently in the testsuite.
>
> The original issue was reported in this thread:
>
> https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
>
> The problem was that when GDB was used to open a vmcore (core file)
> image generated by the Linux kernel GDB would (sometimes) crash with
> an assertion failure:
>
> thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
>
> To understand what's going on we need some background; a vmcore file
> represents each processor core in the same way that a standard
> application core file represents threads. Thus, we might say, a
> vmcore file represents cores as threads.
>
> When writing a vmcore file, the kernel will store the pid of the
> process currently running on that core as the thread's lwpid.
>
> However, if a core is idle, with no process currently running on it,
> then the lwpid for that thread is stored as 0 in the vmcore file. If
> multiple cores are idle then multiple threads will have a lwpid of 0.
>
> Back in 2010, the original issue reported tried to change the kernel's
> behaviour in this thread:
>
> https://lkml.org/lkml/2010/8/3/75
>
> This change was rejected by the kernel team, the current
> behaviour (lwpid of 0) was considered correct. I've checked the
> source of a recent kernel. The code mentioned in the lkml.org posting
> has moved, it's now in the function crash_save_cpu in the file
> kernel/kexec_core.c, but the general behaviour is unchanged, an idle
> core will have an lwpid of 0, so I think GDB still needs to be able to
> handle this case.
>
> When GDB loads a vmcore file (which is handled just like any other
> core file) the sections are processed in core_open to generate the
> threads for the core file. The processing is done by calling
> add_to_thread_list, a function which looks for sections named .reg/NN
> where NN is the lwpid of the thread, GDB then builds a ptid_t for the
> new thread and calls add_thread.
>
> Remember, in our case the lwpid is 0. Now for the first thread this
> is fine, if a little weird, 0 isn't usually a valid lwpid, but that's
> OK, GDB creates a thread with lwpid of 0 and carries on.
>
> When we find the next thread (core) with lwpid of 0, we attempt to
> create another thread with an lwpid of 0. This of course clashes with
> the previously created thread, they have the same ptid_t, so GDB tries
> to delete the first thread.
>
> And it was within this thread delete code that we triggered a bug
> which would then cause GDB to assert -- when deleting we tried to
> switch to a thread with minus_one_ptid, this resulted in a call to
> find_inferior_pid (passing in minus_one_ptid's pid, which is -1), the
> find_inferior_pid call fails and returns NULL, which then triggered an
> assert in switch_to_thread.
>
> The actual details of the why the assert triggered are really not
> important. What's important (I think) is that a vmcore file might
> have this interesting lwpid of 0 characteristic, which isn't something
> we see in "normal" application core files, and it is this that I think
> we should be testing.
>
> Now, you might be thinking: isn't deleting the first thread the wrong
> thing to do? If the vmcore file has two threads that represent two
> cores, and both have an lwpid of 0 (indicating both cores are idle),
> then surely GDB should still represent this as two threads? You're
> not wrong. This was mentioned by Pedro in the original GDB mailing
> list thread here:
>
> https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/
>
> This is indeed a problem, and this problem is still present in GDB
> today. I plan to try and address this in a later commit, however,
> this first commit is about getting a test in place to confirm that GDB
> at a minimum doesn't crash when loading such a vmcore file.
>
> And so, finally, what's in this commit?
>
> This commit contains a new test. The test doesn't actually contain a
> vmcore file. Instead I've created a standard application core file
> that contains two threads, and then manually edited the core file to
> set the lwpid of each thread to 0.
>
> To further reduce the size of the core file (as it will be stored in
> git), I've zeroed all of the LOAD-able segments in the core file.
> This test really doesn't care about that part of the core file, we
> only really care about loading the register's, this is enough to
> confirm that the GDB doesn't crash.
>
> Obviously as the core file is pre-generated, this test is architecture
> specific. There are already a few tests in gdb.arch/ that include
> pre-generate core files. Just as those existing tests do, I've
> compressed the core file with bzip2, which reduces it to just 750
> bytes. I have structured the test so that if/when this patch is
> merged I can add some additional core files for other architectures,
> however, these are not included in this commit.
>
> The test simply expands the core file, and then loads it into GDB.
> One interesting thing to note is that GDB reports the core file
> loading like this:
>
> (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core
> [New process 1]
> [New process 1]
> Failed to read a valid object file image from memory.
> Core was generated by `./segv-mt'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> The current thread has terminated
> (gdb)
>
> There's two interesting things here: first, the repeated "New process
> 1" message. This is caused because linux_core_pid_to_str reports
> anything with an lwpid of 0 as a process, rather than an LWP. And
> second, the "The current thread has terminated" message. This is
> because the first thread in the core file is the current thread, but
> when GDB loads the second thread (which also has lwpid 0) this causes
> the first thread to be deleted, as a result GDB thinks that the
> current (first) thread has terminated.
>
> As I said previously, both of these problems are a result of the lwpid
> 0 aliasing, which is not being fixed in this commit -- this commit is
> just confirming that GDB doesn't crash when loading this core file.
Great explanation! :)
Approved-by: Kevin Buettner <kevinb@redhat.com>
next prev parent reply other threads:[~2023-06-10 0:17 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-05 9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess via Gdb-patches
2023-06-05 9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess via Gdb-patches
2023-06-10 0:04 ` Kevin Buettner via Gdb-patches
2023-06-05 9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess via Gdb-patches
2023-06-10 0:16 ` Kevin Buettner via Gdb-patches [this message]
2023-07-06 14:49 ` Pedro Alves
2023-07-07 9:56 ` Andrew Burgess via Gdb-patches
2023-07-10 11:10 ` Andrew Burgess via Gdb-patches
2023-06-05 9:11 ` [PATCH 3/3] gdb: handle core files with .reg/0 section names Andrew Burgess via Gdb-patches
2023-06-10 0:36 ` Kevin Buettner via Gdb-patches
2023-07-03 17:03 ` [PATCH 0/3] Improve vmcore loading Andrew Burgess via Gdb-patches
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230609171637.0cfcd9c6@f38-zws-nv \
--to=gdb-patches@sourceware.org \
--cc=aburgess@redhat.com \
--cc=kevinb@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox