Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Kevin Buettner via Gdb-patches <gdb-patches@sourceware.org>
To: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
Cc: Andrew Burgess <aburgess@redhat.com>
Subject: Re: [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid
Date: Fri, 9 Jun 2023 17:16:37 -0700	[thread overview]
Message-ID: <20230609171637.0cfcd9c6@f38-zws-nv> (raw)
In-Reply-To: <6cb39b064b3e1c9ed57964b29fd980f3a6d30a25.1685956034.git.aburgess@redhat.com>

On Mon,  5 Jun 2023 10:11:08 +0100
Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> wrote:

> This patch contains a test for this commit:
> 
>   commit c820c52a914cc9d7c63cb41ad396f4ddffff2196
>   Date:   Fri Aug 6 19:45:58 2010 +0000
> 
>               * thread.c (add_thread_silent): Use null_ptid instead of
>               minus_one_ptid while getting rid of stale inferior_ptid.
> 
> This is another test that has been carried in the Fedora GDB tree for
> some time, and I thought that it would be worth merging to master.  I
> don't believe there is any test like this currently in the testsuite.
> 
> The original issue was reported in this thread:
> 
>   https://inbox.sourceware.org/gdb-patches/AANLkTi=zuEDw6qiZ1jRatkdwHO99xF2Qu+WZ7i0EQjef@mail.gmail.com/
> 
> The problem was that when GDB was used to open a vmcore (core file)
> image generated by the Linux kernel GDB would (sometimes) crash with
> an assertion failure:
> 
>   thread.c:884: internal-error: switch_to_thread: Assertion `inf != NULL' failed.
> 
> To understand what's going on we need some background; a vmcore file
> represents each processor core in the same way that a standard
> application core file represents threads.  Thus, we might say, a
> vmcore file represents cores as threads.
> 
> When writing a vmcore file, the kernel will store the pid of the
> process currently running on that core as the thread's lwpid.
> 
> However, if a core is idle, with no process currently running on it,
> then the lwpid for that thread is stored as 0 in the vmcore file.  If
> multiple cores are idle then multiple threads will have a lwpid of 0.
> 
> Back in 2010, the original issue reported tried to change the kernel's
> behaviour in this thread:
> 
>   https://lkml.org/lkml/2010/8/3/75
> 
> This change was rejected by the kernel team, the current
> behaviour (lwpid of 0) was considered correct.  I've checked the
> source of a recent kernel.  The code mentioned in the lkml.org posting
> has moved, it's now in the function crash_save_cpu in the file
> kernel/kexec_core.c, but the general behaviour is unchanged, an idle
> core will have an lwpid of 0, so I think GDB still needs to be able to
> handle this case.
> 
> When GDB loads a vmcore file (which is handled just like any other
> core file) the sections are processed in core_open to generate the
> threads for the core file.  The processing is done by calling
> add_to_thread_list, a function which looks for sections named .reg/NN
> where NN is the lwpid of the thread, GDB then builds a ptid_t for the
> new thread and calls add_thread.
> 
> Remember, in our case the lwpid is 0.  Now for the first thread this
> is fine, if a little weird, 0 isn't usually a valid lwpid, but that's
> OK, GDB creates a thread with lwpid of 0 and carries on.
> 
> When we find the next thread (core) with lwpid of 0, we attempt to
> create another thread with an lwpid of 0.  This of course clashes with
> the previously created thread, they have the same ptid_t, so GDB tries
> to delete the first thread.
> 
> And it was within this thread delete code that we triggered a bug
> which would then cause GDB to assert -- when deleting we tried to
> switch to a thread with minus_one_ptid, this resulted in a call to
> find_inferior_pid (passing in minus_one_ptid's pid, which is -1), the
> find_inferior_pid call fails and returns NULL, which then triggered an
> assert in switch_to_thread.
> 
> The actual details of the why the assert triggered are really not
> important.  What's important (I think) is that a vmcore file might
> have this interesting lwpid of 0 characteristic, which isn't something
> we see in "normal" application core files, and it is this that I think
> we should be testing.
> 
> Now, you might be thinking: isn't deleting the first thread the wrong
> thing to do?  If the vmcore file has two threads that represent two
> cores, and both have an lwpid of 0 (indicating both cores are idle),
> then surely GDB should still represent this as two threads?  You're
> not wrong.  This was mentioned by Pedro in the original GDB mailing
> list thread here:
> 
>   https://inbox.sourceware.org/gdb-patches/201008061057.03037.pedro@codesourcery.com/
> 
> This is indeed a problem, and this problem is still present in GDB
> today.  I plan to try and address this in a later commit, however,
> this first commit is about getting a test in place to confirm that GDB
> at a minimum doesn't crash when loading such a vmcore file.
> 
> And so, finally, what's in this commit?
> 
> This commit contains a new test.  The test doesn't actually contain a
> vmcore file.  Instead I've created a standard application core file
> that contains two threads, and then manually edited the core file to
> set the lwpid of each thread to 0.
> 
> To further reduce the size of the core file (as it will be stored in
> git), I've zeroed all of the LOAD-able segments in the core file.
> This test really doesn't care about that part of the core file, we
> only really care about loading the register's, this is enough to
> confirm that the GDB doesn't crash.
> 
> Obviously as the core file is pre-generated, this test is architecture
> specific.  There are already a few tests in gdb.arch/ that include
> pre-generate core files.  Just as those existing tests do, I've
> compressed the core file with bzip2, which reduces it to just 750
> bytes.  I have structured the test so that if/when this patch is
> merged I can add some additional core files for other architectures,
> however, these are not included in this commit.
> 
> The test simply expands the core file, and then loads it into GDB.
> One interesting thing to note is that GDB reports the core file
> loading like this:
> 
>   (gdb) core-file ./gdb/testsuite/outputs/gdb.arch/core-file-pid0/core-file-pid0.x86-64.core
>   [New process 1]
>   [New process 1]
>   Failed to read a valid object file image from memory.
>   Core was generated by `./segv-mt'.
>   Program terminated with signal SIGSEGV, Segmentation fault.
>   The current thread has terminated
>   (gdb)
> 
> There's two interesting things here: first, the repeated "New process
> 1" message.  This is caused because linux_core_pid_to_str reports
> anything with an lwpid of 0 as a process, rather than an LWP.  And
> second, the "The current thread has terminated" message.  This is
> because the first thread in the core file is the current thread, but
> when GDB loads the second thread (which also has lwpid 0) this causes
> the first thread to be deleted, as a result GDB thinks that the
> current (first) thread has terminated.
> 
> As I said previously, both of these problems are a result of the lwpid
> 0 aliasing, which is not being fixed in this commit -- this commit is
> just confirming that GDB doesn't crash when loading this core file.

Great explanation! :)

Approved-by: Kevin Buettner <kevinb@redhat.com>


  reply	other threads:[~2023-06-10  0:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-05  9:11 [PATCH 0/3] Improve vmcore loading Andrew Burgess via Gdb-patches
2023-06-05  9:11 ` [PATCH 1/3] gdb: split inferior and thread setup when opening a core file Andrew Burgess via Gdb-patches
2023-06-10  0:04   ` Kevin Buettner via Gdb-patches
2023-06-05  9:11 ` [PATCH 2/3] gdb/testsuite: add test for core file with a 0 pid Andrew Burgess via Gdb-patches
2023-06-10  0:16   ` Kevin Buettner via Gdb-patches [this message]
2023-07-06 14:49   ` Pedro Alves
2023-07-07  9:56     ` Andrew Burgess via Gdb-patches
2023-07-10 11:10   ` Andrew Burgess via Gdb-patches
2023-06-05  9:11 ` [PATCH 3/3] gdb: handle core files with .reg/0 section names Andrew Burgess via Gdb-patches
2023-06-10  0:36   ` Kevin Buettner via Gdb-patches
2023-07-03 17:03 ` [PATCH 0/3] Improve vmcore loading Andrew Burgess via Gdb-patches

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230609171637.0cfcd9c6@f38-zws-nv \
    --to=gdb-patches@sourceware.org \
    --cc=aburgess@redhat.com \
    --cc=kevinb@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox