Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Simon Marchi <simark@simark.ca>
To: Pedro Alves <palves@redhat.com>,
	"Aktemur, Tankut Baris" <tankut.baris.aktemur@intel.com>,
	"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Subject: Re: [PATCH] Fix re-runs of a second inferior (PR gdb/25410)
Date: Fri, 24 Jan 2020 17:15:00 -0000	[thread overview]
Message-ID: <5db80e9a-2c58-6edc-8a9e-db44ea44a8dc@simark.ca> (raw)
In-Reply-To: <bd7d83f4-49e9-787c-cc36-6fa97e73f6e7@redhat.com>

On 2020-01-24 10:11 a.m., Pedro Alves wrote:
> On 1/24/20 2:59 PM, Simon Marchi wrote:
>> On 2020-01-24 6:22 a.m., Aktemur, Tankut Baris wrote:
>>> On Friday, January 24, 2020 4:02 AM, Pedro Alves wrote:
>>>> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa
>>>> "Multi-target support).
>>>
>>> The patch led to the assertion violation below when running
>>> gdb.threads/vfork-follow-child-exit.exp.
>>>
>>> gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' failed.
>>
>>
>> Oh, I see that too.
> 
> Me too.  No idea how I missed this.  Sorry about that...
> There's actually yet another internal error in addition
> to that one.
> 
> Here's the updated patch, which fixes all.
> 
> From 2880f692d69cf54ef9fb41f84ac00a1d04f28447 Mon Sep 17 00:00:00 2001
> From: Pedro Alves <palves@redhat.com>
> Date: Fri, 24 Jan 2020 14:55:43 +0000
> Subject: [PATCH] Fix re-runs of a second inferior (PR gdb/25410)
> 
> This fixes a latent bug exposed by the multi-target patch (5b6d1e4fa
> "Multi-target support), and then fixes two other latent bugs exposed
> by fixing that first latent bug.
> 
> The symptom described in the bug report is that starting a first
> inferior, then trying to run a second (multi-threaded) inferior twice,
> causes libthread_db to fail to load, along with other erratic
> behavior:
> 
>  (gdb) run
>  Starting program: /tmp/foo
>  warning: td_ta_new failed: generic error
> 
> Going a bit deeply, I found that if the two inferiors have different
> symbols, we can see that just after inferior 2 exits, we are left with
> inferior 2 selected, which is correct, but the symbols in scope belong
> to inferior 1, which is obviously incorrect...
> 
> This problem is that there's a path in
> scoped_restore_current_thread::restore() that switches to no thread
> selected, and switches the current inferior, but leaves the current
> program space as is, resulting in leaving the program space pointing
> to the wrong program space (the one of the other inferior).  This was
> happening after handling TARGET_WAITKIND_NO_RESUMED, which is an event
> that triggers after TARGET_WAITKIND_EXITED for the previous inferior
> exit.  Subsequent symbol lookups find the symbols of the wrong
> inferior.
> 
> The fix is to use switch_to_inferior_no_thread in that problem spot.
> This function was recently added along with the multi-target work
> exactly for these situations.
> 
> As for testing, this patch adds a new testcase that tests symbol
> printing just after inferior exit, which exercises the root cause of
> the problem more directly.  And then, to cover the use case described
> in the bug too, it also exercises the lithread_db.so mis-loading, by
> using TLS printing as a proxy for being sure that threaded debugging
> was activated sucessfully.  The testcase fails without the fix like
> this, for the "print symbol just after exit" bits:
> 
>  ...
>  [Inferior 1 (process 8719) exited normally]
>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: continue until exit
>  print re_run_var_1
>  No symbol "re_run_var_1" in current context.
>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: print re_run_var_1
>  ...
> 
> And like this for the "libthread_db.so loading" bits:
> 
>  (gdb) run
>  Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run
>  warning: td_ta_new failed: generic error
>  [New LWP 27001]
> 
>  Thread 1.1 "multi-re-run" hit Breakpoint 3, all_started () at /home/pedro/gdb/binutils-gdb/build/../src/gdb/testsuite/gdb.multi/multi-re-run.c:44
>  44      }
>  (gdb) PASS: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: running to all_started in runto
>  print tls_var
>  Cannot find thread-local storage for LWP 27000, executable file /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.multi/multi-re-run/multi-re-run:
>  Cannot find thread-local variables on this target
>  (gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=2: print tls_var
> 
> 
> As mentioned, that fix above goes on to expose a couple other latent
> bugs.  This commit fixes those as well.
> 
> The first latent bug exposed is in
> infrun.c:handle_vfork_child_exec_or_exit.  The current code is leaving
> inf->pspace == NULL while calling clone_program_space.  The idea was
> to make it so that the breakpoints module doesn't use this inferior's
> pspace to set breakpoints.  With that, any
> scoped_restore_current_thread use from within clone_program_space
> tries to restore a NULL program space, which hits an assertion:
> 
>  Attaching after Thread 0x7ffff74b8700 (LWP 27276) vfork to child process 27277]
>  [New inferior 2 (process 27277)]
>  [Thread debugging using libthread_db enabled]
>  Using host libthread_db library "/lib64/libthread_db.so.1".
>  /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:243: internal-error: void set_current_program_space(program_space*): Assertion `pspace != NULL' faile
>  d.
>  A problem internal to GDB has been detected,
>  further debugging may prove unreliable.
>  Quit this debugging session? (y or n) FAIL: gdb.threads/vfork-follow-child-exit.exp: detach-on-fork=off: continue (GDB internal error)
> 
> That NULL pspace idea was legitimate, but it's no longer necessary,
> since commit b2e586e850db ("Defer breakpoint reset when cloning
> progspace for fork child").  So the fix is to just set the inferior's
> program space earlier.
> 
> 
> The other latent bug exposed is in exec.c.  When exec_close is called
> from the program_space destructor, it is purposedly called with a
> current program space that is not current inferior's program space.
> The problem is that the multi-target work added some code to
> remove_target_sections that loops over all inferiors, and uses
> scoped_restore_current_thread to save/restore the previous
> thread/inferior/frame state.  This makes it so that exec_close returns
> with the current program space set to the current inferior's program
> space, which is exactly what we did not want.  Then the program_space
> destructor continues into free_all_objfiles, but it is not running
> that method on the wrong program space, resulting in:
> 
>  Reading symbols from /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads...
>  Reading symbols from /usr/lib/debug/usr/lib64/libpthread-2.26.so.debug...
>  Reading symbols from /usr/lib/debug/usr/lib64/libm-2.26.so.debug...
>  Reading symbols from /usr/lib/debug/usr/lib64/libc-2.26.so.debug...
>  Reading symbols from /usr/lib/debug/usr/lib64/ld-2.26.so.debug...
>  [Inferior 3 (process 9583) exited normally]
>  /home/pedro/gdb/binutils-gdb/build/../src/gdb/progspace.c:170: internal-error: void program_space::free_all_objfiles(): Assertion `so->objfile == NULL' failed.
>  A problem internal to GDB has been detected,
>  further debugging may prove unreliable.
>  Quit this debugging session? (y or n) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: inferior 1 exited (GDB internal error)
> 
> The fix is to use scoped_restore_current_pspace_and_thread instead of
> scoped_restore_current_thread.

I did not look at the code in depth, but your explanations make sense to me,
so if the testsuite shows no regression, this LGTM.

Simon


      reply	other threads:[~2020-01-24 17:05 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-24  3:32 Pedro Alves
2020-01-24  7:00 ` Simon Marchi
     [not found] ` <DM6PR11MB303390E9124B6ECD67E1740AC40E0@DM6PR11MB3033.namprd11.prod.outlook.com>
2020-01-24 15:02   ` Simon Marchi
2020-01-24 15:23     ` Pedro Alves
2020-01-24 17:15       ` Simon Marchi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5db80e9a-2c58-6edc-8a9e-db44ea44a8dc@simark.ca \
    --to=simark@simark.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=palves@redhat.com \
    --cc=tankut.baris.aktemur@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox