Re: Multi-threaded dwarf parsing


From: Simon Marchi <simon.marchi@polymtl.ca>
To: Tom Tromey <tom@tromey.com>
Cc: Pedro Alves <palves@redhat.com>, gdb@sourceware.org
Subject: Re: Multi-threaded dwarf parsing
Date: Wed, 24 Feb 2016 16:43:00 -0000
Message-ID: <c4dc7b1f07fe11da024684ec2de47a7e@simark.ca>
In-Reply-To: <87lh6a6s8s.fsf@tromey.com>

On 2016-02-24 10:30, Tom Tromey wrote:
> It's been a while since I thought about that branch.
> 
> I think it helps some scenarios, but maybe not as many as you'd like.
> In fact, I think it doesn't help two of the three most typical ways
> I debug Firefox.  (I realize this may not apply directly to your idea
> of reading each CU independently; this is just the state of that
> branch.)
> 
> 1. Run Firefox, then attach.
> 
>    Here it is pretty normal for the attach to interrupt Firefox
>    somewhere in libxul.so -- the largest library (so much larger that
>    it is the only one that causes a noticeable pause at gdb startup).
> 
>    But, it seems to me that stopping somewhere in libxul.so should
>    probably cause its debuginfo to be read.
> 
> 2. Start gdb, set a breakpoint, then run Firefox.
> 
>    Here debuginfo for every library must be read in order to set the
>    breakpoint correctly.
> 
> 
> The third scenario, which would be helped, is:
> 
> 3. Start gdb, run Firefox, and try to reproduce a crash.  In this
>    situation gdb could read the debuginfo in the background and
>    everything would work nicely.
> 
> 
> That said, I think my branch might have helped a tiny bit with scenario
> #1, because it prioritized the largest files when reading debuginfo.
> So, libxul.so would generally be read a bit earlier than it is now.
> 
> Reading each CU independently seems like a good idea to me.  I think it
> will stumble into various problems inside gdb, but I'd guess they are
> all surmountable with enough work.

Indeed, we probably had different, but not incompatible, ideas of
"threaded".  Just to make sure I understand correctly: instead of
blocking on the psymtabs creation at startup (in elf_symfile_read), you
offload that to worker threads and carry on.  If you happen to need the
information and it's not ready yet, then the main code will have to
block until the corresponding task is complete
(dwarf2_require_psymtabs).  However, in each worker thread, each objfile
is still processed sequentially.  So if you are waiting for libxul.so's
debug info to be ready (such as in #1), it won't be ready any faster.
Is that right?
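
To make sure we are talking about the same thing, here is a minimal C++
sketch of that "read in the background, block on demand" scheme.  It is
not code from the branch: psymtab_result, read_psymtabs and lazy_objfile
are hypothetical names invented for illustration, and the "scan" is a
placeholder.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-in for the partial symbols of one objfile.
struct psymtab_result
{
  std::string objfile_name;
  int n_symbols;
};

// Hypothetical reader: scans one objfile's debug info sequentially.
static psymtab_result
read_psymtabs (const std::string &name)
{
  // Placeholder for the real, expensive DWARF scan.
  return { name, static_cast<int> (name.size ()) };
}

// One objfile whose psymtabs are produced by a background task.
class lazy_objfile
{
public:
  explicit lazy_objfile (std::string name)
    : m_task (std::async (std::launch::async, read_psymtabs,
                          std::move (name)))
  {
  }

  // Plays the role described above for dwarf2_require_psymtabs:
  // returns at once if the task is done, otherwise blocks until it is.
  const psymtab_result &
  require_psymtabs ()
  {
    assert (m_task.valid ());
    return m_task.get ();
  }

private:
  std::shared_future<psymtab_result> m_task;
};
```

Constructing a lazy_objfile returns immediately, so the user can reach a
prompt; the first caller of require_psymtabs pays whatever remains of
the wait.  Within the task the objfile is still read sequentially.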

My view of the parallelism was that when reading an objfile's debug
info, the main thread would offload chunks of work (a chunk == a CU) to
the worker threads, but wait for all of them to be done before
continuing.  So it would still be blocking on the psymtab creation, but
it would block for a shorter time (divided by the number of
threads/cores, in an ideal world).  It's just replacing a serial
algorithm with a parallel one, but it would be mostly transparent to the
rest of gdb.
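
Roughly, what I have in mind could be sketched like this in C++.  It is
only an illustration under strong assumptions: comp_unit, process_cu and
read_cus_in_parallel are hypothetical names, and the per-CU work is
assumed to touch no shared global state, which is certainly not true of
the real reader today.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one compilation unit.
struct comp_unit
{
  int n_dies;      // input: how much there is to scan
  int n_psymbols;  // output: filled in by a worker
};

// Hypothetical per-CU work; assumed not to touch shared globals.
static void
process_cu (comp_unit &cu)
{
  cu.n_psymbols = cu.n_dies / 2;
}

// Process every CU of one objfile on N_THREADS workers and return
// only once all of them are done.
static void
read_cus_in_parallel (std::vector<comp_unit> &cus, unsigned n_threads)
{
  n_threads = std::max (1u, n_threads);
  std::vector<std::thread> workers;

  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back ([&cus, t, n_threads] ()
      {
        // Static striding: worker T handles CUs T, T + N, T + 2N, ...
        for (std::size_t i = t; i < cus.size (); i += n_threads)
          process_cu (cus[i]);
      });

  // The caller still blocks here, just for a shorter time.
  for (std::thread &w : workers)
    w.join ();
}
```

Each worker writes only to its own CUs, so the sketch needs no locking.
A work queue would probably balance CUs of uneven size better than
static striding; I used striding only to keep the example short.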

I hadn't thought of reading the info in the background, but I like the
fact that it can get the user to a prompt faster.  And I think these two
forms of parallelism are not mutually exclusive: we could very well read
CUs in parallel, in the background.

> I think this could help with scenario #1.  The ideal situation here
> would be to read just the CU (or CUs?) covering the stop address; then
> lazily read more as needed for types and such.
> 
> I suppose it could also help #2 if enough parallelism is there to be
> had, though I'm a bit skeptical.
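
To illustrate the lazy idea quoted above (read just the CU covering the
stop address), here is a small C++ sketch of an address-to-CU lookup.
It assumes a sorted, non-overlapping table of ranges, for example one
built from .debug_aranges; cu_range and find_cu_for_pc are hypothetical
names, and real CUs can cover several discontiguous ranges.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical index entry: one CU covering [low, high).
struct cu_range
{
  std::uint64_t low;
  std::uint64_t high;
  int cu_index;
};

// Return the index of the CU covering PC, or -1 if none does.
// RANGES must be sorted by LOW and must not overlap.
static int
find_cu_for_pc (const std::vector<cu_range> &ranges, std::uint64_t pc)
{
  // Find the first range whose LOW is greater than PC; the candidate
  // is the range just before it.
  auto it = std::upper_bound (ranges.begin (), ranges.end (), pc,
                              [] (std::uint64_t addr, const cu_range &r)
                              { return addr < r.low; });
  if (it == ranges.begin ())
    return -1;
  --it;
  return pc < it->high ? it->cu_index : -1;
}
```

Only the CU returned here would need to be read right away; the others
could be read later, as types and such are needed.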

I think that reading CUs in parallel would help pretty much any use case
where you are waiting for psymtabs to be created, since it could reduce
that wait time.

>>> So, in a word, are there any gotchas or good reasons not to take this
>>> path?
> 
> Pedro> The obvious gotchas are of course all the globals, and coming up
> Pedro> with fine enough locking granularity that threads actually do
> Pedro> run in parallel.
> 
> I think the gotcha situation got worse since I wrote my patch.
> 
> Now the DWARF reader can call into the type-printing system, which it
> didn't before.  It wasn't clear to me that this was safe.  ISTR there
> was some other change along these lines -- the DWARF reader calling out
> to some gdb module that it previously did not -- but I can't remember
> what it was any more.
> 
> The DWARF reader also has many more modes (debug_types, dwz, dwo/dwp)
> than it did back then.  So, this will require some careful auditing.

Yes, I'm sure the reality is way more complicated than the image I have
in my head at the moment :).

> FWIW my threading patches were written during my time at Red Hat and so
> you can use any part of that series without needing any paperwork from
> me.

Great, thanks!

Simon

