Re: [PATCH 3/6] gdbsupport: use dynamic partitioning in gdb::parallel_for_each

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Simon Marchi <simon.marchi@polymtl.ca>
To: Tom Tromey <tom@tromey.com>, Simon Marchi <simon.marchi@efficios.com>
Cc: gdb-patches@sourceware.org
Subject: Re: [PATCH 3/6] gdbsupport: use dynamic partitioning in gdb::parallel_for_each
Date: Fri, 13 Jun 2025 15:22:13 -0400	[thread overview]
Message-ID: <46f028fd-14f8-49a5-afc1-d6661d6f84ac@polymtl.ca> (raw)
In-Reply-To: <87tt4jjy13.fsf@tromey.com>

On 6/13/25 2:29 PM, Tom Tromey wrote:
>>>>>> "Simon" == Simon Marchi <simon.marchi@efficios.com> writes:
> 
> Simon>  - I want to still be able to use scoped_time_it to measure the time
> Simon>    that each worker thread spent working on the task.  I find it really
> Simon>    handy when measuring the performance impact of changes.
> 
> Simon>    Unfortunately, the current interface of gdb::parallel_for_each, which
> Simon>    receives a simple callback, is not well-suited for that, once I
> Simon>    introduce the dynamic partitioning.  The callback would get called
> Simon>    once for each work item batch (multiple time for each worker thread),
> Simon>    so it's not possible to maintain a per-worker thread object for the
> Simon>    duration of the parallel for.
> 
> Simon>    To allow this, I changed gdb::parallel_for_each to receive a worker
> Simon>    type as a template parameter.  Each worker thread creates one local
> Simon>    instance of that type, and calls operator() on it for each work item
> Simon>    batch.  By having a scoped_time_it object as a field of that worker,
> Simon>    we can get the timings per worker thread.
> 
> Simon>    The drawbacks of this approach is that we must now define the
> Simon>    parallel task in a separate class and manually capture any context we
> Simon>    need as fields of that class.
> 
> This seems like a real step backward to me.  It's basically replicating
> lambdas by hand.
> 
> Why can't parallel_for_each just do this itself?  It could take an
> argument naming the task to pass to the scoped_time_it constructor.  The
> 'task' lambda could instantiate the timer.

This would require scoped_time_it to be in gdbsupport.  For the moment,
scoped_time_it is defined in gdb, because it was easier to do so at the
time, but it could be moved to gdbsupport.

In the patch that makes the DWARF reader use
gdb::parallel_for_each_async, we keep more per-worker state, the
`cooked_index_worker_result` where each worker threads puts what it
computes.  We would need a way to make that work with the lambda.

One way to do this would be to pass to the lambda the index (0 to N-1)
of the worker thread, and the lambda could use it to index into the
vector of cooked_index_worker_result, which would have been prepared by
the caller of gdb::parallel_for_each_async.

The last detail would be: how do to some one-time work at the end of a
parallel for-each worker's life.  For example, in the DWARF indexer,
each worker does:

 - call bfd_thread_cleanup
 - move the complaints to the thread storage
 - move the thread storage to the results vector

How could this work, given that the lambda is invoked many times for
each worker?

We could make gdb::parallel_for_each accept a second lambda, "finalize",
which would be called at the end of a worker's life.

Another idea I had (but didn't try) was to make lambdas receive a
"magic" range object, like athis:

  [&] (gdb::dynamic_range<dwarf2_per_cu *> range)
    {
      for (dwarf2_per_cu *cu : range)
        process_unit (cu);
    }

"range" would get work items from the work queue in batches, but yield
one at a time.  It would reach its end when the work queue is empty.
It will be a bit of work to implement, but it would have the advantage
that any per-worker state could be right there in the lambda, like we
have today.

Any thoughts?

> Simon> +  /* The next item to hand out.  */
> Simon> +  std::atomic<RandomIt> next = first;
> 
> This is maybe a bit worrying since IIUC std::atomic only works for
> trivially copyable types.  But this doesn't have to be a general purpose
> facility, it only has to work for gdb's needs.

Yes, I hit this when trying to do:

  gdb::parallel_for_each (vector.begin (), vector.end (), ...);

By default, a vector iterator is small and trivial.  But when building
with -D_GLIBCXX_DEBUG, then it's no longer trivially copyable.  That's
why I used the less ideal:

  gdb::parallel_for_each (vector.data (),
                          vector.data () + vector.size (), ...);

If you have a better idea to make this work, let me know.

Simon

next prev parent reply	other threads:[~2025-06-13 19:23 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-05 20:15 [PATCH 1/6] gdb: re-work parallel-for-selftests.c Simon Marchi
2025-05-05 20:15 ` [PATCH 2/6] gdbsupport: make gdb::parallel_for_each's n parameter a template parameter Simon Marchi
2025-06-13 17:55   ` Tom Tromey
2025-06-13 18:43     ` Simon Marchi
2025-05-05 20:15 ` [PATCH 3/6] gdbsupport: use dynamic partitioning in gdb::parallel_for_each Simon Marchi
2025-06-13 18:29   ` Tom Tromey
2025-06-13 19:22     ` Simon Marchi [this message]
2025-06-13 19:56       ` Simon Marchi
2025-06-13 20:12         ` Simon Marchi
2025-06-26 19:27           ` Simon Marchi
2025-07-03 19:23             ` Tom Tromey
2025-07-03 19:36               ` Simon Marchi
2025-05-05 20:15 ` [PATCH 4/6] gdbsupport: factor out work queue from parallel-for.h Simon Marchi
2025-06-13 18:33   ` Tom Tromey
2025-06-13 19:24     ` Simon Marchi
2025-05-05 20:15 ` [PATCH 5/6] gdbsupport: add async parallel_for_each version Simon Marchi
2025-06-13 18:39   ` Tom Tromey
2025-06-13 19:29     ` Simon Marchi
2025-05-05 20:15 ` [PATCH 6/6] gdb/dwarf: use dynamic partitioning for DWARF CU indexing Simon Marchi
2025-05-27 14:44 ` [PATCH 1/6] gdb: re-work parallel-for-selftests.c Simon Marchi
2025-06-13 17:48 ` Tom Tromey
2025-06-13 18:38   ` Simon Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46f028fd-14f8-49a5-afc1-d6661d6f84ac@polymtl.ca \
    --to=simon.marchi@polymtl.ca \
    --cc=gdb-patches@sourceware.org \
    --cc=simon.marchi@efficios.com \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox