From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
<879f5c1e-14e6-41d4-b3d9-76d6c91bcf15@polymtl.ca>
Date: Tue, 5 Aug 2025 13:01:15 -0400
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2 1/4] gdbsupport: use dynamic partitioning in gdb::parallel_for_each
To: gdb-patches@sourceware.org
References: <20250703200130.4095761-1-simon.marchi@polymtl.ca>
From: Simon Marchi
In-Reply-To: <20250703200130.4095761-1-simon.marchi@polymtl.ca>
Content-Type: text/plain; charset=UTF-8
List-Id: Gdb-patches mailing list

Ping.

On 7/3/25 4:01 PM, simon.marchi@polymtl.ca wrote:
> From: Simon Marchi
>
> gdb::parallel_for_each uses static partitioning of the workload, meaning
> that each worker thread receives a similar number of work items.  Change
> it to use dynamic partitioning, where worker threads pull work items
> from a shared work queue when they need to.
>
> Note that gdb::parallel_for_each is currently only used for processing
> minimal symbols in GDB.  I am looking at improving the startup
> performance of GDB, where minimal symbol processing is one step.
>
> With static partitioning, there is a risk of workload imbalance if some
> threads receive "easier" work than others.  Some threads sit idle while
> others finish working on their share of the work.  This is not
> desirable, because the parallel_for_each takes as long as its slowest
> thread.
>
> When loading a file with a lot of minimal symbols (~600k) in GDB, with
> "maint set per-command time on", I observe some imbalance:
>
> Time for "minsyms install worker": wall 0.732, user 0.550, sys 0.041, user+sys 0.591, 80.7 % CPU
> Time for "minsyms install worker": wall 0.881, user 0.722, sys 0.071, user+sys 0.793, 90.0 % CPU
> Time for "minsyms install worker": wall 2.107, user 1.804, sys 0.147, user+sys 1.951, 92.6 % CPU
> Time for "minsyms install worker": wall 2.351, user 2.003, sys 0.151, user+sys 2.154, 91.6 % CPU
> Time for "minsyms install worker": wall 2.611, user 2.322, sys 0.235, user+sys 2.557, 97.9 % CPU
> Time for "minsyms install worker": wall 3.074, user 2.729, sys 0.203, user+sys 2.932, 95.4 % CPU
> Time for "minsyms install worker": wall 3.486, user 3.074, sys 0.260, user+sys 3.334, 95.6 % CPU
> Time for "minsyms install worker": wall 3.927, user 3.475, sys 0.336, user+sys 3.811, 97.0 % CPU
>                                         ^
>                                    -----´
>
> The fastest thread took 0.732 seconds to complete its work (and then sat
> still), while the slowest took 3.927 seconds.  This means the
> parallel_for_each took a bit less than 4 seconds.
>
> Even if the number of minimal symbols assigned to each worker is the
> same, I suppose that some symbols (e.g. those that need demangling) take
> longer to process, which could explain the imbalance.
>
> With this patch, things are much more balanced:
>
> Time for "minsym install worker": wall 2.807, user 2.222, sys 0.144, user+sys 2.366, 84.3 % CPU
> Time for "minsym install worker": wall 2.808, user 2.073, sys 0.131, user+sys 2.204, 78.5 % CPU
> Time for "minsym install worker": wall 2.804, user 1.994, sys 0.151, user+sys 2.145, 76.5 % CPU
> Time for "minsym install worker": wall 2.808, user 1.977, sys 0.135, user+sys 2.112, 75.2 % CPU
> Time for "minsym install worker": wall 2.808, user 2.061, sys 0.142, user+sys 2.203, 78.5 % CPU
> Time for "minsym install worker": wall 2.809, user 2.012, sys 0.146, user+sys 2.158, 76.8 % CPU
> Time for "minsym install worker": wall 2.809, user 2.178, sys 0.137, user+sys 2.315, 82.4 % CPU
> Time for "minsym install worker": wall 2.820, user 2.141, sys 0.170, user+sys 2.311, 82.0 % CPU
>                                        ^
>                                   -----´
>
> In this version, the parallel_for_each took about 2.8 seconds,
> representing a reduction of ~1.2 seconds for this step.  Not
> life-changing, but it's still good I think.
>
> Note that this patch helps when loading big programs.  My go-to test
> program for this is telegram-desktop that I built from source.  For
> small programs (including loading gdb itself), it makes no perceptible
> difference.
>
> Now the technical bits:
>
>  - One impact that this change has on the minimal symbol processing
>    specifically is that not all calls to compute_and_set_names (a
>    critical region guarded by a mutex) are done at the end of each
>    worker thread's task anymore.
>
>    Before this patch, each thread would compute the names and hash
>    values for all the minimal symbols it has been assigned, and then
>    would call compute_and_set_names for all of them, while holding the
>    mutex (thus preventing other threads from doing this same step).
>
>    With the shared work queue approach, each thread grabs a batch of
>    minimal symbols, computes the names and hash values for them, and
>    then calls compute_and_set_names (with the mutex held) for this
>    batch only.  It then repeats that until the work queue is empty.
>
>    There are therefore more small and spread out compute_and_set_names
>    critical sections, instead of just one per worker thread at the end.
>    Given that before this patch the work was not well balanced among
>    worker threads, I guess that threads would enter that critical
>    region at roughly different times, causing little contention.
>
>    In the "with this patch" results, the CPU utilization numbers are
>    not as good, suggesting that there is some contention.  But I don't
>    know if it's contention due to the compute_and_set_names critical
>    section or the shared work queue critical section.  That can be
>    investigated later.  In any case, what ultimately counts is the wall
>    time, which improves.
>
>  - One choice I had to make was to decide how many work items (in this
>    case minimal symbols) each worker should pop when getting work from
>    the shared queue.  The general wisdom is that:
>
>    - popping too few items, and the synchronization overhead becomes
>      significant, and the total processing time increases
>    - popping too many items, and we get some imbalance back, and the
>      total processing time increases again
>
>    I experimented using a dynamic batch size proportional to the number
>    of remaining work items.  It worked well in some cases but not
>    always.  So I decided to keep it simple, with a fixed batch size.
>    That can always be tweaked later.
>
>  - I want to still be able to use scoped_time_it to measure the time
>    that each worker thread spent working on the task.  I find it really
>    handy when measuring the performance impact of changes.
>
>    Unfortunately, the current interface of gdb::parallel_for_each,
>    which receives a simple callback, is not well-suited for that, once
>    I introduce the dynamic partitioning.  The callback would get called
>    once for each work item batch (multiple times for each worker
>    thread), so it's not possible to maintain a per-worker thread object
>    for the duration of the parallel for.
>
>    To allow this, I changed gdb::parallel_for_each to receive a worker
>    type as a template parameter.  Each worker thread creates one local
>    instance of that type, and calls operator() on it for each work item
>    batch.  By having a scoped_time_it object as a field of that worker,
>    we can get the timings per worker thread.
>
>    The drawback of this approach is that we must now define the
>    parallel task in a separate class and manually capture any context
>    we need as fields of that class.
>
> Change-Id: Ibf1fea65c91f76a95b9ed8f706fd6fa5ef52d9cf
> ---
>  gdb/minsyms.c                          | 133 ++++++++++++--------
>  gdb/unittests/parallel-for-selftests.c |  22 +++-
>  gdbsupport/parallel-for.h              | 164 ++++++++++++-------------
>  3 files changed, 178 insertions(+), 141 deletions(-)
>
> diff --git a/gdb/minsyms.c b/gdb/minsyms.c
> index 4a6459a6f2d2..e353a8a9ba00 100644
> --- a/gdb/minsyms.c
> +++ b/gdb/minsyms.c
> @@ -1392,6 +1392,81 @@ build_minimal_symbol_hash_tables
>      }
>  }
>
> +/* gdb::parallel_for_each worker to compute minimal symbol names and hashes.
> +*/
> +
> +class minimal_symbol_install_worker
> +{
> +public:
> +  minimal_symbol_install_worker
> +    (minimal_symbol *msymbols,
> +     gdb::array_view<computed_hash_values> hash_values,
> +     objfile_per_bfd_storage *per_bfd,
> +     std::mutex &demangled_mutex)
> +    : m_time_it ("minsym install worker"),
> +      m_msymbols (msymbols),
> +      m_hash_values (hash_values),
> +      m_per_bfd (per_bfd),
> +      m_demangled_mutex (demangled_mutex)
> +  {}
> +
> +  void operator() (minimal_symbol *start, minimal_symbol *end) noexcept
> +  {
> +    for (minimal_symbol *msym = start; msym < end; ++msym)
> +      {
> +        size_t idx = msym - m_msymbols;
> +        m_hash_values[idx].name_length = strlen (msym->linkage_name ());
> +
> +        if (!msym->name_set)
> +          {
> +            /* This will be freed later, by compute_and_set_names.  */
> +            gdb::unique_xmalloc_ptr<char> demangled_name
> +              = symbol_find_demangled_name (msym, msym->linkage_name ());
> +            msym->set_demangled_name (demangled_name.release (),
> +                                      &m_per_bfd->storage_obstack);
> +            msym->name_set = 1;
> +          }
> +
> +        /* This mangled_name_hash computation has to be outside of
> +           the name_set check, or compute_and_set_names below will
> +           be called with an invalid hash value.  */
> +        m_hash_values[idx].mangled_name_hash
> +          = fast_hash (msym->linkage_name (), m_hash_values[idx].name_length);
> +        m_hash_values[idx].minsym_hash = msymbol_hash (msym->linkage_name ());
> +
> +        /* We only use this hash code if the search name differs
> +           from the linkage name.  See the code in
> +           build_minimal_symbol_hash_tables.  */
> +        if (msym->search_name () != msym->linkage_name ())
> +          m_hash_values[idx].minsym_demangled_hash
> +            = search_name_hash (msym->language (), msym->search_name ());
> +      }
> +
> +    {
> +      /* To limit how long we hold the lock, we only acquire it here
> +         and not while we demangle the names above.
> +         */
> +#if CXX_STD_THREAD
> +      std::lock_guard<std::mutex> guard (m_demangled_mutex);
> +#endif
> +      for (minimal_symbol *msym = start; msym < end; ++msym)
> +        {
> +          size_t idx = msym - m_msymbols;
> +          std::string_view name (msym->linkage_name (),
> +                                 m_hash_values[idx].name_length);
> +          hashval_t hashval = m_hash_values[idx].mangled_name_hash;
> +
> +          msym->compute_and_set_names (name, false, m_per_bfd, hashval);
> +        }
> +    }
> +  }
> +
> +private:
> +  scoped_time_it m_time_it;
> +  minimal_symbol *m_msymbols;
> +  gdb::array_view<computed_hash_values> m_hash_values;
> +  objfile_per_bfd_storage *m_per_bfd;
> +  std::mutex &m_demangled_mutex;
> +};
> +
>  /* Add the minimal symbols in the existing bunches to the objfile's official
>     minimal symbol table.  In most cases there is no minimal symbol table yet
>     for this objfile, and the existing bunches are used to create one.  Once
> @@ -1478,59 +1553,11 @@ minimal_symbol_reader::install ()
>    std::vector<computed_hash_values> hash_values (mcount);
>
>    msymbols = m_objfile->per_bfd->msymbols.get ();
> -  /* Arbitrarily require at least 10 elements in a thread.  */
> -  gdb::parallel_for_each<10> (&msymbols[0], &msymbols[mcount],
> -    [&] (minimal_symbol *start, minimal_symbol *end)
> -  {
> -    scoped_time_it time_it ("minsyms install worker");
> -
> -    for (minimal_symbol *msym = start; msym < end; ++msym)
> -      {
> -        size_t idx = msym - msymbols;
> -        hash_values[idx].name_length = strlen (msym->linkage_name ());
> -        if (!msym->name_set)
> -          {
> -            /* This will be freed later, by compute_and_set_names.  */
> -            gdb::unique_xmalloc_ptr<char> demangled_name
> -              = symbol_find_demangled_name (msym, msym->linkage_name ());
> -            msym->set_demangled_name
> -              (demangled_name.release (),
> -               &m_objfile->per_bfd->storage_obstack);
> -            msym->name_set = 1;
> -          }
> -        /* This mangled_name_hash computation has to be outside of
> -           the name_set check, or compute_and_set_names below will
> -           be called with an invalid hash value.
> -           */
> -        hash_values[idx].mangled_name_hash
> -          = fast_hash (msym->linkage_name (),
> -                       hash_values[idx].name_length);
> -        hash_values[idx].minsym_hash
> -          = msymbol_hash (msym->linkage_name ());
> -        /* We only use this hash code if the search name differs
> -           from the linkage name.  See the code in
> -           build_minimal_symbol_hash_tables.  */
> -        if (msym->search_name () != msym->linkage_name ())
> -          hash_values[idx].minsym_demangled_hash
> -            = search_name_hash (msym->language (), msym->search_name ());
> -      }
> -    {
> -      /* To limit how long we hold the lock, we only acquire it here
> -         and not while we demangle the names above.  */
> -#if CXX_STD_THREAD
> -      std::lock_guard<std::mutex> guard (demangled_mutex);
> -#endif
> -      for (minimal_symbol *msym = start; msym < end; ++msym)
> -        {
> -          size_t idx = msym - msymbols;
> -          msym->compute_and_set_names
> -            (std::string_view (msym->linkage_name (),
> -                               hash_values[idx].name_length),
> -             false,
> -             m_objfile->per_bfd,
> -             hash_values[idx].mangled_name_hash);
> -        }
> -    }
> -  });
> +
> +  gdb::parallel_for_each<1000, minimal_symbol *, minimal_symbol_install_worker>
> +    (&msymbols[0], &msymbols[mcount], msymbols,
> +     gdb::array_view<computed_hash_values> (hash_values),
> +     m_objfile->per_bfd, demangled_mutex);
>
>    build_minimal_symbol_hash_tables (m_objfile, hash_values);
>  }
> diff --git a/gdb/unittests/parallel-for-selftests.c b/gdb/unittests/parallel-for-selftests.c
> index f5456141ff6e..86bf06c073a6 100644
> --- a/gdb/unittests/parallel-for-selftests.c
> +++ b/gdb/unittests/parallel-for-selftests.c
> @@ -91,12 +91,30 @@ test_one (int n_threads, do_foreach_t do_foreach)
>  static void
>  test_parallel_for_each ()
>  {
> +  struct test_worker
> +  {
> +    /* DUMMY is there to test passing multiple arguments to the worker
> +       constructor.
> +       */
> +    test_worker (foreach_callback_t callback, int dummy)
> +      : m_callback (callback)
> +    {
> +    }
> +
> +    void operator() (int first, int last) noexcept
> +    {
> +      return m_callback (first, last);
> +    }
> +
> +  private:
> +    foreach_callback_t m_callback;
> +  };
> +
>    const std::vector<do_foreach_t> for_each_functions
>    {
>      [] (int start, int end, foreach_callback_t callback)
> -    { gdb::parallel_for_each<1> (start, end, callback); },
> +    { gdb::parallel_for_each<1, int, test_worker> (start, end, callback, 0); },
>      [] (int start, int end, foreach_callback_t callback)
> -    { gdb::sequential_for_each (start, end, callback);}
> +    { gdb::sequential_for_each<int, test_worker> (start, end, callback, 0); },
>    };
>
>    for (int n_threads : { 0, 1, 3 })
> diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
> index b74c8068cf2c..bb41eb8700f0 100644
> --- a/gdbsupport/parallel-for.h
> +++ b/gdbsupport/parallel-for.h
> @@ -21,115 +21,107 @@
>  #define GDBSUPPORT_PARALLEL_FOR_H
>
>  #include <algorithm>
> -#include <type_traits>
> +#include <atomic>
>  #include "gdbsupport/thread-pool.h"
> -#include "gdbsupport/function-view.h"
>
>  namespace gdb
>  {
>
> -/* A very simple "parallel for".  This splits the range of iterators
> -   into subranges, and then passes each subrange to the callback.  The
> -   work may or may not be done in separate threads.
> +/* A "parallel-for" implementation using a shared work queue.  Work items get
> +   popped in batches of size up to BATCH_SIZE from the queue and handed out to
> +   worker threads.
>
> -   This approach was chosen over having the callback work on single
> -   items because it makes it simple for the caller to do
> -   once-per-subrange initialization and destruction.
> +   Each worker thread instantiates an object of type Worker, forwarding ARGS to
> +   its constructor.  The Worker object can be used to keep some per-worker
> +   thread state.
>
> -   The parameter N says how batching ought to be done -- there will be
> -   at least N elements processed per thread.  Setting N to 0 is not
> -   allowed.
> -   */
> +   Worker threads call Worker::operator() repeatedly until the queue is
> +   empty.  */
>
> -template<std::size_t n, class RandomIt, class RangeFunction>
> +template<std::size_t batch_size, class RandomIt, class Worker,
> +         class... WorkerArgs>
>  void
> -parallel_for_each (RandomIt first, RandomIt last, RangeFunction callback)
> +parallel_for_each (const RandomIt first, const RandomIt last,
> +                   WorkerArgs &&...worker_args)
>  {
>    /* If enabled, print debug info about how the work is distributed across
>       the threads.  */
>    const bool parallel_for_each_debug = false;
>
> -  size_t n_worker_threads = thread_pool::g_thread_pool->thread_count ();
> -  size_t n_threads = n_worker_threads;
> -  size_t n_elements = last - first;
> -  size_t elts_per_thread = 0;
> -  size_t elts_left_over = 0;
> +  gdb_assert (first <= last);
>
> -  if (n_threads > 1)
> +  if (parallel_for_each_debug)
>      {
> -      /* Require that there should be at least N elements in a
> -         thread.  */
> -      gdb_assert (n > 0);
> -      if (n_elements / n_threads < n)
> -        n_threads = std::max (n_elements / n, (size_t) 1);
> -      elts_per_thread = n_elements / n_threads;
> -      elts_left_over = n_elements % n_threads;
> -      /* n_elements == n_threads * elts_per_thread + elts_left_over.  */
> +      debug_printf ("Parallel for: n elements: %zu\n",
> +                    static_cast<size_t> (last - first));
> +      debug_printf ("Parallel for: batch size: %zu\n", batch_size);
>      }
>
> -  size_t count = n_threads == 0 ? 0 : n_threads - 1;
> +  const size_t n_worker_threads
> +    = std::max<size_t> (thread_pool::g_thread_pool->thread_count (), 1);
>    std::vector<gdb::future<void>> results;
>
> -  if (parallel_for_each_debug)
> -    {
> -      debug_printf (_("Parallel for: n_elements: %zu\n"), n_elements);
> -      debug_printf (_("Parallel for: minimum elements per thread: %zu\n"), n);
> -      debug_printf (_("Parallel for: elts_per_thread: %zu\n"), elts_per_thread);
> -    }
> +  /* The next item to hand out.
> +     */
> +  std::atomic<RandomIt> next = first;
>
> -  for (int i = 0; i < count; ++i)
> -    {
> -      RandomIt end;
> -      end = first + elts_per_thread;
> -      if (i < elts_left_over)
> -        /* Distribute the leftovers over the worker threads, to avoid having
> -           to handle all of them in a single thread.  */
> -        end++;
> -
> -      /* This case means we don't have enough elements to really
> -         distribute them.  Rather than ever submit a task that does
> -         nothing, we short-circuit here.  */
> -      if (first == end)
> -        end = last;
> -
> -      if (end == last)
> -        {
> -          /* We're about to dispatch the last batch of elements, which
> -             we normally process in the main thread.  So just truncate
> -             the result list here.  This avoids submitting empty tasks
> -             to the thread pool.  */
> -          count = i;
> -          break;
> -        }
> +  /* The worker thread task.
> +
> +     We need to capture args as a tuple, because it's not possible to capture
> +     the parameter pack directly in C++17.  Once we migrate to C++20, the
> +     capture can be simplified to:
>
> -      if (parallel_for_each_debug)
> +       ...args = std::forward<WorkerArgs> (worker_args)
> +
> +     and `args` can be used as-is in the lambda.  */
> +  auto args_tuple
> +    = std::forward_as_tuple (std::forward<WorkerArgs> (worker_args)...);
> +  auto task =
> +    [&next, first, last, n_worker_threads, &args_tuple] () {
> +      /* Instantiate the user-defined worker.  */
> +      auto worker = std::make_from_tuple<Worker> (args_tuple);
> +
> +      for (;;)
>          {
> -          debug_printf (_("Parallel for: elements on worker thread %i\t: %zu"),
> -                        i, (size_t)(end - first));
> -          debug_printf (_("\n"));
> +          /* Grab a snapshot of NEXT.  */
> +          auto local_next = next.load ();
> +          gdb_assert (local_next <= last);
> +
> +          /* Number of remaining items.  */
> +          auto n_remaining = last - local_next;
> +          gdb_assert (n_remaining >= 0);
> +
> +          /* Are we done?  */
> +          if (n_remaining == 0)
> +            break;
> +
> +          const auto this_batch_size
> +            = std::min<size_t> (batch_size, n_remaining);
> +
> +          /* The range to process in this iteration.
> +             */
> +          const auto this_batch_first = local_next;
> +          const auto this_batch_last = local_next + this_batch_size;
> +
> +          /* Update NEXT.  If the current value of NEXT doesn't match
> +             LOCAL_NEXT, it means another thread updated it concurrently,
> +             restart.  */
> +          if (!next.compare_exchange_weak (local_next, this_batch_last))
> +            continue;
> +
> +          if (parallel_for_each_debug)
> +            debug_printf ("Processing %zu items, range [%zu, %zu[\n",
> +                          this_batch_size,
> +                          static_cast<size_t> (this_batch_first - first),
> +                          static_cast<size_t> (this_batch_last - first));
> +
> +          worker (this_batch_first, this_batch_last);
>          }
> -      results.push_back (gdb::thread_pool::g_thread_pool->post_task ([=] ()
> -        {
> -          return callback (first, end);
> -        }));
> -      first = end;
> -    }
> -
> -  for (int i = count; i < n_worker_threads; ++i)
> -    if (parallel_for_each_debug)
> -      {
> -        debug_printf (_("Parallel for: elements on worker thread %i\t: 0"), i);
> -        debug_printf (_("\n"));
> -      }
> +    };
>
> -  /* Process all the remaining elements in the main thread.  */
> -  if (parallel_for_each_debug)
> -    {
> -      debug_printf (_("Parallel for: elements on main thread\t\t: %zu"),
> -                    (size_t)(last - first));
> -      debug_printf (_("\n"));
> -    }
> -  callback (first, last);
> +  /* Start N_WORKER_THREADS tasks.  */
> +  for (int i = 0; i < n_worker_threads; ++i)
> +    results.push_back (gdb::thread_pool::g_thread_pool->post_task (task));
>
> +  /* Wait for all of them to be finished.  */
>    for (auto &fut : results)
>      fut.get ();
>  }
> @@ -138,11 +130,11 @@ parallel_for_each (RandomIt first, RandomIt last, RangeFunction callback)
>     when debugging multi-threading behavior, and you want to limit
>     multi-threading in a fine-grained way.  */
>
> -template<typename RandomIt, typename RangeFunction>
> +template<typename RandomIt, typename Worker, typename... WorkerArgs>
>  void
> -sequential_for_each (RandomIt first, RandomIt last, RangeFunction callback)
> +sequential_for_each (RandomIt first, RandomIt last, WorkerArgs &&...worker_args)
>  {
> -  callback (first, last);
> +  Worker (std::forward<WorkerArgs> (worker_args)...)
> +    (first, last);
>  }
>
>  }
>
> base-commit: b7ff16c68a2c0bacc0416c4b36a44e65888ce72b
> --
> 2.50.0
>