Date: Fri, 15 Jul 2022 11:40:35 +0200
From: Tom de Vries via Gdb-patches
To: gdb-patches@sourceware.org
Subject: [RFC][gdbsupport] Improve thread scheduling in parallel_for_each
Message-ID: <20220715094034.GA10751@delia.home>

Hi,

In https://sourceware.org/pipermail/gdb-patches/2022-July/190757.html I added
some debug statements to the thread scheduling in parallel_for_each and
noticed that only 3 out of 4 worker threads were used:
...
Parallel for: n_elements: 7271
Parallel for: minimum elements per thread: 10
Parallel for: elts_per_thread: 1817
Parallel for: worker threads: 4
Parallel for: worker threads used: 3
Parallel for: elements 0-1816 on thread 0: 1817
Parallel for: elements 1817-3633 on thread 1: 1817
Parallel for: elements 3634-5450 on thread 2: 1817
Parallel for: elements 5451-7270 on main thread: 1820
...

This simple fix:
...
-  size_t count = n_threads == 0 ? 0 : n_threads - 1;
+  size_t count = n_threads == 0 ? 0 : n_threads;
...
would give thread 3 a share of the work, such that we have instead:
...
Parallel for: worker threads used: 4
Parallel for: elements 0-1816 on thread 0: 1817
Parallel for: elements 1817-3633 on thread 1: 1817
Parallel for: elements 3634-5450 on thread 2: 1817
Parallel for: elements 5451-7267 on thread 3: 1817
Parallel for: elements 7268-7270 on main thread: 3
...
but that would leave the main thread with very few iterations (always fewer
than the number of worker threads).

Fix this instead by also counting the main thread when distributing work,
such that we have instead:
...
Parallel for: elts_per_thread: 1454
Parallel for: worker threads: 4
Parallel for: worker threads used: 4
Parallel for: elements 0-1453 on thread 0: 1454
Parallel for: elements 1454-2907 on thread 1: 1454
Parallel for: elements 2908-4361 on thread 2: 1454
Parallel for: elements 4362-5815 on thread 3: 1454
Parallel for: elements 5816-7270 on main thread: 1455
...

This introduces a performance regression on a particular test-case I
happened to use:
...
$ for n in $(seq 1 10); do \
    time gdb -q -batch libxul.so.debug 2>&1 | grep real:; \
  done
...
so revert to the original schedule by reducing the number of worker threads:
...
   n_threads = std::thread::hardware_concurrency ();
+  if (n_threads > 0)
+    /* Account for main thread.  */
+    n_threads--;
...
So, it looks like the only thing we've achieved is that we no longer have an
idle worker thread.
Redoing the performance experiment still yields a slight performance loss.

Try to improve things a tiny bit by spreading the left-over elements over all
threads rather than just the main thread:
...
Parallel for: elements 0-1817 on thread 0: 1818
Parallel for: elements 1818-3635 on thread 1: 1818
Parallel for: elements 3636-5453 on thread 2: 1818
Parallel for: elements 5454-7270 on main thread: 1817
...
Still, the performance experiment yields a slight performance loss.

Tested on x86_64-linux.

Any comments?

Thanks,
- Tom

[gdbsupport] Improve thread scheduling in parallel_for_each

---
 gdb/maint.c               |  7 ++++++-
 gdbsupport/parallel-for.h | 27 ++++++++++++++++-----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gdb/maint.c b/gdb/maint.c
index 289560957f2..cc3be7b7d82 100644
--- a/gdb/maint.c
+++ b/gdb/maint.c
@@ -850,7 +850,12 @@ update_thread_pool_size ()
   int n_threads = n_worker_threads;
 
   if (n_threads < 0)
-    n_threads = std::thread::hardware_concurrency ();
+    {
+      n_threads = std::thread::hardware_concurrency ();
+      if (n_threads > 0)
+	/* Account for main thread.  */
+	n_threads--;
+    }
 
   gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
 #endif

diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
index a614fc35766..2bf8de25bc9 100644
--- a/gdbsupport/parallel-for.h
+++ b/gdbsupport/parallel-for.h
@@ -139,25 +139,30 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt last,
   using result_type
     = typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type;
 
-  size_t n_threads = thread_pool::g_thread_pool->thread_count ();
+  size_t n_threads
+    = (thread_pool::g_thread_pool->thread_count ()
+       /* Also count the main thread as a thread to distribute work over.  */
+       + 1);
   size_t n_elements = last - first;
   size_t elts_per_thread = 0;
-  if (n_threads > 1)
-    {
-      /* Require that there should be at least N elements in a
-	 thread.  */
-      gdb_assert (n > 0);
-      if (n_elements / n_threads < n)
-	n_threads = std::max (n_elements / n, (size_t) 1);
-      elts_per_thread = n_elements / n_threads;
-    }
-
-  size_t count = n_threads == 0 ? 0 : n_threads - 1;
+  /* Require that there should be at least N elements in a
+     thread.  */
+  gdb_assert (n > 0);
+  if (n_elements / n_threads < n)
+    n_threads = std::max (n_elements / n, (size_t) 1);
+  elts_per_thread = n_elements / n_threads;
+  gdb_assert (elts_per_thread >= n || n_threads == 1);
+  size_t left_over = n_elements % n_threads;
+
+  size_t count = n_threads - 1;
   gdb::detail::par_for_accumulator<result_type> results (count);
 
   for (int i = 0; i < count; ++i)
     {
       RandomIt end = first + elts_per_thread;
+      if (i < left_over)
+	end++;
       results.post (i, [=] ()
 	{
 	  return callback (first, end);