Date: Fri, 15 Jul 2022 11:40:35 +0200
From: Tom de Vries via Gdb-patches
To: gdb-patches@sourceware.org
Subject: [RFC][gdbsupport] Improve thread scheduling in parallel_for_each
Message-ID: <20220715094034.GA10751@delia.home>

Hi,

In https://sourceware.org/pipermail/gdb-patches/2022-July/190757.html I added
some debug statements to the thread scheduling in parallel_for_each and
noticed that only 3 out of 4 worker threads were used:
...
Parallel for: n_elements: 7271
Parallel for: minimum elements per thread: 10
Parallel for: elts_per_thread: 1817
Parallel for: worker threads: 4
Parallel for: worker threads used: 3
Parallel for: elements 0-1816 on thread 0: 1817
Parallel for: elements 1817-3633 on thread 1: 1817
Parallel for: elements 3634-5450 on thread 2: 1817
Parallel for: elements 5451-7270 on main thread: 1820
...

This simple fix:
...
-  size_t count = n_threads == 0 ? 0 : n_threads - 1;
+  size_t count = n_threads == 0 ? 0 : n_threads;
...
would give thread 3 a share of the work, such that we have instead:
...
Parallel for: worker threads used: 4
Parallel for: elements 0-1816 on thread 0: 1817
Parallel for: elements 1817-3633 on thread 1: 1817
Parallel for: elements 3634-5450 on thread 2: 1817
Parallel for: elements 5451-7267 on thread 3: 1817
Parallel for: elements 7268-7270 on main thread: 3
...
but that would leave the main thread with very few iterations (always fewer
than the number of worker threads).

Fix this instead by also counting the main thread when distributing work,
such that we have instead:
...
Parallel for: elts_per_thread: 1454
Parallel for: worker threads: 4
Parallel for: worker threads used: 4
Parallel for: elements 0-1453 on thread 0: 1454
Parallel for: elements 1454-2907 on thread 1: 1454
Parallel for: elements 2908-4361 on thread 2: 1454
Parallel for: elements 4362-5815 on thread 3: 1454
Parallel for: elements 5816-7270 on main thread: 1455
...

This introduces a performance regression on a particular test-case I
happened to use:
...
$ for n in $(seq 1 10); do \
    time gdb -q -batch libxul.so.debug 2>&1 | grep real:; \
  done
...
so revert to the original schedule by reducing the number of worker threads:
...
   n_threads = std::thread::hardware_concurrency ();
+  if (n_threads > 0)
+    /* Account for main thread.  */
+    n_threads--;
...
So, it looks like the only thing we've achieved is that we no longer have an
idle worker thread.
Redoing the performance experiment still yields a slight performance loss.

Try to improve things a tiny bit by spreading the left-over elements over all
threads rather than just the main thread:
...
Parallel for: elements 0-1817 on thread 0: 1818
Parallel for: elements 1818-3635 on thread 1: 1818
Parallel for: elements 3636-5453 on thread 2: 1818
Parallel for: elements 5454-7270 on main thread: 1817
...
Still, the performance experiment yields a slight performance loss.

Tested on x86_64-linux.

Any comments?

Thanks,
- Tom

[gdbsupport] Improve thread scheduling in parallel_for_each

---
 gdb/maint.c               |  7 ++++++-
 gdbsupport/parallel-for.h | 27 ++++++++++++++++-----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gdb/maint.c b/gdb/maint.c
index 289560957f2..cc3be7b7d82 100644
--- a/gdb/maint.c
+++ b/gdb/maint.c
@@ -850,7 +850,12 @@ update_thread_pool_size ()
   int n_threads = n_worker_threads;
 
   if (n_threads < 0)
-    n_threads = std::thread::hardware_concurrency ();
+    {
+      n_threads = std::thread::hardware_concurrency ();
+      if (n_threads > 0)
+	/* Account for main thread.  */
+	n_threads--;
+    }
 
   gdb::thread_pool::g_thread_pool->set_thread_count (n_threads);
 #endif

diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
index a614fc35766..2bf8de25bc9 100644
--- a/gdbsupport/parallel-for.h
+++ b/gdbsupport/parallel-for.h
@@ -139,25 +139,30 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt last,
   using result_type
     = typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type;
 
-  size_t n_threads = thread_pool::g_thread_pool->thread_count ();
+  size_t n_threads
+    = (thread_pool::g_thread_pool->thread_count ()
+       /* Also count the main thread as a thread to distribute work over.  */
+       + 1);
   size_t n_elements = last - first;
   size_t elts_per_thread = 0;
-  if (n_threads > 1)
-    {
-      /* Require that there should be at least N elements in a
-	 thread.  */
-      gdb_assert (n > 0);
-      if (n_elements / n_threads < n)
-	n_threads = std::max (n_elements / n, (size_t) 1);
-      elts_per_thread = n_elements / n_threads;
-    }
-
-  size_t count = n_threads == 0 ? 0 : n_threads - 1;
+  /* Require that there should be at least N elements in a
+     thread.  */
+  gdb_assert (n > 0);
+  if (n_elements / n_threads < n)
+    n_threads = std::max (n_elements / n, (size_t) 1);
+  elts_per_thread = n_elements / n_threads;
+  gdb_assert (elts_per_thread >= n || n_threads == 1);
+  size_t left_over = n_elements % n_threads;
+
+  size_t count = n_threads - 1;
   gdb::detail::par_for_accumulator<result_type> results (count);
 
   for (int i = 0; i < count; ++i)
     {
       RandomIt end = first + elts_per_thread;
+      if (i < left_over)
+	end++;
       results.post (i, [=] ()
 	{
 	  return callback (first, end);