Subject: Re: [PATCH][gdbsupport] Use task size in parallel_for_each
From: Pedro Alves
To: Tom de Vries, gdb-patches@sourceware.org
Cc: Tom Tromey
Date: Thu, 21 Jul 2022 18:35:09 +0100
Message-ID: <4fc23fcd-c15d-7622-8b51-cc48cd3cba16@palves.net>
In-Reply-To: <20220718194219.GA16823@delia.home>
References: <20220718194219.GA16823@delia.home>

On 2022-07-18 8:42 p.m., Tom de Vries via Gdb-patches wrote:
> Hi,
>
> Ensuring a fair distribution over the worker threads and main thread in terms
> of number of CUs might not be the most efficient way, given that CUs can vary
> in size.
>
> Fix this by:
> - adding a task_size_ptr parameter to parallel_for_each,
>   defaulting to nullptr,
> - using per_cu->get_length () as the task size in the parallel_for_each
>   in dwarf2_build_psymtabs_hard, and
> - using the task size in parallel_for_each to distribute similarly-sized tasks
>   to the threads.
>
> I've used this experiment to verify the performance impact:
> ...
> $ for n in $(seq 1 10); do \
>     time gdb -q -batch ~/firefox/libxul.so-93.0-1.1.x86_64.debug \
>       2>&1 \
>       | grep "real:"; \
>   done
> ...
> and without the patch got:
> ...
> real: 4.71
> real: 4.88
> real: 4.29
> real: 4.30
> real: 4.65
> real: 4.27
> real: 4.27
> real: 4.27
> real: 4.75
> real: 4.41
> ...
> and with the patch:
> ...
> real: 3.68
> real: 3.81
> real: 3.80
> real: 3.68
> real: 3.75
> real: 3.69
> real: 3.69
> real: 3.74
> real: 3.67
> real: 3.74
> ...
> so that seems a reasonable improvement.

Nice.

>
> With parallel_for_each_debug set to true, we get some more details about the
> difference in behaviour.  Without the patch we have:
> ...
> Parallel for: n_elements: 2818
> Parallel for: minimum elements per thread: 1
> Parallel for: elts_per_thread: 704
> Parallel for: elements on worker thread 0 : 705
> Parallel for: elements on worker thread 1 : 705
> Parallel for: elements on worker thread 2 : 704
> Parallel for: elements on worker thread 3 : 0
> Parallel for: elements on main thread     : 704
> ...
> and with the patch:
> ...
> Parallel for: n_elements: 2818
> Parallel for: total_size: 1483674865
> Parallel for: size_per_thread: 370918716
> Parallel for: elements on worker thread 0 : 752  (size: 371811790)
> Parallel for: elements on worker thread 1 : 360  (size: 371509370)
> Parallel for: elements on worker thread 2 : 1130 (size: 372681710)
> Parallel for: elements on worker thread 3 : 0    (size: 0)
> Parallel for: elements on main thread     : 576  (size: 367671995)
> ...
>
> Tested on x86_64-linux.
>
> Any comments?

I'll defer to Tromey.

Just a passerby comment, and a comment on the code itself if we follow this
approach, below.
My passerby comment is that I wonder whether we should consider switching to a
work-stealing implementation, so that threads that are done with their chunk
would grab another chunk from the work pool.  I think this would spare us from
having to worry about these distribution heuristics.  (A rough sketch of what
I mean is at the bottom of this mail.)

> @@ -7059,6 +7059,13 @@ dwarf2_build_psymtabs_hard (dwarf2_per_objfile *per_objfile)
>
>    using iter_type = decltype (per_bfd->all_comp_units.begin ());
>
> +  std::function<unsigned int (iter_type)> task_size
> +    = [=] (iter_type iter)
> +      {
> +        dwarf2_per_cu_data *per_cu = iter->get ();
> +        return per_cu->length ();
> +      };
> +
>    /* Each thread returns a pair holding a cooked index, and a vector
>       of errors that should be printed.  The latter is done because
>       GDB's I/O system is not thread-safe.  run_on_main_thread could be
> @@ -7087,7 +7094,7 @@ dwarf2_build_psymtabs_hard (dwarf2_per_objfile *per_objfile)
>              }
>          }
>        return result_type (thread_storage.release (), std::move (errors));
> -    });
> +    }, &task_size);
>
>    /* Only show a given exception a single time.  */
>    std::unordered_set seen_exceptions;
> diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
> index bf40f125f0f..3c9269574df 100644
> --- a/gdbsupport/parallel-for.h
> +++ b/gdbsupport/parallel-for.h
> @@ -134,7 +134,9 @@ typename gdb::detail::par_for_accumulator<
>      typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type
>    >::result_type
>  parallel_for_each (unsigned n, RandomIt first, RandomIt last,
> -                   RangeFunction callback)
> +                   RangeFunction callback,
> +                   std::function<unsigned int (RandomIt)> *task_size_ptr
> +                     = (std::function<unsigned int (RandomIt)> *) nullptr)

That use of a std::function pointer looks odd.  AFAICT, TASK_SIZE_PTR is only
ever called as a callback by parallel_for_each, for setup, in the calling
thread, and isn't stored anywhere, right?  If so, gdb::function_view instead
should work, is lightweight, and is nullable, meaning you don't need a
pointer.  And then, at the caller, just using a lambda instead of a
std::function should work too:

  auto task_size = [=] (iter_type iter)
    {
      dwarf2_per_cu_data *per_cu = iter->get ();
      return per_cu->length ();
    };
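
On the parallel-for.h side I'd imagine something roughly like this -- just an
untested sketch, not the actual implementation; SIZE and I below are
stand-ins for whatever names the accumulation loop ends up using, and I'm
keeping the callback's return type as unsigned int only because that's what
the lambda above returns:

  template <class RandomIt, class RangeFunction>
  typename gdb::detail::par_for_accumulator<
      typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type
    >::result_type
  parallel_for_each (unsigned n, RandomIt first, RandomIt last,
                     RangeFunction callback,
                     gdb::function_view<unsigned int (RandomIt)> task_size
                       = nullptr)
  {
    /* ... and inside, no pointer to dereference, since function_view
       is nullable:  */
    if (task_size != nullptr)
      size += task_size (first + i);
    /* ...  */
  }

Passing the view by value is cheap (it's non-owning), and the caller's lambda
converts to it directly without going through std::function.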
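
And since I brought it up, here's the kind of thing I mean by work stealing.
This is a hypothetical, self-contained toy using plain std::thread and
std::atomic rather than gdb's thread pool, and strictly speaking it's dynamic
chunk claiming from a shared counter rather than deque-based stealing, but it
has the same effect of letting threads that finish early pick up more work
(FUNC must of course be safe to call concurrently):

  #include <algorithm>
  #include <atomic>
  #include <cstddef>
  #include <thread>
  #include <vector>

  /* Toy illustration only: each thread repeatedly claims the next
     CHUNK_SIZE elements from a shared atomic cursor and processes them,
     so nobody sits idle while another thread is stuck with a run of
     large elements.  */

  template<typename RandomIt, typename Func>
  void
  for_each_stealing (unsigned n_threads, RandomIt first, RandomIt last,
                     Func func, std::size_t chunk_size = 16)
  {
    const std::size_t n_elements = last - first;
    std::atomic<std::size_t> next (0);

    auto worker = [&] ()
      {
        for (;;)
          {
            /* Atomically claim the next chunk.  */
            std::size_t start = next.fetch_add (chunk_size);
            if (start >= n_elements)
              break;
            std::size_t end = std::min (start + chunk_size, n_elements);
            func (first + start, first + end);
          }
      };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i + 1 < n_threads; ++i)
      threads.emplace_back (worker);
    worker ();  /* The main thread participates too.  */
    for (std::thread &t : threads)
      t.join ();
  }

With something like that, per-element sizes mostly stop mattering: a thread
that drew a run of small CUs simply goes back to the pool for more.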