Subject: Re: [PATCH][gdbsupport] Use task size in parallel_for_each
From: Pedro Alves
To: Tom de Vries, gdb-patches@sourceware.org
Cc: Tom Tromey
Date: Thu, 21 Jul 2022 18:35:09 +0100
Message-ID: <4fc23fcd-c15d-7622-8b51-cc48cd3cba16@palves.net>
In-Reply-To: <20220718194219.GA16823@delia.home>
References: <20220718194219.GA16823@delia.home>

On 2022-07-18 8:42 p.m., Tom de Vries via Gdb-patches wrote:
> Hi,
>
> Ensuring a fair distribution over the worker threads and main thread in terms
> of number of CUs might not be the most efficient way, given that CUs can vary
> in size.
>
> Fix this by:
> - adding a task_size_ptr parameter to parallel_for_each,
>   defaulting to nullptr,
> - using per_cu->get_length () as the task size in the parallel_for_each
>   in dwarf2_build_psymtabs_hard, and
> - using the task size in parallel_for_each to distribute similarly-sized tasks
>   to the threads.
>
> I've used this experiment to verify the performance impact:
> ...
> $ for n in $(seq 1 10); do \
>     time gdb -q -batch ~/firefox/libxul.so-93.0-1.1.x86_64.debug \
>       2>&1 \
>       | grep "real:"; \
>   done
> ...
> and without the patch got:
> ...
> real: 4.71
> real: 4.88
> real: 4.29
> real: 4.30
> real: 4.65
> real: 4.27
> real: 4.27
> real: 4.27
> real: 4.75
> real: 4.41
> ...
> and with the patch:
> ...
> real: 3.68
> real: 3.81
> real: 3.80
> real: 3.68
> real: 3.75
> real: 3.69
> real: 3.69
> real: 3.74
> real: 3.67
> real: 3.74
> ...
> so that seems a reasonable improvement.

Nice.

>
> With parallel_for_each_debug set to true, we get some more details about the
> difference in behaviour.  Without the patch we have:
> ...
> Parallel for: n_elements: 2818
> Parallel for: minimum elements per thread: 1
> Parallel for: elts_per_thread: 704
> Parallel for: elements on worker thread 0 : 705
> Parallel for: elements on worker thread 1 : 705
> Parallel for: elements on worker thread 2 : 704
> Parallel for: elements on worker thread 3 : 0
> Parallel for: elements on main thread     : 704
> ...
> and with the patch:
> ...
> Parallel for: n_elements: 2818
> Parallel for: total_size: 1483674865
> Parallel for: size_per_thread: 370918716
> Parallel for: elements on worker thread 0 : 752  (size: 371811790)
> Parallel for: elements on worker thread 1 : 360  (size: 371509370)
> Parallel for: elements on worker thread 2 : 1130 (size: 372681710)
> Parallel for: elements on worker thread 3 : 0    (size: 0)
> Parallel for: elements on main thread     : 576  (size: 367671995)
> ...
>
> Tested on x86_64-linux.
>
> Any comments?

I'll defer to Tromey.

Just a passerby comment, and a comment on the code itself if we follow this
approach, below.
My passerby comment is that I wonder whether we should consider switching to a
work-stealing implementation, so that threads that are done with their chunk
would grab another chunk from the work pool.  I think this would spare us from
having to worry about these distribution heuristics.  (A rough sketch of what
I mean is at the bottom of this mail.)

> @@ -7059,6 +7059,13 @@ dwarf2_build_psymtabs_hard (dwarf2_per_objfile *per_objfile)
>
>    using iter_type = decltype (per_bfd->all_comp_units.begin ());
>
> +  std::function<unsigned int (iter_type)> task_size
> +    = [=] (iter_type iter)
> +      {
> +        dwarf2_per_cu_data *per_cu = iter->get ();
> +        return per_cu->length ();
> +      };
> +
>    /* Each thread returns a pair holding a cooked index, and a vector
>       of errors that should be printed.  The latter is done because
>       GDB's I/O system is not thread-safe.  run_on_main_thread could be
> @@ -7087,7 +7094,7 @@ dwarf2_build_psymtabs_hard (dwarf2_per_objfile *per_objfile)
>              }
>          }
>        return result_type (thread_storage.release (), std::move (errors));
> -    });
> +    }, &task_size);
>
>    /* Only show a given exception a single time.  */
>    std::unordered_set seen_exceptions;
> diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
> index bf40f125f0f..3c9269574df 100644
> --- a/gdbsupport/parallel-for.h
> +++ b/gdbsupport/parallel-for.h
> @@ -134,7 +134,9 @@ typename gdb::detail::par_for_accumulator<
>      typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type
>    >::result_type
>  parallel_for_each (unsigned n, RandomIt first, RandomIt last,
> -                   RangeFunction callback)
> +                   RangeFunction callback,
> +                   std::function<unsigned int (RandomIt)> *task_size_ptr
> +                     = (std::function<unsigned int (RandomIt)> *) nullptr)

That use of a std::function pointer looks odd.  AFAICT, TASK_SIZE_PTR is only
ever called as a callback by parallel_for_each, for setup, in the calling
thread, and isn't stored anywhere, right?  If so, gdb::function_view instead
should work, is lightweight, and is nullable, meaning you don't need a
pointer.  And then, at the caller, just using a lambda instead of a
std::function should work too:

  auto task_size = [=] (iter_type iter)
    {
      dwarf2_per_cu_data *per_cu = iter->get ();
      return per_cu->length ();
    };
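
On the parallel-for.h side I'd imagine something roughly like this -- just an
untested sketch, not the actual implementation; SIZE and I below are
stand-ins for whatever names the accumulation loop ends up using, and I'm
keeping the callback's return type as unsigned int only because that's what
the lambda above returns:

  template <class RandomIt, class RangeFunction>
  typename gdb::detail::par_for_accumulator<
      typename std::result_of<RangeFunction (RandomIt, RandomIt)>::type
    >::result_type
  parallel_for_each (unsigned n, RandomIt first, RandomIt last,
                     RangeFunction callback,
                     gdb::function_view<unsigned int (RandomIt)> task_size
                       = nullptr)
  {
    /* ... and inside, no pointer to dereference, since function_view
       is nullable:  */
    if (task_size != nullptr)
      size += task_size (first + i);
    /* ...  */
  }

Passing the view by value is cheap (it's non-owning), and the caller's lambda
converts to it directly without going through std::function.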
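
And since I brought it up, here's the kind of thing I mean by work stealing.
This is a hypothetical, self-contained toy using plain std::thread and
std::atomic rather than gdb's thread pool, and strictly speaking it's dynamic
chunk claiming from a shared counter rather than deque-based stealing, but it
has the same effect of letting threads that finish early pick up more work
(FUNC must of course be safe to call concurrently):

  #include <algorithm>
  #include <atomic>
  #include <cstddef>
  #include <thread>
  #include <vector>

  /* Toy illustration only: each thread repeatedly claims the next
     CHUNK_SIZE elements from a shared atomic cursor and processes them,
     so nobody sits idle while another thread is stuck with a run of
     large elements.  */

  template<typename RandomIt, typename Func>
  void
  for_each_stealing (unsigned n_threads, RandomIt first, RandomIt last,
                     Func func, std::size_t chunk_size = 16)
  {
    const std::size_t n_elements = last - first;
    std::atomic<std::size_t> next (0);

    auto worker = [&] ()
      {
        for (;;)
          {
            /* Atomically claim the next chunk.  */
            std::size_t start = next.fetch_add (chunk_size);
            if (start >= n_elements)
              break;
            std::size_t end = std::min (start + chunk_size, n_elements);
            func (first + start, first + end);
          }
      };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i + 1 < n_threads; ++i)
      threads.emplace_back (worker);
    worker ();  /* The main thread participates too.  */
    for (std::thread &t : threads)
      t.join ();
  }

With something like that, per-element sizes mostly stop mattering: a thread
that drew a run of small CUs simply goes back to the pool for more.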