From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <simark@simark.ca>
Received: from simark.ca (simark.ca [158.69.221.121])
 by sourceware.org (Postfix) with ESMTPS id 358423850434
 for <gdb-patches@sourceware.org>; Thu, 30 Jul 2020 16:58:21 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 358423850434
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=simark.ca
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark@simark.ca
Received: from [10.0.0.11] (173-246-6-90.qc.cable.ebox.net [173.246.6.90])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (No client certificate requested)
 by simark.ca (Postfix) with ESMTPSA id AE2971E792;
 Thu, 30 Jul 2020 12:58:20 -0400 (EDT)
Subject: Re: [PATCH 4/4] gdb: change regcache list to be a map
To: Pedro Alves <pedro@palves.net>, Simon Marchi <simon.marchi@efficios.com>, 
 gdb-patches@sourceware.org
Cc: Laurent <Laurent.Morichetti@amd.com>
References: <20200720204101.2849535-1-simon.marchi@efficios.com>
 <20200720204101.2849535-5-simon.marchi@efficios.com>
 <5b521483-2f09-3424-943d-40f83b3080af@palves.net>
From: Simon Marchi <simark@simark.ca>
Message-ID: <82d8b01a-2b37-0b79-f657-1081375270ee@simark.ca>
Date: Thu, 30 Jul 2020 12:58:13 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.10.0
MIME-Version: 1.0
In-Reply-To: <5b521483-2f09-3424-943d-40f83b3080af@palves.net>
Content-Type: text/plain; charset=utf-8
Content-Language: fr
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-4.1 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 NICE_REPLY_A, SPF_HELO_PASS, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Thu, 30 Jul 2020 16:58:23 -0000

On 2020-07-23 9:53 p.m., Pedro Alves wrote:
> On 7/20/20 9:41 PM, Simon Marchi via Gdb-patches wrote:
>> One regcache object is created for each stopped thread and is stored in
>> the regcache::regcaches linked list.  Looking up a regcache for a given
>> thread is therefore in O(number of threads).  Stopping all threads then
>> becomes O((number of threads) ^ 2).  It becomes noticeable when
>> debugging thousands of threads, which is typical with GPU targets.  This
>> patch replaces the linked list with an std::unordered_multimap, indexed
>> by (target, ptid).
>>
>> I originally designed it using an std::unordered_map with (target, ptid,
>> arch) as the key, because that's how lookups are done (in
>> get_thread_arch_aspace_regcache).  However, the registers_changed_ptid
>> function, also somewhat on the hot path (it is used when resuming
>> threads), needs to delete all regcaches associated to a given (target,
>> ptid) tuple.  Using (target, ptid) as a key allows to do this more
>> efficiently (see exception below).  If the key of the map was (target,
>> ptid, arch), we'd have to walk all items of the map.
>>
>> The lookup (in get_thread_arch_aspace_regcache), walks over all existing
>> regcaches belonging to this (target, ptid), looking to find the one with
>> the right arch.  This is ok, as there will be very few regcaches for a
>> given key (typically one).  Lookups become faster when the number of
>> threads grows, compared to the linked list.  With a small number of
>> threads, it will probably be a bit slower to do a map lookup than to
>> walk a few linked list nodes, but I don't think it will be noticeable in
>> practice.
>>
>> The function registers_changed_ptid deletes all regcaches related to a
>> given (target, ptid).  We must now handle the different cases
>> separately:
>>
>> - NULL target and minus_one_ptid: we delete all the entries
>> - NULL target and non-minus_one_ptid: invalid (checked by assert)
>> - non-NULL target and non-minus_one_ptid: we delete all the entries
>>   associated to that tuple, this is done efficiently
>> - a non-NULL target and minus_one_ptid: we delete all the entries
>>   associated to that target, whatever the ptid.  This is the slightly
>>   annoying case, as we can't easily look up all items having this target
>>   in their key.  I implemented it by walking the list, which is not
>>   ideal.
> 
> Given the last case, did you consider using a two-level map instead?
> 
>   using ptid_regcache_map 
>     = std::unordered_multimap<ptid_t, regcache *>;
> 
>   using target_ptid_regcache_map 
>     = std::unordered_map<process_stratum_target *, ptid_regcache_map>;
> 
>   static target_ptid_regcache_map regcaches;
> 
> The result for registers_changed_ptid would be:
> 
>  - NULL target and minus_one_ptid: we delete all the REGCACHES entries,
> 
>  - NULL target and non-minus_one_ptid: invalid (checked by assert)
> 
>  - non-NULL target and non-minus_one_ptid: we lookup the target
>    in REGCACHES, and then delete all the entries in that map
>    associated to that ptid.
> 
>  - a non-NULL target and minus_one_ptid: we delete the map entry
>    in REGCACHES associated to that target.

I thought about it early on, but dismissed it thinking it would be too heavy
for nothing, having a map of maps.  But I didn't think it would help that last
case, that might be a good reason to go with that.

> Looking up a regcache does two map lookups instead of one, but
> that is 2x amortized O(1), so should be a constant hit.
> 
> I gave this a try, to see how feasible it would be.  See attached
> patches on top of yours (the first two address comments I'll make below).
> I've also put these in the users/palves/regcache-map branch on
> sourceware.org.

Thanks, I'll run with that.

> I did not run performance tests, though.  It may be that in
> registers_changed_ptid, it would be more efficient to
> not clear the first level map entry for a given target (i.e.,
> destroy the second level map object completely), but instead
> clear the second level map entry.  I.e., avoid destroying the
> unordered_multimap for a given target, only to recreate it soon
> enough.  But I kept it simple in case that isn't noticeable.

Will try.

> A similar two-level data structure would be to put an instance of:
> 
>   using ptid_regcache_map 
>     = std::unordered_multimap<ptid_t, regcache *>;
> 
> in process_stratum_target directly.
> 
> IOW, target-connection.c:process_targets has a list of open targets,
> which could be used to walk over all targets in the
> "NULL target and minus_one_ptid" scenario.

Ok, that would be an extra step, I suggest we keep it in mind for a subsequent
refactor.

>> The function regcache_thread_ptid_changed is called when a thread
>> changes ptid.  It is implemented efficiently using the map, although
>> that's not very important: it is not called often, mostly when creating
>> an inferior, on some specific platforms.
>>
>> Note: In hash_target_ptid, I am combining hash values from std::hash by
>> summing them.  I don't think it's ideal, since std::hash is just the
>> identity function for base types.  But I don't know what would be better
>> to reduce the change of collisions.  If anybody has a better idea, I'd
>> be interested.
> 
> I'd maybe look in some kernel sources, e.g., Linux or BSD, what they
> use as hash function for pids.

Ok.

>> +/* Hash function for target_ptid.  */
>> +
>> +struct hash_target_ptid
>> +{
>> +  size_t operator() (const target_ptid &val) const
>> +  {
>> +    std::hash<long> h_long;
>> +    hash_ptid h_ptid;
>> +
>> +    return h_long ((long) val.target) + h_ptid (val.ptid);
> 
> Note that on 64-bit Windows, long is 32-bit, so pointers don't
> fit.  It just means you're only hashing half the bits.
> I wonder whether using libiberty's htab_hash_pointer would be
> a good idea here.

Huh, I did that because there's no std::hash overload for uintptr_t.  But
I just saw there is:

  template< class T > struct hash<T*>;

which seems to be for hashing pointers.  It is implemented as:

  return reinterpret_cast<size_t>(__p);

So I'll just use that.  Otherwise, htab_hash_pointer would be good too.

>> @@ -417,10 +458,25 @@ static void
>>  regcache_thread_ptid_changed (process_stratum_target *target,
>>  			      ptid_t old_ptid, ptid_t new_ptid)
>>  {
>> -  for (auto &regcache : regcaches)
>> +  /* Find all the regcaches to updates.  */
> 
> typo: to update.

Fixed.
>> +  std::vector<regcache *> regcaches_to_update;
>> +  target_ptid old_key (target, old_ptid);
>> +  auto range = regcaches.equal_range (old_key);
>> +  for (auto it = range.first; it != range.second; it++)
>> +    regcaches_to_update.push_back (it->second);
>> +
>> +  /* Remove all entries associated to OLD_KEY.  We could have erased the items
>> +     in the previous `for`, but it is only safe to erase items while iterating
>> +     starting with C++14.  */
>> +  regcaches.erase (old_key);
> 
> I don't think that's really true in practice.  The idiom used pretty
> much by everyone is:
> 
>   for (auto it = range.first; it != range.second;)
>     {
>       regcaches_to_update.push_back (it->second);
>       it = regcaches.erase (it);
>     }
> 
> Note that even the resolution of the C++ issue at:
> 
>  https://wg21.cmeerw.net/lwg/issue2356 
> 
> says:
> 
>  "not that any actual implementation does that, anyway"
> 
> There's no need to be super pedantic wrt to an issue
> that no compiler is going to miscompile.

Ok, didn't know about that.

> See here as well:
>  https://stackoverflow.com/questions/25047241/c11-is-it-safe-to-remove-individual-elements-from-stdunordered-map-while-it
> 
> And then, given we know that there are no entries for
> new_ptid in the map, I think we could instead do it all in
> one iteration, like:
> 
>   target_ptid old_key (target, old_ptid);
>   target_ptid new_key (target, new_ptid);
> 
>   auto range = regcaches.equal_range (old_key);
>   for (auto it = range.first; it != range.second;)
>     {
>       regcache *rc = it->second;
>       rc->set_ptid (new_ptid);
> 
>       /* Remove old before inserting new, to avoid rehashing, which
>          would invalidate iterators.  */
>       it = regcaches.erase (it);
>       regcaches.insert (std::make_pair (new_key, rc));
>     }

Thanks, I used this.

>> +
>> +  /* Update the regcaches' ptid, insert them back in the map with an updated
>> +     key.  */
>> +  target_ptid new_key (target, new_ptid);
>> +  for (regcache *rc : regcaches_to_update)
>>      {
>> -      if (regcache->ptid () == old_ptid && regcache->target () == target)
>> -	regcache->set_ptid (new_ptid);
>> +      rc->set_ptid (new_ptid);
>> +      regcaches.insert (std::make_pair (new_key, rc));
>>      }
>>  }
>>  
>> @@ -438,19 +494,53 @@ regcache_thread_ptid_changed (process_stratum_target *target,
>>  void
>>  registers_changed_ptid (process_stratum_target *target, ptid_t ptid)
>>  {
>> -  for (auto oit = regcaches.before_begin (), it = std::next (oit);
>> -       it != regcaches.end (); )
>> +  if (target == nullptr)
>>      {
>> -      struct regcache *regcache = *it;
>> -      if ((target == nullptr || regcache->target () == target)
>> -	  && regcache->ptid ().matches (ptid))
>> -	{
>> -	  delete regcache;
>> -	  it = regcaches.erase_after (oit);
>> -	}
>> -      else
>> -	oit = it++;
>> +      /* Since there can be ptid clashes between targets, it's not valid to
>> +         pass a ptid without saying to which target it belongs.  */
>> +      gdb_assert (ptid == minus_one_ptid);
>> +
>> +      /* Delete all the regcaches.  */
>> +      for (auto pair : regcaches)
>> +	delete pair.second;
> 
> We could store a std::unique_ptr<regcache> in the map instead
> to avoid all these manual deletes.

Indeed, I'll do that.

>> +
>> +      regcaches.clear ();
>>      }
>> +  else if (ptid != minus_one_ptid)
>> +    {
>> +      /* Non-NULL target and non-minus_one_ptid, delete all regcaches belonging
>> +         to this (TARGET, PTID).  */
>> +      target_ptid key (target, ptid);
>> +      auto range = regcaches.equal_range (key);
>> +      for (auto it = range.first; it != range.second; it++)
>> +	delete it->second;
> 
> Like this one.

Done.

>> +
>> +      regcaches.erase (key);
> 
> Here as well could do:
> 
>       for (auto it = range.first; it != range.second;)
>         {
> 	  delete it->second;
> 	  it = regcaches.erase (it);
>         }

With the unique_ptr, it's now just:

  regcaches.erase (target_ptid (target, ptid));

>> +    }
>> +  else
>> +    {
>> +       /* Non-NULL target and minus_one_ptid, delete all regcaches associated
>> +          to this target.
>> +
>> +          We unfortunately don't have an efficient way to do this.  Fall back
>> +          to iterating all items to find all those belonging to TARGET.
>> +
>> +          Note that in C++11, it's not safe to erase map entries while
>> +          iterating, so we keep track of them and delete them at the end.  */
> 
> Here as well.

It's down to:

       for (auto it = regcaches.begin (); it != regcaches.end ();)
	 {
	   if (it->second->target () == target)
	     it = regcaches.erase (it);
	   else
	     it++;
	 }

Thanks for the comments.  I'll try the multi-level map thing and come back with
another version.

Simon