From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca by simark.ca with LMTP id bjuPLatGtmdIrzYAWB0awg (envelope-from ) for ; Wed, 19 Feb 2025 16:01:31 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail; t=1739998891; bh=GoGJM+7YP2jRxhSu8k68VbOekuSDbvXksLYdCYboiow=; h=Date:Subject:To:Cc:References:From:In-Reply-To:List-Id: List-Unsubscribe:List-Archive:List-Post:List-Help:List-Subscribe: From; b=aCXDmvuI5DBrtwMWjCU6KkOji+XIoZW9vI8yHTyBEIwNCtGlN2OVc58ahKZoeVBcQ JTM/nOsQzumnKlDJvHHdzNCneRnKcpKyVBEm8LLccWqFkFnWaG7BtEy+XALhdJpc1h xFerbzDDi0kJ4n0KJHkOv1ZCQXGfdN4Fve533pSA= Received: by simark.ca (Postfix, from userid 112) id A642D1E10A; Wed, 19 Feb 2025 16:01:31 -0500 (EST) X-Spam-Checker-Version: SpamAssassin 4.0.0 (2022-12-13) on simark.ca X-Spam-Level: X-Spam-Status: No, score=-5.4 required=5.0 tests=ARC_SIGNED,ARC_VALID,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=unavailable autolearn_force=no version=4.0.0 Authentication-Results: simark.ca; dkim=pass (1024-bit key; unprotected) header.d=simark.ca header.i=@simark.ca header.a=rsa-sha256 header.s=mail header.b=QsmQtCPe; dkim=pass (1024-bit key) header.d=simark.ca header.i=@simark.ca header.a=rsa-sha256 header.s=mail header.b=daQRPlxI; dkim-atps=neutral Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPS id E047E1E05C for ; Wed, 19 Feb 2025 16:01:30 -0500 (EST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 02B0F3858405 for ; Wed, 19 Feb 2025 21:01:29 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 02B0F3858405 Authentication-Results: sourceware.org; dkim=pass (1024-bit key, unprotected) header.d=simark.ca header.i=@simark.ca header.a=rsa-sha256 header.s=mail header.b=QsmQtCPe; dkim=pass (1024-bit key) header.d=simark.ca header.i=@simark.ca header.a=rsa-sha256 header.s=mail header.b=daQRPlxI Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id 75EAC3858C50 for ; Wed, 19 Feb 2025 21:00:45 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 75EAC3858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 75EAC3858C50 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=158.69.221.121 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1739998846; cv=none; b=fHxw6qF0zcjOTL+w5N0PoYx91xT3drJ9LSZmVEXTKe2A1JYzOQJ/7cgtc8owx01bsJsmeHqwVnqTtddex7u/vz9t+W7tEz6AgO2hnT7TWa8CnxC/Urb0v1vng1nd8k5SiQouQ8871jv15ky4SPCXSo+7s5PmBJd/C4/qVQl09BE= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1739998846; c=relaxed/simple; bh=GoGJM+7YP2jRxhSu8k68VbOekuSDbvXksLYdCYboiow=; h=DKIM-Signature:DKIM-Signature:Message-ID:Date:MIME-Version: Subject:To:From; b=iWGcNnVfgdOEtdVt77f0wZsghHRn75RcurdS7oa7gJozTuDm2tVdZvNDWeH+hw2oI0NZRTSx1TG+6iaKWNYq+DyO6NEo9Tbit18bLR2dknkAjA/ZqTr3xgXlPhGwAd+DYb1yfz+Y611KSfPHMwzpdjHlXfmbcen1WvohxjAXqqg= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 75EAC3858C50 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail; t=1739998843; bh=GoGJM+7YP2jRxhSu8k68VbOekuSDbvXksLYdCYboiow=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=QsmQtCPeLxw6srHqXVFJxtedP2uejAdy42d6uhFQ6eLV7xHbSZAkDZ/bXCQSgQTL1 k5U41Xo8GpMaCCv3sraSG4TBNYEqACiGXLP9ovXlcYBwuVuH6Jt8F82Lu/tLcaMpsY lDIN6w2LwwoNqKDV6bgCS5GI4p9r9fZFh0WJICxY= Received: by simark.ca (Postfix, from userid 112) id 62B821E110; Wed, 19 Feb 2025 16:00:43 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail; t=1739998840; bh=GoGJM+7YP2jRxhSu8k68VbOekuSDbvXksLYdCYboiow=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=daQRPlxItHr86P+BDDfyE7b/dw/4NhzczEHNchTJQcHVeIBSJNSXBQKhxutGZFhKe K5gjR6rA5Vv9LMNRQkgl9Pxp5AAyvA7YFc7kSLhIget4HPe96Q4wHJeQhLoJAei948 6yMoc2s9XEouMFtidfTmtfncLndo9iBHtYVzG7Ao= Received: from [10.0.0.11] (modemcable238.237-201-24.mc.videotron.ca [24.201.237.238]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id CA8261E05C; Wed, 19 Feb 2025 16:00:39 -0500 (EST) Message-ID: <3714b9a2-a8cb-423e-878b-a80f3b52f8e9@simark.ca> Date: Wed, 19 Feb 2025 16:00:39 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC 0/9] Attempt to unify Python object's lifecycle To: Jan Vrany , gdb-patches@sourceware.org Cc: Andrew Burgess References: <20250127104435.823519-1-jan.vrany@labware.com> Content-Language: en-US From: Simon Marchi In-Reply-To: <20250127104435.823519-1-jan.vrany@labware.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gdb-patches-bounces~public-inbox=simark.ca@sourceware.org On 2025-01-27 05:44, Jan Vrany wrote: > Hi, > > this RFC is an attempt to systematically address Andrew's > comment on a patch I submitted earlier [1]. In particular, > this part: > >> I don't really like the approach taken by this function. Each time the >> function is called we get a new gdb.Compunit object created and chained >> onto the objfile. > > This is in fact true for other objects - gdb.Symtab, gdb.Symbol, > gdb.Type. Moreover each of these objects have their own copy > of (largely same) housekeeping code. The gdb.Block used to be > the same but not long ago it was changed so gdb.Blocks are now > interned (memoized). From the user point of view, I found it bit > counter-intuitive. > > My idea was to refactor this housekeeping code into a common class. > This RFC is result of several iterations. Tested on x86_64-linux. > > First four commits expand existing code to intern (memoize) > gdb.Symtab, gdb.Symbol and gdb.Type. The rest then introduces > template classes that implement necessary housekeeping and > converts gdb.Symtab, gdb.Symbol and gdb.Type to use it. > It could go further (one can convert gdb.Block too and > gdb.Value and gdb.Value.type and dynamic_type could be > further simplified too) but I decided to stop here. > > The main reason is that it turned not to be as simple as > I thought - there seem to be few differences here and there > (see [2]). This complicated the housekeeping classes > (gdbpy_registry and associated "storage"). My C++ is > is pretty basic so perhaps there's better and simpler way > of doing it. Another reason is that I was hoping for some code > reduction in terms of size but looking at the result, > there's hardly any. On the other hand, the lifecycle management > is more unified across different Python objects. > > All in all, I'm not sure this is the best approach and worth > it. By this RFC, I'd like to solicit feedback from experienced > GDB developers on how to move on. > > Basically I see following options: > > 1) Do not change anything in this area (I'm perfectly > happy with that). > 2) Intern (memoize) Python objects (where it makes sense) > but keep the current approach. Basically first four > commits of this RFC. > 3) Continue working on this. > > Thanks, > > Jan > > [1]: https://inbox.sourceware.org/gdb-patches/87o70ar34z.fsf@redhat.com/ > [2]: https://inbox.sourceware.org/gdb/6c73f834d3f7b545188e1999125e7ae63ae83eab.camel@vrany.io/T/#u I'm sorry, I don't have any answers for your questions in [2]. I scanned your patch series, and I'm not sure about one thing: once the user is done with a Python object and releases all their reference, do we delete the Python object, or do we keep it around for the next time it might be needed? I guess both would be valid approaches. Let me summarize our choices, see if I get it right: 1. No memoization at all: all calls to, e.g., `block.compunit`, will return a new `gdb.CompUnit`. Once the user releases their references on that `gdb.CompUnit`, it is freed. For some types, we need to track the Python objects that are alive (e.g. through the registry you made) in order to invalidate them if the backing object disappears. The registry holds no reference to the object. When the refcount of the Python object gets to 0 (the user is done with it), the object gets removed from the registry (through the tp_dealloc callback). 2. Temporary memoization: a first call to `block.compunit` will return a new `gdb.CompUnit`, which gets saved in the per-objfile registry. If the user requests the same comp unit, we can return the same Python object. The registry still doesn't hold a reference to the Python object. Like for the previous case, when the user is done with the object, we remove it from our registry (through the tp_dealloc callback). If the user requests a Python object for the same comp unit later, a new Python object will be created. 3. Permanent memoization: like temporary memoization, except that the registry hold a reference to the Python object. When the user drops all their references, the registry keeps the Python alive, such that if they request a Python object for the same comp unit later, they will get the same Python object as before. The Python objects only get deleted when either the owning objfile gets deleted or the owning arch gets deleted (for the former, it means the objects will exist until the end of the process, as arches don't get deleted). With approach 1, for the objects we need to track (to invalidate eventually) we certainly want to forget about Python objects that the user is done with, and not hold onto them forever (otherwise, that would be like a memory leak). If the user decides to keep alive 50000 Python objects wrapping the same comp unit, then we need to track them all. If one of those 50000 objects gets deleted (its refcount drops to 0), we need to find it in the linked list of at least 50000 items to remove it. That sounds bad. So for the objects that we need to track in order to invalidate if needed, I guess it is easy enough and makes sense to do some memoization. And there is value in being consistent and predictable, so if we do some memoization for some objects, we might as well do it for all objects (especially if it can easily be abstracted away). Between 2 and 3, again, it is a memory vs time tradeoff. If you are looping over all symbols, get the CU from the symbol to check something and then throw the CU away, do you prefer to create/delete a Python object every time? If not, that means keeping some Python objects around "forever" (until the objfile gets delete, or really forever in the case of arch-owned objects), that may never get used again. Intuitively I'd say I'm slightly in favor of just doing temporary memoization, because I'm hoping that allocating/initializing/deleting a Python object is fast enough, and I don't really like the idea of keeping unused things around, but benchmarks / profiling would help. Overall, I would say that I think memoization is a good idea and I like where you're going with this series. Simon