Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
From: Simon Marchi <simark@simark.ca>
To: Jan Vrany <jan.vrany@labware.com>, gdb-patches@sourceware.org
Cc: Andrew Burgess <aburgess@redhat.com>
Subject: Re: [RFC 0/9] Attempt to unify Python object's lifecycle
Date: Wed, 19 Feb 2025 16:00:39 -0500	[thread overview]
Message-ID: <3714b9a2-a8cb-423e-878b-a80f3b52f8e9@simark.ca> (raw)
In-Reply-To: <20250127104435.823519-1-jan.vrany@labware.com>



On 2025-01-27 05:44, Jan Vrany wrote:
> Hi,
> 
> this RFC is an attempt to systematically address Andrew's
> comment on a patch I submitted earlier [1]. In particular,
> this part:
> 
>> I don't really like the approach taken by this function.  Each time the
>> function is called we get a new gdb.Compunit object created and chained
>> onto the objfile.
> 
> This is in fact true for other objects - gdb.Symtab, gdb.Symbol,
> gdb.Type. Moreover each of these objects have their own copy
> of (largely same) housekeeping code. The gdb.Block used to be
> the same but not long ago it was changed so gdb.Blocks are now
> interned (memoized). From the user point of view, I found it bit
> counter-intuitive.
> 
> My idea was to refactor this housekeeping code into a common class.
> This RFC is result of several iterations. Tested on x86_64-linux.
> 
> First four commits expand existing code to intern (memoize)
> gdb.Symtab, gdb.Symbol and gdb.Type. The rest then introduces
> template classes that implement necessary housekeeping and
> converts gdb.Symtab, gdb.Symbol and gdb.Type to use it.
> It could go further (one can convert gdb.Block too and
> gdb.Value and gdb.Value.type and dynamic_type could be
> further simplified too) but I decided to stop here.
> 
> The main reason is that it turned not to be as simple as
> I thought - there seem to be few differences here and there
> (see [2]). This complicated the housekeeping classes
> (gdbpy_registry and associated "storage"). My C++ is
> is pretty basic so perhaps there's better and simpler way
> of doing it. Another reason is that I was hoping for some code
> reduction in terms of size but looking at the result,
> there's hardly any. On the other hand, the lifecycle management
> is more unified across different Python objects.
> 
> All in all, I'm not sure this is the best approach and worth
> it. By this RFC, I'd like to solicit feedback from experienced
> GDB developers on how to move on.
> 
> Basically I see following options:
> 
> 1) Do not change anything in this area (I'm perfectly
>    happy with that).
> 2) Intern (memoize) Python objects (where it makes sense)
>    but keep the current approach. Basically first four
>    commits of this RFC.
> 3) Continue working on this.
> 
> Thanks,
> 
> Jan
> 
> [1]: https://inbox.sourceware.org/gdb-patches/87o70ar34z.fsf@redhat.com/
> [2]: https://inbox.sourceware.org/gdb/6c73f834d3f7b545188e1999125e7ae63ae83eab.camel@vrany.io/T/#u

I'm sorry, I don't have any answers for your questions in [2].

I scanned your patch series, and I'm not sure about one thing: once the
user is done with a Python object and releases all their reference, do
we delete the Python object, or do we keep it around for the next time
it might be needed?  I guess both would be valid approaches.

Let me summarize our choices, see if I get it right:

 1. No memoization at all: all calls to, e.g., `block.compunit`, will
 return a new `gdb.CompUnit`.  Once the user releases their references
 on that `gdb.CompUnit`, it is freed.  For some types, we need to track
 the Python objects that are alive (e.g. through the registry you made)
 in order to invalidate them if the backing object disappears.  The
 registry holds no reference to the object.  When the refcount of the
 Python object gets to 0 (the user is done with it), the object gets
 removed from the registry (through the tp_dealloc callback).

 2. Temporary memoization: a first call to `block.compunit` will return
 a new `gdb.CompUnit`, which gets saved in the per-objfile registry.  If
 the user requests the same comp unit, we can return the same Python
 object.  The registry still doesn't hold a reference to the Python
 object.  Like for the previous case, when the user is done with the
 object, we remove it from our registry (through the tp_dealloc
 callback).  If the user requests a Python object for the same comp unit
 later, a new Python object will be created.

 3. Permanent memoization: like temporary memoization, except that the
 registry hold a reference to the Python object.  When the user drops
 all their references, the registry keeps the Python alive, such that if
 they request a Python object for the same comp unit later, they will
 get the same Python object as before.  The Python objects only get
 deleted when either the owning objfile gets deleted or the owning arch
 gets deleted (for the former, it means the objects will exist until the
 end of the process, as arches don't get deleted).

With approach 1, for the objects we need to track (to invalidate
eventually) we certainly want to forget about Python objects that the
user is done with, and not hold onto them forever (otherwise, that would
be like a memory leak).  If the user decides to keep alive 50000 Python
objects wrapping the same comp unit, then we need to track them all.  If
one of those 50000 objects gets deleted (its refcount drops to 0), we
need to find it in the linked list of at least 50000 items to remove it.
That sounds bad.  So for the objects that we need to track in order to
invalidate if needed, I guess it is easy enough and makes sense to do
some memoization.

And there is value in being consistent and predictable, so if we do some
memoization for some objects, we might as well do it for all objects
(especially if it can easily be abstracted away).

Between 2 and 3, again, it is a memory vs time tradeoff.  If you are
looping over all symbols, get the CU from the symbol to check something
and then throw the CU away, do you prefer to create/delete a Python
object every time?  If not, that means keeping some Python objects
around "forever" (until the objfile gets delete, or really forever in
the case of arch-owned objects), that may never get used again.
Intuitively I'd say I'm slightly in favor of just doing temporary
memoization, because I'm hoping that allocating/initializing/deleting a
Python object is fast enough, and I don't really like the idea of
keeping unused things around, but benchmarks / profiling would help.

Overall, I would say that I think memoization is a good idea and I like
where you're going with this series.

Simon

  parent reply	other threads:[~2025-02-19 21:01 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-27 10:44 Jan Vrany
2025-01-27 10:44 ` [RFC 1/9] gdb/python: preserve identity for gdb.Symtab objects Jan Vrany
2025-01-27 10:44 ` [RFC 2/9] gdb/python: preserve identity for gdb.Symbol objects Jan Vrany
2025-02-20 19:22   ` Tom Tromey
2025-01-27 10:44 ` [RFC 3/9] gdb/python: do not hold on gdb.Symtab object from gdb.Symtab_and_line Jan Vrany
2025-01-27 10:44 ` [RFC 4/9] gdb/python: preserve identity for gdb.Type objects Jan Vrany
2025-01-27 10:44 ` [RFC 5/9] gdb/python: introduce gdbpy_registry Jan Vrany
2025-02-20 19:28   ` Tom Tromey
2025-01-27 10:44 ` [RFC 6/9] gdb/python: convert gdb.Symbol to use gdbpy_registry Jan Vrany
2025-01-27 10:44 ` [RFC 7/9] gdb/python: convert gdb.Type " Jan Vrany
2025-01-27 10:44 ` [RFC 8/9] gdb/python: convert gdb.Symtab " Jan Vrany
2025-01-27 10:44 ` [RFC 9/9] gdb/python: convert gdb.Symtab_and_line " Jan Vrany
2025-02-18 11:15 ` [PING] Re: [RFC 0/9] Attempt to unify Python object's lifecycle Jan Vraný
2025-02-19 21:00 ` Simon Marchi [this message]
2025-02-20 17:50   ` Jan Vraný
2025-02-20 19:18 ` Tom Tromey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3714b9a2-a8cb-423e-878b-a80f3b52f8e9@simark.ca \
    --to=simark@simark.ca \
    --cc=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=jan.vrany@labware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox