From: "Jan Vraný" <Jan.Vrany@labware.com>
To: "simark@simark.ca" <simark@simark.ca>,
"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Cc: "aburgess@redhat.com" <aburgess@redhat.com>
Subject: Re: [RFC 0/9] Attempt to unify Python object's lifecycle
Date: Thu, 20 Feb 2025 17:50:54 +0000 [thread overview]
Message-ID: <6779ca1e67acc2081194631fb01b423215401d2a.camel@labware.com> (raw)
In-Reply-To: <3714b9a2-a8cb-423e-878b-a80f3b52f8e9@simark.ca>
On Wed, 2025-02-19 at 16:00 -0500, Simon Marchi wrote:
>
>
> On 2025-01-27 05:44, Jan Vrany wrote:
> > Hi,
> >
> > this RFC is an attempt to systematically address Andrew's
> > comment on a patch I submitted earlier [1]. In particular,
> > this part:
> >
> > > I don't really like the approach taken by this function. Each
> > > time the
> > > function is called we get a new gdb.Compunit object created and
> > > chained
> > > onto the objfile.
> >
> > This is in fact true for other objects - gdb.Symtab, gdb.Symbol,
> > gdb.Type. Moreover each of these objects have their own copy
> > of (largely same) housekeeping code. The gdb.Block used to be
> > the same but not long ago it was changed so gdb.Blocks are now
> > interned (memoized). From the user point of view, I found it bit
> > counter-intuitive.
> >
> > My idea was to refactor this housekeeping code into a common class.
> > This RFC is result of several iterations. Tested on x86_64-linux.
> >
> > First four commits expand existing code to intern (memoize)
> > gdb.Symtab, gdb.Symbol and gdb.Type. The rest then introduces
> > template classes that implement necessary housekeeping and
> > converts gdb.Symtab, gdb.Symbol and gdb.Type to use it.
> > It could go further (one can convert gdb.Block too and
> > gdb.Value and gdb.Value.type and dynamic_type could be
> > further simplified too) but I decided to stop here.
> >
> > The main reason is that it turned not to be as simple as
> > I thought - there seem to be few differences here and there
> > (see [2]). This complicated the housekeeping classes
> > (gdbpy_registry and associated "storage"). My C++ is
> > is pretty basic so perhaps there's better and simpler way
> > of doing it. Another reason is that I was hoping for some code
> > reduction in terms of size but looking at the result,
> > there's hardly any. On the other hand, the lifecycle management
> > is more unified across different Python objects.
> >
> > All in all, I'm not sure this is the best approach and worth
> > it. By this RFC, I'd like to solicit feedback from experienced
> > GDB developers on how to move on.
> >
> > Basically I see following options:
> >
> > 1) Do not change anything in this area (I'm perfectly
> > happy with that).
> > 2) Intern (memoize) Python objects (where it makes sense)
> > but keep the current approach. Basically first four
> > commits of this RFC.
> > 3) Continue working on this.
> >
> > Thanks,
> >
> > Jan
> >
> > [1]:
> > https://inbox.sourceware.org/gdb-patches/87o70ar34z.fsf@redhat.com
> > [2]:
> > https://inbox.sourceware.org/gdb/6c73f834d3f7b545188e1999125e7ae63ae83eab.camel@vrany.io/T/#u
>
> I'm sorry, I don't have any answers for your questions in [2].
>
> I scanned your patch series, and I'm not sure about one thing: once
> the
> user is done with a Python object and releases all their reference,
> do
> we delete the Python object, or do we keep it around for the next
> time
> it might be needed? I guess both would be valid approaches.
In the series I have posted Python objects are kept around and
destroyed only when owner is gone. So it implements choice 3
from choices below.
>
> Let me summarize our choices, see if I get it right:
>
> 1. No memoization at all: all calls to, e.g., `block.compunit`, will
> return a new `gdb.CompUnit`. Once the user releases their
> references
> on that `gdb.CompUnit`, it is freed. For some types, we need to
> track
> the Python objects that are alive (e.g. through the registry you
> made)
> in order to invalidate them if the backing object disappears. The
> registry holds no reference to the object. When the refcount of the
> Python object gets to 0 (the user is done with it), the object gets
> removed from the registry (through the tp_dealloc callback).
>
> 2. Temporary memoization: a first call to `block.compunit` will
> return
> a new `gdb.CompUnit`, which gets saved in the per-objfile registry.
> If
> the user requests the same comp unit, we can return the same Python
> object. The registry still doesn't hold a reference to the Python
> object. Like for the previous case, when the user is done with the
> object, we remove it from our registry (through the tp_dealloc
> callback). If the user requests a Python object for the same comp
> unit
> later, a new Python object will be created.
>
> 3. Permanent memoization: like temporary memoization, except that
> the
> registry hold a reference to the Python object. When the user drops
> all their references, the registry keeps the Python alive, such that
> if
> they request a Python object for the same comp unit later, they will
> get the same Python object as before. The Python objects only get
> deleted when either the owning objfile gets deleted or the owning
> arch
> gets deleted (for the former, it means the objects will exist until
> the
> end of the process, as arches don't get deleted).
>
> With approach 1, for the objects we need to track (to invalidate
> eventually) we certainly want to forget about Python objects that the
> user is done with, and not hold onto them forever (otherwise, that
> would
> be like a memory leak). If the user decides to keep alive 50000
> Python
> objects wrapping the same comp unit, then we need to track them all.
> If
> one of those 50000 objects gets deleted (its refcount drops to 0), we
> need to find it in the linked list of at least 50000 items to remove
> it.
> That sounds bad.
The way it is currently implemented, it is okay. Python objects
themselves form the doubly-linked list so one can remove it from
the list easily - there's no need to "find" it.
> So for the objects that we need to track in order to
> invalidate if needed, I guess it is easy enough and makes sense to do
> some memoization.
>
> And there is value in being consistent and predictable, so if we do
> some
> memoization for some objects, we might as well do it for all objects
> (especially if it can easily be abstracted away).
>
> Between 2 and 3, again, it is a memory vs time tradeoff. If you are
> looping over all symbols, get the CU from the symbol to check
> something
> and then throw the CU away, do you prefer to create/delete a Python
> object every time? If not, that means keeping some Python objects
> around "forever" (until the objfile gets delete, or really forever in
> the case of arch-owned objects), that may never get used again.
> Intuitively I'd say I'm slightly in favor of just doing temporary
> memoization, because I'm hoping that allocating/initializing/deleting
> a
> Python object is fast enough, and I don't really like the idea of
> keeping unused things around, but benchmarks / profiling would help.
I have no strong preference here. gdb.Block currently implements
temporary memoization (2). My code implements permament memoization (3)
mainly because this is what Andrew suggested (if I understood
correctly).
I'll try to adapt the code to do temporary memoization to see how it
feels.
>
> Overall, I would say that I think memoization is a good idea and I
> like
> where you're going with this series.
>
Thanks,
Jan
next prev parent reply other threads:[~2025-02-20 17:51 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-01-27 10:44 Jan Vrany
2025-01-27 10:44 ` [RFC 1/9] gdb/python: preserve identity for gdb.Symtab objects Jan Vrany
2025-01-27 10:44 ` [RFC 2/9] gdb/python: preserve identity for gdb.Symbol objects Jan Vrany
2025-02-20 19:22 ` Tom Tromey
2025-01-27 10:44 ` [RFC 3/9] gdb/python: do not hold on gdb.Symtab object from gdb.Symtab_and_line Jan Vrany
2025-01-27 10:44 ` [RFC 4/9] gdb/python: preserve identity for gdb.Type objects Jan Vrany
2025-01-27 10:44 ` [RFC 5/9] gdb/python: introduce gdbpy_registry Jan Vrany
2025-02-20 19:28 ` Tom Tromey
2025-01-27 10:44 ` [RFC 6/9] gdb/python: convert gdb.Symbol to use gdbpy_registry Jan Vrany
2025-01-27 10:44 ` [RFC 7/9] gdb/python: convert gdb.Type " Jan Vrany
2025-01-27 10:44 ` [RFC 8/9] gdb/python: convert gdb.Symtab " Jan Vrany
2025-01-27 10:44 ` [RFC 9/9] gdb/python: convert gdb.Symtab_and_line " Jan Vrany
2025-02-18 11:15 ` [PING] Re: [RFC 0/9] Attempt to unify Python object's lifecycle Jan Vraný
2025-02-19 21:00 ` Simon Marchi
2025-02-20 17:50 ` Jan Vraný [this message]
2025-02-20 19:18 ` Tom Tromey
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6779ca1e67acc2081194631fb01b423215401d2a.camel@labware.com \
--to=jan.vrany@labware.com \
--cc=aburgess@redhat.com \
--cc=gdb-patches@sourceware.org \
--cc=simark@simark.ca \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox