Re: [RFC 0/9] Attempt to unify Python object's lifecycle

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: "Jan Vraný" <Jan.Vrany@labware.com>
To: "simark@simark.ca" <simark@simark.ca>,
	"gdb-patches@sourceware.org" <gdb-patches@sourceware.org>
Cc: "aburgess@redhat.com" <aburgess@redhat.com>
Subject: Re: [RFC 0/9] Attempt to unify Python object's lifecycle
Date: Thu, 20 Feb 2025 17:50:54 +0000	[thread overview]
Message-ID: <6779ca1e67acc2081194631fb01b423215401d2a.camel@labware.com> (raw)
In-Reply-To: <3714b9a2-a8cb-423e-878b-a80f3b52f8e9@simark.ca>

On Wed, 2025-02-19 at 16:00 -0500, Simon Marchi wrote:
> 
> 
> On 2025-01-27 05:44, Jan Vrany wrote:
> > Hi,
> > 
> > this RFC is an attempt to systematically address Andrew's
> > comment on a patch I submitted earlier [1]. In particular,
> > this part:
> > 
> > > I don't really like the approach taken by this function.  Each
> > > time the
> > > function is called we get a new gdb.Compunit object created and
> > > chained
> > > onto the objfile.
> > 
> > This is in fact true for other objects - gdb.Symtab, gdb.Symbol,
> > gdb.Type. Moreover each of these objects have their own copy
> > of (largely same) housekeeping code. The gdb.Block used to be
> > the same but not long ago it was changed so gdb.Blocks are now
> > interned (memoized). From the user point of view, I found it bit
> > counter-intuitive.
> > 
> > My idea was to refactor this housekeeping code into a common class.
> > This RFC is result of several iterations. Tested on x86_64-linux.
> > 
> > First four commits expand existing code to intern (memoize)
> > gdb.Symtab, gdb.Symbol and gdb.Type. The rest then introduces
> > template classes that implement necessary housekeeping and
> > converts gdb.Symtab, gdb.Symbol and gdb.Type to use it.
> > It could go further (one can convert gdb.Block too and
> > gdb.Value and gdb.Value.type and dynamic_type could be
> > further simplified too) but I decided to stop here.
> > 
> > The main reason is that it turned not to be as simple as
> > I thought - there seem to be few differences here and there
> > (see [2]). This complicated the housekeeping classes
> > (gdbpy_registry and associated "storage"). My C++ is
> > is pretty basic so perhaps there's better and simpler way
> > of doing it. Another reason is that I was hoping for some code
> > reduction in terms of size but looking at the result,
> > there's hardly any. On the other hand, the lifecycle management
> > is more unified across different Python objects.
> > 
> > All in all, I'm not sure this is the best approach and worth
> > it. By this RFC, I'd like to solicit feedback from experienced
> > GDB developers on how to move on.
> > 
> > Basically I see following options:
> > 
> > 1) Do not change anything in this area (I'm perfectly
> >    happy with that).
> > 2) Intern (memoize) Python objects (where it makes sense)
> >    but keep the current approach. Basically first four
> >    commits of this RFC.
> > 3) Continue working on this.
> > 
> > Thanks,
> > 
> > Jan
> > 
> > [1]:
> > https://inbox.sourceware.org/gdb-patches/87o70ar34z.fsf@redhat.com
> > [2]:
> > https://inbox.sourceware.org/gdb/6c73f834d3f7b545188e1999125e7ae63ae83eab.camel@vrany.io/T/#u
> 
> I'm sorry, I don't have any answers for your questions in [2].
> 
> I scanned your patch series, and I'm not sure about one thing: once
> the
> user is done with a Python object and releases all their reference,
> do
> we delete the Python object, or do we keep it around for the next
> time
> it might be needed?  I guess both would be valid approaches.

In the series I have posted Python objects are kept around and
destroyed only when owner is gone. So it implements choice 3
from choices below. 

> 
> Let me summarize our choices, see if I get it right:
> 
>  1. No memoization at all: all calls to, e.g., `block.compunit`, will
>  return a new `gdb.CompUnit`.  Once the user releases their
> references
>  on that `gdb.CompUnit`, it is freed.  For some types, we need to
> track
>  the Python objects that are alive (e.g. through the registry you
> made)
>  in order to invalidate them if the backing object disappears.  The
>  registry holds no reference to the object.  When the refcount of the
>  Python object gets to 0 (the user is done with it), the object gets
>  removed from the registry (through the tp_dealloc callback).
> 
>  2. Temporary memoization: a first call to `block.compunit` will
> return
>  a new `gdb.CompUnit`, which gets saved in the per-objfile registry. 
> If
>  the user requests the same comp unit, we can return the same Python
>  object.  The registry still doesn't hold a reference to the Python
>  object.  Like for the previous case, when the user is done with the
>  object, we remove it from our registry (through the tp_dealloc
>  callback).  If the user requests a Python object for the same comp
> unit
>  later, a new Python object will be created.
> 
>  3. Permanent memoization: like temporary memoization, except that
> the
>  registry hold a reference to the Python object.  When the user drops
>  all their references, the registry keeps the Python alive, such that
> if
>  they request a Python object for the same comp unit later, they will
>  get the same Python object as before.  The Python objects only get
>  deleted when either the owning objfile gets deleted or the owning
> arch
>  gets deleted (for the former, it means the objects will exist until
> the
>  end of the process, as arches don't get deleted).
> 
> With approach 1, for the objects we need to track (to invalidate
> eventually) we certainly want to forget about Python objects that the
> user is done with, and not hold onto them forever (otherwise, that
> would
> be like a memory leak).  If the user decides to keep alive 50000
> Python
> objects wrapping the same comp unit, then we need to track them all. 
> If
> one of those 50000 objects gets deleted (its refcount drops to 0), we
> need to find it in the linked list of at least 50000 items to remove
> it.
> That sounds bad.  

The way it is currently implemented, it is okay. Python objects
themselves form the doubly-linked list so one can remove it from
the list easily - there's no need to "find" it. 

> So for the objects that we need to track in order to
> invalidate if needed, I guess it is easy enough and makes sense to do
> some memoization.
> 
> And there is value in being consistent and predictable, so if we do
> some
> memoization for some objects, we might as well do it for all objects
> (especially if it can easily be abstracted away).
> 
> Between 2 and 3, again, it is a memory vs time tradeoff.  If you are
> looping over all symbols, get the CU from the symbol to check
> something
> and then throw the CU away, do you prefer to create/delete a Python
> object every time?  If not, that means keeping some Python objects
> around "forever" (until the objfile gets delete, or really forever in
> the case of arch-owned objects), that may never get used again.
> Intuitively I'd say I'm slightly in favor of just doing temporary
> memoization, because I'm hoping that allocating/initializing/deleting
> a
> Python object is fast enough, and I don't really like the idea of
> keeping unused things around, but benchmarks / profiling would help.

I have no strong preference here. gdb.Block currently implements
temporary memoization (2). My code implements permament memoization (3)
mainly because this is what Andrew suggested (if I understood  
correctly). 

I'll try to adapt the code to do temporary memoization to see how it
feels. 

> 
> Overall, I would say that I think memoization is a good idea and I
> like
> where you're going with this series.
> 

Thanks,
Jan

next prev parent reply	other threads:[~2025-02-20 17:51 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-27 10:44 Jan Vrany
2025-01-27 10:44 ` [RFC 1/9] gdb/python: preserve identity for gdb.Symtab objects Jan Vrany
2025-01-27 10:44 ` [RFC 2/9] gdb/python: preserve identity for gdb.Symbol objects Jan Vrany
2025-02-20 19:22   ` Tom Tromey
2025-01-27 10:44 ` [RFC 3/9] gdb/python: do not hold on gdb.Symtab object from gdb.Symtab_and_line Jan Vrany
2025-01-27 10:44 ` [RFC 4/9] gdb/python: preserve identity for gdb.Type objects Jan Vrany
2025-01-27 10:44 ` [RFC 5/9] gdb/python: introduce gdbpy_registry Jan Vrany
2025-02-20 19:28   ` Tom Tromey
2025-01-27 10:44 ` [RFC 6/9] gdb/python: convert gdb.Symbol to use gdbpy_registry Jan Vrany
2025-01-27 10:44 ` [RFC 7/9] gdb/python: convert gdb.Type " Jan Vrany
2025-01-27 10:44 ` [RFC 8/9] gdb/python: convert gdb.Symtab " Jan Vrany
2025-01-27 10:44 ` [RFC 9/9] gdb/python: convert gdb.Symtab_and_line " Jan Vrany
2025-02-18 11:15 ` [PING] Re: [RFC 0/9] Attempt to unify Python object's lifecycle Jan Vraný
2025-02-19 21:00 ` Simon Marchi
2025-02-20 17:50   ` Jan Vraný [this message]
2025-02-20 19:18 ` Tom Tromey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6779ca1e67acc2081194631fb01b423215401d2a.camel@labware.com \
    --to=jan.vrany@labware.com \
    --cc=aburgess@redhat.com \
    --cc=gdb-patches@sourceware.org \
    --cc=simark@simark.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox