From: Matthieu Longo via Gdb <gdb@sourceware.org>
To: gdb@sourceware.org
Cc: Tamar Christina <tamar.christina@arm.com>,
Andre Simoes Dias Vieira <andre.aimoesdiasvieira@arm.com>,
Tom Tromey <tom@tromey.com>,
Simon Marchi <simon.marchi@polymtl.ca>,
Luis Machado <luis.machado@arm.com>,
Andrew Burgess <aburgess@redhat.com>
Subject: [RFC] Allowing GDB to use a more recent version of Python at runtime than it was compiled with
Date: Thu, 29 May 2025 17:14:07 +0100
Message-ID: <314abf0a-007c-457d-bcc3-c28384b9f098@arm.com>
## End goal
Allowing GDB to use a more recent version of Python at runtime than it
was compiled with.
## Arm's context
We have a CI building release artifacts of Arm GNU toolchains (GDB is
shipped along with the toolchain) on a Linux distribution with the oldest
glibc version among all the distributions that we want to support.
The rationale for using the distribution with the oldest glibc version
is to leverage the ABI backward compatibility of glibc. If we build a C
binary on the old distribution, we expect it to work on more recent
distributions.
The same holds for C++ artifacts and libstdc++. See
https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html for more
information.
However, CPython does not provide the same backward compatibility
guarantees as glibc or libstdc++ do. This means that a GDB compiled
against an old
CPython won't run with a more recent CPython due to ABI divergences
between the Python versions [1].
This situation obliges us to provide GDB in two flavors, one with
Python support and one without (in case there is a conflicting version
of Python already installed on the system).
## General packaging difficulties
Usually, Linux distributions embed a default Python version and don't
change it much over the lifetime of a release. Providing a GDB for a
distribution means that we would have to compile and bundle a GDB for
each of the supported distributions. Given the number of distributions
out there, this approach does not look very sustainable for us (even
using Docker [2]).
Other platforms, like Windows, don't ship Python by default. This means
that we cannot know the version of Python that users have installed or
will install on their machines.
## How does CPython propose to tackle this issue?
PEP 384 – Defining a Stable ABI [3] and PEP 652 – Maintaining the Stable
ABI [4] defined a Limited API and Stable ABI, which allow extenders and
embedders of CPython to compile extension modules that are
binary-compatible with any subsequent version of 3.x.
More details about the current state of this limited API are available at
https://docs.python.org/3/c-api/stable.html
### Experiment: build the master branch of GDB with the Python limited API
I added `#define Py_LIMITED_API 0x03020000` (Python 3.2: introduction
of the limited API) before `#include <Python.h>` in
gdb/python/python-internal.h, and tried to recompile GDB to get a
rough idea of how many incompatibilities there are.
I got errors about Py_buffer and PyBuffer_Release, which are part of the
Stable ABI (including all members of Py_buffer) only since version 3.11,
so I changed Py_LIMITED_API to 0x030b0000:
https://docs.python.org/3/c-api/buffer.html#c.Py_buffer
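As a side note on the two values used in this experiment: Py_LIMITED_API uses the same hexadecimal layout as PY_VERSION_HEX, with the major version in the top byte and the minor version in the next one. The small sketch below only illustrates that encoding; the helper names `limited_api_hex` and `limited_api_version` are made up for this example and are not part of CPython or GDB.

```python
def limited_api_hex(major: int, minor: int) -> int:
    """Return the Py_LIMITED_API value targeting Python major.minor.

    Layout follows PY_VERSION_HEX: major in bits 24-31, minor in
    bits 16-23. The low 16 bits (micro, release level, serial) stay zero.
    """
    if not (0 <= major <= 0xFF and 0 <= minor <= 0xFF):
        raise ValueError("major and minor must each fit in one byte")
    return (major << 24) | (minor << 16)


def limited_api_version(value: int) -> tuple[int, int]:
    """Decode a Py_LIMITED_API value back into (major, minor)."""
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF


print(f"{limited_api_hex(3, 2):#010x}")    # 0x03020000
print(f"{limited_api_hex(3, 11):#010x}")   # 0x030b0000
print(limited_api_version(0x030B0000))     # (3, 11)
```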
All the remaining errors seem to be related to PyTypeObject:
https://docs.python.org/3/c-api/type.html#c.PyTypeObject
The Python doc says:
> Part of the Limited API (as an opaque struct).
Except that GDB uses it as a non-opaque structure.
### Experiment summary
At first glance:
- The main bottleneck to making GDB compliant with the limited Python
API is PyTypeObject, which is used in a lot of places (41 C files out of
51 in gdb/python).
- The secondary issue, for supporting versions older than 3.11, is
Py_buffer and related functions. Pragmatically, this does not look like
an issue we want to tackle unless we absolutely want to support versions
older than 3.11. For information, Python 3.11 should reach its EOL in
2027 [5].
There might be more issues, as the build stopped before going over all
the files in gdb/python.
Regarding the first issue, from what I could understand of the usages of
PyTypeObject in the GDB code base, they seem to be used to declare
"static types" to Python, i.e. exposing C data structures to Python. More
information at https://docs.python.org/3/c-api/typeobj.html#static-types
Making the usage of PyTypeObject opaque would consist in transforming
the declaration of each of those PyTypeObjects into an equivalent static
PyType_Slot array and PyType_Spec, and then creating the type object at
runtime by calling PyType_FromSpec() before adding it to the module. By
the way, PyModule_AddObject() is "soft deprecated" since Python 3.13.
The Python extensions seem to have a reasonable test coverage (tests in
gdb/testsuite/gdb.python), but I don't have any information on its
completeness. The testsuite should be a good enough safety net to at
least validate the approach, but might require new tests to make sure no
regressions are introduced.
Given the data above, and after reading an example of such a transition
to the limited Python ABI in Qt [6], making GDB use the limited C API
does not seem out of reach.
## Shared library naming issue in CPython
For several years, it seems that the CPython project has had an issue
when generating libpython3.so, the expected name of the Python shared
library exposing the Python limited C API. The generated file is empty
and doesn't contain any symbols.
Ticket: https://github.com/python/cpython/issues/104612 (status: OPEN)
From what I could understand, libpython3.so is supposed to only expose
the symbols of the Python limited C API, and have a runtime dependency
on libpython3.x.so. However, this has never been implemented and, in
the meantime, someone decided to make it an empty file. Since nobody
followed up on this, the Python build still generates an empty file.
So packagers came up with a workaround (for those who cared) for people
trying to use the Python limited C API, i.e. with a DT_NEEDED reference
in the binary set to libpython3.so. They either created a symbolic link
libpython3.so to libpython3.x.so, or copied and renamed libpython3.x.so.
With this solution, the limited C API is not isolated from the
whole (unstable) Python C API, but a program using only the limited C
API should work in the same way.
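The symbolic-link variant of this workaround can be sketched as follows. This is only an illustration under assumptions: the function name `link_stable_abi_name`, the file-name pattern, and the choice of the highest minor version as the link target are all made up for the example, and the demo runs in a temporary directory with empty placeholder files rather than in a real library directory.

```python
import os
import re
import tempfile

# Assumed naming scheme: "libpython3.12.so" or "libpython3.12.so.1.0".
_VERSIONED = re.compile(r"^libpython3\.(\d+)\.so(\.[\d.]+)?$")


def link_stable_abi_name(libdir):
    """Create libdir/libpython3.so as a symlink to the newest libpython3.x.so.

    Returns the link target name, or None if nothing was created (the name
    libpython3.so already exists, or no versioned library was found).
    """
    link = os.path.join(libdir, "libpython3.so")
    if os.path.lexists(link):
        return None  # already provided; leave it untouched
    candidates = []
    for name in os.listdir(libdir):
        match = _VERSIONED.match(name)
        if match:
            # Compare minor versions numerically so that 3.12 > 3.9.
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    _, target = max(candidates)
    os.symlink(target, link)  # relative link inside the same directory
    return target


# Demo on placeholder files; no real Python installation is touched.
with tempfile.TemporaryDirectory() as d:
    for name in ("libpython3.11.so.1.0", "libpython3.12.so.1.0"):
        open(os.path.join(d, name), "w").close()
    print(link_stable_abi_name(d))                        # libpython3.12.so.1.0
    print(os.readlink(os.path.join(d, "libpython3.so")))  # libpython3.12.so.1.0
```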
### How does it impact GDB?
Let's assume that GDB can compile using only the Python limited C API.
Following the logic of PEP-384, linking against the Python limited C API
means linking against libpython3.so, not libpython3.x.so.
In the current state of Python packaging, this creates issues at
different stages:
1. Most packagers don't expose a symbolic link libpython3.so to
libpython3.x.so, or a copy of libpython3.x.so named libpython3.so.
2. If it is not exposed in the build environment, we can still link
against libpython3.x.so, and manually patch the reference in DT_NEEDED
before bundling the artifact in the distributed archive:
```
patchelf --replace-needed libpython3.12.so.1.0 libpython3.so gdb
```
3. If the runtime environment does not have libpython3.so, users
need to create the symbolic link themselves.
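To check which Python library name a given gdb binary actually references, one can look at its DT_NEEDED entries, for example with `readelf -d gdb`. The sketch below parses that textual output; the helper names are hypothetical, and the sample text is hand-written in the format binutils prints rather than taken from a real GDB build.

```python
import re

# Line format printed by `readelf -d` for DT_NEEDED entries.
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")
# libpython3.x.so (optionally followed by a soname suffix such as .1.0).
_PINNED = re.compile(r"^libpython3\.\d+\.so")


def python_needed(readelf_output: str) -> list[str]:
    """Return the libpython entries found among the DT_NEEDED lines."""
    names = _NEEDED.findall(readelf_output)
    return [name for name in names if name.startswith("libpython")]


def is_pinned_to_minor(name: str) -> bool:
    """True for libpython3.x.so*, False for the version-agnostic libpython3.so."""
    return _PINNED.match(name) is not None


# Hand-written sample, not output from a real binary.
sample = """\
 0x0000000000000001 (NEEDED)             Shared library: [libpython3.12.so.1.0]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""
for lib in python_needed(sample):
    kind = "pinned to a minor version" if is_pinned_to_minor(lib) else "stable-ABI name"
    print(lib, "->", kind)
```

A binary patched with the patchelf command above would report libpython3.so here instead of libpython3.12.so.1.0.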
### A glimmer of hope from some packagers
Brew (on Linux, I didn't test on macOS) exposes a symbolic link
libpython3.so (instead of the actual shared library for the limited C
API) which points to libpython3.x.so.
There are probably other good examples out there that I am not aware of.
## Conclusion
Clearly, today the user experience of GDB with Python enabled can vary a
lot depending on how Python was installed on the end system, and how GDB
was built by the packagers.
Allowing GDB to use a more recent version of Python at runtime than it
was compiled with would improve the experience of both users and
packagers.
From my understanding, this goal can be achieved if the following two
conditions are met:
- one internal to GDB: adapting GDB to use the Python limited C API.
- one external to GDB: lobbying the packagers to provide a default
Python installation with libpython3.so so that a binary using the Python
limited C API is supported without change to the build and runtime
environment.
## RFC
Please let me know your thoughts on the above.
On the recommendation of Luis Machado, I added Andrew Burgess, Tom
Tromey and Simon Marchi in CC. Feel free to add anyone you think relevant
to the loop.
I am particularly interested in:
- any experience using the Python limited C API.
- stories of transition to the Python limited C API, and any pain points
you hit along the way.
- testing. I have had a look at the test coverage of the Python GDB API,
and it seems OK at first glance. Please let me know if you can see any
difficulty with the current coverage.
- any performance impact caused by this transition that I should be
aware of. How did you measure it? How critical is performance
regarding the Python limited C API?
- implementation details in GDB where you think that we might face
problems with this approach.
## References
[1] This is true unless the program uses the Python limited C API, and
the name of the shared library in DT_NEEDED matches libpython3.so.
[2] This topic was already raised on the GDB mailing list in the past:
https://inbox.sourceware.org/gdb/0aa701db281b$b8ac9df0$2a05d9d0$@symas.com/
The proposed "right way" of packaging GDB consists in building GDB for
every supported distribution against the default Python version of that
distribution. This approach can certainly be made easier using Docker
instead of physical machines.
[3] PEP 384 – Defining a Stable ABI, https://peps.python.org/pep-0384/
[4] PEP 652 – Maintaining the Stable ABI, https://peps.python.org/pep-0652/
[5] Status of Python versions, https://devguide.python.org/versions/
[6] Qt: The Transition To The Limited Python API (PEP384),
https://doc.qt.io/qtforpython-6/developer/limited_api.html