From: Abhay Kandpal <abhay@linux.ibm.com>
To: Kevin Buettner <kevinb@redhat.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH] gcore: Handle unreadable pages within readable memory regions
Date: Tue, 10 Feb 2026 23:28:27 +0530
Message-ID: <f2f0ab5f-36dc-4780-b6eb-bc0a49518ab3@linux.ibm.com>
In-Reply-To: <20260130082212.2002944-2-kevinb@redhat.com>
Hi Kevin,
Thanks for sharing this patch and for the detailed end-to-end explanation — it matches exactly what I’ve been seeing.
I’ve independently hit the same issue with glibc 2.42 on kernels 6.13 and later, due to the MADV_GUARD_INSTALL change.
The behavior is reproducible with both upstream and custom GDB.
I tested your patch on ppc64le and x86_64, and it fixes the problem on both.
This looks like a generic GDB issue triggered by the new glibc stack layout rather than an architecture-specific problem.
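
For reference, this is roughly how I reproduced it (the program name and the
interrupt point are placeholders; any binary with a few pthreads parked in
sleep() will do):

  $ gcc -g -pthread repro.c -o repro
  $ gdb ./repro
  (gdb) run
  ^C                            # once the threads are up and sleeping
  (gdb) gcore /tmp/repro.core
  $ gdb ./repro /tmp/repro.core
  (gdb) thread apply all backtrace

Without the fix, the non-main thread backtraces from the core stop after
frame #0, much like the gdb.log excerpt quoted below.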
The approach of falling back to page-by-page reads on failure looks correct to me.
Thanks again for the clear analysis and fix.
Best regards,
Abhay
On 30/01/26 13:52, Kevin Buettner wrote:
> GLIBC 2.42 changed how thread stack guard pages are implemented [2].
> In GLIBC 2.41 and earlier, guard pages were set up using mprotect() to
> mark guard regions with no permissions. Once configured, guard pages
> were visible as separate entries in /proc/PID/maps with no permissions
> (i.e. they're inaccessible). In GLIBC 2.42, guard pages are
> installed using the kernel's MADV_GUARD_INSTALL mechanism [1], which
> marks them at the page table entry (PTE) level within the existing
> mapping.
>
> As a consequence, guard pages do not appear as separate entries in
> /proc/PID/maps, but remain as part of the containing mapping. Moreover,
> thread stacks from multiple mmap() calls may be merged into a single
> virtual memory area (VMA) with read and write permissions since there's
> no guard page VMA to separate them. These guard pages cannot be
> distinguished by examining VMA listings but do return EIO when read
> from /proc/PID/mem.
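
As an aside, the EIO behavior is easy to see outside GDB. Below is a minimal
stand-alone sketch I used (my own code, not part of the patch; the
MADV_GUARD_INSTALL fallback define and the three-page layout are assumptions,
and it needs a kernel with guard page support):

#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102	/* <asm-generic/mman-common.h>, Linux 6.13+.  */
#endif

int
main (void)
{
  long psz = sysconf (_SC_PAGESIZE);

  /* Three readable/writable pages in one anonymous mapping.  */
  char *map = mmap (NULL, 3 * psz, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (map == MAP_FAILED)
    return 1;

  /* Guard the middle page.  /proc/self/maps still shows one rw-p VMA.  */
  if (madvise (map + psz, psz, MADV_GUARD_INSTALL) != 0)
    {
      perror ("madvise");	/* EINVAL on kernels without support.  */
      return 1;
    }

  int fd = open ("/proc/self/mem", O_RDONLY);
  if (fd < 0)
    return 1;

  char buf[16];

  /* A normal page reads fine...  */
  printf ("page 0: %zd\n",
	  pread (fd, buf, sizeof buf, (off_t) (uintptr_t) map));

  /* ...but the guard page fails, as gcore sees.  */
  errno = 0;
  printf ("page 1: %zd (%s)\n",
	  pread (fd, buf, sizeof buf, (off_t) (uintptr_t) (map + psz)),
	  strerror (errno));

  close (fd);
  return 0;
}

When guard pages are supported, the second pread() should return -1 with
EIO even though the whole range is rw-p in /proc/self/maps, which is exactly
the situation gcore trips over.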
>
> GDB's gcore code reads /proc/PID/smaps to discover memory regions and
> creates one BFD section per mapping. (On Linux, this is performed in
> linux_find_memory_regions_full in linux-tdep.c.) With the old layout,
> memory areas with guard pages appeared separately with no permissions,
> which were filtered out. Each thread stack became its own section
> containing only readable data. With the new layout, using
> MADV_GUARD_INSTALL instead of the older mechanism, it's often the case
> that thread stacks created with multiple calls to mmap() are exposed
> as a single mapping appearing in /proc/PID/smaps with read and write
> permissions. Should that happen, GDB's code creates a single section
> covering all thread stacks and their guard pages. (Even if each
> thread stack appears in its own mapping, the fact remains that there
> will be an inaccessible portion of the mapping. When one or more
> thread stacks are coalesced into a single mapping, there will be
> several inaccessible "holes" representing the guard pages.)
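
To make the coalescing concrete, this is roughly what the two layouts look
like in /proc/PID/maps (addresses and sizes invented for illustration,
assuming 8 MiB stacks with 4 KiB guards):

glibc 2.41 and earlier (mprotect-based guards, each guard its own VMA):

  7f5554000000-7f5554001000 ---p 00000000 00:00 0    guard (filtered out)
  7f5554001000-7f5554801000 rw-p 00000000 00:00 0    thread 1 stack
  7f5554801000-7f5554802000 ---p 00000000 00:00 0    guard (filtered out)
  7f5554802000-7f5555002000 rw-p 00000000 00:00 0    thread 2 stack

glibc 2.42 on a 6.13+ kernel (PTE-level guards, VMAs merge):

  7f5554000000-7f5555002000 rw-p 00000000 00:00 0    both stacks + guards

So gcore now sees one big readable region with unreadable holes in it.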
>
> When gcore_copy_callback copies section contents, it reads memory in
> 1MB (MAX_COPY_BYTES) chunks. If any page in the chunk is a guard page,
> the call to target_read_memory() fails. The old code responded by
> breaking out of the copy loop, abandoning the entire section. This
> prevents correct copying of thread stack data, producing core files
> with zero-filled thread stacks and, in turn, nearly empty backtraces.
>
> Fix this by falling back to page-by-page reading when a 1MB chunk read
> fails. Individual pages that cannot be read are filled with zeros,
> allowing the remaining readable memory to be captured.
>
> I also considered a simpler change using SPARSE_BLOCK_SIZE (4096)
> as the read size instead of MAX_COPY_BYTES (1MB). This would avoid
> the fallback logic but would cause up to 256x more syscalls. The
> proposed approach also allows meaningful warnings: we warn only if an
> entire region is unreadable (indicating a real problem), whereas
> per-page reads would make it harder to distinguish guard page failures
> from actual errors. Since guard pages are at offset 0 for
> downward-growing stacks, a large target_read_memory() fails early at
> the first unreadable byte anyway.
>
> With this fix, I see 16 failures resolved in the following test cases:
>
> gdb.ada/task_switch_in_core.exp
> gdb.arch/i386-tls-regs.exp
> gdb.threads/threadcrash.exp
> gdb.threads/tls-core.exp
>
> Looking at just one of these, from gdb.log without the fix, I see:
>
> thread apply 5 backtrace
>
> Thread 5 (LWP 3414829):
> #0 0x00007ffff7d1d982 in __syscall_cancel_arch () from /lib64/libc.so.6
> #1 0x0000000000000000 in ?? ()
> (gdb) FAIL: gdb.threads/threadcrash.exp: test_gcore: thread apply 5 backtrace
>
> And this is what it looks like with the fix in place (some paths have
> been shortened):
>
> thread apply 5 backtrace
>
> Thread 5 (Thread 0x7fffeffff6c0 (LWP 1282651) "threadcrash"):
> #0  0x00007ffff7d1d982 in __syscall_cancel_arch () from /lib64/libc.so.6
> #1  0x00007ffff7d11c3c in __internal_syscall_cancel () from /lib64/libc.so.6
> #2  0x00007ffff7d61b62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
> #3  0x00007ffff7d6db37 in nanosleep () from /lib64/libc.so.6
> #4  0x00007ffff7d8008e in sleep () from /lib64/libc.so.6
> #5  0x00000000004006a8 in do_syscall_task (location=NORMAL) at threadcrash.c:158
> #6  0x0000000000400885 in thread_function (arg=0x404340) at threadcrash.c:277
> #7  0x00007ffff7d15464 in start_thread () from /lib64/libc.so.6
> #8  0x00007ffff7d985ac in __clone3 () from /lib64/libc.so.6
> (gdb) PASS: gdb.threads/threadcrash.exp: test_live_inferior: thread apply 5 backtrace
>
> Regression testing on Fedora 42 (glibc 2.41) shows no new failures.
>
> References:
>
> [1] Linux commit 662df3e5c376 ("mm: madvise: implement lightweight
>     guard page mechanism")
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=662df3e5c37666d6ed75c88098699e070a4b35b5
>
> [2] glibc commit a6fbe36b7f31 ("nptl: Add support for setup guard
>     pages with MADV_GUARD_INSTALL")
>     https://sourceware.org/git/?p=glibc.git;a=commit;h=a6fbe36b7f31292981422692236465ab56670ea9
>
> Claude Opus 4.5 and GLM 4.7 assisted with the development of this commit.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33855
> ---
> gdb/gcore.c | 46 ++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/gdb/gcore.c b/gdb/gcore.c
> index 5a3ad145d4c..6b36e6064ac 100644
> --- a/gdb/gcore.c
> +++ b/gdb/gcore.c
> @@ -765,15 +765,45 @@ gcore_copy_callback (bfd *obfd, asection *osec)
> if (size > total_size)
> size = total_size;
>
> - if (target_read_memory (bfd_section_vma (osec) + offset,
> - memhunk.data (), size) != 0)
> + CORE_ADDR vma = bfd_section_vma (osec) + offset;
> +
> + if (target_read_memory (vma, memhunk.data (), size) != 0)
> {
> - warning (_("Memory read failed for corefile "
> - "section, %s bytes at %s."),
> - plongest (size),
> - paddress (current_inferior ()->arch (),
> - bfd_section_vma (osec)));
> - break;
> + /* Large read failed. This can happen when the memory region
> + contains unreadable pages (such as guard pages embedded within
> + a larger mapping). Fall back to reading page by page, filling
> + unreadable pages with zeros. */
> + gdb_byte *p = memhunk.data ();
> + bfd_size_type remaining = size;
> + CORE_ADDR addr = vma;
> + bool at_least_one_page_read = false;
> +
> + while (remaining > 0)
> + {
> + bfd_size_type chunk_size
> + = std::min (remaining, (bfd_size_type) SPARSE_BLOCK_SIZE);
> +
> + if (target_read_memory (addr, p, chunk_size) != 0)
> + {
> + /* Failed to read this page. Fill with zeros. This
> + handles guard pages and other unreadable regions
> + that may exist within a larger readable mapping. */
> + memset (p, 0, chunk_size);
> + }
> + else
> + at_least_one_page_read = true;
> +
> + p += chunk_size;
> + addr += chunk_size;
> + remaining -= chunk_size;
> + }
> + /* Warn only if the entire region was unreadable - this
> + indicates a real problem, not just embedded guard pages. */
> + if (!at_least_one_page_read)
> + warning (_("Memory read failed for corefile "
> + "section, %s bytes at %s."),
> + plongest (size),
> + paddress (current_inferior ()->arch (), vma));
> }
>
> if (!sparse_bfd_set_section_contents (obfd, osec, memhunk.data (),