Hi Kevin,

Thanks for sharing this patch and for the detailed end-to-end explanation; it matches exactly what I’ve been seeing.

I’ve independently hit the same issue with glibc 2.42 on kernels >= 6.13, due to the MADV_GUARD_INSTALL change. The behavior is reproducible with both upstream and custom GDB.

I tested your patch on ppc64le and x86_64, and it fixes the problem on both. This looks like a generic GDB issue triggered by the new glibc stack layout rather than an architecture-specific problem.

The approach of falling back to page-by-page reads on failure looks correct to me.

Thanks again for the clear analysis and fix.

BR
Abhay

On 30/01/26 13:52, Kevin Buettner wrote:
> GLIBC 2.42 changed how thread stack guard pages are implemented [2].
> In GLIBC 2.41 and earlier, guard pages were set up using mprotect() to
> mark guard regions with no permissions. Once configured, guard pages
> were visible as separate entries in /proc/PID/maps with no permissions
> (i.e. they're inaccessible). In GLIBC 2.42, guard pages are
> installed using the kernel's MADV_GUARD_INSTALL mechanism [1], which
> marks them at the page table entry (PTE) level within the existing
> mapping.
>
> As a consequence, guard pages do not appear as separate entries in
> /proc/PID/maps, but remain as part of the containing mapping. Moreover,
> thread stacks from multiple mmap() calls may be merged into a single
> virtual memory area (VMA) with read and write permissions since there's
> no guard page VMA to separate them. These guard pages cannot be
> distinguished by examining VMA listings but do return EIO when read
> from /proc/PID/mem.
>
> GDB's gcore code reads /proc/PID/smaps to discover memory regions and
> creates one BFD section per mapping. (On Linux, this is performed in
> linux_find_memory_regions_full in linux-tdep.c.) With the old layout,
> memory areas with guard pages appeared separately with no permissions,
> which were filtered out.
> Each thread stack became its own section
> containing only readable data. With the new layout, using
> MADV_GUARD_INSTALL instead of the older mechanism, it's often the case
> that thread stacks created with multiple calls to mmap() are exposed
> as a single mapping appearing in /proc/PID/smaps with read and write
> permissions. Should that happen, GDB's code creates a single section
> covering all thread stacks and their guard pages. (Even if each
> thread stack appears in its own mapping, the fact remains that there
> will be an inaccessible portion of the mapping. When one or more
> thread stacks are coalesced into a single mapping, there will be
> several inaccessible "holes" representing the guard pages.)
>
> When gcore_copy_callback copies section contents, it reads memory in
> 1MB (MAX_COPY_BYTES) chunks. If any page in the chunk is a guard page,
> the call to target_read_memory() fails. The old code responded by
> breaking out of the copy loop, abandoning the entire section. This
> prevents correct copying of thread stack data and yields core files
> with zero-filled thread stacks, and therefore nearly empty backtraces.
>
> Fix this by falling back to page-by-page reading when a 1MB chunk read
> fails. Individual pages that cannot be read are filled with zeros,
> allowing the remaining readable memory to be captured.
>
> I also considered a simpler change using SPARSE_BLOCK_SIZE (4096)
> as the read size instead of MAX_COPY_BYTES (1MB). This would avoid
> the fallback logic but would cause up to 256x more syscalls. The
> proposed approach also allows meaningful warnings: we warn only if an
> entire region is unreadable (indicating a real problem), whereas
> per-page reads would make it harder to distinguish guard page failures
> from actual errors. Since guard pages are at offset 0 for
> downward-growing stacks, a large target_read_memory() fails early at
> the first unreadable byte anyway.
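For readers following along, the fallback described above can be sketched outside of GDB roughly as follows. This is a minimal illustration, not the patch itself: read_fn, copy_with_page_fallback, and fake_memory are hypothetical names, the callback merely stands in for target_read_memory(), and the 4096-byte page size is assumed to mirror SPARSE_BLOCK_SIZE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <set>
#include <vector>

// Hypothetical stand-in for target_read_memory: returns 0 on success,
// nonzero if any byte of [addr, addr + len) is unreadable.
using read_fn = std::function<int (std::uint64_t addr, unsigned char *buf,
                                   std::size_t len)>;

// Assumed page size for this sketch; mirrors SPARSE_BLOCK_SIZE (4096).
constexpr std::size_t page_size = 4096;

// Try one large read first.  If it fails, retry page by page and
// zero-fill every page that cannot be read.  Returns false only when
// the large read failed and no individual page could be read either.
bool
copy_with_page_fallback (const read_fn &read_mem, std::uint64_t addr,
                         unsigned char *buf, std::size_t size)
{
  if (read_mem (addr, buf, size) == 0)
    return true;

  bool any_read = false;
  while (size > 0)
    {
      std::size_t chunk = std::min (size, page_size);
      if (read_mem (addr, buf, chunk) != 0)
        std::memset (buf, 0, chunk);  // unreadable page: zero-fill
      else
        any_read = true;
      buf += chunk;
      addr += chunk;
      size -= chunk;
    }
  return any_read;
}

// Simulated inferior memory for illustration only: every byte reads as
// 0xAB, except that a read touching a listed guard page fails outright.
struct fake_memory
{
  std::set<std::uint64_t> guard_pages;  // page indices, not addresses

  int read (std::uint64_t addr, unsigned char *buf, std::size_t len) const
  {
    if (len == 0)
      return 0;
    std::uint64_t last = (addr + len - 1) / page_size;
    for (std::uint64_t pg = addr / page_size; pg <= last; ++pg)
      if (guard_pages.count (pg) != 0)
        return -1;
    std::memset (buf, 0xAB, len);
    return 0;
  }
};
```

With pages 0 and 2 of a four-page fake_memory marked as guard pages, the single large read fails, the per-page retry zero-fills pages 0 and 2, and pages 1 and 3 are copied intact. The function returns false only when no page at all could be read, which corresponds to the case in which the patch still emits its warning.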
>
> With this fix, I see 16 failures resolved in the following test cases:
>
>     gdb.ada/task_switch_in_core.exp
>     gdb.arch/i386-tls-regs.exp
>     gdb.threads/threadcrash.exp
>     gdb.threads/tls-core.exp
>
> Looking at just one of these, from gdb.log without the fix, I see:
>
>     thread apply 5 backtrace
>
>     Thread 5 (LWP 3414829):
>     #0 0x00007ffff7d1d982 in __syscall_cancel_arch () from /lib64/libc.so.6
>     #1 0x0000000000000000 in ?? ()
>     (gdb) FAIL: gdb.threads/threadcrash.exp: test_gcore: thread apply 5 backtrace
>
> And this is what it looks like with the fix in place (some paths have
> been shortened):
>
>     thread apply 5 backtrace
>
>     Thread 5 (Thread 0x7fffeffff6c0 (LWP 1282651) "threadcrash"):
>     #0 0x00007ffff7d1d982 in __syscall_cancel_arch () from /lib64/libc.so.6
>     #1 0x00007ffff7d11c3c in __internal_syscall_cancel () from /lib64/libc.so.6
>     #2 0x00007ffff7d61b62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
>     #3 0x00007ffff7d6db37 in nanosleep () from /lib64/libc.so.6
>     #4 0x00007ffff7d8008e in sleep () from /lib64/libc.so.6
>     #5 0x00000000004006a8 in do_syscall_task (location=NORMAL) at threadcrash.c:158
>     #6 0x0000000000400885 in thread_function (arg=0x404340) at threadcrash.c:277
>     #7 0x00007ffff7d15464 in start_thread () from /lib64/libc.so.6
>     #8 0x00007ffff7d985ac in __clone3 () from /lib64/libc.so.6
>     (gdb) PASS: gdb.threads/threadcrash.exp: test_live_inferior: thread apply 5 backtrace
>
> Regression testing on Fedora 42 (glibc 2.41) shows no new failures.
>
> References:
>
> [1] Linux commit 662df3e5c376 ("mm: madvise: implement lightweight
>     guard page mechanism")
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=662df3e5c37666d6ed75c88098699e070a4b35b5
> [2] glibc commit a6fbe36b7f31 ("nptl: Add support for setup guard
>     pages with MADV_GUARD_INSTALL")
>     https://sourceware.org/git/?p=glibc.git;a=commit;h=a6fbe36b7f31292981422692236465ab56670ea9
>
> Claude Opus 4.5 and GLM 4.7 assisted with the development of this commit.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33855
> ---
>  gdb/gcore.c | 46 ++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/gdb/gcore.c b/gdb/gcore.c
> index 5a3ad145d4c..6b36e6064ac 100644
> --- a/gdb/gcore.c
> +++ b/gdb/gcore.c
> @@ -765,15 +765,45 @@ gcore_copy_callback (bfd *obfd, asection *osec)
>        if (size > total_size)
>          size = total_size;
>
> -      if (target_read_memory (bfd_section_vma (osec) + offset,
> -                              memhunk.data (), size) != 0)
> +      CORE_ADDR vma = bfd_section_vma (osec) + offset;
> +
> +      if (target_read_memory (vma, memhunk.data (), size) != 0)
>          {
> -          warning (_("Memory read failed for corefile "
> -                     "section, %s bytes at %s."),
> -                   plongest (size),
> -                   paddress (current_inferior ()->arch (),
> -                             bfd_section_vma (osec)));
> -          break;
> +          /* Large read failed. This can happen when the memory region
> +             contains unreadable pages (such as guard pages embedded within
> +             a larger mapping). Fall back to reading page by page, filling
> +             unreadable pages with zeros. */
> +          gdb_byte *p = memhunk.data ();
> +          bfd_size_type remaining = size;
> +          CORE_ADDR addr = vma;
> +          bool at_least_one_page_read = false;
> +
> +          while (remaining > 0)
> +            {
> +              bfd_size_type chunk_size
> +                = std::min (remaining, (bfd_size_type) SPARSE_BLOCK_SIZE);
> +
> +              if (target_read_memory (addr, p, chunk_size) != 0)
> +                {
> +                  /* Failed to read this page. Fill with zeros. This
> +                     handles guard pages and other unreadable regions
> +                     that may exist within a larger readable mapping. */
> +                  memset (p, 0, chunk_size);
> +                }
> +              else
> +                at_least_one_page_read = true;
> +
> +              p += chunk_size;
> +              addr += chunk_size;
> +              remaining -= chunk_size;
> +            }
> +          /* Warn only if the entire region was unreadable - this
> +             indicates a real problem, not just embedded guard pages. */
> +          if (!at_least_one_page_read)
> +            warning (_("Memory read failed for corefile "
> +                       "section, %s bytes at %s."),
> +                     plongest (size),
> +                     paddress (current_inferior ()->arch (), vma));
>          }
>
>        if (!sparse_bfd_set_section_contents (obfd, osec, memhunk.data (),