Hi Kevin,

Thanks for sharing this patch and for the detailed end-to-end explanation — it matches exactly what I’ve been seeing.

I’ve independently hit the same issue with glibc 2.42 on kernels >= 6.13 (the first release that provides MADV_GUARD_INSTALL), due to glibc's switch to that mechanism for stack guard pages.
The behavior is reproducible with both upstream and custom GDB.
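
In case it helps anyone confirm the kernel-side behavior without GDB in the loop, here is a minimal sketch. It assumes Linux with /proc mounted. The names probe_guard_page, probe_status and probe_result are mine, not from GDB or glibc, and the fallback definition of MADV_GUARD_INSTALL (102) is taken from the kernel uapi headers for toolchains whose libc headers predate it. It maps two pages, installs a guard on the first, and reads one byte of each page back through /proc/self/mem.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Fallback for older libc headers; 102 is the value in the Linux uapi headers.
#ifndef MADV_GUARD_INSTALL
#define MADV_GUARD_INSTALL 102
#endif

enum class probe_status
{
  setup_failed,         // mmap or opening /proc/self/mem failed
  guards_unsupported,   // madvise rejected MADV_GUARD_INSTALL (older kernel)
  guard_read_failed,    // the guard page could not be read via /proc/self/mem
  guard_read_succeeded  // the guard page was readable (not expected)
};

struct probe_result
{
  probe_status status;
  int read_errno;           // errno from the failed guard read, otherwise 0
  bool neighbour_readable;  // the page after the guard read back correctly
};

// Hypothetical helper: install a guard on the first of two anonymous pages
// and try to read both pages through /proc/self/mem.
probe_result
probe_guard_page ()
{
  probe_result res {probe_status::setup_failed, 0, false};
  long ps = sysconf (_SC_PAGESIZE);
  assert (ps > 0);
  std::size_t page = static_cast<std::size_t> (ps);

  void *m = mmap (nullptr, 2 * page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == MAP_FAILED)
    return res;
  char *base = static_cast<char *> (m);
  base[page] = 'x';  // Known contents for the non-guard page.

  int fd = open ("/proc/self/mem", O_RDONLY);
  if (fd >= 0)
    {
      if (madvise (base, page, MADV_GUARD_INSTALL) != 0)
        res.status = probe_status::guards_unsupported;
      else
        {
          char byte = 0;
          off_t guard_off
            = static_cast<off_t> (reinterpret_cast<std::uintptr_t> (base));
          errno = 0;
          if (pread (fd, &byte, 1, guard_off) == 1)
            res.status = probe_status::guard_read_succeeded;
          else
            {
              res.status = probe_status::guard_read_failed;
              res.read_errno = errno;
            }
          // The second page sits in the same mapping and should still read.
          res.neighbour_readable
            = (pread (fd, &byte, 1, guard_off + static_cast<off_t> (page)) == 1
               && byte == 'x');
        }
      close (fd);
    }
  munmap (m, 2 * page);
  return res;
}
```

Going by the description quoted below, on a kernel with guard support I would expect guard_read_failed with read_errno set to EIO and neighbour_readable true, while /proc/PID/maps still shows one read-write mapping covering both pages. On older kernels the madvise() call is rejected and the probe reports guards_unsupported.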

I tested your patch on ppc64le and x86_64, and it fixes the problem on both.
This looks like a generic GDB issue triggered by the new glibc stack layout rather than an architecture-specific problem.

The approach of falling back to page-by-page reads on failure looks correct to me.
Thanks again for the clear analysis and fix.
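
For reference, the control flow of the fallback can be modelled outside GDB roughly as follows. This is only a sketch of the idea, not the patch itself: read_fn is a hypothetical stand-in for target_read_memory(), page_size stands in for SPARSE_BLOCK_SIZE, and make_fake_reader is a toy memory model invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for target_read_memory: returns 0 on success and
// nonzero if any byte in [addr, addr + len) is unreadable.
using read_fn
  = std::function<int (std::uint64_t addr, unsigned char *buf, std::size_t len)>;

constexpr std::size_t page_size = 4096;  // Mirrors SPARSE_BLOCK_SIZE.

struct copy_result
{
  std::size_t pages_read = 0;    // Pages copied during the fallback.
  std::size_t pages_zeroed = 0;  // Pages zero-filled during the fallback.
  bool used_fallback = false;    // The large read failed.
  bool should_warn = false;      // Nothing in the chunk was readable.
};

// Try one large read; on failure retry page by page, zero-filling the
// pages that cannot be read.
copy_result
read_chunk_with_fallback (const read_fn &read, std::uint64_t addr,
                          unsigned char *buf, std::size_t size)
{
  copy_result r;
  assert (buf != nullptr);
  if (read (addr, buf, size) == 0)
    return r;  // Fast path: the whole chunk was readable.

  r.used_fallback = true;
  std::size_t remaining = size;
  while (remaining > 0)
    {
      std::size_t n = std::min (remaining, page_size);
      if (read (addr, buf, n) != 0)
        {
          std::memset (buf, 0, n);  // Unreadable page, e.g. a guard page.
          ++r.pages_zeroed;
        }
      else
        ++r.pages_read;
      buf += n;
      addr += n;
      remaining -= n;
    }
  // Warn only when the entire chunk was unreadable.
  r.should_warn = (r.pages_read == 0);
  return r;
}

// Toy memory model: every byte reads back as 0xAB, except that any read
// touching the single page starting at guard_addr fails as a whole.
read_fn
make_fake_reader (std::uint64_t guard_addr)
{
  return [guard_addr] (std::uint64_t addr, unsigned char *buf,
                       std::size_t len) -> int
    {
      if (addr < guard_addr + page_size && guard_addr < addr + len)
        return -1;
      std::memset (buf, 0xAB, len);
      return 0;
    };
}
```

With a fake reader whose guard page sits at the start of a four-page chunk, the large read fails, the fallback zero-fills one page and copies the other three, and should_warn stays false. It only becomes true when no page in the chunk can be read, which matches the warning policy in the patch.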

BR
Abhay

On 30/01/26 13:52, Kevin Buettner wrote:
> GLIBC 2.42 changed how thread stack guard pages are implemented [2].
> In GLIBC 2.41 and earlier, guard pages were set up using mprotect() to
> mark guard regions with no permissions.  Once configured, guard pages
> were visible as separate entries in /proc/PID/maps with no permissions
> (i.e. they're inaccessible).  In GLIBC 2.42, guard pages are
> installed using the kernel's MADV_GUARD_INSTALL mechanism [1], which
> marks them at the page table entry (PTE) level within the existing
> mapping.
>
> As a consequence, guard pages do not appear as separate entries in
> /proc/PID/maps, but remain as part of the containing mapping.  Moreover,
> thread stacks from multiple mmap() calls may be merged into a single
> virtual memory area (VMA) with read and write permissions since there's
> no guard page VMA to separate them.  These guard pages cannot be
> distinguished by examining VMA listings but do return EIO when read
> from /proc/PID/mem.
>
> GDB's gcore code reads /proc/PID/smaps to discover memory regions and
> creates one BFD section per mapping.  (On Linux, this is performed in
> linux_find_memory_regions_full in linux-tdep.c.) With the old layout,
> memory areas with guard pages appeared separately with no permissions,
> which were filtered out.  Each thread stack became its own section
> containing only readable data.  With the new layout, using
> MADV_GUARD_INSTALL instead of the older mechanism, it's often the case
> that thread stacks created with multiple calls to mmap() are exposed
> as a single mapping appearing in /proc/PID/smaps with read and write
> permissions.  Should that happen, GDB's code creates a single section
> covering all thread stacks and their guard pages.  (Even if each
> thread stack appears in its own mapping, the fact remains that there
> will be an inaccessible portion of the mapping.  When one or more
> thread stacks are coalesced into a single mapping, there will be
> several inaccessible "holes" representing the guard pages.)
>
> When gcore_copy_callback copies section contents, it reads memory in
> 1MB (MAX_COPY_BYTES) chunks.  If any page in the chunk is a guard page,
> the call to target_read_memory() fails.  The old code responded by
> breaking out of the copy loop, abandoning the entire section.  This
> prevents correct copying of thread stack data, producing core files
> with zero-filled thread stacks and, in turn, nearly empty backtraces.
>
> Fix this by falling back to page-by-page reading when a 1MB chunk read
> fails.  Individual pages that cannot be read are filled with zeros,
> allowing the remaining readable memory to be captured.
>
> I also considered a simpler change using SPARSE_BLOCK_SIZE (4096)
> as the read size instead of MAX_COPY_BYTES (1MB).  This would avoid
> the fallback logic but would cause up to 256x more syscalls.  The
> proposed approach also allows meaningful warnings: we warn only if an
> entire region is unreadable (indicating a real problem), whereas
> per-page reads would make it harder to distinguish guard page failures
> from actual errors.  Since guard pages are at offset 0 for
> downward-growing stacks, a large target_read_memory() fails early at
> the first unreadable byte anyway.
>
> With this fix, I see 16 failures resolved in the following test cases:
>
>      gdb.ada/task_switch_in_core.exp
>      gdb.arch/i386-tls-regs.exp
>      gdb.threads/threadcrash.exp
>      gdb.threads/tls-core.exp
>
> Looking at just one of these, from gdb.log without the fix, I see:
>
>    thread apply 5 backtrace
>
>    Thread 5 (LWP 3414829):
>    #0  0x00007ffff7d1d982 in __syscall_cancel_arch () from /lib64/libc.so.6
>    #1  0x0000000000000000 in ?? ()
>    (gdb) FAIL: gdb.threads/threadcrash.exp: test_gcore: thread apply 5 backtrace
>
> And this is what it looks like with the fix in place (some paths have
> been shortened):
>
>    thread apply 5 backtrace
>
>    Thread 5 (Thread 0x7fffeffff6c0 (LWP 1282651) "threadcrash"):
>    #0  0x00007ffff7d1d982 in __syscall_cancel_arch () from /lib64/libc.so.6
>    #1  0x00007ffff7d11c3c in __internal_syscall_cancel () from /lib64/libc.so.6
>    #2  0x00007ffff7d61b62 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
>    #3  0x00007ffff7d6db37 in nanosleep () from /lib64/libc.so.6
>    #4  0x00007ffff7d8008e in sleep () from /lib64/libc.so.6
>    #5  0x00000000004006a8 in do_syscall_task (location=NORMAL) at threadcrash.c:158
>    #6  0x0000000000400885 in thread_function (arg=0x404340) at threadcrash.c:277
>    #7  0x00007ffff7d15464 in start_thread () from /lib64/libc.so.6
>    #8  0x00007ffff7d985ac in __clone3 () from /lib64/libc.so.6
>    (gdb) PASS: gdb.threads/threadcrash.exp: test_live_inferior: thread apply 5 backtrace
>
> Regression testing on Fedora 42 (glibc 2.41) shows no new failures.
>
> References:
>
> [1] Linux commit 662df3e5c376 ("mm: madvise: implement lightweight
>      guard page mechanism")
>      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=662df3e5c37666d6ed75c88098699e070a4b35b5
> [2] glibc commit a6fbe36b7f31 ("nptl: Add support for setup guard
>      pages with MADV_GUARD_INSTALL")
>      https://sourceware.org/git/?p=glibc.git;a=commit;h=a6fbe36b7f31292981422692236465ab56670ea9
>
> Claude Opus 4.5 and GLM 4.7 assisted with the development of this commit.
>
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=33855
> ---
>   gdb/gcore.c | 46 ++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 38 insertions(+), 8 deletions(-)
>
> diff --git a/gdb/gcore.c b/gdb/gcore.c
> index 5a3ad145d4c..6b36e6064ac 100644
> --- a/gdb/gcore.c
> +++ b/gdb/gcore.c
> @@ -765,15 +765,45 @@ gcore_copy_callback (bfd *obfd, asection *osec)
>         if (size > total_size)
>   	size = total_size;
>   
> -      if (target_read_memory (bfd_section_vma (osec) + offset,
> -			      memhunk.data (), size) != 0)
> +      CORE_ADDR vma = bfd_section_vma (osec) + offset;
> +
> +      if (target_read_memory (vma, memhunk.data (), size) != 0)
>   	{
> -	  warning (_("Memory read failed for corefile "
> -		     "section, %s bytes at %s."),
> -		   plongest (size),
> -		   paddress (current_inferior ()->arch (),
> -			     bfd_section_vma (osec)));
> -	  break;
> +	  /* Large read failed.  This can happen when the memory region
> +	     contains unreadable pages (such as guard pages embedded within
> +	     a larger mapping).  Fall back to reading page by page, filling
> +	     unreadable pages with zeros.  */
> +	  gdb_byte *p = memhunk.data ();
> +	  bfd_size_type remaining = size;
> +	  CORE_ADDR addr = vma;
> +	  bool at_least_one_page_read = false;
> +
> +	  while (remaining > 0)
> +	    {
> +	      bfd_size_type chunk_size
> +		= std::min (remaining, (bfd_size_type) SPARSE_BLOCK_SIZE);
> +
> +	      if (target_read_memory (addr, p, chunk_size) != 0)
> +		{
> +		  /* Failed to read this page.  Fill with zeros.  This
> +		     handles guard pages and other unreadable regions
> +		     that may exist within a larger readable mapping.  */
> +		  memset (p, 0, chunk_size);
> +		}
> +	      else
> +		at_least_one_page_read = true;
> +
> +	      p += chunk_size;
> +	      addr += chunk_size;
> +	      remaining -= chunk_size;
> +	    }
> +	  /* Warn only if the entire region was unreadable - this
> +	     indicates a real problem, not just embedded guard pages. */
> +	  if (!at_least_one_page_read)
> +	    warning (_("Memory read failed for corefile "
> +		       "section, %s bytes at %s."),
> +		     plongest (size),
> +		     paddress (current_inferior ()->arch (), vma));
>   	}
>   
>         if (!sparse_bfd_set_section_contents (obfd, osec, memhunk.data (),