Mirror of the gdb mailing list
* GDB 15/16 crashing in add_thread_silent()
@ 2025-11-14 18:56 Paul Smith via Gdb
  2025-11-14 19:12 ` Simon Marchi via Gdb
  2025-11-14 19:20 ` Paul Smith via Gdb
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-14 18:56 UTC (permalink / raw)
  To: gdb

Hi all;

I have a core file generated from my C++ program (running on GNU/Linux
x86_64, built with GCC 14.2).  When I try to open this core file with
either GDB 15.2 or GDB 16.3 (also built by me, on GNU/Linux x86_64 with
GCC 14.2), GDB itself will crash.

I tried opening this core with GDB 8.2 built from the Red Hat source
package, provided by Rocky Linux 8.10, and this was able to open the
core (but it doesn't have other facilities I need).


I rebuilt GDB 16.3 with -ggdb3 -O0 and here's the backtrace (I elided
some of the paths etc.):

#2  0x0000000000fd266e in handle_sigsegv (sig=11) at gdb/event-top.c:1089
#3  <signal handler called>
#4  0x0000000001079a1a in std::_Hashtable<ptid_t, std::pair<ptid_t const, thread_info*>,...>::size (this=0x28)
    at /cc/unknown/x86_64-unknown-linux-gnu/include/c++/14.2.0/bits/hashtable.h:657
#5  0x0000000001078df0 in std::_Hashtable<ptid_t, std::pair<ptid_t const, thread_info*>,...>::find (this=0x28,
    __k=...)
    at /cc/unknown/x86_64-unknown-linux-gnu/include/c++/14.2.0/bits/hashtable.h:1728
#6  0x0000000001077cba in std::unordered_map<ptid_t, thread_info*, std::hash<ptid_t>,...>::find (this=0x28, __x=...)
    at /cc/unknown/x86_64-unknown-linux-gnu/include/c++/14.2.0/bits/unordered_map.h:877
#7  0x0000000001073469 in inferior::find_thread (this=0x0, ptid=...)
    at gdb/inferior.c:251
#8  0x00000000013817cd in add_thread_silent (targ=0x3ed1ae0, ptid=...)
    at gdb/thread.c:311
#9  0x0000000000e7e877 in core_target_open (
    arg=0x7fb308098fc0 "core.9168", from_tty=1) at gdb/corelow.c:1123
#10 0x0000000000e7d210 in core_file_command (
    filename=0x7fb308098fc0 "core.9168", from_tty=1) at gdb/corelow.c:724
#11 0x0000000001113774 in catch_command_errors (
    command=0xe7d14f <core_file_command(char const*, int)>,
    arg=0x7fb308098fc0 "core.9168", from_tty=1, do_bp_actions=false) at gdb/main.c:508
#12 0x0000000001114c5b in captured_main_1 (context=0x7fff0e4eccd0)
    at gdb/main.c:1238
#13 0x00000000011152e4 in captured_main (data=0x7fff0e4eccd0)
    at gdb/main.c:1333
#14 0x0000000001115384 in gdb_main (args=0x7fff0e4eccd0) at gdb/main.c:1362
#15 0x0000000000c90cf6 in main (argc=19, argv=0x7fff0e4ecdc8) at gdb/gdb.c:38



Looks like add_thread_silent() is looking up the inferior with
find_inferior_ptid() and that doesn't find anything (returns nullptr),
which is then not checked before we try to find the current thread:

  thread_info *tp = inf->find_thread (ptid);

The latest Git HEAD code still seems to be missing this nullptr check.

Of course, the real question is why we can't find the inferior ptid.  I
have no idea about that.
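
A minimal sketch of the failure mode and the missing check, using stand-in types rather than GDB's real classes (the names `find_inferior_by_pid` and `find_thread_checked` are hypothetical).  The `this=0x28` in frames #4 to #6 is consistent with a member at offset 0x28 being reached through a null `inferior *`.

```cpp
#include <unordered_map>
#include <vector>

// Stand-in types for illustration only; these are NOT GDB's real classes.
struct thread_info { long lwp; };

struct inferior
{
  int pid;
  std::unordered_map<long, thread_info *> threads;  // keyed by LWP id

  thread_info *find_thread (long lwp)
  {
    auto it = threads.find (lwp);
    return it == threads.end () ? nullptr : it->second;
  }
};

// Hypothetical lookup: returns nullptr when no inferior has this pid.
inferior *find_inferior_by_pid (std::vector<inferior> &all, int pid)
{
  for (inferior &inf : all)
    if (inf.pid == pid)
      return &inf;
  return nullptr;
}

// Guarded variant: the pointer is checked before any member call.
// Calling inf->find_thread () on a null pointer is undefined behavior,
// which is what the backtrace above shows as a SIGSEGV.
thread_info *find_thread_checked (std::vector<inferior> &all, int pid, long lwp)
{
  inferior *inf = find_inferior_by_pid (all, pid);
  if (inf == nullptr)
    return nullptr;
  return inf->find_thread (lwp);
}
```

What the caller should do with a nullptr result (report an error, assert, pick another inferior) is a separate design question that this sketch does not answer.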


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 18:56 GDB 15/16 crashing in add_thread_silent() Paul Smith via Gdb
@ 2025-11-14 19:12 ` Simon Marchi via Gdb
  2025-11-14 19:20 ` Paul Smith via Gdb
  1 sibling, 0 replies; 15+ messages in thread
From: Simon Marchi via Gdb @ 2025-11-14 19:12 UTC (permalink / raw)
  To: psmith, gdb



On 2025-11-14 13:56, Paul Smith via Gdb wrote:
> [...]
> 
> Looks like add_thread_silent() is looking up the inferior with
> find_inferior_ptid() and that doesn't find anything (returns nullptr),
> which is then not checked before we try to find the current thread:
> 
>   thread_info *tp = inf->find_thread (ptid);
> 
> The latest Git HEAD code still seems to be missing this nullptr check.
> 
> Of course, the real question is why we can't find the inferior ptid.  I
> have no idea about that.

The normal path is:

 - the call to inferior_appeared, in core_target_open, sets the pid of
   the current inferior

 - normally, the subsequent calls to add_to_thread_list add threads with the correct ptid::pid:

     ptid_t ptid (inf->pid, lwpid);
     thread_info *thr = add_thread (inf->process_target (), ptid);

 - however, you hit the add_thread_silent call at line corelow.c:1123:

		  if (inferior_ptid == null_ptid)
		    {
		      /* Either we found no .reg/NN section, and hence we have a
			 non-threaded core (single-threaded, from gdb's perspective),
			 or for some reason add_to_thread_list couldn't determine
			 which was the "main" thread.  The latter case shouldn't
			 usually happen, but we're dealing with input here, which can
			 always be broken in different ways.  */
		      thread_info *thread = first_thread_of_inferior (inf);

		      if (thread == NULL)
	here >>>	thread = add_thread_silent (target, ptid_t (CORELOW_PID));

		      switch_to_thread (thread);
		    }

   Where CORELOW_PID is:

   /* An arbitrary identifier for the core inferior.  */
   #define CORELOW_PID 1

   This tries to add a thread with ptid(1, 0, 0).  And there is no
   inferior with pid 1, which is probably the reason for what you see.

That line appears to be a fallback to at least have a thread, if we
weren't able to create threads from the notes section in the core.  But
I guess it's not exercised at all by the testsuite.  And it's not clear
what we should do in that situation.  That is the most I can tell you
without being able to look at the actual core.
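
The difference between the two paths can be sketched with a stand-in for `ptid_t`.  This is an illustration only: the struct and helper names are hypothetical, and only the `CORELOW_PID` value and the `ptid(1, 0, 0)` result come from the code quoted above.

```cpp
// Stand-in for GDB's ptid_t, for illustration only.
struct ptid
{
  int pid;
  long lwp = 0;
  long tid = 0;
};

constexpr int CORELOW_PID = 1;  // value quoted from corelow.c above

// Normal path: the thread's ptid carries the inferior's own pid.
ptid make_thread_ptid (int inferior_pid, long lwpid)
{
  return ptid{inferior_pid, lwpid};
}

// Fallback path as quoted above: the pid is the fixed constant, and
// lwp/tid are left at 0, giving ptid(1, 0, 0).
ptid make_fallback_ptid ()
{
  return ptid{CORELOW_PID};
}

// A thread can only be matched to an inferior whose pid is the same.
bool belongs_to (const ptid &p, int inferior_pid)
{
  return p.pid == inferior_pid;
}
```

With an inferior whose pid was taken from the core (9168 in this report), `belongs_to (make_fallback_ptid (), 9168)` is false, which matches the failed inferior lookup.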

Simon


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 18:56 GDB 15/16 crashing in add_thread_silent() Paul Smith via Gdb
  2025-11-14 19:12 ` Simon Marchi via Gdb
@ 2025-11-14 19:20 ` Paul Smith via Gdb
  2025-11-14 19:25   ` Simon Marchi via Gdb
  1 sibling, 1 reply; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-14 19:20 UTC (permalink / raw)
  To: gdb

I investigated more and the problem is that GDB cannot determine the
PID of the core file.  I checked the inferiors list and it looks
correct, but the ptid value passed into add_thread_silent() has a bad
PID:

(gdb) fr 8
#8  0x00000000013817cd in add_thread_silent (targ=0x3ed1ae0, ptid=...)
    at gdb/thread.c:311
warning: 311    gdb/thread.c: No such file or directory
(gdb) p ptid
$6 = {
  m_pid = 1,
  m_lwp = 0,
  m_tid = 0
}

This is apparently because we entered this code in corelow.c:

   if (inferior_ptid == null_ptid)
     {
       /* Either we found no .reg/NN section, and hence we have a
          non-threaded core (single-threaded, from gdb's perspective),
          or for some reason add_to_thread_list couldn't determine
          which was the "main" thread.  The latter case shouldn't
          usually happen, but we're dealing with input here, which can
          always be broken in different ways.  */
       thread_info *thread = first_thread_of_inferior (inf);

       if (thread == NULL)
         thread = add_thread_silent (target, ptid_t (CORELOW_PID));

       switch_to_thread (thread);
     }

It appears that add_thread_silent() doesn't work properly with
CORELOW_PID (1).

Just to note, my program is decidedly NOT single-threaded; there are
20+ active threads in it.

I see that this (still in corelow.c:core_target_open()) returns the
correct PID:

   int pid = bfd_core_file_pid (current_program_space->core_bfd ());

so I think it's a bug that when we invoke add_thread_silent() above we
use CORELOW_PID instead of just pid.




* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 19:20 ` Paul Smith via Gdb
@ 2025-11-14 19:25   ` Simon Marchi via Gdb
  2025-11-14 19:38     ` Paul Smith via Gdb
  0 siblings, 1 reply; 15+ messages in thread
From: Simon Marchi via Gdb @ 2025-11-14 19:25 UTC (permalink / raw)
  To: psmith, gdb



On 2025-11-14 14:20, Paul Smith via Gdb wrote:
> I investigated more and the problem is that GDB cannot determine the
> PID of the core file.  I checked the inferiors list and it looks
> correct, but the ptid value passed into add_thread_silent() has a bad
> PID:
> 
> (gdb) fr 8
> #8  0x00000000013817cd in add_thread_silent (targ=0x3ed1ae0, ptid=...)
>     at gdb/thread.c:311
> warning: 311    gdb/thread.c: No such file or directory
> (gdb) p ptid
> $6 = {
>   m_pid = 1,
>   m_lwp = 0,
>   m_tid = 0
> }
> 
> This is apparently because we entered this code in corelow.c:
> 
>    if (inferior_ptid == null_ptid)
>      {
>        /* Either we found no .reg/NN section, and hence we have a
>           non-threaded core (single-threaded, from gdb's perspective),
>           or for some reason add_to_thread_list couldn't determine
>           which was the "main" thread.  The latter case shouldn't
>           usually happen, but we're dealing with input here, which can
>           always be broken in different ways.  */
>        thread_info *thread = first_thread_of_inferior (inf);
> 
>        if (thread == NULL)
>          thread = add_thread_silent (target, ptid_t (CORELOW_PID));
> 
>        switch_to_thread (thread);
>      }
> 
> It appears that add_thread_silent() doesn't work properly with
> CORELOW_PID (1).
> 
> Just to note, my program is decidedly NOT single-threaded; there are
> 20+ active threads in it.
> 
> I see that this (still in corelow.c:core_target_open()) returns the
> correct PID:
> 
>    int pid = bfd_core_file_pid (current_program_space->core_bfd ());
> 
> so I think it's a bug that when we invoke add_thread_silent() above we
> use CORELOW_PID instead of just pid.

If your core is threaded, you shouldn't get to that fallback "if" at
all.  This is where GDB should add all your threads:

  /* Build up thread list from BFD sections, and possibly set the
     current thread to the .reg/NN section matching the .reg
     section.  */
  asection *reg_sect
    = bfd_get_section_by_name (current_program_space->core_bfd (), ".reg");
  for (asection *sect : gdb_bfd_sections (current_program_space->core_bfd ()))
    add_to_thread_list (sect, reg_sect, inf);

If this doesn't add any threads, then you need to dig to understand why
BFD doesn't create the .reg pseudo sections.
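
As a rough illustration of why that loop can end up adding nothing, here is a sketch that counts the section names that would yield a thread.  It assumes that only names starting with ".reg/" qualify; that assumption, and the helper name `count_thread_sections`, are not taken from the GDB source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Count the section names that would produce a thread, under the
// assumption that only per-thread ".reg/NN" sections qualify.
std::size_t count_thread_sections (const std::vector<std::string> &names)
{
  const std::string prefix = ".reg/";
  std::size_t count = 0;
  for (const std::string &name : names)
    if (name.compare (0, prefix.size (), prefix) == 0)
      ++count;
  return count;
}
```

A core that only exposes other pseudo sections (for example ".auxv" or ".reg2/NN") would give a count of zero here, and the fallback branch would then be the only source of a thread.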

Simon


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 19:25   ` Simon Marchi via Gdb
@ 2025-11-14 19:38     ` Paul Smith via Gdb
  2025-11-14 20:03       ` Simon Marchi via Gdb
  2025-11-21 11:59       ` Tom de Vries via Gdb
  0 siblings, 2 replies; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-14 19:38 UTC (permalink / raw)
  To: Simon Marchi, gdb

On Fri, 2025-11-14 at 14:25 -0500, Simon Marchi wrote:
> If your core is threaded, you shouldn't get to that fallback "if" at
> all.  This is where GDB should add all your threads:
> 
>   /* Build up thread list from BFD sections, and possibly set the
>      current thread to the .reg/NN section matching the .reg
>      section.  */
>   asection *reg_sect
>     = bfd_get_section_by_name (current_program_space->core_bfd (), ".reg");
>   for (asection *sect : gdb_bfd_sections (current_program_space->core_bfd ()))
>     add_to_thread_list (sect, reg_sect, inf);
> 
> If this doesn't add any threads, then you need to dig to understand
> why BFD doesn't create the .reg pseudo sections.

I applied this patch, which I think is correct:

--- a/gdb/corelow.c     2025-04-20 13:22:05.000000000 -0400
+++ b/gdb/corelow.c     2025-11-14 14:17:57.220145722 -0500
@@ -1120,7 +1120,7 @@
       thread_info *thread = first_thread_of_inferior (inf);

       if (thread == NULL)
-       thread = add_thread_silent (target, ptid_t (CORELOW_PID));
+       thread = add_thread_silent (target, ptid_t (pid));

       switch_to_thread (thread);
     }

Earlier if the PID couldn't be found then pid is set to CORELOW_PID
anyway, so this works and prevents the crash (although, I think GDB
should check for the nullptr return and do _something_ non-crashy...
maybe?)
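
The reason `pid` is always usable at that point can be sketched as below.  This is not the corelow.c code: the helper name is hypothetical, and treating 0 as "no PID found in the core" is an assumption.

```cpp
constexpr int CORELOW_PID = 1;  // arbitrary identifier, as defined in corelow.c

// Assumption: a pid of 0 means the core file did not provide one.
// In that case fall back to the arbitrary identifier, so the value
// used for the inferior and for the fallback thread is the same.
int effective_core_pid (int pid_from_core)
{
  return pid_from_core != 0 ? pid_from_core : CORELOW_PID;
}
```

Using one value for both the inferior and the thread is what keeps the later inferior lookup from failing, whichever branch was taken.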

After preventing the crash I get these errors which seem to align with
your diagnosis:

  warning: Couldn't find general-purpose registers in core file.

  warning: Unexpected size of section `.reg2' in core file.
  Cannot access memory at address 0x84a21264
  Cannot access memory at address 0x84a21260
  Cannot access memory at address 0x84a21260
  Core was generated by `myprogram'.

  warning: Couldn't find general-purpose registers in core file.

  warning: Unexpected size of section `.reg2' in core file.

and the core is unusable:

  (gdb) bt
  #0  <unavailable> in ?? ()
  Backtrace stopped: not enough registers or memory available to unwind further

  (gdb) thr a a bt

  Thread 1 (process 9168):
  #0  <unavailable> in ?? ()
  Backtrace stopped: not enough registers or memory available to unwind further

However, if I use the native GDB 8.2 that comes with Rocky Linux, then
it will open the core file without these errors, and even show me the
backtrace for all threads.


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 19:38     ` Paul Smith via Gdb
@ 2025-11-14 20:03       ` Simon Marchi via Gdb
  2025-11-14 20:13         ` Tom Tromey
  2025-11-21 11:59       ` Tom de Vries via Gdb
  1 sibling, 1 reply; 15+ messages in thread
From: Simon Marchi via Gdb @ 2025-11-14 20:03 UTC (permalink / raw)
  To: psmith, gdb



On 2025-11-14 14:38, Paul Smith wrote:
> On Fri, 2025-11-14 at 14:25 -0500, Simon Marchi wrote:
>> If your core is threaded, you shouldn't get to that fallback "if" at
>> all.  This is where GDB should add all your threads:
>>
>>   /* Build up thread list from BFD sections, and possibly set the
>>      current thread to the .reg/NN section matching the .reg
>>      section.  */
>>   asection *reg_sect
>>     = bfd_get_section_by_name (current_program_space->core_bfd (), ".reg");
>>   for (asection *sect : gdb_bfd_sections (current_program_space->core_bfd ()))
>>     add_to_thread_list (sect, reg_sect, inf);
>>
>> If this doesn't add any threads, then you need to dig to understand
>> why BFD doesn't create the .reg pseudo sections.
> 
> I applied this patch, which I think is correct:
> 
> --- a/gdb/corelow.c     2025-04-20 13:22:05.000000000 -0400
> +++ b/gdb/corelow.c     2025-11-14 14:17:57.220145722 -0500
> @@ -1120,7 +1120,7 @@
>        thread_info *thread = first_thread_of_inferior (inf);
> 
>        if (thread == NULL)
> -       thread = add_thread_silent (target, ptid_t (CORELOW_PID));
> +       thread = add_thread_silent (target, ptid_t (pid));
> 
>        switch_to_thread (thread);
>      }
> 
> Earlier if the PID couldn't be found then pid is set to CORELOW_PID
> anyway, so this works and prevents the crash (although, I think GDB
> should check for the nullptr return and do _something_ non-crashy...
> maybe?)

I agree that part seems clearly buggy, but I don't think you should
focus on that.  The problem has already happened earlier in the
execution.

Simon


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 20:03       ` Simon Marchi via Gdb
@ 2025-11-14 20:13         ` Tom Tromey
  2025-11-14 20:29           ` Paul Smith via Gdb
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Tromey @ 2025-11-14 20:13 UTC (permalink / raw)
  To: Simon Marchi via Gdb; +Cc: psmith, Simon Marchi

>>>>> "Simon" == Simon Marchi via Gdb <gdb@sourceware.org> writes:

>> Earlier if the PID couldn't be found then pid is set to CORELOW_PID
>> anyway, so this works and prevents the crash (although, I think GDB
>> should check for the nullptr return and do _something_ non-crashy...
>> maybe?)

Simon> I agree that part seems clearly buggy, but I don't think you should
Simon> focus on that.  The problem has already happened earlier in the
Simon> execution.

See this thread as well

https://inbox.sourceware.org/gdb-patches/20251024124208.1875651-1-tdevries@suse.de/

I wonder if the fix (assuming it arrives) should be in GDB 17.

Tom


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 20:13         ` Tom Tromey
@ 2025-11-14 20:29           ` Paul Smith via Gdb
  2025-11-14 20:42             ` Paul Smith via Gdb
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-14 20:29 UTC (permalink / raw)
  To: gdb

On Fri, 2025-11-14 at 13:13 -0700, Tom Tromey wrote:
> >>>>> "Simon" == Simon Marchi via Gdb <gdb@sourceware.org> writes:
> 
> > > Earlier if the PID couldn't be found then pid is set to
> > > CORELOW_PID anyway, so this works and prevents the crash
> > > (although, I think GDB should check for the nullptr return and do
> > > _something_ non-crashy... maybe?)
> 
> See this thread as well
> 
> https://inbox.sourceware.org/gdb-patches/20251024124208.1875651-1-tdevries@suse.de/
> 
> I wonder if the fix (assuming it arrives) should be in GDB 17.

Tom suggested exactly the same fix I did (although, in addition,
thread.c:add_thread_silent():

   inferior *inf = find_inferior_ptid (targ, ptid);

should be handling nullptr return somehow, I should think).


Anyway, Tom de Vries writes this in summary:

> I suppose this is fixable, but I'm not sure it's worth the effort for
> a core file that has such limited usability.

I don't know about his core, but my coredump definitely doesn't have
limited usability, because when I use GDB 8.2 from Rocky Linux the core
is fully usable in all respects: I can see all the threads and interact
with them in all ways.

The downside for me is that I have a lot of Python macros that I need
to work with my core files, and they don't work with the very old GDB
8.2, for various reasons.

What we need to understand is why this much older GDB can work
correctly with the core file, but newer GDB can't find basic things.


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 20:29           ` Paul Smith via Gdb
@ 2025-11-14 20:42             ` Paul Smith via Gdb
  2025-11-18 18:33               ` Tom Tromey
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-14 20:42 UTC (permalink / raw)
  To: gdb

On Fri, 2025-11-14 at 15:29 -0500, Paul Smith via Gdb wrote:
> Anyway, Tom de Vries writes this in summary:
> 
> > I suppose this is fixable, but I'm not sure it's worth the effort
> > for
> > a core file that has such limited usability.
> 
> I don't know about his core, but my coredump definitely doesn't have
> limited usability, because when I use GDB 8.2 from Rocky Linux the
> core is fully usable in all respects: I can see all the threads and
> interact with them in all ways.

Following Tom's debugging steps (I really don't know much about what
sections should exist in a core file or what the format should be) I
see that there is a similar problem using objdump.

If I use the native objdump from Rocky Linux, version 2.30-125.el8_10,
I get these types of results for .reg2 (note, there are no .reg
sections in this core file):

  0 note0         00009bf4  0000000000000000  0000000000000000  00001bd0  2**0
                  CONTENTS, READONLY
  1 .auxv         00000150  0000000000000000  0000000000000000  00002074  2**3
                  CONTENTS
  2 .reg2/9168    00000210  0000000000000000  0000000000000000  00002374  2**2
                  CONTENTS
  3 .reg2         00000210  0000000000000000  0000000000000000  00002374  2**2
                  CONTENTS
  4 .reg2/9168    00000210  0000000000000000  0000000000000000  00002734  2**2
                  CONTENTS
  5 .reg2/9168    00000210  0000000000000000  0000000000000000  00002af4  2**2
                  CONTENTS
  6 .reg2/9168    00000210  0000000000000000  0000000000000000  00002eb4  2**2
                  CONTENTS
  7 .reg2/9168    00000210  0000000000000000  0000000000000000  00003274  2**2
                  CONTENTS
  ...
 40 .reg2/9168    00000210  0000000000000000  0000000000000000  0000ae34  2**2
                  CONTENTS
 41 .reg2/9168    00000210  0000000000000000  0000000000000000  0000b1f4  2**2
                  CONTENTS
 42 .reg2/9168    00000210  0000000000000000  0000000000000000  0000b5b4  2**2
                  CONTENTS
 43 load1         00000000  0000000000400000  0000000000000000  00010000  2**16
                  ALLOC, READONLY, CODE
  ...

But, if I use the latest objdump 2.43.1, I get very different results:

  0 note0         00009bf4  0000000000000000  0000000000000000  00001bd0  2**0
                  CONTENTS, READONLY
  1 .auxv         00000150  0000000000000000  0000000000000000  00002074  2**3
                  CONTENTS
  2 .reg2/0       00000210  0000000000000000  0000000000000000  00002374  2**2
                  CONTENTS
  3 .reg2         00000210  0000000000000000  0000000000000000  00002374  2**2
                  CONTENTS
  4 .reg2/0       00000210  0000000000000000  0000000000000000  00002734  2**2
                  CONTENTS
  5 .reg2/0       00000210  0000000000000000  0000000000000000  00002af4  2**2
                  CONTENTS
  6 .reg2/0       00000210  0000000000000000  0000000000000000  00002eb4  2**2
                  CONTENTS
  7 .reg2/0       00000210  0000000000000000  0000000000000000  00003274  2**2
                  CONTENTS
  ...
 40 .reg2/0       00000210  0000000000000000  0000000000000000  0000ae34  2**2
                  CONTENTS
 41 .reg2/0       00000210  0000000000000000  0000000000000000  0000b1f4  2**2
                  CONTENTS
 42 .reg2/0       00000210  0000000000000000  0000000000000000  0000b5b4  2**2
                  CONTENTS
 43 load1         01ae0000  0000000000400000  0000000000000000  00010000  2**16
                  ALLOC, READONLY, CODE
  ...

So, there appears to be some difference in binutils where it can't
parse this core file properly...?
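
The visible difference between the two listings is the numeric suffix of the per-thread section names (`.reg2/9168` versus `.reg2/0`).  A small helper for pulling that suffix out when comparing such listings could look like this; the function name is hypothetical and this is not how BFD or GDB parses the names.

```cpp
#include <optional>
#include <string>

// Extract the numeric "/NN" suffix from a pseudo-section name such as
// ".reg2/9168".  Returns std::nullopt when there is no such suffix.
std::optional<long> section_id_suffix (const std::string &name)
{
  std::string::size_type slash = name.rfind ('/');
  if (slash == std::string::npos || slash + 1 == name.size ())
    return std::nullopt;

  long value = 0;
  for (std::string::size_type i = slash + 1; i < name.size (); ++i)
    {
      if (name[i] < '0' || name[i] > '9')
        return std::nullopt;  // not a plain decimal suffix
      value = value * 10 + (name[i] - '0');
    }
  return value;
}
```

Applied to the listings above, the older objdump's names give 9168 and the newer objdump's names give 0, while the plain `.reg2` entry has no suffix in either.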


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 20:42             ` Paul Smith via Gdb
@ 2025-11-18 18:33               ` Tom Tromey
  2025-11-18 19:30                 ` Paul Smith via Gdb
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Tromey @ 2025-11-18 18:33 UTC (permalink / raw)
  To: Paul Smith via Gdb; +Cc: psmith

>>>>> "Paul" == Paul Smith via Gdb <gdb@sourceware.org> writes:

[...]
Paul> So, there appears to be some difference in binutils where it can't
Paul> parse this core file properly...?

Yes, interesting find.
Unfortunately the next step is probably to bisect binutils to find the
offending patch.  Or I guess debug and try to find what's going wrong.

Tom


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-18 18:33               ` Tom Tromey
@ 2025-11-18 19:30                 ` Paul Smith via Gdb
  2025-11-18 20:24                   ` Simon Marchi via Gdb
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-18 19:30 UTC (permalink / raw)
  To: Tom Tromey, gdb

On Tue, 2025-11-18 at 11:33 -0700, Tom Tromey wrote:
> >>>>> "Paul" == Paul Smith via Gdb <gdb@sourceware.org> writes:
> 
> [...]
> Paul> So, there appears to be some difference in binutils where it can't
> Paul> parse this core file properly...?
> 
> Yes, interesting find.
> Unfortunately the next step is probably to bisect binutils to find
> the offending patch.  Or I guess debug and try to find what's going
> wrong.

Yeah.  Unfortunately I will need to gird myself for that and I have a
lot of other things happening.  Bisecting will be a big job since we're
jumping from binutils 2.30 (known to work) to binutils 2.43 (known to
not work).

It's probably worthwhile making some effort at debugging, to try to
narrow down the locations in the code to consider during bisecting.  It
looks like somehow we cannot determine the PID / TID properly in this
core file in newer binutils, while the older binutils can still dig it
out.  Unfortunately I've never looked at any of this code before but
hopefully it will be tractable.

Maybe this weekend I can spend a bit of time on this.


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-18 19:30                 ` Paul Smith via Gdb
@ 2025-11-18 20:24                   ` Simon Marchi via Gdb
  2025-11-24 16:36                     ` Paul Smith via Gdb
  0 siblings, 1 reply; 15+ messages in thread
From: Simon Marchi via Gdb @ 2025-11-18 20:24 UTC (permalink / raw)
  To: psmith, Tom Tromey, gdb

On 11/18/25 2:30 PM, Paul Smith via Gdb wrote:
> On Tue, 2025-11-18 at 11:33 -0700, Tom Tromey wrote:
>>>>>>> "Paul" == Paul Smith via Gdb <gdb@sourceware.org> writes:
>>
>> [...]
>> Paul> So, there appears to be some difference in binutils where it can't
>> Paul> parse this core file properly...?
>>
>> Yes, interesting find.
>> Unfortunately the next step is probably to bisect binutils to find
>> the offending patch.  Or I guess debug and try to find what's going
>> wrong.
> 
> Yeah.  Unfortunately I will need to gird myself for that and I have a
> lot of other things happening.  Bisecting will be a big job since we're
> jumping from binutils 2.30 (known to work) to binutils 2.43 (known to
> not work).
> 
> It's probably worthwhile making some effort at debugging, to try to
> narrow down the locations in the code to consider during bisecting.  It
> looks like somehow we cannot determine the PID / TID properly in this
> core file in newer binutils, while the older binutils can still dig it
> out.  Unfortunately I've never looked at any of this code before but
> hopefully it will be tractable.
> 
> Maybe this weekend I can spend a bit of time on this.

The beauty of bisecting is that the number of steps grows logarithmically
with the number of commits, so it's not that long :).  If you are
allowed to share the core file, I wouldn't mind giving it a try.
Bisecting stuff is one of my favorite pastimes.
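
To put a rough number on that: a binary search over n candidate commits needs about ceil(log2(n)) test builds.  The sketch below computes that bound; the helper name is hypothetical, and the exact count a real bisection tool reports can differ by a step or so.

```cpp
// Approximate worst-case number of test steps for a binary search
// over n candidate commits: the smallest k with 2^k >= n.
// Intended for realistic commit counts (n far below the range of
// unsigned long), so the doubling below does not overflow.
unsigned bisect_steps (unsigned long n)
{
  unsigned steps = 0;
  unsigned long span = 1;
  while (span < n)
    {
      span *= 2;
      ++steps;
    }
  return steps;
}
```

For a hypothetical range of 10,000 commits this gives 14 steps, and doubling the range to 20,000 adds only one more.  The real number of commits between binutils 2.30 and 2.43 is not stated in this thread.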

Simon


* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-14 19:38     ` Paul Smith via Gdb
  2025-11-14 20:03       ` Simon Marchi via Gdb
@ 2025-11-21 11:59       ` Tom de Vries via Gdb
  2025-11-24 15:19         ` Paul Smith via Gdb
  1 sibling, 1 reply; 15+ messages in thread
From: Tom de Vries via Gdb @ 2025-11-21 11:59 UTC (permalink / raw)
  To: psmith, Simon Marchi, gdb

On 11/14/25 8:38 PM, Paul Smith via Gdb wrote:
> However, if I use the native GDB 8.2 that comes with Rocky Linux, then
> it will open the core file without these errors, and even show me the
> backtrace for all threads.

An interesting thing to know here would be the size of the PRSTATUS note:
...
$ eu-readelf -n core | grep -i prstatus
   CORE                 336  PRSTATUS
$ readelf -n core | grep -i prstatus
   CORE                 0x00000150	NT_PRSTATUS (prstatus structure)
...

AFAIU, the note should be grokked by elf_x86_64_grok_prstatus, which 
bails out unless the size is either 296 (x32 abi) or 336.

So, if your core has a PRSTATUS note whose size is not one of those, 
then it's possible that the system gdb 8.2 you mention works because the 
source package has a patch that makes elf_x86_64_grok_prstatus work for 
that particular size, and then hopefully that patch has more information.

But upstream gdb 8.2 only accepts the same numbers as upstream gdb 
trunk: 296 or 336.
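
If neither readelf variant is at hand, the same check can be sketched in
Python.  This is only an illustration under assumptions: it handles
little-endian ELF64 images only, the names prstatus_sizes, report and
EXPECTED_SIZES are made up for this example, and the accepted sizes are
simply the two numbers quoted above, not taken from the BFD sources:

```python
import mmap
import struct

PT_NOTE = 4        # program header type of a note segment
NT_PRSTATUS = 1    # note type of a prstatus note
EXPECTED_SIZES = {296, 336}  # x32 and x86_64 sizes quoted in this thread


def align4(n: int) -> int:
    """Round n up to a multiple of 4 (note names/descriptors are padded)."""
    return (n + 3) & ~3


def prstatus_sizes(image) -> list:
    """Return the descriptor size of every CORE/NT_PRSTATUS note found in
    the PT_NOTE segments of a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", image, 0x20)
    # Limitation: cores with 65535 or more program headers use the
    # PN_XNUM extension, which this sketch does not handle.
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 0x36)
    sizes = []
    for i in range(e_phnum):
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", image, e_phoff + i * e_phentsize)
        if p_type != PT_NOTE:
            continue
        pos, end = p_offset, p_offset + p_filesz
        while pos + 12 <= end:
            namesz, descsz, ntype = struct.unpack_from("<III", image, pos)
            name = image[pos + 12:pos + 12 + namesz].rstrip(b"\0")
            if ntype == NT_PRSTATUS and name == b"CORE":
                sizes.append(descsz)
            # Skip the note header, padded name and padded descriptor.
            pos += 12 + align4(namesz) + align4(descsz)
    return sizes


def report(path: str) -> None:
    """Print each PRSTATUS size in the core at path, flagging odd ones."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as image:
        for size in prstatus_sizes(image):
            verdict = "ok" if size in EXPECTED_SIZES else "unexpected"
            print(f"NT_PRSTATUS descsz {size} ({size:#x}): {verdict}")
```

Calling report("core") should print one line per thread, since a core
carries one PRSTATUS note per thread.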

[ My current hypothesis about the related arm problem I'm looking at, is 
that the core was generated by a kernel with a bug that meant that a 
prstatus was generated with the wrong size, which was fixed by commit 
16aead81018c ("take fdpic-related parts of elf_prstatus out").  AFAIU 
that problem didn't trigger on x86_64, but I thought I'd mention it. ]

Thanks,
- Tom

* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-21 11:59       ` Tom de Vries via Gdb
@ 2025-11-24 15:19         ` Paul Smith via Gdb
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-24 15:19 UTC (permalink / raw)
  To: gdb

On Fri, 2025-11-21 at 12:59 +0100, Tom de Vries wrote:
> On 11/14/25 8:38 PM, Paul Smith via Gdb wrote:
> > However, if I use the native GDB 8.2 that comes with Rocky Linux,
> > then
> > it will open the core file without these errors, and even show me
> > the
> > backtrace for all threads.
> 
> An interesting thing to know here would be the size of the PRSTATUS
> note:
> ...
> $ eu-readelf -n core | grep -i prstatus
>    CORE                 336  PRSTATUS
> $ readelf -n core | grep -i prstatus
>    CORE                 0x00000150 NT_PRSTATUS (prstatus structure)
> ...
> 
> AFAIU, the note should be grokked by elf_x86_64_grok_prstatus, which 
> bails out unless the size is either 296 (x32 abi) or 336.

I did have some time to look at this over the weekend, but not as much
as I'd like so I don't have too much to share.  I did find a few
interesting things.

First, thank you Tom for your message, which was extremely
enlightening.  I discovered that yes, indeed, for this core the PRSTATUS
note size is not as expected:

  $ readelf -n core | grep -i prstatus | sort -u
    CORE                 0x00000188       NT_PRSTATUS (prstatus structure)

If I look at some other cores (generated from different systems) that
work with unpatched GDB I see that the size for those is as expected
(0x150).  So, that's suspicious at least.
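
For the record, the hex-to-decimal comparison can be scripted.  The
following is a rough Python sketch, not a definitive tool: the helper
name classify and the ACCEPTED table are invented for this example, the
accepted sizes are just the two Tom quoted, and the sample text is the
readelf output from the two kinds of core described here:

```python
import re

# Sizes accepted according to Tom's message; labels are for display only.
ACCEPTED = {296: "x32", 336: "x86_64"}


def classify(readelf_output: str) -> dict:
    """Map each NT_PRSTATUS size found in `readelf -n` text output to the
    ABI it matches, or to "unexpected" if it matches neither."""
    result = {}
    for line in readelf_output.splitlines():
        m = re.search(r"(0x[0-9a-fA-F]+)\s+NT_PRSTATUS\b", line)
        if m:
            size = int(m.group(1), 16)
            result[size] = ACCEPTED.get(size, "unexpected")
    return result


sample = """\
    CORE                 0x00000188       NT_PRSTATUS (prstatus structure)
    CORE                 0x00000150       NT_PRSTATUS (prstatus structure)
"""

for size, verdict in classify(sample).items():
    print(f"{size:#x} = {size} bytes: {verdict}")
```

In other words, 0x188 is 392 bytes, which is 56 bytes more than the 336
that the working cores have.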

Based on your email I decided to first build a vanilla GDB 8.2 and try
that.  It failed in the same way as the latest GDB (minus the bug
regarding the incorrect PID value that causes GDB to crash):

  warning: Couldn't find general-purpose registers in core file.

  warning: Unexpected size of section `.reg2' in core file.
  Cannot access memory at address 0x8ab31264
  Cannot access memory at address 0x8ab31260
  Core was generated by `myprog'.

  warning: Couldn't find general-purpose registers in core file.

  warning: Unexpected size of section `.reg2' in core file.
  #0  <unavailable> in ?? ()
  (gdb) bt
  #0  <unavailable> in ?? ()
  Backtrace stopped: not enough registers or memory available to unwind further

So, this implies there's some fix added by Red Hat that is needed, for
at least some kernels, but has not been upstreamed for some reason.

I then tried to apply the patches by hand (there are about 200 patches)
but this didn't work well.  So I used the Red Hat rpmbuild tool to
generate a GDB 8.2 fully patched and built the same way as the Red Hat
/bin/gdb.

The GDB binary created by rpmbuild worked just as well as /bin/gdb, so
that's good!

I did grep the 200 patches and didn't find any patches obviously
related to prstatus or grok: there are some "grok" patches
adding/updating support for PPC and ARM but I didn't see anything for
Intel/AMD (but could well have missed it).

Unfortunately I ran out of time here.

My next steps are to investigate the configure options used when
compiling the Red Hat /bin/gdb, and compare the source code as one big
diff rather than 200 small diffs.

* Re: GDB 15/16 crashing in add_thread_silent()
  2025-11-18 20:24                   ` Simon Marchi via Gdb
@ 2025-11-24 16:36                     ` Paul Smith via Gdb
  0 siblings, 0 replies; 15+ messages in thread
From: Paul Smith via Gdb @ 2025-11-24 16:36 UTC (permalink / raw)
  To: Simon Marchi, gdb

On Tue, 2025-11-18 at 15:24 -0500, Simon Marchi wrote:
> The beauty of bisecting is that the number of steps grows
> logarithmically with the number of commits, so it's not that long :). 
> If you are allowed to share the core file, I wouldn't mind giving it
> a try.  Bisecting stuff is one of my favorite pastimes.

:)

Thanks for the offer.  The binary is publicly available for download
(although it's not FOSS), but unfortunately the core is from a customer
so I can't share it.  If it was from our QA or similar I'd let you take
a crack at it.

Let's see how far I make it.

Thread overview: 15+ messages
2025-11-14 18:56 GDB 15/16 crashing in add_thread_silent() Paul Smith via Gdb
2025-11-14 19:12 ` Simon Marchi via Gdb
2025-11-14 19:20 ` Paul Smith via Gdb
2025-11-14 19:25   ` Simon Marchi via Gdb
2025-11-14 19:38     ` Paul Smith via Gdb
2025-11-14 20:03       ` Simon Marchi via Gdb
2025-11-14 20:13         ` Tom Tromey
2025-11-14 20:29           ` Paul Smith via Gdb
2025-11-14 20:42             ` Paul Smith via Gdb
2025-11-18 18:33               ` Tom Tromey
2025-11-18 19:30                 ` Paul Smith via Gdb
2025-11-18 20:24                   ` Simon Marchi via Gdb
2025-11-24 16:36                     ` Paul Smith via Gdb
2025-11-21 11:59       ` Tom de Vries via Gdb
2025-11-24 15:19         ` Paul Smith via Gdb
