Mirror of the gdb-patches mailing list
* [RFA] Fix crash on Linux 2.4 when threaded program exits
@ 2009-04-01 18:22 Joel Brobecker
  2009-04-01 18:42 ` Pedro Alves
From: Joel Brobecker @ 2009-04-01 18:22 UTC
  To: gdb-patches

GDB crashes when a threaded program being debugged exits:

    (gdb) run
    Starting program: /[...]/q 
    [Thread debugging using libthread_db enabled]
    [New Thread 0xb748ebb0 (LWP 9340)]
    [New Thread 0xb728abb0 (LWP 9341)]
    Test2
    Test1
    [Thread 0xb748ebb0 (LWP 9340) exited]
    [Thread 0xb728abb0 (LWP 9341) exited]
    [Thread 0xb75d9b80 (LWP 9337) exited]
    Recursive internal problem.
    zsh: 9330 abort      gdb-head q

This appears to be specific to Linux 2.4 kernels and the way NPTL
behaves on them: with 2.4, we only receive an "exited" notification for
the main thread, whereas with 2.6 we receive one for each and every
thread.

What happens in the 2.4 case is that we delete the lp structure for
the thread that exited and then still try to use it shortly after.
By then the memory has been freed and its contents have been
corrupted. As a result, we hit an internal error, which in turn hits
another internal error, and that causes the abort (the "Recursive
internal problem." message above).

The code in linux-nat.c:linux_nat_filter_event looks like this:

  if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
    {
      /* [...] delete threads that have vanished [...]  */

      exit_lwp (lp);

      /* If there is at least one more LWP, then the exit signal was
         not the end of the debugged application and should be
         ignored.  */
      if (num_lwps > 0)
        return NULL;
    }

As you can see, in the linux-2.4 case, we end up deleting all the other
threads, and then call exit_lwp to delete the main thread as well. Next
we check num_lwps, which is now zero, so we continue. Shortly after that,
in the same routine, we access lp again (around line 2717,
"lp->ignore_sigint"), but the symptoms only appear slightly later, when
we read lp->ptid in order to set inferior_ptid, which is then used to
look up the associated inferior.

The fix is to delete the lp and return NULL only if other lwps
still exist.
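
To make the ordering problem concrete, here is a minimal sketch of the
two control flows, picking up after any vanished threads have already
been removed. It is not GDB code: struct toy_lwp, toy_exit_lwp,
filter_exit_old and filter_exit_new are hypothetical names, and the toy
"delete" only sets a flag (the real exit_lwp frees the structure) so
that the state can be inspected safely afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GDB's struct lwp_info.  */
struct toy_lwp
{
  int lwpid;
  int deleted;	/* Set instead of freeing, so the demo stays well-defined.  */
};

/* Number of live LWPs, mirroring the role of num_lwps in linux-nat.c.  */
static int num_lwps;

/* Toy version of exit_lwp: mark LP as deleted and drop the count.  */
static void
toy_exit_lwp (struct toy_lwp *lp)
{
  assert (!lp->deleted);
  lp->deleted = 1;
  num_lwps--;
}

/* Pre-patch shape: delete first, then decide.  When LP was the last
   LWP, the caller gets back an LP that has already been deleted.  */
static struct toy_lwp *
filter_exit_old (struct toy_lwp *lp)
{
  toy_exit_lwp (lp);
  if (num_lwps > 0)
    return NULL;
  return lp;
}

/* Patched shape: only delete LP when other LWPs remain.  When LP is
   the last one, it is returned untouched and is still safe to use.  */
static struct toy_lwp *
filter_exit_new (struct toy_lwp *lp)
{
  if (num_lwps > 1)
    {
      toy_exit_lwp (lp);
      return NULL;
    }
  return lp;
}
```

With num_lwps set to 1, filter_exit_old hands back an lp whose deleted
flag is set, which corresponds to the use-after-free described above,
while filter_exit_new hands back the same lp with the flag still clear.
With num_lwps set to 2, both versions delete the lp and return NULL.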

2009-04-01  Joel Brobecker  <brobecker@adacore.com>

        * linux-nat.c (linux_nat_filter_event): Do not delete the lwp if
        this is the last one.

Tested on x86-linux (with a 2.4.21 Linux kernel). It fixes ~25 failures.
Tested on x86_64-linux (with a 2.6 kernel). No regression.

Does this look correct?

Thanks,
-- 
Joel

[-- Attachment #2: threads-24.diff --]
[-- Type: text/x-diff, Size: 782 bytes --]

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index be99ece..feca722 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2644,13 +2644,14 @@ linux_nat_filter_event (int lwpid, int status, int options)
 			    "LLW: %s exited.\n",
 			    target_pid_to_str (lp->ptid));
 
-      exit_lwp (lp);
-
-      /* If there is at least one more LWP, then the exit signal was
-	 not the end of the debugged application and should be
-	 ignored.  */
-      if (num_lwps > 0)
-	return NULL;
+      if (num_lwps > 1)
+       {
+	 /* If there is at least one more LWP, then the exit signal
+	    was not the end of the debugged application and should be
+	    ignored.  */
+	 exit_lwp (lp);
+	 return NULL;
+       }
     }
 
   /* Check if the current LWP has previously exited.  In the nptl

* Re: [RFA] Fix crash on Linux 2.4 when threaded program exits
  2009-04-01 18:22 [RFA] Fix crash on Linux 2.4 when threaded program exits Joel Brobecker
@ 2009-04-01 18:42 ` Pedro Alves
  2009-04-01 18:58   ` Joel Brobecker
From: Pedro Alves @ 2009-04-01 18:42 UTC
  To: gdb-patches; +Cc: Joel Brobecker

On Wednesday 01 April 2009 19:22:20, Joel Brobecker wrote:

> 2009-04-01  Joel Brobecker  <brobecker@adacore.com>
> 
>         * linux-nat.c (linux_nat_filter_event): Do not delete the lwp if
>         this is the last one.
> 
> Tested on x86-linux (with a 2.4.21 Linux kernel). It fixes ~25 failures.
> Tested on x86_64-linux (with a 2.6 kernel). No regression.
> 
> Does this look correct?

Yes, this is OK.

The comments about nptl and thread exit notifications are confusing
to a reader considering Linux 2.6.  They could do with a:

 s/In the nptl thread model/In the nptl thread model on Linux 2.4/.

wait_lwp does mention this 2.4 + backported nptl artifact explicitly.

-- 
Pedro Alves


* Re: [RFA] Fix crash on Linux 2.4 when threaded program exits
  2009-04-01 18:42 ` Pedro Alves
@ 2009-04-01 18:58   ` Joel Brobecker
From: Joel Brobecker @ 2009-04-01 18:58 UTC
  To: Pedro Alves; +Cc: gdb-patches

> > 2009-04-01  Joel Brobecker  <brobecker@adacore.com>
> > 
> >         * linux-nat.c (linux_nat_filter_event): Do not delete the lwp if
> >         this is the last one.

> Yes, this is OK.

Thanks :)

> The comments about nptl and thread exit notifications are confusing
> to a reader considering Linux 2.6.  They could do with a:
> 
>  s/In the nptl thread model/In the nptl thread model on Linux 2.4/.
> 
> wait_lwp does mention this 2.4 + backported nptl artifact explicitly.

That's a good suggestion. Attached is the patch I ended up checking in.

Thanks for the quick review,
-- 
Joel

[-- Attachment #2: thread-24.diff --]
[-- Type: text/x-diff, Size: 1975 bytes --]

Index: linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.126
diff -u -p -r1.126 linux-nat.c
--- linux-nat.c	25 Mar 2009 10:02:13 -0000	1.126
+++ linux-nat.c	1 Apr 2009 18:54:37 -0000
@@ -2623,16 +2623,16 @@ linux_nat_filter_event (int lwpid, int s
   /* Check if the thread has exited.  */
   if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
     {
-      /* If this is the main thread, we must stop all threads and
-	 verify if they are still alive.  This is because in the nptl
-	 thread model, there is no signal issued for exiting LWPs
+      /* If this is the main thread, we must stop all threads and verify
+	 if they are still alive.  This is because in the nptl thread model
+	 on Linux 2.4, there is no signal issued for exiting LWPs
 	 other than the main thread.  We only get the main thread exit
 	 signal once all child threads have already exited.  If we
 	 stop all the threads and use the stop_wait_callback to check
 	 if they have exited we can determine whether this signal
 	 should be ignored or whether it means the end of the debugged
 	 application, regardless of which threading model is being
-	 used.  */
+	 used.	*/
       if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
 	{
 	  lp->stopped = 1;
@@ -2644,13 +2644,14 @@ linux_nat_filter_event (int lwpid, int s
 			    "LLW: %s exited.\n",
 			    target_pid_to_str (lp->ptid));
 
-      exit_lwp (lp);
-
-      /* If there is at least one more LWP, then the exit signal was
-	 not the end of the debugged application and should be
-	 ignored.  */
-      if (num_lwps > 0)
-	return NULL;
+      if (num_lwps > 1)
+       {
+	 /* If there is at least one more LWP, then the exit signal
+	    was not the end of the debugged application and should be
+	    ignored.  */
+	 exit_lwp (lp);
+	 return NULL;
+       }
     }
 
   /* Check if the current LWP has previously exited.  In the nptl
