thread exit goes defunct

Mirror of the gdb mailing list

* thread exit goes defunct
@ 2001-02-12  7:19 Nicolas Vignal
  2001-02-12  8:51 ` Shaw Terwilliger
  0 siblings, 1 reply; 4+ messages in thread
From: Nicolas Vignal @ 2001-02-12  7:19 UTC (permalink / raw)
  To: gdb

Hello

Under gdb, when a thread exits, it goes into the defunct state.

I read this mail about the same problem

http://sources.redhat.com/ml/gdb/2000-10/msg00008.html

What about the kernel patch?

I tried with kernel 2.4.0 and I still have the same problem.
Is there another way to fix this problem?

Regards

	Nicolas


* Re: thread exit goes defunct
  2001-02-12  7:19 thread exit goes defunct Nicolas Vignal
@ 2001-02-12  8:51 ` Shaw Terwilliger
  2001-02-13  3:20   ` Mark Kettenis
  0 siblings, 1 reply; 4+ messages in thread
From: Shaw Terwilliger @ 2001-02-12  8:51 UTC (permalink / raw)
  To: Nicolas Vignal; +Cc: gdb

Nicolas Vignal wrote:
> Under gdb, when a thread exits, it goes into the defunct state.

I've been wondering about this too.  Since the old threads don't die,
I quickly run out of processes, which makes debugging long-running
servers (which spawn new threads on new connections) difficult.

From: Nick Duffek <nsd@redhat.com>
To: eliz@is.elta.co.il
Cc: gdb@sources.redhat.com, kettenis@wins.uva.nl
Subject: Re: Register cache
Date: Mon, 12 Feb 2001 09:46:00 -0000
Message-id: <200102121753.f1CHr9t11723@rtl.cygnus.com>
References: <Pine.SUN.3.91.1010212092158.12969C-100000@is>

On 12-Feb-2001, Eli Zaretskii wrote:

>So you are telling, in effect, that it's okay to have
>i387_supply_fsave get all the FP registers

Yes, I think that's a reasonable and useful i387 interface.

>and x86 targets which don't like that should provide their own code
>instead of using i387_supply_fsave?

Certainly they can't use i387_supply_fsave, so something like
i387_supply_fpreg would be necessary.

Note that the {supply,fill}_*regset functions aren't part of the register
cache interface: they're just a set of target-specific functions that
several targets happen to have in common.

Those targets generally can't fetch just one floating-point register,
which is why the supply_*regset functions lack a REGNO argument.  As Mark
said, the register cache continues to work properly if
target_fetch_registers fetches more registers than requested, so there's
no loss and potentially some gain for those targets to fetch all fp
registers.
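
To make the point above concrete, here is a minimal sketch of a cache whose fetch hook supplies every fp register even when only one was asked for. Everything in it is an assumption made for illustration: the `sketch_` names, the toy cache layout, and the simplified buffer (eight 10-byte registers with no header, unlike a real fsave area). It is not GDB's actual register cache code.

```c
#include <string.h>

#define SKETCH_NUM_FPREGS 8
#define SKETCH_FPREG_SIZE 10

/* Toy register cache: raw bytes plus a per-register "valid" flag. */
struct sketch_regcache {
    unsigned char raw[SKETCH_NUM_FPREGS][SKETCH_FPREG_SIZE];
    int valid[SKETCH_NUM_FPREGS];
};

/* Supply-all variant: no REGNO argument, every fp register is copied. */
void sketch_supply_fsave(struct sketch_regcache *rc, const unsigned char *fsave)
{
    for (int i = 0; i < SKETCH_NUM_FPREGS; i++) {
        memcpy(rc->raw[i], fsave + i * SKETCH_FPREG_SIZE, SKETCH_FPREG_SIZE);
        rc->valid[i] = 1;
    }
}

/* Hypothetical supply-one variant for targets that can fetch one register. */
void sketch_supply_fpreg(struct sketch_regcache *rc, int regno,
                         const unsigned char *value)
{
    memcpy(rc->raw[regno], value, SKETCH_FPREG_SIZE);
    rc->valid[regno] = 1;
}

/* Fetch hook: the caller asks for one register, but this target can only
   read the whole block, so it supplies all of them.  The cache stays
   consistent; the extra registers are simply marked valid early. */
void sketch_fetch_registers(struct sketch_regcache *rc, int regno,
                            const unsigned char *fsave)
{
    (void)regno;
    sketch_supply_fsave(rc, fsave);
}
```

A later read of any other fp register would then be served from the cache without another trip to the target, which is the "potentially some gain" mentioned above.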

If it's possible and more efficient on your target to fetch one fp
register instead of all of them, then I think i387_supply_fpreg is a good
idea.

In my opinion, interface expansion is preferable to code duplication, so I
agree with your patch to put i387_supply_fpreg in i387-nat.c.  However,
that's really a coding philosophy issue, and since Mark wrote i387-nat.c,
I'm inclined to bow to his opinion on how it gets changed.

Nick


* Re: thread exit goes defunct
  2001-02-12  8:51 ` Shaw Terwilliger
@ 2001-02-13  3:20   ` Mark Kettenis
  2001-02-13  8:40     ` Nicolas Vignal
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Kettenis @ 2001-02-13  3:20 UTC (permalink / raw)
  To: Shaw Terwilliger; +Cc: Nicolas Vignal, gdb

Shaw Terwilliger <sterwill@sourcegear.com> writes:

> Nicolas Vignal wrote:
> > Under gdb, when a thread exits, it goes into the defunct state.
> 
> I've been wondering about this too.  Since the old threads don't die,
> I quickly run out of processes, which makes debugging long-running
> servers (which spawn new threads on new connections) difficult.

This *is* caused by a long standing kernel bug.  Alan Cox has promised
to look into it if he can find the time, but it doesn't have a really
high priority for him.  Anyway, someone who does care should take this
up with the kernel folks; see the attached message that I posted to
the linux-kernel mailing list last December.  I simply don't care
enough to keep sending this stuff to the Linux kernel folks.

Mark

Peter Berger <peterb@telerama.com> writes:

> > > The zombie problem is a kernel bug.  AFAIK there is no kernel that
> > > doesn't have this bug.  Unfortunately getting the bugfix in isn't very
> > > easy.  I'll try poking Linus again when I can find the time.
> 
> *sob*
> 
> Alan Cox says it is a glibc bug.

Well, he says it's likely to be a glibc bug, and from the information
you've given him I don't blame him for coming to that conclusion.

However, the "zombie problem" is caused by the way ptrace() interacts
with clone()/exit()/wait(), which I consider to be a kernel bug.  Let
me explain:

When LinuxThreads (the glibc pthreads implementation) creates a new
thread using the clone() system call, it arranges for a special
"cancel" signal (instead of SIGCHLD) to be delivered to itself when the
newly created thread exits, so that it can distinguish between exiting
threads and genuine child processes.  When it receives this special
"cancel" signal, it calls wait() with the __WCLONE flag so that exited
threads (and only exited threads, not exited child processes) are
reaped and don't live on as zombies.
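
The clone()/wait() mechanism described above can be sketched in user space as follows. This is an illustration under stated assumptions, not LinuxThreads source: SIGUSR1 stands in for the "cancel" signal, the `sketch_` names are made up, and the child is created without CLONE_VM to keep the example small (a real thread would share memory with its creator).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SKETCH_STACK_SIZE (64 * 1024)
#define SKETCH_EXIT_SIGNAL SIGUSR1 /* assumption: stand-in for "cancel" */

static volatile sig_atomic_t sketch_signal_seen;

/* Handler for the exit signal; without one, SIGUSR1 would kill us. */
static void sketch_on_exit_signal(int sig)
{
    (void)sig;
    sketch_signal_seen = 1;
}

/* Body of the clone child: exit immediately. */
static int sketch_child(void *arg)
{
    (void)arg;
    return 0;
}

/* Returns 0 if a plain waitpid() cannot see the clone child (ECHILD)
   and a waitpid() with __WCLONE then reaps it; returns -1 otherwise. */
int sketch_reap_clone_child(void)
{
    struct sigaction sa, old_sa;
    char *stack;
    pid_t pid, reaped;
    int result = -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sketch_on_exit_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SKETCH_EXIT_SIGNAL, &sa, &old_sa) != 0)
        return -1;

    stack = malloc(SKETCH_STACK_SIZE);
    if (stack != NULL) {
        /* The low byte of the clone() flags selects the exit signal. */
        pid = clone(sketch_child, stack + SKETCH_STACK_SIZE,
                    SKETCH_EXIT_SIGNAL, NULL);
        if (pid != -1) {
            /* A child whose exit signal is not SIGCHLD is not eligible
               for a wait without __WCLONE. */
            errno = 0;
            reaped = waitpid(pid, NULL, WNOHANG);
            if (reaped == -1 && errno == ECHILD) {
                do {
                    reaped = waitpid(pid, NULL, __WCLONE);
                } while (reaped == -1 && errno == EINTR);
                if (reaped == pid)
                    result = 0;
            } else {
                waitpid(pid, NULL, __WALL); /* unexpected: reap anyway */
            }
        }
        free(stack);
    }
    sigaction(SKETCH_EXIT_SIGNAL, &old_sa, NULL);
    return result;
}
```

Calling `sketch_reap_clone_child()` from a small test program on Linux is enough to see both halves of the behaviour: the ordinary wait misses the child, and the __WCLONE wait collects it.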

When run under GDB, the debugger attaches to every newly created
thread (using ptrace()).  This has the effect that the thread's parent
is now the debugger instead of the special manager thread that is
supposed to reap that particular thread when it exits.  So the event
is reported to GDB (with the special "cancel" signal), and GDB does
the wait() with the __WCLONE flag that's necessary to detach from the
exited thread.  However the kernel notices that the exited thread was
being traced and keeps it around as a zombie to give the LinuxThreads
library the opportunity to wait() for it.  Unfortunately the kernel
unconditionally sends a SIGCHLD instead of the special "cancel"
signal, and the event goes unnoticed, as can be seen from the
following bit of code in linux/kernel/exit.c:sys_wait4():

                                if (p->p_opptr != p->p_pptr) {
                                        write_lock_irq(&tasklist_lock);
                                        REMOVE_LINKS(p);
                                        p->p_pptr = p->p_opptr;
                                        SET_LINKS(p);
                                        write_unlock_irq(&tasklist_lock);
                                        notify_parent(p, SIGCHLD);
                                } else
                                        release(p);

AFAICT all officially released kernels have this problem.
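
As a rough illustration of why the event goes unnoticed, the branch quoted above can be modelled in plain C. This is a toy model, not kernel code: the `sketch_` names and the simplified task structure are assumptions, and SIGUSR1 stands in for the "cancel" signal. The `use_exit_signal` switch selects between the behaviour quoted above and the alternative of passing on the task's own exit signal.

```c
#include <signal.h>
#include <stddef.h>

/* Toy stand-in for the relevant fields of a task. */
struct sketch_task {
    struct sketch_task *p_opptr; /* original parent (manager thread)   */
    struct sketch_task *p_pptr;  /* current parent (debugger if traced) */
    int exit_signal;             /* signal requested at clone() time    */
    int last_signal;             /* last signal this task was sent      */
    int released;                /* set once the zombie is gone         */
};

/* Record the signal on whoever is currently the parent. */
static void sketch_notify_parent(struct sketch_task *p, int sig)
{
    p->p_pptr->last_signal = sig;
}

/* Model of the zombie branch: if the task was reparented by ptrace,
   hand it back to the original parent and notify it; otherwise
   release it. */
void sketch_wait4_zombie(struct sketch_task *p, int use_exit_signal)
{
    if (p->p_opptr != p->p_pptr) {
        p->p_pptr = p->p_opptr;
        sketch_notify_parent(p, use_exit_signal ? p->exit_signal : SIGCHLD);
    } else {
        p->released = 1;
    }
}
```

With a traced thread whose `exit_signal` is SIGUSR1, the first mode leaves the manager holding a SIGCHLD it is not watching for, while the second mode delivers SIGUSR1.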

Furthermore, there is a problem with some kernels (the 2.4 series and
the latest kernels in the 2.2 series), where the special exit signal
is being reset to SIGCHLD by the following code in
linux/kernel/exit.c:exit_notify():

        /* Let father know we died 
         *
         * Thread signals are configurable, but you aren't going to use
         * that to send signals to arbitary processes. 
         * That stops right now.
         *
         * If the parent exec id doesn't match the exec id we saved
         * when we started then we know the parent has changed security
         * domain.
         *
         * If our self_exec id doesn't match our parent_exec_id then
         * we have changed execution domain as these two values started
         * the same after a fork.
         *      
         */

        if(current->exit_signal != SIGCHLD &&
            ( current->parent_exec_id != t->self_exec_id  ||
              current->self_exec_id != current->parent_exec_id) 
            && !capable(CAP_KILL))
                current->exit_signal = SIGCHLD;

since current->parent_exec_id and t->self_exec_id don't match for
traced processes (since t->self_exec_id is the exec id of the debugger
and not of the original parent).  This means that even if the
LinuxThreads library receives its special "cancel" signal, it won't
notice the exited threads (since it uses wait() with the __WCLONE
flag), and those threads will live on as zombies until the entire
process exits.
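
The effect of that condition can be checked with a small pure function. This is a sketch, not kernel code: the function name and its flat parameter list are assumptions, with `t_self_exec_id` standing for t->self_exec_id and `has_cap_kill` for the result of capable(CAP_KILL).

```c
#include <signal.h>

/* Returns the exit signal that survives the check quoted above. */
int sketch_exit_signal_after_notify(int exit_signal,
                                    unsigned int parent_exec_id,
                                    unsigned int self_exec_id,
                                    unsigned int t_self_exec_id,
                                    int has_cap_kill)
{
    if (exit_signal != SIGCHLD &&
        (parent_exec_id != t_self_exec_id ||
         self_exec_id != parent_exec_id) &&
        !has_cap_kill)
        return SIGCHLD; /* custom signal is dropped */
    return exit_signal; /* custom signal is kept    */
}
```

When all three ids agree, a custom signal such as SIGUSR1 is kept. When `t_self_exec_id` is the debugger's id and so differs from `parent_exec_id`, the function returns SIGCHLD unless `has_cap_kill` is set.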

I'm not sure how to fix this problem.  I would say that the attached
patch (against Linux 2.2.18) should fix things, but I'm not entirely
confident that it doesn't open a security hole.  Oh, and I didn't test
this exact patch, I just tested something similar some time ago.

Mark

--- exit.c.orig Tue Jan  4 19:12:25 2000
+++ exit.c      Wed Dec 13 14:36:24 2000
@@ -291,7 +291,7 @@
         * is about to become orphaned.
         */

-       t = current->p_pptr;
+       t = current->p_opptr;

        if ((t->pgrp != current->pgrp) &&
            (t->session == current->session) &&
@@ -497,7 +497,7 @@
                                        p->p_pptr = p->p_opptr;
                                        SET_LINKS(p);
                                        write_unlock_irq(&tasklist_lock);
-                                       notify_parent(p, SIGCHLD);
+                                       notify_parent(p, p->exit_signal);
                                } else
                                        release(p);
 #ifdef DEBUG_PROC_TREE


* Re: thread exit goes defunct
  2001-02-13  3:20   ` Mark Kettenis
@ 2001-02-13  8:40     ` Nicolas Vignal
  0 siblings, 0 replies; 4+ messages in thread
From: Nicolas Vignal @ 2001-02-13  8:40 UTC (permalink / raw)
  To: Mark Kettenis, Shaw Terwilliger, gdb; +Cc: gdb

Wait and see...

Regards

	Nicolas

On Tuesday 13 February 2001 12:20, Mark Kettenis wrote:
> Shaw Terwilliger <sterwill@sourcegear.com> writes:
> > Nicolas Vignal wrote:
> > > Under gdb, when a thread exits, it goes into the defunct state.
> >
> > I've been wondering about this too.  Since the old threads don't die,
> > I quickly run out of processes, which makes debugging long-running
> > servers (which spawn new threads on new connections) difficult.
>
> This *is* caused by a long standing kernel bug.  Alan Cox has promised
> to look into it if he can find the time, but it doesn't have a really
> high priority for him.  Anyway, someone who does care should take this
> up with the kernel folks; see the attached message that I posted to
> the linux-kernel mailing list last December.  I simply don't care
> enough to keep sending this stuff to the Linux kernel folks.
>
> Mark
>
> Peter Berger <peterb@telerama.com> writes:


...



end of thread, other threads:[~2001-02-13  8:40 UTC | newest]
