Re: Is the current gdb 5.1 broken for Linuxthreads?

Mirror of the gdb mailing list

* Re: Is the current gdb 5.1 broken for Linuxthreads?
@ 2001-09-21  2:27 James Cownie
  2001-09-21  5:04 ` Eric Paire
  0 siblings, 1 reply; 21+ messages in thread
From: James Cownie @ 2001-09-21  2:27 UTC (permalink / raw)
  To: H . J . Lu; +Cc: Eric Paire, Andrew Cagney, Mark Kettenis, GDB

Eric wrote :- 

>  There is no support for MT core dumps.

H.J. replied :-

> Try the current Red Hat kernel/ac kernel. They support it. The patch
> is very small. I am enclosing it here.

However inspection shows that that patch does _not_ implement a
multi-threaded core dump. What it does is to dump a full core file for
each thread.

That seems a somewhat perverse approach, given that 

1) the ELF core dump format easily handles a genuine multi-threaded
   core dump (cf Solaris, IRIX, ...)

2) debuggers already know how to read such multi-threaded core dumps
   and present them as a process with multiple threads.

3) dumping a full core dump for each thread uses (to a first
   approximation) nthreads times too much I/O and nthreads times too
   much disk space.
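
As a rough illustration of point 3, the arithmetic can be sketched as
below. This is only a sketch under stated assumptions: the helper name
is hypothetical, the thread count (31) and image size (247300 KiB) are
borrowed from the `ps` listing in the first message of this thread, and
treating the VSZ column as the size of one dump is an upper-bound
assumption, not a measurement.

```python
def per_thread_dump_cost(nthreads: int, image_kib: int) -> tuple:
    """Return (single_dump_kib, per_thread_dumps_kib).

    Hypothetical helper: all threads share one address space, so a
    genuine multi-threaded dump writes the image once, while one full
    dump per thread writes the same image nthreads times.
    """
    single = image_kib                 # one dump of the shared image
    per_thread = nthreads * image_kib  # the same image, nthreads times
    return single, per_thread


# Assumed figures taken from the ps listing (31 a.out entries, VSZ 247300).
single, total = per_thread_dump_cost(nthreads=31, image_kib=247300)
print(f"one dump: {single} KiB, one dump per thread: {total} KiB")
```

The per-thread register state that a shared dump must also record is
tiny compared with the memory image, so it is ignored here.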

What is wanted is a genuine multi-threaded core dump, not this
horror...

-- Jim 

James Cownie	<jcownie@etnus.com>
Etnus, LLC.     +44 117 9071438
http://www.etnus.com


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-21  2:27 Is the current gdb 5.1 broken for Linuxthreads? James Cownie
@ 2001-09-21  5:04 ` Eric Paire
  2001-09-21  5:25   ` James Cownie
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Eric Paire @ 2001-09-21  5:04 UTC (permalink / raw)
  To: James Cownie; +Cc: H . J . Lu, Andrew Cagney, Mark Kettenis, GDB

> 
> Eric wrote :- 
> 
> >  There is no support for MT core dumps.
> 
> H.J. replied :-
> 
> > Try the current Red Hat kernel/ac kernel. They support it. The patch
> > is very small. I am enclosing it here.
> 
> However inspection shows that that patch does _not_ implement a
> multi-threaded core dump. What it does is to dump a full core file for
> each thread.
> 
> That seems a somewhat perverse approach, given that 
> 
> 1) the ELF core dump format easily handles a genuine multi-threaded
>    core dump (cf Solaris, IRIX, ...)
> 
This is not feasible in Linux, as Linus does not want to implement any
pthread-specific feature in the kernel (and the core dump is 100%
kernel code). For example, why should a thread that faults kill the
others? Perhaps the application is written in such a way that it can
recover from the fault.

> 2) debuggers already know how to read such multi-threaded core dumps
>    and present them as a process with multiple threads.
> 
The point is that the debugger should understand the way MT core dumps
are done.

> 3) dumping a full core dump for each thread is (to first
>    approximation) using nthreads too much I/O and nthreads too much
>    disk space.
> 
If you look carefully at it, it only dumps the first thread, and the dump
is no longer allowed for any thread of this process.

> What is wanted is a genuine multi-threaded core dump, not this
> horror...
> 
I would not say that it is, because it exists, and we have been living
without anything for years. I would just say that it is a first step
that is more useful than nothing.

-Eric
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web  : http://www.ri.silicomp.com/~paire  | Groupe SILICOMP - Research Institute
Email: eric.paire@ri.silicomp.com         | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71               | F-38610 Gieres
Fax  : +33 (0) 476 51 05 32               | FRANCE



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-21  5:04 ` Eric Paire
@ 2001-09-21  5:25   ` James Cownie
  2001-09-21  8:35   ` H . J . Lu
  2001-09-21  8:39   ` Andrew Cagney
  2 siblings, 0 replies; 21+ messages in thread
From: James Cownie @ 2001-09-21  5:25 UTC (permalink / raw)
  To: Eric Paire; +Cc: H . J . Lu, Andrew Cagney, Mark Kettenis, GDB

> == Eric
> > == Me, 
> > That seems a somewhat perverse approach, given that 
> > 
> > 1) the ELF core dump format easily handles a genuine multi-threaded
> >    core dump (cf Solaris, IRIX, ...)
> > 
> This is not feasible in Linux as Linus does not want to implement any
> specific pthread feature in the kernel (and the core dump is 100%
> kernel code), e.g. why a thread doing a fault should kill the other,
> perhaps the application is written in such a way that it can recover
> from it.

It remains perverse no matter what Linus' view of it is. (I'm sure he
would be happy to own to being perverse under some circumstances !). 

The argument you outline (that there are threaded codes for which
dumping the whole process as a result of a failure in one thread would
be the wrong behaviour) does not imply that there are no codes for
which dumping the whole process on thread failure would be the
_correct_ behaviour. All it argues is that the behaviour on thread
error needs to be configurable on a per-process basis; that is not a
big surprise.

Whether you view such a per-process piece of state and behaviour as
being "pthread specific" is a political (not a technical) choice. Were
such an implementation to exist it would be useful to pthread code,
but would also be equally useful to codes which create threads with
naked clone but want to dump all threads on error.

After all, the kernel implements clone, and pthreads uses that but
no-one seems to think clone should go away because it's "pthread
support in the kernel".

> If you look carefully at it, it only dumps the first thread, and the
> dump is no longer allowed for any thread of this process.

In which case it is still _not_ a multi-threaded core dump, which is
what you started off by asking for. It's just a normal core dump of
one thread. (And information about all other threads is lost). While
this may be more useful than the previous state (dump a core file from
a thread which you almost certainly weren't interested in), no matter
how you look at it it's not a multi-threaded core dump.

> > 2) debuggers already know how to read such multi-threaded core dumps
> >    and present them as a process with multiple threads.
> > 
> The point is that debugger should understand the way MT core dumps are
> done

But you just said that there still are _no_ MT core dumps for the
debugger to understand. What's new to understand about a normal single
threaded core dump from one thread ?

> > What is wanted is a genuine multi-threaded core dump, not this
> > horror...
> > 
> I would not say that it is, because it exists, 

Where ? You just told me it didn't and that all that gets dumped is
one single-threaded core file.

> and we have been living without anything for years.

This is just another spin on "Eat sh*t, 10 billion flies can't be
wrong", an argument I have never found very convincing.

-- Jim 

James Cownie	<jcownie@etnus.com>
Etnus, LLC.     +44 117 9071438
http://www.etnus.com


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-21  5:04 ` Eric Paire
  2001-09-21  5:25   ` James Cownie
@ 2001-09-21  8:35   ` H . J . Lu
  2001-09-21  8:39   ` Andrew Cagney
  2 siblings, 0 replies; 21+ messages in thread
From: H . J . Lu @ 2001-09-21  8:35 UTC (permalink / raw)
  To: Eric Paire; +Cc: James Cownie, Andrew Cagney, Mark Kettenis, GDB

On Fri, Sep 21, 2001 at 01:53:58PM +0200, Eric Paire wrote:
> > 
> > Eric wrote :- 
> > 
> > >  There is no support for MT core dumps.
> > 
> > H.J. replied :-
> > 
> > > Try the current Red Hat kernel/ac kernel. They support it. The patch
> > > is very small. I am enclosing it here.
> > 
> > However inspection shows that that patch does _not_ implement a
> > multi-threaded core dump. What it does is to dump a full core file for
> > each thread.
> > 
> > That seems a somewhat perverse approach, given that 
> > 
> > 1) the ELF core dump format easily handles a genuine multi-threaded
> >    core dump (cf Solaris, IRIX, ...)
> > 
> This is not feasible in Linux as Linus does not want to implement any
> specific pthread feature in the kernel (and the core dump is 100%
> kernel code), e.g. why a thread doing a fault should kill the other,
> perhaps the application is written in such a way that it can recover
> from it.

I am hoping to see POSIX semaphores for Linux. If we want some support
in the kernel, the kernel may have to know about threads. If that is the
case, we can do many other interesting things in the kernel for threads.

H.J.



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-21  5:04 ` Eric Paire
  2001-09-21  5:25   ` James Cownie
  2001-09-21  8:35   ` H . J . Lu
@ 2001-09-21  8:39   ` Andrew Cagney
  2 siblings, 0 replies; 21+ messages in thread
From: Andrew Cagney @ 2001-09-21  8:39 UTC (permalink / raw)
  To: Eric Paire; +Cc: James Cownie, H . J . Lu, Andrew Cagney, Mark Kettenis, GDB

> > 2) debuggers already know how to read such multi-threaded core dumps
> >    and present them as a process with multiple threads.
> > 
> The point is that debugger should understand the way MT core dumps are
> done

Including multiple threads in a single core dump is a pretty well
understood problem.  Believe it or not that problem is solved by a
standard and that standard, on the whole, is a good thing.  In this
case it provides a clear, well understood interface between the
debugger and the kernel.  Re-invent the standard (without very good
reason and I don't see one here) and everyone loses - we spend our
time trying to keep things in sync.
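
To make the "well understood problem" concrete, here is a minimal
sketch of how a reader can count the threads recorded in an ELF core
file's PT_NOTE segment. It assumes the common convention in which each
thread contributes one NT_PRSTATUS note (type 1, owner name "CORE") and
notes are padded to 4 bytes, as Linux core files do; that this is the
convention meant above is itself an assumption. The function names are
hypothetical, and `make_note` exists only to build sample input.

```python
import struct

NT_PRSTATUS = 1  # note type holding one thread's registers and signal state


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (note padding)."""
    return (n + 3) & ~3


def iter_notes(blob: bytes, endian: str = "<"):
    """Yield (name, type, desc) for each note in a PT_NOTE segment's bytes."""
    off = 0
    while off + 12 <= len(blob):
        namesz, descsz, ntype = struct.unpack_from(endian + "III", blob, off)
        off += 12
        name = blob[off:off + namesz].rstrip(b"\0").decode("ascii", "replace")
        off += _align4(namesz)
        desc = blob[off:off + descsz]
        off += _align4(descsz)
        yield name, ntype, desc


def count_threads(blob: bytes) -> int:
    """Count NT_PRSTATUS notes, i.e. threads, under the assumed convention."""
    return sum(1 for name, ntype, _ in iter_notes(blob)
               if name == "CORE" and ntype == NT_PRSTATUS)


def make_note(name: bytes, ntype: int, desc: bytes) -> bytes:
    """Build one note record; used here only to fabricate sample input."""
    name_z = name + b"\0"
    out = struct.pack("<III", len(name_z), len(desc), ntype)
    out += name_z.ljust(_align4(len(name_z)), b"\0")
    out += desc.ljust(_align4(len(desc)), b"\0")
    return out


# Three fake per-thread notes plus one unrelated note (type 3).
sample = make_note(b"CORE", NT_PRSTATUS, b"\x01" * 8) * 3
sample += make_note(b"CORE", 3, b"x" * 5)
print(count_threads(sample))
```

A debugger that already walks the notes this way needs no new file
format to present a dump as one process with several threads.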

Case in point?  GDB's support of Linux's kernel thread model.  Until
people sat down and came up with the libthread / threaddb interface we
were in a situation where, every time a new Linux kernel was spun,
we'd need to, yet again, re-spin GDB (and in the process break
compatibility with some other kernel).  While the current GDB thread
code may not be perfect it is still streets ahead of what we had
before.  Why?  Because it is now possible to fix a problem once and
not have it come back again and again and again.

Should GDB be able to manage multiple simultaneous core files?  Yes,
after all, why not.  Should such a feature be made a prerequisite for
GDB supporting Linux's multi-threaded core dumps?  I think not.  I
think adding such a feature is outside the scope of the immediate
problem - getting the Linux kernel to do a core dump that complies
with current conventions.

enjoy,
	Andrew

(1) The floating-point support has been overhauled.  Ditto for the
register model.  The target vector is probably next.  If you know
anything about GDB's internals you'll know that an overhaul of the
target vector is a precursor to many new features; support for
multiple core dump files is probably one.


* Is the current gdb 5.1 broken for Linuxthreads?
@ 2001-09-17 12:47 H . J . Lu
       [not found] ` <20010917161350.A25349@lucon.org>
  0 siblings, 1 reply; 21+ messages in thread
From: H . J . Lu @ 2001-09-17 12:47 UTC (permalink / raw)
  To: GDB, GNU C Library

Here is a modified example from glibc.

# gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE
# ./a.out&
# ps -xal | grep a.out
000  1103 27904  3705   9   0 247300 600 rt_sig S    pts/17     0:00 ./a.out
040  1103 27905 27904  11   0 247300 600 do_pol S    pts/17     0:00 ./a.out
040  1103 27906 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27907 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27908 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27909 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27910 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27911 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27912 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27913 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27914 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27915 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27916 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27917 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27918 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27919 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27920 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27921 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27922 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27923 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27924 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27925 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27926 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27927 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27928 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27929 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27930 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27931 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27932 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27933 27905  11   0 247300 600 nanosl S    pts/17     0:00 ./a.out
040  1103 27934 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out

# gdb a.out
...
(gdb) att 27934
Attaching to program: /home/hjl/bugs/gdb/thread/./a.out, process 27934
Child process unexpectedly missing: No child processes.

Program terminated with signal ?, Unknown signal.
The program no longer exists.
(gdb)

But

(gdb) att 27904

worked fine. It is a serious regression from gdb 4.18 from RedHat
6.2.


H.J.
From hjl@lucon.org Mon Sep 17 16:13:00 2001
From: "H . J . Lu" <hjl@lucon.org>
To: GDB <gdb@sourceware.cygnus.com>
Subject: Re: Is the current gdb 5.1 broken for Linuxthreads?
Date: Mon, 17 Sep 2001 16:13:00 -0000
Message-id: <20010917161350.A25349@lucon.org>
References: <20010917124710.A21992@lucon.org>
X-SW-Source: 2001-09/msg00139.html
Content-length: 3420

On Mon, Sep 17, 2001 at 12:47:10PM -0700, H . J . Lu wrote:
> Here is a modified example from glibc.
> 
> # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE
> # ./a.out&
> # ps -xal | grep a.out
> 000  1103 27904  3705   9   0 247300 600 rt_sig S    pts/17     0:00 ./a.out
> 040  1103 27905 27904  11   0 247300 600 do_pol S    pts/17     0:00 ./a.out
> 040  1103 27906 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27907 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27908 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27909 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27910 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27911 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27912 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27913 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27914 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27915 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27916 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27917 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27918 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27919 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27920 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27921 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27922 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27923 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27924 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27925 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27926 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27927 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27928 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27929 27905   9   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27930 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27931 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27932 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27933 27905  11   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 040  1103 27934 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
> 
> # gdb a.out
> ...
> (gdb) att 27934
> Attaching to program: /home/hjl/bugs/gdb/thread/./a.out, process 27934
> Child process unexpectedly missing: No child processes.
> 
> Program terminated with signal ?, Unknown signal.
> The program no longer exists.
> (gdb)
> 
> But
> 
> (gdb) att 27904
> 
> worked fine. It is a serious regression from gdb 4.18 from RedHat
> 6.2.
> 

The more I looked at it, the more borken gdb is with linuxthreads:

# gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
# a.out
# gdb a.out
...
(gdb) att 14226
Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
...
lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid ==
GET_LWP (lp->ptid)' failed.
An internal GDB error was detected.  This may make further



H.J.



[parent not found: <20010917161350.A25349@lucon.org>]

* Re: Is the current gdb 5.1 broken for Linuxthreads?
       [not found] ` <20010917161350.A25349@lucon.org>
@ 2001-09-17 19:13   ` H . J . Lu
  2001-09-18 13:56     ` H . J . Lu
  2001-09-19  6:32     ` Mark Kettenis
  0 siblings, 2 replies; 21+ messages in thread
From: H . J . Lu @ 2001-09-17 19:13 UTC (permalink / raw)
  To: GDB

On Mon, Sep 17, 2001 at 04:13:50PM -0700, H . J . Lu wrote:
> 
> The more I looked at it, the more borken gdb is with linuxthreads:
> 
> # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
> # a.out
> # gdb a.out
> ...
> (gdb) att 14226
> Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
> ...
> lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid ==
> GET_LWP (lp->ptid)' failed.
> An internal GDB error was detected.  This may make further

It looks like with gdb 5.1, I have to attach the very first thread. Is
that documented anywhere? Shouldn't gdb find the very first thread
and attach it for me?
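
Until gdb does this itself, the first thread can be found outside gdb
by walking the parent links that `ps` reports. The sketch below is only
an illustration under assumptions: it relies on the LinuxThreads layout
visible in the listing above (the manager, 27905, is a child of the
initial thread, 27904, and the parent of every other thread), it
expects the listing to be filtered to the program's own processes, and
both function names are hypothetical.

```python
def parse_ps_xal(text: str) -> dict:
    """Map PID -> PPID from `ps -xal` style lines (F UID PID PPID ...)."""
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything without numeric PID/PPID columns.
        if len(fields) >= 4 and fields[2].isdigit() and fields[3].isdigit():
            parents[int(fields[2])] = int(fields[3])
    return parents


def initial_thread(pid: int, parents: dict) -> int:
    """Follow PPID links while the parent is still one of the listed PIDs."""
    seen = set()
    while parents.get(pid) in parents and pid not in seen:
        seen.add(pid)
        pid = parents[pid]
    return pid


listing = """\
000  1103 27904  3705   9   0 247300 600 rt_sig S    pts/17     0:00 ./a.out
040  1103 27905 27904  11   0 247300 600 do_pol S    pts/17     0:00 ./a.out
040  1103 27934 27905  10   0 247300 600 nanosl S    pts/17     0:00 ./a.out
"""
print(initial_thread(27934, parse_ps_xal(listing)))
```

The walk stops at the first process whose parent is not in the listing
(here the shell, 3705), which is the PID to hand to `attach`.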


H.J.



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-17 19:13   ` H . J . Lu
@ 2001-09-18 13:56     ` H . J . Lu
  2001-09-19  0:46       ` Eli Zaretskii
  2001-09-19  6:56       ` Mark Kettenis
  2001-09-19  6:32     ` Mark Kettenis
  1 sibling, 2 replies; 21+ messages in thread
From: H . J . Lu @ 2001-09-18 13:56 UTC (permalink / raw)
  To: GDB

On Mon, Sep 17, 2001 at 07:13:57PM -0700, H . J . Lu wrote:
> > 
> > The more I looked at it, the more borken gdb is with linuxthreads:
> > 
> > # gcc -g ex11.c -lpthread -lrt -D_GNU_SOURCE -static
> > # a.out
> > # gdb a.out
> > ...
> > (gdb) att 14226
> > Attaching to program: /home/hjl/bugs/gdb/thread/a.out, process 14226
> > ...
> > lin-lwp.c:620: gdb-internal-error: stop_wait_callback: Assertion `pid ==
> > GET_LWP (lp->ptid)' failed.
> > An internal GDB error was detected.  This may make further
> 
> It looks like with gdb 5.1, I have to attach the very first thread. Is
> that documented anywhere? Shouldn't gdb find the very first thread
> and attach it for me?

It seems that the Linuxthreads support in gdb 5.1 is very fragile. In
some respects, it is worse than gdb 4.17/4.18 with various Linuxthreads
patches. The problem seems to be that gdb starts in non-threaded mode
and the Linuxthreads support is only activated very late. In some
cases, that is too late. One problem seems to be calling wait () on
cloned processes. Can't we treat a non-threaded Linux process as a
Linuxthreads process with one thread? That is what gdb 4.17 does.

BTW, people may be very disappointed at the current Linuxthreads
support in gdb 5.1.

H.J.


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-18 13:56     ` H . J . Lu
@ 2001-09-19  0:46       ` Eli Zaretskii
  2001-09-19  8:43         ` H . J . Lu
  2001-09-19  6:56       ` Mark Kettenis
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2001-09-19  0:46 UTC (permalink / raw)
  To: hjl; +Cc: gdb

> Date: Tue, 18 Sep 2001 13:55:55 -0700
> From: "H . J . Lu" <hjl@lucon.org>
> 
> BTW, people may be very disappointed at the current Linuxthreads
> support in gdb 5.1.

I wonder why was it necessary to make such a comment.  It doesn't add
any technical information to what you already posted, but it might
discourage people from working on fixing the problems.

Can we please stay technical and devoid of any unfriendly attitude?
Saying that such and such feature doesn't work is all people need to
try to fix the problem.


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  0:46       ` Eli Zaretskii
@ 2001-09-19  8:43         ` H . J . Lu
  0 siblings, 0 replies; 21+ messages in thread
From: H . J . Lu @ 2001-09-19  8:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb

On Wed, Sep 19, 2001 at 10:43:45AM +0300, Eli Zaretskii wrote:
> > Date: Tue, 18 Sep 2001 13:55:55 -0700
> > From: "H . J . Lu" <hjl@lucon.org>
> > 
> > BTW, people may be very disappointed at the current Linuxthreads
> > support in gdb 5.1.
> 
> I wonder why was it necessary to make such a comment.  It doesn't add
> any technical information to what you already posted, but it might
> discourage people from working on fixing the problems.

That is the very first response I got for all my messages on this. Does
it tell anything? FYI, my users told me the new gdb was broken for them
and wanted to go back to the old one.

> 
> Can we please stay technical and devoid of any unfriendly attitude?
> Saying that such and such feature doesn't work is all people need to
> try to fix the problem.

I thought no one cared about it. Let's fix the bug.


H.J.



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-18 13:56     ` H . J . Lu
  2001-09-19  0:46       ` Eli Zaretskii
@ 2001-09-19  6:56       ` Mark Kettenis
  2001-09-19  7:39         ` Eric Paire
  2001-09-19  9:10         ` H . J . Lu
  1 sibling, 2 replies; 21+ messages in thread
From: Mark Kettenis @ 2001-09-19  6:56 UTC (permalink / raw)
  To: H . J . Lu; +Cc: GDB

"H . J . Lu" <hjl@lucon.org> writes:

> It seems that the Linuxthreads support in gdb 5.1 is very fragile. In
> some aspects, it is worse than gdb 4.17/4.18 with various Linuxthreads
> patches. The problem seems to be gdb starts with the non-threaded mode
> and the Linuxthreads support is only activated at very late time. In
> some cases, it is too late. One problem seems to call wait () on cloned
> processes. Can't we treat non-threaded Linux processes as a
> Linuxthreads with one thread? That is what gdb 4.17 does.

And in some sense the threads support in GDB 5.1 is better than GDB
4.17/4.18 with patches.  The 4.17/4.18 Linuxthreads-patches are
unmaintainable.  Whenever the internals of the threads library change
you'll need to patch GDB.  They also make it very hard to add support
for other threads libraries to GDB.

Not activating the LinuxThreads support until the threads library is
detected by GDB still seems the right approach to me.  The
LinuxThreads support has to do certain things that interfere with the
process being debugged, and you don't want that for non-threaded
processes.  That said, I think it should be possible to make the LWP
layer in lin-lwp.c (which is largely threads library independent) the
default layer for Linux without any unwanted side-effects.  At least
for 2.4 kernels.

BTW, debugging threaded apps under Linux will always be somewhat
fragile as long as there isn't a sane kernel threads interface to the
kernel.  There should be an interface to stop all threads in a
synchronous way.  Unfortunately, I have no hope that such an interface
will be added to the kernel.

> BTW, people may be very disappointed at the current Linuxthreads
> support in gdb 5.1.

If they are they should help improve it.  Several people have
reported problems.  Most of these I have been unable to reproduce.
Hardly anyone even bothers to answer me if I ask for a small
self-contained testcase for the problem.

Mark


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  6:56       ` Mark Kettenis
@ 2001-09-19  7:39         ` Eric Paire
  2001-09-19  9:05           ` H . J . Lu
  2001-09-19 13:39           ` Andrew Cagney
  2001-09-19  9:10         ` H . J . Lu
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Paire @ 2001-09-19  7:39 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: H . J . Lu, GDB

Mark Kettenis <kettenis@science.uva.nl> writes:

> "H . J . Lu" <hjl@lucon.org> writes:
> 
> > It seems that the Linuxthreads support in gdb 5.1 is very fragile. In
> > some aspects, it is worse than gdb 4.17/4.18 with various Linuxthreads
> > patches. The problem seems to be gdb starts with the non-threaded mode
> > and the Linuxthreads support is only activated at very late time. In
> > some cases, it is too late. One problem seems to call wait () on cloned
> > processes. Can't we treat non-threaded Linux processes as a
> > Linuxthreads with one thread? That is what gdb 4.17 does.
> 
> And in some sense the threads support in GDB 5.1 is better than GDB
> 4.17/4.18 with patches.  The 4.17/4.18 Linuxthreads-patches are
> unmaintainable.  Whenever the internals of the threads library change
> you'll need to patch GDB.  They also make it very hard to add support
> for other threads libraries to GDB.
> 
As the original developer of the linuxthreads patches for GDB-4.17, I mostly
agree with Mark. At that time, I saw a lot of complaints that GDB was
unable to debug LinuxThreads programs (and the only answer was "do it
yourself", which I did). The only benefit I saw with this version is that
the support existed and a lot of people used it (which probably was the
main reason why GDB developers decided to support it in a cleaner way
in GDB-5.0).

I must add that I have been using gdb-4.18 even on RedHat-7.1, as the official
GDB-5.0 and the development version of GDB-5.1 have been buggy. But now
the latter seems to me as usable as gdb-4.18, so I switched to the
current GDB-5.1. GDB-4.18 is now dead, and we must help GDB-5.1
become the version that people use for linuxthreaded apps.

> Not activating the LinuxThreads support until the threads library is
> detected by GDB still seems the right approach to me.  The
> LinuxThreads support has to do certain things that interfere with the
> process being debugged, and you don't want that for non-threaded
> processes.  That said, I think it should be possible to make the LWP
> layer in lin-lwp.c (which is largely threads library independent) the
> default layer for Linux without any unwanted side-effects.  At least
> for 2.4 kernels.
> 
> BTW, debugging threaded apps under Linux will always be somewhat
> fragile as long as there isn't a sane kernel threads interface to the
> kernel.  There should be an interface to stop all threads in a
> synchronous way.  Unfortunately, I have no hope that such an interface
> will be added to the kernel.
> 
I don't agree with you: there are at least 2 bugs in the current Linux
kernel which make you think that the support is fragile:
1) SIGSTOP management is not POSIX-conformant
2) reparenting of debugged processes is buggy

I already started a thread to explain that stopping all threads in
a synchronous way is an illusion. Think of a 2-processor machine with
one thread running on each processor: if one thread stops, the time
required by its processor to handle the trap, discover that the other
threads must be stopped, make the interprocessor request, ... allows
the other thread to run thousands of instructions on the second
processor before being stopped. The result is that you think all threads
have stopped at the same time, while that is false, even if you have the
best interface you can think of.

> > BTW, people may be very disappointed at the current Linuxthreads
> > support in gdb 5.1.
> 
> If they are they should help improving it.  Several people have
> reported problems.  Most of these I have been unable to reproduce.
> Hardly anyone even bothers to answer me if I ask for a small
> self-contained testcase for the problem.
> 
I must say that I have been very disappointed with GDB-5.0 (which was
almost unusable with "real" multithreaded applications). But the fact
that I switched from my version of GDB-4.18 to GDB-5.1 seems to
me a good sign, even if I agree that there are still some minor enhancements
to make.

Just my 2 cents,
-Eric
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web  : http://www.ri.silicomp.com/~paire  | Groupe SILICOMP - Research Institute
Email: eric.paire@ri.silicomp.com         | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71               | F-38610 Gieres
Fax  : +33 (0) 476 51 05 32               | FRANCE



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  7:39         ` Eric Paire
@ 2001-09-19  9:05           ` H . J . Lu
  2001-09-20  0:59             ` Eric Paire
  2001-09-19 13:39           ` Andrew Cagney
  1 sibling, 1 reply; 21+ messages in thread
From: H . J . Lu @ 2001-09-19  9:05 UTC (permalink / raw)
  To: Eric Paire; +Cc: Mark Kettenis, GDB

On Wed, Sep 19, 2001 at 04:38:51PM +0200, Eric Paire wrote:
> > BTW, debugging threaded apps under Linux will always be somewhat
> > fragile as long as there isn't a sane kernel threads interface to the
> > kernel.  There should be an interface to stop all threads in a
> > synchronous way.  Unfortunately, I have no hope that such an interface
> > will be added to the kernel.
> > 
> I don't agree with you: There are at least 2 bugs in the current Linux
> kernel which makes you think that the support is fragile:
> 1) SIGSTOP management is not-POSIX conformant
> 2) reparenting of debugged processes is buggy
> 

Could you please provide testcases for them? Even better, do you have
kernel patches?

> 
> > > BTW, people may be very disappointed at the current Linuxthreads
> > > support in gdb 5.1.
> > 
> > If they are they should help improving it.  Several people have
> > reported problems.  Most of these I have been unable to reproduce.
> > Hardly anyone even bothers to answer me if I ask for a small
> > self-contained testcase for the problem.
> > 

I provided one small self-contained testcase to show 3 problems:

1. Attaching to a non-first thread doesn't work on dynamic binaries.
2. Attaching to a non-first thread doesn't work on static binaries.
3. Attaching to the first thread doesn't work on static binaries.

Can anyone duplicate them?


H.J.



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  9:05           ` H . J . Lu
@ 2001-09-20  0:59             ` Eric Paire
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Paire @ 2001-09-20  0:59 UTC (permalink / raw)
  To: H . J . Lu; +Cc: Mark Kettenis, GDB

> On Wed, Sep 19, 2001 at 04:38:51PM +0200, Eric Paire wrote:
> > > BTW, debugging threaded apps under Linux will always be somewhat
> > > fragile as long as there isn't a sane kernel threads interface to the
> > > kernel.  There should be an interface to stop all threads in a
> > > synchronous way.  Unfortunately, I have no hope that such an interface
> > > will be added to the kernel.
> > > 
> > I don't agree with you: There are at least 2 bugs in the current Linux
> > kernel which makes you think that the support is fragile:
> > 1) SIGSTOP management is not-POSIX conformant
> > 2) reparenting of debugged processes is buggy
> > 
> 
> Could you please provide testcases for them? Even better, do you have
> kernel patches?
> 
For 1), this is not very difficult to show, and I have always wondered why
so few people complained about it. The main effect of SIGSTOP in gdb is that
it makes GDB intrusive in the application, as sending SIGSTOP to a process
wakes it up if it was sleeping in the kernel, and makes the blocked
system call return EINTR.

In most cases, this is hidden by the libc, which wraps system calls to
handle the EINTR errno (such as for pthreads synchronization), but not in
all. The general philosophy of SIGSTOP/SIGCONT is that a process receiving
SIGSTOP while blocked in the kernel should be prevented from returning
to user space if unblocked (while in Linux it returns to user space with
EINTR), which is why GDB is not intrusive on many UNIX systems, and is
intrusive on Linux. But there are so many modifications to make in the
kernel that nobody has (yet) started to implement correct semantics for
SIGSTOP/SIGCONT (as most of the problems are hidden in the glibc).
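
A minimal sketch of the kind of EINTR wrapping described above, assuming
a plain read(2) on a blocking descriptor. The helper name read_retry is
hypothetical and is not taken from glibc; it only shows how a wrapper can
hide an interrupted system call from the application.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: retry read(2) when a signal interrupts it.
   The caller never sees EINTR, so a stop/continue cycle that
   interrupts the call stays invisible to the application. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);

    return n;
}
```

Code that calls read() directly, without such a loop, is where the
intrusion becomes visible: the call fails with EINTR only because a
debugger stopped the process.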

For 2), here is a scenario that did not work in linux-2.2.0:
Process A forks process B, and process C (gdb, ...) calls ptrace_attach()
on process B. If process C (gdb) exits without calling ptrace_detach()
on process B, then
	a) process B is inherited by the init task (instead of process A),
	b) if process A is blocked in wait4(), then it will not be
		awakened if process B dies (since process B is now a child of
		init).

I must admit that I have not checked whether problem 2) still exists on
linux-2.4, but it is clear that this is not an issue for GDB (as we can hope
that gdb plays fair with the kernel and always calls ptrace_detach() before
exiting ;-).

Hope this helps,
-Eric
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web  : http://www.ri.silicomp.com/~paire  | Groupe SILICOMP - Research Institute
Email: eric.paire@ri.silicomp.com         | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71               | F-38610 Gieres
Fax  : +33 (0) 476 51 05 32               | FRANCE



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  7:39         ` Eric Paire
  2001-09-19  9:05           ` H . J . Lu
@ 2001-09-19 13:39           ` Andrew Cagney
  2001-09-20  1:36             ` Eric Paire
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Cagney @ 2001-09-19 13:39 UTC (permalink / raw)
  To: Eric Paire; +Cc: Mark Kettenis, H . J . Lu, GDB

> I already started a thread to explain that that stopping all threads in
> a synchronous way was an illusion: Think of a 2-way processor on which
> 2 threads are running on each processor: If one thread stops, the time
> required by one processor to handle the trap, discover that others
> threads must be stopped, makwe the interprocessor request, ... allows
> the other thread to run thousands of instructions on the second
> processor before being stopped. The result is that you think all threads
> have stopped at the same time, while it's false, even if you have the best
> interface you can think of.

Just an aside, everyone will agree with your point that the synchronized
thread stop model is an illusion.  However, that doesn't make the
model/illusion wrong.  Most other systems still make a synchronised halt 
interface available since it is simple and fast - the complexity of 
having to suspend all related threads being constrained to the kernel.

As a separate issue, it would be good if GDB was able to control threads
with a finer granularity than all/none running.

enjoy,
Andrew


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19 13:39           ` Andrew Cagney
@ 2001-09-20  1:36             ` Eric Paire
  2001-09-20  8:03               ` H . J . Lu
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Paire @ 2001-09-20  1:36 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: Mark Kettenis, H . J . Lu, GDB

> > I already started a thread to explain that that stopping all threads in
> > a synchronous way was an illusion: Think of a 2-way processor on which
> > 2 threads are running on each processor: If one thread stops, the time
> > required by one processor to handle the trap, discover that others
> > threads must be stopped, makwe the interprocessor request, ... allows
> > the other thread to run thousands of instructions on the second
> > processor before being stopped. The result is that you think all threads
> > have stopped at the same time, while it's false, even if you have the best
> > interface you can think of.
> 
> Just an aside, everyone will agree with your point that synchronized 
> thread stop model is an illusion.  However, that doesn't make the 
> model/illusion wrong.  Most other systems still make a synchronised halt 
> interface available since it is simple and fast - the complexity of 
> having to suspend all related threads being constrained to the kernel.
> 
From the user point of view, it seems simple and fast. From the kernel
point of view, this is somewhat difficult to achieve (I already had to deal
with it when I worked on OSF/MACH3.0), particularly on multi-processors
(on uniprocessors, all threads but the current one are not in running
state, while on MP, some of them may be in running state on other
processors). IMHO, although I admit that this is not very easy, we can
nevertheless stop all threads individually with SIGSTOP, so I do
not see why we should add complexity in the kernel to simplify something
we can already do in user space (and certainly, Linus doesn't for now).
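
A minimal sketch of stopping each thread individually from user space,
assuming the LinuxThreads model where every thread is a separate process
with its own pid. The helper name stop_each is hypothetical, and the
sketch assumes the targets are ordinary children of the caller; a
debugger would be ptrace-attached instead and may need different
waitpid() flags.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: stop every process in pids[] one by one.
   Returns the number of processes confirmed as stopped. */
size_t stop_each(const pid_t *pids, size_t n)
{
    size_t stopped = 0;

    /* First pass: send SIGSTOP to every target. */
    for (size_t i = 0; i < n; i++)
        kill(pids[i], SIGSTOP);

    /* Second pass: wait until each stop has been reported. */
    for (size_t i = 0; i < n; i++) {
        int status;

        if (waitpid(pids[i], &status, WUNTRACED) == pids[i]
            && WIFSTOPPED(status))
            stopped++;
    }
    return stopped;
}
```

The two passes make the window between the first and the last stop
explicit: targets that have not yet received their signal keep running
in the meantime.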

There is another difficulty on Linux which seems much more important than
these. There is no support for MT core dumps. As Linus has always
refused to add such functionality to the kernel (which is somewhat similar
to your simple interface to stop all threads), a solution should be thought
of in user space.

-Eric
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web  : http://www.ri.silicomp.com/~paire  | Groupe SILICOMP - Research Institute
Email: eric.paire@ri.silicomp.com         | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71               | F-38610 Gieres
Fax  : +33 (0) 476 51 05 32               | FRANCE



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-20  1:36             ` Eric Paire
@ 2001-09-20  8:03               ` H . J . Lu
  2001-09-20 21:49                 ` Eric Paire
  0 siblings, 1 reply; 21+ messages in thread
From: H . J . Lu @ 2001-09-20  8:03 UTC (permalink / raw)
  To: Eric Paire; +Cc: Andrew Cagney, Mark Kettenis, GDB

On Thu, Sep 20, 2001 at 10:36:06AM +0200, Eric Paire wrote:
> 
> There is another difficulty on Linux which seems much more important than
> these ones. There is no support for MT core dumps. As Linus has always
> refused to add such functionality in the kernel (which is somewhat similar
> with your simple interface to stop all threads), a solution should be though
> of in the user space.
> 

Try the current Red Hat kernel/ac kernel. They support it. The patch is
very small. I am enclosing it here.

H.J.
--
Index: linux-2.4/fs/binfmt_aout.c
===================================================================
RCS file: /trillian/src/cvs_root/linux-2.4/fs/binfmt_aout.c,v
retrieving revision 1.1
diff -u -r1.1 binfmt_aout.c
--- linux-2.4/fs/binfmt_aout.c	2001/02/06 23:40:30	1.1
+++ linux-2.4/fs/binfmt_aout.c	2001/02/27 16:50:43
@@ -31,7 +31,8 @@
 
 static int load_aout_binary(struct linux_binprm *, struct pt_regs * regs);
 static int load_aout_library(struct file*);
-static int aout_core_dump(long signr, struct pt_regs * regs, struct file *file);
+static int aout_core_dump(long signr, struct pt_regs * regs,
+	struct file *file, struct mm_struct * mm);
 
 extern void dump_thread(struct pt_regs *, struct user *);
 
@@ -78,7 +79,8 @@
  * dumping of the process results in another error..
  */
 
-static int aout_core_dump(long signr, struct pt_regs * regs, struct file *file)
+static int aout_core_dump(long signr, struct pt_regs * regs,
+	struct file * file, struct mm_struct * mm)
 {
 	mm_segment_t fs;
 	int has_dumped = 0;
Index: linux-2.4/fs/binfmt_elf.c
===================================================================
RCS file: /trillian/src/cvs_root/linux-2.4/fs/binfmt_elf.c,v
retrieving revision 1.1
diff -u -r1.1 binfmt_elf.c
--- linux-2.4/fs/binfmt_elf.c	2001/02/06 23:40:30	1.1
+++ linux-2.4/fs/binfmt_elf.c	2001/02/27 16:50:43
@@ -56,7 +56,8 @@
  * don't even try.
  */
 #ifdef USE_ELF_CORE_DUMP
-static int elf_core_dump(long signr, struct pt_regs * regs, struct file * file);
+static int elf_core_dump(long signr, struct pt_regs * regs,
+	struct file * file, struct mm_struct * mm);
 #else
 #define elf_core_dump	NULL
 #endif
@@ -981,7 +982,8 @@
  * and then they are actually written out.  If we run out of core limit
  * we just truncate.
  */
-static int elf_core_dump(long signr, struct pt_regs * regs, struct file * file)
+static int elf_core_dump(long signr, struct pt_regs * regs,
+	struct file * file, struct mm_struct * mm)
 {
 	int has_dumped = 0;
 	mm_segment_t fs;
@@ -998,7 +1000,7 @@
 	elf_fpregset_t fpu;		/* NT_PRFPREG */
 	struct elf_prpsinfo psinfo;	/* NT_PRPSINFO */
 
-	segs = current->mm->map_count;
+	segs = mm->map_count;
 
 #ifdef DEBUG
 	printk("elf_core_dump: %d segs %lu limit\n", segs, limit);
@@ -1158,7 +1160,7 @@
 	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
 
 	/* Write program headers for segments dump */
-	for(vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
+	for(vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
 		struct elf_phdr phdr;
 		size_t sz;
 
@@ -1187,7 +1189,7 @@
 
 	DUMP_SEEK(dataoff);
 
-	for(vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
+	for(vma = mm->mmap; vma != NULL; vma = vma->vm_next) {
 		unsigned long addr;
 
 		if (!maydump(vma))
Index: linux-2.4/fs/exec.c
===================================================================
RCS file: /trillian/src/cvs_root/linux-2.4/fs/exec.c,v
retrieving revision 1.2
diff -u -r1.2 exec.c
--- linux-2.4/fs/exec.c	2001/02/07 01:17:26	1.2
+++ linux-2.4/fs/exec.c	2001/02/27 16:50:43
@@ -916,16 +916,18 @@
 
 int do_coredump(long signr, struct pt_regs * regs)
 {
+	struct mm_struct *mm;
 	struct linux_binfmt * binfmt;
-	char corename[6+sizeof(current->comm)];
+	char corename[6+sizeof(current->comm)+10];
 	struct file * file;
 	struct inode * inode;
+	int r;
 
 	lock_kernel();
 	binfmt = current->binfmt;
 	if (!binfmt || !binfmt->core_dump)
 		goto fail;
-	if (!current->dumpable || atomic_read(&current->mm->mm_users) != 1)
+	if (!current->dumpable)
 		goto fail;
 	current->dumpable = 0;
 	if (current->rlim[RLIMIT_CORE].rlim_cur < binfmt->min_coredump)
@@ -937,6 +939,8 @@
 #else
 	corename[4] = '\0';
 #endif
+ 	if (atomic_read(&current->mm->mm_users) != 1)
+ 		sprintf(&corename[4], ".%d", current->pid);
 	file = filp_open(corename, O_CREAT | 2 | O_NOFOLLOW, 0600);
 	if (IS_ERR(file))
 		goto fail;
@@ -954,10 +958,29 @@
 		goto close_fail;
 	if (do_truncate(file->f_dentry, 0) != 0)
 		goto close_fail;
-	if (!binfmt->core_dump(signr, regs, file))
-		goto close_fail;
+	/*
+	 *  Copy the mm structure to avoid potential races with
+	 *    other threads
+	 */
+	if ((mm = kmem_cache_alloc(mm_cachep, SLAB_KERNEL)) == NULL)
+		goto close_fail;
+	memcpy(mm, current->mm, sizeof(*mm));
+	if (!mm_init(mm)) {
+		kmem_cache_free(mm_cachep, mm);
+		goto close_fail;
+	}
+	down(&current->mm->mmap_sem);
+	r = dup_mmap(mm);
+	up(&current->mm->mmap_sem);
+	if (r) {
+		mmput(mm);
+		goto close_fail;
+	}
+	r = binfmt->core_dump(signr, regs, file, mm);
+	mmput(mm);
 	unlock_kernel();
-	filp_close(file, NULL);
+	if (r)
+		filp_close(file, NULL);
 	return 1;
 
 close_fail:
Index: linux-2.4/include/linux/binfmts.h
===================================================================
RCS file: /trillian/src/cvs_root/linux-2.4/include/linux/binfmts.h,v
retrieving revision 1.1
diff -u -r1.1 binfmts.h
--- linux-2.4/include/linux/binfmts.h	2001/02/06 23:41:21	1.1
+++ linux-2.4/include/linux/binfmts.h	2001/02/27 16:50:43
@@ -41,7 +41,8 @@
 	struct module *module;
 	int (*load_binary)(struct linux_binprm *, struct  pt_regs * regs);
 	int (*load_shlib)(struct file *);
-	int (*core_dump)(long signr, struct pt_regs * regs, struct file * file);
+	int (*core_dump)(long signr, struct pt_regs * regs,
+			struct file * file, struct mm_struct *mm);
 	unsigned long min_coredump;	/* minimal dump size */
 };
 
Index: linux-2.4/kernel/fork.c
===================================================================
RCS file: /trillian/src/cvs_root/linux-2.4/kernel/fork.c,v
retrieving revision 1.2
diff -u -r1.2 fork.c
--- linux-2.4/kernel/fork.c	2001/02/07 01:17:29	1.2
+++ linux-2.4/kernel/fork.c	2001/02/27 16:50:43
@@ -122,7 +122,7 @@
 	return last_pid;
 }
 
-static inline int dup_mmap(struct mm_struct * mm)
+int dup_mmap(struct mm_struct * mm)
 {
 	struct vm_area_struct * mpnt, *tmp, **pprev;
 	int retval;
@@ -197,7 +197,7 @@
 #define allocate_mm()	(kmem_cache_alloc(mm_cachep, SLAB_KERNEL))
 #define free_mm(mm)	(kmem_cache_free(mm_cachep, (mm)))
 
-static struct mm_struct * mm_init(struct mm_struct * mm)
+struct mm_struct * mm_init(struct mm_struct * mm)
 {
 	atomic_set(&mm->mm_users, 1);
 	atomic_set(&mm->mm_count, 1);



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-20  8:03               ` H . J . Lu
@ 2001-09-20 21:49                 ` Eric Paire
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Paire @ 2001-09-20 21:49 UTC (permalink / raw)
  To: H . J . Lu; +Cc: Andrew Cagney, Mark Kettenis, GDB

> On Thu, Sep 20, 2001 at 10:36:06AM +0200, Eric Paire wrote:
> > 
> > There is another difficulty on Linux which seems much more important than
> > these ones. There is no support for MT core dumps. As Linus has always
> > refused to add such functionality in the kernel (which is somewhat similar
> > with your simple interface to stop all threads), a solution should be though
> > of in the user space.
> > 
> 
> Try the current Red Hat kernel/ac kernel. They support it. The patch is
> very small. I am enclosing it here.
> 
Thanks, I am aware of this patch, which has been flying around for years.
I have never tested it, but I think that although you are able to know where
the faulty thread is (with bt), you would not be able to get a correct
stack trace of the other threads (as their register state is not in the dump,
you do not know where to start the stack analysis). Just a guess, which
makes me think that this patch is just a first step towards a general
multi-threaded dump.
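
A minimal sketch of how one might check whether a core file carries
register state for more than one thread, assuming the usual ELF core
layout in which each thread contributes one NT_PRSTATUS note named
"CORE" to the PT_NOTE segment. The helper name count_prstatus is
hypothetical, and the sketch assumes the note segment has already been
read into memory.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round a note field size up to the 4-byte alignment used in PT_NOTE. */
static size_t note_align(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical helper: count "CORE"/NT_PRSTATUS notes in a PT_NOTE
   segment held in buf[0..len). */
size_t count_prstatus(const unsigned char *buf, size_t len)
{
    size_t off = 0, count = 0;

    while (len - off >= sizeof(Elf64_Nhdr)) {
        Elf64_Nhdr nh;
        size_t body;

        memcpy(&nh, buf + off, sizeof nh);  /* avoid unaligned access */
        off += sizeof nh;

        body = note_align(nh.n_namesz) + note_align(nh.n_descsz);
        if (body > len - off)
            break;                          /* truncated note: stop */

        /* The name field, including its NUL, starts right after the header. */
        if (nh.n_type == NT_PRSTATUS && nh.n_namesz == 5
            && memcmp(buf + off, "CORE", 5) == 0)
            count++;

        off += body;
    }
    return count;
}
```

A result of 1 would match the guess above: only the faulting thread has
its registers in the dump, so there is no starting point for unwinding
the others.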

Thanks anyway for the info,
-Eric
P.S. I have always thought that there should be a way to get the missing
information into a core dump, but the core dump is going to be
linuxthread-dependent and not in the standard format of a multithreaded ELF
core dump. Unfortunately, I have no time left to make such an investigation...
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web  : http://www.ri.silicomp.com/~paire  | Groupe SILICOMP - Research Institute
Email: eric.paire@ri.silicomp.com         | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71               | F-38610 Gieres
Fax  : +33 (0) 476 51 05 32               | FRANCE



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  6:56       ` Mark Kettenis
  2001-09-19  7:39         ` Eric Paire
@ 2001-09-19  9:10         ` H . J . Lu
  1 sibling, 0 replies; 21+ messages in thread
From: H . J . Lu @ 2001-09-19  9:10 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: GDB

On Wed, Sep 19, 2001 at 03:53:42PM +0200, Mark Kettenis wrote:
> 
> > BTW, people may be very disappointed at the current Linuxthreads
> > support in gdb 5.1.
> 
> If they are they should help improving it.  Several people have
> reported problems.  Most of these I have been unable to reproduce.
> Hardly anyone even bothers to answer me if I ask for a small
> self-contained testcase for the problem.

That is why I thanked the user who reported the problem. Now the
problem is known, with a small self-contained testcase. Can we fix it
now? That is all I care about.


H.J.



* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-17 19:13   ` H . J . Lu
  2001-09-18 13:56     ` H . J . Lu
@ 2001-09-19  6:32     ` Mark Kettenis
  2001-09-19  9:16       ` H . J . Lu
  1 sibling, 1 reply; 21+ messages in thread
From: Mark Kettenis @ 2001-09-19  6:32 UTC (permalink / raw)
  To: H . J . Lu; +Cc: GDB

"H . J . Lu" <hjl@lucon.org> writes:

> It looks like with gdb 5.1, I have to attach the very first thread. Is
> that documented anywhere? Shouldn't gdb find the very first thread
> and attach it for me?

Yep, you'll have to attach to the very first thread.  No it isn't
documented anywhere.  Yes, GDB should at least try to find out what's
the "very first thread", and indeed right now it doesn't.

Since the kernel treats the initial process differently from the
"cloned" processes, GDB has to know about the initial process.
There's no easy way to get this information from the kernel, so GDB
must either get the information from the user, or from the threads
library.  At the point that I wrote the code I didn't immediately see
how to get the necessary info from the threads library, so the user
must specify it.  I'll try to find the proper place to document this,
and think again about getting the info from the threads library.

Mark


* Re: Is the current gdb 5.1 broken for Linuxthreads?
  2001-09-19  6:32     ` Mark Kettenis
@ 2001-09-19  9:16       ` H . J . Lu
  0 siblings, 0 replies; 21+ messages in thread
From: H . J . Lu @ 2001-09-19  9:16 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: GDB

On Wed, Sep 19, 2001 at 03:28:55PM +0200, Mark Kettenis wrote:
> "H . J . Lu" <hjl@lucon.org> writes:
> 
> > It looks like with gdb 5.1, I have to attach the very first thread. Is
> > that documented anywhere? Shouldn't gdb find the very first thread
> > and attach it for me?
> 
> Yep, you'll have to attach to the very first thread.  No it isn't
> documented anywhere.  Yes, GDB should at least try to find out what's
> the "very first thread", and indeed right now it doesn't.
> 
> Since the kernel treats the initial process differently from the
> "cloned" processes, GDB has to know about the initial process.
> There's no easy way to get this information from the kernel, so GDB
> must either get the information from the user, or from the threads
> library.  At the point that I wrote the code I didn't immediately see
> how to get the necessary info from the threads library, so the user
> must specify it.  I'll try to find the proper place to document this,
> and think again about getting the info from the threads library.
> 

The only problem I saw so far is that gdb calls wait () on cloned processes
when we are not attaching to the very first thread. What else could go
wrong if we are not attaching to the very first thread?
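
A minimal sketch of waiting for a process that may be a "clone" child,
assuming the Linux rule that plain waitpid() only sees children whose
exit signal is SIGCHLD, while the others need the __WCLONE flag. The
helper name wait_lwp is hypothetical, and whether this matches the
failure seen here is an assumption.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for one LWP whether the kernel regards it
   as an ordinary child or as a "clone" child. */
pid_t wait_lwp(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, 0);

    /* Plain waitpid() does not see clone children; retry with __WCLONE. */
    if (r == -1 && errno == ECHILD)
        r = waitpid(pid, status, __WCLONE);

    return r;
}
```

The fallback only fires on ECHILD, so ordinary children are still
waited for in the usual way.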


H.J.



end of thread, other threads:[~2001-09-21  8:39 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2001-09-21  2:27 Is the current gdb 5.1 broken for Linuxthreads? James Cownie
2001-09-21  5:04 ` Eric Paire
2001-09-21  5:25   ` James Cownie
2001-09-21  8:35   ` H . J . Lu
2001-09-21  8:39   ` Andrew Cagney
  -- strict thread matches above, loose matches on Subject: below --
2001-09-17 12:47 H . J . Lu
     [not found] ` <20010917161350.A25349@lucon.org>
2001-09-17 19:13   ` H . J . Lu
2001-09-18 13:56     ` H . J . Lu
2001-09-19  0:46       ` Eli Zaretskii
2001-09-19  8:43         ` H . J . Lu
2001-09-19  6:56       ` Mark Kettenis
2001-09-19  7:39         ` Eric Paire
2001-09-19  9:05           ` H . J . Lu
2001-09-20  0:59             ` Eric Paire
2001-09-19 13:39           ` Andrew Cagney
2001-09-20  1:36             ` Eric Paire
2001-09-20  8:03               ` H . J . Lu
2001-09-20 21:49                 ` Eric Paire
2001-09-19  9:10         ` H . J . Lu
2001-09-19  6:32     ` Mark Kettenis
2001-09-19  9:16       ` H . J . Lu
