Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
* [0/9]#2 Fix lost siginfo_t
@ 2010-08-30  7:10 Jan Kratochvil
  2010-08-30 19:50 ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kratochvil @ 2010-08-30  7:10 UTC (permalink / raw)
  To: gdb-patches

series #2 of: http://sourceware.org/ml/gdb-patches/2010-07/msg00405.html

Hi,

currently GDB's postponing and resending signals across threads drops the
associated siginfo_t information.  This can break real world inferiors
	https://bugzilla.redhat.com/attachment.cgi?id=413842
and it is also tested by a new testcase.

IMO all current `int status', `int host_signal' and `enum target_signal'
should be associated with their siginfo_t as otherwise they carry incomplete
information.  This is implemented by this patchset although it brings sizeof
(target_signal) to 144.  It is currently passed by value everywhere, it could
be shortened by the currently unused 80 bytes siginfo_t padding to 64 bytes.

I tried to make target_signal a pointer with reference counter.  I have the
unfinished patch here but I really do not find it viable.  All the assignments
and passing by value to functions make it difficult to catch all the cases.
With C++ GDB this patchset could be greatly simplified with fixed performance.

Fortunately target_signal does not have to be allocated dynamically.  The
maximum-over-all-targets of sizeof (siginfo_t) is known during compilation.
Even for remote targets the tdep file must be compiled in.  And GDB currently
does not support any dlopen()able plugin tdeps (*).  Therefore currently if
none of the siginfo-supporting targets ({amd64,i386,arm}-linux) is compiled in
this patchset has absolutely no new overhead.

(*) One can probably dlopen() a new tdep .so file through python.  Let this be
    unsupported, it could internal_error in target_signal_siginfo_set otherwise.

Some native host files probably got broken needing trivial updates for the
target_signal_t->struct change.  I did some obvious ones but I could not catch
all without the specific host machine for a build.

Planned gdb/remote.c part will probably need to modify the
target_signal_siginfo_* functions a bit, therefore only RFCing that part now
to be checked-in only together with the gdb/remote.c part later.

No regressions on {x86_64,x86_64-m32,i686}-fedora14snapshot-linux-gnu.
  New gdb.threads/siginfo-threads.exp fully PASSes there.
{ppc64,s390x}-rhel60snapshot-linux-gnu: no regressions.
  New gdb.threads/siginfo-threads.exp PASSes but si_* checks are unsupported
  as gdbarch_get_siginfo_type is not supported there.
{ppc64(ppc),ia64,s390x}-rhel56-linux-gnu: no regressions.
  New gdb.threads/siginfo-threads.exp is completely UNSUPPORTED as the kernel
  there has no rt_tgsigqueueinfo syscall.

Not yet done:

 * gdb/remote.c still does not feature the siginfo_t preservation.
   I definitely need/plan to implement it for gdb/remote.c .

   Still the communication will probably get slowed down - both by more
   round-trips to always fetch+push associated TARGET_OBJECT_SIGNAL_INFO and
   also by all the 128 bytes blocks transferred back and forth.  The protocol
   would be probably appropriate to extend a bit for it.  I would normally
   rather care about ugdb (Linux kernel server, GPLv2) than FSF GDB gdbserver,
   unless required for the FSF GDB gdb/remote.c counterpart patch acceptance.

Not planned for this patchset:

 * gdb/gdbserver/ siginfo_t preservation.  I would normally
   rather care about ugdb (Linux kernel server, GPLv2) than FSF GDB gdbserver,
   unless required for the FSF GDB gdb/remote.c counterpart patch acceptance.

 * sim/ did not use `enum target_signal' and now it still does not use
   `target_signal_t' and none of the associated accessors.  There are several
   new FIXMEs at the gdb/remote-sim.c interface.

 * As target_signal_t is now gdbarch-specific for the specific frame I guess
   if you get in SPU frame a signal with "stop print pass", do `(gdb) return'
   to get out into a PPC frame and do `(gdb) continue' GDB will gdb_assert due
   to non-matching gdbarch.  One could probably extend siginfo_fixup() for
   this case although it is more questionable what behavior is correct in such
   case.


Thanks,
Jan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-30  7:10 [0/9]#2 Fix lost siginfo_t Jan Kratochvil
@ 2010-08-30 19:50 ` Pedro Alves
  2010-08-30 20:11   ` Jan Kratochvil
  0 siblings, 1 reply; 8+ messages in thread
From: Pedro Alves @ 2010-08-30 19:50 UTC (permalink / raw)
  To: gdb-patches; +Cc: Jan Kratochvil

I must confess that my knee-jerk reaction to the main idea
of the patchset is "I'm not convinced this is a good idea".
A thread can't be stopped for more than one reason at the
same time, so why isn't target_wait plus an on-the-side
interface to get at the expensive-to-get siginfo enough?

I've just extracted the testsuite hunks from patches 7 and 8,
so that I've got the new gdb.threads/siginfo-threads.exp test
on my pristine mainline checkout, with no other changes from the
series, and ran the test against amd64-linux gdbserver:

>make check RUNTESTFLAGS="--target_board=native-gdbserver siginfo-threads.exp"
(..)
                === gdb Summary ===

# of expected passes            22



compared to native amd64-linux gdb, which indeed has these failures:

Running ../../../src/gdb/testsuite/gdb.threads/siginfo-threads.exp ...
FAIL: gdb.threads/siginfo-threads.exp: signal 1 si_signo
FAIL: gdb.threads/siginfo-threads.exp: signal 1 si_pid
FAIL: gdb.threads/siginfo-threads.exp: signal 2 si_signo
FAIL: gdb.threads/siginfo-threads.exp: signal 2 si_pid
FAIL: gdb.threads/siginfo-threads.exp: signal 3 si_signo
FAIL: gdb.threads/siginfo-threads.exp: signal 3 si_pid
FAIL: gdb.threads/siginfo-threads.exp: continue to break-at-exit

                === gdb Summary ===

# of expected passes            15
# of unexpected failures        7


I'm guessing that this is because of gdbserver/linux-low.c's
much better handling of pending signals than linux-nat.c's.
(see linux-low.c's lwp->pending_signals, and linux_resume_one_lwp,
for example).

-- 
Pedro Alves


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-30 19:50 ` Pedro Alves
@ 2010-08-30 20:11   ` Jan Kratochvil
  2010-08-30 20:49     ` Mark Kettenis
  2010-08-31 10:46     ` Pedro Alves
  0 siblings, 2 replies; 8+ messages in thread
From: Jan Kratochvil @ 2010-08-30 20:11 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Mon, 30 Aug 2010 21:50:02 +0200, Pedro Alves wrote:
> I must confess that my knee-jerk reaction to the main idea
> of the patchset is "I'm not convinced this is a good idea".
> A thread can't be stopped for more than one reason at the
> same time, so why isn't target_wait plus an on-the-side
> interface to get at the expensive-to-get siginfo enough?

Because linux-nat.c internals stop it on a different signal (SIGSTOP one) than
the interestign one (e.g. SIGUSR1) for which GDB was stopping.


> and ran the test against amd64-linux gdbserver:

I can confirm FSF gdbserver also works for me with
gdb.threads/siginfo-threads.exp .  The testcase also correctly SIGSTOPs its
getppid(), therefore gdbserver, which is the right target for the testcase.


> I'm guessing that this is because of gdbserver/linux-low.c's
> much better handling of pending signals than linux-nat.c's.
> (see linux-low.c's lwp->pending_signals, and linux_resume_one_lwp,
> for example).

I had to fix linux-nat.c for backport reasons anyway but I probably could keep
it only internal (and hack it more dirty without the FSF import intention).
I see I went needlessly far.

I am not sure how much interest is in pusing it for FSF linux-nat.c . I made
it more as a proof of concept before I start fixing gdb/remote.c (for ugdb).
I did not expect gdb/remote.c needs no changes at all.

I still have not seen any FSF general future development direction,
linux-nat.c is IMO obsolete already, at least due to FSF gdbserver.  OK to
post some patches to run in remote mode during default "run" etc. commands?


Thanks,
Jan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-30 20:11   ` Jan Kratochvil
@ 2010-08-30 20:49     ` Mark Kettenis
  2010-08-30 21:13       ` Jan Kratochvil
  2010-08-31  7:26       ` Doug Evans
  2010-08-31 10:46     ` Pedro Alves
  1 sibling, 2 replies; 8+ messages in thread
From: Mark Kettenis @ 2010-08-30 20:49 UTC (permalink / raw)
  To: jan.kratochvil; +Cc: pedro, gdb-patches

> Date: Mon, 30 Aug 2010 22:11:10 +0200
> From: Jan Kratochvil <jan.kratochvil@redhat.com>
> 
> On Mon, 30 Aug 2010 21:50:02 +0200, Pedro Alves wrote:
> > I must confess that my knee-jerk reaction to the main idea
> > of the patchset is "I'm not convinced this is a good idea".
> > A thread can't be stopped for more than one reason at the
> > same time, so why isn't target_wait plus an on-the-side
> > interface to get at the expensive-to-get siginfo enough?
> 
> Because linux-nat.c internals stop it on a different signal (SIGSTOP one) than
> the interestign one (e.g. SIGUSR1) for which GDB was stopping.
> 
> 
> > and ran the test against amd64-linux gdbserver:
> 
> I can confirm FSF gdbserver also works for me with
> gdb.threads/siginfo-threads.exp .  The testcase also correctly SIGSTOPs its
> getppid(), therefore gdbserver, which is the right target for the testcase.
> 
> 
> > I'm guessing that this is because of gdbserver/linux-low.c's
> > much better handling of pending signals than linux-nat.c's.
> > (see linux-low.c's lwp->pending_signals, and linux_resume_one_lwp,
> > for example).
> 
> I had to fix linux-nat.c for backport reasons anyway but I probably could keep
> it only internal (and hack it more dirty without the FSF import intention).
> I see I went needlessly far.
> 
> I am not sure how much interest is in pusing it for FSF linux-nat.c . I made
> it more as a proof of concept before I start fixing gdb/remote.c (for ugdb).
> I did not expect gdb/remote.c needs no changes at all.
> 
> I still have not seen any FSF general future development direction,
> linux-nat.c is IMO obsolete already, at least due to FSF gdbserver.  OK to
> post some patches to run in remote mode during default "run" etc. commands?

That seems rather backwards.  If gdbserver works so well, the
linux-nat.c code should be able to work even better.

I fear that it's time for another rewrite of linux-nat.c.  I think I
started lin-lwp.c back in the Linux 2.2 days, when supporting Linux
2.0 was still very much alive and needed to be supported.  Support for
debugging threads was minimal in 2.2 and pretty much non-existent in
2.0.  That code was a horrible horrible hack, and the current code is
just hacks stacked on hack stacked on hacks on top of that.  It is
time to drop support for threads on ancient systems and the old glibc
threads library that goes with them.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-30 20:49     ` Mark Kettenis
@ 2010-08-30 21:13       ` Jan Kratochvil
  2010-08-31  7:26       ` Doug Evans
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Kratochvil @ 2010-08-30 21:13 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: pedro, gdb-patches

On Mon, 30 Aug 2010 22:47:25 +0200, Mark Kettenis wrote:
> That seems rather backwards.  If gdbserver works so well, the
> linux-nat.c code should be able to work even better.

Just due to inevitable multi-host systems (read "cloud"; software is still
slower and GHz have their technological limit) the linux-nat.c model (=without
a wire protocol) is a history.


Thanks,
Jan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-30 20:49     ` Mark Kettenis
  2010-08-30 21:13       ` Jan Kratochvil
@ 2010-08-31  7:26       ` Doug Evans
  2010-08-31 15:58         ` Joel Brobecker
  1 sibling, 1 reply; 8+ messages in thread
From: Doug Evans @ 2010-08-31  7:26 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: jan.kratochvil, pedro, gdb-patches

On Mon, Aug 30, 2010 at 1:47 PM, Mark Kettenis <mark.kettenis@xs4all.nl> wrote:
> I fear that it's time for another rewrite of linux-nat.c.

For reference sake,
I think such a rewrite wouldn't go far enough.

It's time to remove all the duplicated effort in gdb vs gdbserver.
[E.g. requiring gdbserver for tracepoints?  blech]
My $0.02.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-30 20:11   ` Jan Kratochvil
  2010-08-30 20:49     ` Mark Kettenis
@ 2010-08-31 10:46     ` Pedro Alves
  1 sibling, 0 replies; 8+ messages in thread
From: Pedro Alves @ 2010-08-31 10:46 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: gdb-patches

On Monday 30 August 2010 21:11:10, Jan Kratochvil wrote:
> On Mon, 30 Aug 2010 21:50:02 +0200, Pedro Alves wrote:
> > I must confess that my knee-jerk reaction to the main idea
> > of the patchset is "I'm not convinced this is a good idea".
> > A thread can't be stopped for more than one reason at the
> > same time, so why isn't target_wait plus an on-the-side
> > interface to get at the expensive-to-get siginfo enough?
> 
> Because linux-nat.c internals stop it on a different signal (SIGSTOP one) than
> the interestign one (e.g. SIGUSR1) for which GDB was stopping.

Ah.  Yeah, gdbserver/linux-low.c does not in fact for a SIGSTOP
out of the inferior and requeues signals like linux-nat.c does.
When an lwp stops with a non-SIGSTOP signal, it just sets the
lwp->stop_expected flag (meaning, there's a SIGSTOP in the kernel
signal queue that is expected, because we put it there), and
leaves the lwp stopped as is.  There are pros and cons in
doing it that way or doing linux-nat.c's way.

> I had to fix linux-nat.c for backport reasons anyway but I probably could keep
> it only internal (and hack it more dirty without the FSF import intention).
> I see I went needlessly far.

Sorry.  :-/

> I am not sure how much interest is in pusing it for FSF linux-nat.c . I made
> it more as a proof of concept before I start fixing gdb/remote.c (for ugdb).
> I did not expect gdb/remote.c needs no changes at all.
> 
> I still have not seen any FSF general future development direction,
> linux-nat.c is IMO obsolete already, at least due to FSF gdbserver>.
> OK to
> post some patches to run in remote mode during default "run" etc. commands?

gdb and gdbserver codebases are different, and each has features not
supported by the other, so, dropping one of them now instantly drops
supported features.

What I have been trying to do over the last couple of years that
I've had the opportunity of working on gdbserver, is progressively make
the codebases converge, design, API, and implementation wise, taking the
ideas from each (gdbserver's target_wait interface, no remote protocol
in the backends, breakpoint canceling, etc.).  If we keep
doing that, eventually, one of the backends will be a superset of the other
(my bets on gdbserver's), and we can then consider sharing code between
them, or even make gdb invoke gdbserver as a separate binary.  Of course,
if someone were to work on doing the conciliation as _main focus_ instead
of piggy backing that on other work that customers actually care for, then
things would move faster...

-- 
Pedro Alves


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [0/9]#2 Fix lost siginfo_t
  2010-08-31  7:26       ` Doug Evans
@ 2010-08-31 15:58         ` Joel Brobecker
  0 siblings, 0 replies; 8+ messages in thread
From: Joel Brobecker @ 2010-08-31 15:58 UTC (permalink / raw)
  To: Doug Evans; +Cc: Mark Kettenis, jan.kratochvil, pedro, gdb-patches

> > I fear that it's time for another rewrite of linux-nat.c.
> 
> For reference sake,
> I think such a rewrite wouldn't go far enough.

FWIW, If that helps simplifying the code, I am all for it too.
But I also agree with your assessement.  The "target-specific"
IWBN if the gdbserver -low files and the gdb -nat files were
merged, one day.

-- 
Joel


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-08-31 15:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-30  7:10 [0/9]#2 Fix lost siginfo_t Jan Kratochvil
2010-08-30 19:50 ` Pedro Alves
2010-08-30 20:11   ` Jan Kratochvil
2010-08-30 20:49     ` Mark Kettenis
2010-08-30 21:13       ` Jan Kratochvil
2010-08-31  7:26       ` Doug Evans
2010-08-31 15:58         ` Joel Brobecker
2010-08-31 10:46     ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox