[gdbserver] Problems trying to resume dead threads

Mirror of the gdb-patches mailing list

* [gdbserver] Problems trying to resume dead threads
@ 2008-07-19 17:17 Ulrich Weigand
  2008-08-04 13:40 ` Daniel Jacobowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Weigand @ 2008-07-19 17:17 UTC
  To: gdb-patches

Hello,

gdbserver on Linux seems to have difficulties handling
the case where a thread dies while it is stopped.  This can
happen during the loop over all threads in linux_resume:

1. Thread A is resumed and starts running
2. Thread A causes Thread B to be killed (e.g. by simply
   calling exit ())
3. gdbserver tries and fails to resume Thread B
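
The three steps above can be modelled without ptrace at all.  What
follows is only a toy sketch: struct fake_thread, fake_resume and
resume_all are made-up names standing in for gdbserver's thread list
and for the PTRACE_CONT call; none of them is gdbserver code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one traced thread (not a gdbserver type).  */
struct fake_thread
{
  int alive;        /* nonzero while the thread still exists */
  int kills_peer;   /* index of a thread this one kills once running, or -1 */
};

/* Stand-in for ptrace (PTRACE_CONT, ...): returns -1 with errno set to
   ESRCH if the thread is already gone, 0 otherwise.  A resumed thread
   may kill a peer that has not been resumed yet.  */
int
fake_resume (struct fake_thread *threads, size_t i)
{
  if (!threads[i].alive)
    {
      errno = ESRCH;
      return -1;
    }
  if (threads[i].kills_peer >= 0)
    threads[threads[i].kills_peer].alive = 0;
  return 0;
}

/* Resume every thread in order, counting how many resumes hit ESRCH.  */
size_t
resume_all (struct fake_thread *threads, size_t n)
{
  size_t failures = 0;
  size_t i;

  for (i = 0; i < n; i++)
    if (fake_resume (threads, i) != 0 && errno == ESRCH)
      failures++;
  return failures;
}
```

With two threads where the first kills the second, resume_all reports
one ESRCH failure; with two independent threads it reports none.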

The appended test case shows an extreme example of this:
If you run it under gdb/gdbserver, interrupt with Ctrl-C,
and then issue "set terminate = 1" before continuing, 
you'll notice the failure:

(gdb) target remote :1234
Remote debugging using :1234
[Switching to Thread 14663]
0x00002af301f82b60 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.

Program received signal SIGINT, Interrupt.
0x00002af30247c491 in clone () from /lib64/libc.so.6
(gdb) set terminate = 1
(gdb) c
Continuing.
warning: Remote failure reply: E01

Program received signal 0, Signal 0.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x0000000000000000 in ?? ()

uweigand@upg1:~/fsf/gdb-head-build/gdb> ./gdbserver/gdbserver :1234 ./test
Process ./test created; pid = 14246
Listening on port 1234
Remote debugging from host 127.0.0.1
Warning: ptrace(regsets_store_inferior_registers): No such process
Warning: ptrace(regsets_store_inferior_registers): No such process
ptrace: No such process.
input_interrupt, count = 1 c = 36 ('$')
ptrace(regsets_fetch_inferior_registers) PID=14246: No such process
ptrace(regsets_fetch_inferior_registers) PID=14246: No such process
Killing inferior

Interestingly enough, running the same test case under the
GDB native target works most of the time (although I did get it
to fail at least once) -- even though on inspection it appeared
the loop over threads in linux_nat_resume should have the same
problem ...

Any suggestions how to fix this?

Bye,
Ulrich

#include <pthread.h>
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>

volatile int terminate = 0;

void *
thread_function (void *arg)
{
  int x = * (int *) arg;

  while (!terminate)
    ;

  exit (x);

  return NULL;
}

int 
main (int argc, char **argv)
{
  pthread_attr_t attr;
  pthread_t threads[256];
  int args[256];
  int i, j;

  pthread_attr_init (&attr);
  pthread_attr_setstacksize (&attr, PTHREAD_STACK_MIN);

  for (j = 0; j < 256; ++j)
    {
      args[j] = j;
      pthread_create (&threads[j], &attr, thread_function, &args[j]);
    }

  pthread_attr_destroy (&attr);

  return 0;
}
-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [gdbserver] Problems trying to resume dead threads
  2008-07-19 17:17 [gdbserver] Problems trying to resume dead threads Ulrich Weigand
@ 2008-08-04 13:40 ` Daniel Jacobowitz
  2008-08-04 18:25   ` Ulrich Weigand
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2008-08-04 13:40 UTC
  To: Ulrich Weigand; +Cc: gdb-patches

Sorry - as you can see, I am once again behind on gdb-patches.

On Sat, Jul 19, 2008 at 07:16:48PM +0200, Ulrich Weigand wrote:
> Hello,
> 
> gdbserver on Linux seems to have difficulties handling
> the case where a thread dies while it is stopped.  This can
> happen during the loop over all threads in linux_resume:

I can reproduce this problem by using the binary from killed.exp and
running strace on gdbserver.  I can also reproduce it on an embedded
ARM target by running killed.exp.  I can't reproduce it on my desktop
running killed.exp, which suggests this is normally hidden by
scheduler decisions - you need a long enough gap between the two
PTRACE_CONT's.

What do you think of this change?  Ideally, we could wait with WNOHANG
at this point to check for the exit case, but we'd have to restructure
a bit of the event loop to handle pending status == exited.
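
For illustration, a non-blocking exit check with WNOHANG might look
like the sketch below.  It uses an ordinary forked child rather than a
traced thread, and spawn_exiting_child and poll_for_exit are
hypothetical helper names, not gdbserver functions; it says nothing
about how the event loop would have to be restructured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo helper (hypothetical): fork a child that exits at once with CODE.
   Returns the child's pid, or -1 if fork failed.  */
pid_t
spawn_exiting_child (int code)
{
  pid_t pid = fork ();

  if (pid == 0)
    _exit (code);
  return pid;
}

/* Non-blocking exit check.  Returns 1 and fills *STATUS if PID has
   exited or been killed (it is reaped here), 0 if nothing of that kind
   has been reported yet, -1 on error (errno set by waitpid).  */
int
poll_for_exit (pid_t pid, int *status)
{
  pid_t r = waitpid (pid, status, WNOHANG);

  if (r == 0)
    return 0;                       /* still running, nothing pending */
  if (r == pid && (WIFEXITED (*status) || WIFSIGNALED (*status)))
    return 1;                       /* gone; *STATUS holds the reason */
  return r < 0 ? -1 : 0;
}
```

A caller would poll with poll_for_exit until it returns nonzero and
then decode *STATUS with WIFEXITED / WEXITSTATUS as usual.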

-- 
Daniel Jacobowitz
CodeSourcery

2008-08-04  Daniel Jacobowitz  <dan@codesourcery.com>

	* linux-low.c (linux_resume_one_process): Ignore ESRCH.

Index: linux-low.c
===================================================================
RCS file: /cvs/src/src/gdb/gdbserver/linux-low.c,v
retrieving revision 1.79
diff -u -p -r1.79 linux-low.c
--- linux-low.c	28 Jul 2008 18:28:56 -0000	1.79
+++ linux-low.c	4 Aug 2008 13:38:24 -0000
@@ -1193,7 +1193,19 @@ linux_resume_one_process (struct inferio
 
   current_inferior = saved_inferior;
   if (errno)
-    perror_with_name ("ptrace");
+    {
+      /* ESRCH from ptrace either means that the thread was already
+	 running (an error) or that it is gone (a race condition).  If
+	 it's gone, we will get a notification the next time we wait,
+	 so we can ignore the error.  We could differentiate these
+	 two, but it's tricky without waiting; the thread still exists
+	 as a zombie, so sending it signal 0 would succeed.  So just
+	 ignore ESRCH.  */
+      if (errno == ESRCH)
+	return;
+
+      perror_with_name ("ptrace");
+    }
 }
 
 static struct thread_resume *resume_ptr;



* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-04 13:40 ` Daniel Jacobowitz
@ 2008-08-04 18:25   ` Ulrich Weigand
  2008-08-04 18:30     ` Daniel Jacobowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Weigand @ 2008-08-04 18:25 UTC
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:

> I can reproduce this problem by using the binary from killed.exp and
> running strace on gdbserver.  I can also reproduce it on an embedded
> ARM target by running killed.exp.  I can't reproduce it on my desktop
> running killed.exp, which suggests this is normally hidden by
> scheduler decisions - you need a long enough gap between the two
> PTRACE_CONT's.
> 
> What do you think of this change?  Ideally, we could wait with WNOHANG
> at this point to check for the exit case, but we'd have to restructure
> a bit of the event loop to handle pending status == exited.

Hmm, still fails with my Cell test case like this:
writing register 25: No such process
ptrace(regsets_fetch_inferior_registers) PID=14241: No such process
reading register 0: No such process

The initial error happens in usr_store_inferior_registers called via
the regcache_invalidate_one call in linux_resume_one_process, just 
before the location you modified.  Whether this writes anything 
probably depends on target properties like decr_pc_after_break ...
Ignoring ESRCH in usr_store_inferior_registers as well seems to 
fix the problem for me.

In any case, I'm not sure why usr_store_inferior_registers errors
out ... the parallel regsets_store_inferior_registers only gives
a warning in this case.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com



* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-04 18:25   ` Ulrich Weigand
@ 2008-08-04 18:30     ` Daniel Jacobowitz
  2008-08-04 19:46       ` Ulrich Weigand
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2008-08-04 18:30 UTC
  To: Ulrich Weigand; +Cc: gdb-patches

On Mon, Aug 04, 2008 at 08:23:14PM +0200, Ulrich Weigand wrote:
> Hmm, still fails with my Cell test case like this:
> writing register 25: No such process
> ptrace(regsets_fetch_inferior_registers) PID=14241: No such process
> reading register 0: No such process

:-( It must depend on where you are in gdbserver when the process is
killed.  I hadn't thought about that.

Perhaps we should downgrade all these errors to warnings for errno ==
ESRCH?

-- 
Daniel Jacobowitz
CodeSourcery



* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-04 18:30     ` Daniel Jacobowitz
@ 2008-08-04 19:46       ` Ulrich Weigand
  2008-08-04 19:55         ` Daniel Jacobowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Weigand @ 2008-08-04 19:46 UTC
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Mon, Aug 04, 2008 at 08:23:14PM +0200, Ulrich Weigand wrote:
> > Hmm, still fails with my Cell test case like this:
> > writing register 25: No such process
> > ptrace(regsets_fetch_inferior_registers) PID=14241: No such process
> > reading register 0: No such process
> 
> :-( It must depend on where you are in gdbserver when the process is
> killed.  I hadn't thought about that.
> 
> Perhaps we should downgrade all these errors to warnings for errno ==
> ESRCH?

It seems the "read" errors are just artifacts: because of the first
error (on writing the register), the "error" call performs a longjmp
to the toplevel, which leaves things in a somewhat strange state.
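
The general effect can be shown with a small setjmp/longjmp sketch.
This is not gdbserver's error machinery: toplevel, fake_error,
handle_request and run_request are invented names, and which state
gdbserver actually leaves behind is not modelled here.

```c
#include <setjmp.h>

static jmp_buf toplevel;       /* hypothetical top-level recovery point */
static int steps_completed;    /* how far the last request got */

/* Stand-in for an error routine that never returns to its caller.  */
static void
fake_error (void)
{
  longjmp (toplevel, 1);
}

/* A request with a setup step, a step that may fail, and a cleanup
   step that is supposed to run last.  */
static void
handle_request (int fail_in_middle)
{
  steps_completed = 1;         /* setup, e.g. switch some global state */
  if (fail_in_middle)
    fake_error ();             /* jumps away; the cleanup never runs */
  steps_completed = 2;         /* cleanup, e.g. restore the global state */
}

/* Returns 0 if the request ran to completion, 1 if it was aborted.  */
int
run_request (int fail_in_middle)
{
  if (setjmp (toplevel) != 0)
    return 1;
  handle_request (fail_in_middle);
  return 0;
}

/* Report how many steps the most recent request finished.  */
int
last_steps_completed (void)
{
  return steps_completed;
}
```

After an aborted request, last_steps_completed reports 1: the setup
took effect but the cleanup was skipped, so later requests start from
a state the code did not expect.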

The only "real" errors I see (in addition to the one in
linux_resume_one_process) are the cases in regsets_store_inferior_registers
(which is already a warning) and usr_store_inferior_registers (which
is not).

In any case, I don't think these should even be warnings for ESRCH:
showing a warning in a situation that is completely normal and in
fact handled correctly would just confuse users IMO.

I'd propose to just silently ignore ESRCH errors while writing registers
(in addition to your patch).  What do you think?
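
As a rough sketch of that policy (not the patch itself), a register
write could be wrapped so that ESRCH is swallowed while every other
failure is still reported.  The callback type store_fn, the wrapper
store_ignoring_esrch and the three fake writers are all hypothetical
names used only for this example.

```c
#include <errno.h>

/* Hypothetical callback type: performs one register write and returns
   0 on success or -1 with errno set, the way ptrace does.  */
typedef int (*store_fn) (int regno);

/* Returns 0 if the write succeeded or failed only with ESRCH (thread
   presumed gone), otherwise the errno value of the failure.  */
int
store_ignoring_esrch (store_fn store, int regno)
{
  errno = 0;
  if (store (regno) == 0)
    return 0;
  if (errno == ESRCH)
    return 0;                  /* silently ignore: nothing left to write to */
  return errno;                /* any other failure is still an error */
}

/* Fake writers for demonstration only.  */
int store_ok (int regno)   { (void) regno; return 0; }
int store_gone (int regno) { (void) regno; errno = ESRCH; return -1; }
int store_bad (int regno)  { (void) regno; errno = EIO; return -1; }
```

Here store_ok and store_gone both yield 0 from the wrapper, while
store_bad yields EIO.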

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com


* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-04 19:46       ` Ulrich Weigand
@ 2008-08-04 19:55         ` Daniel Jacobowitz
  2008-08-05 21:06           ` Ulrich Weigand
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2008-08-04 19:55 UTC
  To: Ulrich Weigand; +Cc: gdb-patches

On Mon, Aug 04, 2008 at 09:45:46PM +0200, Ulrich Weigand wrote:
> It seems the "read" errors are just artifacts: because of the first
> error (on writing the register), the "error" call performs a longjmp
> to the toplevel, which leaves things in a somewhat strange state.
> 
> The only "real" errors I see (in addition to the one in
> linux_resume_one_process) are the cases in regsets_store_inferior_registers
> (which is already a warning) and usr_store_inferior_registers (which
> is not).
> 
> In any case, I don't think these should even be warnings for ESRCH:
> showing a warning in a situation that is completely normal and in
> fact handled correctly would just confuse users IMO.
> 
> I'd propose to just silently ignore ESRCH errors while writing registers
> (in addition to your patch).  What do you think?

I think that's acceptable, though not ideal.  ESRCH can mean "the
program is gone", or for ptrace it can mean "the program is not
stopped".  So there is a class of bugs in gdbserver which can lead to
the ESRCH error path.  But distinguishing them from this case is quite
difficult.
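
One reason the cases are hard to tell apart, as the comment in the
patch notes, is that a task which has exited but has not been waited
for is a zombie and still answers signal 0.  The sketch below shows
this for a plain child process (not a ptrace'd thread);
probe_zombie_with_signal_0 is a hypothetical name.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately, wait until it is a zombie
   (exited but not yet reaped), and probe it with signal 0.  Returns
   the result of kill (pid, 0) taken while the child is a zombie, or -2
   if the setup failed.  The child is reaped before returning.  */
int
probe_zombie_with_signal_0 (void)
{
  siginfo_t info;
  int probe;
  pid_t pid = fork ();

  if (pid < 0)
    return -2;
  if (pid == 0)
    _exit (0);

  /* WNOWAIT blocks until the child has exited but leaves it unreaped.  */
  if (waitid (P_PID, (id_t) pid, &info, WEXITED | WNOWAIT) != 0)
    return -2;

  probe = kill (pid, 0);       /* succeeds: the zombie still holds the pid */

  waitpid (pid, NULL, 0);      /* now reap it */
  return probe;
}
```

The probe returns 0 even though the child can no longer run, so
signal 0 alone cannot separate "gone" from "not stopped".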

-- 
Daniel Jacobowitz
CodeSourcery



* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-04 19:55         ` Daniel Jacobowitz
@ 2008-08-05 21:06           ` Ulrich Weigand
  2008-08-05 21:59             ` Daniel Jacobowitz
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Weigand @ 2008-08-05 21:06 UTC
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Mon, Aug 04, 2008 at 09:45:46PM +0200, Ulrich Weigand wrote:
> > I'd propose to just silently ignore ESRCH errors while writing registers
> > (in addition to your patch).  What do you think?
> 
> I think that's acceptable, though not ideal.  ESRCH can mean "the
> program is gone", or for ptrace it can mean "the program is not
> stopped".  So there is a class of bugs in gdbserver which can lead to
> the ESRCH error path.  But distinguishing them from this case is quite
> difficult.

The following patch implements this approach, fixing the problem for me.
Tested on powerpc-linux in local gdbserver mode.  OK?

Bye,
Ulrich

ChangeLog:

	* linux-low.c (linux_resume_one_process): Ignore ESRCH.
	(usr_store_inferior_registers): Likewise.
	(regsets_store_inferior_registers): Likewise.


diff -urNp src-orig/gdb/gdbserver/linux-low.c src/gdb/gdbserver/linux-low.c
--- src-orig/gdb/gdbserver/linux-low.c	2008-08-05 19:44:47.000000000 +0200
+++ src/gdb/gdbserver/linux-low.c	2008-08-05 19:53:44.000000000 +0200
@@ -1195,7 +1195,19 @@ linux_resume_one_process (struct inferio
 
   current_inferior = saved_inferior;
   if (errno)
-    perror_with_name ("ptrace");
+    {
+      /* ESRCH from ptrace either means that the thread was already
+	 running (an error) or that it is gone (a race condition).  If
+	 it's gone, we will get a notification the next time we wait,
+	 so we can ignore the error.  We could differentiate these
+	 two, but it's tricky without waiting; the thread still exists
+	 as a zombie, so sending it signal 0 would succeed.  So just
+	 ignore ESRCH.  */
+      if (errno == ESRCH)
+	return;
+
+      perror_with_name ("ptrace");
+    }
 }
 
 static struct thread_resume *resume_ptr;
@@ -1464,6 +1476,12 @@ usr_store_inferior_registers (int regno)
 		  *(PTRACE_XFER_TYPE *) (buf + i));
 	  if (errno != 0)
 	    {
+	      /* At this point, ESRCH should mean the process is already gone, 
+		 in which case we simply ignore attempts to change its registers.
+		 See also the related comment in linux_resume_one_process.  */
+	      if (errno == ESRCH)
+		return;
+
 	      if ((*the_low_target.cannot_store_register) (regno) == 0)
 		{
 		  char *err = strerror (errno);
@@ -1580,6 +1598,13 @@ regsets_store_inferior_registers ()
 	      disabled_regsets[regset - target_regsets] = 1;
 	      continue;
 	    }
+	  else if (errno == ESRCH)
+	    {
+	      /* At this point, ESRCH should mean the process is already gone, 
+		 in which case we simply ignore attempts to change its registers.
+		 See also the related comment in linux_resume_one_process.  */
+	      return 0;
+	    }
 	  else
 	    {
 	      perror ("Warning: ptrace(regsets_store_inferior_registers)");


-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com



* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-05 21:06           ` Ulrich Weigand
@ 2008-08-05 21:59             ` Daniel Jacobowitz
  2008-08-05 22:15               ` Ulrich Weigand
  0 siblings, 1 reply; 9+ messages in thread
From: Daniel Jacobowitz @ 2008-08-05 21:59 UTC
  To: Ulrich Weigand; +Cc: gdb-patches

On Tue, Aug 05, 2008 at 11:05:15PM +0200, Ulrich Weigand wrote:
> Daniel Jacobowitz wrote:
> > On Mon, Aug 04, 2008 at 09:45:46PM +0200, Ulrich Weigand wrote:
> > > I'd propose to just silently ignore ESRCH errors while writing registers
> > > (in addition to your patch).  What do you think?
> > 
> > I think that's acceptable, though not ideal.  ESRCH can mean "the
> > program is gone", or for ptrace it can mean "the program is not
> > stopped".  So there is a class of bugs in gdbserver which can lead to
> > the ESRCH error path.  But distinguishing them from this case is quite
> > difficult.
> 
> The following patch implements this approach, fixing the problem for me.
> Tested on powerpc-linux in local gdbserver mode.  OK?

Yes, this is OK.  Thanks!

-- 
Daniel Jacobowitz
CodeSourcery



* Re: [gdbserver] Problems trying to resume dead threads
  2008-08-05 21:59             ` Daniel Jacobowitz
@ 2008-08-05 22:15               ` Ulrich Weigand
  0 siblings, 0 replies; 9+ messages in thread
From: Ulrich Weigand @ 2008-08-05 22:15 UTC
  To: Daniel Jacobowitz; +Cc: gdb-patches

Daniel Jacobowitz wrote:
> On Tue, Aug 05, 2008 at 11:05:15PM +0200, Ulrich Weigand wrote:
> > The following patch implements this approach, fixing the problem for me.
> > Tested on powerpc-linux in local gdbserver mode.  OK?
> 
> Yes, this is OK.  Thanks!

Committed, thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  Ulrich.Weigand@de.ibm.com

