Mirror of the gdb mailing list
* gdb and multi-threaded (NPTL) programs
@ 2006-03-24 20:24 John Fodor
  2006-03-24 20:27 ` Daniel Jacobowitz
  2006-03-24 20:40 ` Andreas Schwab
  0 siblings, 2 replies; 11+ messages in thread
From: John Fodor @ 2006-03-24 20:24 UTC (permalink / raw)
  To: gdb

Hi Folks,

A colleague had trouble debugging a multi-threaded (NPTL) program using 
GDB. To see what was going on, I created a purely artificial program 
where 2 POSIX threads synchronize their execution using semaphores. 
While single-stepping one thread, the other thread gets an EINTR error 
from sem_wait. After looking at info gdb (related to threads) I got my 
answer:

"There is an unfortunate side effect.  If one thread stops for a
breakpoint, or for some other reason, and another thread is blocked in a
system call, then the system call may return prematurely.  This is a
consequence of the interaction between multiple threads and the signals
that GDB uses to implement breakpoints and other events that stop
execution.

To handle this problem, your program should check the return value of
each system call and react appropriately.  This is good programming
style anyways. For example, do not write code like this:

        sleep (10);

The call to `sleep' will return early if a different thread stops at
a breakpoint or for some other reason. Instead, write this:

        int unslept = 10;
        while (unslept > 0)
          unslept = sleep (unslept);

A system call is allowed to return early, so the system is still
conforming to its specification.  But GDB does cause your
multi-threaded program to behave differently than it would without GDB."

Hmmm... so people who use POSIX threads have to put every syscall into a 
loop, ignoring EINTR? What if it's a real timeout? Sorry this does not 
seem reasonable to me.

Will there be a fix in the future to this unfortunate side-effect? How 
do NPTL programmers single-step their programs today? Using syscalls in 
loops? Using a different debugger?

Thanks for your help.

John



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:24 gdb and multi-threaded (NPTL) programs John Fodor
@ 2006-03-24 20:27 ` Daniel Jacobowitz
  2006-03-24 20:52   ` John Fodor
  2006-03-24 20:40 ` Andreas Schwab
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel Jacobowitz @ 2006-03-24 20:27 UTC (permalink / raw)
  To: John Fodor; +Cc: gdb

On Fri, Mar 24, 2006 at 02:57:19PM -0500, John Fodor wrote:
> "There is an unfortunate side effect.  If one thread stops for a
> breakpoint, or for some other reason, and another thread is blocked in a
> system call, then the system call may return prematurely.  This is a
> consequence of the interaction between multiple threads and the signals
> that GDB uses to implement breakpoints and other events that stop
> execution.

Really, in my opinion, it's a kernel bug; the syscall should
automatically restart in this case.  Some syscalls now do that, on
current kernels.  Others don't.  It's hard to fix this without breaking
them in other ways.

In this case the syscall is sys_futex.  When interrupted, futex_wait
returns -EINTR.  This is documented to happen whether the signal was
handled or not.  Maybe adding a fifth signal restart option to the
existing four in the Linux kernel could fix this: ERESTARTNOSIGNAL.
That wouldn't be hard to implement if you want to try it.  You'd have
to do some thinking about the semantics of futexes to make sure it was
safe.

> Hmmm... so people who use POSIX threads have to put every syscall into a 
> loop, ignoring EINTR? What if it's a real timeout? Sorry this does not 
> seem reasonable to me.

Let's be precise here: "what if it's a real signal".  sem_wait does not
time out.  sem_timedwait returns ETIMEDOUT, not EINTR, for timeouts.
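
The distinction can be made concrete with a minimal sketch. The helper name
sem_wait_ms and the millisecond interface are assumptions made for this
illustration, not something proposed in the thread. Because sem_timedwait
takes an absolute deadline, retrying after EINTR with the same deadline does
not lengthen the total wait, and a genuine timeout still surfaces as
ETIMEDOUT:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical helper (not from the thread): wait up to timeout_ms
 * milliseconds for sem. Returns 0 if the semaphore was acquired, or -1
 * with errno set; errno == ETIMEDOUT means the deadline really passed. */
static int sem_wait_ms(sem_t *sem, long timeout_ms)
{
    struct timespec deadline;
    int rc;

    /* sem_timedwait expects an absolute CLOCK_REALTIME deadline. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) == -1)
        return -1;
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* An interrupted wait is retried against the same deadline, so
     * interruptions cannot extend the timeout. */
    do {
        rc = sem_timedwait(sem, &deadline);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```

A caller that needs to react to a particular signal would check a flag set by
its handler inside the loop instead of retrying unconditionally.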

> Will there be a fix in the future to this unfortunate side-effect? How 
> do NPTL programmers single-step their programs today? Using syscalls in 
> loops? Using a different debugger?

In practice this does not bother most programmers.  If your application
uses signals, it often needs to do this anyway!

-- 
Daniel Jacobowitz
CodeSourcery



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:24 gdb and multi-threaded (NPTL) programs John Fodor
  2006-03-24 20:27 ` Daniel Jacobowitz
@ 2006-03-24 20:40 ` Andreas Schwab
  2006-03-24 20:45   ` Eric Desjardins
  2006-03-24 21:15   ` John Fodor
  1 sibling, 2 replies; 11+ messages in thread
From: Andreas Schwab @ 2006-03-24 20:40 UTC (permalink / raw)
  To: John Fodor; +Cc: gdb

John Fodor <john_fodor@mac.com> writes:

> Hmmm... so people who use POSIX threads have to put every syscall into a 
> loop, ignoring EINTR?

Every library call that is allowed to return with EINTR must be handled
appropriately.  sem_wait is specified as being able to return with EINTR.
If your program can't handle that it has a bug.
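
As one possible reading of "handled appropriately", here is a minimal sketch
for a program that simply wants to keep waiting. The helper name
sem_wait_retry is an assumption for illustration; a program that must react
to the interrupting signal would need different logic.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper (not from the thread): wait on a semaphore and
 * retry whenever the wait is interrupted. Returns 0 on success, or -1
 * with errno set for any error other than EINTR. */
static int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);

    return rc;
}
```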

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:40 ` Andreas Schwab
@ 2006-03-24 20:45   ` Eric Desjardins
  2006-03-24 20:47     ` Andreas Schwab
  2006-03-24 21:15   ` John Fodor
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Desjardins @ 2006-03-24 20:45 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: John Fodor, gdb

Hi,

Well, I was surprised at first when I discovered that. On my RHEL 4.0
machine, the man page is not very EINTR aware. Maybe it is a man page
bug?

Eric

RETURN VALUE
       The sem_wait and sem_getvalue functions always return 0.  All other
       semaphore functions return 0 on success and -1 on error, in addition
       to writing an error code in errno.

ERRORS
       The sem_init function sets errno to the following codes on error:

              EINVAL value exceeds the maximal counter value SEM_VALUE_MAX

              ENOSYS pshared is not zero

       The sem_trywait function sets errno to the following error code on
       error:

              EAGAIN the semaphore count is currently 0

       The sem_post function sets errno to the following error code on
       error:

              ERANGE after incrementation, the semaphore value would exceed
                     SEM_VALUE_MAX (the semaphore count is left unchanged
                     in this case)

       The sem_destroy function sets errno to the following error code on
       error:

              EBUSY  some threads are currently blocked waiting on the
                     semaphore.

AUTHOR
       Xavier Leroy <Xavier.Leroy@inria.fr>


On Friday, 24 March 2006 at 21:24 +0100, Andreas Schwab wrote:
> John Fodor <john_fodor@mac.com> writes:
> 
> > Hmmm... so people who use POSIX threads have to put every syscall into a 
> > loop, ignoring EINTR?
> 
> Every library call that is allowed to return with EINTR must be handled
> appropriately.  sem_wait is specified as being able to return with EINTR.
> If your program can't handle that it has a bug.
> 
> Andreas.



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:45   ` Eric Desjardins
@ 2006-03-24 20:47     ` Andreas Schwab
  0 siblings, 0 replies; 11+ messages in thread
From: Andreas Schwab @ 2006-03-24 20:47 UTC (permalink / raw)
  To: Eric Desjardins; +Cc: John Fodor, gdb

Eric Desjardins <eric.desjardins@autodesk.com> writes:

> Well, I was surprised at first when I discovered that. On my RHEL 4.0
> machine, the man page is not very EINTR aware. Maybe it is a man page
> bug?

See
<http://www.opengroup.org/onlinepubs/009695399/functions/sem_wait.html>.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:27 ` Daniel Jacobowitz
@ 2006-03-24 20:52   ` John Fodor
  2006-03-24 21:07     ` Daniel Jacobowitz
  0 siblings, 1 reply; 11+ messages in thread
From: John Fodor @ 2006-03-24 20:52 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb

Daniel Jacobowitz wrote:
> On Fri, Mar 24, 2006 at 02:57:19PM -0500, John Fodor wrote:
> 
>>"There is an unfortunate side effect.  If one thread stops for a
>>breakpoint, or for some other reason, and another thread is blocked in a
>>system call, then the system call may return prematurely.  This is a
>>consequence of the interaction between multiple threads and the signals
>>that GDB uses to implement breakpoints and other events that stop
>>execution.
> 
> 
> Really, in my opinion, it's a kernel bug; the syscall should
> automatically restart in this case.  Some syscalls now do that, on
> current kernels.  Others don't.  It's hard to fix this without breaking
> them in other ways.
> 
> In this case the syscall is sys_futex.  When interrupted, futex_wait
> returns -EINTR.  This is documented to happen whether the signal was
> handled or not.  Maybe adding a fifth signal restart option to the
> existing four in the Linux kernel could fix this: ERESTARTNOSIGNAL.
> That wouldn't be hard to implement if you want to try it.  You'd have
> to do some thinking about the semantics of futexes to make sure it was
> safe.

Sounds like a good idea. Let me know how it goes :)

> 
> 
>>Hmmm... so people who use POSIX threads have to put every syscall into a 
>>loop, ignoring EINTR? What if it's a real timeout? Sorry this does not 
>>seem reasonable to me.
> 
> 
> Let's be precise here: "what if it's a real signal".  sem_wait does not
> time out.  sem_timedwait returns ETIMEDOUT, not EINTR, for timeouts.

I wasn't referring to sem_wait specifically. I was thinking of any 
general syscall that will return EINTR after a SIGALRM. But you're 
right, could be some other signal.

> 
> 
>>Will there be a fix in the future to this unfortunate side-effect? How 
>>do NPTL programmers single-step their programs today? Using syscalls in 
>>loops? Using a different debugger?
> 
> 
> In practice this does not bother most programmers.  If your application
> uses signals, it often needs to do this anyway!
> 

If you use signals you can set SA_RESTART for catchable signals. What 
we're talking about here is a syscall wrapper just so we can single-step 
with gdb. Lots of software examples don't have these wrappers. Anyway, I 
do appreciate your help. Thanks.

Can someone send me a nice wrapper macro?
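
A minimal sketch of such a wrapper follows. The names RETRY_ON_EINTR and
read_retry are assumptions made for this illustration. It only suits calls
that report failure as -1 with errno set, so it does not fit sleep, which
returns the unslept seconds instead. glibc offers a similar macro,
TEMP_FAILURE_RETRY, in <unistd.h> when _GNU_SOURCE is defined.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical macro (not from the thread): evaluate `call`, store its
 * value in `result`, and repeat for as long as it fails with EINTR.
 * `call` is re-evaluated on every iteration. */
#define RETRY_ON_EINTR(result, call)                  \
    do {                                              \
        (result) = (call);                            \
    } while ((result) == -1 && errno == EINTR)

/* Example use: a read that is restarted when interrupted. */
static ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    RETRY_ON_EINTR(n, read(fd, buf, len));
    return n;
}
```

For a semaphore the same macro would be used as
RETRY_ON_EINTR(rc, sem_wait(&sem)); with rc declared as an int.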



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:52   ` John Fodor
@ 2006-03-24 21:07     ` Daniel Jacobowitz
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Jacobowitz @ 2006-03-24 21:07 UTC (permalink / raw)
  To: John Fodor; +Cc: gdb

On Fri, Mar 24, 2006 at 03:44:09PM -0500, John Fodor wrote:
> Sounds like a good idea. Let me know how it goes :)

Sorry, you're the one objecting to the current behavior :-)

I've done my share of fixing these bugs; I don't have time to try
another one.

> I wasn't referring to sem_wait specifically. I was thinking of any 
> general syscall that will return EINTR after a SIGALRM.

That's receiving a signal, not timing out.  But anyway.

> >In practice this does not bother most programmers.  If your application
> >uses signals, it often needs to do this anyway!
> >
> 
> If you use signals you can set SA_RESTART for catchable signals.

That only works for restartable syscalls - sem_wait is not, in fact,
restartable, and I believe that restarting it after SA_RESTART would
violate the POSIX spec.  Not 100% sure on that though.
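
For reference, this is roughly how SA_RESTART is requested, as a minimal
sketch. The names on_signal and install_restarting_handler are assumptions
for illustration. The flag only affects calls the system treats as
restartable, and only for signals handled this way, so per the caveat above
it should not be relied on to keep sem_wait from returning EINTR. On Linux,
the signal(7) manual page lists which calls are restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Empty handler: its only purpose is to make the signal "handled". */
static void on_signal(int signo)
{
    (void)signo;
}

/* Hypothetical helper (not from the thread): install on_signal for
 * signo and ask for interrupted restartable calls to be resumed.
 * Returns 0 on success, -1 with errno set on failure. */
static int install_restarting_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;

    return sigaction(signo, &sa, NULL);
}
```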

-- 
Daniel Jacobowitz
CodeSourcery



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 20:40 ` Andreas Schwab
  2006-03-24 20:45   ` Eric Desjardins
@ 2006-03-24 21:15   ` John Fodor
  2006-03-24 21:21     ` Andreas Schwab
  1 sibling, 1 reply; 11+ messages in thread
From: John Fodor @ 2006-03-24 21:15 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gdb

Andreas Schwab wrote:
> John Fodor <john_fodor@mac.com> writes:
> 
> 
>>Hmmm... so people who use POSIX threads have to put every syscall into a 
>>loop, ignoring EINTR?
> 
> 
> Every library call that is allowed to return with EINTR must be handled
> appropriately.  sem_wait is specified as being able to return with EINTR.
> If your program can't handle that it has a bug.
> 
> Andreas.
> 

I guess you're right. But remember this is an EINTR because we are 
single-stepping. Are you saying that one should retry all syscalls on 
every EINTR? Thanks.



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 21:15   ` John Fodor
@ 2006-03-24 21:21     ` Andreas Schwab
  2006-03-28  9:33       ` Jim Blandy
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Schwab @ 2006-03-24 21:21 UTC (permalink / raw)
  To: John Fodor; +Cc: gdb

John Fodor <john_fodor@mac.com> writes:

> I guess you're right. But remember this is an EINTR because we are 
> single-stepping.

Or receiving a stop signal for any other reason.

> Are you saying that one should retry all syscalls on every EINTR?

It depends on what is appropriate.  Every program can have its own
requirements.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-24 21:21     ` Andreas Schwab
@ 2006-03-28  9:33       ` Jim Blandy
  2006-03-28 10:43         ` Daniel Jacobowitz
  0 siblings, 1 reply; 11+ messages in thread
From: Jim Blandy @ 2006-03-28  9:33 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: John Fodor, gdb

I think some of this discussion is missing the point.  A debugger
should let the user examine the program's behavior while disturbing it
as little as possible.  If the debugger happens to be implemented
using signals, and those signals disturb the program's behavior, then
that's a flaw.

The call to sem_wait should probably be wrapped in a loop that checks
for EINTR, but that's beside the point.  Running the program under the
debugger shouldn't even change the number of times the thread goes
around the EINTR loop.

So I'm agreeing with Daniel, I guess: it's a kernel bug.  But it
sounds to me like Daniel is saying that you need to find some change
to the non-debugging behavior that would also fix the debugging
behavior, which I don't agree with.  In theory, you should change the
debugging behavior and have no effect on the syscall's interface to
the code that calls it.



* Re: gdb and multi-threaded (NPTL) programs
  2006-03-28  9:33       ` Jim Blandy
@ 2006-03-28 10:43         ` Daniel Jacobowitz
  0 siblings, 0 replies; 11+ messages in thread
From: Daniel Jacobowitz @ 2006-03-28 10:43 UTC (permalink / raw)
  To: Jim Blandy; +Cc: Andreas Schwab, John Fodor, gdb

On Fri, Mar 24, 2006 at 01:15:26PM -0800, Jim Blandy wrote:
> So I'm agreeing with Daniel, I guess: it's a kernel bug.  But it
> sounds to me like Daniel is saying that you need to find some change
> to the non-debugging behavior that would also fix the debugging
> behavior, which I don't agree with.  In theory, you should change the
> debugging behavior and have no effect on the syscall's interface to
> the code that calls it.

That's not what I said at all.  I said that you'd need to change
the kernel source code for the syscall to not return -EINTR, but
instead return some new code; I'm certainly not suggesting that the
new code ever be returned to userspace!

There is plenty of precedent for the construct I describe in the Linux
signal handling implementation.

-- 
Daniel Jacobowitz
CodeSourcery


