Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed
* RFA: lin-lwp bug with software-single-step or schedlock
@ 2002-10-22 21:25 Daniel Jacobowitz
  2002-10-22 22:35 ` Andrew Cagney
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2002-10-22 21:25 UTC (permalink / raw)
  To: gdb-patches; +Cc: msnyder, kettenis

This bug was noticed on MIPS, because MIPS GNU/Linux is
SOFTWARE_SINGLE_STEP_P.  There's a comment in lin_lwp_resume:

  /* Apparently the interpretation of PID is dependent on STEP: If
     STEP is non-zero, a specific PID means `step only this process
     id'.  But if STEP is zero, then PID means `continue *all*
     processes, but give the signal only to this one'.  */
  resume_all = (PIDGET (ptid) == -1) || !step;

Now, I did some digging, and I believe this comment is completely incorrect. 
Saying "signal SIGWINCH" causes PIDGET (ptid) == -1, and it is assumed the
signal will be delivered to inferior_ptid.  There's some other problem there
- I think I've discovered that we will neglect to single-step over a
breakpoint if we are told to continue with a signal, which is a bit dubious
of a decision - but by and large it works as expected.

So if STEP is 0, we always resume all processes.  STEP at this point _only_
refers to whether we want a PTRACE_SINGLESTEP or equivalent;
SOFTWARE_SINGLE_STEP has already been handled.  We can't make policy
decisions based on STEP any more.

I tried removing the || !step.  It's pretty hard to tell, since there are
still a few non-deterministic failures on my test systems (which is what I
was actually hunting when I found this!) but I believe testsuite results are
improved on i386.  One run of just the thread tests (after the patch in my
last message, which I've committed), shows that these all got fixed:

FAIL: gdb.threads/schedlock.exp: other thread 0 didn't run (ran)
FAIL: gdb.threads/schedlock.exp: other thread 2 didn't run (ran)
FAIL: gdb.threads/schedlock.exp: other thread 3 didn't run (ran)
FAIL: gdb.threads/schedlock.exp: other thread 4 didn't run (ran)
FAIL: gdb.threads/schedlock.exp: other thread 5 didn't run (ran)

with my change:
# of expected passes            188
# of unexpected failures        3

without:
# of expected passes            183
# of unexpected failures        7

There's still two that look like test issues rather than bugs, one that is
fixed by a patch Kevin posted a long time ago, and one in print-threads.exp
that I'm not sure of the cause for yet.



On MIPS it's more pronounced, as I'd expect.
with my change:
# of expected passes            170
# of unexpected failures        12

without:
# of expected passes            114
# of unexpected failures        41


At least one of the MIPS issues remaining appears to be hitting the software
single step breakpoint in the wrong thread.  It doesn't seem to be marked as
thread-specific, and it should be.  I'll follow up on that later.


That was the long version.  Here's the short version.  OK to apply?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


2002-10-23  Daniel Jacobowitz  <drow@mvista.com>

	* lin-lwp.c (lin_lwp_resume): Remove resume_all test for !step.

Index: lin-lwp.c
===================================================================
RCS file: /cvs/src/src/gdb/lin-lwp.c,v
retrieving revision 1.35
diff -u -p -r1.35 lin-lwp.c
--- lin-lwp.c	27 Aug 2002 22:37:06 -0000	1.35
+++ lin-lwp.c	23 Oct 2002 04:23:13 -0000
@@ -579,11 +579,8 @@ lin_lwp_resume (ptid_t ptid, int step, e
   struct lwp_info *lp;
   int resume_all;
 
-  /* Apparently the interpretation of PID is dependent on STEP: If
-     STEP is non-zero, a specific PID means `step only this process
-     id'.  But if STEP is zero, then PID means `continue *all*
-     processes, but give the signal only to this one'.  */
-  resume_all = (PIDGET (ptid) == -1) || !step;
+  /* A specific PTID means `step only this process id'.  */
+  resume_all = (PIDGET (ptid) == -1);
 
   if (resume_all)
     iterate_over_lwps (resume_set_callback, NULL);


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFA: lin-lwp bug with software-single-step or schedlock
  2002-10-22 21:25 RFA: lin-lwp bug with software-single-step or schedlock Daniel Jacobowitz
@ 2002-10-22 22:35 ` Andrew Cagney
  2002-10-23  7:40   ` Daniel Jacobowitz
  2002-10-23 14:13 ` Mark Kettenis
  2002-10-31 13:01 ` Daniel Jacobowitz
  2 siblings, 1 reply; 7+ messages in thread
From: Andrew Cagney @ 2002-10-22 22:35 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: gdb-patches, msnyder, kettenis

> This bug was noticed on MIPS, because MIPS GNU/Linux is
> SOFTWARE_SINGLE_STEP_P.  There's a comment in lin_lwp_resume:
> 
>   /* Apparently the interpretation of PID is dependent on STEP: If
>      STEP is non-zero, a specific PID means `step only this process
>      id'.  But if STEP is zero, then PID means `continue *all*
>      processes, but give the signal only to this one'.  */
>   resume_all = (PIDGET (ptid) == -1) || !step;
> 
> Now, I did some digging, and I believe this comment is completely incorrect. 
> Saying "signal SIGWINCH" causes PIDGET (ptid) == -1, and it is assumed the
> signal will be delivered to inferior_ptid.  There's some other problem there
> - I think I've discovered that we will neglect to single-step over a
> breakpoint if we are told to continue with a signal, which is a bit dubious
> of a decision - but by and large it works as expected.
> 
> So if STEP is 0, we always resume all processes.  STEP at this point _only_
> refers to whether we want a PTRACE_SINGLESTEP or equivalent;
> SOFTWARE_SINGLE_STEP has already been handled.  We can't make policy
> decisions based on STEP any more.
> 
> I tried removing the || !step.  It's pretty hard to tell, since there are
> still a few non-deterministic failures on my test systems (which is what I
> was actually hunting when I found this!) but I believe testsuite results are
> improved on i386.  One run of just the thread tests (after the patch in my
> last message, which I've committed), shows that these all got fixed:

Shouldn't, per the remote.c Hg discussion, the code be changed so that 
lin_lwp_resume() has complete information and, hence, can correctly 
determine if resume all/one is needed.

Andrew



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFA: lin-lwp bug with software-single-step or schedlock
  2002-10-22 22:35 ` Andrew Cagney
@ 2002-10-23  7:40   ` Daniel Jacobowitz
  0 siblings, 0 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2002-10-23  7:40 UTC (permalink / raw)
  To: Andrew Cagney; +Cc: gdb-patches, msnyder, kettenis

On Wed, Oct 23, 2002 at 01:32:30AM -0400, Andrew Cagney wrote:
> >This bug was noticed on MIPS, because MIPS GNU/Linux is
> >SOFTWARE_SINGLE_STEP_P.  There's a comment in lin_lwp_resume:
> >
> >  /* Apparently the interpretation of PID is dependent on STEP: If
> >     STEP is non-zero, a specific PID means `step only this process
> >     id'.  But if STEP is zero, then PID means `continue *all*
> >     processes, but give the signal only to this one'.  */
> >  resume_all = (PIDGET (ptid) == -1) || !step;
> >
> >Now, I did some digging, and I believe this comment is completely 
> >incorrect. Saying "signal SIGWINCH" causes PIDGET (ptid) == -1, and it is 
> >assumed the
> >signal will be delivered to inferior_ptid.  There's some other problem 
> >there
> >- I think I've discovered that we will neglect to single-step over a
> >breakpoint if we are told to continue with a signal, which is a bit dubious
> >of a decision - but by and large it works as expected.
> >
> >So if STEP is 0, we always resume all processes.  STEP at this point _only_
> >refers to whether we want a PTRACE_SINGLESTEP or equivalent;
> >SOFTWARE_SINGLE_STEP has already been handled.  We can't make policy
> >decisions based on STEP any more.
> >
> >I tried removing the || !step.  It's pretty hard to tell, since there are
> >still a few non-deterministic failures on my test systems (which is what I
> >was actually hunting when I found this!) but I believe testsuite results 
> >are
> >improved on i386.  One run of just the thread tests (after the patch in my
> >last message, which I've committed), shows that these all got fixed:
> 
> Shouldn't, per the remote.c Hg discussion, the code be changed so that 
> lin_lwp_resume() has complete information and, hence, can correctly 
> determine if resume all/one is needed.

Except the case is a little different - with remote we've never had a
problem figuring out if all/one is needed, only figuring out _which_
thread to signal/treat specially.  The information on whether to resume
one or all is there; it's in ptid, which lin-lwp was misinterpreting. 
We should eventually update the interface to the resume functions to
eliminate this hackery; I was thinking something like:

  void target_resume (ptid_t ptid, int step, int resume_all);

But that can be done as a follow-up.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFA: lin-lwp bug with software-single-step or schedlock
  2002-10-22 21:25 RFA: lin-lwp bug with software-single-step or schedlock Daniel Jacobowitz
  2002-10-22 22:35 ` Andrew Cagney
@ 2002-10-23 14:13 ` Mark Kettenis
  2002-10-23 14:44   ` Michael Snyder
  2002-10-23 14:55   ` Daniel Jacobowitz
  2002-10-31 13:01 ` Daniel Jacobowitz
  2 siblings, 2 replies; 7+ messages in thread
From: Mark Kettenis @ 2002-10-23 14:13 UTC (permalink / raw)
  To: drow; +Cc: gdb-patches, msnyder

   Date: Wed, 23 Oct 2002 00:26:15 -0400
   From: Daniel Jacobowitz <drow@mvista.com>

   This bug was noticed on MIPS, because MIPS GNU/Linux is
   SOFTWARE_SINGLE_STEP_P.  There's a comment in lin_lwp_resume:

     /* Apparently the interpretation of PID is dependent on STEP: If
	STEP is non-zero, a specific PID means `step only this process
	id'.  But if STEP is zero, then PID means `continue *all*
	processes, but give the signal only to this one'.  */
     resume_all = (PIDGET (ptid) == -1) || !step;

I'm fairly certain it's not without reason that I wrote this comment
as it is.

   Now, I did some digging, and I believe this comment is completely
   incorrect.  Saying "signal SIGWINCH" causes PIDGET (ptid) == -1,
   and it is assumed the signal will be delivered to inferior_ptid.
   There's some other problem there - I think I've discovered that we
   will neglect to single-step over a breakpoint if we are told to
   continue with a signal, which is a bit dubious of a decision - but
   by and large it works as expected.

I don't see directly why, but I wouldn't be surprised by it.

   So if STEP is 0, we always resume all processes.  STEP at this point _only_
   refers to whether we want a PTRACE_SINGLESTEP or equivalent;
   SOFTWARE_SINGLE_STEP has already been handled.  We can't make policy
   decisions based on STEP any more.

Indeed, there's something wrong here.

   I tried removing the || !step.  It's pretty hard to tell, since there are
   still a few non-deterministic failures on my test systems (which is what I
   was actually hunting when I found this!) but I believe testsuite results are
   improved on i386.

There is one thing that might be affected.  Suppose you have a signal
such as SIGUSR1 that stops the inferior but is also passed on to the
inferior.  If a multi-threaded program gets this signal, GDB will
stop.  If you now change the current thread to some other thread and
try to single-step.  Will the signal be delivered to the origional
thread?

If your patch doesn't affect this, I think your patch is OK to check
in.  Otherwise we'll have to think about this a bit more.

Mark


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFA: lin-lwp bug with software-single-step or schedlock
  2002-10-23 14:13 ` Mark Kettenis
@ 2002-10-23 14:44   ` Michael Snyder
  2002-10-23 14:55   ` Daniel Jacobowitz
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Snyder @ 2002-10-23 14:44 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: drow, gdb-patches

Mark Kettenis wrote:
> 
>    Date: Wed, 23 Oct 2002 00:26:15 -0400
>    From: Daniel Jacobowitz <drow@mvista.com>
> 
>    This bug was noticed on MIPS, because MIPS GNU/Linux is
>    SOFTWARE_SINGLE_STEP_P.  There's a comment in lin_lwp_resume:
> 
>      /* Apparently the interpretation of PID is dependent on STEP: If
>         STEP is non-zero, a specific PID means `step only this process
>         id'.  But if STEP is zero, then PID means `continue *all*
>         processes, but give the signal only to this one'.  */
>      resume_all = (PIDGET (ptid) == -1) || !step;
> 
> I'm fairly certain it's not without reason that I wrote this comment
> as it is.

Uh, you didn't.  I did.  You copied it from "lin-thread.c".
And I wrote it because I found it empirically to be true at the time.


>    Now, I did some digging, and I believe this comment is completely
>    incorrect.  Saying "signal SIGWINCH" causes PIDGET (ptid) == -1,
>    and it is assumed the signal will be delivered to inferior_ptid.
>    There's some other problem there - I think I've discovered that we
>    will neglect to single-step over a breakpoint if we are told to
>    continue with a signal, which is a bit dubious of a decision - but
>    by and large it works as expected.
> 
> I don't see directly why, but I wouldn't be surprised by it.

The whole business is rather unplanned.  The exact meaning and 
connotation of those variables is nowhere defined in writing.
Mainly it's a matter of keeping consistent with what the code
in gdb expects, which you sometimes have to guess, or learn by
examining an older threaded architecture (eg. solaris).

>    So if STEP is 0, we always resume all processes.  STEP at this point _only_
>    refers to whether we want a PTRACE_SINGLESTEP or equivalent;
>    SOFTWARE_SINGLE_STEP has already been handled.  We can't make policy
>    decisions based on STEP any more.
> 
> Indeed, there's something wrong here.

Sounds like something has changed upstream.


>    I tried removing the || !step.  It's pretty hard to tell, since there are
>    still a few non-deterministic failures on my test systems (which is what I
>    was actually hunting when I found this!) but I believe testsuite results are
>    improved on i386.
> 
> There is one thing that might be affected.  Suppose you have a signal
> such as SIGUSR1 that stops the inferior but is also passed on to the
> inferior.  If a multi-threaded program gets this signal, GDB will
> stop.  If you now change the current thread to some other thread and
> try to single-step.  Will the signal be delivered to the origional
> thread?

That's what "prepare_to_proceed" is supposed to take care of.

> If your patch doesn't affect this, I think your patch is OK to check
> in.  Otherwise we'll have to think about this a bit more.

I'll pipe up and comment, as above, but I'll let you two guys
work out the decision between you.

Michael


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFA: lin-lwp bug with software-single-step or schedlock
  2002-10-23 14:13 ` Mark Kettenis
  2002-10-23 14:44   ` Michael Snyder
@ 2002-10-23 14:55   ` Daniel Jacobowitz
  1 sibling, 0 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2002-10-23 14:55 UTC (permalink / raw)
  To: Mark Kettenis; +Cc: gdb-patches, msnyder

On Wed, Oct 23, 2002 at 11:13:22PM +0200, Mark Kettenis wrote:
>    Date: Wed, 23 Oct 2002 00:26:15 -0400
>    From: Daniel Jacobowitz <drow@mvista.com>
> 
>    This bug was noticed on MIPS, because MIPS GNU/Linux is
>    SOFTWARE_SINGLE_STEP_P.  There's a comment in lin_lwp_resume:
> 
>      /* Apparently the interpretation of PID is dependent on STEP: If
> 	STEP is non-zero, a specific PID means `step only this process
> 	id'.  But if STEP is zero, then PID means `continue *all*
> 	processes, but give the signal only to this one'.  */
>      resume_all = (PIDGET (ptid) == -1) || !step;
> 
> I'm fairly certain it's not without reason that I wrote this comment
> as it is.

I'm sure - I'd dearly like to know why, since I'm sure it'll bite us
later :)

>    Now, I did some digging, and I believe this comment is completely
>    incorrect.  Saying "signal SIGWINCH" causes PIDGET (ptid) == -1,
>    and it is assumed the signal will be delivered to inferior_ptid.
>    There's some other problem there - I think I've discovered that we
>    will neglect to single-step over a breakpoint if we are told to
>    continue with a signal, which is a bit dubious of a decision - but
>    by and large it works as expected.
> 
> I don't see directly why, but I wouldn't be surprised by it.
> 
>    So if STEP is 0, we always resume all processes.  STEP at this point _only_
>    refers to whether we want a PTRACE_SINGLESTEP or equivalent;
>    SOFTWARE_SINGLE_STEP has already been handled.  We can't make policy
>    decisions based on STEP any more.
> 
> Indeed, there's something wrong here.
> 
>    I tried removing the || !step.  It's pretty hard to tell, since there are
>    still a few non-deterministic failures on my test systems (which is what I
>    was actually hunting when I found this!) but I believe testsuite results are
>    improved on i386.
> 
> There is one thing that might be affected.  Suppose you have a signal
> such as SIGUSR1 that stops the inferior but is also passed on to the
> inferior.  If a multi-threaded program gets this signal, GDB will
> stop.  If you now change the current thread to some other thread and
> try to single-step.  Will the signal be delivered to the origional
> thread?
> 
> If your patch doesn't affect this, I think your patch is OK to check
> in.  Otherwise we'll have to think about this a bit more.

Well, let's see.  There's some interesting behavior here.

- stopped at GDB prompt, thread 8 current
- say "signal SIGWINCH"
- thread 8 gets the signal

- stopped at GDB prompt, thread 8 current
- say "thread 9"
- say "signal SIGWINCH"
- thread 9 gets the signal

- stopped at GDB prompt, thread 8 current
- say "thread 9"
- say "set scheduler-locking on"
- say "stepi"
- thread 9 steps

- thread 8 gets SIGWINCH (which is set to stop print pass in my session)
- thread 8 is current thread at stop
- say "continue"
- thread 8 gets SIGWINCH

- thread 8 gets SIGWINCH
- thread 8 is current thread at stop
- say "thread 9"
- say "continue"
- thread 9 gets SIGWINCH

Oops.  However, this behavior is 100% unchanged by my patch; that is,
it didn't work before either.  Using step instead of continue makes no
difference either.

Is my patch OK in this case?

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: RFA: lin-lwp bug with software-single-step or schedlock
  2002-10-22 21:25 RFA: lin-lwp bug with software-single-step or schedlock Daniel Jacobowitz
  2002-10-22 22:35 ` Andrew Cagney
  2002-10-23 14:13 ` Mark Kettenis
@ 2002-10-31 13:01 ` Daniel Jacobowitz
  2 siblings, 0 replies; 7+ messages in thread
From: Daniel Jacobowitz @ 2002-10-31 13:01 UTC (permalink / raw)
  To: gdb-patches; +Cc: msnyder, kettenis

On Wed, Oct 23, 2002 at 12:26:15AM -0400, Daniel Jacobowitz wrote:
> 2002-10-23  Daniel Jacobowitz  <drow@mvista.com>
> 
> 	* lin-lwp.c (lin_lwp_resume): Remove resume_all test for !step.

Based on Mark's comments, I've committed this.  Later we should hash
out exactly when which thread should get which signal, and write
testcases for that...

> Index: lin-lwp.c
> ===================================================================
> RCS file: /cvs/src/src/gdb/lin-lwp.c,v
> retrieving revision 1.35
> diff -u -p -r1.35 lin-lwp.c
> --- lin-lwp.c	27 Aug 2002 22:37:06 -0000	1.35
> +++ lin-lwp.c	23 Oct 2002 04:23:13 -0000
> @@ -579,11 +579,8 @@ lin_lwp_resume (ptid_t ptid, int step, e
>    struct lwp_info *lp;
>    int resume_all;
>  
> -  /* Apparently the interpretation of PID is dependent on STEP: If
> -     STEP is non-zero, a specific PID means `step only this process
> -     id'.  But if STEP is zero, then PID means `continue *all*
> -     processes, but give the signal only to this one'.  */
> -  resume_all = (PIDGET (ptid) == -1) || !step;
> +  /* A specific PTID means `step only this process id'.  */
> +  resume_all = (PIDGET (ptid) == -1);
>  
>    if (resume_all)
>      iterate_over_lwps (resume_set_callback, NULL);
> 

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2002-10-31 21:01 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-22 21:25 RFA: lin-lwp bug with software-single-step or schedlock Daniel Jacobowitz
2002-10-22 22:35 ` Andrew Cagney
2002-10-23  7:40   ` Daniel Jacobowitz
2002-10-23 14:13 ` Mark Kettenis
2002-10-23 14:44   ` Michael Snyder
2002-10-23 14:55   ` Daniel Jacobowitz
2002-10-31 13:01 ` Daniel Jacobowitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox