hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

* hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps
@ 2009-11-18 14:46 Pedro Alves
  2009-11-18 17:01 ` Jan Kratochvil
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2009-11-18 14:46 UTC (permalink / raw)
  To: gdb-patches

Hi,

I'm working on making watchpoints actually work in non-stop
mode (against a private stub).

One thing that was never handled is "moribund" / delayed
watchpoint traps.  Cases like these:


| Time/ | GDB                               | Target                         |
|  Step |                                   |                                |
|-------+-----------------------------------+--------------------------------|
|     1 | user sets watchpoint, gdb tells   |                                |
|       | target to insert watchpoint       |                                |
|-------+-----------------------------------+--------------------------------|
|     2 |                                   | inserts watchpoint (sets debug |
|       |                                   | registers)                     |
|-------+-----------------------------------+--------------------------------|
|       |                                   | hits watchpoint (puts event    |
|     3 |                                   | in queue/sends to gdb)         |
|-------+-----------------------------------+--------------------------------|
|     4 | user deletes watchpoint, gdb      |                                |
|       | tells target to delete watchpoint |                                |
|-------+-----------------------------------+--------------------------------|
|     5 |                                   | receives watchpoint            |
|       |                                   | delete request                 |
|-------+-----------------------------------+--------------------------------|
|     6 | receives watchpoint hit event     |                                |


Say the target is able to report stopped data addresses on watchpoint
hits (some can't).  When you reach step 6, bpstat_stop_status doesn't
find any watchpoint in the breakpoint list for the reported stopped
data address, and hence doesn't put any watchpoint related bpstat in
the list --- step 4 had deleted the corresponding watchpoint from
gdb's breakpoint lists.

The end result is that GDB simply reports the trap a random SIGTRAP,
which is far from user friendly (Program received signal SIGTRAP,
Trace/breakpoint trap).


This actually triggers very often with current GDB, because
watchpoints have never been adjusted to work in always-inserted mode
--- GDB always removes/inserts watchpoints on every reported event.
I'll post patches to fix that later.


This shouldn't be limited to non-stop mode.  I'm thinking (but haven't
tried to reproduce it), that something like this will happen in
linux/all-stop mode when Jan's reordered-watchpoints patch goes in:


| Time/ | GDB                               | Target                         |
|  Step |                                   |                                |
|-------+-----------------------------------+--------------------------------|
|     1 | sets watchpoint                   |                                |
|-------+-----------------------------------+--------------------------------|
|       |                                   | two threads hit watchpoint.    |
|     2 |                                   | One event is reported, the     |
|       |                                   | other left pending.            |
|-------+-----------------------------------+--------------------------------|
|     3 | user deletes watchpoint,          |                                |
|       | continues                         |                                |
|-------+-----------------------------------+--------------------------------|
|     4 |                                   | skips resuming --- has pending |
|       |                                   | status to report, and reports  |
|       |                                   | that now                       |
|-------+-----------------------------------+--------------------------------|
|     5 | receives watchpoint hit           |                                |
|       | event, but there's no             |                                |
|       | watchpoint listed for this        |                                |
|       | stopped data address              |                                |
|-------+-----------------------------------+--------------------------------|
|     6 | report random SIGTRAP to the user |                                |


We could probably tweak Jan' new test at
<http://sourceware.org/ml/gdb-patches/2009-11/msg00400.html> to trigger this.


So, here's the simple patch to handle this.  Any comments/concerns?

2009-11-18  Pedro Alves  <pedro@codesourcery.com>

	* infrun.c (handle_inferior_event): Hardware watchpoint traps are
	never random signals.

-- 
Pedro Alves


---
 gdb/gdbserver/i386-low.c |    1 +
 gdb/infrun.c             |    9 +++++++++
 2 files changed, 10 insertions(+)

Index: src/gdb/infrun.c
===================================================================
--- src.orig/gdb/infrun.c	2009-11-18 12:09:05.000000000 +0000
+++ src/gdb/infrun.c	2009-11-18 14:06:34.000000000 +0000
@@ -3613,9 +3613,18 @@ targets should add new threads to the th
          be necessary for call dummies on a non-executable stack on
          SPARC.  */
 
+      /* This is where we handle "moribund" watchpoints.  Unlike
+	 software breakpoints traps, hardware watchpoint traps are
+	 always distinguishable from random traps.  If no high-level
+	 watchpoint is associated with the reported stop data address
+	 anymore, then the bpstat does not explain the signal ---
+	 simply make sure to ignore it if `stopped_by_watchpoint' is
+	 set.  */
+
       if (ecs->event_thread->stop_signal == TARGET_SIGNAL_TRAP)
 	ecs->random_signal
 	  = !(bpstat_explains_signal (ecs->event_thread->stop_bpstat)
+	      || stopped_by_watchpoint
 	      || ecs->event_thread->trap_expected
 	      || (ecs->event_thread->step_range_end
 		  && ecs->event_thread->step_resume_breakpoint == NULL));


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hardware watchpoints in non-stop - "moribund"/delayed watchpoint  traps
  2009-11-18 14:46 hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps Pedro Alves
@ 2009-11-18 17:01 ` Jan Kratochvil
  2009-11-18 17:51   ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kratochvil @ 2009-11-18 17:01 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, 18 Nov 2009 15:45:38 +0100, Pedro Alves wrote:
> This shouldn't be limited to non-stop mode.  I'm thinking (but haven't
> tried to reproduce it), that something like this will happen in
> linux/all-stop mode when Jan's reordered-watchpoints patch goes in:
> 
> 
> | Time/ | GDB                               | Target                         |
> |  Step |                                   |                                |
> |-------+-----------------------------------+--------------------------------|
> |     1 | sets watchpoint                   |                                |
> |-------+-----------------------------------+--------------------------------|
> |       |                                   | two threads hit watchpoint.    |
> |     2 |                                   | One event is reported, the     |
> |       |                                   | other left pending.            |
> |-------+-----------------------------------+--------------------------------|
> |     3 | user deletes watchpoint,          |                                |
> |       | continues                         |                                |
> |-------+-----------------------------------+--------------------------------|
> |     4 |                                   | skips resuming --- has pending |
> |       |                                   | status to report, and reports  |
> |       |                                   | that now                       |
> |-------+-----------------------------------+--------------------------------|
> |     5 | receives watchpoint hit           |                                |
> |       | event, but there's no             |                                |
> |       | watchpoint listed for this        |                                |
> |       | stopped data address              |                                |
> |-------+-----------------------------------+--------------------------------|
> |     6 | report random SIGTRAP to the user |                                |
> 
> 
> We could probably tweak Jan' new test at
> <http://sourceware.org/ml/gdb-patches/2009-11/msg00400.html> to trigger this.

I agree, nice catch.

Such patch to test it is at the bottom.


>        if (ecs->event_thread->stop_signal == TARGET_SIGNAL_TRAP)
>  	ecs->random_signal
>  	  = !(bpstat_explains_signal (ecs->event_thread->stop_bpstat)
> +	      || stopped_by_watchpoint
>  	      || ecs->event_thread->trap_expected
>  	      || (ecs->event_thread->step_range_end
>  		  && ecs->event_thread->step_resume_breakpoint == NULL));

This means forgotten triggers (as currently without the hw-watchpoints
patch 1/4) would be hidden.

As the is already the infrastructure for moribund locations isn't it better to
enable them even for all-stop mode and check the address explicitly against
them?  Sorry for no such patch in this mail.


Regards,
Jan


--- a/gdb/testsuite/gdb.threads/watchthreads-reorder.exp
+++ b/gdb/testsuite/gdb.threads/watchthreads-reorder.exp
@@ -88,6 +88,8 @@ foreach reorder {0 1} {
 	gdb_test "rwatch unused1_rwatch" "Hardware read watchpoint \[0-9\]+: unused1_rwatch"
 	gdb_test "rwatch unused2_rwatch" "Hardware read watchpoint \[0-9\]+: unused2_rwatch"
     }
+    gdb_test "delete 2"
+    gdb_test "delete 3"
 
     gdb_test "continue" \
 	     "Hardware read watchpoint \[0-9\]+: thread\[12\]_rwatch\r\n\r\nValue = 0\r\n0x\[0-9a-f\]+ in thread\[12\]_func .*" \


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps
  2009-11-18 17:01 ` Jan Kratochvil
@ 2009-11-18 17:51   ` Pedro Alves
  2009-11-18 18:40     ` Jan Kratochvil
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2009-11-18 17:51 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: gdb-patches

On Wednesday 18 November 2009 16:59:53, Jan Kratochvil wrote:
> On Wed, 18 Nov 2009 15:45:38 +0100, Pedro Alves wrote:
> 
> >        if (ecs->event_thread->stop_signal == TARGET_SIGNAL_TRAP)
> >  	ecs->random_signal
> >  	  = !(bpstat_explains_signal (ecs->event_thread->stop_bpstat)
> > +	      || stopped_by_watchpoint
> >  	      || ecs->event_thread->trap_expected
> >  	      || (ecs->event_thread->step_range_end
> >  		  && ecs->event_thread->step_resume_breakpoint == NULL));
> 
> This means forgotten triggers (as currently without the hw-watchpoints
> patch 1/4) would be hidden.

But is that a correctness problem?  I guess I could put something like:

if (debug_infrun
    && !bpstat_explains_signal (ecs->event_thread->stop_bpstat)
    && stopped_by_watchpoint)
  fprintf_unfiltered (gdb_stdlog,
                     "infrun: no user watchpoint explains watchpoint event, ignoring");

so that we can tell from the logs if something is reporting bad
watchpoint SIGTRAPs.

> As the is already the infrastructure for moribund locations isn't it better to
> enable them even for all-stop mode and check the address explicitly against
> them?  Sorry for no such patch in this mail.

At first I thought so, but, I then changed my mind into thinking the extra
complexity isn't necessary.  We needed the moribund breakpoint locations
heuristic, to be able to distinguish random SIGTRAPs from delayed software
breakpoint traps.  We don't keep them indefinitly, so to reduce the
chances of mistaking a real random SIGTRAP from a delayed software
breakpoint SIGTRAP.  We don't need the heuristic with hardware watchpoint
traps --- we can always tell the difference.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hardware watchpoints in non-stop - "moribund"/delayed watchpoint  traps
  2009-11-18 17:51   ` Pedro Alves
@ 2009-11-18 18:40     ` Jan Kratochvil
  2009-11-18 19:46       ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kratochvil @ 2009-11-18 18:40 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, 18 Nov 2009 18:50:50 +0100, Pedro Alves wrote:
> if (debug_infrun
>     && !bpstat_explains_signal (ecs->event_thread->stop_bpstat)
>     && stopped_by_watchpoint)
>   fprintf_unfiltered (gdb_stdlog,
>                      "infrun: no user watchpoint explains watchpoint event, ignoring");
> 
> so that we can tell from the logs if something is reporting bad
> watchpoint SIGTRAPs.

The problem is it is not clear from this message there is a GDB bug and the
testsuite already contains too many racy failures to be catching more of them.


> At first I thought so, but, I then changed my mind into thinking the extra
> complexity isn't necessary.  We needed the moribund breakpoint locations
> heuristic, to be able to distinguish random SIGTRAPs from delayed software
> breakpoint traps.  We don't keep them indefinitly, so to reduce the
> chances of mistaking a real random SIGTRAP from a delayed software
> breakpoint SIGTRAP.  We don't need the heuristic with hardware watchpoint
> traps --- we can always tell the difference.

The heuristic can be disabled in the all-stop mode remembering all watchpoint
locations indefinitely exactly until count_events_callback() reports zero.

Still the moribund infrastructure is IMO suitable for this task.

But sure your patch is better than the current state and I can post what
I find more strict later.


Thanks,
Jan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps
  2009-11-18 18:40     ` Jan Kratochvil
@ 2009-11-18 19:46       ` Pedro Alves
  2009-11-18 20:52         ` Jan Kratochvil
  0 siblings, 1 reply; 7+ messages in thread
From: Pedro Alves @ 2009-11-18 19:46 UTC (permalink / raw)
  To: Jan Kratochvil; +Cc: gdb-patches

On Wednesday 18 November 2009 18:39:17, Jan Kratochvil wrote:

> The problem is it is not clear from this message there is a GDB bug and the

> testsuite already contains too many racy failures to be catching more of them.

I'd bet a couple of those are actually exactly as the all-stop case
I described and actually fixed by this change.

> > At first I thought so, but, I then changed my mind into thinking the extra
> > complexity isn't necessary.  We needed the moribund breakpoint locations
> > heuristic, to be able to distinguish random SIGTRAPs from delayed software
> > breakpoint traps.  We don't keep them indefinitly, so to reduce the
> > chances of mistaking a real random SIGTRAP from a delayed software
> > breakpoint SIGTRAP.  We don't need the heuristic with hardware watchpoint
> > traps --- we can always tell the difference.
> 
> The heuristic can be disabled in the all-stop mode remembering all watchpoint
> locations indefinitely exactly until count_events_callback() reports zero.

"indefinitely exactly until" is contradictory. :-)

Think about it.  The fact that we count a certain made-up number of events
_is_ part of the heuristic.  I've seen the moribund events count reaching
zero sooner than it should have, and GDB reporting a random
SIGTRAP by mistake.

> But sure your patch is better than the current state and I can post what
> I find more strict later.

If we have moribund watchpoint locations, then most of the races
you're talking about would still be hidden, exactly by the moribund
locations.  I don't think we'd be stricter.  It seems the only case that
wouldn't happen is the case of the target keeping reporting bad SIGTRAPs
repeatedly, but I'm sure those are noticeable by other symptoms,
like, e.g., target slowness.

Keep in mind that on targets that can't report a stopped data
address, gdb already necessarily ignores bad watchpoint traps.

-- 
Pedro Alves

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hardware watchpoints in non-stop - "moribund"/delayed watchpoint  traps
  2009-11-18 19:46       ` Pedro Alves
@ 2009-11-18 20:52         ` Jan Kratochvil
  2009-11-20 14:15           ` Pedro Alves
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kratochvil @ 2009-11-18 20:52 UTC (permalink / raw)
  To: Pedro Alves; +Cc: gdb-patches

On Wed, 18 Nov 2009 20:45:50 +0100, Pedro Alves wrote:
> It seems the only case that wouldn't happen is the case of the target
> keeping reporting bad SIGTRAPs repeatedly, but I'm sure those are noticeable
> by other symptoms, like, e.g., target slowness.

Yes, I meant this case - the one which is now present in FSF GDB HEAD and
which is fixed by the hw-watchpoints patch 1/4 and which could quietly regress
one day again.

Anyway it is just a sanity check.


Regards,
Jan


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps
  2009-11-18 20:52         ` Jan Kratochvil
@ 2009-11-20 14:15           ` Pedro Alves
  0 siblings, 0 replies; 7+ messages in thread
From: Pedro Alves @ 2009-11-20 14:15 UTC (permalink / raw)
  To: gdb-patches

Here's what I just checked in.

I've made two more changes: there's no use keeping moribund
locations around that won't be used.  And, I've also updated
the comment in breakpoint.c.  The "but not a correctness problem"
comment is too optimistic, on targets that need PC
adjustment on breakpoint traps.

-- 
Pedro Alves

2009-11-20  Pedro Alves  <pedro@codesourcery.com>

	* infrun.c (handle_inferior_event): Hardware hatchpoint traps are
	never random signals.
	* breakpoint.c (update_global_location_list): Always delete
	immediately delete hardware watchpoint locations and other
	locations whose target address isn't meaningful.  Update comment
	explaining the hazard of moribund locations.

---
 gdb/breakpoint.c |   61 +++++++++++++++++++++++++++++++++++++++++++------------
 gdb/infrun.c     |   16 ++++++++++++++
 2 files changed, 64 insertions(+), 13 deletions(-)

Index: src/gdb/infrun.c
===================================================================
--- src.orig/gdb/infrun.c	2009-11-20 12:45:54.000000000 +0000
+++ src/gdb/infrun.c	2009-11-20 12:46:18.000000000 +0000
@@ -3593,6 +3593,21 @@ targets should add new threads to the th
 	 function.  */
       stop_print_frame = 1;
 
+      /* This is where we handle "moribund" watchpoints.  Unlike
+	 software breakpoints traps, hardware watchpoint traps are
+	 always distinguishable from random traps.  If no high-level
+	 watchpoint is associated with the reported stop data address
+	 anymore, then the bpstat does not explain the signal ---
+	 simply make sure to ignore it if `stopped_by_watchpoint' is
+	 set.  */
+
+      if (debug_infrun
+	  && ecs->event_thread->stop_signal == TARGET_SIGNAL_TRAP
+	  && !bpstat_explains_signal (ecs->event_thread->stop_bpstat)
+	  && stopped_by_watchpoint)
+	fprintf_unfiltered (gdb_stdlog, "\
+infrun: no user watchpoint explains watchpoint SIGTRAP, ignoring\n");
+
       /* NOTE: cagney/2003-03-29: These two checks for a random signal
          at one stage in the past included checks for an inferior
          function call's call dummy's return breakpoint.  The original
@@ -3616,6 +3631,7 @@ targets should add new threads to the th
       if (ecs->event_thread->stop_signal == TARGET_SIGNAL_TRAP)
 	ecs->random_signal
 	  = !(bpstat_explains_signal (ecs->event_thread->stop_bpstat)
+	      || stopped_by_watchpoint
 	      || ecs->event_thread->trap_expected
 	      || (ecs->event_thread->step_range_end
 		  && ecs->event_thread->step_resume_breakpoint == NULL));
Index: src/gdb/breakpoint.c
===================================================================
--- src.orig/gdb/breakpoint.c	2009-11-20 12:45:54.000000000 +0000
+++ src/gdb/breakpoint.c	2009-11-20 12:46:18.000000000 +0000
@@ -8300,20 +8300,55 @@ update_global_location_list (int should_
 
       if (!found_object)
 	{
-	  if (removed && non_stop)
+	  if (removed && non_stop
+	      && breakpoint_address_is_meaningful (old_loc->owner)
+	      && !is_hardware_watchpoint (old_loc->owner))
 	    {
-	      /* This location was removed from the targets.  In non-stop mode,
-		 a race condition is possible where we've removed a breakpoint,
-		 but stop events for that breakpoint are already queued and will
-		 arrive later.  To suppress spurious SIGTRAPs reported to user,
-		 we keep this breakpoint location for a bit, and will retire it
-		 after we see 3 * thread_count events.
-		 The theory here is that reporting of events should,
-		 "on the average", be fair, so after that many event we'll see
-		 events from all threads that have anything of interest, and no
-		 longer need to keep this breakpoint.  This is just a
-		 heuristic, but if it's wrong, we'll report unexpected SIGTRAP,
-		 which is usability issue, but not a correctness problem.  */
+	      /* This location was removed from the target.  In
+		 non-stop mode, a race condition is possible where
+		 we've removed a breakpoint, but stop events for that
+		 breakpoint are already queued and will arrive later.
+		 We apply an heuristic to be able to distinguish such
+		 SIGTRAPs from other random SIGTRAPs: we keep this
+		 breakpoint location for a bit, and will retire it
+		 after we see some number of events.  The theory here
+		 is that reporting of events should, "on the average",
+		 be fair, so after a while we'll see events from all
+		 threads that have anything of interest, and no longer
+		 need to keep this breakpoint location around.  We
+		 don't hold locations forever so to reduce chances of
+		 mistaking a non-breakpoint SIGTRAP for a breakpoint
+		 SIGTRAP.
+
+		 The heuristic failing can be disastrous on
+		 decr_pc_after_break targets.
+
+		 On decr_pc_after_break targets, like e.g., x86-linux,
+		 if we fail to recognize a late breakpoint SIGTRAP,
+		 because events_till_retirement has reached 0 too
+		 soon, we'll fail to do the PC adjustment, and report
+		 a random SIGTRAP to the user.  When the user resumes
+		 the inferior, it will most likely immediately crash
+		 with SIGILL/SIGBUS/SEGSEGV, or worse, get silently
+		 corrupted, because of being resumed e.g., in the
+		 middle of a multi-byte instruction, or skipped a
+		 one-byte instruction.  This was actually seen happen
+		 on native x86-linux, and should be less rare on
+		 targets that do not support new thread events, like
+		 remote, due to the heuristic depending on
+		 thread_count.
+
+		 Mistaking a random SIGTRAP for a breakpoint trap
+		 causes similar symptoms (PC adjustment applied when
+		 it shouldn't), but then again, playing with SIGTRAPs
+		 behind the debugger's back is asking for trouble.
+
+		 Since hardware watchpoint traps are always
+		 distinguishable from other traps, so we don't need to
+		 apply keep hardware watchpoint moribund locations
+		 around.  We simply always ignore hardware watchpoint
+		 traps we can no longer explain.  */
+
 	      old_loc->events_till_retirement = 3 * (thread_count () + 1);
 	      old_loc->owner = NULL;
 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-11-20 14:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-18 14:46 hardware watchpoints in non-stop - "moribund"/delayed watchpoint traps Pedro Alves
2009-11-18 17:01 ` Jan Kratochvil
2009-11-18 17:51   ` Pedro Alves
2009-11-18 18:40     ` Jan Kratochvil
2009-11-18 19:46       ` Pedro Alves
2009-11-18 20:52         ` Jan Kratochvil
2009-11-20 14:15           ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox