[PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread

Mirror of the gdb-patches mailing list
 help / color / mirror / Atom feed

From: Sterling Augustine <saugustine@google.com>
To: gdb-patches <gdb-patches@sourceware.org>,
	Pedro Alves <palves@redhat.com>
Subject: [PATCH] RFC: All stepping threads need the same checks and cleanup as the currently stepping thread
Date: Tue, 21 Jan 2014 22:17:00 -0000	[thread overview]
Message-ID: <CAEG7qUzQ2YR+89he=hMa7eNZb9fMxp1o20ActJvXx10aeVOySw@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

I am having trouble reducing a large and heavily multi-threaded
program (with many SIGPROFs) to a publishable test case, but I have
been intermittently hitting the following assertion in gdb's "resume"
function:

gdb_assert (pc_in_thread_step_range (pc, tp));

The PC is in a function called inside the step range, and therefore
you would expect process_event_stop_test's checks for this (starting
with the comment, "We stepped out of the stepping range") to catch it
and fix it up.

But the problem is that the thread out of the step range is not the
currently stepped thread. Process_event_stop_test calls
switch_back_to_stepped_thread, which, in turn, calls resume, bypassing
the extra logic in process_event_stop_test to fix up the step range,
and leading to the assertion error.

But I don't see any reason to assume that only the current thread
would need the additional cleanup found in process_event_stop_test. In
fact, switch_back_to_stepped_thread has a special case (hidden in
currently_stepping_or_nexting_callback), to _prevent_ it from
restarting the current thread, presumably so it that thread can get
the additional cleanup.

The enclosed patch does two things:

1. Adds a large amount of tracing, which helped me diagnose the problem.
2. Changes switch_back_to_stepped_thread to still switch back to a
stepped thread, but to avoid restarting it, allowing the additional
checks in process_event_stop_test to work their magic.

The GDB testsuite still passes (in particular, no new failures in
testsuite/gdb.threads), but I know there are a lot of subtleties here.

This is the best fix I can come up with, but are there any other
suggestions on how to approach the problem? (I'll do a ChangeLog entry
after taking comments.)

Thanks,

Sterling

[-- Attachment #2: switch-thread.patch --]
[-- Type: text/x-patch, Size: 9907 bytes --]

diff --git a/gdb/infcmd.c b/gdb/infcmd.c
index 2d50f41..c042e0d 100644
--- a/gdb/infcmd.c
+++ b/gdb/infcmd.c
@@ -32,6 +32,7 @@
 #include "gdbcore.h"
 #include "target.h"
 #include "language.h"
+//#include "symfile.h"
 #include "objfiles.h"
 #include "completer.h"
 #include "ui-out.h"
@@ -1049,6 +1050,10 @@ step_once (int skip_subroutines, int single_inst, int count, int thread)
 				 &tp->control.step_range_start,
 				 &tp->control.step_range_end);
 
+          if (debug_infrun)
+            fprintf_unfiltered (gdb_stdlog,
+				"infrun: %s may range step\n",
+                                target_pid_to_str (tp->ptid));
 	  tp->control.may_range_step = 1;
 
 	  /* If we have no line info, switch to stepi mode.  */
@@ -1056,6 +1061,10 @@ step_once (int skip_subroutines, int single_inst, int count, int thread)
 	    {
 	      tp->control.step_range_start = tp->control.step_range_end = 1;
 	      tp->control.may_range_step = 0;
+              if (debug_infrun)
+                fprintf_unfiltered (gdb_stdlog,
+                                    "infrun: %s may not range step\n",
+                                    target_pid_to_str (tp->ptid));
 	    }
 	  else if (tp->control.step_range_end == 0)
 	    {
@@ -1346,6 +1355,10 @@ until_next_command (int from_tty)
       tp->control.step_range_end = sal.end;
     }
   tp->control.may_range_step = 1;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may range step\n",
+                        target_pid_to_str (tp->ptid));
 
   tp->control.step_over_calls = STEP_OVER_ALL;
 
diff --git a/gdb/infrun.c b/gdb/infrun.c
index 311bf9c..e6f51c9 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -1315,6 +1315,10 @@ displaced_step_prepare (ptid_t ptid)
      the scratch buffer lands within the stepping range (e.g., a
      jump/branch).  */
   tp->control.may_range_step = 0;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may not range step\n",
+                        target_pid_to_str (tp->ptid));
 
   /* We have to displaced step one thread at a time, as we only have
      access to a single scratch space per inferior.  */
@@ -1775,7 +1779,14 @@ a command like `return' or `jump' to continue execution."));
   /* If we have a breakpoint to step over, make sure to do a single
      step only.  Same if we have software watchpoints.  */
   if (tp->control.trap_expected || bpstat_should_step ())
-    tp->control.may_range_step = 0;
+    {
+      tp->control.may_range_step = 0;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may not range step\n",
+                            target_pid_to_str (tp->ptid));
+    }
+
 
   /* If enabled, step over breakpoints by executing a copy of the
      instruction at a different address.
@@ -1945,6 +1956,14 @@ a command like `return' or `jump' to continue execution."));
 	     operation, like stepping the thread out of the dynamic
 	     linker or the displaced stepping scratch pad.  We
 	     shouldn't have allowed a range step then.  */
+	  if (debug_infrun)
+	    fprintf_unfiltered (gdb_stdlog,
+				"infrun: %s step range = [%lx-%lx] pc = %lx\n",
+                                target_pid_to_str (tp->ptid),
+                                tp->control.step_range_start,
+                                tp->control.step_range_end,
+                                pc);
+
 	  gdb_assert (pc_in_thread_step_range (pc, tp));
 	}
 
@@ -1990,6 +2009,10 @@ clear_proceed_status_thread (struct thread_info *tp)
   tp->control.step_range_start = 0;
   tp->control.step_range_end = 0;
   tp->control.may_range_step = 0;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may not range step\n",
+                        target_pid_to_str (tp->ptid));
   tp->control.step_frame_id = null_frame_id;
   tp->control.step_stack_frame_id = null_frame_id;
   tp->control.step_over_calls = STEP_OVER_UNDEBUGGABLE;
@@ -3254,6 +3277,10 @@ handle_inferior_event (struct execution_control_state *ecs)
       /* Disable range stepping.  If the next step request could use a
 	 range, this will be end up re-enabled then.  */
       ecs->event_thread->control.may_range_step = 0;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may not range step\n",
+                            target_pid_to_str (ecs->ptid));
     }
 
   /* Dependent on valid ECS->EVENT_THREAD.  */
@@ -4644,7 +4671,9 @@ process_event_stop_test (struct execution_control_state *ecs)
 
     case BPSTAT_WHAT_HP_STEP_RESUME:
       if (debug_infrun)
-	fprintf_unfiltered (gdb_stdlog, "infrun: BPSTAT_WHAT_HP_STEP_RESUME\n");
+	fprintf_unfiltered (gdb_stdlog, "infrun: BPSTAT_WHAT_HP_STEP_RESUME "
+                            "step_after_step_resume_breakpoint = %d\n",
+                            ecs->event_thread->step_after_step_resume_breakpoint);
 
       delete_step_resume_breakpoint (ecs->event_thread);
       if (ecs->event_thread->step_after_step_resume_breakpoint)
@@ -4727,6 +4756,10 @@ process_event_stop_test (struct execution_control_state *ecs)
 	 necessary (e.g., if we're stepping over a breakpoint or we
 	 have software watchpoints).  */
       ecs->event_thread->control.may_range_step = 1;
+      if (debug_infrun)
+        fprintf_unfiltered (gdb_stdlog,
+                            "infrun: %s may range step\n",
+                            target_pid_to_str (ecs->ptid));
 
       /* When stepping backward, stop at beginning of line range
 	 (unless it's the function entry point, in which case
@@ -5245,6 +5278,10 @@ process_event_stop_test (struct execution_control_state *ecs)
   ecs->event_thread->control.step_range_start = stop_pc_sal.pc;
   ecs->event_thread->control.step_range_end = stop_pc_sal.end;
   ecs->event_thread->control.may_range_step = 1;
+  if (debug_infrun)
+    fprintf_unfiltered (gdb_stdlog,
+                        "infrun: %s may range step\n",
+                        target_pid_to_str (ecs->ptid));
   set_step_info (frame, stop_pc_sal);
 
   if (debug_infrun)
@@ -5274,10 +5311,7 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 	  if ((ecs->event_thread->control.trap_expected
 	       && ecs->event_thread->suspend.stop_signal != GDB_SIGNAL_TRAP)
 	      || ecs->event_thread->stepping_over_breakpoint)
-	    {
-	      keep_going (ecs);
-	      return 1;
-	    }
+            return 0;
 
 	  /* If the stepping thread exited, then don't try to switch
 	     back and resume it, which could fail in several different
@@ -5307,24 +5341,18 @@ switch_back_to_stepped_thread (struct execution_control_state *ecs)
 				    "stepped thread, it has vanished\n");
 
 	      delete_thread (tp->ptid);
-	      keep_going (ecs);
-	      return 1;
+	      return 0;
 	    }
 
-	  /* Otherwise, we no longer expect a trap in the current thread.
-	     Clear the trap_expected flag before switching back -- this is
-	     what keep_going would do as well, if we called it.  */
-	  ecs->event_thread->control.trap_expected = 0;
-
 	  if (debug_infrun)
 	    fprintf_unfiltered (gdb_stdlog,
-				"infrun: switching back to stepped thread\n");
+				"infrun: switching back to stepped thread %s\n",
+                                target_pid_to_str (tp->ptid));
 
 	  ecs->event_thread = tp;
 	  ecs->ptid = tp->ptid;
 	  context_switch (ecs->ptid);
-	  keep_going (ecs);
-	  return 1;
+          return 0;
 	}
     }
   return 0;
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index 2371ad4..1c6bd45 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -46,6 +46,7 @@
 #include "gregset.h"		/* for gregset */
 #include "gdbcore.h"		/* for get_exec_file */
 #include <ctype.h>		/* for isdigit */
+#include "gdbthread.h"		/* for struct thread_info etc.  */
 #include <sys/stat.h>		/* for struct stat */
 #include <fcntl.h>		/* for O_RDONLY */
 #include "inf-loop.h"
@@ -64,6 +65,7 @@
 #include "agent.h"
 #include "tracepoint.h"
 #include "exceptions.h"
+#include "linux-ptrace.h"
 #include "buffer.h"
 #include "target-descriptions.h"
 #include "filestuff.h"
@@ -2601,6 +2603,9 @@ stop_wait_callback (struct lwp_info *lp, void *data)
 
       if (WSTOPSIG (status) != SIGSTOP)
 	{
+          struct thread_info *tp = find_thread_ptid (lp->ptid);
+          CORE_ADDR pc = regcache_read_pc (get_thread_regcache (lp->ptid));
+
 	  /* The thread was stopped with a signal other than SIGSTOP.  */
 
 	  save_sigtrap (lp);
@@ -2611,6 +2616,18 @@ stop_wait_callback (struct lwp_info *lp, void *data)
 				status_to_str ((int) status),
 				target_pid_to_str (lp->ptid));
 
+          if (debug_linux_nat
+              && tp->control.may_range_step
+              && !pc_in_thread_step_range (pc, tp))
+            {
+              fprintf_unfiltered (gdb_stdlog,
+                                  "SWC: %s %lx out of step range [%lx-%lx]\n",
+                                  target_pid_to_str (tp->ptid),
+                                  pc,
+                                  tp->control.step_range_start,
+                                  tp->control.step_range_end);
+            }
+
 	  /* Save the sigtrap event.  */
 	  lp->status = status;
 	  gdb_assert (!lp->stopped);
@@ -2686,7 +2703,13 @@ count_events_callback (struct lwp_info *lp, void *data)
 
   /* Count only resumed LWPs that have a SIGTRAP event pending.  */
   if (lp->resumed && linux_nat_lp_status_is_event (lp))
-    (*count)++;
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "CEC: LWP %ld has an event\n",
+			    ptid_get_lwp (lp->ptid));
+      (*count)++;
+    }
 
   return 0;
 }
@@ -3212,6 +3235,9 @@ linux_nat_wait_1 (struct target_ops *ops,
   block_child_signals (&prev_mask);
 
 retry:
+  if (debug_linux_nat)
+    fprintf_unfiltered (gdb_stdlog, "LLW: retry\n");
+
   lp = NULL;
   status = 0;

next             reply	other threads:[~2014-01-21 22:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-21 22:17 Sterling Augustine [this message]
2014-01-22 14:04 ` Pedro Alves
2014-01-23 17:55   ` Sterling Augustine
2014-01-23 18:53     ` Pedro Alves
2014-01-24 12:44       ` Pedro Alves
2014-02-07 19:52       ` Pedro Alves
2014-02-07 20:12         ` Pedro Alves
2014-02-26 17:12         ` Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEG7qUzQ2YR+89he=hMa7eNZb9fMxp1o20ActJvXx10aeVOySw@mail.gmail.com' \
    --to=saugustine@google.com \
    --cc=gdb-patches@sourceware.org \
    --cc=palves@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox