From mboxrd@z Thu Jan 1 00:00:00 1970 From: Michael Snyder To: gdb-patches@sources.redhat.com, kettenis@wins.uva.nl Subject: Re: [PATCH RFC] linux threads: prevent starvation Date: Tue, 26 Jun 2001 19:08:00 -0000 Message-id: <3B394027.DF7247D8@cygnus.com> References: <3B26BCE9.AA6731EC@cygnus.com> X-SW-Source: 2001-06/msg00433.html Hi Mark, Have you had a chance to look at this yet? I am attaching a new version of the diffs. Only one line is different from the previous diff, but it's an important bug fix over the original patch. Michael > OK, this deserves some explanation. > > Problem: Debugging with GDB can cause some threads to starve > in a multi-threaded Linux program. > > Testcase: gdb/testsuite/gdb.threads/pthreads.exp > FAIL: gdb.threads/pthreads.exp: Some threads didn't run. > > Cause: In lin_lwp_wait, we call waitpid (-1, __WCLONE). > If more than one LWP is currently stopped at a breakpoint, > the highest-numbered one will be returned. All the others > will have their breakpoints pushed back. If we then continue, > and the highest-numbered LWP has time to hit a breakpoint > again, no one else will have a chance to run. > > Solution: Instead of always accepting the event for the > first LWP returned by waitpid, use a monte carlo method > to select among all LWPs that have breakpoint events. > > Implementation: I've made use of the lwp list and saved status > that Mark has already implemented. I've saved the SIGTRAP events > along with any other events accidentally fetched by stop_wait_callback, > and I've added the event fetched by lin_lwp_wait to the same queue. > Then I've added code that examines the list of fetched SIGTRAP events > and chooses one at random, making that LWP the event LWP and returning > its event to gdb. > > This passes the testsuite (where the old method fails), and I've > done extensive hand testing as well. In fact, this has exposed > a number of bugs in core gdb's thread handling, some of which I've > already submitted patches for. Others are still to come. With > all of these patches in place, this code seems to be quite robust. > I can put a breakpoint in code that is shared by several threads, > and keep them all stepping and nexting indefinitely. > 2001-06-12 Michael Snyder * lin-lwp.c: Prevent thread starvation by using a monte carlo method to choose which of several event threads to handle next. (resume_callback): Disable code that attempts to handle step_resume breakpoints. Let core gdb handle this. (stop_wait_callback): Defer pushback of breakpoint events until later; add SIGTRAP events to the queue of unhandled events. Keep calling waitpid until SIGSTOP retrieved. If more than one non-SIGSTOP event is retrieved, push them back onto the process queue using kill. (count_events_callback, select_singlestep_lwp_callback, select_event_lwp_callback, cancel_breakpoints_callback, select_event_lwp): New functions. Implement monte carlo method for selecting which of several SIGTRAP threads to handle next. Push back the breakpoint event for all threads other than the selected one. (lin_lwp_wait): Call select_event_lwp to decide which of several sigtrapped lwps to handle next. Index: lin-lwp.c =================================================================== RCS file: /cvs/src/src/gdb/lin-lwp.c,v retrieving revision 1.24 diff -c -3 -p -r1.24 lin-lwp.c *** lin-lwp.c 2001/06/07 19:31:10 1.24 --- lin-lwp.c 2001/06/27 02:07:25 *************** resume_callback (struct lwp_info *lp, vo *** 458,464 **** { struct thread_info *tp; ! #if 1 /* FIXME: kettenis/2000-08-26: This should really be handled properly by core GDB. */ --- 458,464 ---- { struct thread_info *tp; ! #if 0 /* FIXME: kettenis/2000-08-26: This should really be handled properly by core GDB. */ *************** stop_wait_callback (struct lwp_info *lp, *** 585,591 **** pid_t pid; int status; - get_another_event: gdb_assert (lp->status == 0); pid = waitpid (GET_LWP (lp->ptid), &status, --- 585,590 ---- *************** stop_wait_callback (struct lwp_info *lp, *** 619,631 **** } gdb_assert (WIFSTOPPED (status)); - lp->stopped = 1; if (WSTOPSIG (status) != SIGSTOP) { ! if (WSTOPSIG (status) == SIGTRAP ! && breakpoint_inserted_here_p (read_pc_pid (pid_to_ptid (pid)) ! - DECR_PC_AFTER_BREAK)) { /* If a LWP other than the LWP that we're reporting an event for has hit a GDB breakpoint (as opposed to --- 618,627 ---- } gdb_assert (WIFSTOPPED (status)); if (WSTOPSIG (status) != SIGSTOP) { ! if (WSTOPSIG (status) == SIGTRAP) { /* If a LWP other than the LWP that we're reporting an event for has hit a GDB breakpoint (as opposed to *************** stop_wait_callback (struct lwp_info *lp, *** 640,660 **** user will delete or disable the breakpoint, but the thread will have already tripped on it. */ - if (debug_lin_lwp) - fprintf_unfiltered (gdb_stdlog, - "Tripped breakpoint at %lx in LWP %d" - " while waiting for SIGSTOP.\n", - (long) read_pc_pid (lp->ptid), pid); - - /* Set the PC to before the trap. */ - if (DECR_PC_AFTER_BREAK) - write_pc_pid (read_pc_pid (pid_to_ptid (pid)) - - DECR_PC_AFTER_BREAK, - pid_to_ptid (pid)); - /* Now resume this LWP and get the SIGSTOP event. */ ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); ! goto get_another_event; } else if (WSTOPSIG (status) == SIGINT && signal_pass_state (SIGINT) == 0) --- 636,657 ---- user will delete or disable the breakpoint, but the thread will have already tripped on it. */ /* Now resume this LWP and get the SIGSTOP event. */ ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); ! if (debug_lin_lwp) ! { ! fprintf_unfiltered (gdb_stderr, ! "SWC: Candidate SIGTRAP event in %ld\n", ! GET_LWP (lp->ptid)); ! } ! /* Hold the SIGTRAP for handling by lin_lwp_wait. */ ! stop_wait_callback (lp, data); ! /* If there's another event, throw it back into the queue. */ ! if (lp->status) ! kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status)); ! /* Save the sigtrap event. */ ! lp->status = status; ! return 0; } else if (WSTOPSIG (status) == SIGINT && signal_pass_state (SIGINT) == 0) *************** stop_wait_callback (struct lwp_info *lp, *** 664,690 **** just ignore all SIGINT events from all lwp's except for the one that was caught by lin_lwp_wait. */ ! /* Now resume this LWP and get the SIGSTP event. */ ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); ! goto get_another_event; } else { if (debug_lin_lwp) ! fprintf_unfiltered (gdb_stdlog, ! "Received %s in LWP %d while waiting for SIGSTOP.\n", ! strsignal (WSTOPSIG (status)), pid); ! /* The thread was stopped with a signal other than ! SIGSTOP, and didn't accidentally trip a breakpoint. ! Record the wait status. */ ! lp->status = status; } } else { /* We caught the SIGSTOP that we intended to catch, so there's no SIGSTOP pending. */ lp->signalled = 0; } } --- 661,702 ---- just ignore all SIGINT events from all lwp's except for the one that was caught by lin_lwp_wait. */ ! /* Now resume this LWP and get the SIGSTOP event. */ ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); ! return stop_wait_callback (lp, data); } else { + /* The thread was stopped with a signal other than + SIGSTOP, and didn't accidentally trip a breakpoint. */ + if (debug_lin_lwp) ! { ! fprintf_unfiltered (gdb_stderr, ! "SWC: Pending event %d in %ld\n", ! WSTOPSIG (status), GET_LWP (lp->ptid)); ! } ! /* Now resume this LWP and get the SIGSTOP event. */ ! ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); ! /* Hold this event/waitstatus while we check to see if ! there are any more (we still want to get that SIGSTOP). */ ! stop_wait_callback (lp, data); ! /* If the lp->status field is still empty, use it to hold ! this event. If not, then this event must be returned ! to the event queue of the LWP. */ ! if (lp->status == 0) ! lp->status = status; ! else ! kill (GET_LWP (lp->ptid), WSTOPSIG (status)); ! return 0; } } else { /* We caught the SIGSTOP that we intended to catch, so there's no SIGSTOP pending. */ + lp->stopped = 1; lp->signalled = 0; } } *************** running_callback (struct lwp_info *lp, v *** 710,715 **** --- 722,862 ---- return (lp->stopped == 0); } + /* Count the LWP's that have had events. */ + + static int + count_events_callback (struct lwp_info *lp, void *data) + { + int *count = data; + + /* Count only threads that have a SIGTRAP pending. */ + if (lp->status != 0 && + WIFSTOPPED (lp->status) && + WSTOPSIG (lp->status) == SIGTRAP && + count != NULL) /* paranoia */ + (*count)++; + + return 0; + } + + /* Select the LWP (if any) that is currently being single-stepped. */ + + static int + select_singlestep_lwp_callback (struct lwp_info *lp, void *data) + { + if (lp->step && lp->status != 0) + return 1; + else + return 0; + } + + /* Select the Nth LWP that has had a SIGTRAP event. */ + + static int + select_event_lwp_callback (struct lwp_info *lp, void *data) + { + int *selector = data; + + /* Select only threads that have a SIGTRAP event pending. */ + if (lp->status != 0 && + WIFSTOPPED (lp->status) && + WSTOPSIG (lp->status) == SIGTRAP && + selector != NULL) /* paranoia */ + if ((*selector)-- == 0) + return 1; + + return 0; + } + + static int + cancel_breakpoints_callback (struct lwp_info *lp, void *data) + { + struct lwp_info *event_lp = data; + + if (lp != event_lp && + lp->status != 0 && + WIFSTOPPED (lp->status) && + WSTOPSIG (lp->status) == SIGTRAP && + breakpoint_inserted_here_p (read_pc_pid (lp->ptid) - + DECR_PC_AFTER_BREAK)) + { + if (debug_lin_lwp) + { + fprintf_unfiltered (gdb_stdlog, + "Push back BP for %ld\n", + GET_LWP (lp->ptid)); + } + /* For each lp except the event lp, if there was a trap, + set the PC to before the trap. */ + if (DECR_PC_AFTER_BREAK) + { + write_pc_pid (read_pc_pid (lp->ptid) - DECR_PC_AFTER_BREAK, + lp->ptid); + } + lp->status = 0; + } + return 0; + } + + /* Select one LWP out of those that have events to be the event thread. */ + + static void + select_event_lwp (struct lwp_info **orig_lp, int *status) + { + int num_events = 0; + int random_selector; + struct lwp_info *event_lp; + + /* Give preference to any LWP that is being single-stepped. */ + event_lp = iterate_over_lwps (select_singlestep_lwp_callback, NULL); + if (event_lp != NULL) + { + if (debug_lin_lwp) + fprintf_unfiltered (gdb_stdlog, + "Select singlestep lwp %ld\n", + GET_LWP (event_lp->ptid)); + } + else + { + /* No single-stepping LWP. Select one at random, out of those + which have had SIGTRAP events. */ + + /* First see how many SIGTRAP events we have. */ + iterate_over_lwps (count_events_callback, (void *) &num_events); + + /* OK, now randomly pick the Nth LWP of those that have had SIGTRAP. */ + random_selector = (int) + ((num_events * (double) rand ()) / (RAND_MAX + 1.0)); + + if (debug_lin_lwp) + { + if (num_events > 1) + fprintf_unfiltered (gdb_stdlog, + "Found %d SIGTRAP events, selecting #%d\n", + num_events, random_selector); + else if (num_events <= 0) + fprintf_unfiltered (gdb_stdlog, + "ERROR select_event_lwp %d events!\n", + num_events); + } + + event_lp = iterate_over_lwps (select_event_lwp_callback, + (void *) &random_selector); + } + + if (event_lp != NULL) + { + /* "event_lp" is now the selected event thread. + If any other threads have had SIGTRAP events, these events + must now be cancelled, so that the respective thread will + trip the breakpoint again once it is resumed. */ + iterate_over_lwps (cancel_breakpoints_callback, (void *) event_lp); + *orig_lp = event_lp; + *status = event_lp->status; + event_lp->status = 0; + } + } + /* Return non-zero if LP has been resumed. */ static int *************** lin_lwp_wait (ptid_t ptid, struct target *** 741,757 **** /* First check if there is a LWP with a wait status pending. */ if (pid == -1) { ! /* Any LWP will do. */ lp = iterate_over_lwps (status_callback, NULL); if (lp) { ! if (debug_lin_lwp) fprintf_unfiltered (gdb_stdlog, ! "Using pending wait status for LWP %ld.\n", GET_LWP (lp->ptid)); - status = lp->status; - lp->status = 0; } /* But if we don't fine one, we'll have to wait, and check both --- 888,907 ---- /* First check if there is a LWP with a wait status pending. */ if (pid == -1) { ! /* Any LWP that's been resumed will do. */ lp = iterate_over_lwps (status_callback, NULL); if (lp) { ! status = lp->status; ! lp->status = 0; ! if (debug_lin_lwp && status) fprintf_unfiltered (gdb_stdlog, ! "Using pending wait status %d for LWP %ld.\n", ! WIFSTOPPED (status) ? WSTOPSIG (status) : ! WIFSIGNALED (status) ? WTERMSIG (status) : ! WEXITSTATUS (status), GET_LWP (lp->ptid)); } /* But if we don't fine one, we'll have to wait, and check both *************** lin_lwp_wait (ptid_t ptid, struct target *** 772,782 **** status = lp->status; lp->status = 0; ! if (debug_lin_lwp) ! if (status) ! fprintf_unfiltered (gdb_stdlog, ! "Using pending wait status for LWP %ld.\n", ! GET_LWP (lp->ptid)); /* If we have to wait, take into account whether PID is a cloned process or not. And we have to convert it to something that --- 922,934 ---- status = lp->status; lp->status = 0; ! if (debug_lin_lwp && status) ! fprintf_unfiltered (gdb_stdlog, ! "Using pending wait status %d for LWP %ld.\n", ! WIFSTOPPED (status) ? WSTOPSIG (status) : ! WIFSIGNALED (status) ? WTERMSIG (status) : ! WEXITSTATUS (status), ! GET_LWP (lp->ptid)); /* If we have to wait, take into account whether PID is a cloned process or not. And we have to convert it to something that *************** lin_lwp_wait (ptid_t ptid, struct target *** 947,952 **** --- 1099,1111 ---- /* This LWP is stopped now. */ lp->stopped = 1; + /* MVS: so save its event_status. */ + lp->status = status; + if (debug_lin_lwp) + fprintf_unfiltered (gdb_stdlog, + "LLW: Candidate event %d in %ld\n", + WSTOPSIG (status), GET_LWP (lp->ptid)); + /* Now stop all other LWP's ... */ iterate_over_lwps (stop_callback, NULL); *************** lin_lwp_wait (ptid_t ptid, struct target *** 954,964 **** longer running. */ iterate_over_lwps (stop_wait_callback, NULL); /* If we're not running in "threaded" mode, we'll report the bare process id. */ if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP) ! trap_ptid = (threaded ? lp->ptid : pid_to_ptid (GET_LWP (lp->ptid))); else trap_ptid = null_ptid; --- 1113,1135 ---- longer running. */ iterate_over_lwps (stop_wait_callback, NULL); + /* MVS Now choose an event thread from among those that + have had events. Giving equal priority to all threads + that have had events helps prevent starvation. */ + + select_event_lwp (&lp, &status); + /* If we're not running in "threaded" mode, we'll report the bare process id. */ if (WIFSTOPPED (status) && WSTOPSIG (status) == SIGTRAP) ! { ! trap_ptid = (threaded ? lp->ptid : pid_to_ptid (GET_LWP (lp->ptid))); ! if (debug_lin_lwp) ! fprintf_unfiltered (gdb_stdlog, ! "LLW: trap_ptid is %ld\n", ! GET_LWP (trap_ptid)); ! } else trap_ptid = null_ptid; >From ac131313@cygnus.com Wed Jun 27 10:24:00 2001 From: Andrew Cagney To: gdb-patches@sources.redhat.com Subject: [patch/mi] Output full breakpoint table headings Date: Wed, 27 Jun 2001 10:24:00 -0000 Message-id: <3B3A16C8.8070003@cygnus.com> X-SW-Source: 2001-06/msg00434.html Content-length: 439 Hello, I've committed the attached. For mi!=0 it always outputs a full breakpoint table header and the breakpoint table body as a list. One note on on my coding style: o I've tried to always test mi_version == 0 rather than alternate between that and tests like mi_version>0. (make searching easier) o I've tried to do things so that the removal of the mi0 code is a no-brainer. (reduce need to re-indent) enjoy, Andrew