From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 4754 invoked by alias); 19 May 2008 15:21:31 -0000 Received: (qmail 4506 invoked by uid 22791); 19 May 2008 15:21:29 -0000 X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4) by sourceware.org (qpsmtpd/0.31) with ESMTP; Mon, 19 May 2008 15:21:11 +0000 Received: (qmail 9862 invoked from network); 19 May 2008 15:21:08 -0000 Received: from unknown (HELO orlando.local) (pedro@127.0.0.2) by mail.codesourcery.com with ESMTPA; 19 May 2008 15:21:08 -0000 From: Pedro Alves To: gdb-patches@sourceware.org Subject: Re: [RFC] 10/10 non-stop for linux native Date: Mon, 19 May 2008 17:20:00 -0000 User-Agent: KMail/1.9.9 References: <200805061650.10912.pedro@codesourcery.com> In-Reply-To: <200805061650.10912.pedro@codesourcery.com> MIME-Version: 1.0 Content-Type: Multipart/Mixed; boundary="Boundary-00=_krZMIfNZXPMzwMo" Message-Id: <200805191621.08320.pedro@codesourcery.com> X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2008-05/txt/msg00563.txt.bz2 --Boundary-00=_krZMIfNZXPMzwMo Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Content-length: 1007 A Tuesday 06 May 2008 16:50:10, Pedro Alves wrote: > This adds non-stop support for linux native. > > The changes are: > > - ptracing a running thread doesn't work. > > This implies that, we must ensure that the proc_services > usage in linux-thread-db.c talks to a pid of a stopped lwp. > > Checking if a thread is alive with ptrace doesn't work > for running threads. Worse, ptrace errors out claiming > the thread doesn't exits. > > - We must not stop all threads, obviously. > > - We must mark threads as running if we're resuming > them behind the core's back. > > - Implement target_stop_ptid to interrupt only one thread I removed a bit of code that handled a SIGINT in sync_execution with non-stop on, as that was something I'm not sure we want. One issue I would like comments on, is that -exec-continue / target_stop_ptid relies on an event happening on the inferior. Should we not rely on that, and just suspend the thread with SIGSTOP ? Would this affect frontends ? -- Pedro Alves --Boundary-00=_krZMIfNZXPMzwMo Content-Type: text/x-diff; charset="utf-8"; name="010-non_stop_linux.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="010-non_stop_linux.diff" Content-length: 13827 2008-05-19 Pedro Alves * linux-fork.c (linux_fork_killall): Use SIGKILL instead of PTRACE_KILL. * linux-nat.c (find_lwp_pid): Make public. (sigint_clear_callback): New. (linux_nat_resume): In non-stop mode, only touch the passed in ptid. Clear the sigint flag. (linux_handle_extended_wait): On a clone event, add new lwp to GDB's thread table, and mark as running, executing and stopped appropriatelly. (linux_nat_wait): In non-stop mode, don't stop all lwps. (kill_callback): If lwp is not stopped, use SIGKILL. (linux_nat_thread_alive): Use signal 0 to detect if thread is alive. (send_sigint_callback): New. (linux_nat_stop): New. (linux_nat_stop_ptid): New. (linux_nat_add_target): Set to_stop and to_stop_ptid. * linux-nat.h (struct lwp_info): Add sigint field. (find_lwp_pid): Declare. * linux-thread-db.c (thread_from_lwp, enable_thread_event) (check_event): Set proc_handle.pid to the stopped lwp. (thread_db_find_new_threads): If current lwp is executing, don't try to read from it. --- gdb/linux-fork.c | 4 - gdb/linux-nat.c | 162 +++++++++++++++++++++++++++++++++++++++----------- gdb/linux-nat.h | 6 + gdb/linux-thread-db.c | 15 ++++ 4 files changed, 152 insertions(+), 35 deletions(-) Index: src/gdb/linux-fork.c =================================================================== --- src.orig/gdb/linux-fork.c 2008-05-19 15:00:15.000000000 +0100 +++ src/gdb/linux-fork.c 2008-05-19 15:28:43.000000000 +0100 @@ -337,7 +337,9 @@ linux_fork_killall (void) { pid = PIDGET (fp->ptid); do { - ptrace (PT_KILL, pid, 0, 0); + /* Use SIGKILL instead of PTRACE_KILL because the former works even + if the thread is running, while the later doesn't. */ + kill (pid, SIGKILL); ret = waitpid (pid, &status, 0); /* We might get a SIGCHLD instead of an exit status. This is aggravated by the first kill above - a child has just Index: src/gdb/linux-nat.c =================================================================== --- src.orig/gdb/linux-nat.c 2008-05-19 15:14:07.000000000 +0100 +++ src/gdb/linux-nat.c 2008-05-19 15:28:43.000000000 +0100 @@ -212,6 +212,8 @@ static void linux_nat_async (void (*call static int linux_nat_async_mask (int mask); static int kill_lwp (int lwpid, int signo); +static int send_sigint_callback (struct lwp_info *lp, void *data); + /* Captures the result of a successful waitpid call, along with the options used in that call. */ struct waitpid_result @@ -920,7 +922,7 @@ delete_lwp (ptid_t ptid) /* Return a pointer to the structure describing the LWP corresponding to PID. If no corresponding LWP could be found, return NULL. */ -static struct lwp_info * +struct lwp_info * find_lwp_pid (ptid_t ptid) { struct lwp_info *lp; @@ -1445,6 +1447,13 @@ resume_set_callback (struct lwp_info *lp return 0; } +static int +sigint_clear_callback (struct lwp_info *lp, void *data) +{ + lp->sigint = 0; + return 0; +} + static void linux_nat_resume (ptid_t ptid, int step, enum target_signal signo) { @@ -1468,10 +1477,17 @@ linux_nat_resume (ptid_t ptid, int step, /* A specific PTID means `step only this process id'. */ resume_all = (PIDGET (ptid) == -1); - if (resume_all) - iterate_over_lwps (resume_set_callback, NULL); - else - iterate_over_lwps (resume_clear_callback, NULL); + if (non_stop && resume_all) + internal_error (__FILE__, __LINE__, + "can't resume all in non-stop mode"); + + if (!non_stop) + { + if (resume_all) + iterate_over_lwps (resume_set_callback, NULL); + else + iterate_over_lwps (resume_clear_callback, NULL); + } /* If PID is -1, it's the current inferior that should be handled specially. */ @@ -1481,6 +1497,7 @@ linux_nat_resume (ptid_t ptid, int step, lp = find_lwp_pid (ptid); gdb_assert (lp != NULL); + /* Convert to something the lower layer understands. */ ptid = pid_to_ptid (GET_LWP (lp->ptid)); /* Remember if we're stepping. */ @@ -1489,6 +1506,9 @@ linux_nat_resume (ptid_t ptid, int step, /* Mark this LWP as resumed. */ lp->resumed = 1; + /* Remove the SIGINT mark. Used in non-stop mode. */ + lp->sigint = 0; + /* If we have a pending wait status for this thread, there is no point in resuming the process. But first make sure that linux_nat_wait won't preemptively handle the event - we @@ -1631,6 +1651,8 @@ linux_handle_extended_wait (struct lwp_i ourstatus->kind = TARGET_WAITKIND_VFORKED; else { + struct cleanup *old_chain; + ourstatus->kind = TARGET_WAITKIND_IGNORE; new_lp = add_lwp (BUILD_LWP (new_pid, GET_PID (inferior_ptid))); new_lp->cloned = 1; @@ -1650,20 +1672,42 @@ linux_handle_extended_wait (struct lwp_i else status = 0; + /* Make thread_db aware of this thread. We do this this + early, because we need to mark the new thread as running. + thread_db needs a stopped inferior_ptid -- since we know + LP is stopped, so use it this time. */ + old_chain = save_inferior_ptid (); + inferior_ptid = lp->ptid; + lp->stopped = 1; + target_find_new_threads (); + do_cleanups (old_chain); + if (!in_thread_list (new_lp->ptid)) + { + /* We're not using thread_db. Attach and add it to + GDB's list. */ + lin_lwp_attach_lwp (new_lp->ptid); + target_post_attach (GET_LWP (new_lp->ptid)); + add_thread (new_lp->ptid); + } + if (stopping) new_lp->stopped = 1; else { + new_lp->stopped = 0; new_lp->resumed = 1; ptrace (PTRACE_CONT, PIDGET (lp->waitstatus.value.related_pid), 0, status ? WSTOPSIG (status) : 0); + set_running (new_lp->ptid, 1); + set_executing (new_lp->ptid, 1); } if (debug_linux_nat) fprintf_unfiltered (gdb_stdlog, "LHEW: Got clone event from LWP %ld, resuming\n", GET_LWP (lp->ptid)); + lp->stopped = 0; ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0); return 1; @@ -2387,7 +2431,7 @@ linux_nat_filter_event (int lwpid, int s /* Make sure there is at least one thread running. */ gdb_assert (iterate_over_lwps (running_callback, NULL)); - /* Discard the event. */ + /* Discard the event. */ return NULL; } } @@ -2519,6 +2563,7 @@ linux_nat_wait (ptid_t ptid, struct targ lp->resumed = 1; /* Add the main thread to GDB's thread list. */ add_thread_silent (lp->ptid); + set_running (lp->ptid, 1); } sigemptyset (&flush_mask); @@ -2747,19 +2792,23 @@ retry: fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n", status_to_str (status), target_pid_to_str (lp->ptid)); - /* Now stop all other LWP's ... */ - iterate_over_lwps (stop_callback, NULL); + if (!non_stop) + { + /* Now stop all other LWP's ... */ + iterate_over_lwps (stop_callback, NULL); - /* ... and wait until all of them have reported back that they're no - longer running. */ - iterate_over_lwps (stop_wait_callback, &flush_mask); - iterate_over_lwps (flush_callback, &flush_mask); - - /* If we're not waiting for a specific LWP, choose an event LWP from - among those that have had events. Giving equal priority to all - LWPs that have had events helps prevent starvation. */ - if (pid == -1) - select_event_lwp (&lp, &status); + /* ... and wait until all of them have reported back that + they're no longer running. */ + iterate_over_lwps (stop_wait_callback, &flush_mask); + iterate_over_lwps (flush_callback, &flush_mask); + + /* If we're not waiting for a specific LWP, choose an event LWP + from among those that have had events. Giving equal priority + to all LWPs that have had events helps prevent + starvation. */ + if (pid == -1) + select_event_lwp (&lp, &status); + } /* Now that we've selected our final event LWP, cancel any breakpoints in other LWPs that have hit a GDB breakpoint. See @@ -2796,13 +2845,26 @@ static int kill_callback (struct lwp_info *lp, void *data) { errno = 0; - ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0); - if (debug_linux_nat) - fprintf_unfiltered (gdb_stdlog, - "KC: PTRACE_KILL %s, 0, 0 (%s)\n", - target_pid_to_str (lp->ptid), - errno ? safe_strerror (errno) : "OK"); + /* PTRACE_KILL doesn't work when the thread is running. */ + if (!lp->stopped) + { + kill_lwp (GET_LWP (lp->ptid), SIGKILL); + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, + "KC: kill_lwp (SIGKILL) %s (%s)\n", + target_pid_to_str (lp->ptid), + errno ? safe_strerror (errno) : "OK"); + } + else + { + ptrace (PTRACE_KILL, GET_LWP (lp->ptid), 0, 0); + if (debug_linux_nat) + fprintf_unfiltered (gdb_stdlog, + "KC: PTRACE_KILL %s, 0, 0 (%s)\n", + target_pid_to_str (lp->ptid), + errno ? safe_strerror (errno) : "OK"); + } return 0; } @@ -2943,22 +3005,22 @@ linux_nat_xfer_partial (struct target_op static int linux_nat_thread_alive (ptid_t ptid) { + int err; + gdb_assert (is_lwp (ptid)); - errno = 0; - ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0); + /* Send signal 0 instead of anything ptrace, because ptracing a + running thread errors out claiming that the thread doesn't + exist. */ + err = kill_lwp (GET_LWP (ptid), 0); + if (debug_linux_nat) fprintf_unfiltered (gdb_stdlog, - "LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n", + "LLTA: KILL(SIG0) %s (%s)\n", target_pid_to_str (ptid), - errno ? safe_strerror (errno) : "OK"); + err ? safe_strerror (err) : "OK"); - /* Not every Linux kernel implements PTRACE_PEEKUSER. But we can - handle that case gracefully since ptrace will first do a lookup - for the process based upon the passed-in pid. If that fails we - will get either -ESRCH or -EPERM, otherwise the child exists and - is alive. */ - if (errno == ESRCH || errno == EPERM) + if (err != 0) return 0; return 1; @@ -4118,6 +4180,35 @@ linux_nat_set_async_mode (int on) linux_nat_async_enabled = on; } +static int +send_sigint_callback (struct lwp_info *lp, void *data) +{ + if (!lp->stopped && !lp->sigint) + { + kill_lwp (GET_LWP (lp->ptid), SIGINT); + lp->sigint = 1; + } + return 0; +} + +static void +linux_nat_stop (void) +{ + if (non_stop) + iterate_over_lwps (send_sigint_callback, NULL); + else + linux_ops->to_stop (); +} + +static void +linux_nat_stop_ptid (ptid_t ptid) +{ + if (ptid_equal (ptid, minus_one_ptid)) + iterate_over_lwps (send_sigint_callback, NULL); + else + kill_lwp (GET_LWP (ptid), SIGINT); +} + void linux_nat_add_target (struct target_ops *t) { @@ -4148,6 +4239,9 @@ linux_nat_add_target (struct target_ops t->to_terminal_inferior = linux_nat_terminal_inferior; t->to_terminal_ours = linux_nat_terminal_ours; + t->to_stop = linux_nat_stop; + t->to_stop_ptid = linux_nat_stop_ptid; + /* We don't change the stratum; this target will sit at process_stratum and thread_db will set at thread_stratum. This is a little strange, since this is a multi-threaded-capable Index: src/gdb/linux-nat.h =================================================================== --- src.orig/gdb/linux-nat.h 2008-05-19 15:00:15.000000000 +0100 +++ src/gdb/linux-nat.h 2008-05-19 15:28:43.000000000 +0100 @@ -37,6 +37,10 @@ struct lwp_info SIGCHLD. */ int cloned; + /* Non-zero if we sent this LWP a SIGINT (but the LWP didn't report + it back yet). */ + int sigint; + /* Non-zero if we sent this LWP a SIGSTOP (but the LWP didn't report it back yet). */ int signalled; @@ -88,6 +92,8 @@ extern struct lwp_info *lwp_list; #define is_lwp(ptid) (GET_LWP (ptid) != 0) #define BUILD_LWP(lwp, pid) ptid_build (pid, lwp, 0) +struct lwp_info *find_lwp_pid (ptid_t ptid); + /* Attempt to initialize libthread_db. */ void check_for_thread_db (void); Index: src/gdb/linux-thread-db.c =================================================================== --- src.orig/gdb/linux-thread-db.c 2008-05-19 15:14:07.000000000 +0100 +++ src/gdb/linux-thread-db.c 2008-05-19 15:28:43.000000000 +0100 @@ -308,6 +308,8 @@ thread_from_lwp (ptid_t ptid) LWP. */ gdb_assert (GET_LWP (ptid) != 0); + /* Access an lwp we know is stopped. */ + proc_handle.pid = GET_LWP (ptid); err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th); if (err != TD_OK) error (_("Cannot find user-level thread for LWP %ld: %s"), @@ -418,6 +420,9 @@ enable_thread_event (td_thragent_t *thre td_notify_t notify; td_err_e err; + /* Access an lwp we know is stopped. */ + proc_handle.pid = GET_LWP (inferior_ptid); + /* Get the breakpoint address for thread EVENT. */ err = td_ta_event_addr_p (thread_agent, event, ¬ify); if (err != TD_OK) @@ -761,6 +766,9 @@ check_event (ptid_t ptid) if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr) return; + /* Access an lwp we know is stopped. */ + proc_handle.pid = GET_LWP (ptid); + /* If we are at a create breakpoint, we do not know what new lwp was created and cannot specifically locate the event message for it. We have to call td_ta_event_getmsg() to get @@ -955,7 +963,14 @@ static void thread_db_find_new_threads (void) { td_err_e err; + struct lwp_info *lp = find_lwp_pid (inferior_ptid); + + if (!lp || !lp->stopped) + /* In linux, we can only read memory through a stopped lwp. */ + return; + /* Access an lwp we know is stopped. */ + proc_handle.pid = GET_LWP (inferior_ptid); /* Iterate over all user-space threads to discover new threads. */ err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL, TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY, --Boundary-00=_krZMIfNZXPMzwMo--