From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 13009 invoked by alias); 10 Jul 2008 21:51:52 -0000
Received: (qmail 12998 invoked by uid 22791); 10 Jul 2008 21:51:50 -0000
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (65.74.133.4)
    by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 10 Jul 2008 21:51:31 +0000
Received: (qmail 4436 invoked from network); 10 Jul 2008 21:51:28 -0000
Received: from unknown (HELO orlando.local) (pedro@127.0.0.2)
    by mail.codesourcery.com with ESMTPA; 10 Jul 2008 21:51:28 -0000
From: Pedro Alves
To: Daniel Jacobowitz
Subject: Re: [non-stop] 08/10 linux native support
Date: Thu, 10 Jul 2008 21:51:00 -0000
User-Agent: KMail/1.9.9
Cc: gdb-patches@sourceware.org
References: <200806152205.49241.pedro@codesourcery.com>
    <200807101901.23598.pedro@codesourcery.com>
    <20080710195855.GA24287@caradoc.them.org>
In-Reply-To: <20080710195855.GA24287@caradoc.them.org>
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_gRodIqMqOEmzkbs"
Message-Id: <200807102251.28066.pedro@codesourcery.com>
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2008-07/txt/msg00169.txt.bz2

--Boundary-00=_gRodIqMqOEmzkbs
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Content-length: 815

On Thursday 10 July 2008 20:58:55, Daniel Jacobowitz wrote:
> Right.  Could you try this version?

Thanks!

> Basically the same as your previous posting, except that I moved the
> logic assuring we find the first thread when we find the first child
> into the thread-db layer.

Then, this patch cleaned it up a bit further.

Basically, it gets rid of the find_lwp_pid call in
thread_db_find_new_threads.  Instead I'm using ALL_LWPS, which is
already exported.
This gets rid of the find_lwp_pid -> linux_nat_lwp_pid rename
throughout, and removes the need for thread_db_find_new_threads_1.

I then reimported a couple of comments and cleanups that were on the
last patch, since you had picked up the previous-to-last.

Otherwise, the logic is the same.

Regtested on x86_64-unknown-linux-gnu.

OK?

--
Pedro Alves

--Boundary-00=_gRodIqMqOEmzkbs
Content-Type: text/x-diff; charset="iso-8859-1"; name="008-non_stop_linux.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="008-non_stop_linux.diff"
Content-length: 19508

2008-07-10  Pedro Alves

        Non-stop linux native.
        * linux-nat.c (linux_test_for_tracefork): Block events while we're
        here.
        (find_lwp_pid): Rename to...
        (linux_nat_find_lwp_pid): ... this.  Make public.  Update all
        callers.
        (get_pending_status): Implement non-stop mode.
        (linux_nat_detach): Stop threads before detaching.
        (linux_nat_resume): In non-stop mode, always resume only a single
        PTID.
        (linux_handle_extended_wait): On a clone event, in non-stop mode,
        add new lwp to GDB's thread table, and mark as running, executing
        and stopped appropriately.
        (linux_nat_filter_event): Don't assume there are other running
        threads when a thread exits.
        (linux_nat_wait): Mark the main thread as running and executing.
        In non-stop mode, don't stop all lwps.
        (linux_nat_kill): Stop lwps before killing them.
        (linux_nat_thread_alive): Use signal 0 to detect if a thread is
        alive.
        (send_sigint_callback): New.
        (linux_nat_stop): New.
        (linux_nat_add_target): Set to_stop to linux_nat_stop.

        * linux-nat.h (thread_db_attach_lwp): Declare.

        * linux-thread-db.c (thread_get_info_callback): Check for new
        threads if we have none.
        (thread_from_lwp, enable_thread_event): Set proc_handle.pid to the
        stopped lwp.  Check for new threads if we have none.
        (thread_db_attach_lwp): New.
        (thread_db_init): Set proc_handle.pid to inferior_ptid.
        (check_event): Set proc_handle.pid to the stopped lwp.
        (thread_db_find_new_threads): Set proc_handle.pid to any stopped
        lwp available, bail out if there is none.

        * linux-fork.c (linux_fork_killall): Use SIGKILL instead of
        PTRACE_KILL.

---
 gdb/linux-fork.c      |    4
 gdb/linux-nat.c       |  246 ++++++++++++++++++++++++++++++++++++++++----------
 gdb/linux-nat.h       |    2
 gdb/linux-thread-db.c |   77 +++++++++++++++
 4 files changed, 282 insertions(+), 47 deletions(-)

Index: src/gdb/linux-nat.c
===================================================================
--- src.orig/gdb/linux-nat.c	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-nat.c	2008-07-10 22:19:40.000000000 +0100
@@ -285,6 +285,9 @@ static void linux_nat_async (void (*call
 static int linux_nat_async_mask (int mask);
 static int kill_lwp (int lwpid, int signo);

+static int send_sigint_callback (struct lwp_info *lp, void *data);
+static int stop_callback (struct lwp_info *lp, void *data);
+
 /* Captures the result of a successful waitpid call, along with the
    options used in that call.  */
 struct waitpid_result
@@ -487,6 +490,9 @@ linux_test_for_tracefork (int original_p
 {
   int child_pid, ret, status;
   long second_pid;
+  enum sigchld_state async_events_original_state;
+
+  async_events_original_state = linux_nat_async_events (sigchld_sync);

   linux_supports_tracefork_flag = 0;
   linux_supports_tracevforkdone_flag = 0;
@@ -517,6 +523,7 @@ linux_test_for_tracefork (int original_p
       if (ret != 0)
         {
           warning (_("linux_test_for_tracefork: failed to kill child"));
+          linux_nat_async_events (async_events_original_state);
           return;
         }

@@ -527,6 +534,7 @@ linux_test_for_tracefork (int original_p
         warning (_("linux_test_for_tracefork: unexpected wait status 0x%x from "
                  "killed child"), status);

+      linux_nat_async_events (async_events_original_state);
       return;
     }

@@ -566,6 +574,8 @@ linux_test_for_tracefork (int original_p
   if (ret != 0)
     warning (_("linux_test_for_tracefork: failed to kill child"));
   my_waitpid (child_pid, &status, 0);
+
+  linux_nat_async_events (async_events_original_state);
 }

 /* Return non-zero iff we have tracefork functionality available.
@@ -1376,16 +1386,80 @@ get_pending_status (struct lwp_info *lp,
      events are always cached in waitpid_queue.  */

   *status = 0;
-  if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+
+  if (non_stop)
     {
-      if (stop_signal != TARGET_SIGNAL_0
-          && signal_pass_state (stop_signal))
-        *status = W_STOPCODE (target_signal_to_host (stop_signal));
+      enum target_signal signo = TARGET_SIGNAL_0;
+
+      if (is_executing (lp->ptid))
+        {
+          /* If the core thought this lwp was executing --- e.g., the
+             executing property hasn't been updated yet, but the
+             thread has been stopped with a stop_callback /
+             stop_wait_callback sequence (see linux_nat_detach for
+             example) --- we can only have pending events in the local
+             queue.  */
+          if (queued_waitpid (GET_LWP (lp->ptid), status, __WALL) != -1)
+            {
+              if (WIFSTOPPED (*status))
+                signo = target_signal_from_host (WSTOPSIG (*status));
+
+              /* If not stopped, then the lwp is gone, no use in
+                 resending a signal.  */
+            }
+        }
+      else
+        {
+          /* If the core knows the thread is not executing, then we
+             have the last signal recorded in
+             thread_info->stop_signal, unless this is inferior_ptid,
+             in which case, it's in the global stop_signal, due to
+             context switching.  */
+
+          if (ptid_equal (lp->ptid, inferior_ptid))
+            signo = stop_signal;
+          else
+            {
+              struct thread_info *tp = find_thread_pid (lp->ptid);
+              gdb_assert (tp);
+              signo = tp->stop_signal;
+            }
+        }
+
+      if (signo != TARGET_SIGNAL_0
+          && !signal_pass_state (signo))
+        {
+          if (debug_linux_nat)
+            fprintf_unfiltered (gdb_stdlog, "\
+GPT: lwp %s had signal %s, but it is in no pass state\n",
+                                target_pid_to_str (lp->ptid),
+                                target_signal_to_string (signo));
+        }
+      else
+        {
+          if (signo != TARGET_SIGNAL_0)
+            *status = W_STOPCODE (target_signal_to_host (signo));
+
+          if (debug_linux_nat)
+            fprintf_unfiltered (gdb_stdlog,
+                                "GPT: lwp %s has pending signal %s\n",
+                                target_pid_to_str (lp->ptid),
+                                target_signal_to_string (signo));
+        }
     }
-  else if (target_can_async_p ())
-    queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
   else
-    *status = lp->status;
+    {
+      if (GET_LWP (lp->ptid) == GET_LWP (last_ptid))
+        {
+          if (stop_signal != TARGET_SIGNAL_0
+              && signal_pass_state (stop_signal))
+            *status = W_STOPCODE (target_signal_to_host (stop_signal));
+        }
+      else if (target_can_async_p ())
+        queued_waitpid (GET_LWP (lp->ptid), status, __WALL);
+      else
+        *status = lp->status;
+    }

   return 0;
 }
@@ -1449,6 +1523,13 @@ linux_nat_detach (char *args, int from_t
   if (target_can_async_p ())
     linux_nat_async (NULL, 0);

+  /* Stop all threads before detaching.  ptrace requires that the
+     thread is stopped to successfully detach.  */
+  iterate_over_lwps (stop_callback, NULL);
+  /* ... and wait until all of them have reported back that
+     they're no longer running.  */
+  iterate_over_lwps (stop_wait_callback, NULL);
+
   iterate_over_lwps (detach_callback, NULL);

   /* Only the initial process should be left right now.  */
@@ -1538,10 +1619,17 @@ linux_nat_resume (ptid_t ptid, int step,
   /* A specific PTID means `step only this process id'.  */
   resume_all = (PIDGET (ptid) == -1);

-  if (resume_all)
-    iterate_over_lwps (resume_set_callback, NULL);
-  else
-    iterate_over_lwps (resume_clear_callback, NULL);
+  if (non_stop && resume_all)
+    internal_error (__FILE__, __LINE__,
+                    "can't resume all in non-stop mode");
+
+  if (!non_stop)
+    {
+      if (resume_all)
+        iterate_over_lwps (resume_set_callback, NULL);
+      else
+        iterate_over_lwps (resume_clear_callback, NULL);
+    }

   /* If PID is -1, it's the current inferior that should be
      handled specially.  */
@@ -1551,6 +1639,7 @@ linux_nat_resume (ptid_t ptid, int step,
   lp = find_lwp_pid (ptid);
   gdb_assert (lp != NULL);

+  /* Convert to something the lower layer understands.  */
   ptid = pid_to_ptid (GET_LWP (lp->ptid));

   /* Remember if we're stepping.  */
@@ -1701,9 +1790,12 @@ linux_handle_extended_wait (struct lwp_i
         ourstatus->kind = TARGET_WAITKIND_VFORKED;
       else
         {
+          struct cleanup *old_chain;
+
           ourstatus->kind = TARGET_WAITKIND_IGNORE;
           new_lp = add_lwp (BUILD_LWP (new_pid, GET_PID (inferior_ptid)));
           new_lp->cloned = 1;
+          new_lp->stopped = 1;

           if (WSTOPSIG (status) != SIGSTOP)
             {
@@ -1720,13 +1812,38 @@ linux_handle_extended_wait (struct lwp_i
           else
             status = 0;

-          if (stopping)
-            new_lp->stopped = 1;
-          else
+          if (non_stop)
+            {
+              /* Add the new thread to GDB's lists as soon as possible
+                 so that:
+
+                 1) the frontend doesn't have to wait for a stop to
+                 display them, and,
+
+                 2) we tag it with the correct running state.  */
+
+              /* If the thread_db layer is active, let it know about
+                 this new thread, and add it to GDB's list.  */
+              if (!thread_db_attach_lwp (new_lp->ptid))
+                {
+                  /* We're not using thread_db.  Add it to GDB's
+                     list.  */
+                  target_post_attach (GET_LWP (new_lp->ptid));
+                  add_thread (new_lp->ptid);
+                }
+
+              if (!stopping)
+                {
+                  set_running (new_lp->ptid, 1);
+                  set_executing (new_lp->ptid, 1);
+                }
+            }
+
+          if (!stopping)
             {
+              new_lp->stopped = 0;
               new_lp->resumed = 1;
-              ptrace (PTRACE_CONT,
-                      PIDGET (lp->waitstatus.value.related_pid), 0,
+              ptrace (PTRACE_CONT, new_pid, 0,
                       status ? WSTOPSIG (status) : 0);
             }

@@ -2463,13 +2580,7 @@ linux_nat_filter_event (int lwpid, int s
          not the end of the debugged application and should be
          ignored.  */
       if (num_lwps > 0)
-        {
-          /* Make sure there is at least one thread running.  */
-          gdb_assert (iterate_over_lwps (running_callback, NULL));
-
-          /* Discard the event.  */
-          return NULL;
-        }
+        return NULL;
     }

   /* Check if the current LWP has previously exited.  In the nptl
@@ -2599,6 +2710,8 @@ linux_nat_wait (ptid_t ptid, struct targ
       lp->resumed = 1;
       /* Add the main thread to GDB's thread list.  */
       add_thread_silent (lp->ptid);
+      set_running (lp->ptid, 1);
+      set_executing (lp->ptid, 1);
     }

   sigemptyset (&flush_mask);
@@ -2826,19 +2939,23 @@ retry:
     fprintf_unfiltered (gdb_stdlog, "LLW: Candidate event %s in %s.\n",
                         status_to_str (status), target_pid_to_str (lp->ptid));

-  /* Now stop all other LWP's ...  */
-  iterate_over_lwps (stop_callback, NULL);
+  if (!non_stop)
+    {
+      /* Now stop all other LWP's ...  */
+      iterate_over_lwps (stop_callback, NULL);

-  /* ... and wait until all of them have reported back that they're no
-     longer running.  */
-  iterate_over_lwps (stop_wait_callback, &flush_mask);
-  iterate_over_lwps (flush_callback, &flush_mask);
-
-  /* If we're not waiting for a specific LWP, choose an event LWP from
-     among those that have had events.  Giving equal priority to all
-     LWPs that have had events helps prevent starvation.  */
-  if (pid == -1)
-    select_event_lwp (&lp, &status);
+      /* ... and wait until all of them have reported back that
+         they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, &flush_mask);
+      iterate_over_lwps (flush_callback, &flush_mask);
+
+      /* If we're not waiting for a specific LWP, choose an event LWP
+         from among those that have had events.  Giving equal priority
+         to all LWPs that have had events helps prevent
+         starvation.  */
+      if (pid == -1)
+        select_event_lwp (&lp, &status);
+    }

   /* Now that we've selected our final event LWP, cancel any
      breakpoints in other LWPs that have hit a GDB breakpoint.  See
@@ -2970,6 +3087,13 @@ linux_nat_kill (void)
     }
   else
     {
+      /* Stop all threads before killing them, since ptrace requires
+         that the thread is stopped to successfully PTRACE_KILL.  */
+      iterate_over_lwps (stop_callback, NULL);
+      /* ... and wait until all of them have reported back that
+         they're no longer running.  */
+      iterate_over_lwps (stop_wait_callback, NULL);
+
       /* Kill all LWP's ...  */
       iterate_over_lwps (kill_callback, NULL);

@@ -3022,22 +3146,22 @@ linux_nat_xfer_partial (struct target_op
 static int
 linux_nat_thread_alive (ptid_t ptid)
 {
+  int err;
+
   gdb_assert (is_lwp (ptid));

-  errno = 0;
-  ptrace (PTRACE_PEEKUSER, GET_LWP (ptid), 0, 0);
+  /* Send signal 0 instead of anything ptrace, because ptracing a
+     running thread errors out claiming that the thread doesn't
+     exist.  */
+  err = kill_lwp (GET_LWP (ptid), 0);
+
   if (debug_linux_nat)
     fprintf_unfiltered (gdb_stdlog,
-                        "LLTA: PTRACE_PEEKUSER %s, 0, 0 (%s)\n",
+                        "LLTA: KILL(SIG0) %s (%s)\n",
                         target_pid_to_str (ptid),
-                        errno ? safe_strerror (errno) : "OK");
+                        err ? safe_strerror (err) : "OK");

-  /* Not every Linux kernel implements PTRACE_PEEKUSER.  But we can
-     handle that case gracefully since ptrace will first do a lookup
-     for the process based upon the passed-in pid.  If that fails we
-     will get either -ESRCH or -EPERM, otherwise the child exists and
-     is alive.  */
-  if (errno == ESRCH || errno == EPERM)
+  if (err != 0)
     return 0;

   return 1;
@@ -4239,6 +4363,35 @@ linux_nat_set_async_mode (int on)
   linux_nat_async_enabled = on;
 }

+static int
+send_sigint_callback (struct lwp_info *lp, void *data)
+{
+  /* Use is_running instead of !lp->stopped, because the lwp may be
+     stopped due to an internal event, and we want to interrupt it in
+     that case too.  What we want is to check if the thread is stopped
+     from the point of view of the user.  */
+  if (is_running (lp->ptid))
+    kill_lwp (GET_LWP (lp->ptid), SIGINT);
+  return 0;
+}
+
+static void
+linux_nat_stop (ptid_t ptid)
+{
+  if (non_stop)
+    {
+      if (ptid_equal (ptid, minus_one_ptid))
+        iterate_over_lwps (send_sigint_callback, &ptid);
+      else
+        {
+          struct lwp_info *lp = find_lwp_pid (ptid);
+          send_sigint_callback (lp, NULL);
+        }
+    }
+  else
+    linux_ops->to_stop (ptid);
+}
+
 void
 linux_nat_add_target (struct target_ops *t)
 {
@@ -4269,6 +4422,9 @@ linux_nat_add_target (struct target_ops
   t->to_terminal_inferior = linux_nat_terminal_inferior;
   t->to_terminal_ours = linux_nat_terminal_ours;

+  /* Methods for non-stop support.  */
+  t->to_stop = linux_nat_stop;
+
   /* We don't change the stratum; this target will sit at
      process_stratum and thread_db will set at thread_stratum.  This
      is a little strange, since this is a multi-threaded-capable
Index: src/gdb/linux-nat.h
===================================================================
--- src.orig/gdb/linux-nat.h	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-nat.h	2008-07-10 22:14:30.000000000 +0100
@@ -94,6 +94,8 @@ void check_for_thread_db (void);
 /* Tell the thread_db layer what native target operations to use.  */
 void thread_db_init (struct target_ops *);

+int thread_db_attach_lwp (ptid_t ptid);
+
 /* Find process PID's pending signal set from /proc/pid/status.  */
 void linux_proc_pending_signals (int pid, sigset_t *pending, sigset_t *blocked,
                                  sigset_t *ignored);
Index: src/gdb/linux-thread-db.c
===================================================================
--- src.orig/gdb/linux-thread-db.c	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-thread-db.c	2008-07-10 22:15:05.000000000 +0100
@@ -283,7 +283,10 @@ thread_get_info_callback (const td_thrha
   if (thread_info == NULL)
     {
       /* New thread.  Attach to it now (why wait?).  */
-      attach_thread (thread_ptid, thp, &ti);
+      if (!have_threads ())
+        thread_db_find_new_threads ();
+      else
+        attach_thread (thread_ptid, thp, &ti);
       thread_info = find_thread_pid (thread_ptid);
       gdb_assert (thread_info != NULL);
     }
@@ -308,6 +311,8 @@ thread_from_lwp (ptid_t ptid)
      LWP.  */
   gdb_assert (GET_LWP (ptid) != 0);

+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
   if (err != TD_OK)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
@@ -332,6 +337,48 @@ thread_from_lwp (ptid_t ptid)
 }


+/* Attach to lwp PTID, doing whatever else is required to have this
+   LWP under the debugger's control --- e.g., enabling event
+   reporting.  Returns true on success.  */
+int
+thread_db_attach_lwp (ptid_t ptid)
+{
+  td_thrhandle_t th;
+  td_thrinfo_t ti;
+  td_err_e err;
+
+  if (!using_thread_db)
+    return 0;
+
+  /* This ptid comes from linux-nat.c, which should always fill in the
+     LWP.  */
+  gdb_assert (GET_LWP (ptid) != 0);
+
+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
+  /* If we have only looked at the first thread before libpthread was
+     initialized, we may not know its thread ID yet.  Make sure we do
+     before we add another thread to the list.  */
+  if (!have_threads ())
+    thread_db_find_new_threads ();
+
+  err = td_ta_map_lwp2thr_p (thread_agent, GET_LWP (ptid), &th);
+  if (err != TD_OK)
+    /* Cannot find user-level thread.  */
+    return 0;
+
+  err = td_thr_get_info_p (&th, &ti);
+  if (err != TD_OK)
+    {
+      warning (_("Cannot get thread info: %s"), thread_db_err_str (err));
+      return 0;
+    }
+
+  attach_thread (ptid, &th, &ti);
+  return 1;
+}
+
 void
 thread_db_init (struct target_ops *target)
 {
@@ -418,6 +465,9 @@ enable_thread_event (td_thragent_t *thre
   td_notify_t notify;
   td_err_e err;

+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (inferior_ptid);
+
   /* Get the breakpoint address for thread EVENT.  */
   err = td_ta_event_addr_p (thread_agent, event, &notify);
   if (err != TD_OK)
@@ -761,6 +811,15 @@ check_event (ptid_t ptid)
   if (stop_pc != td_create_bp_addr && stop_pc != td_death_bp_addr)
     return;

+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
+
+  /* If we have only looked at the first thread before libpthread was
+     initialized, we may not know its thread ID yet.  Make sure we do
+     before we add another thread to the list.  */
+  if (!have_threads ())
+    thread_db_find_new_threads ();
+
   /* If we are at a create breakpoint, we do not know what new lwp
      was created and cannot specifically locate the event message for it.
      We have to call td_ta_event_getmsg() to get
@@ -951,11 +1010,27 @@ find_new_threads_callback (const td_thrh
   return 0;
 }

+/* Search for new threads, accessing memory through stopped thread
+   PTID.  */
+
 static void
 thread_db_find_new_threads (void)
 {
   td_err_e err;
+  struct lwp_info *lp;
+  ptid_t ptid;
+
+  /* In linux, we can only read memory through a stopped lwp.  */
+  ALL_LWPS (lp, ptid)
+    if (lp->stopped)
+      break;
+
+  if (!lp)
+    /* There is no stopped thread.  Bail out.  */
+    return;

+  /* Access an lwp we know is stopped.  */
+  proc_handle.pid = GET_LWP (ptid);
   /* Iterate over all user-space threads to discover new threads.  */
   err = td_ta_thr_iter_p (thread_agent, find_new_threads_callback, NULL,
                           TD_THR_ANY_STATE, TD_THR_LOWEST_PRIORITY,
Index: src/gdb/linux-fork.c
===================================================================
--- src.orig/gdb/linux-fork.c	2008-07-10 22:14:19.000000000 +0100
+++ src/gdb/linux-fork.c	2008-07-10 22:14:30.000000000 +0100
@@ -337,7 +337,9 @@ linux_fork_killall (void)
     {
       pid = PIDGET (fp->ptid);
       do {
-        ptrace (PT_KILL, pid, 0, 0);
+        /* Use SIGKILL instead of PTRACE_KILL because the former works even
+           if the thread is running, while the latter doesn't.  */
+        kill (pid, SIGKILL);
         ret = waitpid (pid, &status, 0);
         /* We might get a SIGCHLD instead of an exit status.  This is
            aggravated by the first kill above - a child has just

--Boundary-00=_gRodIqMqOEmzkbs--
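
For readers unfamiliar with the signal 0 idiom that the new
linux_nat_thread_alive relies on, here is a minimal standalone sketch.
It assumes plain kill(2) on a process id, whereas the patch goes through
GDB's kill_lwp to target a single lwp.  The helper name task_is_alive is
hypothetical and not part of the patch.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch.  Returns 1 if PID names
   an existing process that we are allowed to signal, 0 otherwise.
   Signal 0 makes kill perform its existence and permission checks
   without delivering anything, so it works on a running task.  */
static int
task_is_alive (pid_t pid)
{
  /* Non-positive pids address process groups, not a single task.  */
  assert (pid > 0);

  /* Like the patch, treat any failure (ESRCH, EPERM, ...) as "not
     alive".  */
  return kill (pid, 0) == 0;
}
```

For example, task_is_alive (getpid ()) returns 1, since a process can
always signal itself.  Note that treating EPERM as "not alive" mirrors
the patch; a caller that only cares about existence might want to
distinguish it from ESRCH.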