From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 5076 invoked by alias); 2 Jun 2003 18:49:50 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 5021 invoked from network); 2 Jun 2003 18:49:49 -0000 Received: from unknown (HELO mx1.redhat.com) (66.187.233.31) by sources.redhat.com with SMTP; 2 Jun 2003 18:49:49 -0000 Received: from int-mx2.corp.redhat.com (nat-pool-rdu-dmz.redhat.com [172.16.52.200] (may be forged)) by mx1.redhat.com (8.11.6/8.11.6) with ESMTP id h52InmH12045 for ; Mon, 2 Jun 2003 14:49:48 -0400 Received: from potter.sfbay.redhat.com (potter.sfbay.redhat.com [172.16.27.15]) by int-mx2.corp.redhat.com (8.11.6/8.11.6) with ESMTP id h52InlT31080 for ; Mon, 2 Jun 2003 14:49:47 -0400 Received: from redhat.com (reddwarf.sfbay.redhat.com [172.16.24.50]) by potter.sfbay.redhat.com (8.11.6/8.11.6) with ESMTP id h52Ink020885; Mon, 2 Jun 2003 11:49:46 -0700 Message-ID: <3EDB9C4A.C91BC813@redhat.com> Date: Mon, 02 Jun 2003 18:49:00 -0000 From: Michael Snyder Organization: Red Hat, Inc. X-Accept-Language: en MIME-Version: 1.0 To: "J. Johnston" CC: gdb-patches@sources.redhat.com Subject: Re: RFC: nptl threading patch for linux References: <3EA84E74.3010101@redhat.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-SW-Source: 2003-06/txt/msg00081.txt.bz2 After reviewing the on-list discussion, it sounds like this patch should go in. Jeff, would you do the honors? Michael "J. Johnston" wrote: > > The following is the last part of my revised nptl patch that has > been broken up per Daniel J.'s suggestion. There are no generated > files included in the patch. > > A bit of background is needed. First of all, in the nptl model, > lwps and pids are distinct entities. If you perform a ps on a > multithreaded application, you will not see its child threads show > up. This differs from the linuxthreads model whereby an lwp was > actually a pid. In the nptl model, you cannot issue a kill to > a child thread via its lwp. If you want to do this, you need to use > the tkill syscall instead. The action of sending a signal to a specific > lwp is used commonly in lin-lwp.c. The first part of the nptl change is to > determine at configuration time if we have the tkill syscall and then at run-time, > if we can use it. A new function kill_lwp has been added which either calls the > tkill syscall or the regular kill function as appropriate. > > Another key difference in behavior between nptl and linuxthreads is what happens > when a thread exits. In the linuxthreads model, each thread that exits causes > an exit event to occur. The main thread can exit before all of its children. > In the nptl model, only the main thread generates the exit event and it only > does so after all child threads have exited. So, to determine when an lwp > has exited, we have to constantly check its status when we are able. When > we get an exit event we have to determine if we are exiting the program or > just one of the threads. Some additional logic has been added to the exit > checking code such that if we have (lwp == pid), then we stop all active > threads and check if they have already exited. If they have not exited, > we resume these threads, otherwise, we delete them. > > Daniel J. brought up a good point regarding my previous attempt at this logic > whereby I stopped all threads and resumed all threads. That old logic is wrong > when scheduler_locking is set on. The new logic addresses this because > it only stops/resumes threads which are not already stopped. Currently, if we lock > into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger > because we will field the exit event and end up with no running threads. Under nptl, we > never get notified of the thread exiting so for this scenario, we end up hung. > In this instance, there is nothing we can do without a kernel change. Daniel J. > is looking into a kernel change. For user threads, we could activate the death event > in thread-db.c and make a low-level call to notify the lwp layer, but this does not > handle lwps created directly by the end-user. This issue needs to be discussed > further outside of this patch because I want to propose changing the assertion to > allow the user to resume the other threads. I have tested the change with such a > scenario (locking a thread and running until exit). For linuxthreads we trigger the > assertion as before. For nptl, we hang as expected as we never get the exit event. > A bug can be opened for this if the patch is accepted. > > The last piece of logic has to do with a new interface needed by the nptl libthread_db > library to get the thread area. I have grouped this together with the lin-lwp changes > because the lin-lwp changes cannot work without this crucial change. > > This patch has been tested for both linuxthreads and with an nptl kernel. > > Ok to commit? > > -- Jeff J. > > 2003-04-24 Jeff Johnston > > * acconfig.h: Add HAVE_TKILL_SYSCALL definition check. > * config.in: Regenerated. > * configure.in: Add test for syscall function and check for > __NR_tkill macro in to set HAVE_TKILL_SYSCALL. > * configure: Regenerated. > * thread-db.c (check_event): For create/death event breakpoints, > loop through all messages to ensure that we read the message > corresponding to the breakpoint we are at. > * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include and > . > (kill_lwp): New function that uses tkill syscall or > uses kill, depending on whether threading model is nptl or not. > All callers of kill() changed to use kill_lwp(). > (lin_lwp_wait): Make special check when WIFEXITED occurs to > see if all threads have already exited in the nptl model. > (stop_and_resume_callback): New callback function used by the > lin_lwp_wait thread exit handling code. > (stop_wait_callback): Check for threads already having exited and > delete such threads fromt the lwp list when discovered. > (stop_callback): Don't assert retcode of kill call. > > Roland McGrath > * i386-linux-nat.c (ps_get_thread_area): New function needed by > nptl libthread_db. > > ------------------------------------------------------------------------------- > Index: lin-lwp.c > =================================================================== > RCS file: /cvs/src/src/gdb/lin-lwp.c,v > retrieving revision 1.43 > diff -u -r1.43 lin-lwp.c > --- lin-lwp.c 28 Mar 2003 21:42:41 -0000 1.43 > +++ lin-lwp.c 23 Apr 2003 22:52:44 -0000 > @@ -24,6 +24,10 @@ > #include "gdb_string.h" > #include > #include > +#ifdef HAVE_TKILL_SYSCALL > +#include > +#include > +#endif > #include > #include "gdb_wait.h" > > @@ -156,6 +160,7 @@ > > /* Prototypes for local functions. */ > static int stop_wait_callback (struct lwp_info *lp, void *data); > +static int lin_lwp_thread_alive (ptid_t ptid); > > /* Convert wait status STATUS to a string. Used for printing debug > messages only. */ > @@ -627,6 +632,32 @@ > } > > > +/* Issue kill to specified lwp. */ > + > +static int tkill_failed; > + > +static int > +kill_lwp (int lwpid, int signo) > +{ > + errno = 0; > + > +/* Use tkill, if possible, in case we are using nptl threads. If tkill > + fails, then we are not using nptl threads and we should be using kill. */ > + > +#ifdef HAVE_TKILL_SYSCALL > + if (!tkill_failed) > + { > + int ret = syscall (__NR_tkill, lwpid, signo); > + if (errno != ENOSYS) > + return ret; > + errno = 0; > + tkill_failed = 1; > + } > +#endif > + > + return kill (lwpid, signo); > +} > + > /* Send a SIGSTOP to LP. */ > > static int > @@ -642,8 +673,15 @@ > "SC: kill %s ****\n", > target_pid_to_str (lp->ptid)); > } > - ret = kill (GET_LWP (lp->ptid), SIGSTOP); > - gdb_assert (ret == 0); > + errno = 0; > + ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP); > + if (debug_lin_lwp) > + { > + fprintf_unfiltered (gdb_stdlog, > + "SC: lwp kill %d %s\n", > + ret, > + errno ? safe_strerror (errno) : "ERRNO-OK"); > + } > > lp->signalled = 1; > gdb_assert (lp->status == 0); > @@ -667,11 +705,23 @@ > > gdb_assert (lp->status == 0); > > - pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0); > + pid = waitpid (GET_LWP (lp->ptid), &status, 0); > if (pid == -1 && errno == ECHILD) > - /* OK, the proccess has disappeared. We'll catch the actual > - exit event in lin_lwp_wait. */ > - return 0; > + { > + pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE); > + if (pid == -1 && errno == ECHILD) > + { > + /* The thread has previously exited. We need to delete it now > + because in the case of nptl threads, there won't be an > + exit event unless it is the main thread. */ > + if (debug_lin_lwp) > + fprintf_unfiltered (gdb_stdlog, > + "SWC: %s exited.\n", > + target_pid_to_str (lp->ptid)); > + delete_lwp (lp->ptid); > + return 0; > + } > + } > > gdb_assert (pid == GET_LWP (lp->ptid)); > > @@ -683,6 +733,7 @@ > status_to_str (status)); > } > > + /* Check if the thread has exited. */ > if (WIFEXITED (status) || WIFSIGNALED (status)) > { > gdb_assert (num_lwps > 1); > @@ -697,7 +748,31 @@ > target_pid_to_str (lp->ptid)); > } > if (debug_lin_lwp) > - fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n", > + fprintf_unfiltered (gdb_stdlog, > + "SWC: %s exited.\n", > + target_pid_to_str (lp->ptid)); > + > + delete_lwp (lp->ptid); > + return 0; > + } > + > + /* Check if the current LWP has previously exited. For nptl threads, > + there is no exit signal issued for LWPs that are not the > + main thread so we should check whenever the thread is stopped. */ > + if (!lin_lwp_thread_alive (lp->ptid)) > + { > + if (in_thread_list (lp->ptid)) > + { > + /* Core GDB cannot deal with us deleting the current > + thread. */ > + if (!ptid_equal (lp->ptid, inferior_ptid)) > + delete_thread (lp->ptid); > + printf_unfiltered ("[%s exited]\n", > + target_pid_to_str (lp->ptid)); > + } > + if (debug_lin_lwp) > + fprintf_unfiltered (gdb_stdlog, > + "SWC: %s already exited.\n", > target_pid_to_str (lp->ptid)); > > delete_lwp (lp->ptid); > @@ -756,7 +831,14 @@ > /* If there's another event, throw it back into the queue. */ > if (lp->status) > { > - kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status)); > + if (debug_lin_lwp) > + { > + fprintf_unfiltered (gdb_stdlog, > + "SWC: kill %s, %s\n", > + target_pid_to_str (lp->ptid), > + status_to_str ((int) status)); > + } > + kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status)); > } > /* Save the sigtrap event. */ > lp->status = status; > @@ -800,7 +882,7 @@ > target_pid_to_str (lp->ptid), > status_to_str ((int) status)); > } > - kill (GET_LWP (lp->ptid), WSTOPSIG (status)); > + kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status)); > } > return 0; > } > @@ -1049,6 +1131,25 @@ > > #endif > > +/* Stop an active thread, verify it still exists, then resume it. */ > + > +static int > +stop_and_resume_callback (struct lwp_info *lp, void *data) > +{ > + struct lwp_info *ptr; > + > + if (!lp->stopped && !lp->signalled) > + { > + stop_callback (lp, NULL); > + stop_wait_callback (lp, NULL); > + /* Resume if the lwp still exists. */ > + for (ptr = lwp_list; ptr; ptr = ptr->next) > + if (lp == ptr) > + resume_callback (lp, NULL); > + } > + return 0; > +} > + > static ptid_t > lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus) > { > @@ -1206,10 +1307,61 @@ > } > } > > - /* Make sure we don't report a TARGET_WAITKIND_EXITED or > - TARGET_WAITKIND_SIGNALLED event if there are still LWP's > - left in the process. */ > + /* Check if the thread has exited. */ > if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1) > + { > + if (in_thread_list (lp->ptid)) > + { > + /* Core GDB cannot deal with us deleting the current > + thread. */ > + if (!ptid_equal (lp->ptid, inferior_ptid)) > + delete_thread (lp->ptid); > + printf_unfiltered ("[%s exited]\n", > + target_pid_to_str (lp->ptid)); > + } > + > + /* If this is the main thread, we must stop all threads and > + verify if they are still alive. This is because in the nptl > + thread model, there is no signal issued for exiting LWPs > + other than the main thread. We only get the main thread > + exit signal once all child threads have already exited. > + If we stop all the threads and use the stop_wait_callback > + to check if they have exited we can determine whether this > + signal should be ignored or whether it means the end of the > + debugged application, regardless of which threading model > + is being used. */ > + if (GET_PID (lp->ptid) == GET_LWP (lp->ptid)) > + { > + lp->stopped = 1; > + iterate_over_lwps (stop_and_resume_callback, NULL); > + } > + > + if (debug_lin_lwp) > + fprintf_unfiltered (gdb_stdlog, > + "LLW: %s exited.\n", > + target_pid_to_str (lp->ptid)); > + > + delete_lwp (lp->ptid); > + > + /* If there is at least one more LWP, then the exit signal > + was not the end of the debugged application and should be > + ignored. */ > + if (num_lwps > 0) > + { > + /* Make sure there is at least one thread running. */ > + gdb_assert (iterate_over_lwps (running_callback, NULL)); > + > + /* Discard the event. */ > + status = 0; > + continue; > + } > + } > + > + /* Check if the current LWP has previously exited. In the nptl > + thread model, LWPs other than the main thread do not issue > + signals when they exit so we must check whenever the thread > + has stopped. A similar check is made in stop_wait_callback(). */ > + if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid)) > { > if (in_thread_list (lp->ptid)) > { > Index: acconfig.h > =================================================================== > RCS file: /cvs/src/src/gdb/acconfig.h,v > retrieving revision 1.24 > diff -u -r1.24 acconfig.h > --- acconfig.h 4 Jan 2003 00:34:42 -0000 1.24 > +++ acconfig.h 23 Apr 2003 22:52:44 -0000 > @@ -95,6 +95,9 @@ > /* Define if using Solaris thread debugging. */ > #undef HAVE_THREAD_DB_LIB > > +/* Define if you support the tkill syscall. */ > +#undef HAVE_TKILL_SYSCALL > + > /* Define on a GNU/Linux system to work around problems in sys/procfs.h. */ > #undef START_INFERIOR_TRAPS_EXPECTED > #undef sys_quotactl > Index: configure.in > =================================================================== > RCS file: /cvs/src/src/gdb/configure.in,v > retrieving revision 1.126 > diff -u -r1.126 configure.in > --- configure.in 26 Feb 2003 15:10:47 -0000 1.126 > +++ configure.in 23 Apr 2003 22:52:44 -0000 > @@ -384,6 +384,7 @@ > AC_CHECK_FUNCS(setpgid setpgrp) > AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask) > AC_CHECK_FUNCS(socketpair) > +AC_CHECK_FUNCS(syscall) > > dnl AC_FUNC_SETPGRP does not work when cross compiling > dnl Instead, assume we will have a prototype for setpgrp if cross compiling. > @@ -909,6 +910,24 @@ > if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then > AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1, > [Define if has the TD_NOTALLOC error code.]) > +fi > + > +dnl See if we have a sys/syscall header file that has __NR_tkill. > +if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then > + AC_CACHE_CHECK([whether has __NR_tkill], > + gdb_cv_sys_syscall_h_has_tkill, > + AC_TRY_COMPILE( > + [#include ], > + [int i = __NR_tkill;], > + gdb_cv_sys_syscall_h_has_tkill=yes, > + gdb_cv_sys_syscall_h_has_tkill=no > + ) > + ) > +fi > +dnl See if we can issue tkill syscall. > +if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then > + AC_DEFINE(HAVE_TKILL_SYSCALL, 1, > + [Define if we can use the tkill syscall.]) > fi > > dnl Handle optional features that can be enabled. > Index: i386-linux-nat.c > =================================================================== > RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v > retrieving revision 1.44 > diff -u -r1.44 i386-linux-nat.c > --- i386-linux-nat.c 16 Apr 2003 15:22:02 -0000 1.44 > +++ i386-linux-nat.c 23 Apr 2003 22:52:44 -0000 > @@ -70,6 +70,9 @@ > /* Defines I386_LINUX_ORIG_EAX_REGNUM. */ > #include "i386-linux-tdep.h" > > +/* Defines ps_err_e, struct ps_prochandle. */ > +#include "gdb_proc_service.h" > + > /* Prototypes for local functions. */ > static void dummy_sse_values (void); > > @@ -682,6 +685,21 @@ > offsetof (struct user, u_debugreg[regnum]), value); > if (errno != 0) > perror_with_name ("Couldn't write debug register"); > +} > + > +extern ps_err_e > +ps_get_thread_area(const struct ps_prochandle *ph, > + lwpid_t lwpid, int idx, void **base) > +{ > + unsigned long int desc[3]; > +#define PTRACE_GET_THREAD_AREA 25 > + > + if (ptrace (PTRACE_GET_THREAD_AREA, > + lwpid, (void *) idx, (unsigned long) &desc) < 0) > + return PS_ERR; > + > + *(int *)base = desc[1]; > + return PS_OK; > } > > void