From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 7949 invoked by alias); 4 Jun 2003 20:52:35 -0000 Mailing-List: contact gdb-patches-help@sources.redhat.com; run by ezmlm Precedence: bulk List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sources.redhat.com Received: (qmail 7859 invoked from network); 4 Jun 2003 20:52:30 -0000 Received: from unknown (HELO touchme.toronto.redhat.com) (207.219.125.105) by sources.redhat.com with SMTP; 4 Jun 2003 20:52:30 -0000 Received: from redhat.com (toocool.toronto.redhat.com [172.16.14.72]) by touchme.toronto.redhat.com (Postfix) with ESMTP id 34594800021; Wed, 4 Jun 2003 16:52:30 -0400 (EDT) Message-ID: <3EDE5C0E.1050903@redhat.com> Date: Wed, 04 Jun 2003 20:52:00 -0000 From: "J. Johnston" Organization: Red Hat Inc. User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Michael Snyder Cc: gdb-patches@sources.redhat.com Subject: Re: RFC: nptl threading patch for linux References: <3EA84E74.3010101@redhat.com> <3EDB9C4A.C91BC813@redhat.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2003-06/txt/msg00180.txt.bz2 Thanks Michael. Patch checked in. I deleted the ChangeLog entry for thread-db.c which was already made into a separate patch. -- Jeff J. Michael Snyder wrote: > After reviewing the on-list discussion, it sounds like this patch > should go in. Jeff, would you do the honors? > > Michael > > "J. Johnston" wrote: > >>The following is the last part of my revised nptl patch that has >>been broken up per Daniel J.'s suggestion. There are no generated >>files included in the patch. >> >>A bit of background is needed. First of all, in the nptl model, >>lwps and pids are distinct entities. If you perform a ps on a >>multithreaded application, you will not see its child threads show >>up. This differs from the linuxthreads model whereby an lwp was >>actually a pid. In the nptl model, you cannot issue a kill to >>a child thread via its lwp. If you want to do this, you need to use >>the tkill syscall instead. The action of sending a signal to a specific >>lwp is used commonly in lin-lwp.c. The first part of the nptl change is to >>determine at configuration time if we have the tkill syscall and then at run-time, >>if we can use it. A new function kill_lwp has been added which either calls the >>tkill syscall or the regular kill function as appropriate. >> >>Another key difference in behavior between nptl and linuxthreads is what happens >>when a thread exits. In the linuxthreads model, each thread that exits causes >>an exit event to occur. The main thread can exit before all of its children. >>In the nptl model, only the main thread generates the exit event and it only >>does so after all child threads have exited. So, to determine when an lwp >>has exited, we have to constantly check its status when we are able. When >>we get an exit event we have to determine if we are exiting the program or >>just one of the threads. Some additional logic has been added to the exit >>checking code such that if we have (lwp == pid), then we stop all active >>threads and check if they have already exited. If they have not exited, >>we resume these threads, otherwise, we delete them. >> >>Daniel J. brought up a good point regarding my previous attempt at this logic >>whereby I stopped all threads and resumed all threads. That old logic is wrong >>when scheduler_locking is set on. The new logic addresses this because >>it only stops/resumes threads which are not already stopped. Currently, if we lock >>into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger >>because we will field the exit event and end up with no running threads. Under nptl, we >>never get notified of the thread exiting so for this scenario, we end up hung. >>In this instance, there is nothing we can do without a kernel change. Daniel J. >>is looking into a kernel change. For user threads, we could activate the death event >>in thread-db.c and make a low-level call to notify the lwp layer, but this does not >>handle lwps created directly by the end-user. This issue needs to be discussed >>further outside of this patch because I want to propose changing the assertion to >>allow the user to resume the other threads. I have tested the change with such a >>scenario (locking a thread and running until exit). For linuxthreads we trigger the >>assertion as before. For nptl, we hang as expected as we never get the exit event. >>A bug can be opened for this if the patch is accepted. >> >>The last piece of logic has to do with a new interface needed by the nptl libthread_db >>library to get the thread area. I have grouped this together with the lin-lwp changes >>because the lin-lwp changes cannot work without this crucial change. >> >>This patch has been tested for both linuxthreads and with an nptl kernel. >> >>Ok to commit? >> >>-- Jeff J. >> >>2003-04-24 Jeff Johnston >> >> * acconfig.h: Add HAVE_TKILL_SYSCALL definition check. >> * config.in: Regenerated. >> * configure.in: Add test for syscall function and check for >> __NR_tkill macro in to set HAVE_TKILL_SYSCALL. >> * configure: Regenerated. >> * thread-db.c (check_event): For create/death event breakpoints, >> loop through all messages to ensure that we read the message >> corresponding to the breakpoint we are at. >> * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include and >> . >> (kill_lwp): New function that uses tkill syscall or >> uses kill, depending on whether threading model is nptl or not. >> All callers of kill() changed to use kill_lwp(). >> (lin_lwp_wait): Make special check when WIFEXITED occurs to >> see if all threads have already exited in the nptl model. >> (stop_and_resume_callback): New callback function used by the >> lin_lwp_wait thread exit handling code. >> (stop_wait_callback): Check for threads already having exited and >> delete such threads fromt the lwp list when discovered. >> (stop_callback): Don't assert retcode of kill call. >> >> Roland McGrath >> * i386-linux-nat.c (ps_get_thread_area): New function needed by >> nptl libthread_db. >> >> ------------------------------------------------------------------------------- >>Index: lin-lwp.c >>=================================================================== >>RCS file: /cvs/src/src/gdb/lin-lwp.c,v >>retrieving revision 1.43 >>diff -u -r1.43 lin-lwp.c >>--- lin-lwp.c 28 Mar 2003 21:42:41 -0000 1.43 >>+++ lin-lwp.c 23 Apr 2003 22:52:44 -0000 >>@@ -24,6 +24,10 @@ >> #include "gdb_string.h" >> #include >> #include >>+#ifdef HAVE_TKILL_SYSCALL >>+#include >>+#include >>+#endif >> #include >> #include "gdb_wait.h" >> >>@@ -156,6 +160,7 @@ >> >> /* Prototypes for local functions. */ >> static int stop_wait_callback (struct lwp_info *lp, void *data); >>+static int lin_lwp_thread_alive (ptid_t ptid); >> >> /* Convert wait status STATUS to a string. Used for printing debug >> messages only. */ >>@@ -627,6 +632,32 @@ >> } >> >> >>+/* Issue kill to specified lwp. */ >>+ >>+static int tkill_failed; >>+ >>+static int >>+kill_lwp (int lwpid, int signo) >>+{ >>+ errno = 0; >>+ >>+/* Use tkill, if possible, in case we are using nptl threads. If tkill >>+ fails, then we are not using nptl threads and we should be using kill. */ >>+ >>+#ifdef HAVE_TKILL_SYSCALL >>+ if (!tkill_failed) >>+ { >>+ int ret = syscall (__NR_tkill, lwpid, signo); >>+ if (errno != ENOSYS) >>+ return ret; >>+ errno = 0; >>+ tkill_failed = 1; >>+ } >>+#endif >>+ >>+ return kill (lwpid, signo); >>+} >>+ >> /* Send a SIGSTOP to LP. */ >> >> static int >>@@ -642,8 +673,15 @@ >> "SC: kill %s ****\n", >> target_pid_to_str (lp->ptid)); >> } >>- ret = kill (GET_LWP (lp->ptid), SIGSTOP); >>- gdb_assert (ret == 0); >>+ errno = 0; >>+ ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP); >>+ if (debug_lin_lwp) >>+ { >>+ fprintf_unfiltered (gdb_stdlog, >>+ "SC: lwp kill %d %s\n", >>+ ret, >>+ errno ? safe_strerror (errno) : "ERRNO-OK"); >>+ } >> >> lp->signalled = 1; >> gdb_assert (lp->status == 0); >>@@ -667,11 +705,23 @@ >> >> gdb_assert (lp->status == 0); >> >>- pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0); >>+ pid = waitpid (GET_LWP (lp->ptid), &status, 0); >> if (pid == -1 && errno == ECHILD) >>- /* OK, the proccess has disappeared. We'll catch the actual >>- exit event in lin_lwp_wait. */ >>- return 0; >>+ { >>+ pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE); >>+ if (pid == -1 && errno == ECHILD) >>+ { >>+ /* The thread has previously exited. We need to delete it now >>+ because in the case of nptl threads, there won't be an >>+ exit event unless it is the main thread. */ >>+ if (debug_lin_lwp) >>+ fprintf_unfiltered (gdb_stdlog, >>+ "SWC: %s exited.\n", >>+ target_pid_to_str (lp->ptid)); >>+ delete_lwp (lp->ptid); >>+ return 0; >>+ } >>+ } >> >> gdb_assert (pid == GET_LWP (lp->ptid)); >> >>@@ -683,6 +733,7 @@ >> status_to_str (status)); >> } >> >>+ /* Check if the thread has exited. */ >> if (WIFEXITED (status) || WIFSIGNALED (status)) >> { >> gdb_assert (num_lwps > 1); >>@@ -697,7 +748,31 @@ >> target_pid_to_str (lp->ptid)); >> } >> if (debug_lin_lwp) >>- fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n", >>+ fprintf_unfiltered (gdb_stdlog, >>+ "SWC: %s exited.\n", >>+ target_pid_to_str (lp->ptid)); >>+ >>+ delete_lwp (lp->ptid); >>+ return 0; >>+ } >>+ >>+ /* Check if the current LWP has previously exited. For nptl threads, >>+ there is no exit signal issued for LWPs that are not the >>+ main thread so we should check whenever the thread is stopped. */ >>+ if (!lin_lwp_thread_alive (lp->ptid)) >>+ { >>+ if (in_thread_list (lp->ptid)) >>+ { >>+ /* Core GDB cannot deal with us deleting the current >>+ thread. */ >>+ if (!ptid_equal (lp->ptid, inferior_ptid)) >>+ delete_thread (lp->ptid); >>+ printf_unfiltered ("[%s exited]\n", >>+ target_pid_to_str (lp->ptid)); >>+ } >>+ if (debug_lin_lwp) >>+ fprintf_unfiltered (gdb_stdlog, >>+ "SWC: %s already exited.\n", >> target_pid_to_str (lp->ptid)); >> >> delete_lwp (lp->ptid); >>@@ -756,7 +831,14 @@ >> /* If there's another event, throw it back into the queue. */ >> if (lp->status) >> { >>- kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status)); >>+ if (debug_lin_lwp) >>+ { >>+ fprintf_unfiltered (gdb_stdlog, >>+ "SWC: kill %s, %s\n", >>+ target_pid_to_str (lp->ptid), >>+ status_to_str ((int) status)); >>+ } >>+ kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status)); >> } >> /* Save the sigtrap event. */ >> lp->status = status; >>@@ -800,7 +882,7 @@ >> target_pid_to_str (lp->ptid), >> status_to_str ((int) status)); >> } >>- kill (GET_LWP (lp->ptid), WSTOPSIG (status)); >>+ kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status)); >> } >> return 0; >> } >>@@ -1049,6 +1131,25 @@ >> >> #endif >> >>+/* Stop an active thread, verify it still exists, then resume it. */ >>+ >>+static int >>+stop_and_resume_callback (struct lwp_info *lp, void *data) >>+{ >>+ struct lwp_info *ptr; >>+ >>+ if (!lp->stopped && !lp->signalled) >>+ { >>+ stop_callback (lp, NULL); >>+ stop_wait_callback (lp, NULL); >>+ /* Resume if the lwp still exists. */ >>+ for (ptr = lwp_list; ptr; ptr = ptr->next) >>+ if (lp == ptr) >>+ resume_callback (lp, NULL); >>+ } >>+ return 0; >>+} >>+ >> static ptid_t >> lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus) >> { >>@@ -1206,10 +1307,61 @@ >> } >> } >> >>- /* Make sure we don't report a TARGET_WAITKIND_EXITED or >>- TARGET_WAITKIND_SIGNALLED event if there are still LWP's >>- left in the process. */ >>+ /* Check if the thread has exited. */ >> if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1) >>+ { >>+ if (in_thread_list (lp->ptid)) >>+ { >>+ /* Core GDB cannot deal with us deleting the current >>+ thread. */ >>+ if (!ptid_equal (lp->ptid, inferior_ptid)) >>+ delete_thread (lp->ptid); >>+ printf_unfiltered ("[%s exited]\n", >>+ target_pid_to_str (lp->ptid)); >>+ } >>+ >>+ /* If this is the main thread, we must stop all threads and >>+ verify if they are still alive. This is because in the nptl >>+ thread model, there is no signal issued for exiting LWPs >>+ other than the main thread. We only get the main thread >>+ exit signal once all child threads have already exited. >>+ If we stop all the threads and use the stop_wait_callback >>+ to check if they have exited we can determine whether this >>+ signal should be ignored or whether it means the end of the >>+ debugged application, regardless of which threading model >>+ is being used. */ >>+ if (GET_PID (lp->ptid) == GET_LWP (lp->ptid)) >>+ { >>+ lp->stopped = 1; >>+ iterate_over_lwps (stop_and_resume_callback, NULL); >>+ } >>+ >>+ if (debug_lin_lwp) >>+ fprintf_unfiltered (gdb_stdlog, >>+ "LLW: %s exited.\n", >>+ target_pid_to_str (lp->ptid)); >>+ >>+ delete_lwp (lp->ptid); >>+ >>+ /* If there is at least one more LWP, then the exit signal >>+ was not the end of the debugged application and should be >>+ ignored. */ >>+ if (num_lwps > 0) >>+ { >>+ /* Make sure there is at least one thread running. */ >>+ gdb_assert (iterate_over_lwps (running_callback, NULL)); >>+ >>+ /* Discard the event. */ >>+ status = 0; >>+ continue; >>+ } >>+ } >>+ >>+ /* Check if the current LWP has previously exited. In the nptl >>+ thread model, LWPs other than the main thread do not issue >>+ signals when they exit so we must check whenever the thread >>+ has stopped. A similar check is made in stop_wait_callback(). */ >>+ if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid)) >> { >> if (in_thread_list (lp->ptid)) >> { >>Index: acconfig.h >>=================================================================== >>RCS file: /cvs/src/src/gdb/acconfig.h,v >>retrieving revision 1.24 >>diff -u -r1.24 acconfig.h >>--- acconfig.h 4 Jan 2003 00:34:42 -0000 1.24 >>+++ acconfig.h 23 Apr 2003 22:52:44 -0000 >>@@ -95,6 +95,9 @@ >> /* Define if using Solaris thread debugging. */ >> #undef HAVE_THREAD_DB_LIB >> >>+/* Define if you support the tkill syscall. */ >>+#undef HAVE_TKILL_SYSCALL >>+ >> /* Define on a GNU/Linux system to work around problems in sys/procfs.h. */ >> #undef START_INFERIOR_TRAPS_EXPECTED >> #undef sys_quotactl >>Index: configure.in >>=================================================================== >>RCS file: /cvs/src/src/gdb/configure.in,v >>retrieving revision 1.126 >>diff -u -r1.126 configure.in >>--- configure.in 26 Feb 2003 15:10:47 -0000 1.126 >>+++ configure.in 23 Apr 2003 22:52:44 -0000 >>@@ -384,6 +384,7 @@ >> AC_CHECK_FUNCS(setpgid setpgrp) >> AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask) >> AC_CHECK_FUNCS(socketpair) >>+AC_CHECK_FUNCS(syscall) >> >> dnl AC_FUNC_SETPGRP does not work when cross compiling >> dnl Instead, assume we will have a prototype for setpgrp if cross compiling. >>@@ -909,6 +910,24 @@ >> if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then >> AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1, >> [Define if has the TD_NOTALLOC error code.]) >>+fi >>+ >>+dnl See if we have a sys/syscall header file that has __NR_tkill. >>+if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then >>+ AC_CACHE_CHECK([whether has __NR_tkill], >>+ gdb_cv_sys_syscall_h_has_tkill, >>+ AC_TRY_COMPILE( >>+ [#include ], >>+ [int i = __NR_tkill;], >>+ gdb_cv_sys_syscall_h_has_tkill=yes, >>+ gdb_cv_sys_syscall_h_has_tkill=no >>+ ) >>+ ) >>+fi >>+dnl See if we can issue tkill syscall. >>+if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then >>+ AC_DEFINE(HAVE_TKILL_SYSCALL, 1, >>+ [Define if we can use the tkill syscall.]) >> fi >> >> dnl Handle optional features that can be enabled. >>Index: i386-linux-nat.c >>=================================================================== >>RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v >>retrieving revision 1.44 >>diff -u -r1.44 i386-linux-nat.c >>--- i386-linux-nat.c 16 Apr 2003 15:22:02 -0000 1.44 >>+++ i386-linux-nat.c 23 Apr 2003 22:52:44 -0000 >>@@ -70,6 +70,9 @@ >> /* Defines I386_LINUX_ORIG_EAX_REGNUM. */ >> #include "i386-linux-tdep.h" >> >>+/* Defines ps_err_e, struct ps_prochandle. */ >>+#include "gdb_proc_service.h" >>+ >> /* Prototypes for local functions. */ >> static void dummy_sse_values (void); >> >>@@ -682,6 +685,21 @@ >> offsetof (struct user, u_debugreg[regnum]), value); >> if (errno != 0) >> perror_with_name ("Couldn't write debug register"); >>+} >>+ >>+extern ps_err_e >>+ps_get_thread_area(const struct ps_prochandle *ph, >>+ lwpid_t lwpid, int idx, void **base) >>+{ >>+ unsigned long int desc[3]; >>+#define PTRACE_GET_THREAD_AREA 25 >>+ >>+ if (ptrace (PTRACE_GET_THREAD_AREA, >>+ lwpid, (void *) idx, (unsigned long) &desc) < 0) >>+ return PS_ERR; >>+ >>+ *(int *)base = desc[1]; >>+ return PS_OK; >> } >> >> void > >