Subject: Re: [PATCHv2] gdb: Don't drop SIGSTOP during stop_all_threads
From: Pedro Alves
To: Andrew Burgess, gdb-patches@sourceware.org
Cc: Richard Bunt
Date: Fri, 15 Jun 2018 12:12:00 -0000
Message-ID: <2a06039a-26dc-bb85-af2d-a7a77a56a8d4@redhat.com>
In-Reply-To: <20180614194601.13322-1-andrew.burgess@embecosm.com>

On 06/14/2018 08:46 PM, Andrew Burgess wrote:
> This is an update based on Pedro's feedback:
> 
Thank you.

> So, in this version the preload library is a little more complex.  I
> now try to emulate the kernels sending of SIGCHLD after a short period
> of time in order to make it appear like the child thread status really
> has just arrived from the kernel later than it really did.  This works
> better than I had hoped, with this change in place not only does the
> test now pass on gdbserver, but with the preload library I can (in a
> very simple test) use GDB as normal.

Awesome.

I found a few more small nits, but this is OK with those fixed.

> +#
> +# GDB would then trigger a call to stop_all_threads which would
> +# continue to wait for all of the outstanding threads to stop, when
> +# the outstanding stop events finally arrived GDB would then
> +# (incorrectly) discard the stop event, resume the thread, and
> +# continue to wait for the thread to stop.... which it now never
> +# would.
> +#
> +# In order to try and expose this issue reliably, this test preloads a
> +# library that intercepts waitpid calls.  All waitpid calls targeting
> +# pid -1 with the WNOHANG flag are rate limited so that only 1 per
> +# second can complete.  Additional calls are forced to return 0
> +# indicating no event waiting.  This is enough to trigger the bug
> +# during the attach phase, however, it breaks the rest of GDB's
> +# behaviour, so we can't do more than attach with this library in
> +# place.

Does this last sentence ("breaks the rest of GDB") need updating?

> +   If the kernel is slow in either delivering the signal, or making the
> +   result available to the waitpid call then GDB will enter a sigsuspend
> +   call in order to wait for the inferior threads to change state, this is
> +   signalled to GDB with a SIGCHLD.
> +
> +   A bug in GDB meant that in some cases we would deadlock during this
> +   process.  This was rarely seen as the kernel is usually quick at
> +   delivering signals and making the results available to waitpid, so quick
> +   that GDB would gather the statuses from all inferior threads in the
> +   original pass.
> +
> +   The idea in this library is to rate limit calls to waitpid (where pid is
> +   -1 and the WNOHANG option is set) so that only 1 per second can return
> +   an answer.  Any additional calls will report that no threads are
> +   currently ready.  This should match the behaviour we see on a slow
> +   kernel.
> +
> +   However, given that usually when using this library, the kernel does
> +   have the waitpid result ready this means that the kernel will never send
> +   GDB a SIGCHLD.  This means that when GDB enters sigsuspend it will block
> +   forever.  Alternatively, if GDB enters it's polling loop the lack of

s/it's/its/

> +   SIGCHLD means that we will never see an event on the child threads.  To
> +   resolve these problems the library intercepts calls to sigsuspend and
> +   forces the call to exit if there is a pending waitpid result.  Also,
> +   when we know that there's a waitpid result that we've ignored, we create
> +   a new thread which, after a short delay, will send GDB a SIGCHLD.  */
> +
> +
> +/* Cache the result of a waitpid call that has not been reported back to
> +   GDB yet.  We only ever cache a single result.  Once we have a result
> +   cached then later calls to waitpid with the WNOHANG option will return a
> +   result of 0.  */
> +
> +struct
> +{
> +  /* Flag to indicate when we have a result cached.  */
> +  int cached_p;
> +
> +  /* The cached result fields from a waitpid call.  */
> +  pid_t pid;
> +  int wstatus;
> +} cached_wait_status;

"static" to avoid interposition issues.

> +
> +/* Lock to hold when modifying SIGNAL_THREAD_ACTIVE_P.  */
> +
> +pthread_mutex_t thread_creation_lock_obj = PTHREAD_MUTEX_INITIALIZER;

Ditto.
> +#define thread_creation_lock (&thread_creation_lock_obj)
> +
> +/* This flag is only modified while holding the THREAD_CREATION_LOCK mutex.
> +   When this flag is true then there is a signal thread alive that will be
> +   sending a SIGCHLD at some point in the future.  */
> +
> +int signal_thread_active_p;

Ditto.

> +
> +/* When we last allowed a waitpid to complete.  */
> +
> +static struct timeval last_waitpid_time = { 0, 0 };
> +
> +/* The number of seconds that must elapse between calls to waitpid where
> +   the pid is -1 and the WNOHANG option is set.  If calls occur faster than
> +   this then we force a result of 0 to be returned from waitpid.  */
> +
> +#define WAITPID_MIN_TIME (1)
> +
> +/* Return true (non-zero) if we should skip this call to waitpid, or false
> +   (zero) if this waitpid call should be handled with a call to the "real"
> +   waitpid library.  Allows 1 waitpid call per second.  */

s/waitpid library/waitpid function/ ?

> +
> +static int
> +should_skip_waitpid (void)
> +{
> +  struct timeval *tv = &last_waitpid_time;
> +  if (tv->tv_sec == 0)
> +    {
> +      if (gettimeofday (tv, NULL) < 0)
> +        error ("error: gettimeofday failed\n");
> +      return 0; /* Don't skip.  */
> +    }
> +  else
> +    {
> +      struct timeval new_tv;
> +
> +      if (gettimeofday (&new_tv, NULL) < 0)
> +        error ("error: gettimeofday failed\n");
> +
> +      if ((new_tv.tv_sec - tv->tv_sec) < WAITPID_MIN_TIME)
> +        return 1; /* Skip.  */
> +
> +      *tv = new_tv;
> +    }
> +
> +  /* Don't skip.  */
> +  return 0;
> +}
> +
> +/* Perform a real waitpid call.  */
> +
> +static pid_t
> +real_waitpid (pid_t pid, int *wstatus, int options)
> +{
> +  typedef pid_t (*fptr_t) (pid_t, int *, int);
> +  static fptr_t real_func = NULL;
> +
> +  if (real_func == NULL)
> +    {
> +      real_func = dlsym (RTLD_NEXT, "waitpid");
> +      if (real_func == NULL)
> +        error ("error: failed to find real waitpid\n");
> +    }
> +
> +  return (*real_func) (pid, wstatus, options);
> +}
> +
> +/* Thread worked created when we cached a waitpid result.  Delays for

s/worked/worker/
s/cached/cache/

> +   a short period of time and then sends SIGCHLD to the GDB process.  This
> +   should trigger GDB to call waitpid again, at which point we will make
> +   the cached waitpid result available.  */
> +
> +static void*
> +send_sigchld_thread (void *arg)
> +{
> +  /* Delay one second longer than WAITPID_MIN_TIME so that there can be no
> +     chance that a call to SHOULD_SKIP_WAITPID will return true one the

s/one/once/

Thanks,
Pedro Alves