Subject: Re: [PATCH] gdb: Fix instability in thread groups test
From: Pedro Alves
To: Andrew Burgess
Cc: Simon Marchi, gdb-patches@sourceware.org
Date: Mon, 13 Aug 2018 12:03:00 -0000

On 08/13/2018 12:41 PM, Andrew Burgess wrote:
> * Pedro Alves [2018-08-13 10:51:44 +0100]:
>
>> But shouldn't we make GDB
>> handle this better?  Make the output more "atomic", in the sense
>> that we either show a valid, complete entry, or no entry?  There's
>> an inherent race here, since we use multiple /proc accesses to fill
>> in a process entry.  If we start fetching process info for a
>> process, and the process disappears midway, I'd think it better to
>> discard that process's entry, as if we had not even seen it, i.e.,
>> as if we had listed the set of processes a tiny moment later.
>
> I agree.
>
> We also need to think about process reuse.  With multiple accesses
> to /proc we might start with one process, and end up with a
> completely new process.
>
> I might be overthinking it, but my first guess at a reliable
> strategy would be:
>
>  1. Find each /proc/PID directory.
>  2. Read /proc/PID/stat and extract the start time.  Failure to
>     read this causes the process to be abandoned.
>  3. Read all of the other /proc/PID/XXX files as needed.  Any
>     failure results in the process being abandoned.
>  4. Reread /proc/PID/stat and confirm the start time hasn't
>     changed; a change would indicate that a new process has
>     slipped in.

My initial quick thought was just to drop the process entry if it
turns out we end up with an empty core set.

I wonder whether we can prevent PID reuse by keeping a descriptor
for /proc/PID/ open while we open the other files.  Probably not.
Otherwise, your scheme sounds like the next best thing.

> Given the system is still running, we can never be sure that we
> have "all" processes, so throwing out anything that looks wrong
> seems like the right strategy.
>
> Also, in step #4 we know we've just missed a process - something
> new has started, but we ignore it.  I think this is fine, though,
> given the racy nature of this sort of thing...
>
> The only question is, could these thoughts be dropped into a bug
> report,

Sure.

> and the original patch to remove the unstable result applied?
> Or maybe the test updated to either PASS or KFAIL?

I'd prefer the KFAIL option.
At the very least, a comment in the .exp file.

Thanks,
Pedro Alves