Subject: Re: [PATCH] gdb: Fix instability in thread groups test
From: Pedro Alves
To: Andrew Burgess
Cc: Simon Marchi, gdb-patches@sourceware.org
Date: Mon, 13 Aug 2018 12:03:00 -0000

On 08/13/2018 12:41 PM, Andrew Burgess wrote:
> * Pedro Alves [2018-08-13 10:51:44 +0100]:
>
>> But shouldn't we make GDB
>> handle this better?  Make the output more "atomic", in the sense
>> that we either show a valid, complete entry, or no entry?  There's
>> an inherent race here, since we use multiple /proc accesses to fill
>> in a process entry.  If we start fetching process info for a
>> process, and the process disappears midway, I'd think it better to
>> discard that process's entry, as if we had not even seen it, i.e.,
>> as if we had listed the set of processes a tiny moment later.
>
> I agree.
>
> We also need to think about process reuse.  With multiple accesses
> to /proc we might start with one process, and end up with a
> completely new process.
>
> I might be overthinking it, but my first guess at a reliable
> strategy would be:
>
>  1. Find each /proc/PID directory.
>  2. Read /proc/PID/stat and extract the start time.  Failure to
>     read this causes the process to be abandoned.
>  3. Read all of the other /proc/PID/XXX files as needed.  Any
>     failure results in the process being abandoned.
>  4. Reread /proc/PID/stat and confirm the start time hasn't
>     changed; a change would indicate that a new process has
>     slipped in.

My initial quick thought was just to drop the process entry if it
turns out we end up with an empty core set.

I wonder whether we can prevent PID reuse by keeping a descriptor
for /proc/PID/ open while we open the other files.  Probably not.
Otherwise, your scheme sounds like the next best thing.

> Given the system is still running, we can never be sure that we
> have "all" processes, so throwing out anything that looks wrong
> seems like the right strategy.
>
> Also, in step #4 we know we've just missed a process - something
> new has started, but we ignore it.  I think this is fine, though,
> given the racy nature of this sort of thing...
>
> The only question is, could these thoughts be dropped into a bug
> report,

Sure.

> and the original patch to remove the unstable result applied?
> Or maybe the test updated to either PASS or KFAIL?

I'd prefer the KFAIL option.
At the very least, a comment in the .exp file.

Thanks,
Pedro Alves