From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 39840 invoked by alias); 13 Aug 2018 11:41:43 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 38052 invoked by uid 89); 13 Aug 2018 11:41:42 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-26.5 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=nature, H*Ad:D*ca, unstable, H*r:sk:host81- X-HELO: mail-wm0-f66.google.com Received: from mail-wm0-f66.google.com (HELO mail-wm0-f66.google.com) (74.125.82.66) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 13 Aug 2018 11:41:41 +0000 Received: by mail-wm0-f66.google.com with SMTP id n11-v6so8483592wmc.2 for ; Mon, 13 Aug 2018 04:41:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=embecosm.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to :user-agent; bh=UOMkaHJXadYilCk+RnVtBd1afUzKOzl8sKpLpKdRftc=; b=RSQyR5ZH+3YqnF72Yo/PTN+DnypqIfXDL96YlB50/faNcKsKualbNsXtEAj3Paygin eCbk3uv4hfFieI55onJ81d4VUsAC2NabmuTqQcvVgEmEVM3vbPjEXqliW8o5ZPQOrejc iroLOYlSq5swGAtxlSt+tpNAaq9GC5DAiZbvP+MoYvWgqaWoJUzvadC5P8WK8K93zhNy NxJkcY9HW3cwIE9YnHGVv6ZaGKwN70c66Cb1W6VBR9LhagcrtuqGENAiEVQFFca0cZyN VHKuISsBx4ZiemWimD5CopoJd9VPGm//B+BZRUiuCZBTAW9NZxa43jUET5GvwG+7SeS+ pbwQ== Return-Path: Received: from localhost (host81-140-215-41.range81-140.btcentralplus.com. [81.140.215.41]) by smtp.gmail.com with ESMTPSA id i15-v6sm12517356wro.7.2018.08.13.04.41.38 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 13 Aug 2018 04:41:38 -0700 (PDT) Date: Mon, 13 Aug 2018 11:41:00 -0000 From: Andrew Burgess To: Pedro Alves Cc: Simon Marchi , gdb-patches@sourceware.org Subject: Re: [PATCH] gdb: Fix instability in thread groups test Message-ID: <20180813114137.GX3155@embecosm.com> References: <20180810095750.13017-1-andrew.burgess@embecosm.com> <7da382e5-bd5e-25c2-b3f8-f38e692f35a1@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <7da382e5-bd5e-25c2-b3f8-f38e692f35a1@redhat.com> X-Fortune: Alright, you!! Imitate a WOUNDED SEAL pleading for a PARKING SPACE!! X-Editor: GNU Emacs [ http://www.gnu.org/software/emacs ] User-Agent: Mutt/1.9.2 (2017-12-15) X-IsSubscribed: yes X-SW-Source: 2018-08/txt/msg00328.txt.bz2 * Pedro Alves [2018-08-13 10:51:44 +0100]: > On 08/10/2018 10:26 PM, Simon Marchi wrote: > > On 2018-08-10 05:57, Andrew Burgess wrote: > >> In the test script gdb.mi/list-thread-groups-available.exp we ask GDB > >> to list all thread groups, and match the output against a > >> regexp. Occasionally, I would see this test fail. > >> > >> The expected output is a list of entries, each entry looking roughly > >> like this: > >> > >> =A0 {id=3D"",type=3D"process",description=3D"", > >> =A0=A0 user=3D"",cores=3D["","",...]} > >> > >> All the fields after 'id' and 'type' are optional, and the 'cores' > >> list can contain 1 or more "" entries. > >> > >> On my machine (Running Fedora 27, kernel 4.17.3-100.fc27.x86_64) > >> usually the 'description' is a non-empty string, and the 'cores' list > >> has at least one entry in it.=A0 But sometimes, very rarely, I'll see = an > >> entry in the process group list where the 'description' is an empty > >> string, the 'user' is the string "?", and the 'cores' list is empty. > >> Such an entry looks like this: > >> > >> =A0=A0 {id=3D"19863",type=3D"process",description=3D"",user=3D"?",core= s=3D[]} > >> > >> I believe that this is caused by the process exiting while GDB is > >> scanning /proc for process information.=A0 The current code in > >> gdb/nat/linux-osdata.c is not (I think) resilient against exiting > >> processes. > >> > >> This commit adjusts the regex that matches the 'cores' list so that an > >> empty list is acceptable, with this patch in place the test script > >> gdb.mi/list-thread-groups-available.exp never fails for me now. > >> > >> gdb/testsuite/ChangeLog: > >> > >> =A0=A0=A0=A0* gdb.mi/list-thread-groups-available.exp: Update test reg= exp. > >> --- > >> =A0gdb/testsuite/ChangeLog=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0= =A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0=A0 | 4 ++++ > >> =A0gdb/testsuite/gdb.mi/list-thread-groups-available.exp | 2 +- > >> =A02 files changed, 5 insertions(+), 1 deletion(-) > >> > >> diff --git a/gdb/testsuite/gdb.mi/list-thread-groups-available.exp > >> b/gdb/testsuite/gdb.mi/list-thread-groups-available.exp > >> index c4dab2a2c34..88f9ee9b63d 100644 > >> --- a/gdb/testsuite/gdb.mi/list-thread-groups-available.exp > >> +++ b/gdb/testsuite/gdb.mi/list-thread-groups-available.exp > >> @@ -45,7 +45,7 @@ set id_re "id=3D\"$decimal\"" > >> =A0set type_re "type=3D\"process\"" > >> =A0set description_re "description=3D\"$string_re\"" > >> =A0set user_re "user=3D\"$string_re\"" > >> -set cores_re "cores=3D\\\[\"$decimal\"(,\"$decimal\")*\\\]" > >> +set cores_re "cores=3D\\\[(\"$decimal\"(,\"$decimal\")*)?\\\]" > >> > >> =A0# List all available processes. > >> =A0set process_entry_re > >> "{${id_re},${type_re}(,$description_re)?(,$user_re)?(,$cores_re)?}" > >=20 > > Hi Andrew, > >=20 > > The patch LGTM.=A0 I manually reproduced this case by spawning a proces= s (tail -f /dev/null) and noting its pid.=A0 In linux_xfer_osdata_processes= , I added: > >=20 > > =A0 if (pid =3D=3D ) > > =A0=A0=A0 sleep (5); > >=20 > > and killing the process during that sleep. >=20 > But shouldn't we make GDB handle this better? Make the output > more "atomic" in the sense that we either show a valid complete > entry, or no entry? There's an inherent race > here, since we use multiple /proc accesses to fill up a process > entry. If we start fetching process info for a process, and the process > disappears midway, I'd think it better to discard that process's entry, > as-if we had not even seen it, i.e., as if we had listed the set of > processes a tiny moment later. I agree. We also need to think about process reuse. So with multiple accesses to /proc we might start with one process, and end up with a completely new process. I might be overthinking it, but my first guess at a reliable strategy would be: 1. Find each /proc/PID directory. 2. Read /proc/PID/stat and extract the start time. Failure to read this causes the process to be abandoned. 3. Read all of the other /proc/PID/XXX files as needed. Any failure results in the process being abandoned. 4. Reread /proc/PID/stat and confirm the start time hasn't changed, this would indicate a new process having slipped in. Given the system is still running, we can never be sure that we have "all" processes, so throwing out anything that looks wrong seems like the right strategy. Also in step #4 we know we've just missed a process - something new has started, but we ignore it. I think this is fine though given the racy nature of this sort of thing... The only question is, could these thoughts be dropped into a bug report, and the original patch to remove the unstable result applied? Or maybe the test updated to either PASS or KFAIL? Thanks, Andrew