From: Rainer Orth
To: Pedro Alves
Cc: gdb-patches@sourceware.org
Subject: Re: Unbreaking gdb on Solaris post-multitarget [PR 25939]
References: <7fb790ae-61a9-a6a3-3b87-74fcac400664@redhat.com>
Date: Wed, 17 Jun 2020 16:45:51 +0200
In-Reply-To: <7fb790ae-61a9-a6a3-3b87-74fcac400664@redhat.com> (Pedro Alves's message of "Tue, 16 Jun 2020 20:16:38 +0100")

Hi Pedro,

> On 6/16/20 3:21 PM, Rainer Orth wrote:
>> Some time ago, when testing gdb master on Solaris again after several
>> months, I discovered that gdb couldn't execute even a trivial program
>> anymore. This had gone unnoticed by the Solaris buildbots since the
>> code continued to compile just fine. Those bots are build-only since
>> many tests (especially thread tests) are either flaky or time out.
>>
>> A reghunt identified the multi-target merge as the culprit.
>
> I'm sorry about that.

No worries: the Solaris port had been in relatively bad shape even
before, so maybe this will allow us to get to the bottom of things and
fix them.

>> I've managed to get a bit further with the following patch which is
>> intended to push the procfs target first:
>
> That patch looks good to me.

Thanks.

>> However, while I now get over the initial assertion failure, I run
>> instead into
>>
>> procfs: couldn't find pid 0 in procinfo list.
>> procfs: init_inferior, open_proc_files line 2878, /proc/6031: No such file or directory.
>>
>> When I break in procfs.c (procfs_init_inferior), I can see that
>> create_procinfo succeeds.
>> However, looking at the process tree at this
>> point, I see that the debuggee is still marked as defunct
>>
>> 18377 /vol/gcc/bin/gdb -i=mi /vol/gnu/obj/gdb/gdb/reghunt/no-r
>>   18379 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb
>>     18382
>>
>> so open_procinfo_files fails because /proc/ only contains psinfo
>> and usage, but no ctl file yet.
>>
>> I tried to do the same with a version of gdb from immediately before the
>> multi-target merge: while that can run a test program interactively just
>> fine,
>
> It's not clear to me whether you're saying that a version from before
> the multi-target changes can run a test program fine due to not needing
> the push_target fix, or whether the multi-target patchset itself caused
> this second issue you're observing even when debugging a simple hello
> program.

I experimented a bit more yesterday. Immediately before the
multi-target patch, I have:

$ cat top-gdb.gdb
file ./gdb
run -q -D data-directory -x bottom-gdb.gdb
$ cat bottom-gdb.gdb
file ./hello
b main
run
$ gdb-9 -q -x top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 1, main () at hello.c:6
6         printf ("Hello world\n");

At that point the process hierarchy is as expected:

22745 gdb-9 -q -x top-gdb.gdb
  22761 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
    22768 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/hell

With the multi-target merge, my push_target fix applied, and the
worker threads disabled (more below), I get instead

$ gdb -q -x ~/top-gdb.gdb
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.

and this process tree:

23011 gdb-9 -q -x top-gdb.gdb
  23012 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
    23013 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/hell

However, if I add

b find_procinfo_or_die

to investigate the above error ("couldn't find pid 0"), with the mt patch
I get

Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1afc288: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c, line 327.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2879, /proc/23022: No such file or directory.
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23022, tid=0) at /vol/gnu/src/gdb/hg/master/reghunt/gdb/procfs.c:327
327       procinfo *pi = find_procinfo (pid, tid);

which is no wonder given the child process is marked as defunct, so its
/proc files cannot be opened:

23020 gdb-9 -q -x top-gdb.gdb
  23021 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb -q
    23022

However, when I try the same in the pre-mt-patch gdb:

Setting up the environment for debugging gdb.
Breakpoint 1 at 0x196c898: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x179e138: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/cli/cli-cmds.c, line 201.
Breakpoint 3 at 0x1ae7e26: file /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c, line 325.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb.gdb:3: Error in sourced command file:
procfs: init_inferior, open_proc_files line 2870, /proc/23028: No such file or directory.
[New Thread 2 ]
[New Thread 3 ]
[New Thread 4 ]
[New Thread 5 ]
[New Thread 6 ]
[New Thread 7 ]
[New Thread 8 ]
[New Thread 9 ]
[Switching to Thread 1 (LWP 1)]

Thread 2 hit Breakpoint 3, find_procinfo_or_die (pid=23028, tid=0) at /vol/gnu/src/gdb/hg/master/reghunt-122456/gdb/procfs.c:325
325       procinfo *pi = find_procinfo (pid, tid);

I get the same error and the same defunct process:

23026 gdb-9 -q -x top-gdb.gdb
  23027 /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122456/gdb/gdb -q
    23028

This obviously makes debugging extra hard ;-(

However, this error isn't entirely new: when running the gdb testsuite
before the mt merge, I get several variations of this error

$ grep -a "couldn't find pid" gdb.log |sort|uniq -c
      2 Error in re-setting breakpoint 2: procfs: couldn't find pid 0 in procinfo list.
      2 Error in re-setting breakpoint 5: procfs: couldn't find pid 0 in procinfo list.
     99 procfs: couldn't find pid -1 in procinfo list.
     22 procfs: couldn't find pid 0 in procinfo list.
      5 procfs: couldn't find pid 21415 in procinfo list.
      5 procfs: couldn't find pid 21618 in procinfo list.
     10 procfs: couldn't find pid 22032 in procinfo list.
      5 procfs: couldn't find pid 22457 in procinfo list.
      5 procfs: couldn't find pid 22678 in procinfo list.
     10 procfs: couldn't find pid 22985 in procinfo list.

>> running that gdb under gdb itself most often leads to the same
>> error. This very much seems like a race condition to me, but at the
>> moment I'm pretty much at a loss how to investigate this further.
>
> Could this be a race somehow more exposed now due to GDB now spawning worker
> threads? What happens if you debug a GDB that doesn't spawn worker
> threads? Like:
>
> ./gdb -D ./data-directory --args ./gdb -ex "maint set worker-threads 0"

This doesn't work because master gdb cannot debug anything, with or
without the push_target fix. When I instead use gdb 9.1 as the top gdb,
I get

$ gdb-9 -q --args ./gdb -D data-directory -ex "maint set worker-threads 0"
Reading symbols from ./gdb...
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x197ca44: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/gdbsupport/errors.c, line 54.
Breakpoint 2 at 0x17adf8a: file /vol/gnu/src/gdb/hg/master/reghunt/gdb/cli/cli-cmds.c, line 201.
(top-gdb) run
Starting program: /vol/gnu/obj/gdb/gdb/reghunt/no-resync/122457/gdb/gdb
can't handle command-line argument containing whitespace

When I instead use

$ cat top-gdb-mt.gdb
file ./gdb-mt
run -q -D data-directory -x bottom-gdb-mt.gdb
$ cat bottom-gdb-mt.gdb
maint set worker-threads 0
file ./hello
b main
run
$ gdb-9 -q -x top-gdb-mt.gdb
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
[New LWP 5 ]
[New LWP 6 ]
[New LWP 7 ]
[New LWP 8 ]
[New LWP 9 ]
[LWP 8 exited]
[New LWP 8 ]
[LWP 6 exited]
[New LWP 6 ]
[LWP 9 exited]
[New LWP 9 ]
[LWP 5 exited]
[New LWP 5 ]
[LWP 7 exited]
[New LWP 7 ]
[LWP 2 exited]
[New LWP 2 ]
[LWP 3 exited]
[New LWP 3 ]
[LWP 4 exited]
[New LWP 4 ]
Breakpoint 1 at 0x401036: file hello.c, line 6.
bottom-gdb-mt.gdb:4: Error in sourced command file:
procfs: couldn't find pid 0 in procinfo list.

> Does that problem trigger as often that way?

The failure is still reproducible that way, but even more verbose
(imagine that on that 160-core system I spoke of ;-) To avoid that,
I've changed n_worker_threads to 0 for now.

> Or, what happens if you use master GDB with your push_target fix
> to debug an older GDB?

Master GDB cannot debug anything, unfortunately.

        Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University