From: Sergio Durigan Junior
To: Pedro Alves
Cc: gdb-patches@sourceware.org
Subject: Possible regression on gdb.multi/multi-arch-exec.exp (was: Re: [PATCH] Use thread_info and inferior pointers more throughout)
Date: Wed, 27 Jun 2018 18:16:00 -0000
In-Reply-To: <20180607180704.3991-1-palves@redhat.com> (Pedro Alves's message of "Thu, 7 Jun 2018 19:07:04 +0100")
Message-ID: <87in649jtd.fsf@redhat.com>

On Thursday, June 07 2018, Pedro Alves wrote:

> This is more preparation bits for multi-target support.
Hi Pedro,

While preparing a new Fedora GDB rawhide release, I noticed a regression related to this commit. Curiously, I can only reproduce it on a Fedora Rawhide system; it doesn't happen on my Fedora 27 machine. (I initially thought it might be related to GCC, but testing against GCC HEAD on my Fedora 27 machine did not trigger the regression either.)

The failing test is gdb.multi/multi-arch-exec.exp, and here's what I'm seeing:

  (gdb) break all_started
  Breakpoint 1 at 0x400848: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 42.
  (gdb) run
  Starting program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib64/libthread_db.so.1".
  [New Thread 0x7ffff7476700 (LWP 1354)]

  Thread 1 "1-multi-arch-ex" hit Breakpoint 1, all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
  42      }
  (gdb) delete breakpoints
  Delete all breakpoints? (y or n) y
  (gdb) info breakpoints
  No breakpoints or watchpoints.
  (gdb) break main
  Breakpoint 2 at 0x400862: file /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c, line 51.
  (gdb) thread 1
  [Switching to thread 1 (Thread 0x7ffff7fdf740 (LWP 1350))]
  #0  all_started () at /home/sergio/build/gdb/testsuite/../../../binutils-gdb/gdb/testsuite/gdb.multi/multi-arch-exec.c:42
  42      }
  (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: thread 1
  set follow-exec-mode new
  (gdb) PASS: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: set follow-exec-mode new
  continue
  Continuing.
  [Thread 0x7ffff7476700 (LWP 1354) exited]
  process 1350 is executing new program: /home/sergio/build/gdb/testsuite/outputs/gdb.multi/multi-arch-exec/1-multi-arch-exec-hello
  [New inferior 2 (process 0)]
  [New process 1350]
  ../../binutils-gdb/gdb/target.c:3200: internal-error: gdbarch* default_thread_architecture(target_ops*, ptid_t): Assertion `inf != NULL' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.
  Quit this debugging session? (y or n) FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=new: continue across exec that changes architecture (GDB internal error)

I spent some time investigating this, and here's what I've learned so far:

1) When infrun.c:handle_inferior_event_1 is called and deals with TARGET_WAITKIND_EXECD (around line 5275), it does:

  ...
      case TARGET_WAITKIND_EXECD:
        if (debug_infrun)
          fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_EXECD\n");

        /* Note we can't read registers yet (the stop_pc), because we
           don't yet know the inferior's post-exec architecture.
           'stop_pc' is explicitly read below instead.  */
        switch_to_thread_no_regs (ecs->event_thread);

        /* Do whatever is necessary to the parent branch of the vfork.  */
        handle_vfork_child_exec_or_exit (1);

        /* This causes the eventpoints and symbol table to be reset.
           Must do this now, before trying to determine whether to
           stop.  */
        follow_exec (inferior_ptid, ecs->ws.value.execd_pathname); // <---- #1

        stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); // <---- #2
  ...

2) When follow_exec is called (#1 above), it does:

  ...
    /* The target reports the exec event to the main thread, even if
       some other thread does the exec, and even if the main thread was
       stopped or already gone.  We may still have non-leader threads of
       the process on our list.
       E.g., on targets that don't have thread exit events (like
       remote); or on native Linux in non-stop mode if there were only
       two threads in the inferior and the non-leader one is the one
       that execs (and nothing forces an update of the thread list up
       to here).  When debugging remotely, it's best to avoid extra
       traffic, when possible, so avoid syncing the thread list with
       the target, and instead go ahead and delete all threads of the
       process but one that reported the event.  Note this must be
       done before calling update_breakpoints_after_exec, as otherwise
       clearing the threads' resources would reference stale thread
       breakpoints -- it may have been one of these threads that
       stepped across the exec.  We could just clear their stepping
       states, but as long as we're iterating, might as well delete
       them.  Deleting them now rather than at the next user-visible
       stop provides a nicer sequence of events for user and MI
       notifications.  */
    ALL_THREADS_SAFE (th, tmp)
      if (ptid_get_pid (th->ptid) == pid && !ptid_equal (th->ptid, ptid))
        delete_thread (th);
  ...

   On my Fedora Rawhide box, delete_thread is called to delete the very thread that ecs->event_thread points to. On my Fedora 27 machine, it deletes a different thread.

3) Back in handle_inferior_event_1, when #2 is reached, ecs->event_thread points to an invalid object (the thread that follow_exec just deleted), which triggers the assertion.

I haven't progressed much further (I have other things to wrap up), but I decided to get the ball rolling now. If you need access to a Fedora Rawhide VM, please let me know and I can provide one.

Thanks,

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/
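The hazard described in points 2) and 3) can be illustrated with a small, self-contained C++ model. This is only a sketch under stated assumptions: `model_ptid`, `model_thread`, `thread_list`, `add_thread`, `delete_other_threads` and `find_thread` are hypothetical stand-ins, not GDB's real `ptid_t`/`thread_info` API, and the model does not try to explain why Rawhide and Fedora 27 behave differently. It only shows that the follow_exec loop spares exactly the thread whose ptid equals the ptid passed in, so a caller holding a raw pointer to the event thread is left with a dangling pointer whenever those two differ. Saving the event thread's ptid and looking it up again afterwards detects that case instead of dereferencing freed memory.

```cpp
// Hypothetical, self-contained model of the thread-list hazard.
// None of these names are GDB's real API.
#include <algorithm>
#include <cassert>
#include <list>
#include <memory>

// Hypothetical stand-in for GDB's ptid_t: a process id plus an LWP id.
struct model_ptid
{
  int pid;
  long lwp;

  bool operator== (const model_ptid &other) const
  {
    return pid == other.pid && lwp == other.lwp;
  }
};

// Hypothetical stand-in for thread_info.  The list below owns each thread.
struct model_thread
{
  model_ptid ptid;
};

using thread_list = std::list<std::unique_ptr<model_thread>>;

// Append a thread and return a raw (non-owning) pointer to it, similar in
// spirit to the event-thread pointer that the caller keeps around.
model_thread *
add_thread (thread_list &threads, const model_ptid &ptid)
{
  threads.push_back (std::unique_ptr<model_thread> (new model_thread{ptid}));
  return threads.back ().get ();
}

// Same shape as the follow_exec loop: delete every thread of PID except
// the one whose ptid equals KEEP.  Any raw pointer to a deleted thread is
// left dangling and must not be dereferenced afterwards.
void
delete_other_threads (thread_list &threads, int pid, const model_ptid &keep)
{
  assert (keep.pid == pid);  // The kept thread must belong to PID.
  threads.remove_if ([&] (const std::unique_ptr<model_thread> &th)
                     {
                       return th->ptid.pid == pid && !(th->ptid == keep);
                     });
}

// Look a thread up again by a saved ptid; returns nullptr if it was deleted.
model_thread *
find_thread (const thread_list &threads, const model_ptid &ptid)
{
  auto it = std::find_if (threads.begin (), threads.end (),
                          [&] (const std::unique_ptr<model_thread> &th)
                          {
                            return th->ptid == ptid;
                          });
  return it == threads.end () ? nullptr : it->get ();
}
```

For example, with threads {100, 100} and {100, 101} and the event thread at {100, 101}, calling `delete_other_threads` with KEEP equal to {100, 100} removes the event thread, and `find_thread` on the saved ptid returns nullptr rather than handing back a dangling pointer. Re-looking the thread up by ptid is just one defensive pattern assumed for this model; it is not a claim about how GDB itself should be fixed.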