Subject: Re: [PATCH][gdb/tdep] Use pid to choose x86_64 process 64/32-bitness
From: Simon Marchi via Gdb-patches
To: Tom de Vries, gdb-patches@sourceware.org
Date: Fri, 7 May 2021 15:27:12 -0400
In-Reply-To: <20210507084402.GA14817@delia>

On 2021-05-07 4:44 a.m., Tom de Vries wrote:
> Hi,
>
> In a linux kernel mailing list discussion, it was mentioned that "gdb has
> this odd thing where it takes the 64-bit vs 32-bit data for the whole process
> from one thread, and picks the worst possible thread to do it (ie explicitly
> not even the main thread, ...)" [1].
>
> The picking of the thread is done here in
> x86_linux_nat_target::read_description:
> ...
>   /* GNU/Linux LWP ID's are process ID's.  */
>   tid = inferior_ptid.lwp ();
>   if (tid == 0)
>     tid = inferior_ptid.pid (); /* Not a threaded program.  */
> ...
>
> To understand what this code does, let's investigate a scenario in which
> inferior_ptid.lwp () != inferior_ptid.pid ().
>
> Say we start exec jit-attach-pie, identified with pid x.  The main thread
> starts another thread that sleeps, and then the main thread waits for the
> sleeping thread.  So we have two threads, identified with LWP IDs x and x+1:
> ...
> PID   LWP   CMD
> x     x     ./jit-attach-pie
> x     x+1   ./jit-attach-pie
> ...
> [ The thread with LWP x is known as the thread group leader. ]
>
> When attaching to this exec using the pid, gdb does a stop_all_threads which
> iterates over all the threads, first LWP x, and then LWP x+1.
>
> So the state we arrive with at x86_linux_nat_target::read_description is:
> ...
> (gdb) p inferior_ptid
> $1 = {m_pid = x, m_lwp = x+1, m_tid = 0}
> ...
> and consequently we probe 64/32-bitness from thread LWP x+1.
>
> [ Note that this is different from when gdb doesn't attach but instead
> launches the exec itself, in which case there's no stop_all_threads needed,
> and the probed thread is LWP x. ]

Well, that's mostly because when running the exec ourselves, there's only
one thread to begin with, and the io_uring threads are likely not there yet.

> According to aforementioned remark, a better choice would have been the main
> thread, that is, LWP x.

That doesn't tell the whole story, though.  Up to now, probing a non-main
thread was perhaps a slightly strange thing to do, but it worked, because
all threads were set up the same way in that respect: it worked under the
assumption that all threads had the same CS/DS contents (the registers we
use to detect the architecture).

The failure happens when io_uring threads are present.  These threads are
of a strange kind: they are like any other userspace threads, except that
they never return to userspace.  Because of that, they don't have a
meaningful userspace register state, and reading their registers returns
garbage.
So if we choose one of these threads for probing the architecture of the
process (which, as you mention below, is bogus anyway, but that's how it
works today), we get bad results.

From what I read in the lkml thread you linked, the kernel's plan is to
mitigate this by filling in the register state of these threads with
something GDB is happy with, even though that's otherwise unnecessary.  I
don't know what the state of that is, though.  But I agree it would be
good to fix this on our side as well.

>
> This patch implements that choice, by simply doing:
> ...
>   tid = inferior_ptid.pid ();

I think that's OK.

> ...
>
> The fact that gdb makes a per-process choice for 64/32-bitness is a problem in
> itself: each thread can be in either 64 or 32 bit mode.  That is a problem
> that this patch doesn't fix.

Not only that, but from what I understand, it's not right to decide once
and for all whether a thread is 32- or 64-bit: it can switch between the
two modes.  So all we can really know is what mode a thread is in right
now, while it is stopped; who knows what mode it will be in five
instructions from now.  This is an actual problem for people who debug
early boot code with QEMU, because there the "thread" switches from
32-bit to 64-bit mode at some point.

To handle this, it sounds like GDB should re-evaluate the architecture of
each thread every time it stops.  That sounds expensive, but such a mode
could live behind a knob that is disabled by default, for these special
cases.  There might also be features that can only work if we assume the
architecture of a thread never changes; I'm not sure.

Also, we already have partial machinery for threads with different
architectures; that's why we have target_ops::thread_architecture.  But
read_description is used to set inferior::gdbarch, which is the "default"
arch of the process, used for many things.
And the default implementation of thread_architecture,
process_stratum_target::thread_architecture, just returns inf->gdbarch.
So in practice, today, on Linux we effectively have a single architecture
for the whole inferior.  The exception is aarch64-linux, which can return
different architectures for different SVE configurations.

> Tested on x86_64-linux.
>
> [1] https://lore.kernel.org/io-uring/\
> CAHk-=wh0KoEZXPYMGkfkeVEerSCEF1AiCZSvz9TRrx=Kj74D+Q@mail.gmail.com/

Hmm, the link is not clickable / does not lead to the right place,
because it is split in two.

>
> Any comments?

1. I think it would be good to have a test for this.  I wanted to
actually try making (or finding) a program that uses io_uring and see the
problem with my own eyes, but I haven't had time yet.  But there could be
such a test in our testsuite (checking first that the kernel actually
supports io_uring, of course).  We could validate that when we select one
of the io threads and do things like backtrace, the failure is handled
gracefully.

2. There are other architectures than x86-64 where it looks like we would
also probe a non-main thread, like ARM:

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/arm-linux-nat.c;h=662dade0a122f8adf25da91577b1de375df7785b;hb=HEAD

It might not fail like it does on x86-64 (that depends on the kernel
implementation), but we should probably change them the same way.

3. Have you tested attaching to a program whose main thread has exited?
It might still work (and there might be a test for it in our testsuite),
because the thread group leader stays around as a zombie as long as other
threads exist.  But it would be good to test it.

Simon