Subject: Re: "finish" command leads to SIGTRAP
From: Pedro Alves
To: John Baldwin, David Griffiths
Cc: gdb@sourceware.org
Date: Thu, 21 Feb 2019 19:34:00 -0000
Message-ID: <9e24e676-ff37-bf4b-3fd0-a9fda0798abb@redhat.com>
In-Reply-To: <863e96ac-83c4-feb3-e412-95b647d18201@FreeBSD.org>
References: <78e1f522-f6f5-d38d-0644-d083c1e4ab5d@redhat.com> <743edbbc-9812-c8e7-0f47-7b4842199b48@redhat.com> <863e96ac-83c4-feb3-e412-95b647d18201@FreeBSD.org>

Hi John,

Thanks for stepping in.

On 02/21/2019 06:49 PM, John Baldwin wrote:
> On 2/21/19 9:50 AM, Pedro Alves wrote:
>> I wonder what other kernels, like e.g., FreeBSD do here?
>
> FreeBSD also fails (and in the last year we had a set of changes to rework
> TF handling in the kernel to boot). This doesn't look trivial to solve.
> To get the exception you have to have TF set in %rflags/%eflags, but that
> means it is set when the pushf writes to the stack. I think what would
> have to happen (ugh) is that the kernel needs to recognize that the DB#
> fault is due to a pushf instruction and that if the TF was a "shadow" TF
> due to ptrace it needs to clear TF from the value written on the stack as
> part of the fault handler.
>
>> Guess if GDB is to workaround this, it'll have to either add
>> special treatment for this instruction (emulate, step over with a software
>> breakpoints, something like that), or clear TF manually after
>> single-stepping. :-/
>
> I suspect it will be common for kernels to have this bug because the CPU
> will always write a value onto the stack with TF set as part of
> executing the instruction. A workaround in GDB would be much like what I
> described above with the advantage that GDB actually knows it is stepping a
> pushf before it steps it, so it can know to rewrite the value on the
> stack after it gets the SIGTRAP for the single step over the pushf.
>
> This may actually be hard for a kernel to get right as at the time of the
> fault we don't get anything that says how long the faulting instruction was,
> etc. Thus, just looking at the byte before the current eip/rip in a DB#
> fault handler for the pushf opcode (I believe it's a single byte) can get
> false positives because you might have stepped over a mov instruction with
> an immediate whose last byte happens to be the opcode, etc.

I can think of a few other possible workarounds:

#1 - emulate the instruction: i.e., if you know you're stepping a pushf
instruction, you could instead push the flags state on the stack yourself
manually, advance the PC, and then raise a fake trap. Could be done by the
kernel, or gdb.
Fixing it on the kernel side should be more efficient, and fixes it
for all debuggers, while fixing it on the debugger side fixes it for
all kernels...

#2 - if you know you're stepping a pushf instruction, set a breakpoint
after it, and PTRACE_CONT instead of stepping (that's the software
single-step workaround mentioned earlier).

#3 - have gdb always clear TF after a single-step.

This is the easiest, even if "less technically cool", solution.
It would mean that it'd be impossible to debug a program that sets the
trace flag manually. I actually once co-wrote an in-process x86 debug
stub, and in that use case preserving TF mattered, since that is what
made it possible to debug the stub... Quite a niche use case, though,
and it'd have been trivial for me to hack gdb for that special use case,
of course.

In order for GDB to know whether it is stepping a pushf instruction,
it needs to read the memory at PC, which has a cost. Maybe that cost is
negligible if we already end up reading memory anyway (because of the
code cache), but I'm not sure we already do. This can have a more
noticeable effect with remote debugging (which should weigh on whether
to do the workaround at the infrun.c level, or in the target backend,
thus in gdbserver when remote).

Solution #3 would require extra ptrace commands anyway (read-modify-write
the flags), so it may end up being less performant, if #1 and #2 already
hit the code cache.

There are some extra complications around #1 and #2 for gdbserver,
because we need to consider the cases when gdbserver handles
single-stepping without roundtripping to gdb:

 - range-stepping
 - stepping over breakpoints/tracepoints

Thanks,
Pedro Alves
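P.S. To make the "read the memory at PC, then fix up the flags" idea a bit more concrete, here is a rough, untested-against-gdb sketch in plain Python. It is not GDB code and uses no GDB API: the function names (is_pushf, clear_tf, fix_pushed_flags) are made up for illustration, and the decoder only understands the 0x66 operand-size prefix and, in 64-bit mode, a REX prefix in front of opcode 0x9c. The point is just that decoding forward from PC avoids the false positive John describes when looking at the byte before the new PC.

```python
TF_MASK = 0x100        # trap flag: bit 8 of EFLAGS/RFLAGS
PUSHF_OPCODE = 0x9C    # pushf / pushfd / pushfq
OPSIZE_PREFIX = 0x66   # operand-size override (16-bit pushf)


def is_pushf(insn: bytes, is_64bit: bool = True) -> bool:
    """Return True if the bytes at PC start a pushf instruction.

    Hypothetical helper: skips only 0x66 prefixes and, in 64-bit
    mode, one REX prefix (0x40-0x4f). A real decoder would have to
    handle the other legacy prefixes as well.
    """
    i = 0
    while i < len(insn) and insn[i] == OPSIZE_PREFIX:
        i += 1
    # In 32-bit mode 0x40-0x4f are inc/dec, not prefixes.
    if is_64bit and i < len(insn) and 0x40 <= insn[i] <= 0x4F:
        i += 1
    return i < len(insn) and insn[i] == PUSHF_OPCODE


def clear_tf(flags: int) -> int:
    """Return the flags value with the trap flag cleared."""
    return flags & ~TF_MASK


def fix_pushed_flags(stack_slot: bytes) -> bytes:
    """Clear TF in a little-endian flags image read from the stack.

    The slot may be 2, 4 or 8 bytes wide; TF sits in the low 16 bits
    either way, and the width is preserved on the way out.
    """
    value = int.from_bytes(stack_slot, "little")
    return clear_tf(value).to_bytes(len(stack_slot), "little")
```

In a debugger, the instruction bytes would come from a memory read at PC before the step, and the stack slot would be read from (and written back to) the new stack pointer after the SIGTRAP. None of that plumbing is shown here, and a program that sets TF on purpose would still need the special-casing mentioned above.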