From: Joel Brobecker
To: gdb@sourceware.org
Date: Thu, 23 Nov 2006 04:22:00 -0000
Subject: [x86_64-linux] ptrace (PT_STEP) causes 2 instruction step???
Message-ID: <20061123042227.GA22601@adacore.com>

Hello,

I was wondering if anyone had seen something like this before. We're running GDB on an x86_64 chip running Linux. I've seen this behavior on SuSE and RedHat distributions. It's a bit weird because the failures started appearing one day, even with our old releases (i.e. tests that used to pass with a given release now fail on the same machine, with no update done).

This reproduces with all versions of GDB that I have tested: GDB 6.4 built by us, GDB 6.4 built by SuSE, and GDB from today's CVS.

Here are the symptoms: I have a program where we're stopped at one instruction of a function. This is the return address from a function call, where we landed after doing a "finish". I simulated this part by inserting a breakpoint at that address and running to it. After that, I do a "next" and here is what I see:

    (gdb) b *0x401e41
    Breakpoint 1 at 0x401e41: file x.adb, line 9.
    (gdb) run
    Starting program: /[...]/x

    Breakpoint 1, 0x0000000000401e41 in x () at x.adb:9
    9          Z : constant Num := F;
    (gdb) n
    0x0000000000401e4f in x () at x.adb:13
    13      end X;

The unexpected part is that GDB did not stop at the beginning of a line, as evidenced by the address printed after the "next" has completed. Here is the assembly code:

    0x00401e31 <_ada_x+0>:  push   %rbp
    0x00401e32 <_ada_x+1>:  mov    %rsp,%rbp
    0x00401e35 <_ada_x+4>:  sub    $0x10,%rsp
    0x00401e39 <_ada_x+8>:  mov    %rbp,%r10
    [line 9 starts here]
    0x00401e3c <_ada_x+11>: callq  0x401e14
    0x00401e41 <_ada_x+16>: movsd  %xmm0,0xfffffffffffffff0(%rbp)
    0x00401e46 <_ada_x+21>: mov    0xfffffffffffffff0(%rbp),%rax
    0x00401e4a <_ada_x+25>: mov    %rax,0xfffffffffffffff8(%rbp)
    [line 13 starts here]
    0x00401e4e <_ada_x+29>: leaveq
    0x00401e4f <_ada_x+30>: retq

I expected GDB to stop at 0x00401e4e, which is the first instruction of line 13.
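
To make that expectation concrete, here is a minimal C sketch of the range-stepping decision. It is a simplified model, not GDB's actual code: it walks the instruction addresses from the disassembly and keeps stepping while the pc stays inside the current line's address range. The names `insn_addrs`, `find_insn`, `step_over_range` and the `stride` parameter are hypothetical, and the range [0x401e39, 0x401e4e) for line 9 is an assumption (only its upper bound matters here). A `stride` of 1 models a correct single-step; a `stride` of 2 models the two-instruction step suspected in the subject line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Instruction start addresses of _ada_x, copied from the disassembly. */
static const uint64_t insn_addrs[] = {
    0x401e31, 0x401e32, 0x401e35, 0x401e39, 0x401e3c,
    0x401e41, 0x401e46, 0x401e4a, 0x401e4e, 0x401e4f,
};
static const size_t insn_count = sizeof insn_addrs / sizeof insn_addrs[0];

/* Hypothetical helper: index of the instruction starting at pc, or -1. */
static int find_insn(uint64_t pc)
{
    for (size_t i = 0; i < insn_count; i++) {
        if (insn_addrs[i] == pc)
            return (int)i;
    }
    return -1;
}

/* Simplified model of stepping over a line (not GDB's real logic):
   advance `stride` instructions per step request and keep going while
   the pc is inside [range_start, range_end).  Returns the first pc that
   falls outside the range, or 0 if pc is not an instruction start or
   the walk runs past the end of the function. */
static uint64_t step_over_range(uint64_t pc, uint64_t range_start,
                                uint64_t range_end, size_t stride)
{
    assert(stride >= 1);

    int start = find_insn(pc);
    if (start < 0)
        return 0;

    size_t idx = (size_t)start;
    do {
        idx += stride;
        if (idx >= insn_count)
            return 0;
        pc = insn_addrs[idx];
    } while (pc >= range_start && pc < range_end);

    return pc;
}
```

Under this model, `step_over_range(0x401e41, 0x401e39, 0x401e4e, 1)` returns 0x401e4e, the stop address I expected, while the same call with a stride of 2 passes through 0x401e4a and returns 0x401e4f, the address GDB actually reported.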

At first sight, it looks like a malfunction of the kernel, because "set debug infrun 1" allows us to see how we get there:

    infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
    infrun: resume (step=1, signal=0)
    infrun: wait_for_inferior
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x401e4a
    infrun: trap expected
    infrun: stepping inside range [0x401e39-0x401e4e]
    infrun: resume (step=1, signal=0)
    infrun: prepare_to_wait
    infrun: infwait_normal_state
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x401e4f
    infrun: stepped to a different function
    infrun: stop_stepping
    0x0000000000401e4f in x () at x.adb:13

As you can see, each "resume (step=1,...)" causes the inferior to step *two* instructions instead of one: from 0x401e41 to 0x401e4a, then from 0x401e4a to 0x401e4f.

I looked at the code and traced it, and we seem to be doing everything right: The resume operation is turned into a call to "ptrace (PT_STEP, ...)" with the right arguments. It's then followed by a call to "wait". After the inferior has stopped, we find that we're 2 instructions later.

The behavior is actually relatively unpredictable. Sometimes, it works fine. I searched the internet a bit, and apparently this type of error was reported a while ago. Unfortunately, I lost the link, but the reports were saying that the problem they saw only occurred in a very specific case, which is not the case here...

Has anyone seen this before? Any clue? Surprisingly, all our x86_64-linux machines started showing these symptoms on the same day. All except one, which keeps working fine.

-- 
Joel