From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-26910-listarch-gdb=sources.redhat.com@sourceware.org>
Received: (qmail 10073 invoked by alias); 23 Nov 2006 08:35:31 -0000
Received: (qmail 10062 invoked by uid 22791); 23 Nov 2006 08:35:30 -0000
X-Spam-Check-By: sourceware.org
Received: from smtp-vbr2.xs4all.nl (HELO smtp-vbr2.xs4all.nl) (194.109.24.22)     by sourceware.org (qpsmtpd/0.31) with ESMTP; Thu, 23 Nov 2006 08:35:24 +0000
Received: from webmail.xs4all.nl (dovemail6.xs4all.nl [194.109.26.8]) 	by smtp-vbr2.xs4all.nl (8.13.8/8.13.8) with ESMTP id kAN8ZFVq025260; 	Thu, 23 Nov 2006 09:35:20 +0100 (CET) 	(envelope-from mark.kettenis@xs4all.nl)
Received: from 192.87.1.22         (SquirrelMail authenticated user sibelius)         by webmail.xs4all.nl with HTTP;         Thu, 23 Nov 2006 09:35:20 +0100 (CET)
Message-ID: <4303.192.87.1.22.1164270920.squirrel@webmail.xs4all.nl>
In-Reply-To: <20061123042227.GA22601@adacore.com>
References:  <20061123042227.GA22601@adacore.com>
Date: Thu, 23 Nov 2006 08:35:00 -0000
Subject: Re: [x86_64-linux] ptrace (PT_STEP) causes 2 instruction step???
From: "Mark Kettenis" <mark.kettenis@xs4all.nl>
To: "Joel Brobecker" <brobecker@adacore.com>
Cc: gdb@sourceware.org
User-Agent: SquirrelMail/1.4.8
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Mailing-List: contact gdb-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb.sourceware.org>
List-Subscribe: <mailto:gdb-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb/>
List-Post: <mailto:gdb@sourceware.org>
List-Help: <mailto:gdb-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-owner@sourceware.org
X-SW-Source: 2006-11/txt/msg00155.txt.bz2

>  Hello,
>
>  I was wondering if anyone had seen something like this before.
>  We're running GDB on an x86_64 chip running Linux. I've seen
>  this behavior on SuSE and RedHat distributions. It's a bit weird
>  because the failures started appearing one day, even with our old
>  releases (i.e. tests that used to pass with a given release now
>  fail on the same machine, with no update done).
>
>  This reproduces with all versions of GDB that I have tested: GDB 6.4
>  built by us, GDB 6.4 built by SuSE, and GDB from today's CVS.
>
>  Here are the symptoms: I have a program where we're stopped at one
>  instruction of a function. This is the return address from a function
>  call, where we landed after doing a "finish". I simulated this part
>  by inserting a breakpoint at that address and running to it.  After that,
>  I do a "next" and here is what I see:
>
>      (gdb) b *0x401e41
>      Breakpoint 1 at 0x401e41: file x.adb, line 9.
>      (gdb) run
>      Starting program: /[...]/x
>
>      Breakpoint 1, 0x0000000000401e41 in x () at x.adb:9
>      9          Z : constant Num := F;
>      (gdb) n
>      0x0000000000401e4f in x () at x.adb:13
>      13      end X;
>
>  The unexpected part is that GDB did not stop at the beginning of
>  a line, as evidenced by the address printed after the "next" has
>  completed.
>
>  Here is the assembly code:
>
>          0x00401e31 <_ada_x+0>:  push   %rbp
>          0x00401e32 <_ada_x+1>:  mov    %rsp,%rbp
>          0x00401e35 <_ada_x+4>:  sub    $0x10,%rsp
>          0x00401e39 <_ada_x+8>:  mov    %rbp,%r10
>      [line 9 starts here]
>          0x00401e3c <_ada_x+11>: callq  0x401e14 <x__f.0>
>          0x00401e41 <_ada_x+16>: movsd  %xmm0,0xfffffffffffffff0(%rbp)
>          0x00401e46 <_ada_x+21>: mov    0xfffffffffffffff0(%rbp),%rax
>          0x00401e4a <_ada_x+25>: mov    %rax,0xfffffffffffffff8(%rbp)
>      [line 13 starts here]
>          0x00401e4e <_ada_x+29>: leaveq
>          0x00401e4f <_ada_x+30>: retq
>
>  I expected GDB to stop at 0x00401e4e, which is the first instruction
>  of line 13.
>
>  At first sight, it looks like a malfunction of the kernel, because
>  "set debug infrun 1" allows us to see how we get there:
>
>      infrun: proceed (addr=0xffffffffffffffff, signal=144, step=1)
>      infrun: resume (step=1, signal=0)
>      infrun: wait_for_inferior
>      infrun: infwait_normal_state
>      infrun: TARGET_WAITKIND_STOPPED
>      infrun: stop_pc = 0x401e4a
>      infrun: trap expected
>      infrun: stepping inside range [0x401e39-0x401e4e]
>      infrun: resume (step=1, signal=0)
>      infrun: prepare_to_wait
>      infrun: infwait_normal_state
>      infrun: TARGET_WAITKIND_STOPPED
>      infrun: stop_pc = 0x401e4f
>      infrun: stepped to a different function
>      infrun: stop_stepping
>      0x0000000000401e4f in x () at x.adb:13
>
>  As you can see, each "resume (step=1,...)" causes the inferior
>  to step *two* instructions instead of one. I looked at the code
>  and traced it, and we seem to be doing everything right: the
>  resume operation is turned into a call to "ptrace (PT_STEP, ...)"
>  with the right arguments. It's then followed by a call to "wait".
>  After the inferior stops, we find that we're two instructions later.
>
>  The behavior is actually relatively unpredictable. Sometimes, it
>  works fine.
>
>  I searched the internet a bit, and apparently this type of error
>  was reported a while ago. Unfortunately, I lost the link, but the
>  reports were saying that the problem they saw only occurred in a
>  very specific case, which is not the case here...
>
>  Has anyone seen this before? Any clue? Surprisingly, all our
>  x86_64-linux machines started showing these symptoms on the same
>  day.  All except one, which keeps working fine.

This must be a kernel bug of some sort.  Was the kernel on those machines
updated?

Are you perhaps running vmware on those machines?  My amd64 box at work
seems to do something similar from time to time when I have vmware running
(random test failures), but everything seems normal once I close it.

Anyway, it is almost certainly something we (GDB developers) can't do
anything about.
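
For what it's worth, the double step in the quoted trace can be confirmed
mechanically from the disassembly and the two stop_pc values.  Below is a
minimal Python sketch of that check.  The function name insns_per_step and
the INSN_ADDRS list are made up for illustration (they are not part of GDB),
the addresses are copied from the quoted listing, and the approach assumes
straight-line code, so that position in the listing matches execution order.

```python
# Instruction start addresses copied from the quoted disassembly of _ada_x.
INSN_ADDRS = [
    0x401e31, 0x401e32, 0x401e35, 0x401e39, 0x401e3c,
    0x401e41, 0x401e46, 0x401e4a, 0x401e4e, 0x401e4f,
]


def insns_per_step(start_pc, stop_pcs, insn_addrs=INSN_ADDRS):
    """Return how many instructions each single-step request advanced.

    Hypothetical helper: only meaningful for straight-line code, where
    the index of an address in the listing equals its execution order.
    """
    index = {addr: i for i, addr in enumerate(insn_addrs)}
    counts = []
    pc = start_pc
    for stop_pc in stop_pcs:
        # Distance in instructions between the previous and the new stop.
        counts.append(index[stop_pc] - index[pc])
        pc = stop_pc
    return counts


# Breakpoint address, followed by the two stop_pc values from the infrun log.
print(insns_per_step(0x401e41, [0x401e4a, 0x401e4f]))
```

A correctly behaving single-step would report 1 for every entry; any larger
value means the kernel returned control more than one instruction later.
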

Mark