From: Simon Marchi
To: Pedro Alves
CC: Yao Qi
Subject: Re: Debugging return.exp on ARM
Date: Fri, 27 May 2016 13:35:00 -0000
Message-ID: <57484D29.4020800@ericsson.com>

On 16-05-26 03:11 PM, Pedro Alves wrote:

Thanks for the suggestions.

> - I'd suspect something odd with caches / barriers too.
> Did you try sprinkling in memory barrier instructions, and
> see whether it makes a difference?

I tried putting dmb instructions in various places; it didn't help.
> - I'd also try "si" + "info regs" instead of "next" after the return,
> and see if a register with a bad value pops up always at some
> specific instruction.

Good point. If I replace "next" with "si", only the "vmov.f64 d7, d0" instruction gets executed, so if everything goes well, both d0 and d7 should hold the "right" value. I made a more focused reproducer, see below.

> - I'd try to see if pinning the thread to a core makes a difference.

Indeed, pinning GDB to a single CPU makes it work (i.e. the result is correct) every time. As far as I can tell, the inferior's affinity makes no difference, although I am not sure my attempt worked: I used "set exec-wrapper taskset 0xffffffff" to reset the inferior's affinity.

> - Might help to show the kernel version.

ODroid:
Linux odroid 3.10.96+ #5 SMP PREEMPT Thu May 26 15:03:58 EDT 2016 armv7l armv7l armv7l GNU/Linux

Firefly:
Linux firefly 3.10.0 #40 SMP PREEMPT Tue Jan 27 16:12:04 CST 2015 armv7l armv7l armv7l GNU/Linux

I also reproduced it on my Raspberry Pi 2, which has:
Linux alarmpi 4.4.8-2-ARCH #1 SMP Tue Apr 26 19:14:58 MDT 2016 armv7l GNU/Linux

Here is another reproducer that triggers the problem without any memory read, which isolates it a bit more. It checks whether the thread sees our register write.

test.S:

.global _start

_start:
    vldr.64 d0, constante
    vldr.64 d1, constante

break_here:
    vcmp.f64 d0, d1
    vmrs APSR_nzcv, fpscr

# Exit code
    moveq r0, #1
    movne r0, #0

# Exit syscall
    mov r7, #1
    svc 0

.align 8
constante:
    .word 0xc8b43958
    .word 0x40594676

Built with:

$ gcc -g3 -O0 -o test test.S -nostdlib

And the GDB script, test.gdb:

file test
b break_here
run
p $d0 = 4.0
c

The test is run with:

$ ./gdb -nx -x test.gdb -batch

The test loads the same constant into d0 and d1, compares them, and exits with 1 (failure) if they are equal or 0 (success) if they differ. The GDB script breaks at "break_here", sets d0 to a different value (4.0), and lets the program continue and exit.
If our register write succeeded, the program exits with 0 (the values differ). If it was lost, the program exits with 1 (the values are still equal). I randomly get both outcomes, which hints that the race really is between the register write through ptrace and the kernel restoring the thread's VFP registers. Again, pinning GDB to a single core seems to hide the bug.

Simon