From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 10883 invoked by alias); 22 Apr 2013 17:03:36 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 10873 invoked by uid 89); 22 Apr 2013 17:03:36 -0000 X-Spam-SWARE-Status: No, score=-8.6 required=5.0 tests=AWL,BAYES_00,KHOP_THREADED,RCVD_IN_DNSWL_HI,RCVD_IN_HOSTKARMA_W,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS,TW_FC autolearn=ham version=3.3.1 Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP; Mon, 22 Apr 2013 17:03:35 +0000 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r3MH2chl011254 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 22 Apr 2013 13:02:38 -0400 Received: from [127.0.0.1] (ovpn01.gateway.prod.ext.ams2.redhat.com [10.39.146.11]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r3MH2aVa017742; Mon, 22 Apr 2013 13:02:36 -0400 Message-ID: <51756D2B.5050204@redhat.com> Date: Tue, 23 Apr 2013 00:05:00 -0000 From: Pedro Alves User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130311 Thunderbird/17.0.4 MIME-Version: 1.0 To: Pedro Alves CC: lgustavo@codesourcery.com, "'gdb-patches@sourceware.org'" Subject: Re: [PATCH, ppc] Fix hw *points for embedded ppc in a threaded environment References: <516EC58C.5060501@codesourcery.com> <51755821.8020907@codesourcery.com> <51755A5F.3060009@redhat.com> In-Reply-To: <51755A5F.3060009@redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-SW-Source: 2013-04/txt/msg00681.txt.bz2 On 04/22/2013 04:42 PM, Pedro Alves wrote: > On 04/22/2013 04:32 PM, Luis Machado wrote: > >> For the intermitent x86 failure, raising the usleep time a bit (1 to 5) seems to solve it. > >> I blame this on something other than GDB then. > > Blaming the test itself is plausible. However, the extra usleep time just > means the race window is smaller, not fixed, so still subject to the > occasional racy failure. I hacked the test and GDB with the patch below, mainly to zone in on the issue with a "smaller" test (less threads, less iterations, etc.) and here's what I see (the FAIL is almost always reproducible to me this way). In the times the test fails, there's one watchpoint that is missed. ptrace does report the memory as having been touched, but then GDB compares the watchpoint's old/new values and decides nothing changed, so re-resumes the target immediately. The patch adds the fprintf_unfiltered (gdb_stdlog, "breakpoint: nothing changed\n"); line to breakpoint.c so we see that breakpoint logic triggering in the gdb.log. In the times the test passes completely, we don't see that line in the log. Here's the relevant snippet of the gdb.log showing that watchpoint missed case: Thread 1 changing watch_thread[1] data from 6 -> 7 sigchld LLW: enter LNW: waitpid(-1, ...) returned 21376, ERRNO-OK LLW: waitpid 21376 received Trace/breakpoint trap (stopped) LLTA: KILL(SIG0) Thread 0x7ffff77ca700 (LWP 21376) (OK) LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff77ca700 (LWP 21376). SC: kill Thread 0x7ffff7fcb700 (LWP 21375) **** SC: lwp kill 0 ERRNO-OK SC: kill Thread 0x7ffff7fcc740 (LWP 21362) **** SC: lwp kill 0 ERRNO-OK WL: waitpid Thread 0x7ffff7fcb700 (LWP 21375) received Trace/breakpoint trap (stopped) SWC: Pending event Trace/breakpoint trap (stopped) in Thread 0x7ffff7fcb700 (LWP 21375) WL: waitpid Thread 0x7ffff7fcc740 (LWP 21362) received Stopped (signal) (stopped) SWC: Delayed SIGSTOP caught for Thread 0x7ffff7fcc740 (LWP 21362). SEL: Found 2 SIGTRAP events, selecting #1 LLW: trap ptid is Thread 0x7ffff7fcb700 (LWP 21375). LLW: exit sigchld infrun: target_wait (-1, status) = infrun: 21362 [Thread 0x7ffff7fcb700 (LWP 21375)], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x3d25a0f680 infrun: context switch infrun: Switching context from Thread 0x7ffff77ca700 (LWP 21376) to Thread 0x7ffff7fcb700 (LWP 21375) infrun: BPSTAT_WHAT_SINGLE infrun: no stepping, continue infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x7ffff7fcb700 (LWP 21375)] at 0x3d25a0f680 LLR: Preparing to step Thread 0x7ffff7fcb700 (LWP 21375), 0, inferior_ptid Thread 0x7ffff7fcb700 (LWP 21375) LLR: PTRACE_SINGLESTEP process 21375, 0 (resume event thread) sigchld infrun: prepare_to_wait linux_nat_wait: [process -1], [] LLW: enter LNW: waitpid(-1, ...) returned 21375, ERRNO-OK LLW: waitpid 21375 received Stopped (signal) (stopped) LLTA: KILL(SIG0) Thread 0x7ffff7fcb700 (LWP 21375) (OK) LLW: Delayed SIGSTOP caught for Thread 0x7ffff7fcb700 (LWP 21375). LLW: PTRACE_SINGLESTEP Thread 0x7ffff7fcb700 (LWP 21375), 0, 0 (discard SIGSTOP) LNW: waitpid(-1, ...) returned 21375, ERRNO-OK LLW: waitpid 21375 received Trace/breakpoint trap (stopped) LLTA: KILL(SIG0) Thread 0x7ffff7fcb700 (LWP 21375) (OK) LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff7fcb700 (LWP 21375). SEL: Select single-step Thread 0x7ffff7fcb700 (LWP 21375) LLW: trap ptid is Thread 0x7ffff7fcb700 (LWP 21375). LLW: exit sigchld infrun: target_wait (-1, status) = infrun: 21362 [Thread 0x7ffff7fcb700 (LWP 21375)], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x3d25a068d2 infrun: no stepping, continue infrun: resume (step=0, signal=0), trap_expected=0, current thread [Thread 0x7ffff7fcb700 (LWP 21375)] at 0x3d25a068d2 LLR: Preparing to resume process 21362, 0, inferior_ptid Thread 0x7ffff7fcb700 (LWP 21375) RC: Not resuming sibling Thread 0x7ffff77ca700 (LWP 21376) (has pending) RC: Not resuming sibling Thread 0x7ffff7fcb700 (LWP 21375) (not stopped) RC: Resuming sibling Thread 0x7ffff7fcc740 (LWP 21362), 0, resume LLR: PTRACE_CONT process 21375, 0 (resume event thread) sigchld infrun: prepare_to_wait linux_nat_wait: [process -1], [] LLW: enter LLW: Using pending wait status Trace/breakpoint trap (stopped) for Thread 0x7ffff77ca700 (LWP 21376). LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff77ca700 (LWP 21376). SC: kill Thread 0x7ffff7fcb700 (LWP 21375) **** SC: lwp kill 0 ERRNO-OK SC: kill Thread 0x7ffff7fcc740 (LWP 21362) **** SC: lwp kill 0 ERRNO-OK WL: waitpid Thread 0x7ffff7fcb700 (LWP 21375) received Trace/breakpoint trap (stopped) SWC: Pending event Trace/breakpoint trap (stopped) in Thread 0x7ffff7fcb700 (LWP 21375) WL: waitpid Thread 0x7ffff7fcc740 (LWP 21362) received Stopped (signal) (stopped) SWC: Delayed SIGSTOP caught for Thread 0x7ffff7fcc740 (LWP 21362). SEL: Found 2 SIGTRAP events, selecting #1 LLW: trap ptid is Thread 0x7ffff7fcb700 (LWP 21375). LLW: exit sigchld infrun: target_wait (-1, status) = infrun: 21362 [Thread 0x7ffff7fcb700 (LWP 21375)], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x3d25a0f680 infrun: BPSTAT_WHAT_SINGLE infrun: no stepping, continue infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 0x7ffff7fcb700 (LWP 21375)] at 0x3d25a0f680 LLR: Preparing to step Thread 0x7ffff7fcb700 (LWP 21375), 0, inferior_ptid Thread 0x7ffff7fcb700 (LWP 21375) LLR: PTRACE_SINGLESTEP process 21375, 0 (resume event thread) sigchld infrun: prepare_to_wait linux_nat_wait: [process -1], [] LLW: enter LNW: waitpid(-1, ...) returned 21375, ERRNO-OK LLW: waitpid 21375 received Stopped (signal) (stopped) LLTA: KILL(SIG0) Thread 0x7ffff7fcb700 (LWP 21375) (OK) LLW: Delayed SIGSTOP caught for Thread 0x7ffff7fcb700 (LWP 21375). LLW: PTRACE_SINGLESTEP Thread 0x7ffff7fcb700 (LWP 21375), 0, 0 (discard SIGSTOP) LNW: waitpid(-1, ...) returned 21375, ERRNO-OK LLW: waitpid 21375 received Trace/breakpoint trap (stopped) LLTA: KILL(SIG0) Thread 0x7ffff7fcb700 (LWP 21375) (OK) LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff7fcb700 (LWP 21375). SEL: Select single-step Thread 0x7ffff7fcb700 (LWP 21375) LLW: trap ptid is Thread 0x7ffff7fcb700 (LWP 21375). LLW: exit sigchld infrun: target_wait (-1, status) = infrun: 21362 [Thread 0x7ffff7fcb700 (LWP 21375)], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x3d25a130a5 infrun: no stepping, continue infrun: resume (step=0, signal=0), trap_expected=0, current thread [Thread 0x7ffff7fcb700 (LWP 21375)] at 0x3d25a130a5 LLR: Preparing to resume process 21362, 0, inferior_ptid Thread 0x7ffff7fcb700 (LWP 21375) RC: Not resuming sibling Thread 0x7ffff77ca700 (LWP 21376) (has pending) RC: Not resuming sibling Thread 0x7ffff7fcb700 (LWP 21375) (not stopped) RC: Resuming sibling Thread 0x7ffff7fcc740 (LWP 21362), 0, resume LLR: PTRACE_CONT process 21375, 0 (resume event thread) sigchld infrun: prepare_to_wait linux_nat_wait: [process -1], [] LLW: enter LLW: Using pending wait status Trace/breakpoint trap (stopped) for Thread 0x7ffff77ca700 (LWP 21376). LLW: Candidate event Trace/breakpoint trap (stopped) in Thread 0x7ffff77ca700 (LWP 21376). SC: kill Thread 0x7ffff7fcb700 (LWP 21375) **** SC: lwp kill 0 ERRNO-OK SC: kill Thread 0x7ffff7fcc740 (LWP 21362) **** SC: lwp kill 0 ERRNO-OK WL: waitpid Thread 0x7ffff7fcb700 (LWP 21375) received Trace/breakpoint trap (stopped) SWC: Pending event Trace/breakpoint trap (stopped) in Thread 0x7ffff7fcb700 (LWP 21375) WL: waitpid Thread 0x7ffff7fcc740 (LWP 21362) received Stopped (signal) (stopped) SWC: Delayed SIGSTOP caught for Thread 0x7ffff7fcc740 (LWP 21362). SEL: Found 2 SIGTRAP events, selecting #0 CB: Push back breakpoint for Thread 0x7ffff7fcb700 (LWP 21375) LLW: trap ptid is Thread 0x7ffff77ca700 (LWP 21376). LLW: exit sigchld infrun: target_wait (-1, status) = infrun: 21362 [Thread 0x7ffff77ca700 (LWP 21376)], infrun: status->kind = stopped, signal = SIGTRAP infrun: infwait_normal_state infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x400aa0 infrun: stopped by watchpoint infrun: stopped data address = 0x6012e4 infrun: context switch infrun: Switching context from Thread 0x7ffff7fcb700 (LWP 21375) to Thread 0x7ffff77ca700 (LWP 21376) breakpoint: nothing changed I suspect the reason this doesn't trigger on ppc may be related to the fact that x86 watchpoints are continuable while ppc reports the watchpoint hit before the memory changes (thus triggering a different sequence of events, as gdb single-steps the thread that triggers the watchpoint to get the new value). I haven't really examined the log to try to figure out why is really going wrong. Seems plausible to me that GDB could be at fault. BTW, the test has lots of weird indenting. I re-indented it automatically with emacs, so the below is a "git diff -w" patch. BTW, what's NR_BYTES supposed to be? See patch. diff --git c/gdb/breakpoint.c w/gdb/breakpoint.c index 2757c6b..d33842a 100644 --- c/gdb/breakpoint.c +++ w/gdb/breakpoint.c @@ -4807,6 +4807,7 @@ watchpoint_check (void *p) } else { + fprintf_unfiltered (gdb_stdlog, "breakpoint: nothing changed\n"); /* Nothing changed. */ value_free_to_mark (mark); return WP_VALUE_NOT_CHANGED; diff --git c/gdb/testsuite/gdb.threads/wp-replication.c w/gdb/testsuite/gdb.threads/wp-replication.c index 4d79d56..8f0c5d2 100644 --- c/gdb/testsuite/gdb.threads/wp-replication.c +++ w/gdb/testsuite/gdb.threads/wp-replication.c @@ -44,7 +44,7 @@ int test_ready = 0; int watch_count_done = 0; /* Array with elements we can create watchpoints for. */ -static int watched_data[NR_BYTES]; +static int watched_data[NR_TRIGGERS_PER_THREAD * NR_THREADS]; pthread_mutex_t data_mutex; /* Wait function to keep threads busy while the testcase does @@ -121,11 +121,11 @@ thread_function (void *arg) while (!test_ready) usleep (1); + pthread_mutex_lock (&data_mutex); for (i = 0; i < NR_TRIGGERS_PER_THREAD; ++i) { for (j = 0; j < NR_THREADS; j++) { - pthread_mutex_lock (&data_mutex); /* For debugging. */ printf ("Thread %ld changing watch_thread[%d] data" " from %d -> %d\n", thread_number, j, @@ -134,11 +134,11 @@ thread_function (void *arg) the pc is still on the same line. */ /* Increment the watched data field. */ ++watched_data[j]; - usleep (1); + usleep (5); - pthread_mutex_unlock (&data_mutex); } } + pthread_mutex_unlock (&data_mutex); pthread_exit (NULL); } diff --git c/gdb/testsuite/gdb.threads/wp-replication.exp w/gdb/testsuite/gdb.threads/wp-replication.exp index 449423f..937c9e6 100644 --- c/gdb/testsuite/gdb.threads/wp-replication.exp +++ w/gdb/testsuite/gdb.threads/wp-replication.exp @@ -19,8 +19,8 @@ # threads when the hardware watchpoint is created. -set NR_THREADS 4 -set NR_TRIGGERS_PER_THREAD 10 +set NR_THREADS 2 +set NR_TRIGGERS_PER_THREAD 4 set NR_BYTES 100 # This test verifies that a hardware watchpoint gets replicated to @@ -105,6 +105,9 @@ gdb_test "break thread_started" \ "Breakpoint \[0-9\]+ at .*: file .*${srcfile}, line .*" \ "Breakpoint on thread_started" +verbose -log "hwatch_count = $hwatch_count" +verbose -log "NR_THREADS = $NR_THREADS" + # Move all threads to where they're supposed to be for testing. for { set i 0 } { $i < $NR_THREADS } { incr i} { @@ -117,6 +120,8 @@ for { set i 0 } { $i < $NR_THREADS } { incr i} { gdb_test "watch watched_data\[$i\]" \ "Hardware watchpoint .*" \ "watch watched_data\[$i\]" + } else { + verbose -log "not setting watchpoint for watched_data\[$i\]\n" } gdb_test continue "Continuing\\..*Breakpoint \[0-9\]+, thread_started \\(\\) at .*$srcfile.*" \ @@ -131,6 +136,9 @@ gdb_test_no_output "set var test_ready=1" \ # Set the number of expected watchpoint triggers. set TRIGGERS [expr "$NR_THREADS * $NR_THREADS * $NR_TRIGGERS_PER_THREAD"] +gdb_test_no_output "set debug infrun 1" +gdb_test_no_output "set debug lin-lwp 1" + # Move the threads and hit the watchpoints # TRIGGERS times. for { set i 0 } { $i < $TRIGGERS } { incr i} {