From: Pedro Alves
Date: Tue, 23 Sep 2014 09:58:00 -0000
To: Yao Qi
CC: gdb-patches@sourceware.org
Subject: Re: [PATCH] Honour SIGILL and SIGSEGV in cancel breakpoint
Message-ID: <5421444A.4040400@redhat.com>
In-Reply-To: <871tr2ybg0.fsf@codesourcery.com>
References: <1410696393-29327-1-git-send-email-yao@codesourcery.com> <54182945.7090300@redhat.com> <87mw9xzmlr.fsf@codesourcery.com> <541C6208.3080805@redhat.com> <871tr2ybg0.fsf@codesourcery.com>

On 09/23/2014 09:42 AM, Yao Qi
wrote:
> Pedro Alves writes:
>
>>> count_events_callback and select_event_lwp_callback in GDBserver need to
>>> honour SIGILL and SIGSEGV too.  I wrote a patch to call
>>> lp_status_is_sigtrap_like_event in them, but the regression test result
>>> isn't changed, which is a surprise to me.  I thought some failures
>>> should be fixed.  I'll look into it more deeply.
>>
>> Maybe you're getting lucky with scheduling.
>> pthreads.exp and schedlock.exp are, I think, the most sensitive to this.
>
> I ran them ten times; the results didn't change.
>
>> See:
>> https://www.sourceware.org/ml/gdb-patches/2001-06/msg00250.html
>
> Random selection of the event lwp was added in the URL you gave above,
> to prevent starvation of threads.  However, in my configuration
> (arm-linux with SIGILL), event lwp selection does nothing, yet no test
> failures result.  GDBserver processes events like this:
>
> 1. GDBserver gets a breakpoint event from waitpid (-1, ).
> 2. GDBserver calls stop_all_lwps, in which wait_for_sigstop drains
>    all pending reports from the kernel.
> 3. GDBserver selects one lwp and cancels the breakpoints on the rest.
>    If event lwp selection does nothing, the selected lwp is the one
>    GDBserver got in step 1.
> 4. GDBserver steps over the breakpoint and resumes all the threads.
>    Go back to step 1 and wait until any thread hits a breakpoint.
>
> As we can see, if waitpid (-1, ) (in step 1) returns the event lwp
> randomly, we don't have to randomly select the event lwp again in
> step 3.  IMO, it is naturally random which thread hits the breakpoint
> first in a multi-threaded program.

That depends on scheduling.  When the program is resumed, the thread
that had last hit the breakpoint may be scheduled before the other
threads get a chance to run and hit a breakpoint themselves.

> That is the reason why no test failures are caused without event lwp
> selection in my experiments.  IOW, on a platform where waitpid (-1, )
> returns the event lwp randomly, we don't need such random lwp
> selection at all.
> However, if the kernel's waitpid implementation always iterates over
> the list of children in a fixed order, it is possible that only the
> event of the lwp at the front of the list is reported, and the rest of
> the lwps may be starved.  In that case, we still have to rely on random
> selection inside GDB/GDBserver to avoid starvation.

I'm looking at kernel/exit.c in the Linux kernel sources I have handy
(14186fea0cb06bc43181ce239efe0df6f1af260a), specifically at do_wait() /
do_wait_thread() / ptrace_do_wait(), and it seems to me that waitpid
always walks the task list in the same order:

	set_current_state(TASK_INTERRUPTIBLE);
	read_lock(&tasklist_lock);
	tsk = current;
	do {
		retval = do_wait_thread(wo, tsk);
		if (retval)
			goto end;

		retval = ptrace_do_wait(wo, tsk);
		if (retval)
			goto end;

		if (wo->wo_flags & __WNOTHREAD)
			break;
	} while_each_thread(current, tsk);
	read_unlock(&tasklist_lock);

So it seems it's still as Michael said back then: "If more than one LWP
is currently stopped at a breakpoint, the highest-numbered one will be
returned.", and it's likely you're getting lucky with scheduling.
E.g., multi-core vs. single-core, or the scheduling algorithms in the
kernel have improved and are masking the issue.  Or simply the tests
don't really exercise the starvation issue properly.

Anyway,

> The patch below is updated to call lp_status_maybe_breakpoint in both
> breakpoint cancellation and event lwp selection.

This patch is OK.

Thanks,
Pedro Alves