From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 32288 invoked by alias); 25 Oct 2009 23:59:28 -0000 Received: (qmail 32274 invoked by uid 22791); 25 Oct 2009 23:59:27 -0000 X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00,SPF_PASS X-Spam-Check-By: sourceware.org Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 25 Oct 2009 23:59:23 +0000 Received: (qmail 22952 invoked from network); 25 Oct 2009 23:59:21 -0000 Received: from unknown (HELO orlando) (pedro@127.0.0.2) by mail.codesourcery.com with ESMTPA; 25 Oct 2009 23:59:21 -0000 To: Paul Pluzhnikov Subject: Re: [patch] Fix for PR gdb/10757 Cc: gdb-patches@sourceware.org Content-Disposition: inline From: Pedro Alves Date: Sun, 25 Oct 2009 23:59:00 -0000 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Message-Id: <200910252359.40752.pedro@codesourcery.com> X-IsSubscribed: yes Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org X-SW-Source: 2009-10/txt/msg00636.txt.bz2 On Sunday 18 October 2009 00:25:56, Paul Pluzhnikov wrote: > 2009-10-17 Paul Pluzhnikov >=20 > PR gdb/10757 > * linux-thread-db.c (attach_thread): Return success/failure > indicator. > (thread_db_find_new_threads_silently): Retry until no new threads. > (struct callback_data): New. > (find_new_threads_callback): Count new threads, stop iteration > on error. > (find_new_threads_once): New function. > (thread_db_find_new_threads_2): Rename from > thread_db_find_new_threads_1 and adjust. > (thread_db_find_new_threads_1): New function. Thanks. This is quite unfortunate, but we get to live with it... FTR, I've took a peek at nptl/nptl_db, to try to understand a bit better why this happens (please do correct me if I'm wrong): - First, it is worth remind us while nptl takes locks and barriers to protect its internal structures, thread_db/ntpl_db doesn't, since it runs in gdb's context. - nptl adds new threads to one of the __stack_user or stack_used lists. New nodes are added to the _head_ of the lists. - nptl_db/td_ta_thr_iter.c:td_ta_thr_iter iterates over these thread lists from head to tail. =46rom these last two points alone, one can infer that an iteration over the thread list can miss new threads if they're added just while the list is being iterated. Another way to tackle this, would be iterate over=20 /proc/pid/task/ and attach to all lwps that way, instead of relying on thread_db to do the initial iteration over all threads. We'd still iterate using the thread_db interface once afterwards, so to learn the matching pthread ids of the lwps. With /proc/pid/task, you'd still need to keep iterating until no new thread comes out, but, I think we could make the number of loops unbounded much safely, since we wouldn't be at risk of reading stale inferior data. Assuming no kernel bugs, that is. Note that this is something that would be a useful addition to gdb anyway, so that we could be able to attach to multi-threaded apps that use raw clone instead of pthreads. > + else > + /* Problem attaching this thread; perhaps it exited before we > + could attach it? > + This could mean that the thread list inside glibc itself is in > + inconsistent state, and libthread_db could go on looping forev= er > + (observed with glibc-2.3.6). To prevent that, terminate > + iteration: thread_db_find_new_threads_2 will retry. */ > + return 1; Probably related to the fact that gdb can be iterating over the threads lists, exactly while libc is concurrently removing a thread from the same lists. nptl_db could be iterating over stale inferior pointers. > /* Search for new threads, accessing memory through stopped thread > - PTID. */ > + PTID. If NUM_LOOPS is non-zero, repeat searching until NUM_LOOP > + searches in a row do not discover any new threads. */ >=20=20 > static void > -thread_db_find_new_threads_1 (ptid_t ptid) > +thread_db_find_new_threads_2 (ptid_t ptid, int num_loop) > { (...) > + > + if (num_loop !=3D 0) > + { > + /* Require NUM_LOOP successive iterations which do not find any ne= w threads. */ This line too long. > + for (i =3D 0, loop =3D 0; loop < 4; ++i, ++loop) Err, s/4/num_loop/ ? A comment explain _why_ we need this is missing as well. The patch looks OK otherwise. Actually, I'd rename the `num_loop' parameter into say, `until_no_new', and do this instead: if (until_no_new) { /* Require a few successive iterations which do not find any new thre= ads. */ for (i =3D 0, loop =3D 0; loop < 4; ++i, ++loop) That would be one way of localizing a bit the detail (and any comments) of how many iterations you've happened to decide were sufficient in practice, in one place, as opposed to spread around, like in: > @@ -597,7 +598,7 @@ thread_db_find_new_threads_silently (pti >=20=20 > TRY_CATCH (except, RETURN_MASK_ERROR) > { > - thread_db_find_new_threads_1 (ptid); > + thread_db_find_new_threads_2 (ptid, 4); > } ... or any other call sites we may add in the future. --=20 Pedro Alves