From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-patches-return-68013-listarch-gdb-patches=sources.redhat.com@sourceware.org>
Received: (qmail 32288 invoked by alias); 25 Oct 2009 23:59:28 -0000
Received: (qmail 32274 invoked by uid 22791); 25 Oct 2009 23:59:27 -0000
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 	tests=AWL,BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from mail.codesourcery.com (HELO mail.codesourcery.com) (38.113.113.100)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sun, 25 Oct 2009 23:59:23 +0000
Received: (qmail 22952 invoked from network); 25 Oct 2009 23:59:21 -0000
Received: from unknown (HELO orlando) (pedro@127.0.0.2)   by mail.codesourcery.com with ESMTPA; 25 Oct 2009 23:59:21 -0000
To: Paul Pluzhnikov <ppluzhnikov@google.com>
Subject: Re: [patch] Fix for PR gdb/10757
Cc: gdb-patches@sourceware.org
Content-Disposition: inline
From: Pedro Alves <pedro@codesourcery.com>
Date: Sun, 25 Oct 2009 23:59:00 -0000
MIME-Version: 1.0
Content-Type: text/plain;   charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Message-Id: <200910252359.40752.pedro@codesourcery.com>
X-IsSubscribed: yes
Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <gdb-patches.sourceware.org>
List-Subscribe: <mailto:gdb-patches-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: gdb-patches-owner@sourceware.org
X-SW-Source: 2009-10/txt/msg00636.txt.bz2

On Sunday 18 October 2009 00:25:56, Paul Pluzhnikov wrote:
> 2009-10-17  Paul Pluzhnikov  <ppluzhnikov@google.com>
>=20
>        PR gdb/10757
>        * linux-thread-db.c (attach_thread): Return success/failure
>        indicator.
>        (thread_db_find_new_threads_silently): Retry until no new threads.
>        (struct callback_data): New.
>        (find_new_threads_callback): Count new threads, stop iteration
>        on error.
>        (find_new_threads_once): New function.
>        (thread_db_find_new_threads_2): Rename from
>        thread_db_find_new_threads_1 and adjust.
>        (thread_db_find_new_threads_1): New function.

Thanks.  This is quite unfortunate, but we get to live with it...


FTR, I've took a peek at nptl/nptl_db, to try to understand
a bit better why this happens (please do correct me if I'm
wrong):

 - First, it is worth remind us while nptl takes locks and barriers
   to protect its internal structures, thread_db/ntpl_db doesn't,
   since it runs in gdb's context.

 - nptl adds new threads to one of the __stack_user or stack_used
   lists.  New nodes are added to the _head_ of the lists.

 - nptl_db/td_ta_thr_iter.c:td_ta_thr_iter iterates over
   these thread lists from head to tail.

=46rom these last two points alone, one can infer that an iteration
over the thread list can miss new threads if they're added just while
the list is being iterated.


Another way to tackle this, would be iterate over=20
/proc/pid/task/ and attach to all lwps that way, instead of
relying on thread_db to do the initial iteration over all
threads.  We'd still iterate using the thread_db interface
once afterwards, so to learn the matching pthread
ids of the lwps.

With /proc/pid/task, you'd still need to keep iterating until
no new thread comes out, but, I think we could make the
number of loops unbounded much safely, since we wouldn't be
at risk of reading stale inferior data.  Assuming no kernel
bugs, that is.  Note that this is something that would be
a useful addition to gdb anyway, so that we could be able to
attach to multi-threaded apps that use raw clone instead of
pthreads.

> +      else
> +       /* Problem attaching this thread; perhaps it exited before we
> +          could attach it?
> +          This could mean that the thread list inside glibc itself is in
> +          inconsistent state, and libthread_db could go on looping forev=
er
> +          (observed with glibc-2.3.6).  To prevent that, terminate
> +          iteration: thread_db_find_new_threads_2 will retry.  */
> +       return 1;

Probably related to the fact that gdb can be iterating over
the threads lists, exactly while libc is concurrently removing
a thread from the same lists.  nptl_db could be iterating over
stale inferior pointers.


>  /* Search for new threads, accessing memory through stopped thread
> -   PTID.  */
> +   PTID.  If NUM_LOOPS is non-zero, repeat searching until NUM_LOOP
> +   searches in a row do not discover any new threads.  */
>=20=20
>  static void
> -thread_db_find_new_threads_1 (ptid_t ptid)
> +thread_db_find_new_threads_2 (ptid_t ptid, int num_loop)
>  {
(...)
> +
> +  if (num_loop !=3D 0)
> +    {
> +      /* Require NUM_LOOP successive iterations which do not find any ne=
w threads.  */

This line too long.

> +      for (i =3D 0, loop =3D 0; loop < 4; ++i, ++loop)

Err, s/4/num_loop/ ?  A comment explain _why_ we need this is missing
as well.  The patch looks OK otherwise.

Actually, I'd rename the `num_loop' parameter into
say, `until_no_new', and do this instead:

  if (until_no_new)
    {
      /* Require a few successive iterations which do not find any new thre=
ads.  */
      for (i =3D 0, loop =3D 0; loop < 4; ++i, ++loop)

That would be one way of localizing a bit the detail (and any comments)
of how many iterations you've happened to decide were sufficient in
practice, in one place, as opposed to spread around, like in:

> @@ -597,7 +598,7 @@ thread_db_find_new_threads_silently (pti
>=20=20
>    TRY_CATCH (except, RETURN_MASK_ERROR)
>      {
> -      thread_db_find_new_threads_1 (ptid);
> +      thread_db_find_new_threads_2 (ptid, 4);
>      }

... or any other call sites we may add in the future.

--=20
Pedro Alves