From mboxrd@z Thu Jan  1 00:00:00 1970
From: Pedro Alves <pedro@codesourcery.com>
To: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
Subject: Re: ttrace: Protocal error
Date: Fri, 08 Aug 2008 20:02:00 -0000
User-Agent: KMail/1.9.9
Cc: gdb-patches@sourceware.org
References: <20080808183307.12E604EBE@hiauly1.hia.nrc.ca>
In-Reply-To: <20080808183307.12E604EBE@hiauly1.hia.nrc.ca>
MIME-Version: 1.0
Content-Type: text/plain;   charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Message-Id: <200808082102.11828.pedro@codesourcery.com>

You didn't mention it, but I assume this also happens without my patch.

Note, I know nothing about ttrace and HP-UX.

On Friday 08 August 2008 19:33:06, John David Anglin wrote:
> While were on the subject of threads, it seems we are still not in
> a position to debug the vla6.f90 failure:

What is this test doing differently?

> Breakpoint 1, perror_with_name (string=0x0) at ../../src/gdb/utils.c:847
> 847       err = safe_strerror (errno);
> (gdb) bt
> #0  perror_with_name (string=0x0) at ../../src/gdb/utils.c:847
> #1  0x000c9b08 in inf_ttrace_resume_callback (info=0x2319b0,
> arg=0x7b019048) at ../../src/gdb/inf-ttrace.c:813
> #2  0x0008b640 in iterate_over_threads (
>     callback=@0x4001a70a: 0xc9a28 <inf_ttrace_resume_callback>, data=0x0)
>     at ../../src/gdb/thread.c:338
> #3  0x000c9960 in inf_ttrace_resume (ptid=
>     {pid = 1953788513, lwp = 1667563520, tid = 774778670}, step=1073949720,
>     signal=TARGET_SIGNAL_0) at ../../src/gdb/inf-ttrace.c:847
> #4  0x000a3390 in target_resume (ptid=
>     {pid = 1953788513, lwp = 1667563520, tid = 774778670}, step=0,
>     signal=TARGET_SIGNAL_0) at ../../src/gdb/target.c:1789

             ^^^^^^^^^^        ^^^^^^^^^^        ^^^^^^^^^

I assume this ptid is GDB getting bogus info, right?
To get to inf_ttrace_resume_callback at all, the ptid has
to be (-1,0,0).
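
To spell out why the printed ptid cannot be right, here is a minimal standalone sketch of the check involved. The struct layout and the value of minus_one_ptid are written from memory and should be treated as assumptions, and resumes_all_threads is a hypothetical helper that does not exist in GDB; it only mirrors the condition guarding the iterate_over_threads calls in inf_ttrace_resume.

```c
#include <assert.h>

/* Simplified stand-in for GDB's ptid; the field names follow the
   backtrace (pid, lwp, tid).  The layout is an assumption.  */
struct ptid
{
  int pid;
  long lwp;
  long tid;
};

/* Assumed value of minus_one_ptid, the "all threads" wildcard.  */
static const struct ptid minus_one_ptid = { -1, 0, 0 };

/* Return nonzero when both ptids name the same thread.  */
static int
ptid_equal (struct ptid a, struct ptid b)
{
  return a.pid == b.pid && a.lwp == b.lwp && a.tid == b.tid;
}

/* Hypothetical helper: nonzero when a resume request would take the
   branch that iterates over all threads.  */
static int
resumes_all_threads (struct ptid ptid)
{
  return ptid_equal (ptid, minus_one_ptid);
}
```

With the values shown in frames #3 and #4, resumes_all_threads would return 0, so the callback in frame #1 could not have been reached if those values were real.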

From your log:
> [New process 20069]
> [New process 20069, lwp 7087826]

> [process 20069, lwp 7087826 exited]

This should be setting the dying flag on the thread, but
it is still listed in gdb's thread table.

   case TTEVT_LWP_EXIT:
      if (print_thread_events)
	printf_unfiltered (_("[%s exited]\n"), target_pid_to_str (ptid));
      ti = find_thread_pid (ptid);
      gdb_assert (ti != NULL);
      ((struct inf_ttrace_private_thread_info *)ti->private)->dying = 1;
      inf_ttrace_num_lwps--;
      ttrace (TT_LWP_CONTINUE, ptid_get_pid (ptid),
              ptid_get_lwp (ptid), TT_NOPC, 0, 0);
      /* If we don't return -1 here, core GDB will re-add the thread.  */
      ptid = minus_one_ptid;
      break;


inf_ttrace_resume:

  if (ptid_equal (ptid, minus_one_ptid))
    {
      /* Let all the other threads run too.  */
      iterate_over_threads (inf_ttrace_resume_callback, NULL);
      iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
    }

Is this the first resume after that "exit" notification?
Any chance we're trying to resume a dead thread here then?

What happens when you delete the dying threads before resuming?

      iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
      iterate_over_threads (inf_ttrace_resume_callback, NULL);
      iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
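
To make the ordering question concrete, here is a toy model of it, not GDB code. Every name below (toy_thread, toy_resume, toy_delete_dying, toy_iterate, toy_resume_all) is hypothetical, and a "bad resume" simply stands in for a ttrace call aimed at an LWP that has already exited. Whether the real ttrace call fails in that situation is exactly what I don't know.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature thread table entry; not GDB's thread_info.  */
struct toy_thread
{
  long lwp;
  int dying;    /* Set when the LWP exit event was seen.  */
  int deleted;  /* Set once the entry is pruned from the table.  */
};

/* Number of resume requests aimed at LWPs that have already exited.  */
static int bad_resumes;

/* Model of the resume callback: skips pruned entries, and counts a
   bad resume for any entry still flagged as dying.  */
static void
toy_resume (struct toy_thread *t)
{
  if (t->deleted)
    return;
  if (t->dying)
    bad_resumes++;
}

/* Model of the delete-dying-threads callback.  */
static void
toy_delete_dying (struct toy_thread *t)
{
  if (t->dying)
    t->deleted = 1;
}

/* Apply CB to each of the N entries in TABLE.  */
static void
toy_iterate (struct toy_thread *table, size_t n,
             void (*cb) (struct toy_thread *))
{
  size_t i;

  assert (table != NULL);
  for (i = 0; i < n; i++)
    cb (&table[i]);
}

/* Resume everything, optionally pruning dying threads first.
   Returns the number of resumes aimed at an exited LWP.  */
static int
toy_resume_all (struct toy_thread *table, size_t n, int prune_first)
{
  bad_resumes = 0;
  if (prune_first)
    toy_iterate (table, n, toy_delete_dying);
  toy_iterate (table, n, toy_resume);
  toy_iterate (table, n, toy_delete_dying);
  return bad_resumes;
}
```

In this model, a table holding one live thread and one dying thread yields one bad resume with the current ordering and none when the dying threads are pruned first.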

Hmmm, I assume this is not the first resume: if my sources match yours,
your program is stopped at a syscall event:

      /* Be careful not to try to gather much state about a thread
         that's in a syscall.  It's frequently a losing proposition.  */
    case TARGET_WAITKIND_SYSCALL_ENTRY:
      if (debug_infrun)
        fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_SYSCALL_ENTRY\n");
      resume (0, TARGET_SIGNAL_0);
      prepare_to_wait (ecs);
      return;

So, there should have already been a resume in between.

Could you check which thread got the syscall event?  Is it the same
thread we fail to resume?  Is it possible to disable syscall events,
just to check whether it is related?

-- 
Pedro Alves