From: Pedro Alves
To: "John David Anglin"
Cc: gdb-patches@sourceware.org
Subject: Re: ttrace: Protocal error
Date: Fri, 08 Aug 2008 20:02:00 -0000
Message-Id: <200808082102.11828.pedro@codesourcery.com>
References: <20080808183307.12E604EBE@hiauly1.hia.nrc.ca>
In-Reply-To: <20080808183307.12E604EBE@hiauly1.hia.nrc.ca>

You didn't mention it, but I assume this also happens without my patch.
Note that I know nothing about ttrace or HP-UX.

On Friday 08 August 2008 19:33:06, John David Anglin wrote:
> While were on the subject of threads, it seems we are still not in
> a position to debug the vla6.f90 failure:

What is this test doing differently?

> Breakpoint 1, perror_with_name (string=0x0) at ../../src/gdb/utils.c:847
> 847       err = safe_strerror (errno);
> (gdb) bt
> #0  perror_with_name (string=0x0) at ../../src/gdb/utils.c:847
> #1  0x000c9b08 in inf_ttrace_resume_callback (info=0x2319b0,
>     arg=0x7b019048) at ../../src/gdb/inf-ttrace.c:813
> #2  0x0008b640 in iterate_over_threads (
>     callback=@0x4001a70a: 0xc9a28 , data=0x0)
>     at ../../src/gdb/thread.c:338
> #3  0x000c9960 in inf_ttrace_resume (ptid=
>       {pid = 1953788513, lwp = 1667563520, tid = 774778670}, step=1073949720,
>     signal=TARGET_SIGNAL_0) at ../../src/gdb/inf-ttrace.c:847
> #4  0x000a3390 in target_resume (ptid=
>       {pid = 1953788513, lwp = 1667563520, tid = 774778670}, step=0,
>     signal=TARGET_SIGNAL_0) at ../../src/gdb/target.c:1789
              ^^^^^^^^^^        ^^^^^^^^^^        ^^^^^^^^^

I assume this ptid is GDB printing bogus info, right?  To get to
inf_ttrace_resume_callback, it has to be (-1,0,0).

From your log:

> [New process 20069]
> [New process 20069, lwp 7087826]
> [process 20069, lwp 7087826 exited]

This should be setting the dying flag on the thread, but the thread is
still listed in gdb's thread table.

    case TTEVT_LWP_EXIT:
      if (print_thread_events)
        printf_unfiltered (_("[%s exited]\n"), target_pid_to_str (ptid));
      ti = find_thread_pid (ptid);
      gdb_assert (ti != NULL);
      ((struct inf_ttrace_private_thread_info *)ti->private)->dying = 1;
      inf_ttrace_num_lwps--;
      ttrace (TT_LWP_CONTINUE, ptid_get_pid (ptid), ptid_get_lwp (ptid),
              TT_NOPC, 0, 0);
      /* If we don't return -1 here, core GDB will re-add the thread.  */
      ptid = minus_one_ptid;
      break;

inf_ttrace_resume:

  if (ptid_equal (ptid, minus_one_ptid))
    {
      /* Let all the other threads run too.  */
      iterate_over_threads (inf_ttrace_resume_callback, NULL);
      iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
    }

Is this the first resume after that "exit" notification?  Any chance
we're trying to resume a dead thread here, then?
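
To make that suspicion concrete, here is a toy model of the ordering
problem.  It is a sketch only, not GDB code: every name in it
(toy_thread, toy_table, toy_continue and so on) is made up for
illustration, and it simply assumes that a continue request on an LWP
that has already exited fails, which I have not verified on HP-UX.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_THREADS 8

/* Hypothetical stand-in for one entry in a debugger's thread table.  */
struct toy_thread
{
  int lwp;    /* Thread id.  */
  int dying;  /* Nonzero once an exit event was reported for it.  */
};

struct toy_table
{
  struct toy_thread threads[TOY_MAX_THREADS];
  size_t count;
};

/* Append a live thread to the table.  */
static void
toy_add (struct toy_table *tab, int lwp)
{
  assert (tab->count < TOY_MAX_THREADS);
  tab->threads[tab->count].lwp = lwp;
  tab->threads[tab->count].dying = 0;
  tab->count++;
}

/* Flag LWP as dying but leave it listed.  Returns 1 if it was found.  */
static int
toy_mark_dying (struct toy_table *tab, int lwp)
{
  for (size_t i = 0; i < tab->count; i++)
    if (tab->threads[i].lwp == lwp)
      {
        tab->threads[i].dying = 1;
        return 1;
      }
  return 0;
}

/* Assumed behaviour: continuing an already exited LWP fails with -1.  */
static int
toy_continue (const struct toy_thread *t)
{
  return t->dying ? -1 : 0;
}

/* Continue every listed thread.  Returns the number of failed requests.  */
static size_t
toy_resume_all (const struct toy_table *tab)
{
  size_t failures = 0;

  for (size_t i = 0; i < tab->count; i++)
    if (toy_continue (&tab->threads[i]) != 0)
      failures++;
  return failures;
}

/* Drop every entry flagged as dying.  Returns how many were removed.  */
static size_t
toy_delete_dying (struct toy_table *tab)
{
  size_t kept = 0;

  for (size_t i = 0; i < tab->count; i++)
    if (!tab->threads[i].dying)
      tab->threads[kept++] = tab->threads[i];

  size_t removed = tab->count - kept;
  tab->count = kept;
  return removed;
}
```

In this model, calling toy_resume_all while a dying entry is still
listed reports one failure, and calling toy_delete_dying first makes
the same resume pass cleanly.  Whether the real ttrace request behaves
the same way is exactly the open question.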

What happens when you delete the dying threads before resuming?

      iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);
      iterate_over_threads (inf_ttrace_resume_callback, NULL);
      iterate_over_threads (inf_ttrace_delete_dying_threads_callback, NULL);

Hmm, I assume it is not the first resume: if my sources match yours,
the program is stopped at a syscall event:

      /* Be careful not to try to gather much state about a thread
         that's in a syscall.  It's frequently a losing proposition.  */
    case TARGET_WAITKIND_SYSCALL_ENTRY:
      if (debug_infrun)
        fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_SYSCALL_ENTRY\n");
      resume (0, TARGET_SIGNAL_0);
      prepare_to_wait (ecs);
      return;

So, there should have already been a resume in between.

Could you check which thread got the syscall event?  Is it the same
thread we fail to resume?  Is it possible to disable syscall events,
just to check whether this is related?

-- 
Pedro Alves