From: Daniel Jacobowitz
To: "J. Johnston"
Cc: gdb-patches@sources.redhat.com
Subject: Re: RFC: nptl threading patch for linux
Date: Sat, 10 May 2003 00:58:00 -0000
Message-ID: <20030510005755.GA32695@nevyn.them.org>
References: <3EA84E74.3010101@redhat.com> <20030509220011.GA22383@nevyn.them.org> <3EBC3BFA.7030709@redhat.com>
In-Reply-To: <3EBC3BFA.7030709@redhat.com>

On Fri, May 09, 2003 at 07:38:34PM -0400, J. Johnston wrote:
> Daniel Jacobowitz wrote:
> >On Thu, Apr 24, 2003 at 04:52:04PM -0400, J. Johnston wrote:
> >
> >>The following is the last part of my revised nptl patch that has
> >>been broken up per Daniel J.'s suggestion.  There are no generated
> >>files included in the patch.
> >
> >
> >Well, this patch doesn't work for me :(  Using 2.5.69, since I don't
> >have any of the Red Hat kernels available here at the moment.  It looks
> >like GDB bellies up around the second thread creation.
> >
>
> Is this one of the gdb.threads testcases?
> If not, do any of those run for you and/or can you send me a testcase
> for the problem below so we can at least have something common to
> compare?

Sorry, I forgot to say.  This is just pthreads.exp, with a breakpoint on
common_routine.

>
> -- Jeff J.
>
> >A backtrace looks like:
> >#0  0xffffe402 in ?? ()
> >#1  0x080e1332 in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:708
> >#2  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> >#3  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> >#4  0x080e159a in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:870
> >
> >And that's not just the stack unwinder getting confused.  We really did
> >recurse until we ran out of stack.
> >
> >The superficial reason is this:
> >SWC: Pending event Segmentation Fault (stopped) in LWP 4490
> >
> >i.e. every time we resume it with no signal it SIGSEGV's again, and we
> >never get the SIGSTOP.
> >
> >Here's some more of the log:
> >(gdb) c
> >Continuing.
> >LLR: PTRACE_SINGLESTEP process 4498, 0 (resume event thread)
> >LLW: waitpid 4498 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4498.
> >SEL: Select single-step LWP 4498
> >LLW: trap_ptid is LWP 4498.
> >RC:  PTRACE_CONT LWP 4497, 0, 0 (resume sibling)
> >LLR: PTRACE_CONT process 4498, 0 (resume event thread)
> >LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
> >SC:  kill LWP 4498 ****
> >SC:  lwp kill 0 ERRNO-OK
> >SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> >LLW: trap_ptid is LWP 4497.
> >[New Thread 1077276112 (LWP 4499)]
> >LLAL: PTRACE_ATTACH LWP 4499, 0, 0 (OK)
> >LLAL: waitpid LWP 4499 received Stopped (signal) (stopped)
> >LLR: PTRACE_SINGLESTEP process 4497, 0 (resume event thread)
> >LLW: waitpid 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4497.
> >SEL: Select single-step LWP 4497
> >LLW: trap_ptid is LWP 4497.
> >RC:  PTRACE_CONT LWP 4499, 0, 0 (resume sibling)
> >RC:  PTRACE_CONT LWP 4498, 0, 0 (resume sibling)
> >LLR: PTRACE_CONT process 4497, 0 (resume event thread)
> >LLW: waitpid 4499 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4499, 0, 0 (OK)
> >LLW: Candidate event Trace/breakpoint trap (stopped) in LWP 4499.
> >SC:  kill LWP 4498 ****
> >SC:  lwp kill 0 ERRNO-OK
> >SC:  kill LWP 4497 ****
> >SC:  lwp kill 0 ERRNO-OK
> >SWC: waitpid LWP 4498 received Stopped (signal) (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4498, 0, 0 (OK)
> >SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >PTRACE_CONT LWP 4497, 0, 0 (OK)
> >SWC: Candidate SIGTRAP event in LWP 4497
> >SWC: waitpid LWP 4497 received Trace/breakpoint trap (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >PTRACE_CONT LWP 4497, 0, 0 (OK)
> >SWC: Candidate SIGTRAP event in LWP 4497
> >SWC: waitpid LWP 4497 received Segmentation fault (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >SWC: Pending event Segmentation fault (stopped) in LWP 4497
> >SWC: PTRACE_CONT LWP 4497, 0, 0 (OK)
> >SWC: waitpid LWP 4497 received Segmentation fault (stopped)
> >LLTA: PTRACE_PEEKUSER LWP 4497, 0, 0 (OK)
> >
> >
> >A little interpretation: 4497 hits the creation breakpoint.  We attach
> >to 4499.  4499 hits the common_routine breakpoint.  We stop 4497.  It
> >hits the breakpoint at thread creation again for the next thread.
> >We PTRACE_CONT 4497 again trying to get the SIGSTOP, and get another
> >SIGTRAP - probably we were backed up from the breakpoint last time so
> >we hit it again.  We try _again_, and SIGSEGV because we're on the
> >second byte of a multi-byte instruction, the first byte having been
> >replaced by a breakpoint.
> >
> >Life explodes.
> >
> >
> >So:
> >  - stop_wait_callback should be fixed to not be so dumb when this
> >    happens.
> >  - we need to figure out how we got into this mess.
> >  - and why the SIGSTOP never showed up.
> >
> >I avoid this entire foul issue in gdbserver by not backtracking and
> >resuming the application; instead I just set a flag marking the next
> >SIGSTOP as "expected".  It's still not perfect but it's a great deal
> >better.  I can do even better when I have some time to play with
> >PTRACE_GETSIGINFO.
> >
> >I'm waiting for GDB to tell me how we got here.  The backtrace is more
> >than 40K frames, since I forgot to shrink the stack limit.  50K...
> >170K... ooh!
> >
> >#174697 0x080e1724 in stop_wait_callback (lp=0x0, data=0xbffff450)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:830
> >#174698 0x080e033d in iterate_over_lwps (callback=0x80e12d0, data=0x1181)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:293
> >#174699 0x080e251e in lin_lwp_wait (ptid={pid = -1, lwp = 0, tid = 0},
> >    ourstatus=0x72)
> >    at /opt/src/gdb/src-gdblinks/gdb/lin-lwp.c:1499
> >#174700 0x08128ca3 in thread_db_wait (ptid={pid = -1, lwp = 0, tid = 0},
> >    ourstatus=0xffffffff)
> >    at /opt/src/gdb/src-gdblinks/gdb/thread-db.c:846
> >#174701 0x080bc19e in wait_for_inferior ()
> >    at /opt/src/gdb/src-gdblinks/gdb/infrun.c:1003
> >#174702 0x080bbf13 in proceed (addr=3221222720, siggnal=144, step=0)
> >    at /opt/src/gdb/src-gdblinks/gdb/infrun.c:814
> >#174703 0x080b8fb0 in continue_command (proc_count_exp=0x0, from_tty=1)
> >    at /opt/src/gdb/src-gdblinks/gdb/infcmd.c:539
> >
> >It wasn't worth the wait.  That didn't help much.
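
For illustration only, the "expected SIGSTOP" idea mentioned in the quoted
text might look roughly like the sketch below.  This is not gdbserver's or
GDB's actual code: struct lwp_sketch, request_stop and handle_stop_event
are hypothetical names, and the kill/waitpid/ptrace calls themselves are
left out so that only the bookkeeping is shown.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LWP bookkeeping; not the real GDB/gdbserver struct. */
struct lwp_sketch
{
  int pid;
  bool stop_expected;   /* We sent SIGSTOP and have not seen it yet.  */
  int pending_signal;   /* Other event saved for later; 0 means none.  */
};

/* Note that a SIGSTOP has been sent to LWP (the actual kill call is
   omitted from this sketch).  */
static void
request_stop (struct lwp_sketch *lwp)
{
  assert (lwp != NULL);
  lwp->stop_expected = true;
}

/* Classify signal SIG reported by waitpid for LWP.  Returns true if the
   event should be reported, false if it was our own SIGSTOP and can be
   swallowed.  The LWP is never resumed here just to fish for the
   SIGSTOP, so it cannot run on from a half-handled breakpoint.  */
static bool
handle_stop_event (struct lwp_sketch *lwp, int sig)
{
  assert (lwp != NULL);

  if (sig == SIGSTOP && lwp->stop_expected)
    {
      lwp->stop_expected = false;
      return false;
    }

  /* Some other event arrived first.  Keep it pending and leave
     stop_expected set, so the late SIGSTOP is swallowed whenever it
     finally shows up.  */
  lwp->pending_signal = sig;
  return true;
}
```

In this sketch, an LWP that reports SIGTRAP after request_stop keeps the
SIGTRAP as its pending event and stays stopped; the SIGSTOP that arrives
on a later wait is then discarded instead of being reported.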

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer