From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16488 invoked by alias); 3 May 2005 14:48:58 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Archive:
List-Post:
List-Help: ,
Sender: gdb-owner@sources.redhat.com
Received: (qmail 16238 invoked from network); 3 May 2005 14:48:47 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17) by sourceware.org with SMTP; 3 May 2005 14:48:47 -0000
Received: from drow by nevyn.them.org with local (Exim 4.50 #1 (Debian)) id 1DSyhY-0006zD-Hv; Tue, 03 May 2005 10:48:44 -0400
Date: Tue, 03 May 2005 14:48:00 -0000
From: Daniel Jacobowitz
To: David Lecomber
Cc: Andreas Schwab , gdb
Subject: Re: GDB locks up -- Cannot find new threads: generic error
Message-ID: <20050503144844.GA24721@nevyn.them.org>
Mail-Followup-To: David Lecomber , Andreas Schwab , gdb
References: <1114627357.31720.81.camel@cpc4-oxfd5-5-0-cust111.oxfd.cable.ntl.com> <20050427190108.GA28978@nevyn.them.org> <1115130086.1638.27.camel@delmo.priv.wark.uk.streamline-computing.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1115130086.1638.27.camel@delmo.priv.wark.uk.streamline-computing.com>
User-Agent: Mutt/1.5.8i
X-SW-Source: 2005-05/txt/msg00033.txt.bz2

On Tue, May 03, 2005 at 03:21:25PM +0100, David Lecomber wrote:
> > >> The system is:
> > >> kernel-2.4.21-27.EL
> > >> glibc-2.3.2-95.30
> > >
> > > At a guess, your kernel is buggered. You really should never see that
> > > warning. The unexpected signal is SIGCHLD; your kernel has accepted
> > > the SETOPTIONS but obviously failed to stop when the test thread
> > > vforked.
> >
> > I think that can happen when the 32 bit ptrace emulation is incomplete,
> > especially if PTRACE_GETEVENTMSG is not properly emulated. That should be
> > fixed in recent (< 9 months) kernels.
>
> Hello Andreas,
>
> I can reproduce this on a SuSE Opteron machine - running 2.6.8-24.13 -
> and 2.6.8 came out 13th August (?). How - other than brokenness - can I
> test if this PTRACE_GETEVENTMSG is the problem?

I assume your GDB is built as a 32-bit application? If it is broken, then
the result will be 64-bit despite the fact that GDB is a 32-bit binary.
We could detect this and disable the feature, but better still would be
to detect and handle it.

All relevant code is in linux-nat.c. Create a type:

union event_msg {
  long l;
  long long ll;
};

Initialize LL to zero. Pass that to ptrace instead of &second_pid.
Check the result. If L is non-zero, we can use that; if it isn't, but
LL is non-zero, we need to use LL. Save the result of this test in a
global variable and update all callers.

This won't catch all cases, depending on endianness, but it ought to work
anyway.

I don't see how it's going to help x86 though. Little endian; the worst
that would happen is a couple bytes on the stack clobbered. The PID should
be OK.

Anyway, like the attached. Want to try it? I left it noisy for
testing. It seems to do the expected thing on i386.

-- 
Daniel Jacobowitz
CodeSourcery, LLC

Index: linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.27
diff -u -p -r1.27 linux-nat.c
--- linux-nat.c	6 Mar 2005 16:42:20 -0000	1.27
+++ linux-nat.c	3 May 2005 14:48:05 -0000
@@ -109,6 +109,20 @@ static int linux_supports_tracefork_flag
 static int linux_supports_tracevforkdone_flag = -1;
 
+/* Normally PTRACE_GETEVENTMSG returns a long int. But on some 64-bit
+   systems, even with 32-bit long, it will return a long long. For
+   instance, some x86_64 kernels had broken 32-bit emulation for this
+   option. MIPS n32 also does this. */
+
+union ptrace_event_msg
+{
+  long l;
+  long long ll;
+  long la[2];
+};
+
+static int linux_geteventmsg_uses_long_long = 0;
+
 /* Trivial list manipulation functions to keep track of a list of
    new stopped processes. */
@@ -189,6 +203,7 @@ static void
 linux_test_for_tracefork (int original_pid)
 {
   int child_pid, ret, status;
+  union ptrace_event_msg event;
   long second_pid;
 
   linux_supports_tracefork_flag = 0;
@@ -247,8 +262,30 @@ linux_test_for_tracefork (int original_p
       if (ret == child_pid && WIFSTOPPED (status)
           && status >> 16 == PTRACE_EVENT_FORK)
         {
-          second_pid = 0;
-          ret = ptrace (PTRACE_GETEVENTMSG, child_pid, 0, &second_pid);
+          event.la[0] = 0;
+          event.la[1] = 0x42000000;
+          ret = ptrace (PTRACE_GETEVENTMSG, child_pid, 0, &event);
+          if (event.la[0] == 0 && event.la[1] == 0x42000000)
+            {
+              second_pid = 0;
+              warning ("linux_test_for_tracefork: No response");
+            }
+          else if (event.la[0] == 0 && event.la[1] != 0x42000000)
+            {
+              linux_geteventmsg_uses_long_long = 1;
+              second_pid = event.ll;
+              warning ("linux_test_for_tracefork: Needed to use long long");
+            }
+          else if (event.la[0] != 0 && event.la[1] == 0x42000000)
+            {
+              second_pid = event.l;
+              warning ("linux_test_for_tracefork: Needed to use long, as expected");
+            }
+          else
+            {
+              second_pid = event.l;
+              warning ("linux_test_for_tracefork: Needed to use long, but second half was clobbered");
+            }
           if (ret == 0 && second_pid != 0)
             {
               int second_status;
@@ -484,10 +521,12 @@ linux_handle_extended_wait (int pid, int
   if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK
       || event == PTRACE_EVENT_CLONE)
     {
+      union ptrace_event_msg event_msg;
       unsigned long new_pid;
       int ret;
 
-      ptrace (PTRACE_GETEVENTMSG, pid, 0, &event_msg);
+      ptrace (PTRACE_GETEVENTMSG, pid, 0, &event_msg);
+      new_pid = linux_geteventmsg_uses_long_long ? event_msg.ll : event_msg.l;
 
       /* If we haven't already seen the new PID stop, wait for it now. */
       if (! pull_pid_from_list (&stopped_pids, new_pid))
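
The four cases the patch distinguishes can be exercised without a kernel
in the loop. What follows is only a rough sketch, not part of linux-nat.c:
it models the la[0]/la[1] view of the union as two explicit 32-bit slots
(an assumption, so the behaviour does not depend on the host's long or
byte order), and every name in it (event_slots, prime_slots, classify,
decode_pid, the fake_write_* helpers) is hypothetical. The fake writers
stand in for what a kernel might store; they are not real ptrace calls.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel planted in the second slot before the call, as in the patch. */
#define SENTINEL 0x42000000

/* Hypothetical stand-in for the la[0]/la[1] view of union
   ptrace_event_msg on a host whose long is 32 bits wide. */
struct event_slots
{
  int32_t half[2];
};

enum event_msg_kind
{
  EVENT_MSG_NONE,           /* Neither slot was written. */
  EVENT_MSG_LONG,           /* First slot set, sentinel intact. */
  EVENT_MSG_LONG_LONG,      /* First slot zero, sentinel overwritten. */
  EVENT_MSG_LONG_CLOBBERED  /* First slot set, sentinel overwritten. */
};

/* Reset both slots the way the patch does before calling ptrace. */
void
prime_slots (struct event_slots *m)
{
  m->half[0] = 0;
  m->half[1] = SENTINEL;
}

/* The same four-way test as the if/else chain in the patch. */
enum event_msg_kind
classify (const struct event_slots *m)
{
  if (m->half[0] == 0 && m->half[1] == SENTINEL)
    return EVENT_MSG_NONE;
  if (m->half[0] == 0)
    return EVENT_MSG_LONG_LONG;
  if (m->half[1] == SENTINEL)
    return EVENT_MSG_LONG;
  return EVENT_MSG_LONG_CLOBBERED;
}

/* Pick the slot holding the PID.  In the long long case the first slot
   is zero, so the low word of the 64-bit value is the second slot. */
int32_t
decode_pid (const struct event_slots *m)
{
  return classify (m) == EVENT_MSG_LONG_LONG ? m->half[1] : m->half[0];
}

/* Simulated kernel behaviours (assumptions, not real ptrace output). */

/* Correct 32-bit store: only the first slot is touched. */
void
fake_write_32 (struct event_slots *m, int32_t pid)
{
  assert (pid > 0);
  m->half[0] = pid;
}

/* 64-bit store seen from a little-endian 32-bit process. */
void
fake_write_64_le (struct event_slots *m, int32_t pid)
{
  assert (pid > 0);
  m->half[0] = pid;
  m->half[1] = 0;
}

/* 64-bit store seen from a big-endian 32-bit process. */
void
fake_write_64_be (struct event_slots *m, int32_t pid)
{
  assert (pid > 0);
  m->half[0] = 0;
  m->half[1] = pid;
}
```

Priming the slots, applying fake_write_64_le and classifying gives the
"second half was clobbered" case with the PID still readable from the
first slot, which matches the remark above that little-endian x86 should
still get the PID right. Only the big-endian writer lands in the long
long case.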