From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gdb-return-21378-listarch-gdb=sources.redhat.com@sources.redhat.com>
Received: (qmail 16488 invoked by alias); 3 May 2005 14:48:58 -0000
Mailing-List: contact gdb-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe: <mailto:gdb-subscribe@sources.redhat.com>
List-Archive: <http://sources.redhat.com/ml/gdb/>
List-Post: <mailto:gdb@sources.redhat.com>
List-Help: <mailto:gdb-help@sources.redhat.com>, <http://sources.redhat.com/ml/#faqs>
Sender: gdb-owner@sources.redhat.com
Received: (qmail 16238 invoked from network); 3 May 2005 14:48:47 -0000
Received: from unknown (HELO nevyn.them.org) (66.93.172.17)
  by sourceware.org with SMTP; 3 May 2005 14:48:47 -0000
Received: from drow by nevyn.them.org with local (Exim 4.50 #1 (Debian))
	id 1DSyhY-0006zD-Hv; Tue, 03 May 2005 10:48:44 -0400
Date: Tue, 03 May 2005 14:48:00 -0000
From: Daniel Jacobowitz <drow@false.org>
To: David Lecomber <david@allinea.com>
Cc: Andreas Schwab <schwab@suse.de>, gdb <gdb@sources.redhat.com>
Subject: Re: GDB locks up -- Cannot find new threads: generic error
Message-ID: <20050503144844.GA24721@nevyn.them.org>
Mail-Followup-To: David Lecomber <david@allinea.com>,
	Andreas Schwab <schwab@suse.de>, gdb <gdb@sources.redhat.com>
References: <1114627357.31720.81.camel@cpc4-oxfd5-5-0-cust111.oxfd.cable.ntl.com> <20050427190108.GA28978@nevyn.them.org> <jefyxbvnwd.fsf@sykes.suse.de> <1115130086.1638.27.camel@delmo.priv.wark.uk.streamline-computing.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1115130086.1638.27.camel@delmo.priv.wark.uk.streamline-computing.com>
User-Agent: Mutt/1.5.8i
X-SW-Source: 2005-05/txt/msg00033.txt.bz2

On Tue, May 03, 2005 at 03:21:25PM +0100, David Lecomber wrote:
> > >> The  system is:
> > >> kernel-2.4.21-27.EL
> > >> glibc-2.3.2-95.30
> > >
> > > At a guess, your kernel is buggered.  You really should never see that
> > > warning.  The unexpected signal is SIGCHLD; your kernel has accepted
> > > the SETOPTIONS but obviously failed to stop when the test thread
> > > vforked.
> > 
> > I think that can happen when the 32 bit ptrace emulation is incomplete,
> > especially if PTRACE_GETEVENTMSG is not properly emulated.  That should be
> > fixed in recent (< 9 months) kernels.
> 
> Hello Andreas,
> 
> I can reproduce this on a SuSE Opteron machine - running 2.6.8-24.13 -
> and 2.6.8 came out 13th August (?).  How - other than brokenness - can I
> test if this PTRACE_GETEVENTMSG is the problem?

I assume your GDB is built as a 32-bit application?

If the emulation is broken, then the kernel will write a 64-bit result
even though GDB is a 32-bit binary.  We could detect this and disable
the feature, but better still would be to detect it and handle it.  All
the relevant code is in linux-nat.c.

Create a type:
union event_msg {
  long l;
  long long ll;
};

Initialize LL to zero and pass a pointer to the union to ptrace instead
of &second_pid.  Then check the result: if L is non-zero, we can use it;
if it isn't, but LL is non-zero, we need to use LL.  Save the result of
this test in a global variable and update all the other callers of
PTRACE_GETEVENTMSG to match.  This won't catch every case, depending on
endianness, but it ought to work anyway.

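I don't see how it's going to help on x86, though.  x86 is little-endian,
so the low half of a 64-bit result lands exactly where GDB expects the
PID; the worst that would happen is the four bytes after second_pid on
the stack getting clobbered.  The PID itself should be OK.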

Anyway, something like the attached patch.  Want to try it?  I left it
noisy (it prints a warning for each case) for testing.  It seems to do
the expected thing on i386.

-- 
Daniel Jacobowitz
CodeSourcery, LLC

Index: linux-nat.c
===================================================================
RCS file: /cvs/src/src/gdb/linux-nat.c,v
retrieving revision 1.27
diff -u -p -r1.27 linux-nat.c
--- linux-nat.c	6 Mar 2005 16:42:20 -0000	1.27
+++ linux-nat.c	3 May 2005 14:48:05 -0000
@@ -109,6 +109,20 @@ static int linux_supports_tracefork_flag
 
 static int linux_supports_tracevforkdone_flag = -1;
 
+/* Normally PTRACE_GETEVENTMSG returns a long int.  But on some 64-bit
+   systems, even with 32-bit long, it will return a long long.  For
+   instance, some x86_64 kernels had broken 32-bit emulation for this
+   option.  MIPS n32 also does this.  */
+   
+union ptrace_event_msg
+{
+  long l;
+  long long ll;
+  long la[2];
+};
+
+static int linux_geteventmsg_uses_long_long = 0;
+
 
 /* Trivial list manipulation functions to keep track of a list of
    new stopped processes.  */
@@ -189,6 +203,7 @@ static void
 linux_test_for_tracefork (int original_pid)
 {
   int child_pid, ret, status;
+  union ptrace_event_msg event;
   long second_pid;
 
   linux_supports_tracefork_flag = 0;
@@ -247,8 +262,30 @@ linux_test_for_tracefork (int original_p
   if (ret == child_pid && WIFSTOPPED (status)
       && status >> 16 == PTRACE_EVENT_FORK)
     {
-      second_pid = 0;
-      ret = ptrace (PTRACE_GETEVENTMSG, child_pid, 0, &second_pid);
+      event.la[0] = 0;
+      event.la[1] = 0x42000000;
+      ret = ptrace (PTRACE_GETEVENTMSG, child_pid, 0, &event);
+      if (event.la[0] == 0 && event.la[1] == 0x42000000)
+	{
+	  second_pid = 0;
+	  warning ("linux_test_for_tracefork: No response");
+	}
+      else if (event.la[0] == 0 && event.la[1] != 0x42000000)
+	{
+	  linux_geteventmsg_uses_long_long = 1;
+	  second_pid = event.ll;
+	  warning ("linux_test_for_tracefork: Needed to use long long");
+	}
+      else if (event.la[0] != 0 && event.la[1] == 0x42000000)
+	{
+	  second_pid = event.l;
+	  warning ("linux_test_for_tracefork: Needed to use long, as expected");
+	}
+      else
+	{
+	  second_pid = event.l;
+	  warning ("linux_test_for_tracefork: Needed to use long, but second half was clobbered");
+	}
       if (ret == 0 && second_pid != 0)
 	{
 	  int second_status;
@@ -484,10 +521,12 @@ linux_handle_extended_wait (int pid, int
   if (event == PTRACE_EVENT_FORK || event == PTRACE_EVENT_VFORK
       || event == PTRACE_EVENT_CLONE)
     {
+      union ptrace_event_msg event_msg;
       unsigned long new_pid;
       int ret;
 
-      ptrace (PTRACE_GETEVENTMSG, pid, 0, &new_pid);
+      ptrace (PTRACE_GETEVENTMSG, pid, 0, &event_msg);
+      new_pid = linux_geteventmsg_uses_long_long ? event_msg.ll : event_msg.l;
 
       /* If we haven't already seen the new PID stop, wait for it now.  */
       if (! pull_pid_from_list (&stopped_pids, new_pid))