From: Pedro Alves
To: gdb-patches@sourceware.org
Cc: Hui Zhu, Michael Snyder
Subject: Re: [RFA] Checkpoint: wait the defunct process when delete it
Date: Mon, 10 May 2010 23:30:00 -0000
Message-Id: <201005110030.09328.pedro@codesourcery.com>

On Sunday 09 May 2010 07:23:15, Hui Zhu wrote:
> I found that when we delete the checkpoint process, it keep defunct.
> This is because the parent process is still running and didn't wait
> it.
> So I add a wait_ptid function after ptrace kill.

You're assuming inferior_ptid is the parent process of the checkpoint
fork, but I don't believe that is always true.
E.g., if you do

 (gdb) checkpoint
 (gdb) checkpoint
 (gdb) checkpoint
 (gdb) info checkpoints
   3 process 15353 at 0x457d43, file gdb.c, line 28
   2 process 15352 at 0x457d43, file gdb.c, line 28
   1 process 15351 at 0x457d43, file gdb.c, line 28
 * 0 Thread 0x7ffff7fcc6f0 (LWP 15348) (main process) at 0x457d43, file gdb.c, line 28
 (gdb) restart 1
 ...
 (gdb) delete checkpoint 2

At this point, inferior_ptid will be process 15351, but that is not
the parent of 15352, the process you're killing.  15348 is.

> +static int
> +wait_ptid (ptid_t ptid)
> +{

I'd rename this to call_waitpid, similarly to the call_lseek function
already present in the file.  You may want to reimplement your
function similarly to how call_lseek is implemented too.  Your call.

> +  struct objfile *waitpid_objf;
> +  struct value *waitpid_fn = NULL;
> +  struct value *argv[4];
> +  struct gdbarch *gdbarch = get_current_arch ();
> +
> +  /* Get the waitpid_fn.  */
> +  if (lookup_minimal_symbol ("waitpid", NULL, NULL) != NULL)
> +    waitpid_fn = find_function_in_inferior ("waitpid", &waitpid_objf);
> +  if (!waitpid_fn)
> +    if (lookup_minimal_symbol ("_waitpid", NULL, NULL) != NULL)

"_waitpid" here,

> +      waitpid_fn = find_function_in_inferior ("waitpid", &waitpid_objf);

but "waitpid" here?

You could also fold those two nested 'if's into a single condition, like:

  if (lookup_minimal_symbol ("waitpid", NULL, NULL) != NULL)
    waitpid_fn = find_function_in_inferior ("waitpid", &waitpid_objf);
  if (!waitpid_fn && lookup_minimal_symbol ("_waitpid", NULL, NULL) != NULL)
    waitpid_fn = find_function_in_inferior ("_waitpid", &waitpid_objf);

> +  /* Get the argv.  */
> +  argv[0] = value_from_longest (builtin_type (gdbarch)->builtin_int,
> PIDGET (ptid));
> +  argv[1] = value_from_longest (builtin_type (gdbarch)->builtin_int, 0);
> +  argv[2] = value_from_longest (builtin_type (gdbarch)->builtin_int, 0);
> +  argv[3] = 0;

The second argument of waitpid is a pointer, not an integer.
From `man waitpid':

  pid_t waitpid(pid_t pid, int *status, int options);

> +  if (call_function_by_hand (waitpid_fn, 3, argv) == 0)
> +    return -1;

This doesn't work in non-stop/async modes if the parent is presently
running.  Maybe just take the easy route for now, and add an is_stopped
check, bailing out if not stopped?

> +  if (wait_ptid (ptid))
> +    error (_("Unable to wait pid %s"), target_pid_to_str (ptid));

-- 
Pedro Alves
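
For reference, the parent/child constraint discussed in this review can be
reproduced outside GDB with a small POSIX program.  This is only a sketch,
not GDB code: the function names sibling_cannot_reap and kill_and_reap_demo
are made up for illustration, and it assumes a POSIX system where SIGCHLD
has its default disposition.  It shows that a process which is not the
parent gets ECHILD from waitpid, that only the real parent can reap the
killed child, and that the second argument is an int pointer.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a sibling that tries to waitpid on VICTIM.
   Returns 1 if the sibling's waitpid failed with ECHILD (expected,
   since VICTIM is not the sibling's child), 0 if it did not, and -1
   on a setup error.  */
static int
sibling_cannot_reap (pid_t victim)
{
  int status = 0;
  pid_t sibling = fork ();

  if (sibling == -1)
    return -1;
  if (sibling == 0)
    {
      /* Running in the sibling.  VICTIM was forked by our parent,
         so it is not one of our children.  */
      pid_t r = waitpid (victim, &status, WNOHANG);
      _exit (r == -1 && errno == ECHILD ? 0 : 1);
    }

  /* Back in the parent: collect the sibling's verdict.  */
  if (waitpid (sibling, &status, 0) != sibling)
    return -1;
  return WIFEXITED (status) && WEXITSTATUS (status) == 0;
}

/* Hypothetical demo: fork a child, confirm a non-parent cannot wait
   for it, then kill it and reap it from the parent.  Returns the
   signal that terminated the child, or -1 on failure.  */
static int
kill_and_reap_demo (void)
{
  int status = 0;
  pid_t child = fork ();

  if (child == -1)
    return -1;
  if (child == 0)
    {
      /* Child: sleep until killed.  */
      for (;;)
        pause ();
    }

  if (sibling_cannot_reap (child) != 1)
    {
      /* Unexpected result; clean up the child before bailing out.  */
      kill (child, SIGKILL);
      waitpid (child, NULL, 0);
      return -1;
    }

  if (kill (child, SIGKILL) == -1)
    return -1;

  /* Until this call returns, the killed child stays defunct.  Note
     that the second argument is a pointer to int, not an integer.  */
  if (waitpid (child, &status, 0) != child)
    return -1;

  return WIFSIGNALED (status) ? WTERMSIG (status) : -1;
}
```

Calling kill_and_reap_demo () from a test program should yield SIGKILL on
such a system.  The same rule is why the waitpid call in the patch needs to
run in the checkpoint's actual parent rather than in whatever process
inferior_ptid happens to name.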