Subject: Re: Solaris - procfs: couldn't find pid 32748 (kernel thread 21) in procinfo list
From: Petr Sumbera
Organization: Oracle Corporation
To: Pedro Alves, gdb@sourceware.org
Date: Tue, 2 Jun 2020 18:30:32 +0200
Message-ID: <405d3ffb-ea46-57cb-a023-7dece1983fb6@oracle.com>
In-Reply-To: <6f4b62a6-3bcc-346e-ac69-a89e98f6dfbe@redhat.com>

On 02.06.2020 16:53, Pedro Alves wrote:
> On 6/2/20 8:32 AM, Petr Sumbera via Gdb wrote:
>> On 01.06.2020 21:12, Pedro Alves wrote:
>>> On 6/1/20 12:39 PM, Petr Sumbera via Gdb wrote:
>>>> The issue seems to be that the LWP exits and the status->kind is set to TARGET_WAITKIND_SPURIOUS:
>>>>
>>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/procfs.c;h=f6c6b0e71c16224d3e7345ca09e011cdcf06349a;hb=HEAD#l2214
>>>>
>>>> But instantly it's added into the list again here:
>>>>
>>>> https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/infrun.c;h=95fc3bfe45930b53c33cb4de165db9c070449ad8;hb=HEAD#l5200
>>>>
>>>> But there is no longer such LWP in /proc.
>>>>
>>>> Any suggestion?
>>
>> Thanks for looking at it!
>>
>>> Either:
>>>
>>> - replace TARGET_WAITKIND_SPURIOUS with TARGET_WAITKIND_THREAD_EXITED, or,
>>
>> With this I'm getting:
>>
>> [LWP    21         exited]
>> [LWP    21         exited]
>> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>>
>>> - replace
>>>      status->kind = TARGET_WAITKIND_SPURIOUS;
>>>      return retval;
>>>    with
>>>      goto wait_again;
>>>    instead.
>>
>> and with this:
>>
>> [LWP    20         exited]
>> [LWP    20         exited]
>> /builds/psumbera/userland-gdb-procinfo/components/gdb/gdb-9.2/gdb/thread.c:459: internal-error: void delete_thread_1(thread_info*, bool): Assertion `thr != nullptr' failed.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>>
>> --
>>
>> Note that in both cases there are TWO exits for one LWP. But LWP numbers differ.
>
> You mean, it was 21 in one run, and 20 in another run?
> Those were two different runs, and some timing difference
> probably tweaked the order of which thread exits first or
> something. Doesn't seem unusual.
>
> Sounds like the patch below would fix it.

Unfortunately no.

> But why do we get two exits in a row for each LWP? Oh, I guess
> once for PR_SYSENTRY of the exit syscall, and another time for
> PR_SYSEXIT.

Only the PR_SYSENTRY path is hit in my test case (the first occurrence of 'exited]'; I changed those strings so I could tell them apart).
> From 0be6c82e754dd676e9f1259ab0f9a7849d985ffd Mon Sep 17 00:00:00 2001
> From: Pedro Alves
> Date: Tue, 2 Jun 2020 15:44:54 +0100
> Subject: [PATCH] fix-solaris
>
> ---
>  gdb/procfs.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/gdb/procfs.c b/gdb/procfs.c
> index f6c6b0e71c1..e2042f3edc4 100644
> --- a/gdb/procfs.c
> +++ b/gdb/procfs.c
> @@ -2331,9 +2331,10 @@ procfs_target::wait (ptid_t ptid, struct target_waitstatus *status,
>            if (print_thread_events)
>              printf_unfiltered (_("[%s exited]\n"),
>                                 target_pid_to_str (retval).c_str ());
> -          delete_thread (find_thread_ptid (this, retval));
> -          status->kind = TARGET_WAITKIND_SPURIOUS;
> -          return retval;
> +          thread_info *thr = find_thread_ptid (this, retval);
> +          if (thr != nullptr)
> +            delete_thread (thr);
> +          goto wait_again;
>          }
>        else if (0)
>          {
>
> base-commit: f6eee2d098049afd18f90b8f4bb6a5d1a49d900c
>

I have adapted your change to gdb 9.2 and moved it to the correct occurrence (you had added it to the second occurrence of 'exited'):

--- ../../gdb-9.2/gdb/procfs.c.orig	2020-06-02 17:10:32.057735432 +0000
+++ ../../gdb-9.2/gdb/procfs.c	2020-06-02 18:02:45.496117117 +0000
@@ -2207,9 +2207,10 @@
                   if (print_thread_events)
                     printf_unfiltered (_("[%s exited]\n"),
                                        target_pid_to_str (retval).c_str ());
-                  delete_thread (find_thread_ptid (retval));
-                  status->kind = TARGET_WAITKIND_SPURIOUS;
-                  return retval;
+                  thread_info *thr = find_thread_ptid (retval);
+                  if (thr)
+                    delete_thread (thr);
+                  goto wait_again;
                 }
               else if (syscall_is_exit (pi, what))
                 {

But this time the 'exited' message repeats forever:

[LWP    24         exited]
[LWP    24         exited]
[LWP    24         exited]
..

---
Petr