Re: [RFC PATCH 3/3] gdb/nat/linux: Fix attaching to process when it has zombie threads


From: Pedro Alves <pedro@palves.net>
To: Thiago Jung Bauermann <thiago.bauermann@linaro.org>,
	gdb-patches@sourceware.org
Subject: Re: [RFC PATCH 3/3] gdb/nat/linux: Fix attaching to process when it has zombie threads
Date: Wed, 17 Apr 2024 17:28:12 +0100
Message-ID: <29215aa0-8387-4dee-8b8d-3cbf64e6abe3@palves.net>
In-Reply-To: <20240321231149.519549-4-thiago.bauermann@linaro.org>

On 2024-03-21 23:11, Thiago Jung Bauermann wrote:
> When GDB attaches to a multi-threaded process, it calls
> linux_proc_attach_tgid_threads () to go through all threads found in
> /proc/PID/task/ and call attach_proc_task_lwp_callback () on each of
> them.  If it does that twice without the callback reporting that a new
> thread was found, then it considers that all inferior threads have been
> found and returns.
> 
> The problem is that the callback considers any thread that it hasn't
> attached to yet as new.  This causes problems if the process has one or
> more zombie threads: GDB can't attach to a zombie thread, so the loop
> always "finds" a new thread (the zombie one) and gets stuck in an
> infinite loop.
> 
> This is easy to trigger (at least on aarch64-linux and powerpc64le-linux)
> with the gdb.threads/attach-many-short-lived-threads.exp testcase, because
> its test program constantly creates and finishes joinable threads so the
> chance of having zombie threads is high.
> 
> This problem causes the following failures:
> 
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: attach (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: no new threads (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: set breakpoint always-inserted on (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break break_fn (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break at break_fn: 1 (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break at break_fn: 2 (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break at break_fn: 3 (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: reset timer in the inferior (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: print seconds_left (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: detach (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: set breakpoint always-inserted off (timeout)
> FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: delete all breakpoints, watchpoints, tracepoints, and catchpoints in delete_breakpoints (timeout)
> ERROR: breakpoints not deleted
> 
> The iteration number is random, and all tests in the subsequent iterations
> fail too, because GDB is stuck in the attach command at the beginning of
> the iteration.
> 
> The solution is to make linux_proc_attach_tgid_threads () remember when it
> has already processed a given LWP and skip it in the subsequent iterations.
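
To see why remembering visited LWPs makes the scan terminate, here is a small self-contained model of the loop described above. It is not GDB code: fake_task, model_attach_lwp and model_scan are hypothetical names, the task list is a fixed vector, and the callback is modelled as the commit message describes it (any LWP not yet attached counts as new, even when the attach fails, as it does for a zombie). As in linux_proc_attach_tgid_threads, the scan stops after two consecutive passes that find nothing new.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// One entry in a fake /proc/PID/task listing (hypothetical model type).
struct fake_task
{
  unsigned long lwp;
  unsigned long long starttime;  // Stand-in for field 22 of .../stat.
  bool zombie;                   // Attaching to a zombie never succeeds.
};

// Models the callback as the commit message describes it: any LWP that
// is not attached yet is reported as "new", even if attaching fails.
static bool
model_attach_lwp (const fake_task &t, std::set<unsigned long> &attached)
{
  if (attached.count (t.lwp) != 0)
    return false;                // Already attached: not new.
  if (!t.zombie)
    attached.insert (t.lwp);     // The attach succeeded.
  return true;                   // Reported as new either way.
}

// Returns the pass on which the scan settled, or MAX_PASSES if it never
// did.  USE_VISITED enables an (lwp, starttime) visited set like the
// one the patch adds.
static int
model_scan (const std::vector<fake_task> &tasks, bool use_visited,
	    int max_passes)
{
  std::set<unsigned long> attached;
  std::set<std::pair<unsigned long, unsigned long long>> visited;
  int quiet_passes = 0;

  for (int pass = 1; pass <= max_passes; pass++)
    {
      bool new_found = false;

      for (const fake_task &t : tasks)
	{
	  // insert () returns false in .second if the key was present,
	  // i.e. this exact LWP incarnation was already handled.
	  if (use_visited
	      && !visited.insert (std::make_pair (t.lwp, t.starttime)).second)
	    continue;

	  if (model_attach_lwp (t, attached))
	    new_found = true;
	}

      // Stop after two consecutive passes without new threads.
      quiet_passes = new_found ? 0 : quiet_passes + 1;
      if (quiet_passes == 2)
	return pass;
    }

  return max_passes;
}
```

With one zombie among three tasks, the model without the visited set keeps "finding" the zombie on every pass and only stops at the cap, while the version with the visited set settles after one attaching pass plus two quiet ones.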
> 
> PR testsuite/31312
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31312
> ---
>  gdb/nat/linux-osdata.c | 22 ++++++++++++++++++++++
>  gdb/nat/linux-osdata.h |  4 ++++
>  gdb/nat/linux-procfs.c | 19 +++++++++++++++++++
>  3 files changed, 45 insertions(+)
> 
> diff --git a/gdb/nat/linux-osdata.c b/gdb/nat/linux-osdata.c
> index c254f2e4f05b..998279377433 100644
> --- a/gdb/nat/linux-osdata.c
> +++ b/gdb/nat/linux-osdata.c
> @@ -112,6 +112,28 @@ linux_common_core_of_thread (ptid_t ptid)
>    return core;
>  }
>  
> +/* See linux-osdata.h.  */
> +
> +std::optional<ULONGEST>
> +linux_get_starttime (ptid_t ptid)

Ditto regarding moving this to linux-procfs.  This has nothing to do with "info osdata".
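
The body of linux_get_starttime is trimmed from the quote above.  Wherever the helper ends up living, the parsing it needs looks roughly like the sketch below, which assumes the input is the text of /proc/PID/task/LWP/stat and uses the proc(5) field numbering, where starttime is field 22 (in clock ticks since boot).  parse_stat_starttime is a hypothetical name, not the function from the patch.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Sketch: extract field 22 (starttime) from the contents of a
// /proc/PID/task/LWP/stat file.  Returns nullopt on malformed input.
static std::optional<unsigned long long>
parse_stat_starttime (const std::string &stat)
{
  // Field 2 (comm) is parenthesized and may itself contain spaces or
  // ')', so resume parsing after the last ')' in the line.
  std::string::size_type close = stat.rfind (')');
  if (close == std::string::npos)
    return std::nullopt;

  std::istringstream rest (stat.substr (close + 1));
  std::string field;

  // The first token after ')' is field 3 (state); keep reading until
  // FIELD holds field 22.
  for (int n = 3; n <= 22; n++)
    if (!(rest >> field))
      return std::nullopt;

  // Reject anything that is not a plain unsigned decimal number.
  if (field.find_first_not_of ("0123456789") != std::string::npos)
    return std::nullopt;

  try
    {
      return std::stoull (field);
    }
  catch (const std::out_of_range &)
    {
      return std::nullopt;
    }
}
```

Searching for the last ')' instead of splitting the whole line on spaces is what keeps a thread name such as "my (prog)" from shifting the field count.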

> index a82fb08b998e..1cdc687aa9cf 100644
> --- a/gdb/nat/linux-osdata.h
> +++ b/gdb/nat/linux-osdata.h
> @@ -27,4 +27,8 @@ extern int linux_common_core_of_thread (ptid_t ptid);
>  extern LONGEST linux_common_xfer_osdata (const char *annex, gdb_byte *readbuf,
>  					 ULONGEST offset, ULONGEST len);
>  
> +/* Get the start time of thread PTID.  */
> +
> +extern std::optional<ULONGEST> linux_get_starttime (ptid_t ptid);
> +
>  #endif /* NAT_LINUX_OSDATA_H */
> diff --git a/gdb/nat/linux-procfs.c b/gdb/nat/linux-procfs.c
> index b17e3120792e..b01bf36c0b53 100644
> --- a/gdb/nat/linux-procfs.c
> +++ b/gdb/nat/linux-procfs.c
> @@ -17,10 +17,13 @@
>     along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>  
>  #include "gdbsupport/common-defs.h"
> +#include "linux-osdata.h"
>  #include "linux-procfs.h"

linux-procfs.h is the main header of this source file, so it should be
included first.  But then again, I think this file shouldn't be consuming
things from linux-osdata; it should be the other way around.

>  #include "gdbsupport/filestuff.h"
>  #include <dirent.h>
>  #include <sys/stat.h>
> +#include <set>
> +#include <utility>
>  
>  /* Return the TGID of LWPID from /proc/pid/status.  Returns -1 if not
>     found.  */
> @@ -290,6 +293,10 @@ linux_proc_attach_tgid_threads (pid_t pid,
>        return;
>      }
>  
> +  /* Keeps track of the LWPs we have already visited in /proc,
> +     identified by their PID and starttime to detect PID reuse.  */
> +  std::set<std::pair<unsigned long,ULONGEST>> visited_lwps;

Missing space before ULONGEST.

AFAICT, you don't rely on ordering, so this could be a std::unordered_set?
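
If visited_lwps does become a std::unordered_set, note that the standard library provides no std::hash specialization for std::pair, so the set needs a user-supplied hasher.  A minimal sketch, assuming the same (LWP, starttime) key as the patch; lwp_key, lwp_key_hash and the mixing constant (the familiar boost::hash_combine recipe) are illustrative choices, not existing GDB helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>

// Identifies one LWP incarnation.  Pairing the LWP id with its start
// time distinguishes a reused id from the thread that first had it.
using lwp_key = std::pair<unsigned long, unsigned long long>;

// std::hash is not specialized for std::pair, so combine the hashes of
// the two members (hash_combine-style mixing).
struct lwp_key_hash
{
  std::size_t operator() (const lwp_key &k) const noexcept
  {
    std::size_t h = std::hash<unsigned long> {} (k.first);
    std::size_t h2 = std::hash<unsigned long long> {} (k.second);
    return h ^ (h2 + 0x9e3779b9 + (h << 6) + (h >> 2));
  }
};

using visited_lwp_set = std::unordered_set<lwp_key, lwp_key_hash>;
```

With either container, insert () reports through the .second member of its result whether the key was newly added, so the find-then-insert sequence in the loop could collapse into a single call.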


> +
>    /* Scan the task list for existing threads.  While we go through the
>       threads, new threads may be spawned.  Cycle through the list of
>       threads until we have done two iterations without finding new
> @@ -308,6 +315,18 @@ linux_proc_attach_tgid_threads (pid_t pid,
>  	  if (lwp != 0)
>  	    {
>  	      ptid_t ptid = ptid_t (pid, lwp);
> +	      std::optional<ULONGEST> starttime = linux_get_starttime (ptid);
> +
> +	      if (starttime.has_value ())
> +		{
> +		  std::pair<unsigned long,ULONGEST> key (lwp, *starttime);

Missing space before ULONGEST here as well.

> +
> +		  /* If we already visited this LWP, skip it this time.  */
> +		  if (visited_lwps.find (key) != visited_lwps.cend ())
> +		    continue;
> +
> +		  visited_lwps.insert (key);
> +		}
>  
>  	      if (attach_lwp (ptid))
>  		new_threads_found = 1;


Thread overview: 19+ messages
2024-03-21 23:11 [RFC PATCH 0/3] " Thiago Jung Bauermann
2024-03-21 23:11 ` [RFC PATCH 1/3] gdb/nat: Use procfs(5) indexes in linux_common_core_of_thread Thiago Jung Bauermann
2024-03-22 17:33   ` Luis Machado
2024-04-17 15:55     ` Pedro Alves
2024-04-20  5:15       ` Thiago Jung Bauermann
2024-03-21 23:11 ` [RFC PATCH 2/3] gdb/nat: Factor linux_find_proc_stat_field out of linux_common_core_of_thread Thiago Jung Bauermann
2024-03-22 16:12   ` Luis Machado
2024-04-17 16:06   ` Pedro Alves
2024-04-20  5:16     ` Thiago Jung Bauermann
2024-03-21 23:11 ` [RFC PATCH 3/3] gdb/nat/linux: Fix attaching to process when it has zombie threads Thiago Jung Bauermann
2024-03-22 16:19   ` Luis Machado
2024-03-22 16:52   ` Pedro Alves
2024-04-16  4:48     ` Thiago Jung Bauermann
2024-04-17 15:32       ` Pedro Alves
2024-04-20  5:00         ` Thiago Jung Bauermann
2024-04-26 15:35           ` Pedro Alves
2024-04-17 16:28   ` Pedro Alves [this message]
2024-04-20  5:28     ` Thiago Jung Bauermann
2024-03-22 10:17 ` [RFC PATCH 0/3] " Christophe Lyon
