* [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
@ 2025-06-05 15:03 Tom de Vries
2025-06-05 16:15 ` Pedro Alves
0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2025-06-05 15:03 UTC (permalink / raw)
To: gdb-patches
I decided to try to build and test gdb on Windows.
I found a page on the wiki (https://sourceware.org/gdb/wiki/BuildingOnWindows)
suggesting three ways of building gdb:
- MinGW,
- MinGW on Cygwin, and
- Cygwin.
I picked Cygwin, because I've used it before (though not recently).
I managed to install Cygwin and sufficient packages to build gdb and start the
testsuite.
However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp.
[ AFAICT, similar problems reported here [1]. ]
I managed to reproduce this hang by running just the test-case.
I attempted to kill the hanging processes by:
- first killing the inferior process, using the cygwin "kill -9" command, and
- then killing the gdb process, likewise.
But the gdb process remained, and I had to point-and-click my way through task
manager to actually kill the gdb process.
I investigated this by attaching to the hanging gdb process. Looking at the
main thread, I saw it was stopped in a call to WaitForSingleObject, with
the dwMilliseconds parameter set to INFINITE.
The backtrace in more detail:
...
(gdb) bt
#0 0x00007fff196fc044 in ntdll!ZwWaitForSingleObject () from
/cygdrive/c/windows/SYSTEM32/ntdll.dll
#1 0x00007fff16bbcdcf in WaitForSingleObjectEx () from
/cygdrive/c/windows/System32/KERNELBASE.dll
#2 0x0000000100998065 in wait_for_single (handle=0x1b8, howlong=4294967295) at
gdb/windows-nat.c:435
#3 0x0000000100999aa7 in
windows_nat_target::do_synchronously(gdb::function_view<bool ()>)
(this=this@entry=0xa001c6fe0, func=...) at gdb/windows-nat.c:487
#4 0x000000010099a7fb in windows_nat_target::wait_for_debug_event_main_thread
(event=<optimized out>, this=0xa001c6fe0)
at gdb/../gdbsupport/function-view.h:296
#5 windows_nat_target::kill (this=0xa001c6fe0) at gdb/windows-nat.c:2917
#6 0x00000001008f2f86 in target_kill () at gdb/target.c:901
#7 0x000000010091fc46 in kill_or_detach (from_tty=0, inf=0xa000577d0)
at gdb/top.c:1658
#8 quit_force (exit_arg=<optimized out>, from_tty=from_tty@entry=0)
at gdb/top.c:1759
#9 0x00000001004f9ea8 in quit_command (args=args@entry=0x0,
from_tty=from_tty@entry=0) at gdb/cli/cli-cmds.c:483
#10 0x000000010091c6d0 in quit_cover () at gdb/top.c:295
#11 0x00000001005e3d8a in async_disconnect (arg=<optimized out>)
at gdb/event-top.c:1496
#12 0x0000000100499c45 in invoke_async_signal_handlers ()
at gdb/async-event.c:233
#13 0x0000000100eb23d6 in gdb_do_one_event (mstimeout=mstimeout@entry=-1)
at gdbsupport/event-loop.cc:198
#14 0x00000001006df94a in interp::do_one_event (mstimeout=-1,
this=<optimized out>) at gdb/interps.h:87
#15 start_event_loop () at gdb/main.c:402
#16 captured_command_loop () at gdb/main.c:466
#17 0x00000001006e2865 in captured_main (data=0x7ffffcba0) at gdb/main.c:1346
#18 gdb_main (args=args@entry=0x7ffffcc10) at gdb/main.c:1365
#19 0x0000000100f98c70 in main (argc=10, argv=0xa000129f0) at gdb/gdb.c:38
...
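For reference, the howlong=4294967295 in frame #2 is the INFINITE value itself: the Windows SDK defines INFINITE as 0xFFFFFFFF, the largest 32-bit unsigned value. A minimal C++ sketch of that arithmetic, buildable on any host; the names dword, infinite_value and is_infinite_wait are illustrative stand-ins for DWORD and INFINITE, not gdb or Windows identifiers:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for the Windows DWORD type (a 32-bit unsigned integer).
using dword = std::uint32_t;

// Stand-in for the Windows INFINITE macro, which the SDK headers define
// as 0xFFFFFFFF.  Redefined here only so the sketch builds anywhere.
constexpr dword infinite_value = 0xFFFFFFFFu;

// Return true if a timeout value, as printed in a backtrace, means
// "wait forever".
constexpr bool
is_infinite_wait (dword howlong)
{
  return howlong == infinite_value;
}

static_assert (infinite_value == 4294967295u,
               "INFINITE prints as 4294967295 in decimal");
static_assert (infinite_value == std::numeric_limits<dword>::max (),
               "INFINITE is the largest 32-bit unsigned value");
```

So a backtrace showing howlong=4294967295 is a wait with no timeout at all, while something like howlong=100 would be a bounded wait.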
In the docs [2], I read that using an INFINITE argument to WaitForSingleObject
might cause a system deadlock.
This prompted me to try this simple change in wait_for_single:
...
while (true)
{
- DWORD r = WaitForSingleObject (handle, howlong);
+ DWORD r = WaitForSingleObject (handle,
+ howlong == INFINITE ? 100 : howlong);
+ if (howlong == INFINITE && r == WAIT_TIMEOUT)
+ continue;
...
with the timeout of 0.1 second estimated to be:
- small enough for gdb to feel reactive, and
- big enough not to consume too many CPU cycles with looping.
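The idea behind the change can be modelled without Windows at all. The following is a simplified, portable sketch, not the gdb code: wait_fn is a hypothetical stand-in for WaitForSingleObject, wait_result stands in for WAIT_OBJECT_0 / WAIT_TIMEOUT / WAIT_FAILED, and the bool return value is an assumption made for illustration (the failure handling of the real wait_for_single is not shown in the hunk).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-ins for the Windows DWORD type and INFINITE macro.
using dword = std::uint32_t;
constexpr dword infinite_timeout = 0xFFFFFFFFu;

// Stand-ins for WAIT_OBJECT_0, WAIT_TIMEOUT and WAIT_FAILED.
enum class wait_result { signaled, timed_out, failed };

// Wait using WAIT_FN, a hypothetical stand-in for WaitForSingleObject
// that takes only the timeout in milliseconds.  An infinite HOWLONG is
// never passed down; it is replaced by repeated waits of SLICE_MS each.
// Returns true if the object was signaled.
bool
wait_in_slices (const std::function<wait_result (dword)> &wait_fn,
                dword howlong, dword slice_ms = 100)
{
  while (true)
    {
      dword milliseconds = howlong == infinite_timeout ? slice_ms : howlong;
      wait_result r = wait_fn (milliseconds);

      // A timeout of one slice just means "not yet" when the caller
      // asked to wait forever, so go around again.
      if (howlong == infinite_timeout && r == wait_result::timed_out)
        continue;

      return r == wait_result::signaled;
    }
}
```

With an INFINITE request, each underlying call blocks for at most slice_ms, so the loop regains control about ten times per second at the 100 ms default. A finite timeout is passed through unchanged.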
And indeed, the test-case, while still failing, now finishes in ~50 seconds.
While there may be an underlying bug that triggers this behaviour, the failure
mode is so severe that I consider it a bug in itself.
Fix this by avoiding calling WaitForSingleObject with INFINITE argument.
Tested on x86_64-cygwin, by running the testsuite past the test-case.
PR tdep/32894
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32894
[1] https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
[2] https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitforsingleobject
---
gdb/windows-nat.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/gdb/windows-nat.c b/gdb/windows-nat.c
index 461d9eb9af5..b1c49c9d891 100644
--- a/gdb/windows-nat.c
+++ b/gdb/windows-nat.c
@@ -432,7 +432,10 @@ wait_for_single (HANDLE handle, DWORD howlong)
{
while (true)
{
- DWORD r = WaitForSingleObject (handle, howlong);
+ DWORD milliseconds = howlong == INFINITE ? 100 : howlong;
+ DWORD r = WaitForSingleObject (handle, milliseconds);
+ if (howlong == INFINITE && r == WAIT_TIMEOUT)
+ continue;
if (r == WAIT_OBJECT_0)
return;
if (r == WAIT_FAILED)
base-commit: 96662aacaa0cf0a9766d874fc9577a5da417dbcc
--
2.43.0
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-05 15:03 [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg Tom de Vries
@ 2025-06-05 16:15 ` Pedro Alves
2025-06-05 16:25 ` Pedro Alves
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Pedro Alves @ 2025-06-05 16:15 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
On 2025-06-05 16:03, Tom de Vries wrote:
> I decided to try to build and test gdb on Windows.
>
> I found a page on the wiki ( https://sourceware.org/gdb/wiki/BuildingOnWindows
> ) suggesting three ways of building gdb:
> - MinGW,
> - MinGW on Cygwin, and
> - Cygwin.
>
> I picked Cygwin, because I've used it before (though not recently).
>
> I managed to install Cygwin and sufficient packages to build gdb and start the
> testsuite.
Cool!
FYI, I cross build from Linux using this: https://github.com/palves/cygwin-cross
and then test from Cygwin, by:
- smb mounting the sources from linux
- configuring only the testsuite, not the whole gdb (src/gdb/testsuite/configure)
- copying over gdb.exe to the Windows box (via rsync)
- pointing at the copied gdb with make check GDB=.../gdb.exe
I do this because building on Linux is so much faster.
>
> However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp.
> [ AFAICT, similar problems reported here [1]. ]
The testsuite fails to kill gdb all the time for me; it may well be the same issue.
I'll give it a try after you've merged it.
Ecstatic that you found this!
>
> I managed to reproduce this hang by running just the test-case.
>
> I attempted to kill the hanging processes by:
> - first killing the inferior process, using the cygwin "kill -9" command, and
> - then killing the gdb process, likewise.
>
> But the gdb process remained, and I had to point-and-click my way through task
> manager to actually kill the gdb process.
>
> I investigated this by attaching to the hanging gdb process. Looking at the
> main thread, I saw it was stopped in a call to WaitForSingleObject, with
> the dwMilliseconds parameter set to INFINITE.
>
> In the docs [2], I read that using an INFINITE argument to WaitForSingleObject
> might cause a system deadlock.
Wow, unbelievable. Thanks a lot for going all the way and finding this.
>
> This prompted me to try this simple change in wait_for_single:
> ...
> while (true)
> {
> - DWORD r = WaitForSingleObject (handle, howlong);
> + DWORD r = WaitForSingleObject (handle,
> + howlong == INFINITE ? 100 : howlong);
> + if (howlong == INFINITE && r == WAIT_TIMEOUT)
> + continue;
> ...
> with the timeout of 0.1 second estimated to be:
> - small enough for gdb to feel reactive, and
> - big enough not to consume too many CPU cycles with looping.
>
> And indeed, the test-case, while still failing, now finishes in ~50 seconds.
>
> While there may be an underlying bug that triggers this behaviour, the failure
> mode is so severe that I consider it a bug in itself.
>
Yeah. The documentation talks about the case of the code spawning windows, which
I don't think we do. But still. I've seen so many weird things on Windows...
It's just one more.
> Fix this by avoiding calling WaitForSingleObject with INFINITE argument.
Approved-By: Pedro Alves <pedro@palves.net>
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-05 16:15 ` Pedro Alves
@ 2025-06-05 16:25 ` Pedro Alves
2025-06-06 6:27 ` Tom de Vries
2025-06-06 7:12 ` Tom de Vries
2025-06-06 13:11 ` Pedro Alves
2 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2025-06-05 16:25 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
On 2025-06-05 17:15, Pedro Alves wrote:
> On 2025-06-05 16:03, Tom de Vries wrote:
>> Fix this by avoiding calling WaitForSingleObject with INFINITE argument.
>
> Approved-By: Pedro Alves <pedro@palves.net>
>
Actually, can you add a comment to the code, please, explaining why we're avoiding INFINITE?
LGTM with that. :-)
On 2025-06-05 16:03, Tom de Vries wrote:
> while (true)
> {
> - DWORD r = WaitForSingleObject (handle, howlong);
> + DWORD milliseconds = howlong == INFINITE ? 100 : howlong;
> + DWORD r = WaitForSingleObject (handle, milliseconds);
> + if (howlong == INFINITE && r == WAIT_TIMEOUT)
> + continue;
> if (r == WAIT_OBJECT_0)
> return;
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-05 16:25 ` Pedro Alves
@ 2025-06-06 6:27 ` Tom de Vries
0 siblings, 0 replies; 12+ messages in thread
From: Tom de Vries @ 2025-06-06 6:27 UTC (permalink / raw)
To: Pedro Alves, gdb-patches
On 6/5/25 18:25, Pedro Alves wrote:
> On 2025-06-05 17:15, Pedro Alves wrote:
>> On 2025-06-05 16:03, Tom de Vries wrote:
>
>>> Fix this by avoiding calling WaitForSingleObject with INFINITE argument.
>>
>> Approved-By: Pedro Alves <pedro@palves.net>
>>
>
> Actually, can you add a comment to the code, please, explaining why we're avoiding INFINITE?
>
> LGTM with that. :-)
>
Hi Pedro,
thanks for the review.
I've added a comment, and pushed as v2
(https://sourceware.org/pipermail/gdb-patches/2025-June/218402.html).
Thanks,
- Tom
> On 2025-06-05 16:03, Tom de Vries wrote:
>> while (true)
>> {
>> - DWORD r = WaitForSingleObject (handle, howlong);
>> + DWORD milliseconds = howlong == INFINITE ? 100 : howlong;
>> + DWORD r = WaitForSingleObject (handle, milliseconds);
>> + if (howlong == INFINITE && r == WAIT_TIMEOUT)
>> + continue;
>> if (r == WAIT_OBJECT_0)
>> return;
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-05 16:15 ` Pedro Alves
2025-06-05 16:25 ` Pedro Alves
@ 2025-06-06 7:12 ` Tom de Vries
2025-06-06 15:14 ` Pedro Alves
2025-06-06 13:11 ` Pedro Alves
2 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2025-06-06 7:12 UTC (permalink / raw)
To: Pedro Alves, gdb-patches
On 6/5/25 18:15, Pedro Alves wrote:
> On 2025-06-05 16:03, Tom de Vries wrote:
>> I decided to try to build and test gdb on Windows.
>>
>> I found a page on the wiki ( https://sourceware.org/gdb/wiki/BuildingOnWindows
>> ) suggesting three ways of building gdb:
>> - MinGW,
>> - MinGW on Cygwin, and
>> - Cygwin.
>>
>> I picked Cygwin, because I've used it before (though not recently).
>>
>> I managed to install Cygwin and sufficient packages to build gdb and start the
>> testsuite.
>
> Cool!
>
> FYI, I cross build from Linux using this: https://github.com/palves/cygwin-cross
> and then test from Cygwin, by:
>
> - smb mounting the sources from linux
> - configuring only the testsuite, not the whole gdb (src/gdb/testsuite/configure)
> - copying over gdb.exe to the Windows box (via rsync)
> - pointing at the copied gdb with make check GDB=.../gdb.exe
>
> I do this because building on Linux is so much faster.
>
Hi Pedro,
yeah, things are slow indeed, and thanks for sharing this setup. For
now, I'll stick with the basic setup, and leave optimizations for later.
>> However, testsuite progress ground to a halt at gdb.base/branch-to-self.exp.
>> [ AFAICT, similar problems reported here [1]. ]
>
> The testsuite fails to kill gdb all the time for me, it may well be the same.
> I'll give it a try after you've merged it.
>
> Ecstatic that you found this!
>
Yeah, I hope that this helps :)
FWIW, I've run the testsuite again, and got stuck this time at
gdb.base/catch-fork-kill.exp. In this case, the problem didn't seem to
be gdb, but lingering test execs. After killing those manually using
"kill -9", the test-case finished.
The hang can be fixed by adding a missing "require supports_catch_syscall".
I was wondering, I saw you mention "I've got another series of patches
to improve such tests and skip others, and that's what I've been testing
with". Do you have similar fixes in that series? If so, can you submit
them, or share them in a branch?
>> I managed to reproduce this hang by running just the test-case.
>>
>> I attempted to kill the hanging processes by:
>> - first killing the inferior process, using the cygwin "kill -9" command, and
>> - then killing the gdb process, likewise.
>>
>> But the gdb process remained, and I had to point-and-click my way through task
>> manager to actually kill the gdb process.
>>
>> I investigated this by attaching to the hanging gdb process. Looking at the
>> main thread, I saw it was stopped in a call to WaitForSingleObject, with
>> the dwMilliseconds parameter set to INFINITE.
>>
>> In the docs [2], I read that using an INFINITE argument to WaitForSingleObject
>> might cause a system deadlock.
>
> Wow, unbelievable. Thanks a lot for going all the way and finding this.
>
>>
>> This prompted me to try this simple change in wait_for_single:
>> ...
>> while (true)
>> {
>> - DWORD r = WaitForSingleObject (handle, howlong);
>> + DWORD r = WaitForSingleObject (handle,
>> + howlong == INFINITE ? 100 : howlong);
>> + if (howlong == INFINITE && r == WAIT_TIMEOUT)
>> + continue;
>> ...
>> with the timeout of 0.1 second estimated to be:
>> - small enough for gdb to feel reactive, and
>> - big enough not to consume too many CPU cycles with looping.
>>
>> And indeed, the test-case, while still failing, now finishes in ~50 seconds.
>>
>> While there may be an underlying bug that triggers this behaviour, the failure
>> mode is so severe that I consider it a bug in itself.
>>
>
> Yeah. The documentation talks about the case of the code spawning windows, which
> I don't think we do. But still. I've seen so many weird things on Windows...
> It's just one more.
>
>> Fix this by avoiding calling WaitForSingleObject with INFINITE argument.
>
> Approved-By: Pedro Alves <pedro@palves.net>
>
Thanks for the review.
- Tom
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-05 16:15 ` Pedro Alves
2025-06-05 16:25 ` Pedro Alves
2025-06-06 7:12 ` Tom de Vries
@ 2025-06-06 13:11 ` Pedro Alves
2025-06-10 7:13 ` Tom de Vries
2 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2025-06-06 13:11 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
On 2025-06-05 17:15, Pedro Alves wrote:
>> However, testsuite progress ground to a halt at
>> gdb.base/branch-to-self.exp. [ AFAICT, similar problems reported
>> here [1]. ]
> The testsuite fails to kill gdb all the time for me, it may well be
> the same. I'll give it a try after you've merged it.
>
I tested this now, and, unfortunately, it makes no difference for me.
With upstream from before your patch, I do see a gdb.base/branch-to-self.exp hang.
However, after your patch, I still see the same hang.
If I attach another GDB to the hung GDB, I see several other GDB threads stopped in
WaitForMultipleObjects with an INFINITE argument, in Cygwin code, so I'm puzzled about
how changing that particular call made a difference for you. :-/ I wonder if it's just
a happy coincidence of affecting the scheduling.
I've long suspected that this is actually some deadlock bug in Cygwin somewhere around
closing stdin/stdout, and I still suspect so. But I can't discard it being a GDB bug.
I didn't use to have this issue maybe a year ago, then I stopped working on Windows for a
few months, and when I got back at it, I started seeing the issue. It's like some Cygwin
update, or Windows update caused the issue.
The hang is like other hangs I've observed before, it's around closing GDB to restart it
for more tests, and the patch at:
https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
still helps with it for me. With that one on top of yours, and with
gdb.base/branch-to-self.exp, I get:
WARNING: closing gdb failed with: child process exited abnormally
(the same as what I get without your patch) and the testcase moves on and
finishes properly (though a couple tests fail with timeout, as they do for you.)
If I look at the task manager while doing a full testsuite run, I see a bunch of leftover
gdb.exe processes and their children still running, because dejagnu fails to properly
close gdb.exe.
Pedro Alves
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-06 7:12 ` Tom de Vries
@ 2025-06-06 15:14 ` Pedro Alves
2025-06-06 15:27 ` Pedro Alves
0 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2025-06-06 15:14 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
On 2025-06-06 08:12, Tom de Vries wrote:
> FWIW, I've run the testsuite again, and got stuck this time at gdb.base/catch-fork-kill.exp. In this case, the problem didn't seem to be gdb, but lingering test execs. After killing those manually using "kill -9", the test-case finished.
>
> The hang can be fixed by adding a missing "require supports_catch_syscall".
>
> I was wondering, I saw you mention "I've got another series of patches to improve such tests and skip others, and that's what I've been testing with". Do you have similar fixes in that series? If so, can you submit them, or share them in a branch?
>
For fork-related tests, I added allow_fork_tests to be used with require, and
sprinkled that throughout. There is no way currently for the debugger to be notified
of forks and execs. There is no protocol between debugger and Cygwin for that.
So all fork tests fail.
So for gdb.base/catch-fork-kill.exp in particular, that's what I did:
diff --git a/gdb/testsuite/gdb.base/catch-fork-kill.exp b/gdb/testsuite/gdb.base/catch-fork-kill.exp
index 0fd853b9085..224a8dfec89 100644
--- a/gdb/testsuite/gdb.base/catch-fork-kill.exp
+++ b/gdb/testsuite/gdb.base/catch-fork-kill.exp
@@ -32,6 +32,8 @@
standard_testfile
+require allow_fork_tests
And then, the Windows backend doesn't currently support multi-process at all,
so all tests exercising that fail, sometimes with cascading timeouts. So I added
allow_multi_inferior_tests for that.
I did initially try to fix the testcases individually, one by one, to
gracefully handle fork-related (and multi-inferior-related) things failing and
bailing out early, but it was turning into a time sink for not much gain,
and I then decided to apply the "require" big hammer to move on.
Here are the subject lines of the testsuite-related patches (and gdb patches that the
tests depend on) that I'd like to submit, which mention the testcase names.
Some are WIP, and some are hacks that probably should be done differently.
I'll need a bit to clean them up somewhat, add commit logs, etc. and
either post them to the list or push them to a branch. I should probably do this
piecemeal, a patch at a time.
# gdb.multi/attach-no-multi-process.exp: Detect no remote non-stop
# gdb testsuite: Introduce allow_fork_tests and use it throughout
# gdb testsuite: Introduce allow_multi_inferior_tests and use it throughout
# Make gdb.threads/watchthreads.exp fetch registers of all threads
# Windows GDB: Avoid hang second attach (gdb.base/attach.exp)
# gdb.threads/schedlock.exp: Expected SIGTRAP too
# Windows all-stop, interrupt with "stopped" instead of SIGTRAP
# gdb.threads/schedlock.exp: handle "stopped" too (to squash)
# Extra "maint set show-debug-dregs" debug output (submit separately)
# Fix gdb.threads/multiple-successive-infcall.exp on Windows (depends on schedlock)
# Software watchpoints and threaded inferiors (submit independently)
# Skip gdb.base/libsegfault.exp on Cygwin (submit independently)
# gdb.threads/thread-execl, don't re-exec forever (submit independently)
# Support core dumping testcases with Cygwin's dumper (submit independently)
# Skip gdb.base/many-headers.exp on Cygwin (submit independently)
# Fix gdb.threads/multiple-successive-infcall.exp on Cygwin (hack)
# Adjust gdb.base/sigall.exp for Cygwin (submit independently)
# Adjust gdb.threads/schedlock.exp for Cygwin (submit independently)
# Adjust gdb.threads/multiple-step-overs.exp for Cygwin (submit independently)
# Adjust gdb.threads/step-over-trips-on-watchpoint.exp for Cygwin (submit independently)
# Adjust gdb.cp/cpexprs.exp for Cygwin (submit independently)
# remote target pid_to_string for gdb.server/ext-attach.exp (incomplete)
# Adjust gdb.base/bp-cond-failure.exp for Cygwin (submit separately)
# Adjust gdb.base/bp-permanent.exp for Cygwin (submit separately)
# Fix gdb.arch/amd64-watchpoint-downgrade.exp for Cygwin (submit separately)
# announce attach should be earlier (submit separately)
# Convert gdb.base/watchpoint-hw-attach.exp to spawn_wait_for_attach (submit separately)
Pedro Alves
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-06 15:14 ` Pedro Alves
@ 2025-06-06 15:27 ` Pedro Alves
2025-06-09 17:52 ` Pedro Alves
0 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2025-06-06 15:27 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
On 2025-06-06 16:14, Pedro Alves wrote:
> On 2025-06-06 08:12, Tom de Vries wrote:
>
>> FWIW, I've run the testsuite again, and got stuck this time at gdb.base/catch-fork-kill.exp. In this case, the problem didn't seem to be gdb, but lingering test execs. After killing those manually using "kill -9", the test-case finished.
>>
>> The hang can be fixed by adding a missing "require supports_catch_syscall".
>>
>> I was wondering, I saw you mention "I've got another series of patches to improve such tests and skip others, and that's what I've been testing with". Do you have similar fixes in that series? If so, can you submit them, or share them in a branch?
>>
>
> For fork-related tests, I added allow_fork_tests to be used with require, and
> sprinkled that throughout. There is no way currently for the debugger to be notified
> of forks and execs. There is no protocol between debugger and Cygwin for that.
> So all fork tests fail.
>
> So for gdb.base/catch-fork-kill.exp in particular, that's what I did:
>
> diff --git a/gdb/testsuite/gdb.base/catch-fork-kill.exp b/gdb/testsuite/gdb.base/catch-fork-kill.exp
> index 0fd853b9085..224a8dfec89 100644
> --- a/gdb/testsuite/gdb.base/catch-fork-kill.exp
> +++ b/gdb/testsuite/gdb.base/catch-fork-kill.exp
> @@ -32,6 +32,8 @@
>
> standard_testfile
>
> +require allow_fork_tests
>
>
OK, I've sent that patch now:
https://inbox.sourceware.org/gdb-patches/20250606152500.947489-1-pedro@palves.net/T/#u
Pedro Alves
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-06 15:27 ` Pedro Alves
@ 2025-06-09 17:52 ` Pedro Alves
2025-06-10 7:13 ` Tom de Vries
0 siblings, 1 reply; 12+ messages in thread
From: Pedro Alves @ 2025-06-09 17:52 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
Hi again,
On 2025-06-06 16:27, Pedro Alves wrote:
> OK, I've sent that patch now:
>
> https://inbox.sourceware.org/gdb-patches/20250606152500.947489-1-pedro@palves.net/T/#u
I think I've now sent all the testsuite patches that don't depend on the non-stop series,
or rather, the "set scheduler-locking on" support that comes with it.
Some of the posted patches have already been merged, and others have not.
I've pushed a new 'users/palves/windows-non-stop-v2-plus' branch to sourceware.org that contains
the whole non-stop v2 series rebased on today's master, plus all the testsuite patches that I've posted
that aren't merged yet, and the others that I've not posted yet. There are a few unposted GDB patches
in there too.
Pedro Alves
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-06 13:11 ` Pedro Alves
@ 2025-06-10 7:13 ` Tom de Vries
2025-06-10 9:51 ` Pedro Alves
0 siblings, 1 reply; 12+ messages in thread
From: Tom de Vries @ 2025-06-10 7:13 UTC (permalink / raw)
To: Pedro Alves, gdb-patches
On 6/6/25 15:11, Pedro Alves wrote:
> On 2025-06-05 17:15, Pedro Alves wrote:
>>> However, testsuite progress ground to a halt at
>>> gdb.base/branch-to-self.exp. [ AFAICT, similar problems reported
>>> here [1]. ]
>> The testsuite fails to kill gdb all the time for me, it may well be
>> the same. I'll give it a try after you've merged it.
>>
>
> I tested this now, and, unfortunately, it makes no difference for me.
Hi Pedro,
that's a pity.
FWIW, I've rebuilt gdb from scratch, this time with -O0, ran the
test-case 10 times, and it finished every time.
> With upstream from before your patch, I do see a gdb.base/branch-to-self.exp hang.
>
> However, after your patch, I still see the same hang.
>
> If I attach another GDB to the hung GDB, I see several other GDB threads stopped in
> WaitForMultipleObjects with an INFINITE argument, in Cygwin code, so I'm puzzled on
> how changing that particular call made a difference for you. :-/ Wonder if it's just
> a happy coincidence of affecting the scheduling.
>
Could be, I'm not sure.
FWIW, I've now managed to run the testsuite until gdb.base.sigterm.exp,
and it's stuck there.
> I've long suspected that this is actually some deadlock bug in Cygwin somewhere around
> closing stdin/stdout, and I still suspect so. But I can't discard it being a GDB bug.
>
If cygwin is problematic, I'm happy to ignore it and use one of the
other setups.
I've tried msys2, but there too I run into trouble
(https://sourceware.org/bugzilla/show_bug.cgi?id=33072), and I get even
less far than on cygwin: starting gdb is a problem because it produces
some control characters that the testsuite doesn't expect, and I can't
debug it because the gdb executable is missing debug info.
> I didn't use to have this issue maybe a year ago, then I stopped working on Windows for a
> few months, and when I got back at it, I started seeing the issue. It's like some Cygwin
> update, or Windows update caused the issue.
I also wonder whether it's easy to introduce regressions in gdb by
testing mingw but not cygwin.
Anyway, my windows/cygwin setup is fresh. I bought a windows laptop
this spring with windows 11 home, and installed cygwin on it.
> The hang is like other hangs I've observed before, it's around closing GDB to restart it
> for more tests, and the patch at:
>
> https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
>
> still helps with it for me. With that one on top of yours, and with
> gdb.base/branch-to-self.exp, I get:
>
> WARNING: closing gdb failed with: child process exited abnormally
>
> (the same as what I get without your patch) and the testcase moves on and
> finishes properly (though a couple tests fail with timeout, as they do for you.)
>
I've not seen that failure mode so far, but thanks for the fix, that
looks useful.
> If I look at the task manager while doing a full testsuite run, I see a bunch of leftover
> gdb.exe processes and their children still running, because dejagnu fails to properly
> close gdb.exe.
I also see other cases of processes left behind. I've filed this
(https://sourceware.org/bugzilla/show_bug.cgi?id=33061) for one of those.
I've also got a patch submitted here
(https://patchwork.sourceware.org/project/gdb/patch/20250217163619.8662-3-tdevries@suse.de/)
for another case.
Thanks,
- Tom
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-09 17:52 ` Pedro Alves
@ 2025-06-10 7:13 ` Tom de Vries
0 siblings, 0 replies; 12+ messages in thread
From: Tom de Vries @ 2025-06-10 7:13 UTC (permalink / raw)
To: Pedro Alves, gdb-patches
On 6/9/25 19:52, Pedro Alves wrote:
> Hi again,
>
> On 2025-06-06 16:27, Pedro Alves wrote:
>
>> OK, I've sent that patch now:
>>
>> https://inbox.sourceware.org/gdb-patches/20250606152500.947489-1-pedro@palves.net/T/#u
>
> I think I've now sent all the testsuite patches that don't depend on the non-stop series,
> or rather, the "set scheduler-locking on" support that comes with it.
>
> Some of the posted patches have already been merged, and others have not.
>
> I've pushed a new 'users/palves/windows-non-stop-v2-plus' branch to sourceware.org that contains
> the whole non-stop v2 series rebased on today's master, plus all the testsuite patches that I've posted
> that aren't merged yet, and the others that I've not posted yet. There are a few unposted GDB patches
> in there too.
Thanks a lot.
I'll give that branch a try.
- Tom
* Re: [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg
2025-06-10 7:13 ` Tom de Vries
@ 2025-06-10 9:51 ` Pedro Alves
0 siblings, 0 replies; 12+ messages in thread
From: Pedro Alves @ 2025-06-10 9:51 UTC (permalink / raw)
To: Tom de Vries, gdb-patches
On 2025-06-10 08:13, Tom de Vries wrote:
> On 6/6/25 15:11, Pedro Alves wrote:
>> I've long suspected that this is actually some deadlock bug in Cygwin somewhere around
>> closing stdin/stdout, and I still suspect so. But I can't discard it being a GDB bug.
>>
>
> If cygwin is problematic, I'm happy to ignore it and use one of the other setups.
>
> I've tried msys2, but also there I run into trouble ( https://sourceware.org/bugzilla/show_bug.cgi?id=33072 ) and I get even less far than on cygwin: starting gdb is a problem because it produces some control characters that the testsuite doesn't expect, and I can't debug it because the gdb executable is missing debug info.
That's because when you say "msys2", you're really saying that you tried to test MinGW GDB with MSYS2 DejaGnu.
Thing is, MSYS2 _is_ Cygwin. Or rather, it's a Cygwin fork with a few extra patches that make it easier
to bridge Cygwin programs and MinGW programs. It's kind of as if they were two different distros, like Linux distros, but of Cygwin.
For my Windows non-stop work, it's important that I test Cygwin GDB, because the Windows native target has
special support for some extra Cygwin-only features, like Cygwin signals.
>
>> I didn't use to have this issue maybe a year ago, then I stopped working on Windows for a
>> few months, and when I got back at it, I started seeing the issue. It's like some Cygwin
>> update, or Windows update caused the issue.
>
> I also wonder whether it's easy to introduce regressions in gdb by testing mingw but not cygwin.
We need to test both. For the Windows target backend (windows-nat.c and friends), most of the code is
common between Cygwin and MinGW, but there are a few important differences. For example, the support
for Cygwin signals just doesn't exist on MinGW, because, well, it's a Cygwin thing. And then,
the common code outside windows-nat.c takes different paths (POSIX vs native Windows APIs) in
important areas, like Ctrl-C handling, terminal I/O and event loop, to name a few.
I'm very much interested in testing mingw GDB too, BTW. That is actually my end goal.
>
> Anyway, my windows/cygwin setup is fresh. I bought a windows laptop this spring with windows 11 home, and installed cygwin on it.
FWIW, I run Windows on a VM (on Linux/KVM/libvirt). Simon also recently installed Cygwin and didn't see
the same issue I see. I'll need to try recreating my setup from scratch, maybe this issue just goes away...
>
>> The hang is like other hangs I've observed before, it's around closing GDB to restart it
>> for more tests, and the patch at:
>>
>> https://sourceware.org/pipermail/gdb-patches/2025-May/217949.html
>>
>> still helps with it for me. With that one on top of yours, and with
>> gdb.base/branch-to-self.exp, I get:
>>
>> WARNING: closing gdb failed with: child process exited abnormally
>>
>> (the same as what I get without your patch) and the testcase moves on and
>> finishes properly (though a couple tests fail with timeout, as they do for you.)
>>
>
> I've not seen that failure mode so far, but thanks for the fix, that looks useful.
>
>> If I look at the task manager while doing a full testsuite run, I see a bunch of leftover
>> gdb.exe processes and their children still running, because dejagnu fails to properly
>> close gdb.exe.
>
> I also see other cases of processes left behind. I've filed this (https://sourceware.org/bugzilla/show_bug.cgi?id=33061 ) for one of those.
>
> I've also got a patch submitted here ( https://patchwork.sourceware.org/project/gdb/patch/20250217163619.8662-3-tdevries@suse.de/ ) for another case.
>
Thanks, I hadn't seen it.
Pedro Alves
Thread overview: 12+ messages
2025-06-05 15:03 [PATCH] [gdb/tdep] Don't call WaitForSingleObject with INFINITE arg Tom de Vries
2025-06-05 16:15 ` Pedro Alves
2025-06-05 16:25 ` Pedro Alves
2025-06-06 6:27 ` Tom de Vries
2025-06-06 7:12 ` Tom de Vries
2025-06-06 15:14 ` Pedro Alves
2025-06-06 15:27 ` Pedro Alves
2025-06-09 17:52 ` Pedro Alves
2025-06-10 7:13 ` Tom de Vries
2025-06-06 13:11 ` Pedro Alves
2025-06-10 7:13 ` Tom de Vries
2025-06-10 9:51 ` Pedro Alves