On 24-03-2020 16:35, Simon Marchi wrote: > Hi Tom, > > The test fails for me more often than not. I've attached a gdb.log showing one such > failure. There seems to be a problem matching the output of the last "continue". > Hi Simon, I've managed to reproduce that, by running the test-case in parallel with stress -c 5. > This is what I see when I reproduce the case by hand: > > (gdb) c > Continuing. > Couldn't get registers: No such process. > Couldn't get registers: No such process. > (gdb) [Thread 0x7ffff7d99700 (LWP 514079) exited] > > Program terminated with signal SIGKILL, Killed. > The program no longer exists. > I've modified the test-case to check for the amount of "Couldn't get registers: No such process." between the continue and the following gdb prompt, and handle that appropriately. Hopefully this will fix the FAILs you're seeing. > I didn't really understand why we saw a prompt coming back before the other messages, > so I looked into it a bit and this is what I think happens: > > 1. While we are stopped at the prompt, the linux-nat event handler is unregistered from the event loop > 2. Inferior gets SIGKILL'ed, so GDB gets SIGCHLD'ed, that posts an event to the event pipe, but since it's > not registered in the event loop, nothing happens > 3. User does continue, the linux-nat event handler gets registered with the event loop. Then, the continue > fails (No such process), which brings us back at the prompt. However, the linux-nat event handler stays > registered with the event loop. > 4. When we come back to the event loop, we process the event for the SIGKILL, which makes GDB print the > thread exit message and "Program terminated" message. > > Normally, after the "continue" fails, I don't think we would want to leave the linux-nat > handler registered with the event loop: if it was not registered before, why should it be > registered after? However, if it wasn't left there, we wouldn't see the messages saying > the program has terminated, so that wouldn't be good either. > > Maybe there's a better way to handle it? > Thanks for the investigation. Unfortunately I'm not able to comment on this. > However, I still think it would be a good idea to merge a patch like yours, it's already a > step forward (especially since it fixes a regression). > >> diff --git a/gdb/testsuite/lib/gdb.exp b/gdb/testsuite/lib/gdb.exp >> index 81518b9646..745694df2d 100644 >> --- a/gdb/testsuite/lib/gdb.exp >> +++ b/gdb/testsuite/lib/gdb.exp >> @@ -571,7 +571,7 @@ proc runto { function args } { >> } >> return 1 >> } >> - -re "Breakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { >> + -re "\[Bb\]reakpoint \[0-9\]*, \[0-9xa-f\]* in .*$gdb_prompt $" { >> if { $print_pass } { >> pass $test_name >> } >> diff --git a/gdb/thread.c b/gdb/thread.c >> index 54b59e2244..9876ca3c76 100644 >> --- a/gdb/thread.c >> +++ b/gdb/thread.c >> @@ -1444,6 +1444,8 @@ scoped_restore_current_thread::restore () >> >> scoped_restore_current_thread::~scoped_restore_current_thread () >> { >> + if (m_inf == NULL) >> + return; >> if (!m_dont_restore) >> { >> try >> @@ -1488,7 +1490,17 @@ scoped_restore_current_thread::scoped_restore_current_thread () >> else >> frame = NULL; >> >> - m_selected_frame_id = get_frame_id (frame); >> + try >> + { >> + m_selected_frame_id = get_frame_id (frame); >> + } >> + catch (const gdb_exception &ex) >> + { >> + m_inf = NULL; >> + m_selected_frame_id = null_frame_id; >> + m_selected_frame_level = -1; >> + return; >> + } > > The indentation is a bit off here. > > Instead of clearing everything, I think we should just set m_selected_frame_id to > null_frame_id and m_selected_frame_level to -1. m_inf and m_thread can still be > set. This way, the right inferior will be restored, at least, I think this is > desirable. scoped_restore_current_thread::restore handles well the case where the > thread to restore has exited. The thread_info object is refcounted for this exact > use case, where the thread would get deleted while > > In other words: > > try > { > m_selected_frame_id = get_frame_id (frame); > m_selected_frame_level = frame_relative_level (frame); > } > catch (const gdb_exception &ex) > { > m_selected_frame_id = null_frame_id; > m_selected_frame_level = -1; > } Yep, that works. Here's the updated patch. Thanks, - Tom