From: Joel Brobecker <brobecker@adacore.com>
To: gdb-patches@sourceware.org
Subject: RFH: failed assert debugging threaded+fork program over gdbserver
Date: Thu, 12 May 2016 17:16:00 -0000
Message-ID: <20160512171650.GC26324@adacore.com>
Hello,
I have noticed the following problem when debugging a program which
uses both threads and fork. The program is attached, and
it was compiled by simply doing:
% gnatmake -g a_test
The issue appears only randomly, but it seems to show up fairly
reliably when using certain versions of GNU/Linux such as RHES7,
or WRSLinux. I also see it on Ubuntu, but less reliably. Here is
what I have found, debugging on WRSLinux (we set it up as a cross,
but it should be the same with native GNU/Linux distros):
% gdb a_test
(gdb) break a_test.adb:30
(gdb) break a_test.adb:39
(gdb) target remote my_board:4444
(gdb) continue
Continuing.
[...]
[New Thread 866.868]
[New Thread 866.869]
[New Thread 870.870]
/[...]/gdb/thread.c:89: internal-error: thread_info* inferior_thread(): Assertion `tp' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
The error happens because GDBserver returns a list of threads
to GDB where a new thread has a different PID (870 in the case
above, instead of 866).
This makes remote_notice_new_inferior think that there is a new
inferior (which is actually true), thus causing it to call
remote_add_inferior, which does the following:
    /* In the traditional debugging scenario, there's a 1-1 match
       between program/address spaces. We simply bind the inferior
       to the program space's address space. */
    inf = current_inferior ();
    inferior_appeared (inf, pid);
These two lines cause the PID of the current inferior to be changed
to the PID of the new process (from the fork). This is where I *think*
we're making the mistake; see below...
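To make the effect of those two lines concrete, here is a small
stand-alone C sketch. It is a toy model and not GDB code: the names
toy_inferior, toy_find_inferior_pid and toy_rebind_current are made up
for illustration, and the only things carried over from the session
above are the two PIDs (866 and 870). It shows that once the PID of
the single inferior is overwritten, a lookup by the original PID can
no longer succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an inferior; hypothetical, not GDB's struct inferior.  */
struct toy_inferior
{
  int num;
  int pid;
};

/* The whole "inferior list": a single entry, as in the session above.  */
static struct toy_inferior toy_inferiors[1];
static struct toy_inferior *toy_current;

/* Look up an inferior by PID; returns NULL if none matches.  */
static struct toy_inferior *
toy_find_inferior_pid (int pid)
{
  size_t n = sizeof toy_inferiors / sizeof toy_inferiors[0];

  for (size_t i = 0; i < n; i++)
    if (toy_inferiors[i].pid == pid)
      return &toy_inferiors[i];
  return NULL;
}

/* Mimic binding the *current* inferior to a newly seen PID.  */
static void
toy_rebind_current (int pid)
{
  assert (toy_current != NULL);
  toy_current->pid = pid;
}

/* Return 1 if the parent PID can still be found after the child PID
   has been bound to the current inferior, 0 otherwise.  */
int
toy_parent_survives_rebind (int parent_pid, int child_pid)
{
  toy_inferiors[0].num = 1;
  toy_inferiors[0].pid = parent_pid;
  toy_current = &toy_inferiors[0];

  toy_rebind_current (child_pid);

  return toy_find_inferior_pid (parent_pid) != NULL;
}
```

Calling toy_parent_survives_rebind (866, 870) returns 0 in this model:
the list still holds exactly one inferior, but nothing answers to PID
866 anymore.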
However, remote_notice_new_inferior also calls notice_new_inferior
a few lines later, which starts by setting up a cleanup to restore
the current thread:
    if (!ptid_equal (inferior_ptid, null_ptid))
      make_cleanup_restore_current_thread ();
At that point in time, the current thread, which is the thread
that received the event, is one of the threads belonging to
the original inferior (pid=866). That's what we set up to restore.
And unfortunately, the restoration does not go according to plan,
because our inferior list still has only one inferior in it, except
that its PID is no longer 866, but rather 870. If we look at
thread.c::do_restore_current_thread_cleanup, we see:
    tp = find_thread_ptid (old->inferior_ptid);
    /* If the previously selected thread belonged to a process that has
       in the mean time been deleted (due to normal exit, detach, etc.),
       then don't revert back to it, but instead simply drop back to no
       thread selected. */
    if (tp
        && find_inferior_ptid (tp->ptid) != NULL)
      restore_current_thread (old->inferior_ptid);
    else
      {
        restore_current_thread (null_ptid);
        set_current_inferior (find_inferior_id (old->inf_id));
      }
In our case, find_inferior_ptid no longer finds an inferior with
the old PID, and so we go into the else branch, causing us to
set the inferior_ptid to the null_ptid. This causes problems
a little later, when doing a normal_stop:
    /* Notify observers about the stop. This is where the interpreters
       print the stop event. */
    if (!ptid_equal (inferior_ptid, null_ptid))
      observer_notify_normal_stop (inferior_thread ()->control.stop_bpstat,
                                   stop_print_frame);
    else
      observer_notify_normal_stop (NULL, stop_print_frame);
In our case, we're in the "else" branch. This leads to cli_on_normal_stop,
which calls print_stop_event -> print_stop_location, which starts by
calling inferior_thread:
    struct thread_info *tp = inferior_thread ();
And looking at inferior_thread, we see:
    struct thread_info *tp = find_thread_ptid (inferior_ptid);
    gdb_assert (tp);
We trip the assertion because inferior_ptid is the null_ptid, for
which find_thread_ptid finds no thread.
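The whole chain, from the cleanup to the failed lookup, can be
replayed in a few lines of stand-alone C. Again, this is only a sketch
under simplifying assumptions: every toy_* name is hypothetical, the
thread list is hard-coded from the "[New Thread ...]" lines above, and
an "inferior exists" test is reduced to comparing against a single
PID. It is meant to illustrate the sequence of events, not to mirror
the actual GDB implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy ptid: just a (pid, lwp) pair.  Hypothetical, not GDB's ptid_t.  */
struct toy_ptid
{
  int pid;
  long lwp;
};

static const struct toy_ptid toy_null_ptid = { 0, 0 };

/* Threads reported in the session above.  */
static const struct toy_ptid toy_threads[] = {
  { 866, 866 }, { 866, 868 }, { 866, 869 }, { 870, 870 }
};

static int toy_inferior_pid;              /* PID of the only inferior.  */
static struct toy_ptid toy_inferior_ptid; /* Currently selected thread.  */

static int
toy_ptid_equal (struct toy_ptid a, struct toy_ptid b)
{
  return a.pid == b.pid && a.lwp == b.lwp;
}

/* Return the matching thread, or NULL if there is none.  */
static const struct toy_ptid *
toy_find_thread_ptid (struct toy_ptid ptid)
{
  size_t n = sizeof toy_threads / sizeof toy_threads[0];

  for (size_t i = 0; i < n; i++)
    if (toy_ptid_equal (toy_threads[i], ptid))
      return &toy_threads[i];
  return NULL;
}

/* Same shape as the cleanup quoted above: only go back to the saved
   thread if its process still maps to a known inferior.  */
static void
toy_restore_current_thread (struct toy_ptid old)
{
  const struct toy_ptid *tp = toy_find_thread_ptid (old);

  if (tp != NULL && tp->pid == toy_inferior_pid)
    toy_inferior_ptid = old;
  else
    toy_inferior_ptid = toy_null_ptid;
}

/* Return 1 if a later "find the current thread or assert" step would
   fail, 0 otherwise.  REBIND_TO_CHILD selects whether the inferior's
   PID gets overwritten with the child's PID before the cleanup runs.  */
int
toy_stop_would_assert (int rebind_to_child)
{
  struct toy_ptid saved = { 866, 866 };

  assert (toy_find_thread_ptid (saved) != NULL);
  toy_inferior_pid = 866;
  toy_inferior_ptid = saved;

  if (rebind_to_child)
    toy_inferior_pid = 870;

  toy_restore_current_thread (saved);

  return toy_find_thread_ptid (toy_inferior_ptid) == NULL;
}
```

In this model, toy_stop_would_assert (1) returns 1 and
toy_stop_would_assert (0) returns 0, which matches the analysis: the
lookup only fails when the inferior's PID was changed underneath the
saved thread.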
At first sight, I think that the main problem is that we muck with the
current_inferior's pid when we really shouldn't. I'm not really sure
how the new PID should be handled though, which is why I'm asking
for advice here.
I think it also unearthed a secondary issue: it looks like normal_stop
really isn't prepared to handle a null inferior_ptid, even though
the explicit check for a null inferior_ptid before notifying the
observers suggests that it should be. But what does it mean to be showing
where we stopped when we don't know which thread caused the stop?
I think discussing this separately would be best, but I wanted to
mention it here so it doesn't get overlooked.
Any advice on how I should be fixing the issue?
Thanks!
--
Joel
[-- Attachment #2: a_test.adb --]
[-- Type: text/plain, Size: 1652 bytes --]
with System;
with Text_Io;
procedure A_Test is
   Command : constant String := "echo Hello World &" & Ascii.Nul;
   Status : Integer;
   ---------------------------------
   -- System shell command interface
   ---------------------------------
   function Shell_Cmd (Command : in System.Address) return Integer;
   pragma Import (C, Shell_Cmd, "system");
   ------------------
   -- Start 2 threads
   ------------------
   task Task1 is
      entry Start_Task;
   end Task1;
   task Task2 is
      entry Start_Task;
   end Task2;
   task body Task1 is
   begin
      select -- Task 1
         accept Start_Task;
      or
         terminate;
      end select;
   end Task1;
   task body Task2 is
   begin
      select -- Task 2
         accept Start_Task;
      or
         terminate;
      end select;
   end Task2;
   procedure My_Routine is
   begin
      Text_Io.Put_Line ("Program successful");
   end My_Routine;
begin
   Status := Shell_Cmd (Command'Address);
   --------------------------------------------------------------
   -- From "gvd" set break point on line below by selecting line.
   -- Run program by clicking "Run" button, some warnings are
   -- logged and process goes into limbo.
   --
   -- Note, it appears to work if you "gdb" command line to set
   -- break point as follows: "b My_Routine".
   -- However, it does NOT work using "b 50".
   -- Turns out that "gdb" works intermittently, only.
   --
   -- Another interesting anomaly is that the process never seems
   -- to quit when run from "gvd" or "gdb" and no break points,
   -- but does run to completion without the debugger.
   --------------------------------------------------------------
   My_Routine;
end A_Test;