From: Pedro Alves <pedro@palves.net>
To: Andrew Burgess <andrew.burgess@embecosm.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: better handling of 'S' packets
Date: Fri, 8 Jan 2021 00:51:20 +0000
Message-ID: <c4870f27-b11b-8bdb-bdda-e4672ffa755e@palves.net>
In-Reply-To: <20201111153548.1364526-1-andrew.burgess@embecosm.com>
Hi Andrew,
I think I spotted one bug below. Otherwise, just minor comments.
On 11/11/20 15:35, Andrew Burgess wrote:
> This commit builds on work started in the following two commits:
>
> commit 24ed6739b699f329c2c45aedee5f8c7d2f54e493
> Date: Thu Jan 30 14:35:40 2020 +0000
>
> gdb/remote: Restore support for 'S' stop reply packet
>
> commit cada5fc921e39a1945c422eea055c8b326d8d353
> Date: Wed Mar 11 12:30:13 2020 +0000
>
> gdb: Handle W and X remote packets without giving a warning
>
> This is related to how GDB handles remote targets that send back 'S'
> packets.
>
> In the first of the above commits we fixed GDB's ability to handle a
> single process, single threaded target that sends back 'S' packets.
> Although the 'T' packets would always be preferred to 'S' these days,
> there's nothing really wrong with 'S' for this situation.
>
> The second commit above fixed an oversight in the first commit, a
> single-process, multi-threaded target can send back a process wide
> event, for example the process exited event 'W' without including a
> process-id, this also is fine as there is no ambiguity in this case.
>
> In PR gdb/26819 however we start to move towards "better" handling of
> more ambiguous cases. In this bug openocd is used to drive the spike
> RISC-V simulator. In this particular case a multi-core system is
> being simulated and presented to GDB as two threads. GDB then is
> seeing a single process, two thread system. Unfortunately the
> target (openocd) is still sending back 'S' packets, these are the
> packets that _don't_ include a thread-id.
>
> It is my opinion that this target, in this particular configuration,
> is broken. Even though it is possible, by being very careful with how
> GDB is configured to ensure that GDB only ever tries to run one thread
> at a time, I feel that any target that presents multiple threads to
> GDB should be making use of the 'T' stop packet, combined with sending
> a thread-id.
>
> However, with that caveat out of the way, I think this bug report does
> reveal a number of ways that GDB could be improved.
>
> Firstly, the main issue reported in the bug was that GDB would exit
> with this assertion:
>
> infrun.c:5690: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
>
> I think it's fair to say that having a target send back 'S' packets
> when it should use 'T' is _not_ an excuse for GDB to throw an
> assertion.
>
Yeah, assertions aren't ideal.
> What's happening is that GDB connects to the 2 core system. Core 2 is
> selected, and a program loaded. A breakpoint is placed in main and we
> continue, this results in this packet exchange:
>
> Sending packet: $vCont;c#a8...Packet received: T05thread:2;
>
> That's good, all cores ran, and the remote told GDB that we stopped in
> thread 2. Next the user does `stepi` and this results in this packet
> exchange:
>
> Sending packet: $vCont;s:2#24...Packet received: S05
>
> Here GDB is trying to step only thread 2, and the target replies with
> an 'S' packet. Though it feels like sending back 'T05thread:2;' would
> be so much simpler here, there's nothing fundamentally wrong ambiguous
> about the exchange.
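To make the difference between the two replies concrete, here is a minimal sketch of pulling a thread-id out of a stop reply. The function name stop_reply_thread is made up for illustration and this is not how remote.c parses packets; it only handles a plain hex thread-id (no multiprocess "pPID.TID" form).

```cpp
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical helper, not GDB code.  Return the thread-id carried by a
// stop reply, or std::nullopt if the packet carries none.
std::optional<unsigned long>
stop_reply_thread (const std::string &pkt)
{
  // Only 'T' packets ("TAAn1:r1;n2:r2;...") can carry a thread-id;
  // an 'S' packet is just "SAA".
  if (pkt.size () < 3 || pkt[0] != 'T')
    return std::nullopt;

  const std::string key = "thread:";
  std::size_t pos = pkt.find (key, 3);
  if (pos == std::string::npos)
    return std::nullopt;

  std::size_t start = pos + key.size ();
  std::size_t end = pkt.find (';', start);
  if (end == std::string::npos)
    return std::nullopt;

  try
    {
      // Thread ids are hex on the wire.
      return std::stoul (pkt.substr (start, end - start), nullptr, 16);
    }
  catch (const std::exception &)
    {
      return std::nullopt;
    }
}
```

With this, "T05thread:2;" yields thread 2, while "S05" yields nothing, which is the case where process_stop_reply has to go looking for a thread.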
>
> Inside GDB the problem we're running into is within the function
> remote_target::process_stop_reply. When a stop reply doesn't include
> a thread-id (or process-id) it is this function that is responsible
> for looking one up. Currently we always just select the first
> non-exited thread, so in this case thread 1.
>
> As the step was issued as part of the step over a breakpoint logic,
> which was specifically run for thread-2 GDB is expecting the event to
> be reported in thread-2, and hence when we try to handle thread-1 we
> trigger the above assertion.
>
> My proposal is to improve the logic used in process_stop_reply to make
> the thread selection smarter.
I don't object to this. As a guiding principle, I'd aim at preferring
simplicity over being super tolerant of "bad" stubs, especially if the
workaround complicates the code or somehow gets in the way in the future.
This still seems to be within tolerable levels. And we do get the warning.
Is the stub getting fixed as well?
>
> My first thought was that each thread has an 'executing' flag, instead
> of picking the first non-exited thread, we should pick a non-exited
> thread that is currently executing. The logic being that stop events
> shouldn't arrive for threads that are no executing.
>
> The problem with this is the very first initial connection.
>
> When GDB first connects to the remote target it is told about all the
> existing threads. These are all created by GDB in the non-executing
> state. Another part of the connecting logic is to send the remote the
> '?' packet, which asks why the target halted. This sends back a stop
> packet which is then processed. At this point non of the threads are
"non" -> "none"
> marked executing so we would end up with no suitably matching threads.
>
> This left me with two rules:
>
> 1. Select the first non-exited thread that is marked executing, or
> 2. If no threads match rule 1, select the first non-exited thread
> whether it is executing or not.
>
> This seemed fine, and certainly resolved the issue seen in the
> original bug report. So then I tried to create a test for this using
> a multi-threaded test program and `gdbserver --disable-packet=T`.
>
> I wasn't able to get anything that exactly reproduced the original
> bug, but I was able to hit similar issues where GDB would try to step
> one thread but GDB would handle the step (from the step) in a
> different thread. In some of these cases there was genuine ambiguity
> in the reply from the target, however, it still felt like GDB could do
> a better job at guessing which thread to select for the stop event.
>
> I wondered if we could make use of the 'continue_thread' and/or the
> 'general_thread' to help guide the choice of thread.
>
> In the end I settled on these rules for thread selection:
>
> [ NOTE: For all the following rules, only non-exited threads are
> considered. ]
>
> 1. If the continue_thread is set to a single specific thread, and
> that thread is executing, then assume this is where the event
> occurred.
>
> 2. If nothing matches rule 1, then if the general_thread is set to a
> single specific thread, and that thread is executing, assume this is
> where the event occurred.
>
> 3. If nothing matches rule 2 then select the first thread that is
> marked as executing.
>
> 4. If nothing matches rule 3 then select the first thread.
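For reference, the four rules can be sketched roughly as below. This is only an illustration under simplifying assumptions: sketch_thread and guess_stop_thread are hypothetical names, threads are identified by a bare int rather than a ptid_t, and "not set to a single specific thread" is modelled as std::nullopt.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for GDB's thread_info; not the real type.
struct sketch_thread
{
  int id;
  bool exited;
  bool executing;
};

// Return the id of the thread a thread-less stop reply is attributed
// to, or std::nullopt if there are no non-exited threads at all.
std::optional<int>
guess_stop_thread (const std::vector<sketch_thread> &threads,
                   std::optional<int> continue_thread,
                   std::optional<int> general_thread)
{
  // Find a non-exited, executing thread with the given id, if any.
  auto find_executing = [&] (std::optional<int> id) -> const sketch_thread *
    {
      if (!id.has_value ())
        return nullptr;
      for (const sketch_thread &t : threads)
        if (!t.exited && t.executing && t.id == *id)
          return &t;
      return nullptr;
    };

  // Rule 1: the continue_thread, if specific and executing.
  if (const sketch_thread *t = find_executing (continue_thread))
    return t->id;

  // Rule 2: the general_thread, if specific and executing.
  if (const sketch_thread *t = find_executing (general_thread))
    return t->id;

  // Rule 3: the first executing thread.  Rule 4: the first thread.
  const sketch_thread *first = nullptr;
  for (const sketch_thread &t : threads)
    {
      if (t.exited)
        continue;
      if (t.executing)
        return t.id;
      if (first == nullptr)
        first = &t;
    }

  if (first != nullptr)
    return first->id;
  return std::nullopt;
}
```

Rule 4 is what covers the initial '?' stop reply, where no thread has been marked executing yet.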
>
> This works fine except for one small problem, when GDB is using the
> vcont packets we don't need to send 'Hc' packets to the target and so
vcont -> vCont
> the 'continue_thread' is never set.
>
> In this commit I add a new record_continue_thread function, this sets
> the continue_thread without sending a 'Hc' packet. This effectively
> serves as a cache for which thread did we set running.
>
> The only slight "wart" here is that when GDB steps a thread the
> continue_thread is not set to a specific single thread-id, rather it
> gets set to either minus_one_ptid or to a specific processes ptid. In
> this case (when a step is requested) I store the ptid of the stepping
> thread.
>
>
> diff --git a/gdb/remote.c b/gdb/remote.c
> index 71f814efb36..0020a1ee3c5 100644
> --- a/gdb/remote.c
> +++ b/gdb/remote.c
> @@ -747,6 +747,9 @@ class remote_target : public process_stratum_target
> ptid_t process_stop_reply (struct stop_reply *stop_reply,
> target_waitstatus *status);
>
> + ptid_t guess_thread_for_ambiguous_stop_reply
> + (const struct target_waitstatus *status);
> +
> void remote_notice_new_inferior (ptid_t currthread, int executing);
>
> void process_initial_stop_replies (int from_tty);
> @@ -2576,6 +2579,22 @@ record_currthread (struct remote_state *rs, ptid_t currthread)
> rs->general_thread = currthread;
> }
>
> +/* Called from the vcont packet generation code. Unlike the old thread
vcont -> vCont
> + control packets, which rely on sending a Hc packet before sending the
> + continue/step packet, with vcont no Hc packet is sent.
vcont -> vCont
> +
> + As a result the remote state's continue_thread field is never updated.
> +
> + Sometime though it can be useful if we do have some information about
> + which thread(s) the vcont tried to continue/step as this can be used to
vcont -> vCont
> + guide the choice of thread in the case were a miss-behaving remote
> + doesn't include a thread-id in its stop packet. */
> +static void
Missing empty line between comment and function.
Note that a single vCont packet can include multiple resumptions,
e.g.:
vCont;s:1;s:2;c:3
though currently only in non-stop RSP. Maybe just mention that.
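To illustrate why a single cached continue_thread cannot describe every vCont, here is a small sketch of how such a packet body is put together. sketch_action and build_vcont are hypothetical names, not the append_resumption code in remote.c, and signals and the multiprocess thread-id syntax are left out.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical description of one resumption: 's' (step) or 'c'
// (continue), applied to one thread.
struct sketch_action
{
  char kind;
  unsigned thread;
};

// Build the body of a vCont packet (without the '$' and checksum
// framing).  Thread ids are written in hex, as on the wire.
std::string
build_vcont (const std::vector<sketch_action> &actions)
{
  std::string pkt = "vCont";
  for (const sketch_action &a : actions)
    {
      char buf[32];
      std::snprintf (buf, sizeof buf, ";%c:%x", a.kind, a.thread);
      pkt += buf;
    }
  return pkt;
}
```

Given the actions {'s', 1}, {'s', 2}, {'c', 3} this produces "vCont;s:1;s:2;c:3", a packet that resumes three threads at once, so no one thread is "the" continue thread.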
> +record_continue_thread (struct remote_state *rs, ptid_t thr)
> +{
> + rs->continue_thread = thr;
> +}
> +
> /* If 'QPassSignals' is supported, tell the remote stub what signals
> it can simply pass through to the inferior without reporting. */
>
> @@ -6227,6 +6246,8 @@ remote_target::remote_resume_with_vcont (ptid_t ptid, int step,
> char *p;
> char *endp;
>
> + record_continue_thread (get_remote_state (), ptid);
You're calling this before the check below that tests whether vCont is supported
at all. That means that if vCont isn't supported, by the time we get to
remote_resume_with_hc and its set_thread call (via set_continue_thread), the
continue_thread is already updated in gdb, but it was never updated on the remote side:
  void
  remote_target::set_thread (ptid_t ptid, int gen)
  {
    struct remote_state *rs = get_remote_state ();
    ptid_t state = gen ? rs->general_thread : rs->continue_thread;
    char *buf = rs->buf.data ();
    char *endbuf = buf + get_remote_packet_size ();
    if (state == ptid)
      return;
  ...
I.e., set_thread sees that the cached value already matches and returns early
without sending Hc, so it seems to me that gdb and the remote get out of sync?
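Here is a toy model of what I mean. Everything in it is a hypothetical simplification (sketch_remote and its members are not GDB code, thread ids are plain ints printed in decimal, and the "stub" is just a field), but the ordering of the cache update relative to the vCont check mirrors the patch as posted.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified model of the GDB/stub pair.
struct sketch_remote
{
  int gdb_continue_thread = 0;   // GDB's cached idea of the Hc thread.
  int stub_continue_thread = 0;  // What the stub actually has selected.
  std::vector<std::string> sent; // Packets that went over the wire.

  // Like record_continue_thread in the patch: update the cache only.
  void record_continue_thread (int thr)
  { gdb_continue_thread = thr; }

  // Like set_thread for the Hc case: skip the packet if the cache
  // already matches.
  void set_continue_thread (int thr)
  {
    if (gdb_continue_thread == thr)
      return;
    sent.push_back ("Hc" + std::to_string (thr));
    stub_continue_thread = thr;
    gdb_continue_thread = thr;
  }

  // Resume THR.  The cache is updated before checking for vCont
  // support, as in the posted patch.
  void resume (int thr, bool vcont_supported)
  {
    record_continue_thread (thr);
    if (vcont_supported)
      {
        sent.push_back ("vCont;c:" + std::to_string (thr));
        return;
      }
    set_continue_thread (thr);  // Returns early: no Hc is sent.
    sent.push_back ("c");
  }
};
```

Calling resume (2, false) on a fresh sketch_remote sends only "c": the Hc packet is skipped, gdb_continue_thread is 2, and stub_continue_thread is still 0. Moving the record_continue_thread call to after the vCont-supported check would avoid that in this model.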
> +
> /* No reverse execution actions defined for vCont. */
> if (::execution_direction == EXEC_REVERSE)
> return 0;
> @@ -6264,6 +6285,7 @@ remote_target::remote_resume_with_vcont (ptid_t ptid, int step,
> {
> /* Step inferior_ptid, with or without signal. */
> p = append_resumption (p, endp, inferior_ptid, step, siggnal);
> + record_continue_thread (get_remote_state (), inferior_ptid);
> }
>
> /* Also pass down any pending signaled resumption for other
> @@ -7671,6 +7693,191 @@ remote_notif_get_pending_events (remote_target *remote, notif_client *nc)
> remote->remote_notif_get_pending_events (nc);
> }
>
> +/* Called from process_stop_reply when the stop packet we are responding
> + too didn't include a process-id or thread-id. STATUS is the stop event
"responding too" -> "responding to".
> + we are responding too.
Ditto.
> +
> + It is the task of this function to find (guess) a suitable thread and
> + return its ptid, this is the thread we will assume the stop event came
> + from.
> +
> + In some cases there really isn't any guessing going on, a basic remote
> + with a single process containing a single thread might choose not to
> + send any process-id or thread-id in its stop packets, this function will
> + select and return the one and only thread.
> +
> + However, there are targets out there which are.... not great, and in
> + some cases will support multiple threads but still don't include a
> + thread-id. In these cases we try to do the best we can when selecting a
> + thread, but in the general case we can never know for sure we have
> + picked the correct thread. As a result this function can issue a
> + warning to the user if it detects that there is the possibility that we
> + really are guessing at which thread to report. */
> +
> +ptid_t
> +remote_target::guess_thread_for_ambiguous_stop_reply
> + (const struct target_waitstatus *status)
Should be indented with 2 spaces instead of a tab.
> +{
> + /* Some stop events apply to all threads in an inferior, while others
> + only apply to a single thread. */
> + bool is_stop_for_all_threads
> + = (status->kind == TARGET_WAITKIND_EXITED
> + || status->kind == TARGET_WAITKIND_SIGNALLED);
> +
> + struct remote_state *rs = get_remote_state ();
> +
> + /* Track the possible threads in this structure. */
> + struct thread_choices
> + {
> + /* Constructor. */
> + thread_choices (struct remote_state *rs, bool is_stop_for_all_threads)
> + : m_rs (rs),
> + m_is_stop_for_all_threads (is_stop_for_all_threads)
> + { /* Nothing. */ }
> +
> + /* Disable/delete these. */
> + thread_choices () = delete;
This one is not necessary, since you provide a non-default ctor.
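A quick standalone check of that, using two made-up structs (nothing here is from the patch): once a class declares any constructor, the compiler no longer provides an implicit default constructor, so the explicit "= delete" changes nothing.

```cpp
#include <type_traits>

// Hypothetical example types, only for illustration.
struct with_explicit_delete
{
  with_explicit_delete (int v) : m_v (v) {}
  with_explicit_delete () = delete;
  int m_v;
};

struct without_explicit_delete
{
  // Declaring this constructor already suppresses the implicit
  // default constructor.
  without_explicit_delete (int v) : m_v (v) {}
  int m_v;
};

// Neither type is default constructible.
static_assert (!std::is_default_constructible<with_explicit_delete>::value,
               "explicitly deleted");
static_assert (!std::is_default_constructible<without_explicit_delete>::value,
               "implicitly not declared");
```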
> + DISABLE_COPY_AND_ASSIGN (thread_choices);
> +
> + /* Consider thread THR setting the internal thread tracking variables
> + as appropriate. */
> + void consider_thread (thread_info *thr)
> + {
> + /* Record this as the first thread, or mark that we have multiple
> + possible threads. We set the m_multiple flag even if there is
Uppercase m_multiple.
> + only one thread executing. This means we possibly issue warnings
> + to the user when there is no ambiguity... but there's really no
> + reason why the remote target couldn't include a thread-id so it
> + doesn't seem to bad to point this out. */
> + if (m_first_thread == nullptr)
> + m_first_thread = thr;
> + else if (!m_is_stop_for_all_threads
> + || m_first_thread->ptid.pid () != thr->ptid.pid ())
> + m_multiple = true;
> +
...
> +
> + /* Return true if there were multiple possible thread/processes and we
> + had to just pick one. This indicates that a warning probably should
> + be issued to the user. */
> + bool multiple_possible_threads_p () const
> + { return m_multiple; }
> +
> + private:
> +
> + /* The remote state we are examining threads for. */
> + struct remote_state *m_rs = nullptr;
This one's already initialized in the ctor.
> +
> + /* Is this stop event one for all threads in a process (e.g. process
Spurious double space after e.g.
> + exited), or an event for a single thread (e.g. thread stopped). */
> + bool m_is_stop_for_all_threads;
> +
...
> +
> + /* If this is a stop for all threads then don't use a particular threads
> + ptid, instead create a new ptid where only the pid field is set. */
> + return ((is_stop_for_all_threads) ? ptid_t (thr->ptid.pid ()) : thr->ptid);
The parens around is_stop_for_all_threads are redundant.
> +}
> +
> diff --git a/gdb/testsuite/gdb.server/stop-reply-no-thread-multi.exp b/gdb/testsuite/gdb.server/stop-reply-no-thread-multi.exp
> new file mode 100644
> index 00000000000..b4ab03471e8
> --- /dev/null
> +++ b/gdb/testsuite/gdb.server/stop-reply-no-thread-multi.exp
> @@ -0,0 +1,139 @@
> +# This testcase is part of GDB, the GNU debugger.
> +#
> +# Copyright 2020 Free Software Foundation, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program. If not, see <http://www.gnu.org/licenses/>.
> +
> +# Test how GDB handles the case where a target either doesn't use 'T'
> +# packets at all or doesn't include a thread-id in a 'T' packet, AND,
> +# where the test program contains multiple threads.
> +#
> +# In general this is a broken situation and GDB can never do the
> +# "right" thing is all cases. If two threads are running and when a
> +# stop occurs, the remote does not tell GDB which thread stopped, then
> +# GDB can never be sure it has attributed the stop to the correct
> +# thread.
> +#
> +# However, we can ensure some reasonably sane default behaviours which
> +# can make some broken targets appear a little less broken.
> +
> +load_lib gdbserver-support.exp
> +
> +if { [skip_gdbserver_tests] } {
> + verbose "skipping gdbserver tests"
> + return -1
> +}
> +
> +standard_testfile
> +if [prepare_for_testing "failed to prepare" $testfile $srcfile {debug pthreads}] {
Below in run_test, you use clean_restart, so here you can use build_executable
instead of prepare_for_testing. That saves spawning gdb here (and creating unnecessary
gdb.cmd.1/gdb.in.1 files in the output dir) only to restart it again
immediately after.
> + return -1
> +}
> +
...
> +
> + gdb_breakpoint "unlock_worker"
> + gdb_continue_to_breakpoint "run to unlock_worker"
> +
> + # There should be two threads at this point with thread 1 selected.
> + gdb_test "info threads" \
> + "\\\* 1\[\t \]*Thread\[^\r\n\]*\r\n 2\[\t \]*Thread\[^\r\n\]*" \
> + "second thread should now exist"
> +
> + # Switch threads.
> + gdb_test "thread 2" ".*" "switch to second thread"
> +
> + # Single step. This will set all threads running but as there's
> + # no reason for the first thread to report a stop we expect to
> + # finish the step with thread 2 still selected.
I think GDB will first switch to thread 1 to step over the breakpoint that thread 1
is stopped at, and only afterwards will it step thread 2 while letting thread 1 run free.
I think that with your patch GDB will do the "right" thing and pick the correct
thread for the first step stop of thread 1, since at that point
no other thread is executing. It's just that the comment seems a bit off.
> + gdb_test_multiple "stepi" "" {
> + -re "Thread 1 received signal SIGTRAP" {
Shouldn't this consume the prompt?
> + fail $gdb_test_name
> + }
> + -re "$hex.*$decimal.*while \\(worker_blocked\\).*$gdb_prompt" {
> + pass $gdb_test_name
> + }
> + }
> +
> + # Double check that thread 2 is still selected.
> + gdb_test "info threads" \
> + " 1\[\t \]*Thread\[^\r\n\]*\r\n\\\* 2\[\t \]*Thread\[^\r\n\]*" \
> + "second thread should still be selected after stepi"
> +
> + # Now "continue" thread 2. Again there's no reason for thread 1
> + # to report a stop so we should finish with thread 2 still
> + # selected.
Ditto here.
I'll look at Simon's patch next, and your thread/frame patches after.
Hopefully tomorrow.
Pedro Alves