Mirror of the gdb mailing list
* Multi-packet gdb server request/response - problem?
@ 2013-12-10 12:23 Ivo Raisr
  2013-12-10 21:46 ` Philippe Waroquiers
  2013-12-11 11:43 ` Yao Qi
  0 siblings, 2 replies; 8+ messages in thread
From: Ivo Raisr @ 2013-12-10 12:23 UTC
  To: gdb

Good morning all!

I use gdb 7.6 (on Solaris 11.1 x86, if that matters) to talk to a
remote gdb server stub implementation (inside Valgrind, if that
matters).

gdb is invoked as follows:
$ gdb --quiet -l 60 --nx ./my_binary

After the instrumented binary (with the remote gdb server stub
implementation) starts, gdb is told to connect to it:
(gdb) set debug remote 1
(gdb) target remote | valgrind-gdb-relay --along-with-other-options

Then the conversation starts (seemingly correctly) with:
-----------
  Sending packet: $qSupported:multiprocess+;xmlRegisters=i386;qRelocInsn+#b5...relaying data between gdb and process 12033
  Ack
  Packet received: PacketSize=3fff;QStartNoAckMode+;QPassSignals+;qXfer:auxv:read+;qXfer:features:read+
  Packet qSupported (supported-packets) is supported
  Sending packet: $QStartNoAckMode#b0...Ack
  Packet received: OK
...
-----------

After a while, the program stops.
If I then enter:
(gdb) continue
the conversation continues (approximately 40 packets are exchanged) but
then abruptly ends with an error:

Sending packet: $s#73...Sending packet: $mfe7c852c,4#34...Packet received: T0505:08f8fe37;04:e8f7fe37;08:4d761100;thread:1;
Reply contains invalid hex digit 84

Please note in particular that in this case gdb did not wait for a reply
to its '$s' packet but instead immediately issued another packet,
'$mfe7c852c,4'.

This is also confirmed by dumping packets on the other side, in the gdb
server stub implementation:
- read from gdb $s#73$mfe7c852c,4#34
- write to gdb $T0505:08f8fe37;04:e8f7fe37;08:4d761100;thread:1;#d8$0*"00#dc

So on the wire it looks like this:
- gdb wrote two requests without waiting for a response to the first one
- gdb received two concatenated responses to these two requests
- each request and each response on its own looks well-formed
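To make the framing concrete, here is a minimal C sketch (not gdb's code; packet_checksum and next_packet are hypothetical names) that splits a byte stream like the one the stub read into individual $payload#xx packets and checks each checksum, which in the remote protocol is the sum of the payload bytes modulo 256, sent as two hex digits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Remote-protocol checksum: sum of the payload bytes modulo 256.  */
static unsigned
packet_checksum (const char *payload, size_t len)
{
  unsigned sum = 0;
  for (size_t i = 0; i < len; i++)
    sum += (unsigned char) payload[i];
  return sum & 0xff;
}

/* Take the next "$payload#xx" packet from BUF, starting at offset *POS.
   The payload is copied to OUT as a NUL-terminated string and *POS is
   moved past the two checksum digits.  Returns 1 if the checksum
   matches, 0 if it does not, and -1 if no complete packet is available
   (or the payload does not fit in OUT; *POS is then left unchanged).  */
static int
next_packet (const char *buf, size_t *pos, char *out, size_t outsz)
{
  const char *start = strchr (buf + *pos, '$');
  if (start == NULL)
    return -1;
  const char *hash = strchr (start + 1, '#');
  if (hash == NULL || hash[1] == '\0' || hash[2] == '\0')
    return -1;
  size_t len = (size_t) (hash - (start + 1));
  if (len >= outsz)
    return -1;
  memcpy (out, start + 1, len);
  out[len] = '\0';
  *pos = (size_t) (hash + 3 - buf);   /* Skip '#' and two hex digits.  */

  unsigned sent;
  if (sscanf (hash + 1, "%2x", &sent) != 1)
    return 0;
  return sent == packet_checksum (start + 1, len) ? 1 : 0;
}
```

Fed the stub's read buffer $s#73$mfe7c852c,4#34, two successive calls return the payloads s and mfe7c852c,4, both with valid checksums (0x73 and 0x34). The buffer the stub wrote back likewise holds two back-to-back packets; a client that reads exactly one packet per request will take the first of them, the stop reply, as the answer to whatever request it sent last.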

My questions are:
- Is multi-packet request/response supported by gdb?
- In other words, is that gdb's behaviour intentional?
- If yes, then why can't it handle a multi-packet response?
- If no, then is that a bug?

I tried to confirm this against the protocol description at
https://sourceware.org/gdb/onlinedocs/gdb/Overview.html
but without success.

Should you need more information, let me know.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Multi-packet gdb server request/response - problem?
  2013-12-10 12:23 Multi-packet gdb server request/response - problem? Ivo Raisr
@ 2013-12-10 21:46 ` Philippe Waroquiers
       [not found]   ` <CANXv6=vFYfEDLPVHv_TRK4VTLmFX6c+2hvmodM9dg3gcJA4_jQ@mail.gmail.com>
  2013-12-11 11:43 ` Yao Qi
  1 sibling, 1 reply; 8+ messages in thread
From: Philippe Waroquiers @ 2013-12-10 21:46 UTC
  To: Ivo Raisr; +Cc: gdb

On Tue, 2013-12-10 at 13:22 +0100, Ivo Raisr wrote:

> My questions are:
> - Is multi-packet request/response supported by gdb?
> - In other words, is that gdb's behaviour intentional?
> - If yes, then why it cannot handle multi-packet response?
> - If no, then is that a bug?
The Valgrind gdbserver implements only all-stop mode.

To my knowledge, in that mode the principle is that GDB sends one
command (such as "s", step) and then waits for the reply.
So an "s" packet directly followed by an "m" packet looks strange
to me (at least, I have never seen this).
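The all-stop principle described above boils down to one invariant: at most one request in flight. The C sketch below is a hypothetical model of that rule, not gdb code (struct rsp_client, client_send and client_receive are invented names); replaying the sequence from the log, the 'm' request is refused because the reply to 's' has not been read yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an all-stop remote client: a new request may
   only be sent once the reply to the previous one has been consumed.  */
struct rsp_client
{
  bool awaiting_reply;   /* A request was sent, its reply not yet read.  */
  char pending_cmd;      /* First character of the outstanding request.  */
};

/* Returns false (a protocol violation) if a reply is still pending.  */
static bool
client_send (struct rsp_client *c, char cmd)
{
  if (c->awaiting_reply)
    return false;
  c->awaiting_reply = true;
  c->pending_cmd = cmd;
  return true;
}

/* Consume the reply to the outstanding request.  Returns false if no
   request is outstanding.  For 's' or 'c' the reply is the stop reply,
   which only arrives once the target stops again.  */
static bool
client_receive (struct rsp_client *c)
{
  if (!c->awaiting_reply)
    return false;
  c->awaiting_reply = false;
  return true;
}
```

With this model, client_send (&c, 's') followed by client_send (&c, 'm') fails on the second call, which is exactly the pattern seen on the wire.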

Does this also happen with the gdbserver included in the GDB
distribution?
(Make sure GDB is in all-stop mode:
   show non-stop
)


When using the Valgrind gdbserver, does the problem also happen
if you first instruct GDB to stay in "ack mode"?
That is, first use
    set remote noack-packet 0
 before launching
    target remote | vgdb
In that mode, GDB and the Valgrind gdbserver keep acknowledging
each packet with a '+'.

Philippe

NB: when GDB is set to non-stop mode and connected to the Valgrind
gdbserver, it reports that the remote stub does not support
non-stop mode, but still continues (and things do not work properly
afterwards).
I wonder why GDB does not then automatically probe and choose
between all-stop and non-stop.




* Re: Multi-packet gdb server request/response - problem?
  2013-12-10 12:23 Multi-packet gdb server request/response - problem? Ivo Raisr
  2013-12-10 21:46 ` Philippe Waroquiers
@ 2013-12-11 11:43 ` Yao Qi
  1 sibling, 0 replies; 8+ messages in thread
From: Yao Qi @ 2013-12-11 11:43 UTC
  To: Ivo Raisr; +Cc: gdb

On 12/10/2013 08:22 PM, Ivo Raisr wrote:
> Sending packet: $s#73...Sending packet: $mfe7c852c,4#34...Packet
> received: T0505:08f8fe37;04:e8f7fe37;08:4d761100;thread:1;
> Reply contains invalid hex digit 84
> 
> Please take a particular note that in this case gdb did not wait for a reply
> to its '$s' packet but instead immediately issued another '$mfe7c852c,4'
> packet.
> 
> This is also verified by dumping packets on the other side in the gdb
> server stub implementation:
> - read from gdb $s#73$mfe7c852c,4#34
> - write to gdb $T0505:08f8fe37;04:e8f7fe37;08:4d761100;thread:1;#d8$0*"00#dc
> 
> So on the wires it looks like:
> - gdb wrote two requests without waiting for a response for the first one
> - gdb received two concatenated responses for these two requests
> - requests and responses alone look well-formed
> 
> My questions are:
> - Is multi-packet request/response supported by gdb?

No.  GDB sends out one packet and waits for the response.

> - In other words, is that gdb's behaviour intentional?

No, it is not intentional.

> - If yes, then why it cannot handle multi-packet response?
> - If no, then is that a bug?

I have to say that GDB and the remote stub interact incorrectly.  With
this limited information, it is hard to determine which part causes
the problem.

I forced gdbserver to use the 's' packet, but was unable to reproduce the problem.

Sending packet: $Hc0#db...Packet received: OK
Sending packet: $s#73...Packet received:
T0505:88efffbf;04:70efffbf;08:f9840408;thread:p7891.7891;core:0;
Sending packet: $z0,80484c1,1#cf...Packet received: OK

-- 
Yao (齐尧)



* Fwd: Multi-packet gdb server request/response - problem?
       [not found]   ` <CANXv6=vFYfEDLPVHv_TRK4VTLmFX6c+2hvmodM9dg3gcJA4_jQ@mail.gmail.com>
@ 2013-12-12 15:14     ` Ivo Raisr
  2013-12-12 15:44       ` Ivo Raisr
  0 siblings, 1 reply; 8+ messages in thread
From: Ivo Raisr @ 2013-12-12 15:14 UTC
  To: gdb; +Cc: Philippe Waroquiers

Hello Philippe,

So we meet again!
The pleasure is mine, and thank you for the quick response.

> Does this also happen with the gdbserver included in the GDB
> distribution ?
> (be sure GDB is in all stop mode :
>    show non-stop
> )

Unfortunately gdbserver is not built with the GDB distribution on Solaris :-(
I suspect no one has ever needed it, and my quick attempts to build it
myself failed. It needs to be properly ported first...

> When using the Valgrind gdbserver, does the problem also happens
> if you first instruct GDB to keep the "ack mode" ?

Yes, it does:
------------------------------
...
Sending packet: $mfe7d44e0,4#30...Ack
Packet received: 00000000
Sending packet: $s#73...Ack
Sending packet: $mfe7d1af4,4#5f...Packet instead of Ack, ignoring it
Packet instead of Ack, ignoring it
(and the last message is repeated a dozen times; gdb hangs)
------------------------------

which corresponds to what is observed on the remote stub (with timestamps):
------------------------------
...
1386767417.658724 from gdb $mfe7d44e0,4#30
1386767417.658901 to gdb +
1386767417.659166 to gdb $0*"00#dc
1386767417.659257 from gdb +
1386767417.659321 from gdb $s#73
1386767417.659402 to gdb +
1386767417.659532 from gdb $mfe7d1af4,4#5f
1386767417.659651 to gdb $T0505:48f8fe37;04:28f8fe37;08:89e31100;thread:1;#ae
1386767417.659786 from gdb +
1386767417.659854 to gdb $T0505:48f8fe37;04:28f8fe37;08:89e31100;thread:1;#ae
(and the last response to gdb is repeated a dozen times)
------------------------------
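As an aside, the stub's reply $0*"00#dc and gdb's "Packet received: 00000000" are the same data: the remote protocol allows run-length encoding in responses, where '*' is followed by a repeat count sent as count + 29. Here '"' is 34, so the '0' is repeated 5 more times, and the two trailing zeros make 8 in total. A minimal decoder (rle_expand is a hypothetical name) could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand remote-protocol run-length encoding: "X*c" stands for X
   followed by (c - 29) further copies of X.  Returns the decoded length,
   or (size_t) -1 if the input is malformed or OUT is too small.  */
static size_t
rle_expand (const char *in, char *out, size_t outsz)
{
  size_t n = 0;

  if (outsz == 0)
    return (size_t) -1;
  for (size_t i = 0; in[i] != '\0'; i++)
    {
      if (in[i] != '*')
        {
          if (n + 1 >= outsz)
            return (size_t) -1;
          out[n++] = in[i];
          continue;
        }
      /* A '*' must follow a character and be followed by a count.  */
      if (n == 0 || in[i + 1] == '\0')
        return (size_t) -1;
      int repeat = (unsigned char) in[++i] - 29;
      if (repeat < 0)
        return (size_t) -1;
      for (int k = 0; k < repeat; k++)
        {
          if (n + 1 >= outsz)
            return (size_t) -1;
          out[n] = out[n - 1];
          n++;
        }
    }
  out[n] = '\0';
  return n;
}
```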

So this confirms that gdb does not wait for a response from the remote stub but
issues another request, and is then surprised that a response was received
instead of an ack...
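In ack mode, every packet must be acknowledged with '+' (or '-' to request retransmission) before anything else is exchanged. The sketch below is a hypothetical model of what the sender expects right after transmitting a packet (wait_for_ack and enum ack_result are invented names, not gdb functions). It shows why gdb, having just sent the 'm' packet, classifies the incoming $T0505... stop reply as a packet where an ack was due, matching the "Packet instead of Ack" message in the log.

```c
#include <assert.h>
#include <stddef.h>

enum ack_result
{
  ACK_OK,          /* '+': packet received correctly.  */
  ACK_RETRANSMIT,  /* '-': packet must be sent again.  */
  ACK_GOT_PACKET,  /* '$': a whole packet arrived where an ack was due.  */
  ACK_NONE         /* Input exhausted without any of the above.  */
};

/* Scan the bytes received after sending a packet.  In this model, any
   byte other than '+', '-' or '$' is skipped as noise.  */
static enum ack_result
wait_for_ack (const char *rx, size_t len)
{
  for (size_t i = 0; i < len; i++)
    switch (rx[i])
      {
      case '+':
        return ACK_OK;
      case '-':
        return ACK_RETRANSMIT;
      case '$':
        return ACK_GOT_PACKET;
      default:
        break;
      }
  return ACK_NONE;
}
```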


I also suspected that the remote stub might have taken too long
to respond to the "s" packet, but that is not the case, as confirmed by another
exchange earlier in the same debugging session:
-----------
...
1386767415.992930 from gdb $s#73
1386767415.993112 to gdb +
1386767415.993469 to gdb $T0505:78f7fe37;04:58f7fe37;08:89e31100;thread:1;#b2
1386767415.993633 from gdb +$qTStatus#49
1386767415.993800 to gdb +
1386767415.994078 to gdb $#00
...
-----------
Here gdb waited for the response even though it took longer than in
the failing case.

I also tried issuing several "step" commands in gdb.
Plenty of "s" request/response pairs were exchanged successfully
between gdb and the remote stub.

I managed to capture a backtrace of gdb when it printed the error message
"Reply contains invalid hex digit 84"; it looks as follows:
-----------
ffff80ffbfffe7c0 fromhex+0x3f()
ffff80ffbfffe810 hex2bin+0x60()
ffff80ffbfffe8a0 remote_xfer_partial+0x538()
ffff80ffbfffe930 sol_thread_xfer_partial+0xdf()
ffff80ffbfffe9e0 memory_xfer_partial_1+0x139()
ffff80ffbfffea70 target_xfer_partial+0x191()
ffff80ffbfffead0 target_read+0x67()
ffff80ffbfffeaf0 target_read_memory+0x28()
ffff80ffbfffeb70 rw_common.isra.4+0x9c()
ffff80ffbfffebb0 libc_db.so.1`td_read_bootstrap_data+0xf3()
ffff80ffbfffebe0 libc_db.so.1`ph_lock_ta+0x4a()
ffff80ffbffff2f0 libc_db.so.1`td_ta_thr_iter+0x5d()
ffff80ffbffff360 libc_db.so.1`td_ta_map_id2thr+0x97()
ffff80ffbffff450 thread_to_lwp+0xa9()
ffff80ffbffff530 sol_thread_wait+0xaa()
ffff80ffbffff5a0 target_wait+0x74()
ffff80ffbffff690 wait_for_inferior+0x155()
ffff80ffbffff730 proceed+0x1e5()
ffff80ffbffff7d0 continue_command+0x104()
ffff80ffbffff830 execute_command+0x28f()
ffff80ffbffff850 command_handler+0x5b()
ffff80ffbffff890 command_line_handler+0x27c()
ffff80ffbffff8c0 rl_callback_read_char+0xf9()
ffff80ffbffff8d0 rl_callback_read_char_wrapper+9()
ffff80ffbffff8f0 process_event+0x8d()
ffff80ffbffff930 gdb_do_one_event+0x117()
ffff80ffbffff960 start_event_loop+0x47()
ffff80ffbffff970 captured_command_loop+0x13()
ffff80ffbffff9d0 catch_errors+0x64()
ffff80ffbffffaa0 captured_main+0x696()
ffff80ffbffffb00 catch_errors+0x64()
ffff80ffbffffb10 gdb_main+0x24()
ffff80ffbffffb40 main+0x30()
ffff80ffbffffb50 _start+0x6c()
-----------
Judging from the backtrace alone, gdb is processing an "m"
packet (target_read_memory()) rather than waiting for the response
to the previous "s" packet.
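The number in the error message fits this reading: 84 is the ASCII code of 'T', the first character of the stop reply T0505:... So the reply that hex2bin() tried to decode as the memory contents requested by 'm' appears to be the stop reply to 's'. The hypothetical decoder below (decode_hex and hex_digit_value are invented names, not gdb's fromhex()/hex2bin()) reproduces the failure on that input.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if C is not a hex digit.  */
static int
hex_digit_value (int c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Decode COUNT bytes from the hex string HEX into OUT and return the
   number of bytes decoded.  On an invalid digit, its character code is
   stored in *BAD; *BAD stays 0 if the string simply ended early.  */
static size_t
decode_hex (const char *hex, unsigned char *out, size_t count, int *bad)
{
  size_t i;

  *bad = 0;
  for (i = 0; i < count; i++)
    {
      int hi = hex_digit_value ((unsigned char) hex[2 * i]);
      if (hi < 0)
        {
          *bad = (unsigned char) hex[2 * i];
          break;
        }
      int lo = hex_digit_value ((unsigned char) hex[2 * i + 1]);
      if (lo < 0)
        {
          *bad = (unsigned char) hex[2 * i + 1];
          break;
        }
      out[i] = (unsigned char) (hi * 16 + lo);
    }
  return i;
}
```

Decoding "00000000" (the normal reply to an "m...,4" request) yields 4 zero bytes, while decoding "T0505:..." stops immediately with *bad == 84.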


Then I enabled 'infrun' debugging, and gdb showed the following:
-----------
(gdb) continue
Continuing.
infrun: clear_proceed_status_thread (Thread 1)
infrun: proceed (addr=0xffffffff, signal=144, step=0)
infrun: resume (step=0, signal=0), trap_expected=0, current thread [Thread 1] at 0x111652
infrun: wait_for_inferior ()
infrun: target_wait (-1, status) =
infrun:   42000 [Thread 1],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x11e388
infrun: BPSTAT_WHAT_SINGLE
infrun: no stepping, continue
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 1] at 0x11e388
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   42000 [Thread 1],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x11e389
infrun: no stepping, continue
infrun: resume (step=0, signal=0), trap_expected=0, current thread [Thread 1] at 0x11e389
infrun: prepare_to_wait
infrun: target_wait (-1, status) =
infrun:   42000 [Thread 1],
infrun:   status->kind = stopped, signal = SIGTRAP
infrun: infwait_normal_state
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x11e388
[Thread debugging using libthread_db enabled]
infrun: BPSTAT_WHAT_SINGLE
infrun: no stepping, continue
infrun: resume (step=1, signal=0), trap_expected=1, current thread [Thread 1 (defunct)] at 0x11e388
infrun: prepare_to_wait
(and then the error message)
-----------
I wonder how gdb determined that Thread 1 is now defunct.


Combined with 'set debug remote 1', the last few debugging messages are:
-----------
Sending packet: $mfe7d4e80,38#6b...Packet received: <trimmed>
Sending packet: $mfe7d4580,40#34...Packet received: <trimmed>
Sending packet: $mfe7d44dc,4#62...Packet received: 00000000
Sending packet: $mfe7d44e0,4#30...Packet received: 00000000
Sending packet: $s#73...infrun: prepare_to_wait
Sending packet: $mfe7d1af4,4#5f...Packet received: T0505:48f8fe37;04:28f8fe37;08:89e31100;thread:1;
(and then the error message)
-----------

Now the question is: how do I debug the issue further?
I am not familiar with gdb's remote debugging internals...
Could the problem be that gdb determined that Thread 1 is defunct and somehow
skipped waiting for the "s" packet response, based on different internal state?
Where should I be looking next?

Thanks for any suggestion.
I.



* Re: Multi-packet gdb server request/response - problem?
  2013-12-12 15:14     ` Fwd: " Ivo Raisr
@ 2013-12-12 15:44       ` Ivo Raisr
  2013-12-12 16:46         ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Ivo Raisr @ 2013-12-12 15:44 UTC
  To: gdb

Hello guys!

After some hackery I think I understand a tiny bit of how remote
support works in gdb.
The request packet "s" (or "c") is sent as part of resume(), as can be seen
in the following truss excerpt:

------------------
...
        -> remote_serial_write(0xffff80ffbfffef60, 0x5, 0x0, 0x0, 0x18, 0x203a74656b636170)
            -> serial_write(0xc195a0, 0xffff80ffbfffef60, 0x5, 0x0, 0x18, 0x203a74656b636170)
                -> ser_base_write(0xc195a0, 0xffff80ffbfffef60, 0x5, 0x0, 0x18, 0x203a74656b636170)
                    -> ser_unix_write_prim(0xc195a0, 0xffff80ffbfffef60, 0x5, 0x0, 0x18, 0x203a74656b636170)
write(9, " $ s # 7 3", 5)                       = 5
                     <- ser_unix_write_prim() = 5
            <- serial_write() = 0
        <- remote_serial_write() = 0
     <- putpkt() = 0
 <- remote_resume() = 0
...
------------------

However, the code in resume() does not actively wait for the response.
I think it is assumed that target_wait() will handle the response.
And initially it does, because it does the following:
------------------
  for (t = current_target.beneath; t != NULL; t = t->beneath)
    {
      if (t->to_wait != NULL)
        {
          ptid_t retval = (*t->to_wait) (t, ptid, status, options);
...
------------------
Initially, t->to_wait() is simply remote_wait(), which does read and parse
the response packet.

At some point in the debugging session (not sure when but it does not matter),
I think gdb kicks in the OS-specific functions. For solaris, they are
found in sol-thread.c.
From then on, t->to_wait() is no longer plain remote_wait() but sol_thread_wait().
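The dispatch described above can be modelled in a few lines. This is a heavily simplified, hypothetical model of a target stack, not gdb's real struct target_ops (which has many more members); the model has no equivalent of current_target and simply starts the search at the topmost layer. Once a thread layer is pushed on top, it is the one whose to_wait runs, and remote_wait is reached only if that layer delegates to beneath.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a target stack layer.  */
struct target_ops
{
  const char *shortname;
  void (*to_wait) (struct target_ops *self);
  struct target_ops *beneath;
};

static int replies_read;   /* Counts stop replies consumed from the wire.  */

/* Bottom layer: reads the pending stop reply.  */
static void
remote_wait_model (struct target_ops *self)
{
  (void) self;
  replies_read++;
}

/* Thread layer pushed on top: it must eventually delegate downwards.  */
static void
sol_thread_wait_model (struct target_ops *self)
{
  self->beneath->to_wait (self->beneath);
}

/* Like the loop in target_wait(): the topmost layer that implements
   to_wait handles the call.  */
static struct target_ops *
find_wait_layer (struct target_ops *top)
{
  for (struct target_ops *t = top; t != NULL; t = t->beneath)
    if (t->to_wait != NULL)
      return t;
  return NULL;
}
```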

At this point, things go awry.
sol_thread_wait() contains the following stuff:
------------------
  save_ptid = inferior_ptid;
  old_chain = save_inferior_ptid ();

  inferior_ptid = thread_to_lwp (inferior_ptid, PIDGET (main_ph.ptid));
  if (PIDGET (inferior_ptid) == -1)
    inferior_ptid = procfs_first_available ();

...

  rtnval = beneath->to_wait (beneath, ptid, ourstatus, options);
------------------

The old remote_wait() function is hidden behind the call to "beneath->to_wait()".
The problem here is that thread_to_lwp() is called _before_
"beneath->to_wait()", and thread_to_lwp() invokes (among other things)
target_read_memory(), which in turn sends an "m" request packet...

The other *-thread.c sources (BSD, DEC, AIX, etc.) usually call
"beneath->to_wait()" as the very first thing. So I think the logic in
sol_thread_wait() is flawed in that "beneath->to_wait()" is not called
early enough.
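The ordering problem can be shown with a toy transcript. In this hypothetical model (emit, read_memory_model, remote_wait_model and the two wait_* functions are invented names), each character records one wire event: 's' and 'm' are requests sent by gdb, and 'R' is gdb reading the next reply. Reading memory before waiting produces "smRR", the same pattern as $s#73$mfe7c852c,4#34, and the first 'R' (meant for 'm') actually consumes the stop reply; waiting first keeps every request paired with its reply.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wire transcript, one character per event.  */
static char wire[16];
static size_t wire_len;

static void
emit (char ev)
{
  if (wire_len + 1 < sizeof wire)
    {
      wire[wire_len++] = ev;
      wire[wire_len] = '\0';
    }
}

/* Stand-in for target_read_memory(): send 'm', then read a reply.  */
static void
read_memory_model (void)
{
  emit ('m');
  emit ('R');
}

/* Stand-in for beneath->to_wait(), i.e. remote_wait().  */
static void
remote_wait_model (void)
{
  emit ('R');
}

/* Order in sol_thread_wait(): thread_to_lwp() reads memory first.  */
static void
wait_memory_first (void)
{
  read_memory_model ();
  remote_wait_model ();
}

/* Order described for the other *-thread.c files: wait first, then do
   any memory reads needed for the ptid/LWP translation.  */
static void
wait_beneath_first (void)
{
  remote_wait_model ();
  read_memory_model ();
}
```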

I quickly hacked sol_thread_wait() to remove all calls to
thread_to_lwp() preceding "beneath->to_wait()", and the conversation
with the remote stub now works!
But now the ptid->lwp conversion is not done properly...

Does anyone have an idea how to overcome this flaw in sol_thread_wait()?

I.



* Re: Multi-packet gdb server request/response - problem?
  2013-12-12 15:44       ` Ivo Raisr
@ 2013-12-12 16:46         ` Pedro Alves
  0 siblings, 0 replies; 8+ messages in thread
From: Pedro Alves @ 2013-12-12 16:46 UTC
  To: Ivo Raisr; +Cc: gdb

On 12/12/2013 03:44 PM, Ivo Raisr wrote:

> At some point in the debugging session (not sure when but it does not matter),
> I think gdb kicks in the OS-specific functions. For solaris, they are
> found in sol-thread.c.

That's the (or a) problem.  They shouldn't.  sol-thread.c should only
be active for native Solaris debugging.  Try this.

---
 gdb/sol-thread.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gdb/sol-thread.c b/gdb/sol-thread.c
index b480b58..3809aec 100644
--- a/gdb/sol-thread.c
+++ b/gdb/sol-thread.c
@@ -578,6 +578,10 @@ check_for_thread_db (void)
   td_err_e err;
   ptid_t ptid;

+  /* Don't attempt to use thread_db for remote targets.  */
+  if (!(target_can_run (&current_target) || core_bfd))
+    return;
+
   /* Do nothing if we couldn't load libthread_db.so.1.  */
   if (p_td_ta_new == NULL)
     return;
-- 
1.7.11.7
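The early return added by the patch reduces to a simple truth table. In this sketch the two booleans are hypothetical stand-ins for target_can_run (&current_target) and core_bfd != NULL, and it assumes, as the patch comment implies, that target_can_run is false when gdb is connected with "target remote". Under that assumption, thread_db stays enabled for native processes and core files and is skipped for remote targets.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the guard in check_for_thread_db(): proceed only when the
   target stack can run a process natively or a core file is loaded.
   TARGET_CAN_RUN and HAVE_CORE_FILE are stand-ins for gdb internals.  */
static bool
should_use_thread_db (bool target_can_run, bool have_core_file)
{
  return target_can_run || have_core_file;
}
```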




* Re: Multi-packet gdb server request/response - problem?
  2013-12-14  3:39 Ivo Raisr
@ 2013-12-16 14:31 ` Pedro Alves
  0 siblings, 0 replies; 8+ messages in thread
From: Pedro Alves @ 2013-12-16 14:31 UTC
  To: Ivo Raisr; +Cc: gdb

On 12/14/2013 03:39 AM, Ivo Raisr wrote:

> No regressions found by the test suite. Good.

Thanks.

> 
> I filed the following bug:
> https://sourceware.org/bugzilla/show_bug.cgi?id=16329
> 
> Could we get it fixed in the 7.6.x branch as well?

Sure, done:

  https://sourceware.org/ml/gdb-patches/2013-12/msg00573.html

Not sure if there'll ever be another 7.6 release, though.

-- 
Pedro Alves



* Re: Multi-packet gdb server request/response - problem?
@ 2013-12-14  3:39 Ivo Raisr
  2013-12-16 14:31 ` Pedro Alves
  0 siblings, 1 reply; 8+ messages in thread
From: Ivo Raisr @ 2013-12-14  3:39 UTC
  To: gdb

>>> At some point in the debugging session (not sure when but it does not matter),
>>> I think gdb kicks in the OS-specific functions. For solaris, they are
>>> found in sol-thread.c.
>>
>> That's the (or a) problem.  They shouldn't.  sol-thread.c should only
>> be active for native Solaris debugging.
>
> That tiny patch seems to solve the problem!
> I will run the gdb's test suite to check if it does not introduce any
> regression and let you know.

No regressions found by the test suite. Good.

I filed the following bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=16329

Could we get it fixed in the 7.6.x branch as well?

I.


