From: Daniel Jacobowitz
To: Paul Koning
Cc: sandra@codesourcery.com, gdb@sourceware.org, gdb-patches@sourceware.org, pedro@codesourcery.com
Date: Thu, 10 Jul 2008 22:33:00 -0000
Subject: Re: [remote protocol] support for disabling packet acknowledgement
Message-ID: <20080710223312.GA19058@caradoc.them.org>
In-Reply-To: <18550.28000.759268.379468@gargle.gargle.HOWL>

Sandra asked me to take a stab at explaining the mess we're in.
Some credit also goes to Nathan Sidwell, for hours spent diagramming
this and ramming it through my thick skull.

On Thu, Jul 10, 2008 at 04:13:20PM -0400, Paul Koning wrote:
> Let me see if I understand this right.
>
> 1. +/- ACKs are fine for the classic (without non-stop) remote
>    protocol.
>
> 2. ACKs are needed if the underlying transport isn't a reliable
>    transport (for example a raw UART).  They aren't needed if the
>    underlying transport is TCP or equivalent.
>
> 3. +/- ACKs are not good enough for non-stop mode.  (It's not clear to
>    me why -- is it because there may need to be more than one packet
>    in flight?  An explanation of what exactly is wrong would be
>    helpful to understand how to fix the issue.)

The current GDB protocol has very simple state.  At any moment, it is
either GDB's turn or the remote's turn to send events.  Both sides
never simultaneously think they have the token.  Sometimes neither
side thinks it has the token - either when a message is on the wire,
or else when a message has been lost.  Normally a timeout comes to the
rescue.

Non-stop is incompatible with this.  GDB can have the normal protocol
token, for instance if it is about to send a memory read.  At the same
time the debug agent can send packets.  This has to be the case;
otherwise GDB would have to frequently poll for state changes, which
would introduce too much overhead and traffic.

The result of this is that the acks become ambiguous in the presence
of an unreliable or antagonistically delayed transport.  For instance,
suppose GDB sends a memory write, the stub acks it, the stub replies
with OK, and then GDB's ack is delayed.  Existing implementations of
the protocol will resend the OK in this case, assuming the message was
lost - from the stub's side that is indistinguishable from a lost ack.
GDB's long-delayed ack arrives at the stub at the same time the resent
OK arrives at GDB.
GDB must ack again - it doesn't know whether the first ack ever made
it through, and if it doesn't ack now then the stub might keep
resending that OK until it gets through.

So now GDB sends an ack.  Simultaneously the stub sends a stop reply
indicating that some other thread has stopped.  When it receives the
ack, it thinks GDB saw the stop reply and does not resend it.  But GDB
hasn't seen it yet, and if it is dropped the conversation is now out
of sync.  GDB will hang around waiting for an event that has already
been reported.

There's a clear solution to this: sequence numbers.  There's a
convenient protocol which has them, too...

> The implication is that the non-stop mode design abandons support for
> non-TCP transports.

No, the design abandons support for non-stop operation on lossy
transports.  I've used plenty of serial and UDP links that were in
practice sufficient.  If the link level is not sufficient, then the
implementor still has the option of wrapping a more reliable layer
between the transport and the gdb protocol communication.

> I would argue you need to identify why +/- ACKs aren't good enough,
> and propose a replacement that is good enough.  With that
> replacement you have a way to add the non-stop mode.  If the
> overhead of that replacement is significant in some plausible use
> case, you could then add a way to turn it off for the case where TCP
> is used end to end.

I think that if someone wants to design a more reliable protocol than
the existing one, they are free to do so, and either layer it under
the existing protocol as described above or contribute it to GDB -
we're not leaving anyone out in the cold and a new feature doesn't
have to meet every possible use case in its first incarnation.

This isn't the only problem with the existing protocol in my opinion.
It's pretty crufty, but it gets by.

-- 
Daniel Jacobowitz
CodeSourcery
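
For anyone who wants to poke at the delayed-ack scenario above, here
is a toy replay of it in Python.  This is only a sketch under stated
assumptions: the Stub class, its methods and the replay() helper are
hypothetical names made up for illustration, not GDB or gdbserver
code, and the numbered-ack mode is just one way sequence numbers
could be used, not a proposed wire format.

```python
from dataclasses import dataclass, field


@dataclass
class Stub:
    """Toy debug stub (hypothetical) tracking packets that await an ack."""
    use_seq: bool
    next_seq: int = 0
    unacked: dict = field(default_factory=dict)  # seq -> payload

    def send(self, payload):
        """Record a newly sent packet and return its sequence number."""
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload
        return seq

    def receive_ack(self, seq):
        if self.use_seq:
            # A numbered ack can only retire the packet it names;
            # a duplicate ack for an already-retired packet is a no-op.
            self.unacked.pop(seq, None)
        elif self.unacked:
            # A bare '+' carries no identity, so the stub assumes it
            # refers to the packet it sent most recently.
            self.unacked.pop(max(self.unacked))


def replay(use_seq):
    """Replay the scenario; return the packets the stub would still resend."""
    stub = Stub(use_seq)
    ok = stub.send("OK")             # reply to the memory write
    stub.receive_ack(ok)             # GDB's long-delayed first ack arrives
    stub.send("stop reply")          # another thread stops, stub reports it
    stub.receive_ack(ok)             # GDB's second ack, meant for the resent OK
    return sorted(stub.unacked.values())


print(replay(use_seq=False))  # bare acks
print(replay(use_seq=True))   # numbered acks
```

With bare acks the stub is left with nothing to resend, so a dropped
stop reply is lost for good.  With numbered acks the second ack is
recognised as a duplicate for OK and the stop reply stays pending.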