[Fwd: AW: [En-Nut-Discussion] TCP/IP-Problems]

Brett Abbott Brett.Abbott at digital-telemetry.com
Mon Jun 5 22:41:35 CEST 2006


Oliver

Thanks for this.  Much appreciated.

Could I ask for an opinion on a particularly strange problem I am seeing?

I have a TCP client running on an Ethernut which appears to get 
confused when the network is under load and unreliable.

The unit is holding open a TCP session with a remote host - this is 
working well.  Occasionally, though, the remote host seems to send one 
packet with wrong sequence/ACK numbers.  (I suspect the Vodafone packet 
filter is confusing sessions with other clients.  This is subject to 
internal investigation and not relevant here.)

I expected that a bad sequence number would result in the two ends 
retrying for a bit and then giving up.  Instead, I see what appears to 
be an endless loop of retries for a period of time, then silence until 
something new is sent.  My issue is that the faulty packet, probably 
caused by network issues, isn't resulting in the session being torn down 
and restarted.

The traffic takes the following form...  (note this is over GPRS under 
load and latency is bad)

1. Nut->Host, ACK PUSH, data
2. Host->Nut, ACK
   (wait 55 minutes)
3. Host->Nut, ACK PUSH, data (this one has the unexpected sequence and 
   acknowledgment numbers)
4. Host resends packet 3 (90 ms later)
5. Nut->Host, ACK; the Nut uses a sequence number based on the ack 
   number from 2 (I suspect that it didn't accept packets 3/4 because 
   of the wrong sequence number)
6. Nut resends packet 5 (no response)
7. Host->Nut, ACK; the sequence number used is the sequence # from 3 + 
   length of that packet, i.e. continuing as if the unexpected 
   sequence # was valid
8. Host resends 7
9. Nut resends 5
10. Nut resends 5
11. Host resends 7
12-1012 or so: Host and Nut continue to resend these packets!

They are sent in pairs and singles (pairs due to latency) for 1 min 40 
seconds.  This is a lot of packets - does the timeout occur on a timer 
or on a retry count?
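
As a way of reasoning about the exchange above, here is a minimal sketch 
of the segment acceptability test from RFC 793, together with the rule 
that an unacceptable segment (without RST) is answered with an ACK.  It 
is not taken from the Nut/Net sources; the function names and the 
example numbers are made up for illustration, and whether either end 
follows the RFC exactly here is an assumption.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def in_window(seq, rcv_nxt, rcv_wnd):
    """True if rcv_nxt <= seq < rcv_nxt + rcv_wnd, modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
    """RFC 793 acceptability test for an incoming segment."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False  # data can never be accepted into a zero window
    last = (seg_seq + seg_len - 1) % MOD
    return (in_window(seg_seq, rcv_nxt, rcv_wnd)
            or in_window(last, rcv_nxt, rcv_wnd))


def reply_to(seg_seq, seg_len, rcv_nxt, rcv_wnd, rst=False):
    """Simplified receiver reaction: process, or answer with an ACK."""
    if segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return "process"
    return "drop" if rst else "send ACK"


# Hypothetical numbers: the Nut expects 1000, the host sends 500000.
print(reply_to(500000, 50, rcv_nxt=1000, rcv_wnd=3216))
# Hypothetical numbers: the host expects 90000, the Nut's ACK carries 2000.
print(reply_to(2000, 0, rcv_nxt=90000, rcv_wnd=8192))
```

If each end sees the other's sequence number as outside its receive 
window, each ACK draws another ACK in reply.  Such replies are triggered 
by incoming packets rather than by a retransmission timer, which would 
be one possible explanation for the large number of packets.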

There is then a break of 4 minutes (note: no sessions are closed, 
traffic just stops) - perhaps it ran out of memory?

Then a scheduled transmission from the Nut occurs from another thread 
(over the same TCP session):
Nut->Host, ACK PUSH (data),
no response from host,
followed 11 seconds later by ...
Nut->Host, ACK FIN.  The Nut appears to have noticed the dead session.

Any thoughts gratefully appreciated.  Is the retry process acting as 
expected?  Should I just lengthen the retry timer?  The unexpected 
sequence number is an issue and should result in the TCP session being 
killed, but I would have expected that to happen after the retries 
rather than the next time we tried to send data.

Brett


-------- Original Message --------
Subject: 	AW: [En-Nut-Discussion] TCP/IP-Problems
Date: 	Fri, 02 Jun 2006 20:47:49 +0200
From: 	Oliver Schulz <olischulz at web.de>
Reply-To: 	Ethernut User Chat (English) <en-nut-discussion at egnite.de>
To: 	'Ethernut User Chat (English)' <en-nut-discussion at egnite.de>



Hello folks, Hello Dirk,

After a first look at the dumps and the Nut sources, I think the problem is
in the window size reported by the S7 in its first packets. The CP 343-1
LEAN advertises in its first SYN packet an MSS of 456 bytes but a window
size of only 280 bytes. That means that the CP 343-1 LEAN cannot even store
one segment of its own MSS of 456 bytes.

Nut/OS limits the size of sent segments to the peer's MSS, but always waits
until the peer's window is big enough to hold all of the data to send.
In your application Nut tries to send 300 bytes, which obviously don't fit in
the receiver's window. So Nut aborts the connection after a timeout, I think.
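
To make the arithmetic concrete, here is a small sketch of the two send
policies. It is only a model of the behaviour described above, not code
from the Nut/Net sources; the function names are made up, and the second
policy assumes the peer reopens its full window after every segment.

```python
def fits_whole_write(data_len, peer_wnd):
    """Policy described above: send only once the peer's window can
    hold the complete write."""
    return data_len <= peer_wnd


def next_segment_len(data_len, peer_mss, peer_wnd):
    """Alternative policy: send as much as MSS and window allow now."""
    return min(data_len, peer_mss, peer_wnd)


def split_into_segments(data_len, peer_mss, peer_wnd):
    """Segment sizes under the alternative policy, assuming the peer
    reopens its full window after every segment."""
    sizes = []
    remaining = data_len
    while remaining > 0:
        n = next_segment_len(remaining, peer_mss, peer_wnd)
        if n == 0:
            break  # zero window or zero MSS: nothing can be sent
        sizes.append(n)
        remaining -= n
    return sizes


# CP 343-1 LEAN: MSS 456, window 280, application writes 300 bytes.
print(fits_whole_write(300, 280))          # never becomes true
print(split_into_segments(300, 456, 280))  # two segments instead
# "Old" CP: MSS 512, window 560.
print(fits_whole_write(300, 560))
```

With the LEAN's values the first policy can never send the 300 bytes,
while a write of 200 bytes fits into the 280-byte window straight away.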

Indeed the "old" (and expensive, ~1200 EUR; I know that because I also used
the 343-1 in my own developments) CP from SIEMENS advertises an MSS of 512
bytes and a window size of 560, which will work with Nut.

Although this behaviour of the CP LEAN is very, very unusual (I never saw
that before), I cannot find anything in RFC 793 (TCP) that says this is a
fault. That means we have a bug in Nut/Net.

@Dirk: To check whether I'm right, please limit the MSS to 200 bytes, or
limit the data you send to 200 bytes. Then it should work.

Hope that helps,
Oliver.


> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de 
> [mailto:en-nut-discussion-bounces at egnite.de] On Behalf Of 
> Dirk Becker
> Sent: Tuesday, 30 May 2006 18:30
> To: Ethernut User Chat (English)
> Subject: Re: [En-Nut-Discussion] TCP/IP-Problems
> 
> Hi Brett,
> 
> thanx for your reply.
> Further down you can find a TCP/IP dump of the working station.
> The problem with the long delay seems to be the TCP/IP retransmission 
> timer; it happens only shortly after system start (a bit strange, but 
> probably not the real problem). A bit further down you can find packets 
> where no retransmissions happen, but also no data is sent and the 
> Ethernut ends the connection.
> Dusan Ferbas pointed out that it might be connected with the MSS not 
> being accepted by the Ethernut, and he suggested that I change the MSS. 
> With his help the LEAN version now sometimes accepts some data, but then 
> the connection is suddenly dropped again after 1 packet with data 
> (300 bytes).
> 
> Kind regards,
> 
> Dirk
> 
> 
> 
> 





-- 
-----------------------------------------------------------------
Brett Abbott, Managing Director, Digital Telemetry Limited
Email: Brett.Abbott at digital-telemetry.com
PO Box 24 036 Manners Street, Wellington, New Zealand
Phone +64 (4) 5666-860  Mobile +64 (21) 656-144
------------------- Commercial in confidence --------------------




