[En-Nut-Discussion] TCP sockets stuck in closing state
cbrumley at polarsoft.biz
Mon May 11 21:55:52 CEST 2015
As you point out, there is a lot of guessing and conjecture associated with
The facts are as follows:
1) I'm using Nut/OS 4.8.7 on an AT91SAM7X256
2) I'm using the HTTPD example code, modified as follows: I added a mutex
which gets locked after NutTcpAccept() returns 0 (this completely solved the
lingering "Failed to create stream" error); the call to NutTcpAccept() is now
in a loop which sleeps for 1 second if NutTcpAccept() returns -1; and I've
made absolutely certain that the socket gets closed properly after the HTTP
request is processed. I will put these changes together as I get time to do
so. I don't know where the example fits in the grand scheme of things, but I
hope that my changes will be merged into it.
3) My web application does, in fact, use many short-lived connections, as it's
retrieving data via XML and web services.
4) Using the original /net/tcpsm.c code (yes, against IE in Windows 7), after
several seconds, I end up with some number of sockets in the TCPS_CLOSING
state. Sockets in this state stay there indefinitely, and after some hours,
all of the heap space is consumed by new sockets which also end up stuck in
5) I've modified NutTcpStateActiveOpenEvent in /net/tcpsm.c per the SVN
version and feedback from this list.
6) I've modified NutTcpFindSocket to return sockets in the TCPS_CLOSING
state if no exact matches are found.
7) I've modified the code in /net/tcpsm.c to increment the so_time_wait
counter if the socket is in the TCPS_CLOSING state. I've also modified the
code such that the counter does not get reset if the socket reenters the
8) I've modified the NutTcpSm thread to close sockets that have been in the
TCPS_CLOSING state for more than 9 seconds. This may need to be increased to a
longer period, but I simply don't know what the right length is.
9) I've increased the TCP SM stack space to 1024.
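To make items 7 and 8 concrete, here is a minimal stand-alone sketch of the idea: count how long a socket has been in the closing state and force it closed once a limit is exceeded. This is a model only, not the actual /net/tcpsm.c patch. The struct, the enum and the function name are hypothetical; only the so_time_wait name and the limit of 9 come from the list above. The sketch counts ticks and assumes one tick per second, and it never resets the counter, which is one possible reading of item 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced state set; the real TCP state machine has more states. */
enum closing_state {
    CLOSING_STATE_ESTABLISHED,
    CLOSING_STATE_CLOSING,
    CLOSING_STATE_DESTROYED
};

/* Limit taken from item 8; assumes the tick function runs once per second. */
#define CLOSING_LIMIT_TICKS 9

/* Hypothetical stand-in for a socket; only the fields the sketch needs. */
struct closing_sock {
    enum closing_state state;
    uint8_t so_time_wait;   /* counter name from item 7 */
};

/*
 * Called once per tick for each socket.
 * Returns 1 if the socket was forced out of the closing state, 0 otherwise.
 */
static int closing_tick(struct closing_sock *s)
{
    assert(s != NULL);

    if (s->state != CLOSING_STATE_CLOSING) {
        /* Counter is left untouched here, so it is not reset on re-entry. */
        return 0;
    }
    if (++s->so_time_wait > CLOSING_LIMIT_TICKS) {
        /* Stuck for more than the limit: give the socket up. */
        s->state = CLOSING_STATE_DESTROYED;
        return 1;
    }
    return 0;
}
```

With these assumptions a socket survives nine ticks in the closing state and is destroyed on the tenth. Making the limit a single constant keeps it easy to raise if 9 turns out to be too short.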
All of these changes have corrected the issues I was seeing. The original
problem does not happen with UDP sockets at all, obviously.
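The lookup change from item 6 can be sketched in the same stand-alone way. Again, this is a model and not the real NutTcpFindSocket code: the table, the struct and the function name are hypothetical. The list does not say how the fallback matches a closing socket, so the sketch assumes it matches on the local port only and prefers an exact match on local port, remote port and remote address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced state set for the model. */
enum lookup_state { LOOKUP_LISTEN, LOOKUP_ESTABLISHED, LOOKUP_CLOSING };

/* Hypothetical stand-in for a socket entry. */
struct lookup_sock {
    enum lookup_state state;
    uint16_t lport;   /* local port */
    uint16_t rport;   /* remote port */
    uint32_t raddr;   /* remote IPv4 address */
};

/*
 * Return the entry that matches all three keys exactly. If there is none,
 * fall back to the first entry on the same local port that is in the
 * closing state (assumed matching rule). Returns NULL if nothing fits.
 */
static struct lookup_sock *lookup_find(struct lookup_sock *tab, size_t n,
                                       uint16_t lport, uint16_t rport,
                                       uint32_t raddr)
{
    struct lookup_sock *fallback = NULL;
    size_t i;

    assert(tab != NULL || n == 0);

    for (i = 0; i < n; i++) {
        struct lookup_sock *s = &tab[i];

        if (s->lport != lport) {
            continue;
        }
        if (s->rport == rport && s->raddr == raddr) {
            return s;               /* exact match always wins */
        }
        if (fallback == NULL && s->state == LOOKUP_CLOSING) {
            fallback = s;           /* remember first closing candidate */
        }
    }
    return fallback;
}
```

Returning a closing socket from the lookup gives the state machine a chance to process late segments for it instead of treating them as belonging to no connection; whether that is correct TCP behavior is exactly the open question in this thread.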
This problem can be recreated using the example HTTPD code and the original
/net/tcpsm.c code. As you point out, this test case needs to be very simple.
HTTPD only without any other "candy".
I will put together that example as soon as time permits.
Thanks for the pointer, but I don't need or want commercial support for my
application. The whole point of posting to the list, and I apologize if that
was the wrong thing to do, was my hope that someone who is intimately familiar
with the TCP state machine code would chime in.
> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de [mailto:en-nut-discussion-
> bounces at egnite.de] On Behalf Of Harald Kipp
> Sent: Monday, May 11, 2015 2:32 PM
> To: Ethernut User Chat (English)
> Subject: Re: [En-Nut-Discussion] TCP sockets stuck in closing state
> Hi Coleman,
> On 07.05.2015 16:33, Coleman Brumley wrote:
> > Based on my overnight testing, this has worked very well. To be
> > honest, I don't know if it's resulted in incorrect TCP behavior, but I
> > haven't noticed any negative side effects. But the code no longer leaks
> > heap space, and that is what is important to me. I'd rather the TCP SM
> > have to renegotiate than the board need to be reset because it has no
> > heap space.
> While reading through this thread, I just see wild guesses and trials in the
> dark. Quite some years ago there had been some trouble with short lived
> connections at high frequencies and also with sending a large number of
> segments. We set up simple test cases to address this problem and were
> able to fix it.
> In the meantime many things changed within the kernel and in the TCP state
> machine and one of the old monsters may have re-appeared in a new
> I'm currently working on a large application with all kinds of listening and
> connecting TCP and UDP ports. Everything looks rock solid, running for
> months without any problem or reboot. Therefore, as you can imagine, this
> thread didn't really catch my attention. But the growing size and age of this
> thread starts to make it highly visible. :-) Anyway, not much information
> provided, which would again attract me to dive in.
> Could be thread priority, could be HTTP, could be... followed by a number of
> dubious hints, what could be tried else.
> While working on my current app, I also experienced problems with closing
> sockets, which looked to me like the behavior of Windows changed since
> Windows 7. I rarely see FINs, but frequently RSTs instead. As I'm using a
> hybrid Nut/OS version, something between 4.8 and the trunk, I haven't been
> able to create proper patches, not talking about testing them with the
> trunk or other versions on several platforms. In this sense, this is also
> valuable information, like several other contributions to this thread. It
> states: Yes, there is something wrong somewhere.
> The only thing I have so far is, that you, Coleman, are using Nut/OS
> 4.8.7 on an AT91SAM7X256. 4.8 is actually a nice and well maintained
> I'd suggest 4.10, but it's not really required to do this upgrade. Both are
> known to work with SAM7X.
> So what? Simply this way: Create a test case, which is so simple and easy to
> try, that almost everyone can reproduce the problem without spending
> more than half an hour. And simple means simple. Too often I asked for
> simple test cases and received too much. Every reference to any function,
> which is not directly related to the test, must be excluded. I'm quite sure
> that this will fix your problem less days, than a fraction of the age of
> If you need someone to debug your full application, there are several
> companies offering commercial support. If it's not a commercial
> then put the source code on GitHub or elsewhere and let's see if it is
> enough, so that someone else would need it and spend some time on it.