[En-Nut-Discussion] Stuck TCP socket in Nut/OS 4.8.7

Coleman Brumley cbrumley at polarsoft.biz
Tue May 5 04:02:12 CEST 2015


Sorry, if this is TL;DR, but I really need a TCP state machine guru to chime
in on this. If you'd like to skip to the good parts, I have a TCP socket
that has been stuck in the "CLOSING" state for nearly 20 minutes. I believe
this may be the root cause of my leaks, so I'm wondering if there's an
option I can set to make the stuck socket simply close after X amount of
time without changes. I have both SNDTIMEO and RCVTIMEO set to 1000. 

 

Well, even after solving the "Creating stream device failed" issue (which
has been issues for years!), I'm still leaking memory. 

 

I added some code to periodically display the TCP socket states and here's
what I think I'm seeing. By the way, both the SNDTIMEO and RCVTIMEO are set
to 1000. This seems to have helped, but I'm still seeing a "stuck" TCP
socket state. Nearly 8 minutes after closing IE, there is still a "hanging"
TCP socket in the CLOSING state.

 

I've run my HTTP test using both Chrome and IE, and it looks more like IE is
the culprit. The way the web interface works is via JavaScript/AJAX and the
JS makes periodic calls to get small amounts of data from the controller. 

 

At startup, I display all of the TCP socket states. And, they're what I
expect:

 

19:00:05 12-31-1969:

Heap Available=22984

00209DDC 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209D24 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209C6C 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209BB4 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209AFC 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209A44 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

I think open a new Chrome window and connect to the controller:

21:35:00 05-04-2015:

Heap Available=22744

00209AEC 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A664 192.168.3.76:80

                192.168.3.10:60644 -->TIMEWAIT

0020B400 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209E34 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209D7C 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020B500 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020AE04 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

Then, after 40 seconds, I open a second Chrome window and connect to the
controller:

21:35:40 05-04-2015:

Heap Available=21628

0020AE90 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020B1C4 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209A18 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A7DC 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020AFA4 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A640 192.168.3.76:80

                192.168.3.10:60716 -->TIMEWAIT

00209B4C 192.168.3.76:80

                192.168.3.10:60717 -->TIMEWAIT

0020A4D0 192.168.3.76:80

                192.168.3.10:60714 -->TIMEWAIT

0020ADB0 192.168.3.76:80

                192.168.3.10:60715 -->TIMEWAIT

00209CF8 192.168.3.76:80

                192.168.3.10:60718 -->TIMEWAIT

0020B10C 192.168.3.76:80

                192.168.3.10:60719 -->TIMEWAIT

00209E28 192.168.3.76:80

                192.168.3.10:60720 -->TIMEWAIT

0020A588 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

Then, after one minute, I close the first Chrome window:

21:36:40 05-04-2015:

Heap Available=22904

0020A6C4 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020B15C 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209B88 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020AC4C 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209D28 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209E14 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

Then, after another minute, I close the second Chrome window:

21:37:40 05-04-2015:

Heap Available=22988

0020AA18 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A894 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A7DC 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209CB8 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A960 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209E14 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

As you can see, at this point, the heap is back where we started (22988
available) and all of the sockets have closed. 

 

However, with IE (IE 11, BTW), I get subtly different behavior. 

 

Here are the socket states after running with 2 IE, but after closing the
first:

21:39:00 05-04-2015:

Heap Available=21968

0020A588 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A7DC 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020AAC4 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020AC30 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209BB4 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209AFC 192.168.3.76:80

                192.168.3.10:61012 -->TIMEWAIT

0020AD18 192.168.3.76:80

                192.168.3.10:61013 -->CLOSING

0020A69C 192.168.3.76:80

                192.168.3.10:61006 -->TIMEWAIT

0020A4D0 192.168.3.76:80

                192.168.3.10:60984 -->CLOSING

00209A18 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

The highlighted entry is the troublesome one.

 

Here's the entry after closing the second IE window:

 

21:39:20 05-04-2015:

Heap Available=22652

00209B88 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209C40 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A640 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

00209AD0 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020AA98 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

0020A4D0 192.168.3.76:80

                192.168.3.10:60984 -->CLOSING

00209A18 0.0.0.0:80

                0.0.0.0:0 -->LISTEN

 

As you can see 0020A4D0 is still "CLOSING".

 

And, here I am, nearly 20 minutes later:

 

21:57:22 05-04-2015:

Heap Available=22640

0020AA38 0.0.0.0:80

        0.0.0.0:0 -->LISTEN

00209E10 0.0.0.0:80

        0.0.0.0:0 -->LISTEN

0020A8D0 0.0.0.0:80

        0.0.0.0:0 -->LISTEN

00209AD0 0.0.0.0:80

        0.0.0.0:0 -->LISTEN

0020AB7C 0.0.0.0:80

        0.0.0.0:0 -->LISTEN

0020A4D0 192.168.3.76:80

        192.168.3.10:60984 -->CLOSING

00209A18 0.0.0.0:80

        0.0.0.0:0 -->LISTEN

 

I don't have a Wireshark capture of what IE does when it closes. My
assumption is that it correctly closes the sockets on its side (I could be
wrong, or course). But, it seems odd that after 20 minutes, that Nut/OS has
not destroyed this connection. 

 

Is there another option I should set to "lower the bar" for this behavior? 

 

Coleman

 

From: Coleman Brumley [mailto:cbrumley at polarsoft.biz] 
Sent: Monday, May 04, 2015 7:39 PM
To: en-nut-discussion at egnite.de
Subject: RE: File descriptor leak in 4.8.7

 

Ok, sorry for the triple post, but it turns out the addition of a mutex to
the code below solved the "Creating stream device failed" error. 

 

The new code has now been running for around 45 minutes with 3 browsers
connected and no error indications. I'm waiting for the next 30 minute
interval report to see if the heap continues to fall. 

 

Coleman

 

From: Coleman Brumley [mailto:cbrumley at polarsoft.biz] 
Sent: Monday, May 04, 2015 5:48 PM
To: en-nut-discussion at egnite.de <mailto:en-nut-discussion at egnite.de> 
Subject: RE: File descriptor leak in 4.8.7

 

After doing some arithmetic, it appears that each occurrence of "Creating
stream device failed" causes a 1271 byte leak from heap space. Does that
number ring a bell? 

 

I've re-verified that this does not happen unless a browser is making HTTP
requests of the controller. If the controller is just "sitting there", there
is no leak. I verify this via a printout of available heap space every 30
minutes. After running for 4 hours, there were no leaks. 

 

While there is no set time between each occurrence, this does seem to happen
about 5 times in 30 minutes. 

 

The errno value 23 means that too many files are open, correct? FOPEN_MAX is
defined as 8, so is having six concurrent HTTP threads running too many? Is
it possible that the use of multiple HTTP threads without using a mutex is
causing the file open handles to corrupt somehow and therefore not be closed
correctly?  

 

Coleman

 

From: Coleman Brumley [mailto:cbrumley at polarsoft.biz] 
Sent: Monday, May 04, 2015 1:19 PM
To: en-nut-discussion at egnite.de <mailto:en-nut-discussion at egnite.de> 
Subject: File descriptor leak in 4.8.7

 

Everyone,

 

I'm working with Nut/OS v4.8.7 (for many, many reasons, I CANNOT upgrade the
core OS). The processor is an Atmel SAM7X256.

 

I'm encountering a problem with the use of file descriptors in HTTP, and I
believe the core effect is a resource leak. 

 

Here is my HTTP service code:

 

THREAD(Service, arg)

{

       TCPSOCKET *sock;

       FILE *stream;

       u_char id = (u_char) ((uptr_t) arg);

      u_char timeout=0;

       time_t t;

       struct _tm tmx ;

       uint16_t lowheapcounter=1000;

NutThreadSetPriority(72);  

 

    /*

     * Now loop endless for connections.

     */

 

    for (;;) {

 

        /*

         * Create a socket.

         */

              if ((sock = NutTcpCreateSocket()) == 0) 

              {

                     printf("[%u] Creating socket failed\n", id);

                     NutSleep(5000);

                     continue;

              }    

        /*

         * Listen on port 80. This call will block until we get a connection

         * from a client.

         */

        NutTcpAccept(sock, 80);

      

 

        /*

         * Detect low heap and if it happens for too long, reset.

         *

         */

              while (NutHeapAvailable() < 4096) {

                     t=time(0l);

                     localtime_r(&t,&tmx);

                     printf("%02d:%02d:%02d %02d-%02d-%04d -- ",

                           tmx.tm_hour,

                           tmx.tm_min,

                           tmx.tm_sec,

                           tmx.tm_mon+1,

                           tmx.tm_mday,

                           tmx.tm_year+1900);

                     printf("[%u] Low mem (%d)\n", id,
(int)NutHeapAvailable());

                     NutSleep(1000);

                     lowheapcounter--;

                     if(lowheapcounter<=0) Hreset();

              }

 

        /*

         * Associate a stream with the socket so we can use standard I/O
calls.

         */

              

              while (((stream = _fdopen((int) ((uptr_t) sock), "r+b")) ==
0)&&(++timeout<0xFF)) 

              {

                     NutSleep(10);

              } 

              if(!stream)

              {

                     printf("\n[%u] Creating stream device failed w/
errno:%d\n", id,errno);

NutTcpCloseSocket(sock);

                     continue;

              }

              else

              {         

                     NutHttpProcessRequest(stream);            

                     fclose(stream);

                     NutTcpCloseSocket(sock);

              }

    }

}

 

The console output shows all is normal initially, but after a while (approx.
765 seconds, initially after boot time) I start seeing"

 

[3] Creating stream device failed w/ errno:23

 

Where the number in braces is the thread id. I'm running 6 HTTP threads,
started as follows:

for (i = 1; i <= 6; i++) 

{

char thname[] = "httpd0";

        thname[5] = '0' + i;

        NutThreadCreate(thname, Service, (void *) (uptr_t)
i,HTTPD_SERVICE_STACK);

}     

 

The stack size is defined as 1024. 

 

I noticed this question was asked some time ago on this list, but I could
not determine if there was ever a solution posted. There are other I/O
processes running in the controller, but it ONLY runs low on resources when
a browser is connected to it. 

 

Does anyone have any suggestions as to what I could look into as to what
might be the problem? 

 

While the browser(s) are connected, the system appears to lose about 4.5k of
heap space per 30 minutes! 

 

Thanks in advance.

 

Best Regards,

Coleman



More information about the En-Nut-Discussion mailing list