[En-Nut-Discussion] Thread stops executing after some time.

Henrik Maier hmlists at focus-sw.com
Fri Apr 4 03:36:56 CEST 2008


Interesting results, in particular that all stalled sockets show so_retran_time=0 together with a pending send buffer (nbq <> 0). This indicates there is something to be sent, but no retransmission is occurring.

Maybe there is an issue with Nut/OS' re-transmission management?

Nut/OS uses a 32-bit ms timer as its time base. However, the TCP state machine uses only a 16-bit ms time base for time-out management. Every 65536 ms the lower 16 bits of NutGetMillis are zero. This rollover happens roughly once a minute (every 65.5 s), which is quite frequent.
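
To illustrate how often the truncated value hits zero, here is a minimal sketch in plain C. The helper names low16 and count_zero_stamps are hypothetical and not part of Nut/OS; uint16_t stands in for u_short, and the tick value is passed in rather than read from NutGetMillis:

```c
#include <stdint.h>

/* Hypothetical helper: mimics the (u_short) cast applied to the
 * 32-bit millisecond counter. */
static uint16_t low16(uint32_t millis)
{
    return (uint16_t) millis;
}

/* Counts how many millisecond ticks in [0, span_ms) truncate to 0. */
static uint32_t count_zero_stamps(uint32_t span_ms)
{
    uint32_t hits = 0;
    uint32_t t;

    for (t = 0; t < span_ms; t++) {
        if (low16(t) == 0) {
            hits++;
        }
    }
    return hits;
}
```

Over a span of 10 * 65536 ms (about 11 minutes) this counts 10 ticks with a zero stamp, one per rollover.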

When a packet is sent for the first time in NutTcpOutput, the re-transmission time is set to the lower 16 bits of NutGetMillis using the following statement:

            sock->so_retran_time = (u_short) NutGetMillis();

If this happens exactly at rollover time, then sock->so_retran_time is set to 0. 

Subsequently that packet will never be re-transmitted because of this if-clause in NutTcpSm:

                if (sock->so_tx_nbq && sock->so_retran_time) {
                    ...
                        NutTcpStateRetranTimeout(sock);
                    ...
                }

As a result, this socket will stay in SYN_SENT state forever and never time out, because the packet is never re-transmitted.
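
The effect of that guard can be modelled outside Nut/OS. The following is only a sketch under assumptions: struct sock_model and retran_due are hypothetical names, and the elapsed-time comparison inside the guard is my guess at what the elided code does. Only the outer if-condition is taken from NutTcpSm:

```c
#include <stdint.h>

/* Hypothetical stand-in for the socket fields involved. */
struct sock_model {
    uint16_t so_retran_time; /* 0 doubles as "timer not running" */
    uint16_t so_rtto;        /* retransmission time-out in ms */
    int so_tx_nbq;           /* non-zero: unacknowledged data is queued */
};

/* Returns 1 if a retransmission time-out would fire at time 'now'.
 * The inner comparison is an assumption, not the Nut/OS source. */
static int retran_due(const struct sock_model *s, uint16_t now)
{
    if (s->so_tx_nbq && s->so_retran_time) {
        /* 16-bit subtraction keeps the difference valid across rollover. */
        if ((uint16_t) (now - s->so_retran_time) > s->so_rtto) {
            return 1;
        }
    }
    return 0;
}
```

With so_retran_time = 0 and so_tx_nbq != 0 the function returns 0 for every value of now, which matches the stalled sockets in the printout.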

Erik, I suggest changing the following statement in the Nut/OS file net\tcpout.c (around line 336):
            sock->so_retran_time = (u_short) NutGetMillis();
to
            sock->so_retran_time = (u_short) NutGetMillis();
            if (sock->so_retran_time == 0)
                sock->so_retran_time = 1; // so_retran_time must not be 0 which is a magic value!

Then recompile Nut/OS and see if that resolves your issue.
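
The same idea can be written as a small helper that is easy to check in isolation. This is just a sketch; retran_stamp is a hypothetical name and not an existing Nut/OS function, and the tick value is passed in instead of calling NutGetMillis:

```c
#include <stdint.h>

/* Hypothetical helper: returns the lower 16 bits of the millisecond
 * counter, but never 0, because 0 is reserved as the magic value for
 * "no retransmission timer running". */
static uint16_t retran_stamp(uint32_t millis)
{
    uint16_t stamp = (uint16_t) millis;

    return stamp ? stamp : 1;
}
```

The cost is an error of 1 ms in the time stamp once per rollover, which should be negligible compared to an so_rtto of 1000.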

Regards

Henrik
http://www.proconx.com

> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de [mailto:en-nut-discussion-
> bounces at egnite.de] On Behalf Of Erik L
> Sent: Wednesday, 2 April 2008 6:21 PM
> To: en-nut-discussion at egnite.de
> Subject: Re: [En-Nut-Discussion] Thread stops executing after some time.
> 
> 
> Because of the long time it takes before I can see the results of the tests,
> only the ones without debugging of the pointers have failed so far.
> 
> But I did get some interesting info out of that one.
> The printout I get from the one that failed is the following (I listed the
> sockets 3 times)
> 
> -----------
> 222 List of sockets
> 
> (DEAD SOCKET)
> SYN SENT        220F     TCP     192.168.0.112:6130      192.168.0.115:9050
> last_error:0    so_retran_time:0        so_rtto:1000    so_retransmits:2,
> NutGetMillis:35943




