[En-Nut-Discussion] tcpsm crash on socket timeouts

Marti Raudsepp marti at voicecom.ee
Wed Oct 20 12:27:38 CEST 2010


Hi list,

It seems that there is a race condition in the tcpsm thread when
handling transmit timeouts (and possibly in other connection abort
cases). This problem is only triggered if the TCP-using thread has
higher priority than the 'tcpsm' thread, so a workaround is lowering
your threar priority.

It all starts in NutTcpStateRetranTimeout(), which after
TCP_RETRIES_MAX calls NutTcpAbortSocket()

NutTcpAbortSocket does the following:
    else
        sock->so_state = TCPS_CLOSED;
    NutTcpDiscardBuffers(sock);
    NutEventBroadcast(&sock->so_rx_tq);
    NutEventBroadcast(&sock->so_tx_tq);
    NutEventBroadcast(&sock->so_pc_tq);
    NutEventBroadcast(&sock->so_ac_tq);

Now, if the socket thread has higher priority than tcpsm, the first
NutEventBroadcast will immediately switch away from tcpsm to the
socket thread.

The socket thread receives an error code from NutTcpSend/Receive, so
the most likely action is to call NutTcpCloseSocket()

NutTcpCloseSocket() calls into NutTcpStateCloseEvent() which does:
    switch (sock->so_state) {
    case TCPS_LISTEN:
    case TCPS_SYN_SENT:
    case TCPS_CLOSED:      // so_state was changed to TCPS_CLOSED above
        /*
         * No connection yet, immediately destroy the socket.
         */
        NutTcpDestroySocket(sock); // frees the memory used by sock

Eventually control passes back to tcpsm which is still in
NutTcpAbortSocket() and continues to use the recently freed socket
pointer. Since NutEventBroadcast does indirect function calls, this
can result in executing arbitrary pieces of memory. During our
testing, once it even corrupted the on-chip flash memory and wouldn't
boot anymore!

There are also other code paths that call NutTcpAbortSocket which
might be suspect. Simply changing NutEventBroadcast to
NutEventBroadcastAsync probaly won't work because the callees are
using NutThreadYield.

The right solution seems to be to assign the socket a different state
and let tcpsm free the socket asynchronously.

On the other hand, it may be that there are other cases where sockets
in TCPS_LISTEN/TCPS_SYN_SENT/TCPS_CLOSED are still held by the tcpsm
thread. Maybe freeing memory directly from NutTcpCloseSocket() is the
wrong thing in the first place? This should be verified.

Regards,
Marti Raudsepp
Voicecom OÜ



More information about the En-Nut-Discussion mailing list