[En-Nut-Discussion] Fear not, good Sir... TCP might still be saved...

Michael Jones Michael.e.Jones at web.de
Tue Jun 20 01:07:34 CEST 2006


Hello!

I've spent the last few hours tracking down our TCP demon that has been
lurking over us for so long... 

This time, unlike in my past experiments, I got lucky: I managed to crash 99%
of the 37 Nut/OS driven boards within 3 minutes by flooding them with ARP
messages...

...but wait, there is more! Exactly the same happens when broadcasting random
packets.

I discussed this new aspect with Harald and we were both stumped, but we now
knew that the packets never actually reached e.g. the TCP/IP stack or ARP -
so what was causing the trouble?

After the usual plastering of the OS with trace output, I found it. (Actually
thanks to Harald and a comment he made in the earlier discussion!)

So here is what I found:

If more than 2000-3000 packets (regardless of whether they are broadcasts or
actually addressed to the device) hit Nut/OS within a 2 second window and are
handled, the unit can crash or start to behave erratically. The actual number
of packets depends on the remaining heap space.

The cause seems to be the line:

	NutEventWait(&ni->ni_rx_rdy, 2000);

...in the NicRxLanc thread within the driver.

	If the 2000ms is replaced with 0ms, that problem is gone.

It could be so simple...

...but now I wanted to know why it makes a difference whether 2000ms or 0ms
is specified. Well, as it seems, every NutEventWait(...) that is called with
a non-0ms value calls NutTimerCreate(...), which allocates a timer object on
the heap (using TM_ONESHOT). Now that as such is nothing upsetting until you
look closer - once NutTimerInsert(...) is called, the timer will tick away
until the timeout is reached, call its callback (if it is still active) and
be freed. But... when the event is signaled before the timer has reached its
timeout, the timer (and its memory) stays allocated for the full remaining
duration, e.g. 2000ms. So the next event adds a new timer and a new timer and
a new timer... bang (if heap / 12 < events / sec).
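
To put rough numbers on this, here is a small stand-alone model of the
behaviour described above. It is only a sketch and not Nut/OS code: the
function name and its parameters are made up for illustration, events are
assumed to arrive at a steady rate, and the 12 bytes per timer are taken from
the "heap / 12" estimate.

```c
/* Assumption: 12 bytes per timer record, taken from the "heap / 12" estimate. */
#define SIM_TIMER_BYTES 12UL

/*
 * Hypothetical model, not Nut/OS code: events arrive at a steady rate, every
 * wait creates one one-shot timer, and a timer's memory is only released
 * after its full timeout has elapsed, even if the event was signaled early.
 * Returns the peak number of bytes held by timers that are still pending.
 */
static unsigned long sim_peak_timer_bytes(unsigned long events_per_sec,
                                          unsigned long timeout_ms,
                                          unsigned long run_ms)
{
    unsigned long peak = 0;
    unsigned long t;

    for (t = 0; t < run_ms; t++) {
        /* Timers created during milliseconds 0..t. */
        unsigned long created = (t + 1) * events_per_sec / 1000;
        /* Timers whose full timeout has elapsed by now and were freed. */
        unsigned long expired = 0;
        unsigned long live;

        if (t >= timeout_ms)
            expired = (t + 1 - timeout_ms) * events_per_sec / 1000;

        live = created - expired;
        if (live > peak)
            peak = live;
    }
    return peak * SIM_TIMER_BYTES;
}
```

With the assumed values of 1500 events per second and a 2000ms timeout,
sim_peak_timer_bytes(1500, 2000, 5000) works out to 3000 pending timers, i.e.
36000 bytes tied up in timers that nobody is waiting on any more.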

So I tried a few things e.g. placing NutTimerStop(...) at the end of
NutEventWait(...).

Actually this only made things even stranger...
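
For reference, this is the effect such a change is meant to have, shown in a
toy model rather than in the kernel. Everything here (sim_timer,
sim_timer_start, sim_timer_stop, sim_run) is a hypothetical stand-in and not
the Nut/OS API, no time passes in the model, and it says nothing about why
the real NutTimerStop(...) call behaved strangely.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a one-shot timer record; not the Nut/OS type. */
typedef struct sim_timer {
    struct sim_timer *next;
    unsigned long ticks_left;
} sim_timer;

static sim_timer *sim_list = NULL;  /* all timers that are still allocated */
static unsigned long sim_live = 0;  /* current number of allocated timers */
static unsigned long sim_peak = 0;  /* highest value sim_live has reached */

/* Allocate a timer on the heap and link it into the list. */
static sim_timer *sim_timer_start(unsigned long ms)
{
    sim_timer *t = malloc(sizeof(*t));

    if (t == NULL)
        return NULL;
    t->ticks_left = ms;
    t->next = sim_list;
    sim_list = t;
    if (++sim_live > sim_peak)
        sim_peak = sim_live;
    return t;
}

/* Unlink a timer from the list and release its memory. */
static void sim_timer_stop(sim_timer *t)
{
    sim_timer **pp = &sim_list;

    while (*pp != NULL && *pp != t)
        pp = &(*pp)->next;
    if (t != NULL && *pp == t) {
        *pp = t->next;
        free(t);
        sim_live--;
    }
}

/*
 * Run a burst of waits that are all signaled immediately, before any
 * timeout can expire. Returns the peak number of allocated timers.
 */
static unsigned long sim_run(unsigned long waits, int stop_on_signal)
{
    unsigned long i;

    sim_peak = 0;
    for (i = 0; i < waits; i++) {
        sim_timer *t = sim_timer_start(2000);

        /* Event signaled early: either drop the timer now or leave it. */
        if (stop_on_signal)
            sim_timer_stop(t);
    }
    /* Release whatever is left so the model can be run again. */
    while (sim_list != NULL)
        sim_timer_stop(sim_list);
    return sim_peak;
}
```

In this model sim_run(3000, 0) piles up one timer per wait, while
sim_run(3000, 1) never holds more than a single timer at a time.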

So now my question is: does anybody have an idea how we can fix this? What
can we do so that fast sequences of signaled events using timers don't hog
the heap?

Regards!

Michael







