[En-Nut-Discussion] TCP/IP Stack crashes

Harald Kipp harald.kipp at egnite.de
Wed Jan 11 11:12:17 CET 2006


Thorsten,

Years ago a customer reported a similar problem. It turned out
that the Realtek suddenly stopped generating interrupts when
running in large networks. We solved this with two modifications.
First, the Realtek was polled if no receive interrupt appeared
within 1 second. Second, we added a complete reset of the
chip on specific errors.
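
A minimal sketch of those two workarounds, not the actual driver
code. All names (NicWatch, NicWatchRxIrq, NicWatchCheck) are
hypothetical, and the sketch assumes a millisecond tick counter
and a caller that performs the real polling or chip reset:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state; none of these names are Nut/OS API. */
typedef struct {
    uint32_t last_rx_irq_ms; /* time of the last receive interrupt */
    uint32_t polls;          /* number of fallback polls requested */
    uint32_t resets;         /* number of full chip resets requested */
} NicWatch;

typedef enum { NIC_OK, NIC_POLL, NIC_RESET } NicAction;

#define RX_SILENCE_MS 1000u  /* poll after 1 second without RX interrupt */

/* Call from the receive interrupt handler. */
void NicWatchRxIrq(NicWatch *w, uint32_t now_ms)
{
    w->last_rx_irq_ms = now_ms;
}

/* Call periodically from the receiver thread. The caller passes
 * fatal_error = true when the chip reported one of the errors that
 * should trigger a complete reset. */
NicAction NicWatchCheck(NicWatch *w, uint32_t now_ms, bool fatal_error)
{
    if (fatal_error) {
        w->resets++;
        w->last_rx_irq_ms = now_ms;
        return NIC_RESET;
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((uint32_t)(now_ms - w->last_rx_irq_ms) >= RX_SILENCE_MS) {
        w->polls++;
        w->last_rx_irq_ms = now_ms; /* poll at most once per interval */
        return NIC_POLL;
    }
    return NIC_OK;
}
```

The receiver thread would read the chip's receive buffer by hand on
NIC_POLL and reinitialize the controller on NIC_RESET.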

Later Bengt Florin spent some time investigating the stability
further and came to the conclusion that it works much more
reliably when using polling mode for transmission. Furthermore,
he found that the inability of the ATmega to respect the ISA
WAIT signal of the Realtek was one of the main causes of
instability, and he added a delay in a few places. After his
changes we removed the initial "hacks" and everything worked fine.

I can't remember similar problems with the LAN91C111, and the
DM9000E also looks quite reliable. Well, from time to time there are
reports about Ethernuts that stop working after some time or
in specific environments. But so far none of the recent ones could
be tracked down to a specific hardware or Nut/OS problem.
Most often problems are caused by application code corrupting
heap memory, overwriting stack space etc.

Anyway, one problem I can think of is within the ARP code. Large
networks produce a lot of broadcasts, and Dusan Ferbas has never
been satisfied with the current status. Either because of lack of
time or because I simply can't follow his criticism, nothing has
been done in this area. Maybe Dusan knows something
that I do not.

If your crash happens within minutes, you are lucky. The old
Realtek problem appeared about once a week. That was a lot of
real trouble. I'd suggest opening stdout on devDebug and
inserting printf() statements in the whole Ethernet receive chain.
Note that you can use devDebug even within interrupt context.
IMHO, tracing is the only way to track down such problems. JTAG
debugging will not help at all.
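
A rough sketch of such tracing, written so that it also builds on a
PC. Trace() and EtherInput() are hypothetical stand-ins, not the
real Nut/OS receive routines. On the Ethernut, stdout would first
have to be opened on the debug device; as far as I recall the Nut/OS
samples do this with NutRegisterDevice(&devDebug0, 0, 0) followed by
freopen("uart0", "w", stdout), but please check the samples of your
version.

```c
#include <stddef.h>
#include <stdio.h>

#define ETHER_HDR_LEN 14 /* destination MAC + source MAC + type field */

const char *trace_last = "none"; /* last stage reached before a hang */
unsigned long trace_seq;         /* running number of trace messages */

/* Print a numbered trace line and remember the stage. */
void Trace(const char *stage)
{
    trace_last = stage;
    printf("[%lu] %s\n", ++trace_seq, stage);
}

/* Hypothetical stand-in for one stage of the receive chain. */
int EtherInput(const unsigned char *frame, size_t len)
{
    Trace("ether: frame received");
    if (frame == NULL || len < ETHER_HDR_LEN) {
        Trace("ether: runt frame dropped");
        return -1;
    }
    Trace("ether: passing to protocol handler");
    /* ... hand the payload to the ARP or IP input routine here ... */
    Trace("ether: done");
    return 0;
}
```

The last line printed before the output stops shows which stage was
entered but never left. The same pattern can be repeated in the
interrupt handler, the receiver thread and the ARP and IP input
routines.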

As soon as you have found the location where Ethernut stops working,
please let me know and I'll try to help.

Harald


At 09:35 11.01.2006 +0100, you wrote:
>Hello,
>
>again I have the following problem with a small Webserver application based
>on the http demo on NutOS 3.9.8:
>
>The application runs for weeks without a crash on my private network, even
>if I heavily access the Nut's website with a script that reloads the page
>every 10 seconds using htget, from three PCs simultaneously.
>
>Now I installed the device in our company's network, which is incredibly
>big, a routed network with several thousand PCs. My segment has about
>150 PCs.
>
>10 minutes later the webserver thread hung and the Nut no longer responded
>to a ping.
>
>The application's main thread is alive, as seen from the debug output on
>the serial port. This debug output constantly reports about 23kB free heap.
>So I don't think it is a memory leak.
>
>Maybe it has something to do with high network or broadcast load?
>
>Can anyone give me a hint what may be wrong or what I should test?
>
>Thank you
>Thorsten



