[En-Nut-Discussion] ICMP echo request response failure on ARP cache timeout

Thu Jun 4 13:02:37 CEST 2009

Hi Nathan,

Sorry for not coming back to this earlier and thanks for your valuable
input.

Nathan Moore wrote:

> I toyed with the idea of introducing tx threads per network device (didn't
> this exist a long time ago?)

I'm almost sure that this never existed. At least I remember that the
ICMP/ARP problem has been there from the very beginning.

> , but then threads are unlimited in the speed with which they may add to the
> tx queue.

Right. On the other hand the same is true for the current
implementation, where the receiver may overflow the TCP message queue.
Currently an ugly hack is used to limit TCP traffic: If the available
memory drops below a certain value, then segments are discarded.
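
As a rough sketch of that kind of memory-based throttling (not the actual
Nut/Net code): the threshold value, the rx_stats_t structure and the
function name below are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical low-water mark; the real threshold is not stated here. */
#define TCP_MEM_LOW_WATER 4096u

/* Illustrative bookkeeping only, not a Nut/Net structure. */
typedef struct {
    size_t heap_free;       /* bytes of heap currently available */
    unsigned long queued;   /* segments accepted into the TCP queue */
    unsigned long dropped;  /* segments discarded by the throttle */
} rx_stats_t;

/*
 * Called for each received TCP segment. The segment is queued (and
 * consumes heap) unless free memory is already below the low-water mark,
 * in which case it is discarded and the peer has to re-transmit.
 */
bool tcp_rx_segment(rx_stats_t *st, size_t seg_len)
{
    assert(st != NULL);
    if (st->heap_free < TCP_MEM_LOW_WATER || seg_len > st->heap_free) {
        st->dropped++;
        return false;
    }
    st->heap_free -= seg_len;
    st->queued++;
    return true;
}
```

Note that nothing stops the queue from growing until the mark is reached,
which is why this only limits the damage.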

The reason for using a dedicated thread for the TCP state machine is
that TCP needs to initiate transfers based on timing (re-transmission,
zero window polling).

If we do the main TCP state processing in the network receiver thread
and do all transmits in another thread, like you suggested, we are safe
from TCP segment overflows. If more segments arrive than we can process,
then the overflow occurs in the Ethernet interface and will not eat up
our valuable RAM.

We'd still need a TCP state machine thread to handle re-transmissions.
That one could be simpler than the combined timing/handler loop used
right now.

> To get around this I thought about adding an event to netbufs which the
> thread that generated the packet could (but wouldn't have to) wait to be
> signaled by the tx thread.

We may use a simpler mechanism to limit the output queue. We can, for
example, track the total size of waiting packets and block the sender
only if the queue is full.
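
A minimal sketch of the byte accounting for such a size-limited output
queue might look like the following. All names and limits (txq_t,
TXQ_SLOTS, TXQ_MAX_BYTES) are hypothetical, and the actual blocking is
left out: where txq_try_put() returns false, the sender would wait on an
event that the transmit thread posts after txq_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TXQ_SLOTS     8      /* hypothetical: max. number of waiting packets */
#define TXQ_MAX_BYTES 3000u  /* hypothetical: max. total size of waiting packets */

typedef struct {
    size_t len[TXQ_SLOTS];   /* sizes of the waiting packets (ring buffer) */
    unsigned head;           /* index of the oldest packet */
    unsigned count;          /* number of waiting packets */
    size_t bytes;            /* total size of all waiting packets */
} txq_t;

/* Sender side: returns false if the caller would have to block. */
bool txq_try_put(txq_t *q, size_t pkt_len)
{
    assert(q != NULL && pkt_len > 0 && pkt_len <= TXQ_MAX_BYTES);
    if (q->count == TXQ_SLOTS || q->bytes + pkt_len > TXQ_MAX_BYTES) {
        return false;
    }
    q->len[(q->head + q->count) % TXQ_SLOTS] = pkt_len;
    q->count++;
    q->bytes += pkt_len;
    return true;
}

/* Transmit thread side: removes the oldest packet and returns its size,
 * or 0 if the queue is empty. This is where a blocked sender would be
 * woken up. */
size_t txq_get(txq_t *q)
{
    assert(q != NULL);
    if (q->count == 0) {
        return 0;
    }
    size_t pkt_len = q->len[q->head];
    q->head = (q->head + 1) % TXQ_SLOTS;
    q->count--;
    q->bytes -= pkt_len;
    return pkt_len;
}
```

With these limits, two full-sized Ethernet frames fill the queue and a
third sender has to wait until the transmit thread has taken one out.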

This will not completely solve the ICMP echo problem, but it will then
occur only if the tx queue fills up.

Another question is: Which part of the layer should be processed by the
transmit thread?

Doing this in the driver may not be the best idea. It would require
rewriting several drivers and may make writing new drivers more complex
than it needs to be.

The IP layer is attractive, because only a single thread is required to
handle all interfaces including ARP. However, ARP and other non-IP
protocols (we use DLC in one application) must be handled specially and
we may end up with problems similar to those we have now.

So it looks to me that the physical layer is the one to handle in the
transmit thread. We'd need one thread per interface type, currently
Ethernet and PPP.

> (btw, the "who frees a netbuf" within nut/net is very
> confusing)

You're not alone. The initial idea was to release the NETBUF as early as
possible to make it available to the system for other requirements.
Thus, in case of a fatal error, it is released by the lower level
routine. In all other cases it is kept, because the top level may re-use
it (TCP re-transmits).
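
To illustrate the rule as described here, a small sketch with stand-in
types may help. pktbuf_t, lower_output() and upper_send() are made up
for this example and are not the Nut/Net NETBUF API; the hw_state
argument merely simulates the result reported by the hardware.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a NETBUF, illustration only. */
typedef struct {
    size_t len;
    unsigned char data[];
} pktbuf_t;

typedef enum { TX_OK = 0, TX_FATAL = -1 } tx_result_t;

int pktbuf_live;  /* number of buffers not yet released (demo only) */

pktbuf_t *pktbuf_alloc(size_t len)
{
    pktbuf_t *pb = malloc(sizeof(*pb) + len);
    if (pb != NULL) {
        pb->len = len;
        pktbuf_live++;
    }
    return pb;
}

void pktbuf_free(pktbuf_t *pb)
{
    if (pb != NULL) {
        free(pb);
        pktbuf_live--;
    }
}

/* Lower level: releases the buffer itself on a fatal error, so that the
 * memory becomes available as early as possible. In all other cases the
 * buffer is left alone. */
tx_result_t lower_output(pktbuf_t *pb, tx_result_t hw_state)
{
    assert(pb != NULL);
    if (hw_state == TX_FATAL) {
        pktbuf_free(pb);
    }
    return hw_state;
}

/* Upper level: returns the buffer it still owns (for example to
 * re-transmit it later), or NULL if the lower level has already
 * released it. */
pktbuf_t *upper_send(pktbuf_t *pb, tx_result_t hw_state)
{
    if (lower_output(pb, hw_state) == TX_FATAL) {
        return NULL;  /* must not touch or free pb again */
    }
    return pb;
}
```

The confusing part is visible in upper_send(): whether the caller still
owns the buffer depends on the return code of the routine it called.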

In the early days we had a lot of trouble with the RTL8019. One reason
was that even Realtek was not aware that only half of the chip's memory
is available in 8-bit mode.

Today fatal errors almost never occur. Maybe it's the right time to
change this weird NETBUF handling.

Harald