[En-Nut-Discussion] ICMP echo request response failure on ARP cache timeout

Fri Jun 5 01:30:46 CEST 2009

>
> Nathan Moore wrote:
>
> > I toyed with the idea of introducing tx threads per network device
> > (didn't this exist a long time ago?)
>
> I'm almost sure that this never existed. At least I remember that the
> ICMP/ARP problem existed in the very beginning.

I never actually saw code that had that in it, but I seem to remember some
old documentation listing tx and rx threads for Ethernet. I think it was
showing off the thread listing rather than anything specifically network
related. It's been months since then and I can't remember which document
that was. Anyway, I just assumed that tx threads were in use at the time
that document was written.

>
> > , but then threads are unlimited in the speed with which they may add
> > to the tx queue.
>
> Right. On the other hand the same is true for the current
> implementation, where the receiver may overflow the TCP message queue.
> Currently an ugly hack is used to limit TCP traffic: If the available
> memory drops below a certain value, then segments are discarded.

Every time I try to do much with the TCP state machine thread my head
explodes, so I don't pretend to understand all of its internals.

>
> The reason for using a dedicated thread for the TCP state machine is,
> that TCP needs to initiate transfers based on timing (re-transmission,
> zero window polling).

Yeah. Oh, there may be a bug in the TCP thread when using a very slow
network interface. A thread calling NutTcpSend() will indirectly call
NutIpSend() if the socket is ready to send. If the interface is slow
enough, the TCP thread may try to retransmit before the other thread has
finished sending. This probably never happens on "normal" interfaces.
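A minimal sketch of one way to guard against that race, assuming
cooperative scheduling (as in Nut/OS, where a thread only gives up the CPU
when it blocks or yields). All names here (sock_sketch, sock_send,
retransmit_tick, slow_output) are hypothetical and not the real TCP code;
slow_output() just calls the timer hook to model the TCP thread running
while a frame is still going out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-socket state; not the real Nut/OS TCP socket. */
struct sock_sketch {
    bool tx_busy;   /* set while some thread is inside the output path */
    int sends;      /* segments handed to the (simulated) interface */
    int deferred;   /* retransmissions postponed because tx_busy was set */
};

/* Runs in the TCP timer thread when the retransmission timer expires.
 * Under cooperative scheduling nothing can run between the check and the
 * set below, so a plain flag is enough. */
static void retransmit_tick(struct sock_sketch *s)
{
    if (s->tx_busy) {
        s->deferred++;      /* sender still busy: try again on the next tick */
        return;
    }
    s->tx_busy = true;
    s->sends++;             /* stands in for resending the oldest unacked segment */
    s->tx_busy = false;
}

/* Simulated slow interface: the timer thread gets to run mid-transmit. */
static void slow_output(struct sock_sketch *s)
{
    retransmit_tick(s);
    s->sends++;
}

/* Application side, standing in for NutTcpSend() -> NutIpSend(). */
static void sock_send(struct sock_sketch *s)
{
    assert(!s->tx_busy);    /* sketch assumes one sending thread per socket */
    s->tx_busy = true;
    slow_output(s);
    s->tx_busy = false;
}
```

With preemptive threads the flag alone would not be enough; the check and
set would need a mutex.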

> If we do the main TCP state processing in the network receiver thread
> and do all transmits in another thread, like you suggested, we are safe
> from TCP segment overflows. If more segments arrive than we can process,
> then the overflow occurs in the Ethernet interface and will not eat up
> our valuable RAM.

I tend to favor optimizing the transmit path over the receive path, because
getting things sent quickly tends to leave more RAM free.

>
> We'd still need a TCP state machine thread to handle re-transmissions.
> That one could be simpler than the combined timing/handler loop used
> right now.

Since you now do protocol registration, you could make part of that
registration a callback to a periodically called function plus a pointer
to its data. The TCP thread could turn into one of these functions, and a
more generic IP transmission thread would call it.
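A hypothetical sketch of such a registration hook; proto_reg,
proto_register, ip_tx_periodic_pass and tcp_periodic are made-up names,
not the current registration API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical registration record: a periodic hook plus an opaque pointer
 * to the protocol's own state. Not an existing Nut/OS structure. */
struct proto_reg {
    const char *name;
    void (*periodic)(void *data);   /* NULL if the protocol needs no timer */
    void *data;
};

#define MAX_PROTOS 4

static struct proto_reg protos[MAX_PROTOS];
static size_t nprotos;

static int proto_register(const char *name, void (*periodic)(void *), void *data)
{
    assert(name != NULL);
    if (nprotos == MAX_PROTOS)
        return -1;                  /* table full */
    protos[nprotos].name = name;
    protos[nprotos].periodic = periodic;
    protos[nprotos].data = data;
    nprotos++;
    return 0;
}

/* One pass of the generic IP transmit thread's timer work. */
static void ip_tx_periodic_pass(void)
{
    for (size_t i = 0; i < nprotos; i++) {
        if (protos[i].periodic != NULL)
            protos[i].periodic(protos[i].data);
    }
}

/* What the TCP thread could shrink to: a hook that would service the
 * retransmission and zero-window timers. Here it only counts calls. */
struct tcp_timer_state {
    int ticks;
};

static void tcp_periodic(void *data)
{
    struct tcp_timer_state *st = data;
    st->ticks++;
}
```

Protocols without timers (UDP, for example) would register a NULL hook and
cost the loop nothing but a pointer test.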

>
> > To get around this I thought about adding an event to netbufs which the
> > thread that generated the packet could (but wouldn't have to) wait to be
> > signaled by the tx thread.
>
> We may use a simpler mechanism to limit the output queue. We can, for
> example, track the total size of waiting packets and block the sender
> only if the queue is full.

PPP fills up way too easily. Well, on AVR it does, and that's the only
architecture I use.

>
> This will not completely solve the ICMP echo problem, but it will now
> occur only if the tx queue is filled up.

Well, in that case rather than blocking you could drop the echo reply.
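A sketch of the byte-budget queue combined with dropping echo replies.
TXQ_LIMIT, txq_reserve and txq_complete are hypothetical names and the
limit is an arbitrary example value; a real sender would wait on an event
instead of getting TXQ_WOULD_BLOCK back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output-queue accounting; the limit is an example value. */
#define TXQ_LIMIT 1500u

enum txq_result { TXQ_QUEUED, TXQ_WOULD_BLOCK, TXQ_DROPPED };

struct txq {
    size_t queued_bytes;    /* total size of frames waiting for the driver */
};

/* Reserve room for a frame. Ordinary senders are told to wait (a real
 * implementation would block here); ICMP echo replies are dropped, since
 * losing a ping reply is harmless. */
static enum txq_result txq_reserve(struct txq *q, size_t len, int is_echo_reply)
{
    if (q->queued_bytes + len > TXQ_LIMIT)
        return is_echo_reply ? TXQ_DROPPED : TXQ_WOULD_BLOCK;
    q->queued_bytes += len;
    return TXQ_QUEUED;
}

/* Called by the transmit thread once the driver has taken a frame. */
static void txq_complete(struct txq *q, size_t len)
{
    assert(len <= q->queued_bytes);
    q->queued_bytes -= len;
}
```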

>
> Another question is: Which part of the layer should be processed by the
> transmit thread?

This is a tough problem. ARP requests complicate it a lot if you aim for
maximum throughput. If you did it at the IP layer, your tx thread would
either get blocked (and block the queue) whenever an ARP request is made
(with the current ARP code, I think), or it would have to keep track of
which packets are waiting for ARP replies. Changing the events in ARP
entries to netbuf* might get you halfway there.
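A minimal sketch of ARP entries that park netbufs instead of waking
threads through an event. The types and functions (netbuf, arp_entry,
arp_output, arp_reply, record_send) are simplified hypothetical stand-ins,
not the real NETBUF or ARP cache; record_send only records ids so the
packet order can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified types; not the real Nut/OS NETBUF or ARP cache. */
struct netbuf {
    struct netbuf *next;
    int id;
};

struct arp_entry {
    unsigned long ip;
    unsigned char mac[6];
    int resolved;
    struct netbuf *pending;     /* packets parked until the reply arrives */
};

typedef void (*send_fn)(struct netbuf *nb, const unsigned char *mac);

/* Returns 1 if the caller must emit an ARP request, 0 otherwise. The
 * transmit thread never waits here; unresolved packets are parked. */
static int arp_output(struct arp_entry *e, struct netbuf *nb, send_fn send)
{
    if (e->resolved) {
        send(nb, e->mac);
        return 0;
    }
    int first = (e->pending == NULL);   /* only one outstanding request */
    nb->next = NULL;
    struct netbuf **pp = &e->pending;   /* append to keep packet order */
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = nb;
    return first;
}

/* ARP reply for e->ip: record the address and release parked packets. */
static void arp_reply(struct arp_entry *e, const unsigned char mac[6], send_fn send)
{
    memcpy(e->mac, mac, 6);
    e->resolved = 1;
    while (e->pending != NULL) {
        struct netbuf *nb = e->pending;
        e->pending = nb->next;
        nb->next = NULL;
        send(nb, e->mac);
    }
}

/* Test sink: records the order in which packets reach the driver. */
static int sent_ids[8];
static int nsent;

static void record_send(struct netbuf *nb, const unsigned char *mac)
{
    (void)mac;
    if (nsent < 8)
        sent_ids[nsent++] = nb->id;
}
```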

>
>
> Doing this in the driver may not be the best idea. It requires rewriting
> several drivers and may make writing new drivers more complex than it
> needs to be.

I wouldn't do it at the lowest (chip-specific) level, but maybe for PPP and
for ETH. In my version of Nut (a fork) I have altered NutIpOutput to not
know anything about ARP, Ethernet, or PPP. I replaced the drivers' generic
output routine pointers with IP output routine pointers, which do the
interface-specific work (ARP) and then directly call the generic routines.
I did this because of the nonstandard network device I am working with, but
it has other benefits. (I did similar things for NutIfConfig.)
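A hypothetical sketch of per-interface IP output pointers, not the code
from the fork described above: ifnet_sketch, eth_ip_output, ppp_ip_output
and dev_output are invented names, the ARP lookup is elided, and the
"frame" is just a string so the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical interface descriptor: each interface type supplies its own
 * IP output routine, so the IP layer never mentions ARP, Ethernet or PPP. */
struct ifnet_sketch;
typedef int (*ip_output_fn)(struct ifnet_sketch *ifn, const char *pkt);

struct ifnet_sketch {
    const char *name;
    ip_output_fn ip_output;     /* interface-specific step, then the driver */
    char last_frame[64];        /* stands in for what went to the hardware */
};

/* Generic device output, i.e. what the driver-level routine would do. */
static int dev_output(struct ifnet_sketch *ifn, const char *hdr, const char *pkt)
{
    int n = snprintf(ifn->last_frame, sizeof ifn->last_frame, "%s%s", hdr, pkt);
    return (n < 0 || (size_t)n >= sizeof ifn->last_frame) ? -1 : 0;
}

/* Ethernet: resolve the next hop (ARP lookup elided), add a MAC header. */
static int eth_ip_output(struct ifnet_sketch *ifn, const char *pkt)
{
    return dev_output(ifn, "[eth]", pkt);
}

/* PPP: no address resolution, just PPP framing. */
static int ppp_ip_output(struct ifnet_sketch *ifn, const char *pkt)
{
    return dev_output(ifn, "[ppp]", pkt);
}

/* The IP layer only calls through the pointer. */
static int ip_output(struct ifnet_sketch *ifn, const char *pkt)
{
    assert(ifn->ip_output != NULL);
    return ifn->ip_output(ifn, pkt);
}
```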

I can't think of any differences below the device family layer that would
impact this. I haven't used WLAN, though, and it may behave differently
from both PPP and Eth.

>
> The IP layer is attractive, because only a single thread is required to
> handle all interfaces including ARP. However, ARP and other non-IP
> protocols (we use DLC in one application) must be handled specially and
> we may end up with problems similar to the ones we have now.

The more general-purpose the solution, the slower and bigger it gets.
I prefer a thread per device because it allows each device to be kept busy
as long as there is work to do, but that's a moot point for most people
because they aren't using multiple network interfaces.
(Oh yeah, in the NutIfConfig changes I mentioned above I changed it so that
PPP didn't record its IP address to EEPROM, which it did in the same place
as Eth.)

>
> So, it looks to me as if the physical layer is the one to handle in the
> transmit thread. We'd need one thread per interface type, currently
> Ethernet and PPP.

For Ethernet you could have a chain of protocols, each with its own queue
of packets (netbufs). The links in the chain could be raw and IPv4. The
Ethernet TX thread would do different things depending on which link a
packet was under. Raw packets would just be sent, while IP packets would be
fed through ARP. If an ARP request needed to be made, that ARP request
would be appended to the RAW queue (unless there was already an outstanding
request). If the IP packet was ready to send, its Ethernet header would be
completed and it would become a RAW packet and be placed in the RAW queue.
Essentially, only things in the RAW queue can be sent to the device.
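A rough sketch of that Ethernet TX thread under simplifying assumptions: a
one-entry ARP cache, and a separate "waiting" queue for IP packets whose
ARP lookup is still pending (the paragraph above doesn't say where those
go, so that part is an assumption). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame descriptor; kind says which queue it belongs to. */
enum pkt_kind { PKT_RAW, PKT_IPV4 };

struct pkt {
    struct pkt *next;
    enum pkt_kind kind;
    int dst_known;      /* set once the Ethernet header has been completed */
    int id;
};

struct fifo { struct pkt *head, *tail; };

static void fifo_put(struct fifo *f, struct pkt *p)
{
    p->next = NULL;
    if (f->tail) f->tail->next = p; else f->head = p;
    f->tail = p;
}

static struct pkt *fifo_get(struct fifo *f)
{
    struct pkt *p = f->head;
    if (p) {
        f->head = p->next;
        if (!f->head) f->tail = NULL;
    }
    return p;
}

struct eth_tx {
    struct fifo raw, ipv4, waiting;
    int arp_resolved;           /* single-neighbour ARP cache, for brevity */
    int arp_outstanding;        /* a request is already queued or sent */
    struct pkt arp_request;     /* preallocated ARP request frame */
    int sent[16];               /* ids handed to the device, in order */
    int nsent;
};

/* One pass of the Ethernet TX thread. Only RAW frames reach the device. */
static void eth_tx_pass(struct eth_tx *t)
{
    struct pkt *p;

    while ((p = fifo_get(&t->ipv4)) != NULL) {
        if (t->arp_resolved) {
            p->dst_known = 1;           /* complete the Ethernet header */
            p->kind = PKT_RAW;
            fifo_put(&t->raw, p);
        } else {
            fifo_put(&t->waiting, p);   /* park until the ARP reply */
            if (!t->arp_outstanding) {
                t->arp_outstanding = 1;
                fifo_put(&t->raw, &t->arp_request);
            }
        }
    }
    while ((p = fifo_get(&t->raw)) != NULL) {
        assert(p->kind == PKT_RAW);
        if (t->nsent < 16)
            t->sent[t->nsent++] = p->id;
    }
}

/* ARP reply received: move parked packets back to the IPv4 queue. */
static void eth_arp_reply(struct eth_tx *t)
{
    struct pkt *p;

    t->arp_resolved = 1;
    t->arp_outstanding = 0;
    while ((p = fifo_get(&t->waiting)) != NULL)
        fifo_put(&t->ipv4, p);
}
```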

>
> > (btw, the "who frees a netbuf" within nut/net is very confusing)
>
> You're not alone. The initial idea was to release the NETBUF as early as
> possible to make it available to the system for other requirements.
> Thus, in case of a fatal error, it is released by the lower level
> routine. In all other cases it is kept, because the top level may re-use
> it (TCP re-transmits).

Messing with this made me wish for garbage collection.

>
> In early days we had a lot of trouble with the RTL8019. One reason was
> that even Realtek was not aware that only half of the chip's memory is
> available in 8-bit mode.

I hate when that happens.

>
> Today fatal errors occur almost never. Maybe it's the right time to
> change this weird NETBUF handling now.

This would make it much easier for other people to work on networking code.

I had thought about using a linked list or tree-like structure for netbufs
(maybe even with reference counting and garbage collection). That would be
a huge undertaking, but it would allow for more than the current four-layer
model, and cloning them could just be aliasing them at the layer you were
interested in. NutIpOutput's handling of broadcast would just pass the same
netbuf head to all the interfaces, and they would each generate their own
headers for the same IP packet.
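A sketch of the reference-counting part, assuming one shared buffer holding
the IP datagram and a separate link header per interface. rc_netbuf,
nb_alloc, nb_retain, nb_release and if_output are hypothetical; here the
"driver" finishes immediately, so each interface drops its reference before
returning, and whoever releases the last reference frees the buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer; not the current Nut/OS NETBUF. */
struct rc_netbuf {
    int refs;
    size_t len;
    unsigned char data[];       /* IP datagram, shared and never modified */
};

static struct rc_netbuf *nb_alloc(const void *payload, size_t len)
{
    struct rc_netbuf *nb = malloc(sizeof *nb + len);
    if (nb == NULL)
        return NULL;
    nb->refs = 1;
    nb->len = len;
    memcpy(nb->data, payload, len);
    return nb;
}

static struct rc_netbuf *nb_retain(struct rc_netbuf *nb)
{
    nb->refs++;
    return nb;
}

/* Every holder releases its own reference; the last one frees the buffer.
 * Returns 1 if the buffer was freed. */
static int nb_release(struct rc_netbuf *nb)
{
    assert(nb->refs > 0);
    if (--nb->refs == 0) {
        free(nb);
        return 1;
    }
    return 0;
}

/* Hypothetical per-interface output: prepends its own link header and
 * holds a reference to the shared IP payload while "transmitting". */
static int if_output(struct rc_netbuf *ip, const char *link_hdr, char *out, size_t outsz)
{
    nb_retain(ip);
    size_t hl = strlen(link_hdr);
    int ok = hl + ip->len < outsz;
    if (ok) {
        memcpy(out, link_hdr, hl);
        memcpy(out + hl, ip->data, ip->len);
        out[hl + ip->len] = '\0';
    }
    nb_release(ip);             /* driver done with this interface's frame */
    return ok ? 0 : -1;
}
```

For a broadcast, the IP layer would call if_output() once per interface
with the same buffer and then drop its own reference.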

Nathan