[En-Nut-Discussion] network throughput gain (nicrtl output)

Fri Nov 28 22:27:07 CET 2003

The loops in nicrtl.c/NicPutPacket that copy data from RAM to the IOPORT look
optimal. But writing them as two nested loops, each with a byte-wide counter,
yields even better performance, at least with avr-gcc. In principle, when...

>    for (i = ...; i; i--)
>        nic_write(NIC_IOPORT, *p++);

was rewritten to something like this...

>    do {
>        do {
>            nic_write(NIC_IOPORT, *p++);
>        } while (il--!=0);
>    } while (ih--!=0);

...I get about 275 kbyte/s throughput from Enut to peer, compared to
about 235 kbyte/s without the change. The compiled loop bodies then look
similar to this code:

>   .L94:
>          ld r24,X+
>          std Y+16,r24
>          subi r18,1
>          brcc .L94
>          subi r19,1
>          brcc .L94
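
For readers who want to try the idea without the patch, here is a minimal
sketch of how the two byte-wide counters might be set up. It is not the code
from the patch. The names put_bytes, port_write and sink are hypothetical;
port_write stands in for nic_write(NIC_IOPORT, ...) and simply records the
bytes in a buffer so that the sketch also runs on a host. Because the
do/while form always executes its body at least once, the counters are
loaded from (length - 1) and a zero length is handled separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the NIC I/O port: written bytes end up here. */
static uint8_t sink[1024];
static size_t sink_len;

static void port_write(uint8_t b)
{
    assert(sink_len < sizeof(sink));
    sink[sink_len++] = b;
}

/* Write len bytes using two nested loops with 8-bit down counters. */
static void put_bytes(const uint8_t *p, uint16_t len)
{
    uint8_t il, ih;

    if (len == 0)
        return;                 /* do/while would otherwise run once */

    len--;                      /* the loops run (len - 1) + 1 times */
    il = (uint8_t)(len & 0xff); /* low byte: first inner pass */
    ih = (uint8_t)(len >> 8);   /* high byte: number of full 256-byte passes */

    do {
        do {
            port_write(*p++);
        } while (il-- != 0);    /* il wraps from 0 to 255 on exit */
    } while (ih-- != 0);
}
```

With a length of 600, for example, il starts at 87 and ih at 2: the first
inner pass writes 88 bytes and the two following passes write 256 bytes each.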

It would be interesting to see whether this yields a similar performance
gain when compiled with ICCAVR.

I temporarily put a patch for nicrtl.c/rtlregs.h of 3.3.2 online at

  http://at.telos.de/~kawk/tmp/faster_dma.diff

Probably a similar enhancement could be applied to input (and in
several other places where significant amounts of data need to be
copied at once, e.g. memcpy and others).

However, this is an optimization specific to 8-bit MCUs. It would
probably yield a performance loss on others, and my patch is definitely
incorrect where a u_char isn't 8 bits.
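
One way to keep such a loop confined to the targets it was written for is a
compile-time guard. The following is only a sketch under assumptions:
copy_bytes is a hypothetical helper, not a Nut/OS function, and the guard
relies on CHAR_BIT from <limits.h> and on the __AVR__ macro that avr-gcc
predefines. Other compilers such as ICCAVR would need their own condition.
Everything else falls back to a plain loop with a native-width counter.

```c
#include <limits.h>
#include <stddef.h>

/* Copy n bytes from src to dst (hypothetical helper for illustration). */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
#if CHAR_BIT == 8 && defined(__AVR__)
    /* 8-bit AVR: two byte-wide down counters (size_t is 16 bits here). */
    unsigned char il, ih;

    if (n == 0)
        return;
    n--;
    il = (unsigned char)(n & 0xff);
    ih = (unsigned char)(n >> 8);
    do {
        do {
            *dst++ = *src++;
        } while (il-- != 0);
    } while (ih-- != 0);
#else
    /* Other targets: plain loop, counter in the native width. */
    while (n--)
        *dst++ = *src++;
#endif
}
```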

Comments welcome!

Regards,
Kolja

-- 
mr. kolja waschk - haubach-39 - 22765 hh - ger
phone +49 40 889130-34 - fax -35 - e-mail s.a.