[En-Nut-Discussion] branches/devnut_m3n: Status of device support

Sat Oct 22 17:26:38 CEST 2011

Hi Henrik,

On 21.10.2011 09:46, Henrik Maier wrote:
> Hi Ulrich,
> 
> For Ethernet I prefer to have a bit more RAM. On my AT90CAN128 design 
> with 32K I ended up always a bit short of RAM which is needed for thread 
> stack to have TCP connection endpoints.
> 
Yes, for Ethernet you would think you need more RAM, but times have
changed and the chip designers did their homework.

In earlier times you had to copy the received data out of a
dual-ported RAM area before you could use it, and you had to prepare the
bytes to send in a second area and then copy them into that pseudo-RAM
of the EMAC.
So you ended up with a receive area for new data, a working receive area
for the previous data, a preparation area for the next data to send and
a sending area for the data actually going out. On top of that came all
the DMA setup or interrupt handling, and you blocked your system with
this management, just throwing bytes around.

But today you just tell the DMA controller where to place the incoming
data and where the data you want to send is prepared. You can even
synchronize this, as the DMA controller reads the header of each area
and automatically detects whether you are still accessing the packet or
whether it is ready for sending (and vice versa). So you can dynamically
or statically reserve the RAM areas that you declare for network
handling, and you set up the EMAC only once. Whenever your packet is
ready, set the bit in the header and carry on with your application.
The same applies to the received bytes.
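
To make the handover via the header bit a bit more concrete, here is a
minimal sketch of a transmit ring in C. The descriptor layout, the bit
name DESC_OWN, the ring size and all function names are assumptions for
illustration only; they are not the STM32 register layout and not the
Nut/OS driver. dma_sim_complete() merely stands in for the hardware so
the logic can be tried on a PC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DESC_OWN  0x80000000u   /* hypothetical: 1 = descriptor belongs to the DMA */
#define BUF_SIZE  1536u         /* room for one full Ethernet frame */
#define RING_LEN  4u

/* Hypothetical descriptor ("header"); real EMACs define their own fields. */
struct tx_desc {
    volatile uint32_t status;   /* ownership bit plus controller flags */
    uint32_t length;            /* number of valid bytes in buf */
    uint8_t *buf;               /* frame memory the DMA reads directly */
};

static uint8_t tx_buffers[RING_LEN][BUF_SIZE];
static struct tx_desc tx_ring[RING_LEN];
static unsigned tx_next;        /* next descriptor the CPU will fill */

/* One-time setup: attach a buffer to every descriptor, all owned by the CPU. */
void tx_ring_init(void)
{
    for (unsigned i = 0; i < RING_LEN; i++) {
        tx_ring[i].status = 0;
        tx_ring[i].length = 0;
        tx_ring[i].buf = tx_buffers[i];
    }
    tx_next = 0;
}

/* Queue one frame. Returns 0 on success, -1 if the DMA still owns the slot. */
int tx_submit(const void *frame, uint32_t len)
{
    struct tx_desc *d = &tx_ring[tx_next];

    assert(len <= BUF_SIZE);    /* caller must not exceed the buffer size */
    if (d->status & DESC_OWN)
        return -1;              /* ring is full, try again later */

    memcpy(d->buf, frame, len); /* or build the frame in place in d->buf */
    d->length = len;
    d->status |= DESC_OWN;      /* hand over: from now on the DMA may send it */
    tx_next = (tx_next + 1) % RING_LEN;
    return 0;
}

/* Stand-in for the hardware: "sends" one frame and gives the slot back. */
int dma_sim_complete(unsigned index)
{
    if (!(tx_ring[index].status & DESC_OWN))
        return -1;              /* nothing was queued in this slot */
    tx_ring[index].status &= ~DESC_OWN;
    return 0;
}
```

The point is that tx_ring_init() runs only once. After that, sending a
packet costs one check of the ownership bit and one write to set it.
The receive side would presumably work the same way with the roles
swapped.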

> I have not used an ARM CPU before, so how does the RAM consumption for 
> stack and general Nut/OS use compare to an AVR? I assume it is more just 
> because of the 32-bit architecture but what sort of factor would you 
> apply? I am just after a rough number like 30% or 50% more?

Yes and no!
That is not that easy to see.

If you say ARM, you can decide to use 16 bit and/or 32 bit. But all data
is 16 or 32 bit aligned in RAM, and if you do a lot of work with
variables smaller than these two, you still use a minimum of 2 bytes of
RAM each, as the ARM architecture can only address word aligned.
Using 8 bit variables additionally produces software overhead, as the
code always does int/byte conversions for each access.
For Ethernet frames you do a lot of 16 bit things for headers and
checksums. So you do not lose a byte of RAM, but you gain speed.

With CortexM3 you can use 8/16/32 bits, as the chip supports them all. It
even supports unaligned addressing. So reading CAN structures with lots
of 8 bit values packed together produces neither software overhead nor
RAM overhead.
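
As an illustration of such a packed CAN structure, here is a small
sketch that decodes an 8 byte payload with a 16 bit value at an odd
offset. The payload layout, struct reading and all function names are
invented for this example. The byte-wise readers are portable C; on a
core that allows unaligned access the compiler may merge them into
single loads, but that depends on the toolchain and its settings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAYLOAD_LEN 8u

/* Hypothetical payload layout (little endian):
 *   byte 0     node id
 *   byte 1..2  temperature in 0.01 degC, signed (odd offset, unaligned)
 *   byte 3     flags
 *   byte 4..7  pressure in Pa
 */
struct reading {
    uint8_t  node_id;
    int16_t  temp_centi;
    uint8_t  flags;
    uint32_t pressure_pa;
};

/* Read a 16 bit little-endian value from any address. */
static uint16_t get_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Read a 32 bit little-endian value from any address. */
static uint32_t get_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one payload. Returns 0 on success, -1 on a wrong length. */
int decode_payload(const uint8_t *data, uint8_t dlc, struct reading *out)
{
    assert(data != NULL && out != NULL);
    if (dlc != PAYLOAD_LEN)
        return -1;

    out->node_id     = data[0];
    out->temp_centi  = (int16_t)get_u16_le(&data[1]);
    out->flags       = data[3];
    out->pressure_pa = get_u32_le(&data[4]);
    return 0;
}
```

The payload itself stays at exactly 8 bytes in RAM, and no padding is
needed to bring the 16 bit field to an even address.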

On the other hand:
Let's say you design instruments that measure temperature, voltage,
current, pressure... whatever...
Most values you get from ADCs are 12..14 bits wide. Writing the code to
transform these values into a readout for a display or into serial data
for any kind of bus is the same if you use a high-level language like C.
But inside the CPU it makes a lot of difference!
On an AVR a simple 16 bit calculation may first have to free up those
few registers that are able to calculate with 16 bits. Logical
operations on 16 bits always add code to decide which part of the
double 8 bit value in RAM needs to be loaded and modified.

With a 16/32 bit CPU there is just a read, modify, write.
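
A typical example of such a calculation, as a sketch: scaling a 12 bit
ADC reading to millivolts. The function name and the reference voltage
parameter are assumptions for illustration. The C source is identical
on every target; only the generated code differs, since the 32 bit
intermediate fits into a single register on ARM or Cortex but has to
be spread over four 8 bit registers on an AVR.

```c
#include <assert.h>
#include <stdint.h>

#define ADC_BITS 12u
#define ADC_MAX  ((1u << ADC_BITS) - 1u)    /* 4095 for a 12 bit converter */

/* Convert a raw ADC reading to millivolts, rounded to the nearest mV.
 * vref_mv is the reference voltage in millivolts (assumed parameter). */
uint16_t adc_to_millivolts(uint16_t raw, uint16_t vref_mv)
{
    assert(raw <= ADC_MAX);     /* a 12 bit ADC cannot deliver more */

    /* The product needs up to 28 bits, so use a 32 bit intermediate. */
    uint32_t scaled = (uint32_t)raw * vref_mv;

    return (uint16_t)((scaled + ADC_MAX / 2u) / ADC_MAX);
}
```

With a 3.3V reference, adc_to_millivolts(2048, 3300) gives 1650.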

With ARM7 you might lose some RAM but save a lot of code compared to
any 8 bit architecture if most of the values to handle are 16 bit.
With Cortex you get a speed improvement, savings in code space and no
loss of RAM.

That was the reason for me to go to STM32F. Our CAN sensors do not need
the 20kB of RAM, as most of them do not have much data to handle.
They do not need 64k of FLASH, as the code is optimized because the
values to handle match the features of the architecture. We could step
down to 64 or, with some debug printfs removed, even to 32k chips. But
the manufacturer uses large amounts of the 128k type, so it would
eventually be more expensive for us to take the smaller chip.
And we do not need the 72MHz on most of the systems. I just configured
the system via qnutconf to start with 16MHz. Nice side effect: We use
less than 25mA if the LEDs are off :)

So, think about the input and the output of what you want to design and
find the best-matching part in between to do what the system needs.

With CortexM3 the decision is very easy, as it matches almost everything:
from a little temperature sensor with CAN on an STM32F103 up to a
full-featured network or CAN connected display or even an internet radio
using an STM32F107 or STM32F20x.

I did not follow the other manufacturers of CortexM series chips. I had
no time. At the time I needed something, last year, STM was the only
one that could deliver a CPU that works. The errata sheets were small
and none of the problems mentioned was a problem for us, or if one was,
I found a simple workaround for Nut/OS.
ATMEL was still asleep at that time, and the few Cortex based chips
available had a price that was not up for discussion.
TI had chips, but the errata was bigger than the datasheet...
And NXP never answered any call or reacted to the business cards I
gave them at the exhibitions.

So if I were asked for advice: take STM and be happy. You can use the
development branch devnut_m3n and you can be sure that it works, as it
is in production at the company where I work.
I have not merged it into trunk yet, as there are some changes in the
API interfaces that are incompatible with ARM and AVR. Adopting these
changes for ARM would improve ARM but is still a bit of work. For AVR I
cannot say if and how it would be affected. I guess even there some
bytes would be saved in the resulting code and some speed improvement
would result. But I need to check.