[En-Nut-Discussion] FRC: Direct or Pointer

Sun Nov 13 18:57:51 CET 2011

Hi!

On 13.11.2011 10:54, Harald Kipp wrote:
>> Also, using structs has allowed Ulrich to make the code a lot smaller
>> if I recall an older email. So this is a nice plus to that.
> 
> Yes, this might happen. Using structures enables the compiler to keep
> the structure's base address in one register and access specific items
> with an offset that fits in the instruction word. On the other hand,
> using an additional register may lead to less optimal code.

That is what I saw in the assembler output when optimizing Cortex
GPIO access, and again when optimizing some other on-board
peripherals.
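
To illustrate the base-plus-offset point, here is a minimal sketch. The
register block, its field names and the RAM stand-in are made up for this
example and are not taken from the ST headers or from Nut/OS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical GPIO register block; names and layout are illustrative. */
typedef struct {
    volatile uint32_t mode;   /* offset 0x00 */
    volatile uint32_t in;     /* offset 0x04 */
    volatile uint32_t out;    /* offset 0x08 */
} gpio_regs_t;

/* On a host build there is no real peripheral, so a RAM object stands in. */
gpio_regs_t fake_gpio;

/* All accesses go through one base pointer. The compiler can keep `gpio`
 * in a register and address each field with a small constant offset. */
uint32_t gpio_set_and_read(gpio_regs_t *gpio, uint32_t mask)
{
    gpio->mode = 0x1u;
    gpio->out |= mask;
    return gpio->out;
}
```

On a target, the pointer would be a fixed peripheral address cast to
gpio_regs_t * instead of the address of a RAM object. Whether this really
gives smaller code than separate per-register pointer macros has to be
checked in the assembler output for each compiler and optimization level.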
> 
> That's difficult to predict without evaluating a few real world test
> cases. I remember having read about an interesting result: After
> switching some Linux drivers from volatile register definitions to
> memory barriers, the compiler unexpectedly created smaller code.
> 
It was a reasonable code size reduction! I did this analysis a long
time ago, after we discussed that volatile void* or volatile
uint32_t* thing for safe register access. Drivers with many register
accesses in particular could be reduced by about 40% in effective
assembler output.
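
For comparison, the two access styles mentioned in the quote can be
sketched like this. It is only an illustration: the device struct and the
function names are hypothetical, and the barrier macro uses the GCC/Clang
extended asm syntax, so it is an assumption that such a compiler is used.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang syntax): the "memory" clobber keeps the
 * compiler from moving or caching memory accesses across this point.
 * It does not emit any CPU or bus barrier instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical two-register device, modelled in RAM; not volatile itself. */
typedef struct {
    uint32_t ctrl;
    uint32_t data;
} dev_regs_t;

dev_regs_t fake_dev;

/* Style A: every access is volatile. */
void dev_start_volatile(volatile dev_regs_t *dev, uint32_t value)
{
    dev->data = value;
    dev->ctrl = 1u;
}

/* Style B: plain accesses, ordered by explicit barriers. */
void dev_start_barrier(dev_regs_t *dev, uint32_t value)
{
    dev->data = value;
    compiler_barrier();   /* data store is emitted before the ctrl store */
    dev->ctrl = 1u;
    compiler_barrier();   /* ctrl store is emitted before any later code */
}
```

With style B the compiler is free to combine or reorder accesses between
two barriers, which is one possible reason for smaller code. Which style
wins for a given driver still has to be measured.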

Let's see:
1) With the ST way of accessing registers through a struct of volatile
uint32_t members, I never ran into the problem that a register was not
written at the point where it had to be, due to caching or optimization.
Checked with gcc 4.3.x, 4.5.x and 4.6.x.
2) I got a reasonable code size reduction, as registers are addressed
by offset while the compiler keeps the base address in a register for as
long as one can be spared.
3) It was possible to write a USART driver that exists only once in
flash for all 5 USARTs of the STM32F103, and the same was possible for
I2C. The code shrank for several reasons:

3a) The driver expects the USART base register address to be passed
in the driver's control struct. This control struct only occupies
RAM if the driver is registered.
3b) Passing the base address to the driver allows fast access to the
USART without pushing and popping registers to/from the stack.
3c) The per-device part of the driver, like usart1.c/usart2.c...,
consists only of some defines that preload the constants and give the
USART's base address. That's all.
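
Roughly, points 3a) to 3c) could look like the sketch below. This is not
the actual driver code: the struct layouts, the names and the constants are
invented for the example, and RAM objects stand in for the peripherals so
that it also builds on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical USART register block (layout is illustrative). */
typedef struct {
    volatile uint32_t sr;    /* status */
    volatile uint32_t dr;    /* data */
    volatile uint32_t brr;   /* baud rate divisor */
} usart_regs_t;

/* Hypothetical control struct, one instance per registered device. */
typedef struct {
    usart_regs_t *base;      /* base register address of this USART */
    uint32_t      brr_init;  /* preloaded constant for this instance */
} usart_ctrl_t;

/* RAM stand-ins for two peripherals. */
usart_regs_t fake_usart1, fake_usart2;

/* What a per-device file such as usart1.c might reduce to. */
#define USART1_BASE (&fake_usart1)
#define USART2_BASE (&fake_usart2)
usart_ctrl_t usart1_ctrl = { USART1_BASE, 0x271u };
usart_ctrl_t usart2_ctrl = { USART2_BASE, 0x138u };

/* Shared driver code, present only once for all instances. */
void usart_init(usart_ctrl_t *ctrl)
{
    assert(ctrl != NULL && ctrl->base != NULL);
    ctrl->base->brr = ctrl->brr_init;
}

void usart_putc(usart_ctrl_t *ctrl, uint8_t c)
{
    ctrl->base->dr = c;      /* one store at a constant offset from base */
}
```

An application would then call usart_init(&usart1_ctrl) or
usart_putc(&usart2_ctrl, 'A'), and the same code serves every instance.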

The overall result is that several sensor systems that I started on
a Cortex-M3 at 72MHz can now run at 16MHz, saving an additional
10..20mA.
----
So even though I understand the fear that there were some compilers that
do not correctly interpret volatile pointers, I haven't seen it in any of
my projects since gcc 3.x.x, neither on AVR, nor on AT91SAM7X or STM32.
Again, I verified that down to the assembler output and even did a lot of
disassembly and assembly-level GDB sessions at the beginning of the
Cortex-M3 port.

Best regards
Ulrich