[En-Nut-Discussion] NutGetMillis is a _very expensive_ function on Ethernut3

Wed Aug 20 18:18:18 CEST 2008

Alain M. wrote:
> duane ellis wrote:
>> Harald Kipp wrote:
>>> IMHO, using an array instead of a structure will be more flexible.
>>> This way we can define different indices for different targets.
>> I do not think the ARRAY should be public, it should be 100% hidden 
>> "static to a chip/target specific C file", and the C file should 
>> implement access functions to get the clock values. Data Hiding is 
>> good.  How that C file/function does its job internally is up to the 
>> implementor.
> 
> Especially because it can be made in a way that gets optimized away by GCC.

I didn't intend to make the cache variable public. I preferred to use an 
array instead of a structure, because it is more flexible and can be 
handled by the architecture independent part. This reduces the effort of 
porting to new platforms.

No question that data hiding has advantages. However, it can be 
overdone. Specifically in tiny embedded systems the function call 
overhead may be significant. AFAIK, global functions are typically not 
optimized away. If at all, this has to be done by the linker; the 
compiler cannot do it.

But calm down. ;-) There are no plans to replace the existing functions.

I'd like to introduce two new hardware independent functions

   uint32_t NutClockGet(int idx);
   int NutClockSet(int idx, uint32_t freq);

The idx parameter is one of NUT_HWCLK_CPU, NUT_HWCLK_PERIPHERAL etc., 
which are platform dependent.

For now, the second function is specified for NutClockSet(-1, 0) only, 
which means: Release cached values (freq=0) of all clocks (idx=-1). 
Later some platforms may implement the ability to set specified hardware 
clocks to specified frequencies.

Then we still have

   uint32_t NutGetCpuClock(void);

which is an optimized version of NutClockGet(NUT_HWCLK_CPU) and provides 
backward compatibility. If NUT_CPU_FREQ is defined, it will simply

   return NUT_CPU_FREQ;

Otherwise NutClockGet() and NutGetCpuClock() will use the cache 
mechanism suggested by Duane. By using an array for the cache, these 
functions can be moved from arch/xxx/ostimer.c to os/timer.c. Indices and 
the size of this array are defined in include/arch/<target>/timer.h.

On the hardware dependent side there is a new function

   uint32_t NutArchClockGet(int idx);

which reads the hardware in case of an invalid cache value.
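
To make the interplay of the three functions more concrete, here is a 
minimal sketch, not the final implementation. It assumes that a cache 
value of 0 means "invalid", that an unknown index returns 0 and that 
NutClockSet() returns -1 for anything not yet supported. The two-clock 
index layout, the frequencies and the arch_reads counter are made up 
for illustration, and NutArchClockGet() is only a stand-in for the real 
hardware access.

```c
#include <stdint.h>

/* Hypothetical indices for a target with two clocks. The real values
 * would come from include/arch/<target>/timer.h. */
#define NUT_HWCLK_CPU        0
#define NUT_HWCLK_PERIPHERAL 1
#define NUT_HWCLK_MAX        NUT_HWCLK_PERIPHERAL

/* Cached clock frequencies in Hertz. Assumption: 0 marks an entry
 * as invalid, so the zero-initialized array starts out empty. */
static uint32_t clock_cache[NUT_HWCLK_MAX + 1];

/* Counts simulated hardware accesses, for illustration only. */
static unsigned int arch_reads;

/* Stand-in for the hardware dependent part. A real port would read
 * PLL and divider registers here. The frequencies are made up. */
uint32_t NutArchClockGet(int idx)
{
    arch_reads++;
    return (idx == NUT_HWCLK_CPU) ? 180000000UL : 45000000UL;
}

/* Hardware independent part: return the cached value and query the
 * hardware only if the cache entry is invalid. */
uint32_t NutClockGet(int idx)
{
    if (idx < 0 || idx > NUT_HWCLK_MAX) {
        return 0;               /* assumption: unknown index yields 0 */
    }
    if (clock_cache[idx] == 0) {
        clock_cache[idx] = NutArchClockGet(idx);
    }
    return clock_cache[idx];
}

/* Only NutClockSet(-1, 0) is specified so far: release all cached
 * values. Everything else is rejected in this sketch. */
int NutClockSet(int idx, uint32_t freq)
{
    int i;

    if (idx == -1 && freq == 0) {
        for (i = 0; i <= NUT_HWCLK_MAX; i++) {
            clock_cache[i] = 0;
        }
        return 0;
    }
    return -1;                  /* assumption: not supported yet */
}
```

A driver that changes the PLL settings would call NutClockSet(-1, 0) 
afterwards, so that the next NutClockGet() reads the hardware again.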

This way we are also able to replace constructs like

   #if defined(AT91_PLL_MAINCK)
     clk = At91GetMasterClock();
   #else
     clk = NutGetCpuClock();
   #endif

by

   clk = NutClockGet(NUT_HWCLK_PERIPHERAL);

or, even more specifically,

   clk = NutClockGet(NUT_HWCLK_UART0);

On targets with single clock support only, the indices are all the same

   #define NUT_HWCLK_CPU 0
   #define NUT_HWCLK_PERIPHERAL NUT_HWCLK_CPU
   #define NUT_HWCLK_UART0 NUT_HWCLK_PERIPHERAL
   ...
   #define NUT_HWCLK_MAX NUT_HWCLK_PERIPHERAL

and the cache is declared as

   static uint32_t clock_cache[NUT_HWCLK_MAX + 1];

In such cases (1 clock only) the function NutClockGet() is mapped to the 
optimized NutGetCpuClock() by

   #if NUT_HWCLK_MAX == 0
   #define NutClockGet(i)          NutGetCpuClock()
   #endif
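
Put together for a single clock target, this could look like the 
following sketch. It covers only the case where NUT_CPU_FREQ is 
defined. The frequency value and the helper UartDivisor() are 
hypothetical and serve only to show that driver code can ask for 
NUT_HWCLK_UART0 without caring how many clocks the target has.

```c
#include <stdint.h>

/* Assumed configuration: fixed CPU clock. The value is made up. */
#define NUT_CPU_FREQ 14745600UL

/* Single clock target: all indices collapse to 0. */
#define NUT_HWCLK_CPU        0
#define NUT_HWCLK_PERIPHERAL NUT_HWCLK_CPU
#define NUT_HWCLK_UART0      NUT_HWCLK_PERIPHERAL
#define NUT_HWCLK_MAX        NUT_HWCLK_PERIPHERAL

/* With NUT_CPU_FREQ defined there is no cache and no hardware
 * access, just a constant. */
uint32_t NutGetCpuClock(void)
{
    return NUT_CPU_FREQ;
}

/* One clock only: NutClockGet() is mapped to NutGetCpuClock() and
 * the index argument is dropped by the preprocessor. */
#if NUT_HWCLK_MAX == 0
#define NutClockGet(i)          NutGetCpuClock()
#endif

/* Hypothetical driver helper: baud rate divisor for a UART that
 * uses 16 times oversampling. */
uint32_t UartDivisor(uint32_t baud)
{
    uint32_t clk = NutClockGet(NUT_HWCLK_UART0);

    return clk / (16 * baud);
}
```

The driver source stays the same on multi clock targets. Only the 
index definitions and the mapping macro differ.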

This will hopefully not add any additional code to tiny systems, where 
every byte counts. And it will provide high flexibility for coming chips 
without creating a porting nightmare.

Harald