[En-Nut-Discussion] Fwd: Not fixed!!!: Re: confirmed!!! Re: NutOS 4.4.0 on ARM7: possibly bug in NutEnterCritical / NutExitCritical

duane ellis ethernut at duaneellis.com
Thu Feb 28 04:26:10 CET 2008


Harald Kipp wrote:
> There is one point where I stumbled: Access to the list of delayed 
> action functions.
>
> The interrupt handler will add new items to it. If there is no critical 
> section active, it will enable interrupts and execute the listed action 
> functions and remove them from the list. In case of any active critical 
> section, this will be done by EXIT_CRITICAL later on.
>
> Wouldn't it be required to protect the manipulation of this list?
>   
>> Just want to make sure that I understood the finer details.

So - here's the long answer.

==========
Sorry - I think of FIFOs and LISTS as sort of the same.
Technically yea, they are not.
It could also be a priority tree... so some delay funcs get executed first.
==========

First, it should be a fixed size FIFO, not really a list, they are simpler.
It should contain an array of structs that look like this:

   struct delayed_func {
                int (*func)( int p1, int p2, int p3 );
                int p1;
                int p2;
                int p3; // nice power of 2 size
    };

The idea being: You can cast anything - including pointers to "p1/2/3"

The FIFO should be implemented with only a "WR" and "RD" pointer, *NO*
counter,
Why? The WR and RD pointers are modified independently of each other
WR - only during insert, RD - only during removal.

A "count" attribute should be forbidden, as both the insert/delete code
would modify it
And thus - it becomes complicated to manage.

The only FIFO test that matters is "FULL/EMPTY".
If FULL - panic - we are hozed, overflow, die a horrible death.
If EMPTY - there is no work to do.

==========
System Calls - ie: things that go critical.
==========

ie: generic FIFO handler libs, and LIST handler libs, etc.

Must always start with ENTER_CRITICAL() - which bumps the global counter.
And - must always end with - EXIT_CRITICAL()

It would be even better to have them *return* with a macro.

   ie:
        return  EXIT_CRITICAL_RETURN( returnvalue )

    Why? That might be a nice point to yield/taskswitch/etc...

==========
ISR Rules
==========

At the start of an ISR - the ISR always bumps the global_counter.
    ie: Do it inside of some ENTER_ISR() macro, or maybe in the
    generic ASM code that pushes parameters then calls the C function.

   ie: Naked -vrs- Wrapped irq handlers.

Fundamental Important item:
   At the time of the ISR - the global counter transition will be 0->1->0
   Or it will be something else, ie: 1->2->1, or 2->3->2, etc..

   What matters is to detect the 0->1->0 transition.

   IF you need to call a system function, during the IRQ then you
   must understand that the *CALL* might happen later - not *now*

    Encourage this macro use: It *ALWAYS* works
    -[provided the delay fifo does not overflow]-
         DELAY_FUNC_CALL(  func_name,   p1, p2 , p3);

    Or if the person pedantically complaining about need it to be faster...
     The alternative is:

         if( global_counter == 1 ){
                   // system is safe to call, go ahead.
                    func_name( p1, p2, p3 );
         } else {
                  //  Sorry deal with it.
                  DELAY_FUNC_CALL( func_name, p1, p2, p3 );
          }

     The DELAY_FUNC_CALL() could - in fact do that very test.
     Or - it can add an entry to the FIFO.

IMPORTANT:
    If and only if you enable nested IRQs - and IRQs are enabled
    Then you must
           disable IRQs,
           BUMP the WRITE pointer, but remember its value.
           enable IRQs.
     otherwise
          Remember the WR ptr value
          Bump the WR pointer.
     endif

KEY POINT
      (a) you are in an IRQ, thus "global_counter is NON-ZERO"
      (b) Nobody else can drain the fifo, ie: they are not 0->1->0
      (c) if anybody will drain it, it will be your line of execution.

     Now fill in the fifo entry with IRQs enabled
     Nobody can drain it while you are filling in the structure.

==========
Scheduler rules
==========

Task/Thread switches are *ONLY* allowed if and only if the current
execution path *IS* the one that will cause the 0->1->0 transition.
Any other - must not task/thread switch.

System calls that will *block* - must take care to adjust them selves
accordingly.
Basically, the routines would block - can only block if they are in the
0->1->0 path
Otherwise, they must return "ERROR_WOULD_BLOCK" ...
And let something else handle the wait.
Ask if you want more info on this.

=========
Draining the Fifo (Swamp)
=========
The key point is this: The execution path that contains the 0->1->0
transition.

Either: The IRQ detects the 0->1->0 situation *OR*

The system went 0->1 -> 2[due to syscall nesting] -> 1 -> 0
Or went 0 -> 1 -> 2 [due to an IRQ ] -> 1 -> 0

When the FIFO needs to be drained, you do the following:

(a) do not yet decrement the global counter.
(b) Since *you* own the 1->0 transition, you are the only fifo reader!
(c) While fifo has data.
      Peek into the fifo head.
      Call the function in the fifo head, passing parameters.
      Pop the head off the fifo.
      Loop Till Fifo is empty.
(d) *NOW* - disable irqs.
(e) If - FIFO is now not-full - Oops one slipped in.
      Re-enable irqs and go back to step C.
(f) Exit the ISR, or the System Call as appropriate.

If during that process, an IRQ occurs - it will see a 1->2->1 transition
and .... gracefully handle the situation.

===========

> Furthermore, because action functions may be executed in interrupt 
> context, they are still not allowed to call other APIs than 
> NutEventPostFromIrq, right? 

NO - they [IRQ handlers] can call any system call - PROVIDED - it does
not block.
And know and understand that they actual call may be delayed.

To the pedantic lurkers reading this:
    "oh my god - the sky is falling" - this will take forever...
      What is your alternative?
     (1)  Keep IRQs disabled and deal with long dead times
     (2)  Or,  take advantage of this trick.

Option (1) always exists, it has never gone away - you just have to deal
with the fact the IRQs might be disabled for a really long time.
Option (1) is always _safe_ for the naive  developer.

===========

> Access to all resources they use (e.g. rx/tx 
> buffers in the UART driver) must be protected.
>   
Not exactly -

Remember the big win: IRQs are enabled for longer periods of time!

If it becomes problematic - then - one could offer a special fifo
function, ie:

(1) a fast append that does not check the global counter.
(2) a means to lock *only* that specific fifo, perhaps with a special flag.

     The idea is - you have many fifos.
        Not every single fifo in the world needs to be locked.
        You only need to lock the one you are messing with.

Perhaps FIFOs could have an additional "in-service" flag
that protects the IRQs from doing something.

It could be a simple as a single global variable, a pointer to current
fifo.
Either the pointer is NULL - no fifo in service.
Or it points to *some* fifo.

The FIFO code would  -
   at the start: set the volatile global pointer.
   At exit - set the global pointer to null.

Example: Assuming  that no two IRQs fill the same fifo!

Any ISR using the FIFO - could - easily test:
      if( global_in_service_fifo == my_fifo ){
               then - the fifo insert must be delayed.
       else {
               if global_in_service_fifo == NULL
                       fifo_insert( my_fifo );
                else {
                       Some fifo is in service, but it is not *THIS* fifo!
                       fifo_insert( my_fifo )
                 }
          }

===========
-- Duane.







More information about the En-Nut-Discussion mailing list