[En-Nut-Discussion] timer problem in NutOS (Bug #2029411)

Sun Jul 27 16:01:34 CEST 2008

Hi all,

Despite general quoting rules on mailing lists I intentionally use TOFU 
while referring to a more-than-1-year-old article by Rob.

First of all, I'm surprised to see this serious bug still unsolved after
such a long time. I never tire of pointing out the importance of
the sourceforge bug tracker at
http://sourceforge.net/projects/ethernut/
Be aware that active developers are not able to follow all threads
in this list and that bug reports sent only to the list tend to get
lost. The bug list at sourceforge is checked regularly by several
people. Anyway, many thanks to Rob and others for tracking this problem.

At first I thought to myself, "Oh no, not another one of these
mind-boggling timer problems". ;-) But actually, Rob's explanation is
easy to follow.

A few workarounds have been posted as well. I'm not sure whether any of
these will solve all related problems _and_ maintain the greatest
backward compatibility without introducing performance or code size
issues.

I'd divide the issue into two parts:

1. As Rob pointed out, in real-world applications the callback will
typically wake up an application thread, which then removes the one-shot
timer. The disaster happens when the idle thread continues to handle
the timeout by removing the timer again.

2. More generally, NutTimerStop may receive an invalid pointer, which
may refer not only to a newly created timer as in Rob's case, but to any
memory structure.
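
To make issue 2 concrete, here is a minimal sketch of the handle reuse
Rob observed. All names (pool_timer, PoolTimerStart, PoolTimerStop,
PoolTimerRunning) are hypothetical stand-ins and not Nut/OS API. The
two-slot pool is only an assumption that makes the reuse deterministic;
how the kernel actually recycles timer memory is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 2

/* Hypothetical timer slot; NOT the real Nut/OS timer structure. */
typedef struct {
    int in_use;    /* non-zero while the timer is running */
    int owner_id;  /* who started it, for illustration only */
} pool_timer;

static pool_timer pool[POOL_SIZE];

/* Hand out the first free slot, or NULL if the pool is exhausted. */
pool_timer *PoolTimerStart(int owner_id)
{
    size_t i;

    for (i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            pool[i].owner_id = owner_id;
            return &pool[i];
        }
    }
    return NULL;
}

/* Stop whatever timer currently lives in this slot. The function
   cannot tell whether the caller's handle is stale. */
void PoolTimerStop(pool_timer *t)
{
    assert(t != NULL);
    t->in_use = 0;
}

/* Report whether the slot currently holds a running timer. */
int PoolTimerRunning(const pool_timer *t)
{
    return t->in_use;
}
```

If a one-shot timer is started and then stopped by the application, the
next timer started gets the same slot. Stopping the first handle a
second time then silently stops the new timer, which is what happens to
the NutSleep(100) timer in Rob's trace.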

Issue 1 could be solved by moving the current NutTimerStop code to a new
function, which is used internally only. The application-callable API
would then only mark the timer; final removal would be done during idle
time only, using the internal function. Something similar has been
suggested by Erik Lindstein.
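
The mark-then-remove idea can be sketched as follows. This is only an
illustration under my own assumptions, not a patch: deferred_timer and
the Deferred* functions are hypothetical names, the list handling is
simplified, and the locking a real kernel would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical timer node; NOT the real Nut/OS timer structure. */
typedef struct deferred_timer {
    struct deferred_timer *next;
    int stop_requested;  /* set by the application, acted on at idle time */
} deferred_timer;

static deferred_timer *timer_list = NULL;

/* Create a timer and link it into the list. */
deferred_timer *DeferredTimerStart(void)
{
    deferred_timer *t = calloc(1, sizeof(*t));

    if (t) {
        t->next = timer_list;
        timer_list = t;
    }
    return t;
}

/* Application-callable stop: only marks the timer, never frees it. */
void DeferredTimerStop(deferred_timer *t)
{
    if (t)
        t->stop_requested = 1;
}

/* Internal removal, meant to run at idle time only. Unlinks and frees
   every marked timer and returns how many were removed. */
int DeferredTimerReap(void)
{
    int removed = 0;
    deferred_timer **pp = &timer_list;

    while (*pp) {
        deferred_timer *t = *pp;

        if (t->stop_requested) {
            *pp = t->next;   /* unlink before releasing the memory */
            free(t);
            removed++;
        } else {
            pp = &t->next;
        }
    }
    assert(*pp == NULL);     /* the whole list has been walked */
    return removed;
}

/* Number of timers still linked into the list. */
int DeferredTimerCount(void)
{
    int n = 0;
    const deferred_timer *t;

    for (t = timer_list; t; t = t->next)
        n++;
    return n;
}
```

Because the memory is released in one place only, marking the same timer
twice before the next reap is harmless. A handle kept beyond the reap is
still stale, so this does nothing for issue 2.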

This will, of course, not solve issue 2. The kernel has absolutely no
way to tell whether a timer handle is the right one. Even if the
pointer is in the timer list, it may refer to another timer, which was
created later (as in Rob's example). We could redefine

void NutTimerStopNew(HANDLE *t)
{
   if (*t) {
     ...
     *t = NULL;
   }
}

However, I'd prefer to clearly document this issue and let the
application take care of it. Typically the callback should set the
timer handle to NULL and the woken-up thread should use

if (th)
   NutTimerStop(th);

Note that this follows the general Nut/OS design, where validity
checks are left to the application code.
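
A minimal sketch of this application-side pattern, assuming the callback
receives the address of the application's handle as its argument.
APP_HANDLE, AppTimerStop, AppOnTimeout, AppCancelTimer and AppStopCalls
are hypothetical names; AppTimerStop merely counts calls and stands in
for the real NutTimerStop.

```c
#include <assert.h>
#include <stddef.h>

typedef void *APP_HANDLE;  /* hypothetical stand-in for the timer handle */

static int stop_calls;     /* how often the stand-in stop was called */

/* Stand-in for NutTimerStop; must never see a NULL handle. */
void AppTimerStop(APP_HANDLE h)
{
    assert(h != NULL);
    stop_calls++;
}

int AppStopCalls(void)
{
    return stop_calls;
}

/* One-shot timer callback. The kernel removes the expired timer
   itself, so the application forgets its handle here. */
void AppOnTimeout(APP_HANDLE timer, void *arg)
{
    APP_HANDLE *owner = arg;  /* address of the application's handle */

    (void)timer;
    *owner = NULL;
}

/* Called by the woken-up thread: stop the timer only if it is
   still pending, then forget the handle. */
void AppCancelTimer(APP_HANDLE *owner)
{
    if (*owner) {
        AppTimerStop(*owner);
        *owner = NULL;
    }
}
```

If the callback has already run, the handle is NULL and the guarded
stop does nothing. Otherwise the timer is stopped exactly once.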

As usual, comments are most welcome. Though, I'm a bit under timer 
pressure for version 4.6.

Harald

PragmaLab wrote:
> Hello all,
> 
> we have this weird problem with timer callbacks in NutOS (currently using the
> 4.3.3 beta but the problem is also in 4.2.1).
> 
> A typical sequence causes our main thread, in which we kick the watchdog, to
> stop running. Debugging the code in detail with an ICE50 reveals the
> following scenario:
> 
> - from the main thread a 100 msec NutTimer is created, provided with a
> callback function (NutTimerCreate)
> 
> - while the Idle-thread is executing, the expiration of the timer is noticed
> (NutTimerProcessElapsed) and the callback function is called
>             if (tn->tn_callback)
>             {
>                 (*tn->tn_callback) (tn, (void *) tn->tn_arg);	// call callback function
>             }
> 
> - this callback function does a call to NutPostEvent, and this causes a
> thread switch to the thread that was waiting for the event (menu)
> 
> - so Idle-thread is suspended
> 
> - and the menu thread starts running now
> 
> - menu thread stops the 100 msec NutTimer (which had indeed already expired
> but was not stopped yet)
> 
> - some actions later, the Idle-thread becomes active again (no other thread
> was ready to run)
> 
> - it resumes right after its last action, which was calling the
> callback function (NutTimerProcessElapsed)
>             if ((tn->tn_ticks_left = tn->tn_ticks) == 0)
>             {
>                 NutTimerStop(tn);		// kill timer
>             }
> 
> - this means it will stop the timer of which the callback function was
> called, but this timer was already stopped by the application...
> 
> So NutOS will stop a timer that was already stopped by the application. The
> worst part is that, once the timer has been stopped, it is available
> again when a new timer is asked for. In our case, the main thread does a
> NutSleep(100) forever, and indeed the timer that was stopped is assigned to
> the main thread some moments later. Then, when the Idle-thread is resumed and
> stops 'its' timer, it is in fact killing the NutSleep(100) timer of the
> main thread. So the main thread is never serviced again, so the watchdog is
> not kicked, so.....
> 
> This behaviour is 100% reproducible, and with the trace function of the ICE50
> you see it happen before your eyes, just by examining the trace.
> How come other people do not suffer from this behaviour? Is it a bug
> (a design error, because when killing a timer you cannot tell whether it
> has already been killed), or do we use timers in an unusual/wrong way? The
> application is allowed to kill a timer that it has started, no matter
> whether the timer has expired or is still running, right?
> 
> Thanks,
> 
> best regards,
> 
> Rob van Lieshout

-- 
egnite GmbH
Erinstr. 9
44575 Castrop-Rauxel
Germany

Fon +49 (0)23 05-44 12 56
Fax +49 (0)23 05-44 14 87

http://www.egnite.de/
http://www.ethernut.de/

Handelsregister: Amtsgericht Dortmund HRB 19783
USt-IdNr.: DE 189520047
Geschäftsführung: Harald Kipp