[En-Nut-Discussion] timer problem in NutOS (Bug #2029411)
Harald Kipp
harald.kipp at egnite.de
Sun Jul 27 16:01:34 CEST 2008
Hi all,
Despite general quoting rules on mailing lists I intentionally use TOFU
while referring to a more-than-1-year-old article by Rob.
First of all, I'm surprised to see this serious bug still unsolved after
such a long time. I never tire of pointing out the importance of
the SourceForge bug tracker at
http://sourceforge.net/projects/ethernut/
Be aware that active developers are not able to follow all threads
on this list and that bug reports sent only to the list tend to get lost.
The bug list at SourceForge is checked regularly by several people. Anyway,
many thanks to Rob and others for tracking down this problem.
At first I thought to myself, "Oh no, not another one of these
mind-boggling timer problems". ;-) But actually, Rob's explanation is
easy to follow.
A few workarounds have been posted as well. I'm not sure whether any of
these will solve all related problems _and_ maintain the greatest backward
compatibility without introducing performance or code size issues.
I'd divide the issue into two parts:
1. As Rob pointed out, in real-world applications the callback will
typically restart an application thread, which removes the one-shot
timer. The disaster happens when the idle thread continues to handle
the timeout by removing the timer again.
2. More generally, NutTimerStop may receive an invalid pointer, which may
refer not only to a newly created timer, as in Rob's case, but to any
memory structure.
Issue 1 could be solved by moving the current NutTimerStop code to a new
function, which is used internally only. The application-callable API would
only mark the timer; final removal would be done during idle time only, using
the internal function. Something similar has been suggested by Erik Lindstein.
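To illustrate the mark-then-remove idea, here is a minimal, self-contained
sketch. It is not Nut/OS code: SIMTIMER, SimTimerStart, SimTimerStop,
SimTimerReap and SimTimerCount are hypothetical names invented for this
example, and the list handling is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel timer node; not the real Nut/OS type. */
typedef struct sim_timer {
    struct sim_timer *next;
    unsigned long ticks_left;
    int stop_requested;          /* set by the application-callable stop */
} SIMTIMER;

static SIMTIMER *timer_list;     /* singly linked list of known timers */

/* Create a timer and put it at the head of the list. */
SIMTIMER *SimTimerStart(unsigned long ticks)
{
    SIMTIMER *t = calloc(1, sizeof(*t));

    if (t) {
        t->ticks_left = ticks;
        t->next = timer_list;
        timer_list = t;
    }
    return t;
}

/* Application-callable stop: marks the timer only, releases nothing. */
void SimTimerStop(SIMTIMER *t)
{
    assert(t != NULL);
    t->stop_requested = 1;
}

/* Internal removal, meant to run during idle time only.
   Returns the number of timers unlinked and freed. */
int SimTimerReap(void)
{
    int removed = 0;
    SIMTIMER **pp = &timer_list;

    while (*pp) {
        SIMTIMER *t = *pp;

        if (t->stop_requested) {
            *pp = t->next;       /* unlink before freeing */
            free(t);
            removed++;
        } else {
            pp = &t->next;
        }
    }
    return removed;
}

/* Number of timers currently in the list, marked or not. */
int SimTimerCount(void)
{
    int n = 0;
    SIMTIMER *t;

    for (t = timer_list; t; t = t->next)
        n++;
    return n;
}
```

In this sketch a second stop on the same timer just sets the mark again.
The node is not released before the reap runs, so it cannot be handed out
to a new timer between the two stops.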
This will, of course, not solve issue 2. The kernel has absolutely no
chance to tell whether a timer handle is the right one. Even if the
pointer is in the timer list, it may refer to another timer, which was
created later (as in Rob's example). We may redefine the function as
    void NutTimerStopNew(HANDLE *t)
    {
        if (*t) {
            ...
            *t = NULL;
        }
    }
However, I'd prefer to clearly document this issue and let the
application take care of it. Typically the callback should set the
timer handle to NULL and the woken-up thread should use
    if (th)
        NutTimerStop(th);
Note that this follows the general Nut/OS design, where validity
checks should be done in the application code.
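The application-side pattern described above might look roughly like the
following sketch. It is only an illustration under assumptions: SIMHANDLE,
FakeTimerStop, StopCalls, OnTimeout and WokenThreadCleanup are hypothetical
names, and FakeTimerStop merely counts calls where a real application would
call NutTimerStop. The callback receives a pointer to the handle variable
as its argument.

```c
#include <assert.h>
#include <stddef.h>

typedef void *SIMHANDLE;         /* hypothetical stand-in for a timer handle */

static int stop_calls;           /* how often the stand-in stop was invoked */

/* Stand-in for the kernel's stop function; it only counts calls. */
static void FakeTimerStop(SIMHANDLE t)
{
    (void)t;
    stop_calls++;
}

int StopCalls(void)
{
    return stop_calls;
}

/* Timer callback: the one-shot timer has elapsed, so the handle kept by
   the application must no longer be used. 'arg' points to that handle. */
void OnTimeout(SIMHANDLE timer, void *arg)
{
    SIMHANDLE *handle = arg;

    (void)timer;
    assert(handle != NULL);
    *handle = NULL;
}

/* Called by the woken-up thread: stop the timer only if the handle is
   still set, then clear it so a later call cannot stop it twice. */
void WokenThreadCleanup(SIMHANDLE *handle)
{
    assert(handle != NULL);
    if (*handle) {
        FakeTimerStop(*handle);
        *handle = NULL;
    }
}
```

If the callback has already run, the cleanup finds a NULL handle and does
nothing. If the thread was woken for another reason, the handle is still
set and the timer is stopped exactly once.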
As usual, comments are most welcome. Though, I'm a bit under timer
pressure for version 4.6.
Harald
PragmaLab wrote:
> Hello all,
>
> we have this weird problem with timer callbacks in NutOS (currently using the
> 4.3.3 beta, but the problem is also in 4.2.1).
>
> A typical sequence causes our main thread, in which we kick the watchdog, to
> stop running. Debugging the code in detail with an ICE50 reveals the
> following scenario:
>
> - from the main thread a 100 msec NutTimer is created, provided with a
> callback function (NutTimerCreate)
>
> - while the idle thread is executing, the expiration of the timer is noticed
> (NutTimerProcessElapsed) and the callback function is called
>     if (tn->tn_callback)
>     {
>         (*tn->tn_callback) (tn, (void *) tn->tn_arg); // call callback function
>     }
>
> - this callback function calls NutPostEvent, and this causes a
> thread switch to the thread that was waiting for the event (menu)
>
> - so the idle thread is suspended
>
> - and the menu thread starts running now
>
> - the menu thread stops the 100 msec NutTimer (which had indeed already
> expired but had not been stopped yet)
>
> - some actions later, the idle thread becomes active again (no other thread
> was ready to run)
>
> - it resumes right after its last action, which was calling the
> callback function (NutTimerProcessElapsed)
>     if ((tn->tn_ticks_left = tn->tn_ticks) == 0)
>     {
>         NutTimerStop(tn); // kill timer
>     }
>
> - this means it will stop the timer whose callback function was
> called, but this timer was already stopped by the application...
>
> So NutOS will stop a timer that was already stopped by the application. The
> worst part is that a stopped timer becomes available
> again when a new timer is requested. In our case, the main thread does a
> NutSleep(100) forever and indeed the timer that was stopped is assigned to
> the main thread some moments later. Then when the idle thread is resumed and
> stops 'its' timer, it is in fact killing the NutSleep(100) timer of the
> main thread. So the main thread is never serviced again, so the watchdog is
> not kicked, so.....
>
> This behaviour is 100% reproducible and with the trace function of the ICE50
> you see it happen before your eyes, just by examining the trace.
> How come other people do not suffer from this behaviour? Is it a bug
> (a design error, because while killing a timer you cannot tell that it is
> already killed), or do we use timers in an unusual/wrong way? The application
> is allowed to kill a timer that it has started, no matter whether the timer
> has expired or is still running, right?
>
> Thanks,
>
> best regards,
>
> Rob van Lieshout
>
>
>
>
> --------------------[PragmaLab]--------
> Loonse Molenstraat 23
> 5175 PS Loon op Zand
> info at pragmalab.nl
> www.pragmalab.nl
> telefoon: 0416-362548 of 06-15658737
> ---------------------------------------
>
>
>
--
egnite GmbH
Erinstr. 9
44575 Castrop-Rauxel
Germany
Fon +49 (0)23 05-44 12 56
Fax +49 (0)23 05-44 14 87
http://www.egnite.de/
http://www.ethernut.de/
Handelsregister: Amtsgericht Dortmund HRB 19783
USt-IdNr.: DE 189520047
Geschäftsführung: Harald Kipp