[En-Nut-Discussion] async event problems in 4.1.9 rc7 and rc8

Thu Sep 7 05:38:19 CEST 2006

Hello Bill,

The event system was redesigned in version 4.1.9. One of those changes 
may be causing the issues you are observing, although for most people 
the 4.1.9 changes seem to improve system stability significantly.

Refer to this Changelog entry:

"* os/event.c, os/thread.c, os/timer.c, include/sys/event.h,
include/sys/timer.h, include/sys/thread.h:
Event and timer handling re-design, again. This fixes a bug, which
possibly existed since version 3.9.8 and freezes threads under heavy
load. After several people reported this problem, Michael Jones and
Henrik Maier finally detected the cause and came up with a solution.
However, this fix let interrupt latency times depend on the number
of running threads again and a new solution was implemented, which
not only avoids this problem but further decreases interrupt latencies
by adding an event post counter to the THREADINFO structure. This
counter frees the interrupt routines from dealing with linked lists
and frees the kernel from dealing with linked lists concurrently
modified by interrupts. Furthermore, timeout timers are now released
early. Michael Jones reported, that previous versions suffer from low
memory situations while processing many events within short periods.
The timer list is now double linked to reduce removal time. Internally
timeout condition is now flagged by setting the timer handle to
SIGNALED.
Unfortunately new bugs were introduced with this re-design. Special
thanks to Michael Jones, who located the "exact spot of the crime" and
proved that his final fixes let Nut/OS behave quite well under heavy
traffic storms. This new version will probably help also, if you
experienced long term instability.
Last but not least, the documentation has been updated."
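
To illustrate the post counter idea from the entry above, here is a rough
hosted C sketch. It is not the actual Nut/OS code: the struct, its fields
and both function names are hypothetical and only model the principle that
the interrupt side increments a per-thread counter while the kernel side
later consumes it in thread context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a thread control block.
 * The field names are illustrative, not those of THREADINFO. */
typedef struct sketch_thread {
    const char *name;
    volatile uint8_t post_count; /* pending posts, written by interrupt context */
    uint8_t waiting;             /* nonzero while the thread sleeps on the event */
    uint32_t wakeups;            /* how often the kernel side woke the thread */
} sketch_thread;

/* Interrupt side: record the post by bumping a counter. No list is touched. */
static void sketch_post_from_irq(sketch_thread *t)
{
    assert(t != NULL);
    t->post_count++;
}

/* Kernel side, run in thread context: consume pending posts and wake the
 * thread if it was waiting. On real hardware the read-and-clear below would
 * have to be protected against interrupts; this hosted sketch omits that.
 * Returns 1 if the thread was woken, 0 otherwise. */
static int sketch_process_posts(sketch_thread *t)
{
    assert(t != NULL);
    uint8_t pending = t->post_count;
    t->post_count = 0;
    if (pending != 0 && t->waiting) {
        t->waiting = 0;
        t->wakeups++;
        return 1;
    }
    return 0;
}
```

In this model, several posts arriving before the kernel side runs collapse
into a single wakeup, and the interrupt path stays constant time regardless
of how many threads exist.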

Henrik
http://www.proconx.com

William Baker wrote:
> 
> I have an app that makes heavy use of threads and a timer.  Under both 
> 3.9.8 and 4.0.3 the app is very stable.  I'm upgrading from 3.9.8 
> because it is possible to crash the app with nmap from a remote server 
> -- and I'm adding watchdog features.  I suspected that the problem would 
> magically disappear in 4.1.9.
> 
> I tested both 4.1.9 rc7 and rc8 -- my hardware is the Atmega128 based 
> XNUT100.  The problem appears to be either thread scheduling or event 
> notification.  I depend on NutEventPostAsync being callable from inside 
> SIG_OVERFLOW1.
> 
> The following is a list of the running threads.  The laser0, laser1, and 
> plc0 threads should be in SLP state 99% of the time.  They each wait on 
> events posted by NutEventPostAsync.  On the rc7 and rc8 versions, the 
> threads appear to operate correctly for as long as a minute, but each 
> one slips into RDY state indefinitely.  After a few minutes of 
> operation, usually two of the three threads are stuck in RDY, but 
> sometimes all three are stuck.  Also the Event Queue column indicates 
> that all the stuck threads are waiting on the same event -- which is 
> impossible.  (If I'm not mistaken, it is the same Event Queue id as the 
> httpd1 thread.)
>