[En-Nut-Discussion] NutThreadRemoveQueue clears runQueue to NULL

Philipp Burch phip at hb9etc.ch
Fri Aug 23 20:42:36 CEST 2013


Hi Harald!

On 08/23/2013 06:14 PM, Harald Kipp wrote:
> [...]
>
>>> I don't think that Nut/OS anticipates the runQueue becoming NULL. That
>>> would mean that no thread is ready to run. That's where the idle thread
>>> jumps in. If all other threads are waiting, the idle thread runs
>>> until another thread becomes ready to run again. This implies that idle
>>> thread callbacks cannot call blocking functions.
>>>
>>
>> Ok, this sounds interesting. But what do you mean by "no thread is ready
>> to run"? I suppose it is very common that there are moments in which all
>> application threads are either waiting for input or for a timeout. Would
>> it then be reasonable for the runQueue to become NULL or not? I can't
>> completely follow you in this paragraph.
>
> Sorry, my explanation was indeed vague. The runQueue contains all
> threads that are ready to run. This queue is always sorted by thread
> priority. The idle thread never calls any blocking function and
> therefore will always be ready to run and will always be kept in the
> runQueue. It does call NutThreadYield(), which is not blocking, but
> will do a context switch if another thread with higher priority becomes
> ready to run.

Reading through your first statement again, I see something suspicious 
in the stack trace taken just before the runQueue is set to NULL: it is 
not an ordinary thread that prints and then blocks, but the callback of 
a timer (started with NutTimerStart()). Somehow I assumed that a timer's 
callback is invoked by the thread that started the timer, but is that 
correct? If the callback is executed in the idle thread's context, that 
would clearly violate the rule about no blocking functions there. It 
would not explain why it works most of the time, however. And the bug 
disappears again as soon as I remove the output there.

The full traceback looks like this:

----------- 8< ------------- 8< --------------

(gdb) bt
#0  NutThreadRemoveQueue (td=0x200016c8, tqpp=0x20000fec) at 
/.../devnut_lm3s/nut/os/thread.c:193
#1  0x0000dff4 in NutEventWait (qhp=0x200002a8, ms=0) at 
/.../devnut_lm3s/nut/os/event.c:307
#2  0x0001569c in UsartFlushOutput (dcb=0x20000278, added=0, left=40) at 
/.../devnut_lm3s/nut/dev/usart.c:336
#3  0x000157dc in UsartPut (dev=0x2000032c, buffer=0x1ef84, len=31, 
pflg=0) at /.../devnut_lm3s/nut/dev/usart.c:436
#4  0x000158e0 in UsartWrite (fp=0x2000237c, buffer=0x1ef84, len=31) at 
/.../devnut_lm3s/nut/dev/usart.c:514
#5  0x0000ee96 in _write (fd=536879996, data=0x1ef84, count=31) at 
/.../devnut_lm3s/nut/crt/write.c:96
#6  0x0000f5ee in fputs (string=0x1ef84 "Galil: Watchdog timer 
expired.\n", stream=0x20002328) at /.../devnut_lm3s/nut/crt/fputs.c:74
#7  0x0000f6b4 in puts (string=0x1ef84 "Galil: Watchdog timer 
expired.\n") at /.../devnut_lm3s/nut/crt/puts.c:60
#8  0x0000486e in watchdog_cb (timer=0x20002b7c, arg=0x0) at galil.c:644
#9  0x0000dbce in NutTimerProcessElapsed () at 
/.../devnut_lm3s/nut/os/timer.c:538
#10 0x0000d6b8 in NutThreadResume () at /.../devnut_lm3s/nut/os/thread.c:231
#11 0x0000d786 in NutThreadYield () at /.../devnut_lm3s/nut/os/thread.c:288
#12 0x00000062 in NutIdle (arg=0x0) at 
/.../devnut_lm3s/nut/os/../arch/cm3/os/nutinit.c:214
#13 0x0000002c in g_pfnVectors ()

----------- 8< ------------- 8< --------------

So the idle thread yields the processor, but gets scheduled again 
immediately and then calls the timer callback? Strange...
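
If the callback really is invoked in idle-thread context, I guess the
safe pattern would be to make the callback itself non-blocking and
defer the printing to a normal thread. A rough sketch (a plain C
stand-in, not real Nut/OS code; `watchdog_cb` here mirrors my callback
name from galil.c, and in real Nut/OS one would presumably post with
something like NutEventPostAsync(), which does not block):

```c
#include <assert.h>
#include <stddef.h>

/* Set by the timer callback, consumed by an ordinary thread. */
static volatile int watchdog_expired;

/* Timer callback: may run in idle-thread context, so it must never
 * block. It only records the event instead of calling puts(). */
static void watchdog_cb(void *timer, void *arg)
{
    (void) timer;
    (void) arg;
    watchdog_expired = 1;
}

/* Polled by a normal application thread, which IS allowed to block
 * in puts() / NutEventWait(). Returns 1 when an expiry was consumed. */
static int watchdog_service(void)
{
    if (watchdog_expired) {
        watchdog_expired = 0;
        /* safe place for: puts("Galil: Watchdog timer expired."); */
        return 1;
    }
    return 0;
}
```

The blocking UART output then always happens on a thread whose removal
from the runQueue still leaves the idle thread behind.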

Oh, while writing this, I tested it again with the print statement 
enabled and it crashed (as usual). But this time the call came from the 
tcpsm thread. So no timer (or at least none of mine) involved this time...

>
> Saying that, I have a new idea, which would explain your problem...
>
> Explanation 1: The idle thread is running at lowest possible priority.
> Typically all other threads are running at higher priorities.
>
> (You can skip this. The confusing part is that this had been 255 in
> early releases, which didn't support thread termination. Then thread
> termination was first implemented by setting the thread's priority to
> 255 and redefining the lowest possible priority for a running thread
> to 254. This was a simple hack and was changed back later. But, if I
> remember correctly, it was not fully reversed, because some applications
> had already been created which killed threads by setting their priority
> to 255. OK, this part explains the magic about thread priorities 254 and
> 255. Btw., everyone is probably aware that lower priority values mean
> higher priorities.)
>
> Explanation 2: If two threads are running at the same priority, each
> direct or indirect call to NutThreadYield() will result in a context
> switch as well. This guarantees that threads are switched, even if they
> are not waiting for any event.
>
> Now combining these two explanations: Normally all other threads run
> at numeric priorities below 254, i.e. at higher priority than the idle
> thread. Therefore the idle thread is _always_ the last entry in the
> runQueue. However, if another thread also runs at priority 254, the
> idle thread may lose its position at the end of the queue. That may
> cause trouble if the other thread blocks. While it is generally
> allowed to run threads at priority 254, those threads are not expected
> to call any blocking function.
>
> Are you running any thread at priority 254? If yes, we found the
> problem. (Otherwise: Grrr... busted again. In that case: Are you
> changing priorities often?)
>

I do not modify any thread priorities, nor do I create or kill any 
threads during the normal lifetime of the application. Everything is set 
up at the beginning and then stays as it is. Priorities range from 9 
(emacrx) through 32 (tcpsm) and 64 (all of my threads) to 254 (the idle 
thread).
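
Just to convince myself what an empty runQueue means mechanically, here
is a toy model of the unlink that NutThreadRemoveQueue() performs
(simplified and hypothetical, not the real os/thread.c code):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal ready-queue model; the real NUTTHREADINFO has more fields. */
typedef struct _THREAD {
    struct _THREAD *td_qnxt;
    unsigned char   td_priority;
} THREAD;

/* Roughly what NutThreadRemoveQueue() does: unlink td from the singly
 * linked queue whose head tqpp points to. */
static void RemoveQueue(THREAD *td, THREAD **tqpp)
{
    while (*tqpp) {
        if (*tqpp == td) {
            *tqpp = td->td_qnxt;
            td->td_qnxt = NULL;
            return;
        }
        tqpp = &(*tqpp)->td_qnxt;
    }
}
```

If the idle thread is ever missing from the queue and the last remaining
ready thread blocks, the head pointer ends up NULL, which is exactly the
state Nut/OS does not anticipate.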

>
>> Would you mind posting a short comment about how the scheduler of Nut/OS
>> works? Or is there even a document about this topic?
>
> There is an old document from 2002. Not sure how much has changed
> since then.
>
> http://www.ethernut.de/pdf/entet100.pdf
>
> A few additional things are explained in
>
> http://www.ethernut.de/pdf/enswm28e.pdf
>
> and
>
> http://www.ethernut.de/pdf/enmem21e.pdf
>
>> Looking at the code, I see the following:
>>
>> The runQueue always points to the thread which is running at the moment
>> (but what is runningThread, then?). If this thread wants to block, a
>
> Not really. The runQueue is a linked list of all threads that are
> ready-to-run, with the highest priority thread on top and, typically,
> the idle thread at the bottom.
>
> The runningThread points to the thread that is _actually_ running.
> Because of the cooperative nature, a higher priority thread may be
> ready-to-run, sitting on top of the runQueue, without being the thread
> that is running. Only when the runningThread gives up control can the
> higher priority thread take over.
>
> Let me know if you need further explanation.

Thanks, that makes it much clearer. I was a bit confused because almost 
everything related to context switching is a pointer of type 
NUTTHREADINFO*, with the special values NULL and -1 (SIGNALED).
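
For anyone else puzzled by the same thing, here is the sentinel idea in
isolation (a simplified stand-in for the event-queue head handling, not
the real os/event.c; SIGNALED is ((void *)-1) in the actual headers, if
I read them correctly):

```c
#include <assert.h>
#include <stddef.h>

/* An event queue head can be:
 *   NULL      - empty queue, no waiter, no pending event
 *   SIGNALED  - no waiter, but an event was already posted
 *   otherwise - pointer to the first waiting thread            */
#define SIGNALED ((void *)-1)

typedef struct _THREAD {
    struct _THREAD *td_qnxt;
} THREAD;

/* Simplified post: if nobody is waiting, latch the event via the
 * sentinel instead of losing it. Returns the thread to wake, if any. */
static THREAD *post(THREAD **qhp)
{
    THREAD *td = *qhp;
    if (td == NULL || td == (THREAD *) SIGNALED) {
        *qhp = (THREAD *) SIGNALED;   /* remember the event */
        return NULL;
    }
    *qhp = td->td_qnxt;               /* wake the first waiter */
    return td;
}
```

So the same pointer doubles as a queue head and as a one-bit "event
pending" flag, which is why those -1 values show up all over the
scheduler.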

