[En-Nut-Discussion] Race condition and stack overflow in tcpsm.c

Philipp Burch phip at hb9etc.ch
Wed Nov 19 21:30:20 CET 2014


Hi Harald!

On 19.11.2014 16:02, Harald Kipp wrote:
> Hi Philipp,
> 
> On 14.11.2014 18:04, Philipp Burch wrote:
>> Me again. More debugging showed that:
>>
>> 1. 388 bytes for tcpsm is still too little; I managed to make it use up
>> to 664 bytes and hence increased the stack space to 1024 bytes.
>> 2. The DHCP thread overflowed too; I had to set its stack to 1024 bytes
>> as well.
> 
> Not sure if you are aware of
> 
> http://lists.egnite.de/pipermail/en-nut-discussion/2009-April/024939.html
> 
> Also the software manual
> 
> http://www.ethernut.de/pdf/enswm28e.pdf
> 
> gives some hints for debugging (page 46ff). It usually lets you
> determine the required stack size.

I used exactly the mentioned NutThreadStackAvailable() function to check
how much stack space has ever been used. This doesn't always work,
though: when a stack overflows, the MCU may crash almost immediately,
so I don't get the opportunity to call that function. This is why I
extended Ole's Cortex debug facilities in r5891 to print a thread
summary in the exception handler, including information about
corrupted stacks.
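
For readers unfamiliar with this kind of check, here is a minimal,
host-runnable sketch of the general watermark idea: fill the stack with
a known pattern when the thread is created, then count how many words
at the low end still hold it. This is not the Nut/OS source or the
r5891 code. The fill value, all function names and the simulated
1024-byte stack are assumptions made for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL  0xDEADBEEFu  /* assumed fill pattern */
#define STACK_WORDS 256          /* 256 * 4 = 1024 bytes, as in the thread */

/* Fill a fresh stack with the pattern (done once, at thread creation). */
static void stack_prepare(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_FILL;
}

/* Count the bytes that were never written. The stack is assumed to grow
 * downwards, so untouched words sit at the lowest addresses. */
static size_t stack_untouched_bytes(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL)
        n++;
    return n * sizeof(uint32_t);
}

/* If not even the lowest word holds the pattern any more, the thread has
 * reached (and possibly passed) the end of its stack. */
static int stack_is_suspect(const uint32_t *stack, size_t words)
{
    return stack_untouched_bytes(stack, words) == 0;
}

/* Hypothetical demo: simulate a thread that dirtied the top 664 bytes. */
size_t demo_used_bytes(void)
{
    static uint32_t stack[STACK_WORDS];

    stack_prepare(stack, STACK_WORDS);
    for (size_t i = STACK_WORDS - 664 / sizeof(uint32_t); i < STACK_WORDS; i++)
        stack[i] = 0;

    if (stack_is_suspect(stack, STACK_WORDS))
        return sizeof(stack);  /* report as fully used */
    return sizeof(stack) - stack_untouched_bytes(stack, STACK_WORDS);
}
```

A check like stack_is_suspect() only reads memory, so it can still be
run from an exception handler after the crash. The watermark is a lower
bound, though: a function that reserves a large local buffer without
writing to it can step over pattern words and leave them intact.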

> 
> 
>> With these changes, I still managed to crash the system after some time,
>> but it looks like a different problem this time, as all stacks should
>> still be intact.
> 
> In the first place you need to find out, where it crashes, then why it
> does. Yeah, I know, simply said but difficult to do. With ARM7 I was
> quite successful with stack back-tracing. I do not have much experience
> with Cortex, though.
> 
> 
>> My testing is as follows:
>> - Copy a file of some Megabytes to the SD card using FTP.
>> - Send 2'000'000 flooding pings to the board with a preload of 10.
>> - Issue 100'000 commands over three Telnet connections in parallel.
> 
> And, as long as you do this separately, all went well? It's quite useful
> to check the system's reliability by stressing it this way, but it is
> probably not useful for tracking problems.
> 
> While ping flooding and Telnet communication are tested often, not many
> applications use FTP and FAT on SD-cards. Thus, it is more likely that
> these parts fail.
> 

I haven't investigated those problems any further yet; maybe I'll do
more testing this weekend.

> 
>> Always increasing the stacks doesn't seem like an appropriate solution
>> however, I'd rather like to find out why the whole thing needs so much
>> more space than on other platforms. Does anyone have an idea?
> 
> AFAIK, Nut/OS doesn't use recursive calls. So, stack requirements should
> be deterministic and cannot grow beyond a specific limit.
> 

That's what I'd expect. It's also the reason why I can't really
understand why I need so much more stack space than on other
architectures (unless it doesn't suffice there either, but I don't
believe that).
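
To make the "deterministic" argument concrete: without recursion the
call graph is acyclic, so the worst case is simply the deepest path
through it, summing the frame sizes along the way. The sketch below
shows that calculation on a made-up four-function table. The frame
sizes and the call graph are purely hypothetical and are not measured
from Nut/OS.

```c
#include <stddef.h>

#define MAX_CALLEES 4

struct func {
    size_t frame_bytes;        /* stack used by the function's own frame */
    int callees[MAX_CALLEES];  /* indices into the table, -1 terminates */
};

/* Worst-case stack depth when entering function f. This terminates only
 * because the call graph is assumed to contain no cycles. */
static size_t worst_case_depth(const struct func *tab, int f)
{
    size_t deepest = 0;

    for (int i = 0; i < MAX_CALLEES && tab[f].callees[i] >= 0; i++) {
        size_t d = worst_case_depth(tab, tab[f].callees[i]);
        if (d > deepest)
            deepest = d;
    }
    return tab[f].frame_bytes + deepest;
}

/* Hypothetical call graph: 0 calls 1 and 2, 1 calls 3. */
size_t example_worst_case(void)
{
    static const struct func tab[] = {
        {  64, { 1,  2, -1, -1 } },
        { 128, { 3, -1, -1, -1 } },
        {  32, { -1, -1, -1, -1 } },
        { 200, { -1, -1, -1, -1 } },
    };
    return worst_case_depth(tab, 0);  /* deepest path is 0 -> 1 -> 3 */
}
```

Such a figure covers only the thread's own call chain. Anything that
interrupt or exception handling pushes onto the same stack comes on
top of it, and frame sizes depend on compiler and architecture.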

> Anyway, keep me informed about any progress.

Sure :)

Regards,
Philipp

