[En-Nut-Discussion] Thread stops executing after some time.

Erik Lindstein erik at ledutveckling.com
Wed Mar 26 11:08:46 CET 2008


OT: Is there a guide on how to use this e-mail list?
For one, I'm not smart enough to figure out how to keep my posts in the
same thread.

Well, back to my problem.

Can it be that for some reason the call below never returns:
if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0)

That the event supposed to wake the thread up never occurs?
I don't see any other place in the code that releases the thread.
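
One way to at least notice such a hang from another thread is a heartbeat check: the client thread bumps a counter once per loop, and a supervising thread (for example the one that already resets the WDT) samples that counter at a fixed interval. The sketch below is plain C with no Nut/OS calls. The names stall_monitor, stall_init and stall_check are made up for illustration and are not part of Nut/OS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a Nut/OS API: tracks whether a heartbeat
 * counter owned by another thread has moved since the last check. */
struct stall_monitor {
    uint32_t last_seen;   /* heartbeat value at the previous check */
    uint16_t idle_checks; /* consecutive checks without any change */
    uint16_t limit;       /* idle checks tolerated before "stalled" */
};

static void stall_init(struct stall_monitor *m, uint32_t heartbeat,
                       uint16_t limit)
{
    assert(m != 0 && limit > 0);
    m->last_seen = heartbeat;
    m->idle_checks = 0;
    m->limit = limit;
}

/* Call at a fixed interval with the current heartbeat value.
 * Returns 1 once the counter has been unchanged for 'limit'
 * consecutive checks, 0 otherwise. */
static int stall_check(struct stall_monitor *m, uint32_t heartbeat)
{
    if (heartbeat != m->last_seen) {
        /* The watched thread made progress: start counting again. */
        m->last_seen = heartbeat;
        m->idle_checks = 0;
        return 0;
    }
    if (m->idle_checks < m->limit)
        m->idle_checks++;
    return m->idle_checks >= m->limit;
}
```

With a check every second and a limit of 10, a client thread that sits in one call for more than about 10 s would be flagged. What to do then (log it, stop resetting the watchdog, ...) is a separate decision.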

-------Old text-----------
Ernst, thank you very much for taking time to answer.

I'll write some comments down below.

I understand this is a sporadic problem so it takes a lot of time to run
into the "error" situation, but anyway:

1) ... only 1 out of 4 clients were still connecting ...
When you experience this situation more than once, is it always the same
Ethernut which can still connect or is that also "random". And what looks
"random" in the first place, is it really?

--------------------------------
Well, I don't think there is much involving software that is truly
random :-) so of course this isn't either.

But here it can be any one of the clients that stops executing the
thread. Usually I can see that after some time (~6 - 7h) one or two
have stopped connecting, and there can be one left running for up to 24h
(perhaps longer). But in the end all of them stop trying to connect
and the thread's sleep time is set to "None".
--------------------------------




2) .. (all of them uses the same software just different MACs and IPs ) ..
Even if all of them use the same SW, are they operating under
same/similar/different conditions? (I mean ".. exchanges some XML data,..":
where does this data come from, i.e. how is it generated and how different
can it be between the Ethernuts?)

--------------------------------
When the server software on the PC is running and the PC is connected,
the client socket gets connected and the client sends some values
read from the A/D and some status variables, and then the PC responds with
a command that tells the client to do "something".
Usually the PC just sends a "reset watchdog" command to the client.

But in this case everything works fine as long as the software is
running and the PC is connected.
When I then close down the server software, the client gets a command
that tells it to start resetting the WDT locally.

But when the problem occurs, the software in the client can't get to
the point where it actually exchanges the data, because the socket
shouldn't be able to connect.
" if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "

If the server (rip:TCPSERVERPORT) isn't connected to the LAN, it can't
return 0, correct?
And even if it for some reason does, the socket read timeouts should
occur and it should manage to get past the data exchange routines and
then start all over again.
--------------------------------



3) How about buffer overflows due to "special" tx/rx data conditions
(length)?
--------------------------------
In this case it only happens when (at least I think so) I don't read or
send any data other than what the tcpsm sends out while trying to
connect the socket.
My code doesn't do any rx/tx until the socket and stream are OK.

--------------------------------



4) Try looking "into" the TCP sockets. My bank switch test-program at
http://www.es-business.com/Firma/eng/edocs.htm may help. Include cli.c and
dump.c in your main pgm and create a thread as indicated in the source.
It contains a Telnet based CLI which has a "lists" command which walks thru
and displays all Nut/OS known lists (TCP Sockets is one of these). One word
of caution: Because Nut/OS (i.e. other threads) is executing while this
command follows the pointers in the various lists from one entry to
the next, the command may loop in case a list is updated (by or on behalf
of an app thread) right when this pointer in the list is used by the "lists"
command itself.
The dump command may help you peek around on RAM.
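
The looping risk described above can be limited by putting a cap on the number of hops a diagnostic walk is allowed to make. The following is only a sketch of that idea in plain C. It is not the code from cli.c, and struct node and walk_bounded are made-up names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative singly linked list entry; real Nut/OS lists use their
 * own structures. */
struct node {
    struct node *next;
    int id;
};

/* Follows 'next' pointers for at most max_hops entries.
 * Returns the number of entries visited and sets *truncated to 1 if
 * the walk was cut off before reaching the end of the list, which
 * hints at a cycle or a list that changed under the walker. */
static size_t walk_bounded(const struct node *head, size_t max_hops,
                           int *truncated)
{
    size_t visited = 0;

    assert(truncated != NULL);
    *truncated = 0;
    while (head != NULL) {
        if (visited == max_hops) {
            *truncated = 1;
            break;
        }
        visited++;          /* a real command would print the entry here */
        head = head->next;
    }
    return visited;
}
```

A cap does not make the walk consistent, it only guarantees that the command returns. A truncated result should be read as "try again", not as the real list length.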

--------------------------------
I'll look into that, thanks.

--------------------------------



5) Is there a possibility to have wireshark monitoring the TCP/IP link up
until the PC gets disconnected? This way, you could find out what was
exchanged immediately before the disconnect happened and maybe this gives
more info about the internal status of the Ethernuts and the TCP/IP
connection states.

--------------------------------
The PC only gets disconnected when I remove the LAN cable, but I could
monitor the data until that point.
But I can disconnect and connect the cable many (unlimited?) times
and there are no problems. The clients always connect again if I don't
leave the PC unconnected for a longer period of time (> ~5-6h).

One possibility might be to have the switch I use set up to mirror all
traffic out on another port and monitor the traffic there with
wireshark. That way I might be able to see what happens before it stops
working.
But if I have the PC connected in the "normal" way the problem doesn't occur.
--------------------------------



6) Do you log the state of the TCP/IP connection between the Ethernuts and
the PC within the PC? Maybe such log (record length / contents) could
provide some more info.

--------------------------------
Because the problem only occurs when the PC isn't connected, this
is hard to do.
I can log the traffic when everything works fine. I'm not sure it gives
away the problem, but perhaps someone with more knowledge of TCP/IP can
see something here.
--------------------------------




7) The most important question is:
Is the problem caused by behaviour in Nut/OS or Nut/Net (IP stack, timers,
events etc)
Or
Is the problem caused by some behaviour in the application threads.
Is there any chance to "strip down" the application threads to try to
minimize their possible impact on the situation?

--------------------------------
I minimized the software to only include the thread for the client socket
and one tcpserver thread. But this still happens. I'll try to remove
some more code in the client thread and see if it changes anything.
I did get the feeling that this software took longer before it
stopped trying to connect. But that's not 100% verified. Anyway, it
still stops.
--------------------------------


8) I have an example here of a test app, which produces the following
threads list:

CLI>threads
Name    Status  Prio    Stack   Memory  Timeout  INFO-addr  Bank
CMDLINE Run      64        891  OK      None        36C9     -1
XHTST   Sleep    64        357  OK      6           3203      9
XHTST   Ready    64        357  OK      None        2F3D      8
XHTST   Sleep    64        357  OK      None        2C77      7
XHTST   Ready    64        357  OK      None        29B1      6
XHTST   Sleep    64        357  OK      24          26EB      5
XHTST   Ready    64        357  OK      None        2425      4
XHTST   Sleep    64        357  OK      13          215F      3
XHTST   Ready    64        357  OK      None        1E99      2
XHTST   Sleep    64        357  OK      6           1BD3      1
tcpsm   Sleep    32        468  OK      102         1925     -1
XHTST   Sleep    64        357  OK      None        16C1      0
rxi5    Sleep     9        603  OK      699         145F     -1
main    Sleep   200        705  OK      940         1041     -1
idle    Ready   254        356  OK      None         D21     -1

The XHTST threads are looping apps which work in memory, display info via
TCP/IP to a telnet client and sometimes sleep for a random time.
There are threads which are Sleeping and do not have a Timeout associated
with them! (maybe when they are waiting for the telnet output to complete?)

--------------------------------

I have no idea, but perhaps when
" NutTcpConnect(socket, rip, TCPSERVERPORT) " executes it sets the
thread's timeout to "None" and then waits for some event from the tcpsm that
then never occurs.

--------------------------------


I am quite sure you have thought about some (if not all) of this already,
but maybe it "kicks" off some more thoughts.

Good luck
Regards
Ernst


-----Original Message-----
From: en-nut-discussion-bounces at egnite.de
[mailto:en-nut-discussion-bounces at egnite.de] On behalf of Erik Lindstein
Sent: Monday, 24 March 2008 17:13
To: en-nut-discussion at egnite.de
Subject: [En-Nut-Discussion] Thread stops executing after some time.

Guys please help me out.
I'm on a wild goose chase trying to figure out what is happening with
a thread that handles communications with a PC thru a tcp/ip socket.

The setup is:
Ethernut V2.1
Software 4.4.0

The software is built up of a couple of threads, each handling some
functions (LCD, push buttons, user functions etc.).
Then I have one thread that communicates with server software on my PC.
The communication is pretty simple.
It's a client socket that connects to my server PC and then exchanges
some XML data, disconnects, sleeps for 300ms and then starts all over
again.

This works fine for weeks without any problems if I have the server PC
up and running and connected to the same LAN my Ethernut is connected
to.

Then one day I disconnected the server PC from the LAN and left a
couple of the Ethernut clients running over the weekend. On
Monday I connected my PC again and started up the server software, but
I noticed that only 1 out of 4 clients were still connecting (all of
them use the same software, just different MACs and IPs).

I looked at the incoming traffic with wireshark and could not see any
sign of life at all from the 3 clients not connecting.
I tried to ping them and they all answer pings, and all other
threads that handle the LCD and push buttons are still up and running,
so the software is not dead.
I tried deactivating/activating the network connection on my PC to see
if any of the clients woke up. No luck.

I then added another thread to the software (I took the sample code
for the tcps in the apps dir and created a thread to run that code).
When everything is OK I see the "inetd" thread timeout counting
all the time and the thread executes as expected.

When the inetd thread stops executing I can connect to the unit and I
get the output seen below:
----------------------------------------------------------------------------
220 List of threads with name,state,prio,stack,mem,timeout follows
tcpsm   Sleep   32      461     OK      27
TcpS    Run     64      2546    OK      None
inetd   Sleep   64      2381    OK      None
rxi5    Sleep   9       603     OK      1392
wdt     Sleep   40      255     OK      8
SmuTh   Sleep   64      65      OK      71
PcuTh   Sleep   64      805     OK      1
HvpsTh  Sleep   64      605     OK      24
IppsTh  Sleep   64      965     OK      4
TaTh    Sleep   64      65      OK      35
LcdTh   Sleep   64      929     OK      34
main    Sleep   64      733     OK      451
idle    Ready   254     356     OK      None
----------------------------------------------------------------------------

For some reason the thread (inetd) just gets a timeout set to "None"
instead of the NutSleep value.

I only have one place in the code that puts the thread to sleep, and I
have a fixed value of 300 there (NutSleep(300)).
So there must be somewhere else in the code where the thread gets put into
some wait state, but I have no idea how to figure out where and why
this happens.
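
A low-tech way to narrow down the "where" is to leave a breadcrumb before every call that might block and let another thread display the last breadcrumb. The sketch below is plain C without any Nut/OS calls, so the blocking calls themselves are not shown. The stage names, cur_stage, loop_count, stage_set and stage_report are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stage codes, one per potentially blocking step. */
enum stage {
    ST_IDLE, ST_CREATE, ST_CONNECT, ST_SEND, ST_RECV, ST_CLOSE, ST_SLEEP,
    ST_COUNT
};

static const char *const stage_names[ST_COUNT] = {
    "idle", "create", "connect", "send", "recv", "close", "sleep"
};

/* Written only by the client thread, read by the display thread. */
static volatile uint8_t cur_stage = ST_IDLE;
static volatile uint32_t loop_count = 0;

/* Call immediately before the step named by 's'. Entering the sleep
 * stage is counted as one completed loop iteration. */
static void stage_set(enum stage s)
{
    cur_stage = (uint8_t)s;
    if (s == ST_SLEEP)
        loop_count++;
}

/* Formats "<stage> #<completed loops>" into buf for the display thread.
 * Returns what snprintf returns. */
static int stage_report(char *buf, size_t len)
{
    uint8_t s = cur_stage;

    assert(buf != 0 && len > 0);
    return snprintf(buf, len, "%s #%lu",
                    s < ST_COUNT ? stage_names[s] : "?",
                    (unsigned long)loop_count);
}
```

In the client thread this would mean calling stage_set(ST_CONNECT) right before NutTcpConnect(), stage_set(ST_RECV) right before fgets(), and so on, and printing stage_report() from the thread that already drives the LCD. If the display freezes on "connect" with a loop count that no longer changes, the thread is stuck inside the connect call and not somewhere in the data exchange.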

Can it be something that happens when the socket tries to connect to an
IP/server that doesn't exist on the LAN?

If it happened all the time it would be easier to figure out what's
wrong, but this can run for days without happening.
Also, if memory were low the tcps thread wouldn't answer the
incoming connection attempts, I guess.

The thread code is below:
----------------------------------------------------------------------------
THREAD(InetdThread, arg)
{
    TCPSOCKET *socket;
    FILE *stream = 0;
    u_long rip = inet_addr("192.168.0.115");
    u_long tmo = 500;
    int socket_error = 0;
    uint8_t *start = 0, *stop = 0;
    uint8_t unit[20], cmd[40], value[40];
    uint8_t data_exchange_buffer[100] = "0";

    for(;;)
    {
        if ((socket = NutTcpCreateSocket()) != 0)
        {
            NutTcpSetSockOpt(socket, SO_RCVTIMEO, &tmo, sizeof(tmo));
            NutTcpSetSockOpt(socket, SO_SNDTIMEO, &tmo, sizeof(tmo));
            if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0)
            {
                stream = _fdopen((int) ((uptr_t) socket), "r+b");
                if(stream != 0)
                {
                    fprintf_P(stream, info_P, INFO_P_ARGS); // Send some XML DATA
                    fflush(stream);
                    fgets(data_exchange_buffer, sizeof(data_exchange_buffer), stream); // Get some XML DATA
                    {
                        // Handle XML data
                    }
                    fclose(stream);
                    /*
                        info_text is an extern variable that another thread
                        prints on the LCD for debug output.
                    */
                    sprintf(info_text ,"COK\n%lu", (u_long)NutGetMillis());
                }
            }
            else
            {
                socket_error = NutTcpError(socket);
                sprintf(info_text ,"CE:%d \n%lu", socket_error, (u_long)NutGetMillis());
            }
            NutTcpCloseSocket(socket);
        }
        NutSleep(300);
    }
}
----------------------------------------------------------------------------


