[En-Nut-Discussion] Thread stops executing after some time.
Ernst Stippl
ernst at stippl.org
Mon Mar 24 22:19:29 CET 2008
Hi Erik!
I understand this is a sporadic problem so it takes a lot of time to run
into the "error" situation, but anyway:
1) ... only 1 out of 4 clients were still connecting ...
If you see this situation more than once, is it always the same Ethernut
that can still connect, or is that also "random"? And is what looks
"random" at first really random?
2) .. (all of them use the same software just different MACs and IPs ) ..
Even if all of them use the same SW, are they operating under the same,
similar or different conditions? (I mean ".. exchanges some XML data,..":
where does this data come from, i.e. how is it generated and how different
can it be between the Ethernuts?)
3) How about buffer overflows due to "special" tx/rx data conditions
(length)?
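To illustrate point 3: a minimal sketch, in plain C and not specific to
Nut/OS, of reading a reply line with an explicit truncation check, so that a
reply longer than the buffer is detected instead of being handled as if it
were complete. The helper name read_line_checked and its return codes are
made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Nut/OS or the original application.
 * Reads one line into buf and strips the trailing newline.
 * Returns  1 if the whole line fit into buf,
 *          0 if the line was too long (the rest of it is discarded),
 *         -1 on EOF or read error before any character was read. */
int read_line_checked(FILE *stream, char *buf, size_t size)
{
    size_t len;
    int c;

    if (size < 2 || fgets(buf, (int)size, stream) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';    /* complete line, strip the newline */
        return 1;
    }

    /* No newline was stored: look at what follows in the stream. */
    c = fgetc(stream);
    if (c == EOF || c == '\n')
        return 1;               /* content fit exactly, or stream ended */

    /* More data belonged to this line: discard it, report truncation. */
    while ((c = fgetc(stream)) != EOF && c != '\n')
        ;
    return 0;
}
```

On a socket stream the extra read after a full buffer would block until more
data arrives or the receive timeout expires, so this is only a starting
point for checking the length assumptions.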
4) Try looking "into" the TCP sockets. My bank switch test program at
http://www.es-business.com/Firma/eng/edocs.htm may help. Include cli.c and
dump.c in your main program and create a thread as indicated in the source.
It contains a Telnet-based CLI with a "lists" command which walks through
and displays all lists known to Nut/OS (the TCP sockets list is one of
these). One word of caution: because Nut/OS (i.e. the other threads) keeps
executing while this command follows the pointers from one list entry to
the next, the command may loop if a list is updated (by, or on behalf of,
an app thread) at the very moment the "lists" command is using that
pointer.
The dump command may help you peek around in RAM.
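To illustrate the caution about the "lists" command: a hop limit is one way
to keep a list walk from looping forever when the list changes underneath
it. The sketch below is plain C and is not taken from cli.c; struct node,
walk_bounded and the limit value are invented for this example.

```c
#include <stddef.h>

/* Hypothetical list node. The real Nut/OS lists (e.g. the TCP socket
 * list) have their own types and link fields. */
struct node {
    struct node *next;
    int id;
};

/* Walk at most max_hops entries, calling visit (if not NULL) for each.
 * Returns the number of entries visited, or -1 if the hop limit was
 * reached before the end of the list, which suggests that the list
 * changed during the walk or contains a cycle. */
int walk_bounded(const struct node *head, int max_hops,
                 void (*visit)(const struct node *))
{
    const struct node *np;
    int hops = 0;

    for (np = head; np != NULL; np = np->next) {
        if (hops >= max_hops)
            return -1;
        if (visit != NULL)
            visit(np);
        hops++;
    }
    return hops;
}
```

This only bounds the walk. It does not make it safe against an entry being
freed while the walk is standing on it.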
5) Is it possible to have Wireshark monitor the TCP/IP link up until the PC
gets disconnected? This way, you could find out what was exchanged
immediately before the disconnect happened, and maybe this gives more info
about the internal status of the Ethernuts and the TCP/IP connection
states.
6) Do you log the state of the TCP/IP connection between the Ethernuts and
the PC on the PC side? Maybe such a log (record length / contents) could
provide some more info.
7) The most important question is:
Is the problem caused by behaviour in Nut/OS or Nut/Net (IP stack, timers,
events, etc.),
or
is the problem caused by some behaviour in the application threads?
Is there any chance to "strip down" the application threads to minimize
their possible impact on the situation?
8) I have an example here of a test app, which produces the following
threads list:
CLI>threads
Name     Status  Prio  Stack  Memory  Timeout  INFO-addr  Bank
CMDLINE  Run       64    891  OK      None     36C9         -1
XHTST    Sleep     64    357  OK      6        3203          9
XHTST    Ready     64    357  OK      None     2F3D          8
XHTST    Sleep     64    357  OK      None     2C77          7
XHTST    Ready     64    357  OK      None     29B1          6
XHTST    Sleep     64    357  OK      24       26EB          5
XHTST    Ready     64    357  OK      None     2425          4
XHTST    Sleep     64    357  OK      13       215F          3
XHTST    Ready     64    357  OK      None     1E99          2
XHTST    Sleep     64    357  OK      6        1BD3          1
tcpsm    Sleep     32    468  OK      102      1925         -1
XHTST    Sleep     64    357  OK      None     16C1          0
rxi5     Sleep      9    603  OK      699      145F         -1
main     Sleep    200    705  OK      940      1041         -1
idle     Ready    254    356  OK      None     D21          -1
The XHTST threads are looping apps that work in memory, display info via
TCP/IP to a Telnet client and sometimes sleep for a random time.
Note that there are threads which are in state Sleep and do not have a
Timeout associated with them! (Maybe they are waiting for the Telnet output
to complete?)
I am quite sure you have thought about some (if not all) of this already,
but maybe it "kicks" off some more thoughts.
Good luck
Regards
Ernst
-----Original Message-----
From: en-nut-discussion-bounces at egnite.de
[mailto:en-nut-discussion-bounces at egnite.de] On Behalf Of Erik Lindstein
Sent: Monday, 24 March 2008 17:13
To: en-nut-discussion at egnite.de
Subject: [En-Nut-Discussion] Thread stops executing after some time.
Guys please help me out.
I'm on a wild goose chase trying to figure out what is happening with
a thread that handles communications with a PC thru a tcp/ip socket.
The setup is:
Ethernut V2.1
Software 4.4.0
The software is built from a couple of threads, each handling some
functions (LCD, push buttons, user functions etc.).
Then I have one thread that communicates with a server software on my PC.
The communication is pretty simple.
It's a client socket that connects to my server PC and then exchanges
some XML data, disconnects, sleeps for 300 ms and then starts all over
again.
This works fine for weeks without any problems if I have the server PC
up and running and connected to the same LAN my Ethernut is connected
to.
Then one day I disconnected the server PC from the LAN and left a
couple of the Ethernut clients running over the weekend. On Monday I
connected my PC again and started up the server software, but I noticed
that only 1 out of 4 clients were still connecting (all of them use the
same software, just different MACs and IPs).
I looked at the incoming traffic with Wireshark and could not see any
sign of life at all from the 3 clients not connecting.
I tried to ping them and they all answer pings, and all other threads
that handle the LCD and push buttons are still up and running, so the
software is not dead.
I tried deactivating and reactivating the network connection on my PC to
see if any of the clients woke up. No luck.
I then added another thread to the software (I took the sample code
for tcps in the apps dir and created a thread to run that code).
When everything is OK, I see the "inetd" thread timeout counting
all the time and the thread executes as expected.
When the inetd thread stops executing, I can connect to the unit and I
get the output seen below:
----------------------------------------------------------------------------
220 List of threads with name,state,prio,stack,mem,timeout follows
tcpsm   Sleep   32   461  OK  27
TcpS    Run     64  2546  OK  None
inetd   Sleep   64  2381  OK  None
rxi5    Sleep    9   603  OK  1392
wdt     Sleep   40   255  OK  8
SmuTh   Sleep   64    65  OK  71
PcuTh   Sleep   64   805  OK  1
HvpsTh  Sleep   64   605  OK  24
IppsTh  Sleep   64   965  OK  4
TaTh    Sleep   64    65  OK  35
LcdTh   Sleep   64   929  OK  34
main    Sleep   64   733  OK  451
idle    Ready  254   356  OK  None
----------------------------------------------------------------------------
For some reason the inetd thread ends up with a timeout of "None"
instead of the NutSleep value.
I only have one place in the code that puts the thread to sleep, and I
use a fixed value of 300 there (NutSleep(300)).
So somewhere else in the code the thread must get put into some wait
state, but I have no idea how to figure out where and why this happens.
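One way to narrow down where a thread is parked is to record a checkpoint
before each potentially blocking call and print it from another thread. The
following is a minimal sketch in plain C. The checkpoint names, the
variables and the helper functions are all hypothetical and not part of
Nut/OS.

```c
#include <stdint.h>

/* Hypothetical checkpoint IDs for the stages of a client loop. */
enum checkpoint {
    CP_IDLE = 0,
    CP_CREATE,
    CP_CONNECT,
    CP_SEND,
    CP_RECEIVE,
    CP_CLOSE,
    CP_SLEEP
};

/* Written by the communication thread, read by a debug thread. */
static volatile uint8_t comm_checkpoint = CP_IDLE;
static volatile uint32_t comm_loops = 0;

/* Call right before each stage; reaching CP_SLEEP counts one full loop. */
void checkpoint_set(enum checkpoint cp)
{
    comm_checkpoint = (uint8_t)cp;
    if (cp == CP_SLEEP)
        comm_loops++;
}

/* Printable name of a checkpoint for a debug listing. */
const char *checkpoint_name(uint8_t cp)
{
    static const char *const names[] = {
        "idle", "create", "connect", "send", "receive", "close", "sleep"
    };
    return cp <= CP_SLEEP ? names[cp] : "unknown";
}

/* Returns 1 if no loop was completed since the previous call with the
 * same last_seen variable, else 0. Updates *last_seen. */
int comm_stalled(uint32_t *last_seen)
{
    uint32_t now = comm_loops;
    int stalled = (now == *last_seen);

    *last_seen = now;
    return stalled;
}
```

In the thread this would mean, for example, calling
checkpoint_set(CP_CONNECT) just before NutTcpConnect() and
checkpoint_set(CP_SLEEP) just before NutSleep(300), and printing
checkpoint_name(comm_checkpoint) together with the thread list.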
Can it be something that happens when the socket tries to connect to an
IP/server that doesn't exist on the LAN?
If it happened all the time it would be easier to figure out what's
wrong, but this can run for days without happening.
Also, if memory were low, the tcps thread wouldn't answer the
incoming connection attempts, I guess.
The thread code is below:
----------------------------------------------------------------------------
THREAD(InetdThread, arg)
{
    TCPSOCKET *socket;
    FILE *stream = 0;
    u_long rip = inet_addr("192.168.0.115");
    u_long tmo = 500;
    int socket_error = 0;
    uint8_t *start = 0, *stop = 0;
    uint8_t unit[20], cmd[40], value[40];
    uint8_t data_exchange_buffer[100] = "0";

    for(;;)
    {
        if ((socket = NutTcpCreateSocket()) != 0)
        {
            NutTcpSetSockOpt(socket, SO_RCVTIMEO, &tmo, sizeof(tmo));
            NutTcpSetSockOpt(socket, SO_SNDTIMEO, &tmo, sizeof(tmo));
            if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0)
            {
                stream = _fdopen((int) ((uptr_t) socket), "r+b");
                if(stream != 0)
                {
                    fprintf_P(stream, info_P, INFO_P_ARGS); // Send some XML DATA
                    fflush(stream);
                    fgets(data_exchange_buffer,
                          sizeof(data_exchange_buffer),
                          stream); // Get some XML DATA
                    {
                        // Handle XML data
                    }
                    fclose(stream);
                    /*
                       info_text is a extern variable that another
                       thread prints on the LCD for debug output.
                    */
                    sprintf(info_text ,"COK\n%lu", (u_long)NutGetMillis());
                }
            }
            else
            {
                socket_error = NutTcpError(socket);
                sprintf(info_text ,"CE:%d \n%lu",
                        socket_error, (u_long)NutGetMillis());
            }
            NutTcpCloseSocket(socket);
        }
        NutSleep(300);
    }
}
----------------------------------------------------------------------------
--
/Erik