[En-Nut-Discussion] [en-nut-discussion] thread stops executing after some time.
Erik L
erik.lindstein at gmail.com
Sat Mar 29 11:06:04 CET 2008
I don't think the problem is caused by disconnecting the PC.
I connected and disconnected the PC many times within a short period and
the clients always reconnected.
I also have clients that were never connected at all after I powered them up,
and they still show the same problem if I leave them for a couple of hours
before I connect the PC.
I have no route set; the communication is only on the LAN.
(Clients 192.168.0.1-10, server 192.168.0.115.) All IP addresses are static.
/Erik
ernstst wrote:
>
> Hi Erik!
>
> quote
> But when the problem occurs, the software in the client can't get to the
> point where it actually exchanges the data, because the socket shouldn't be
> able to connect.
> " if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "
>
> If the server (rip:TCPSERVERPORT) isn't connected to the LAN, it can't
> return 0, correct?
> And even if it does for some reason, the socket read timeouts should occur,
> and it should manage to get past the data exchange routines and then start
> all over again.
> --------------------------------
> Unquote
>
> I am not sure this is correct (i.e. whether the read timeout triggers on a
> NutTcpConnect which does not go through)
>
> I need to think about it, maybe give it a try. A quick look into TCPSOCK.C
> and the like didn't enlighten me ... Maybe the TCP protocol state in which
> the disconnect occurred somehow influences if the reconnect works.
> Another thought: IP Routing tables. Are they still "there" when the
> reconnect is attempted? Have you defined a route?
>
> Regards
> Ernst
>
>
>
> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de
> [mailto:en-nut-discussion-bounces at egnite.de] On Behalf Of Erik Lindstein
> Sent: Tuesday, 25 March 2008 12:21
> To: en-nut-discussion at egnite.de
> Subject: [En-Nut-Discussion] [en-nut-discussion] thread stops executing
> after some time.
>
> Ernst, thank you very much for taking time to answer.
>
> I'll write some comments below.
>
> I understand this is a sporadic problem so it takes a lot of time to run
> into the "error" situation, but anyway:
>
> 1) ... only 1 out of 4 clients were still connecting ...
> When you experience this situation more than once, is it always the same
> Ethernut which can still connect or is that also "random". And what looks
> "random" in the first place, is it really?
>
> --------------------------------
> Well, I don't think much that involves software is truly
> random :-) so of course this isn't either.
>
> But here it can be any one of the clients that stops executing the
> thread. Usually I can see that after some time (~6-7 h) one or two have
> stopped connecting, and there can be one left running for up to 24 h
> (perhaps longer). But in the end all of them stop trying to connect
> and the thread's sleep time is set to "None".
> --------------------------------
>
>
>
>
> 2) .. (all of them use the same software just different MACs and IPs ) ..
> Even if all of them use the same SW, are they operating under
> same/similar/different conditions? (I mean ".. exchanges some XML data,..":
> where does this data come from, i.e. how is it generated and how different
> can it be between the Ethernuts?)
>
> --------------------------------
> When the server software on the PC is running and the PC is connected,
> the client socket connects and the client sends some values
> read from the A/D and some status variables, and then the PC responds with
> a command that tells the client to do "something".
> Usually the PC just sends a "reset watchdog" command to the client.
>
> But in this case everything works fine as long as the software is
> running and the PC is connected.
> When I then close down the server software, the client gets a command
> that tells it to start resetting the WDT locally.
>
> But when the problem occurs, the software in the client can't get to
> the point where it actually exchanges the data, because the socket
> shouldn't be able to connect.
> " if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "
>
> If the server (rip:TCPSERVERPORT) isn't connected to the LAN, it can't
> return 0, correct?
> And even if it does for some reason, the socket read timeouts should
> occur, and it should manage to get past the data exchange routines and
> then start all over again.
> --------------------------------
>
>
>
> 3) How about buffer overflows due to "special" tx/rx data conditions
> (length)?
> --------------------------------
> In this case it only happens when (at least I think that) I don't read or
> send any data beyond what the tcpsm sends out while trying to
> connect the socket.
> My code doesn't do any rx/tx until the socket and stream are OK.
>
> --------------------------------
>
>
>
> 4) Try looking "into" the TCP sockets. My bank switch test program at
> http://www.es-business.com/Firma/eng/edocs.htm may help. Include cli.c and
> dump.c in your main program and create a thread as indicated in the source.
> It contains a Telnet-based CLI which has a "lists" command which walks
> through and displays all lists known to Nut/OS (TCP sockets is one of these).
> One word of caution: because Nut/OS (i.e. other threads) is executing while
> this command follows the pointers from one list entry to the next, the
> command may loop if a list is updated (by or on behalf of an app thread)
> right when that pointer is being used by the "lists" command itself.
> The dump command may help you peek around in RAM.
>
> --------------------------------
> I'll look into that, thanks.
>
> --------------------------------
>
>
>
> 5) Is there a possibility to have wireshark monitoring the TCP/IP link up
> until the PC gets disconnected? This way, you could find out what was
> exchanged immediately before the disconnect happened and maybe this gives
> more info about the internal status of the Ethernuts and the TCP/IP
> connection states.
>
> --------------------------------
> The PC only gets disconnected when I remove the LAN cable, but I could
> monitor the data until that point.
> But I can disconnect and connect the cable many (unlimited?) times
> and there are no problems. The clients always connect again as long as I
> don't leave the PC disconnected for a longer period of time (> ~5-6 h).
>
> One possibility might be to set up the switch I use to mirror all
> traffic to another port and monitor the traffic there with
> Wireshark. That way I might be able to see what happens before it stops
> working.
> But if I have the PC connected in the "normal" way, the problem doesn't
> occur.
> --------------------------------
>
>
>
> 6) Do you log the state of the TCP/IP connection between the Ethernuts and
> the PC within the PC? Maybe such a log (record length / contents) could
> provide some more info.
>
> --------------------------------
> Because the problem only occurs when the PC isn't connected, this
> is hard to do.
> I can log the traffic when everything works fine, but I am not sure it
> reveals the problem; perhaps someone with more knowledge of TCP/IP can
> see something here.
> --------------------------------
>
>
>
>
> 7) The most important question is:
> Is the problem caused by behaviour in Nut/OS or Nut/Net (IP stack, timers,
> events etc.)
> or
> is the problem caused by some behaviour in the application threads?
> Is there any chance to "strip down" the application threads to try to
> minimize their possible impact on the situation?
>
> --------------------------------
> I minimized the software to only include the thread for the client socket
> and one tcpserver thread. But this still happens. I'll try to remove
> some more code in the client thread and see if it changes anything.
> I did get the feeling that this software took longer before it
> stopped trying to connect, but that's not 100% verified. Anyway, it
> still stops.
> --------------------------------
>
>
> 8) I have an example here of a test app, which produces the following
> threads list:
>
> CLI>threads
> Name     Status  Prio  Stack  Memory  Timeout  INFO-addr  Bank
> CMDLINE  Run     64    891    OK      None     36C9       -1
> XHTST    Sleep   64    357    OK      6        3203       9
> XHTST    Ready   64    357    OK      None     2F3D       8
> XHTST    Sleep   64    357    OK      None     2C77       7
> XHTST    Ready   64    357    OK      None     29B1       6
> XHTST    Sleep   64    357    OK      24       26EB       5
> XHTST    Ready   64    357    OK      None     2425       4
> XHTST    Sleep   64    357    OK      13       215F       3
> XHTST    Ready   64    357    OK      None     1E99       2
> XHTST    Sleep   64    357    OK      6        1BD3       1
> tcpsm    Sleep   32    468    OK      102      1925       -1
> XHTST    Sleep   64    357    OK      None     16C1       0
> rxi5     Sleep   9     603    OK      699      145F       -1
> main     Sleep   200   705    OK      940      1041       -1
> idle     Ready   254   356    OK      None     D21        -1
>
> The XHTST threads are looping apps which work in memory, display info via
> TCP/IP to a telnet client and sometimes sleep for a random time.
> There are threads which are sleeping and do not have a timeout associated
> with them! (Maybe when they are waiting for the telnet output to
> complete?)
>
> --------------------------------
>
> I have no idea, but perhaps when
> " NutTcpConnect(socket, rip, TCPSERVERPORT) " executes, it sets the
> thread's timeout to "None" and then waits for some event from the tcpsm
> that never occurs.
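
Both thread listings in this thread share the same first six columns (name, state, prio, stack, mem, timeout), so a thread that sits in "Sleep" with timeout "None" can be picked out mechanically from a captured listing. The following is only a host-side sketch in plain C under that column-order assumption. The function name sleeps_without_timeout is made up for illustration and is not part of Nut/OS.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns 1 if a thread-list line describes a thread in state "Sleep"
 * with timeout "None", and 0 otherwise (including lines that do not
 * parse, such as a column header).
 * Assumed column order, as in the listings of this thread:
 *   name state prio stack mem timeout [further columns are ignored]
 */
int sleeps_without_timeout(const char *line)
{
    char name[16], state[16], mem[16], timeout[16];
    unsigned prio, stack;

    assert(line != NULL);

    /* Skip an e-mail quote prefix ("> ") if the line was copied from a mail. */
    while (*line == '>' || *line == ' ')
        line++;

    if (sscanf(line, "%15s %15s %u %u %15s %15s",
               name, state, &prio, &stack, mem, timeout) != 6)
        return 0;

    return strcmp(state, "Sleep") == 0 && strcmp(timeout, "None") == 0;
}
```

Fed with the lines of the second listing, this flags only the inetd entry. A "Ready" or "Run" thread with timeout "None" is not flagged, since those threads are not waiting.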
>
> --------------------------------
>
>
> I am quite sure you have thought about some (if not all) of this already,
> but maybe it "kicks" off some more thoughts.
>
> Good luck
> Regards
> Ernst
>
>
> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de
> [mailto:en-nut-discussion-bounces at egnite.de] On Behalf Of Erik
> Lindstein
> Sent: Monday, 24 March 2008 17:13
> To: en-nut-discussion at egnite.de
> Subject: [En-Nut-Discussion] Thread stops executing after some time.
>
> Guys, please help me out.
> I'm on a wild goose chase trying to figure out what is happening with
> a thread that handles communication with a PC through a TCP/IP socket.
>
> The setup is:
> Ethernut V2.1
> Software 4.4.0
>
> The software is built up of a couple of threads, each handling some
> functions (LCD, push buttons, user functions etc.).
> Then I have one thread that communicates with server software on my PC.
> The communication is pretty simple.
> It's a client socket that connects to my server PC, exchanges
> some XML data, disconnects, sleeps for 300 ms and then starts all over
> again.
>
> This works fine for weeks without any problems if I have the server PC
> up and running and connected to the same LAN my Ethernut is connected
> to.
>
> Then one day I disconnected the server PC from the LAN and left a
> couple of the Ethernut clients running over the weekend. On
> Monday I connected my PC again and started up the server software, but
> I noticed that only 1 out of 4 clients was still connecting (all of
> them use the same software, just different MACs and IPs).
>
> I looked at the incoming traffic with Wireshark and could not see any
> sign of life at all from the 3 clients not connecting.
> I tried to ping them and they all answer pings, and all the other
> threads that handle the LCD and push buttons are still up and running,
> so the software is not dead.
> I tried deactivating/activating the network connection on my PC to see
> if any of the clients woke up. No luck.
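
The situation described here (one thread silent while ping and all other threads keep working) is the case a per-thread heartbeat is meant to catch. The following is only a sketch in plain C and not Nut/OS API: all hb_* names and the HB_MAX_THREADS limit are hypothetical. Each worker bumps its own counter once per loop iteration, and a supervisor compares the counters with the values it saw on its previous check.

```c
#include <assert.h>
#include <stdint.h>

#define HB_MAX_THREADS 8   /* hypothetical limit for this sketch */

typedef struct {
    volatile uint32_t beats[HB_MAX_THREADS]; /* bumped by each worker loop */
    uint32_t last_seen[HB_MAX_THREADS];      /* supervisor's last snapshot */
    uint8_t  missed[HB_MAX_THREADS];         /* consecutive checks without progress */
    uint8_t  count;                          /* number of supervised workers */
    uint8_t  max_missed;                     /* checks tolerated before "stalled" */
} hb_monitor_t;

void hb_init(hb_monitor_t *m, uint8_t count, uint8_t max_missed)
{
    uint8_t i;

    assert(m != 0 && count <= HB_MAX_THREADS);
    for (i = 0; i < HB_MAX_THREADS; i++) {
        m->beats[i] = 0;
        m->last_seen[i] = 0;
        m->missed[i] = 0;
    }
    m->count = count;
    m->max_missed = max_missed;
}

/* Called by worker `id` once per loop iteration. */
void hb_beat(hb_monitor_t *m, uint8_t id)
{
    assert(id < m->count);
    m->beats[id]++;
}

/*
 * Called periodically by the supervisor. Returns the index of the first
 * worker that made no progress for max_missed consecutive checks,
 * or -1 if no worker has reached that limit.
 */
int hb_check(hb_monitor_t *m)
{
    uint8_t i;
    int stalled = -1;

    for (i = 0; i < m->count; i++) {
        uint32_t now = m->beats[i];

        if (now != m->last_seen[i]) {
            m->last_seen[i] = now;
            m->missed[i] = 0;
        } else if (m->missed[i] < 255) {
            m->missed[i]++;
        }
        if (m->missed[i] >= m->max_missed && stalled < 0)
            stalled = i;
    }
    return stalled;
}
```

A supervising thread, for example the one that resets the watchdog, could call hb_check() once per period and stop resetting the watchdog when a worker is reported as stalled. Whether a forced reset is acceptable in this application is an assumption, and the check period has to be longer than the slowest legitimate loop iteration of any worker.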
>
> I then added another thread to the software (I took the sample code
> for the tcps in the apps dir and created a thread to run that code).
> When everything is OK, I see the "inetd" thread timeout counting
> all the time and the thread executes as expected.
>
> When the inetd thread stops executing, I can connect to the unit and I
> get the output seen below:
> ----------------------------------------------------------------------------
> 220 List of threads with name,state,prio,stack,mem,timeout follows
> tcpsm   Sleep  32   461   OK  27
> TcpS    Run    64   2546  OK  None
> inetd   Sleep  64   2381  OK  None
> rxi5    Sleep  9    603   OK  1392
> wdt     Sleep  40   255   OK  8
> SmuTh   Sleep  64   65    OK  71
> PcuTh   Sleep  64   805   OK  1
> HvpsTh  Sleep  64   605   OK  24
> IppsTh  Sleep  64   965   OK  4
> TaTh    Sleep  64   65    OK  35
> LcdTh   Sleep  64   929   OK  34
> main    Sleep  64   733   OK  451
> idle    Ready  254  356   OK  None
> ----------------------------------------------------------------------------
>
> For some reason the thread (inetd) just gets a timeout of "None"
> instead of the NutSleep value.
>
> I only have one place in the code that puts the thread to sleep, and I
> have a fixed value of 300 there (NutSleep(300)).
> So there must be somewhere else in the code where the thread gets put into
> some wait state, but I have no idea how to figure out where and why
> this happens.
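
One way to narrow down where a thread got stuck is to record a checkpoint immediately before every call that can block, and to let another thread report the last checkpoint and how long ago it was entered. The following is only a sketch in plain C: the checkpoint names, trace_t and the trace_* functions are hypothetical and not part of Nut/OS, and the timestamp is supplied by the caller.

```c
#include <assert.h>

/* Hypothetical checkpoints for a connect/exchange/sleep client loop. */
typedef enum {
    CP_IDLE = 0,
    CP_CREATE_SOCKET,
    CP_CONNECT,
    CP_SEND,
    CP_RECEIVE,
    CP_CLOSE,
    CP_SLEEP,
    CP_COUNT
} checkpoint_t;

typedef struct {
    volatile checkpoint_t current;   /* step entered most recently */
    volatile unsigned long entered;  /* caller-supplied time of entry, in ms */
} trace_t;

/* Record that the thread is about to execute step `cp`. */
void trace_enter(trace_t *t, checkpoint_t cp, unsigned long now_ms)
{
    assert(t != 0 && cp < CP_COUNT);
    t->current = cp;
    t->entered = now_ms;
}

/* Short printable name for a checkpoint, "?" for unknown values. */
const char *trace_name(checkpoint_t cp)
{
    static const char *const names[CP_COUNT] = {
        "idle", "create", "connect", "send", "receive", "close", "sleep"
    };
    return cp < CP_COUNT ? names[cp] : "?";
}

/*
 * Milliseconds spent in the current step. The unsigned subtraction
 * stays correct across a single wrap of the millisecond counter.
 */
unsigned long trace_age(const trace_t *t, unsigned long now_ms)
{
    assert(t != 0);
    return now_ms - t->entered;
}
```

In the client loop this would mean one call such as trace_enter(&trace, CP_CONNECT, NutGetMillis()) directly before NutTcpConnect(), and similar calls before fprintf_P(), fgets(), NutTcpCloseSocket() and NutSleep(). The tcps thread could then print trace_name() and trace_age() next to the thread list, which would show whether the stalled thread last entered the connect step or some other call.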
>
> Can it be something that happens when the socket tries to connect to an
> IP/server that doesn't exist on the LAN?
>
> If it happened all the time it would be easier to figure out what's
> wrong, but this can run for days without happening.
> Also, if memory were low, the tcps thread wouldn't answer the
> incoming connection attempts, I guess.
>
> The thread code is below:
> ----------------------------------------------------------------------------
> THREAD(InetdThread, arg)
> {
>     TCPSOCKET *socket;
>     FILE *stream = 0;
>     u_long rip = inet_addr("192.168.0.115");
>     u_long tmo = 500;
>     int socket_error = 0;
>     uint8_t *start = 0, *stop = 0;
>     uint8_t unit[20], cmd[40], value[40];
>     uint8_t data_exchange_buffer[100] = "0";
>
>     for(;;)
>     {
>         if ((socket = NutTcpCreateSocket()) != 0)
>         {
>             NutTcpSetSockOpt(socket, SO_RCVTIMEO, &tmo, sizeof(tmo));
>             NutTcpSetSockOpt(socket, SO_SNDTIMEO, &tmo, sizeof(tmo));
>             if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0)
>             {
>                 stream = _fdopen((int) ((uptr_t) socket), "r+b");
>                 if(stream != 0)
>                 {
>                     /* Send some XML DATA */
>                     fprintf_P(stream, info_P, INFO_P_ARGS);
>                     fflush(stream);
>                     /* Get some XML DATA */
>                     fgets(data_exchange_buffer,
>                           sizeof(data_exchange_buffer), stream);
>                     {
>                         // Handle XML data
>                     }
>                     fclose(stream);
>                     /*
>                      * info_text is an extern variable that another
>                      * thread prints on the LCD for debug output.
>                      */
>                     sprintf(info_text, "COK\n%lu",
>                             (u_long)NutGetMillis());
>                 }
>             }
>             else
>             {
>                 socket_error = NutTcpError(socket);
>                 sprintf(info_text, "CE:%d \n%lu",
>                         socket_error, (u_long)NutGetMillis());
>             }
>             NutTcpCloseSocket(socket);
>         }
>         NutSleep(300);
>     }
> }
> ----------------------------------------------------------------------------
>
>
>
> --
> /Erik
> _______________________________________________
> http://lists.egnite.de/mailman/listinfo/en-nut-discussion
--
View this message in context: http://www.nabble.com/-en-nut-discussion--thread-stops-executing-after-some-time.-tp16277335p16368466.html
Sent from the MicroControllers - Ethernut mailing list archive at Nabble.com.