[En-Nut-Discussion] [en-nut-discussion] thread stops executing after some time.

Erik L erik.lindstein at gmail.com
Sat Mar 29 11:06:04 CET 2008


I don't think the problem is created when I disconnect the PC.
I connected and disconnected the PC lots of times in a short period of time
and the clients still always connect.
I also have clients that haven't been connected at all after I powered them
up, and they still get the same problem if I leave them for a couple of hours
before I connect the PC.

I have no route set; the communication is just on the LAN
(clients 192.168.0.1-10, server 192.168.0.115). All IP addresses are static.

/Erik



ernstst wrote:
> 
> Hi Erik!
> 
> quote
> But when the problem occurs the software in the client can't get to the
> point where it actually exchanges the data because the socket shouldn't be
> able to connect.
> " if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "
> 
> If the server (rip:TCPSERVERPORT) isn't connected to the LAN it can't
> return 0, correct?
> And even if it for some reason does, the socket read timeouts should occur
> and it should manage to get past the data exchange routines and then start
> all over again.
> --------------------------------
> Unquote
> 
> I am not sure this is correct (i.e. whether the read timeout triggers on a
> NutTcpConnect which does not go through).
> 
> I need to think about it, maybe give it a try. A quick look into TCPSOCK.C
> and the like didn't enlighten me ... Maybe the TCP protocol state in which
> the disconnect occurred somehow influences if the reconnect works.
> Another thought: IP Routing tables. Are they still "there" when the
> reconnect is attempted? Have you defined a route?
> 
> Regards
> Ernst
> 
> 
> 
> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de
> [mailto:en-nut-discussion-bounces at egnite.de] On behalf of Erik Lindstein
> Sent: Tuesday, 25 March 2008 12:21
> To: en-nut-discussion at egnite.de
> Subject: [En-Nut-Discussion] [en-nut-discussion] thread stops executing
> after some time.
> 
> Ernst, thank you very much for taking time to answer.
> 
> I'll write some comments down below.
> 
> I understand this is a sporadic problem so it takes a lot of time to run
> into the "error" situation, but anyway:
> 
> 1) ... only 1 out of 4 clients was still connecting ...
> When you experience this situation more than once, is it always the same
> Ethernut which can still connect or is that also "random". And what looks
> "random" in the first place, is it really?
> 
> --------------------------------
> Well, I don't think there is much involving software that is truly
> random :-) so of course this isn't either.
> 
> But here it can be any one of the clients that stops executing the
> thread. Usually I can see that after some time (~6 - 7h) one or two
> have stopped connecting, and there can be one left running for up to 24h
> (perhaps longer). But in the end all of them stop trying to connect
> and the thread's timeout is set to "None".
> --------------------------------
> 
> 
> 
> 
> 2) .. (all of them use the same software just different MACs and IPs ) ..
> Even if all of them use the same SW, are they operating under
> same/similar/different conditions? (I mean ".. exchanges some XML data,..":
> where does this data come from, i.e. how is it generated and how different
> can it be between the Ethernuts?)
> 
> --------------------------------
> When the server software on the PC is running and the PC is connected,
> the client socket gets connected and the client sends some values
> read from the A/D and some status variables, and then the PC responds with
> a command that tells the client to do "something".
> Usually the PC just sends a "reset watchdog" command to the client.
> 
> But in this case everything works fine as long as the software is
> running and the PC is connected.
> When I then close down the server software, the client gets a command
> that tells it to start resetting the WDT locally.
> 
> But when the problem occurs the software in the client can't get to
> the point where it actually exchanges the data because the socket
> shouldn't be able to connect.
> " if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "
> 
> If the server (rip:TCPSERVERPORT) isn't connected to the LAN it can't
> return 0, correct?
> And even if it for some reason does, the socket read timeouts should
> occur and it should manage to get past the data exchange routines and
> then start all over again.
> --------------------------------
> 
> 
> 
> 3) How about buffer overflows due to "special" tx/rx data conditions
> (length)?
> --------------------------------
> In this case it only happens when (at least I think so) I don't read or
> send any data other than what the tcpsm sends out while trying to
> connect the socket.
> My code doesn't do any rx/tx until the socket and stream are OK.
> 
> --------------------------------
> 
> 
> 
> 4) Try looking "into" the TCP sockets. My bank switch test program at
> http://www.es-business.com/Firma/eng/edocs.htm may help. Include cli.c and
> dump.c in your main program and create a thread as indicated in the source.
> It contains a Telnet-based CLI which has a "lists" command that walks
> through and displays all lists known to Nut/OS (TCP sockets is one of
> these). One word of caution: because Nut/OS (i.e. other threads) is
> executing while this command follows the pointers in the various lists from
> one entry to the next, the command may loop if a list is updated (by or on
> behalf of an app thread) right when that pointer is being used by the
> "lists" command itself.
> The dump command may help you peek around in RAM.
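
The caution about the "lists" command looping can be illustrated with a bounded walk. This is only a minimal sketch in plain C that runs on a PC; struct node, walk_bounded and the step limit are hypothetical names made up for illustration and are not part of Nut/OS or of the cli.c mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list entry; the real Nut/OS lists use their own structures. */
struct node {
    struct node *next;
    int id;
};

/*
 * Walk a singly linked list, visiting at most max_steps entries.
 * Returns the number of entries visited, or -1 if the limit was reached
 * before the end of the list (for example because the list was relinked
 * into a loop while it was being walked).
 */
static int walk_bounded(const struct node *head, int max_steps,
                        void (*visit)(const struct node *))
{
    const struct node *p;
    int steps = 0;

    assert(max_steps > 0);
    for (p = head; p != NULL; p = p->next) {
        if (steps == max_steps)
            return -1;          /* give up instead of looping forever */
        if (visit != NULL)
            visit(p);
        steps++;
    }
    return steps;
}
```

With a limit somewhat above the largest list you expect, a -1 result tells you the walk was cut short rather than leaving the CLI thread spinning.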
> 
> --------------------------------
> I'll look into that, thanks.
> 
> --------------------------------
> 
> 
> 
> 5) Is there a possibility to have wireshark monitoring the TCP/IP link up
> until the PC gets disconnected? This way, you could find out what was
> exchanged immediately before the disconnect happened and maybe this gives
> more info about the internal status of the Ethernuts and the TCP/IP
> connection states.
> 
> --------------------------------
> The PC only gets disconnected when I remove the LAN cable, but I could
> monitor the data until that point.
> But I can disconnect and connect the cable many (unlimited?) times
> and there are no problems. The clients always connect again if I don't
> leave the PC disconnected for a longer period of time (> ~5-6h).
> 
> One possibility might be to have the switch I use set up to mirror all
> traffic to another port and monitor the traffic there with
> Wireshark. That way I might be able to see what happens before it stops
> working.
> But if I have the PC connected in the "normal" way the problem doesn't
> occur.
> --------------------------------
> 
> 
> 
> 6) Do you log the state of the TCP/IP connection between the Ethernuts and
> the PC within the PC? Maybe such log (record length / contents) could
> provide some more info.
> 
> --------------------------------
> Because the problem only occurs when the PC isn't connected, this
> is hard to do.
> I can log the traffic when everything works fine, but I'm not sure it gives
> away the problem. Perhaps someone with more knowledge of TCP/IP can
> see something there.
> --------------------------------
> 
> 
> 
> 
> 7) The most important question is:
> Is the problem caused by behaviour in Nut/OS or Nut/Net (IP stack, timers,
> events etc.)
> or
> is the problem caused by some behaviour in the application threads?
> Is there any chance to "strip down" the application threads to try to
> minimize their possible impact on the situation?
> 
> --------------------------------
> I minimized the software to only include the thread for the client socket
> and one tcpserver thread. But this still happens. I'll try to remove
> some more code in the client thread and see if it changes anything.
> I did get the feeling that this software took longer before it
> stopped trying to connect, but that's not 100% verified. Anyway, it still
> stops.
> --------------------------------
> 
> 
> 8) I have an example here of a test app, which produces the following
> threads list:
> 
> CLI>threads
> Name    Status  Prio    Stack   Memory  Timeout  INFO-addr  Bank
> CMDLINE Run      64        891  OK      None        36C9     -1
> XHTST   Sleep    64        357  OK      6           3203      9
> XHTST   Ready    64        357  OK      None        2F3D      8
> XHTST   Sleep    64        357  OK      None        2C77      7
> XHTST   Ready    64        357  OK      None        29B1      6
> XHTST   Sleep    64        357  OK      24          26EB      5
> XHTST   Ready    64        357  OK      None        2425      4
> XHTST   Sleep    64        357  OK      13          215F      3
> XHTST   Ready    64        357  OK      None        1E99      2
> XHTST   Sleep    64        357  OK      6           1BD3      1
> tcpsm   Sleep    32        468  OK      102         1925     -1
> XHTST   Sleep    64        357  OK      None        16C1      0
> rxi5    Sleep     9        603  OK      699         145F     -1
> main    Sleep   200        705  OK      940         1041     -1
> idle    Ready   254        356  OK      None         D21     -1
> 
> The XHTST threads are looping apps which work in memory, display info via
> TCP/IP to a telnet client and sometimes sleep for a random time.
> There are threads which are sleeping and do not have a timeout associated
> with them! (Maybe when they are waiting for the telnet output to complete?)
> 
> --------------------------------
> 
> I have no idea, but perhaps when
> " NutTcpConnect(socket, rip, TCPSERVERPORT) " executes it sets the
> thread's timeout to "None" and then waits for some event from the tcpsm
> that then never occurs.
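
If a thread really can end up waiting forever inside a call, another thread can at least detect that with a heartbeat counter. The following is only a sketch in plain C under that assumption; struct heartbeat, hb_init, hb_beat and hb_check are hypothetical names and not Nut/OS API. The client thread would call hb_beat() once per loop pass, and some other periodic thread would call hb_check().

```c
#include <assert.h>

/* Hypothetical stall detector shared between two threads. */
struct heartbeat {
    unsigned long beats;      /* incremented by the monitored thread */
    unsigned long last_seen;  /* value of beats at the previous check */
    unsigned int missed;      /* consecutive checks without progress */
    unsigned int limit;       /* checks allowed before reporting a stall */
};

static void hb_init(struct heartbeat *hb, unsigned int limit)
{
    assert(limit > 0);
    hb->beats = 0;
    hb->last_seen = 0;
    hb->missed = 0;
    hb->limit = limit;
}

/* Called by the monitored thread once per loop pass. */
static void hb_beat(struct heartbeat *hb)
{
    hb->beats++;
}

/*
 * Called periodically by the supervising thread.
 * Returns 1 once the counter has not moved for 'limit' checks in a row,
 * otherwise 0.
 */
static int hb_check(struct heartbeat *hb)
{
    if (hb->beats != hb->last_seen) {
        hb->last_seen = hb->beats;
        hb->missed = 0;
        return 0;
    }
    if (hb->missed < hb->limit)
        hb->missed++;
    return hb->missed >= hb->limit;
}
```

One option, if a reboot is acceptable, would be to let the thread that resets the WDT stop doing so once hb_check() reports a stall. That does not explain the hang, but it would bring the unit back.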
> 
> --------------------------------
> 
> 
> I am quite sure you have thought about some (if not all) of this already,
> but maybe it "kicks" off some more thoughts.
> 
> Good luck
> Regards
> Ernst
> 
> 
> -----Original Message-----
> From: en-nut-discussion-bounces at egnite.de
> [mailto:en-nut-discussion-bounces at egnite.de] On behalf of Erik
> Lindstein
> Sent: Monday, 24 March 2008 17:13
> To: en-nut-discussion at egnite.de
> Subject: [En-Nut-Discussion] Thread stops executing after some time.
> 
> Guys, please help me out.
> I'm on a wild goose chase trying to figure out what is happening with
> a thread that handles communications with a PC through a TCP/IP socket.
> 
> The setup is:
> Ethernut V2.1
> Software 4.4.0
> 
> The software is built up of a couple of threads, each handling some
> functions (LCD, push buttons, user functions etc.).
> Then I have one thread that communicates with server software on my PC.
> The communication is pretty simple.
> It's a client socket that connects to my server PC and then exchanges
> some XML data, disconnects, sleeps for 300 ms and then starts all over
> again.
> 
> This works fine for weeks without any problems if I have the server PC
> up and running and connected to the same LAN my Ethernut is connected
> to.
> 
> Then one day I disconnected the server PC from the LAN and left a
> couple of the Ethernut clients running over the weekend. On
> Monday I connected my PC again and started up the server software, but
> I noticed that only 1 out of 4 clients was still connecting (all of
> them use the same software, just different MACs and IPs).
> 
> I looked at the incoming traffic with Wireshark and could not see any
> sign of life at all from the 3 clients not connecting.
> I tried to ping them and they all answer pings, and all other
> threads that handle the LCD and push buttons are still up and running,
> so the software is not dead.
> I tried deactivating/activating the network connection on my PC to see
> if any of the clients woke up. No luck.
> 
> I then added another thread to the software (I took the sample code
> for the tcps in the apps dir and created a thread to run that code).
> When everything is OK I see the "inetd" thread's timeout counting
> all the time and the thread executes as expected.
> 
> When the inetd thread stops executing I can connect to the unit and I
> get the output seen below:
> ------------------------------------------------------------------------
> 220 List of threads with name,state,prio,stack,mem,timeout follows
> tcpsm   Sleep   32      461     OK      27
> TcpS    Run     64      2546    OK      None
> inetd   Sleep   64      2381    OK      None
> rxi5    Sleep   9       603     OK      1392
> wdt     Sleep   40      255     OK      8
> SmuTh   Sleep   64      65      OK      71
> PcuTh   Sleep   64      805     OK      1
> HvpsTh  Sleep   64      605     OK      24
> IppsTh  Sleep   64      965     OK      4
> TaTh    Sleep   64      65      OK      35
> LcdTh   Sleep   64      929     OK      34
> main    Sleep   64      733     OK      451
> idle    Ready   254     356     OK      None
> ------------------------------------------------------------------------
> 
> For some reason the thread (inetd) just gets a timeout of "None"
> instead of the NutSleep value.
> 
> I only have one place in the code that puts the thread to sleep, and I
> have a fixed value of 300 there (NutSleep(300)).
> So there must be somewhere else in the code where the thread gets put into
> some wait state, but I have no idea how to figure out where and why
> this happens.
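
One cheap way to narrow down where a thread is parked is to write a step marker into a global before every call that might block, and let another thread (the LCD or the telnet one) print it. The sketch below is plain C that runs on a PC; the step names, client_where, client_pass and the stub functions are all hypothetical, and the real NutTcpConnect() and stream calls are replaced by callbacks so nothing here depends on Nut/OS.

```c
#include <assert.h>

/* Hypothetical step markers, one per place where the thread could block. */
enum client_step {
    STEP_IDLE = 0, STEP_CONNECT, STEP_EXCHANGE, STEP_CLOSE, STEP_SLEEP
};

/* Written by the client thread, read by e.g. the LCD thread. */
static volatile unsigned char client_where = STEP_IDLE;
static volatile unsigned long client_loops = 0;

/* Text for the display; unknown values map to "?". */
static const char *step_name(unsigned char step)
{
    static const char *const names[] = {
        "idle", "connect", "exchange", "close", "sleep"
    };
    return step < sizeof(names) / sizeof(names[0]) ? names[step] : "?";
}

/* Stubs standing in for the real calls; they record the marker they saw. */
static unsigned char seen_in_connect = STEP_IDLE;
static unsigned char seen_in_exchange = STEP_IDLE;
static int stub_connect_fail(void) { seen_in_connect = client_where; return -1; }
static int stub_connect_ok(void) { seen_in_connect = client_where; return 0; }
static void stub_exchange(void) { seen_in_exchange = client_where; }

/* One pass of the client loop, with the marker set before each step. */
static void client_pass(int (*do_connect)(void), void (*do_exchange)(void))
{
    assert(do_connect != 0 && do_exchange != 0);
    client_where = STEP_CONNECT;
    if (do_connect() == 0) {
        client_where = STEP_EXCHANGE;
        do_exchange();
    }
    client_where = STEP_CLOSE;  /* closing the socket would go here */
    client_where = STEP_SLEEP;  /* the 300 ms sleep would go here */
    client_loops++;
}
```

If the display then shows "connect" with a loop count that no longer changes, the thread is stuck inside the connect call and not in the data exchange.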
> 
> Can it be something that happens when the socket tries to connect to an
> IP/server that doesn't exist on the LAN?
> 
> If it happened all the time it would be easier to figure out what's
> wrong, but this can run for days without happening.
> Also, if memory were low the tcps thread wouldn't answer the
> incoming connection attempts, I guess.
> 
> The thread code is below:
> ------------------------------------------------------------------------
> THREAD(InetdThread, arg)
> {
>        TCPSOCKET *socket;
>        FILE *stream = 0;
>        u_long rip = inet_addr("192.168.0.115");
>        u_long tmo = 500;
>        int socket_error = 0;
>        uint8_t *start = 0, *stop = 0;
>        uint8_t unit[20], cmd[40], value[40];
>        uint8_t data_exchange_buffer[100] = "0";
> 
>        for(;;)
>        {
>                if ((socket = NutTcpCreateSocket()) != 0)
>                {
>                        NutTcpSetSockOpt(socket, SO_RCVTIMEO, &tmo, sizeof(tmo));
>                        NutTcpSetSockOpt(socket, SO_SNDTIMEO, &tmo, sizeof(tmo));
>                        if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0)
>                        {
>                                stream = _fdopen((int) ((uptr_t) socket), "r+b");
>                                if(stream != 0)
>                                {
>                                        // Send some XML DATA
>                                        fprintf_P(stream, info_P, INFO_P_ARGS);
>                                        fflush(stream);
>                                        // Get some XML DATA
>                                        fgets(data_exchange_buffer, sizeof(data_exchange_buffer), stream);
>                                        {
>                                                // Handle XML data
>                                        }
>                                        fclose(stream);
>                                        /*
>                                                info_text is an extern variable that another
>                                                thread prints on the LCD for debug output.
>                                        */
>                                        sprintf(info_text ,"COK\n%lu", (u_long)NutGetMillis());
>                                }
>                        }
>                        else
>                        {
>                                socket_error = NutTcpError(socket);
>                                sprintf(info_text ,"CE:%d \n%lu", socket_error, (u_long)NutGetMillis());
>                        }
>                        NutTcpCloseSocket(socket);
>                }
>                NutSleep(300);
>        }
> }
> ------------------------------------------------------------------------
> 
> 
> 
> --
> /Erik
> _______________________________________________
> http://lists.egnite.de/mailman/listinfo/en-nut-discussion
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/-en-nut-discussion--thread-stops-executing-after-some-time.-tp16277335p16368466.html
Sent from the MicroControllers - Ethernut mailing list archive at Nabble.com.



