[En-Nut-Discussion] [en-nut-discussion] thread stops executing after some time.
jakub nowak
jdnowak at gmail.com
Wed Apr 2 00:06:06 CEST 2008
Maybe try a dynamic IP/gateway: if DHCP works fine, at least
you will know where the problem is.
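
Another way to see where the client thread is stuck, independent of the DHCP test, is to let the thread record which step it is about to execute and show that value on the LCD next to info_text. Below is a minimal sketch in plain C. conn_stage_t, conn_stage, stage_set() and stage_name() are hypothetical names, not part of Nut/OS, and the list of stages is only my reading of the InetdThread loop quoted further down.

```c
#include <stdint.h>

/* Hypothetical helper, not a Nut/OS API: one marker per step of the loop. */
typedef enum {
    STAGE_IDLE = 0,
    STAGE_CREATE,   /* before NutTcpCreateSocket() */
    STAGE_CONNECT,  /* before NutTcpConnect()      */
    STAGE_WRITE,    /* before fprintf_P()/fflush() */
    STAGE_READ,     /* before fgets()              */
    STAGE_CLOSE,    /* before NutTcpCloseSocket()  */
    STAGE_SLEEP,    /* before NutSleep(300)        */
    STAGE_COUNT
} conn_stage_t;

/* Written by the client thread, read by the LCD/debug thread.
 * A single byte, so an 8-bit AVR never exposes a half-written value. */
static volatile uint8_t conn_stage = STAGE_IDLE;

/* Call this immediately before each step of the loop. */
static void stage_set(conn_stage_t s)
{
    conn_stage = (uint8_t)s;
}

/* Short text for the LCD; "?" for any value outside the table. */
static const char *stage_name(uint8_t s)
{
    static const char *const names[STAGE_COUNT] = {
        "idle", "create", "connect", "write", "read", "close", "sleep"
    };
    return (s < STAGE_COUNT) ? names[s] : "?";
}
```

In the loop, stage_set(STAGE_CONNECT) would go right before the NutTcpConnect() call, stage_set(STAGE_READ) right before fgets(), and so on. If the LCD still shows "connect" hours later, the thread never came back from that call, which would narrow the search to the connect path rather than the data exchange.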
2008/3/29, Erik L <erik.lindstein at gmail.com>:
>
> I don't think the problem is created when I disconnect the PC.
> I connected/disconnected the PC lots of times in a short period of time and
> the clients still always connect.
> Also, I have clients that haven't been connected at all after I powered them up,
> and they still get the same problem if I leave them for a couple of hours
> before I connect the PC.
>
> I have no route set; the communication is just on the LAN.
> (Clients 192.168.0.1-10, server 192.168.0.115.) All IP addresses are static.
>
> /Erik
>
>
>
>
> ernstst wrote:
> >
> > Hi Erik!
> >
> > Quote:
> > But when the problem occurs the software in the client can't get to that
> > point where it actually exchanges the data, because the socket shouldn't be
> > able to connect.
> > " if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "
> >
> > If the server (rip:TCPSERVERPORT) isn't connected to the LAN it can't
> > return 0, correct?
> > And even if it for some reason does, the socket read timeouts should occur
> > and it should manage to get past the data exchange routines and then start
> > all over again.
> > --------------------------------
> > Unquote
> >
> > I am not sure this is correct (i.e. whether the read timeout triggers on a
> > NutTcpConnect which does not go through).
> >
> > I need to think about it, maybe give it a try. A quick look into TCPSOCK.C
> > and the like didn't enlighten me ... Maybe the TCP protocol state in which
> > the disconnect occurred somehow influences if the reconnect works.
> > Another thought: IP Routing tables. Are they still "there" when the
> > reconnect is attempted? Have you defined a route?
> >
> > Regards
> > Ernst
> >
> >
> >
> > -----Original Message-----
> > From: en-nut-discussion-bounces at egnite.de
> > [mailto:en-nut-discussion-bounces at egnite.de] On Behalf Of Erik Lindstein
> > Sent: Tuesday, 25 March 2008 12:21
> > To: en-nut-discussion at egnite.de
> > Subject: [En-Nut-Discussion] [en-nut-discussion] thread stops
> > executing after some time.
> >
> > Ernst, thank you very much for taking time to answer.
> >
> > I'll write some comments below.
> >
> > I understand this is a sporadic problem so it takes a lot of time to run
> > into the "error" situation, but anyway:
> >
> > 1) ... only 1 out of 4 clients were still connecting ...
> > When you experience this situation more than once, is it always the same
> > Ethernut which can still connect, or is that also "random"? And what looks
> > "random" in the first place, is it really?
> >
> > --------------------------------
> > Well, I don't think there is much involving software that is truly
> > random :-) so of course this isn't either.
> >
> > But here it can be any one of the clients that stops executing the
> > thread. Usually I can see that after some time (~6-7 h) one or two
> > have stopped connecting, and there can be one left running for up to 24 h
> > (perhaps longer). But in the end all of them stop trying to connect
> > and the thread's sleep time is set to "None".
> > --------------------------------
> >
> >
> >
> >
> > 2) .. (all of them use the same software just different MACs and IPs ) ..
> > Even if all of them use the same SW, are they operating under
> > same/similar/different conditions? (I mean ".. exchanges some XML data,..":
> > where does this data come from, i.e. how is it generated and how different
> > can it be between the Ethernuts?)
> >
> > --------------------------------
> > When the server software on the PC is running and the PC is connected,
> > the client socket gets connected and the client sends some values
> > read from the A/D and some status variables, and then the PC responds with
> > a command that tells the client to do "something".
> > Usually the PC just sends a "reset watchdog" command to the client.
> >
> > In this case everything works fine as long as the software is
> > running and the PC is connected.
> > When I then close down the server software, the client gets a command
> > that tells it to start resetting the WDT locally.
> >
> > But when the problem occurs the software in the client can't get to
> > that point where it actually exchanges the data, because the socket
> > shouldn't be able to connect.
> > " if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0) "
> >
> > If the server (rip:TCPSERVERPORT) isn't connected to the LAN it can't
> > return 0, correct?
> > And even if it for some reason does, the socket read timeouts should
> > occur and it should manage to get past the data exchange routines and
> > then start all over again.
> > --------------------------------
> >
> >
> >
> > 3) How about buffer overflows due to "special" tx/rx data conditions
> > (length)?
> > --------------------------------
> > In this case it only happens when (at least I think that) I don't read or
> > send any data other than the data that the tcpsm sends out trying to
> > connect the socket.
> > My code doesn't do any rx/tx until the socket and stream are OK.
> >
> > --------------------------------
> >
> >
> >
> > 4) Try looking "into" the TCP sockets. My bank switch test program at
> > http://www.es-business.com/Firma/eng/edocs.htm may help. Include cli.c and
> > dump.c in your main program and create a thread as indicated in the source.
> > It contains a Telnet-based CLI which has a "lists" command which walks
> > through and displays all lists known to Nut/OS (TCP sockets is one of these).
> > One word of caution: because Nut/OS (i.e. other threads) keeps executing while
> > this command follows the pointers from one list entry to the next, the command
> > may loop if a list is updated (by, or on behalf of, an app thread) right when
> > the "lists" command itself is using that pointer.
> > The dump command may help you peek around in RAM.
> >
> > --------------------------------
> > I'll look into that, thanks.
> >
> > --------------------------------
> >
> >
> >
> > 5) Is there a possibility to have wireshark monitoring the TCP/IP link up
> > until the PC gets disconnected? This way, you could find out what was
> > exchanged immediately before the disconnect happened and maybe this gives
> > more info about the internal status of the Ethernuts and the TCP/IP
> > connection states.
> >
> > --------------------------------
> > The PC only gets disconnected when I remove the LAN cable, but I could
> > monitor the data until that point.
> > But I can disconnect and connect the cable many (unlimited?) times
> > and there are no problems. The clients always connect again if I don't
> > leave the PC disconnected for a longer period of time (> ~5-6 h).
> >
> > One possibility might be to set up the switch I use to mirror all
> > traffic to another port and monitor the traffic there with
> > Wireshark. That way I might be able to see what happens before it stops
> > working.
> > But if I have the PC connected in the "normal" way the problem doesn't
> > occur.
> > --------------------------------
> >
> >
> >
> > 6) Do you log the state of the TCP/IP connection between the Ethernuts and
> > the PC within the PC? Maybe such log (record length / contents) could
> > provide some more info.
> >
> > --------------------------------
> > Because the problem only occurs when the PC isn't connected, this
> > is hard to do.
> > I can log the traffic when everything works fine, but I'm not sure it gives
> > away the problem. Perhaps someone with more knowledge of TCP/IP can
> > see something here.
> > --------------------------------
> >
> >
> >
> >
> > 7) The most important question is:
> > Is the problem caused by behaviour in Nut/OS or Nut/Net (IP stack, timers,
> > events etc)
> > Or
> > Is the problem caused by some behaviour in the application threads?
> > Is there any chance to "strip down" the application threads to try to
> > minimize their possible impact on the situation?
> >
> > --------------------------------
> > I minimized the software to only include the thread for the client socket
> > and one tcpserver thread, but this still happens. I'll try to remove
> > some more code in the client thread and see if it changes anything.
> > I did get the feeling that this software took longer before it
> > stopped trying to connect, but that's not 100% verified. Anyway, it
> > still stops.
> > --------------------------------
> >
> >
> > 8) I have an example here of a test app, which produces the following
> > threads list:
> >
> > CLI>threads
> > Name     Status  Prio  Stack  Memory  Timeout  INFO-addr  Bank
> > CMDLINE  Run     64    891    OK      None     36C9       -1
> > XHTST    Sleep   64    357    OK      6        3203       9
> > XHTST    Ready   64    357    OK      None     2F3D       8
> > XHTST    Sleep   64    357    OK      None     2C77       7
> > XHTST    Ready   64    357    OK      None     29B1       6
> > XHTST    Sleep   64    357    OK      24       26EB       5
> > XHTST    Ready   64    357    OK      None     2425       4
> > XHTST    Sleep   64    357    OK      13       215F       3
> > XHTST    Ready   64    357    OK      None     1E99       2
> > XHTST    Sleep   64    357    OK      6        1BD3       1
> > tcpsm    Sleep   32    468    OK      102      1925       -1
> > XHTST    Sleep   64    357    OK      None     16C1       0
> > rxi5     Sleep   9     603    OK      699      145F       -1
> > main     Sleep   200   705    OK      940      1041       -1
> > idle     Ready   254   356    OK      None     D21        -1
> >
> > The XHTST threads are looping apps which work in memory, display info via
> > TCP/IP to a telnet client and sometimes sleep for a random time.
> > There are threads which are sleeping and do not have a timeout associated
> > with them! (Maybe when they are waiting for the telnet output to
> > complete?)
> >
> > --------------------------------
> >
> > I have no idea, but perhaps when
> > " NutTcpConnect(socket, rip, TCPSERVERPORT) " executes it sets the
> > thread's timeout to "None" and then waits for some event from the tcpsm
> > that then never occurs.
> >
> > --------------------------------
> >
> >
> > I am quite sure you have thought about some (if not all) of this already,
> > but maybe it "kicks" off some more thoughts.
> >
> > Good luck
> > Regards
> > Ernst
> >
> >
> > -----Original Message-----
> > From: en-nut-discussion-bounces at egnite.de
> > [mailto:en-nut-discussion-bounces at egnite.de] On Behalf Of Erik
> > Lindstein
> > Sent: Monday, 24 March 2008 17:13
> > To: en-nut-discussion at egnite.de
> > Subject: [En-Nut-Discussion] Thread stops executing after some time.
> >
> > Guys please help me out.
> > I'm on a wild goose chase trying to figure out what is happening with
> > a thread that handles communications with a PC thru a tcp/ip socket.
> >
> > The setup is:
> > Ethernut V2.1
> > Software 4.4.0
> >
> > The software is built up of a couple of threads, each handling some
> > functions (LCD, push buttons, user functions etc.).
> > Then I have one thread that communicates with server software on my PC.
> > The communication is pretty simple.
> > It's a client socket that connects to my server PC and then exchanges
> > some XML data, disconnects, sleeps for 300 ms and then starts all over
> > again.
> >
> > This works fine for weeks without any problems if I have the server PC
> > up and running and connected to the same LAN my Ethernut is connected
> > to.
> >
> > Then one day I disconnected the server PC from the LAN and left a
> > couple of the Ethernut clients running over the weekend. On
> > Monday I connected my PC again and started up the server software, but
> > I noticed that only 1 out of 4 clients were still connecting (all of
> > them use the same software, just different MACs and IPs).
> >
> > I looked at the incoming traffic with Wireshark and could not see any
> > sign of life at all from the 3 clients not connecting.
> > I tried to ping them and they all answer pings, and all other
> > threads that handle the LCD and push buttons are still up and running,
> > so the software is not dead.
> > I tried deactivating/activating the network connection on my PC to see
> > if any of the clients woke up. No luck.
> >
> > I then added another thread to the software (I took the sample code
> > for the tcps in the apps dir and created a thread to run that code).
> > When everything is OK I see the "inetd" thread timeout counting
> > all the time and the thread executes as expected.
> >
> > When the inetd thread stops executing I can connect to the unit and I
> > get the output seen below:
> > ----------------------------------------------------------------------------
> > 220 List of threads with name,state,prio,stack,mem,timeout follows
> > tcpsm   Sleep  32   461   OK  27
> > TcpS    Run    64   2546  OK  None
> > inetd   Sleep  64   2381  OK  None
> > rxi5    Sleep  9    603   OK  1392
> > wdt     Sleep  40   255   OK  8
> > SmuTh   Sleep  64   65    OK  71
> > PcuTh   Sleep  64   805   OK  1
> > HvpsTh  Sleep  64   605   OK  24
> > IppsTh  Sleep  64   965   OK  4
> > TaTh    Sleep  64   65    OK  35
> > LcdTh   Sleep  64   929   OK  34
> > main    Sleep  64   733   OK  451
> > idle    Ready  254  356   OK  None
> > ----------------------------------------------------------------------------
> >
> > For some reason the thread (inetd) just gets a timeout set to "None"
> > instead of the NutSleep value.
> >
> > I only have one place in the code that sets the thread to sleep, and I
> > have a fixed value there of 300 (NutSleep(300)).
> > So there must be somewhere else in the code where the thread gets put into
> > some wait state, but I have no idea how to figure out where and why
> > this happens.
> >
> > Can it be something that happens when the socket tries to connect to an
> > IP/server that doesn't exist on the LAN?
> >
> > If it happened all the time it would be easier to figure out what's
> > wrong, but this can run for days without happening.
> > Also, if memory were low the tcps thread wouldn't answer the
> > incoming connection attempts, I guess.
> >
> > The thread code is below:
> > ----------------------------------------------------------------------------
> > THREAD(InetdThread, arg)
> > {
> >     TCPSOCKET *socket;
> >     FILE *stream = 0;
> >     u_long rip = inet_addr("192.168.0.115");
> >     u_long tmo = 500;
> >     int socket_error = 0;
> >     uint8_t *start = 0, *stop = 0;
> >     uint8_t unit[20], cmd[40], value[40];
> >     uint8_t data_exchange_buffer[100] = "0";
> >
> >     for(;;)
> >     {
> >         if ((socket = NutTcpCreateSocket()) != 0)
> >         {
> >             NutTcpSetSockOpt(socket, SO_RCVTIMEO, &tmo, sizeof(tmo));
> >             NutTcpSetSockOpt(socket, SO_SNDTIMEO, &tmo, sizeof(tmo));
> >             if(NutTcpConnect(socket, rip, TCPSERVERPORT) == 0)
> >             {
> >                 stream = _fdopen((int) ((uptr_t) socket), "r+b");
> >                 if(stream != 0)
> >                 {
> >                     fprintf_P(stream, info_P, INFO_P_ARGS); // Send some XML DATA
> >                     fflush(stream);
> >                     fgets(data_exchange_buffer, sizeof(data_exchange_buffer), stream); // Get some XML DATA
> >                     {
> >                         // Handle XML data
> >                     }
> >                     fclose(stream);
> >                     /*
> >                     info_text is an extern variable that another thread prints on the
> >                     LCD for debug output.
> >                     */
> >                     sprintf(info_text ,"COK\n%lu", (u_long)NutGetMillis());
> >                 }
> >             }
> >             else
> >             {
> >                 socket_error = NutTcpError(socket);
> >                 sprintf(info_text ,"CE:%d \n%lu", socket_error, (u_long)NutGetMillis());
> >             }
> >             NutTcpCloseSocket(socket);
> >         }
> >         NutSleep(300);
> >     }
> > }
> > ----------------------------------------------------------------------------
> >
> >
> >
> > --
> > /Erik
> > _______________________________________________
> > http://lists.egnite.de/mailman/listinfo/en-nut-discussion
> >
> >
> >
> >
> >
>
>
>
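
As a stopgap until the cause is found, the client thread could be supervised so that a stall ends in a reset instead of a silent hang. The sketch below is plain C under two assumptions of mine: that the existing wdt thread is the one resetting the hardware watchdog (the posts above do not say so explicitly), and that it can call a check function once per period. heartbeat_t, heartbeat_init(), heartbeat_kick() and heartbeat_check() are hypothetical names, not Nut/OS APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical supervisor state, shared between two threads. */
typedef struct {
    volatile uint8_t beats; /* bumped by the monitored thread           */
    uint8_t last_seen;      /* value of beats at the previous check     */
    uint8_t misses;         /* consecutive checks without any progress  */
    uint8_t max_misses;     /* checks tolerated before declaring stall  */
} heartbeat_t;

static void heartbeat_init(heartbeat_t *hb, uint8_t max_misses)
{
    hb->beats = 0;
    hb->last_seen = 0;
    hb->misses = 0;
    hb->max_misses = max_misses;
}

/* Monitored thread: call once per pass through its for(;;) loop. */
static void heartbeat_kick(heartbeat_t *hb)
{
    hb->beats++;
}

/* Supervisor thread: call once per period.
 * Returns true while the monitored thread still counts as alive. */
static bool heartbeat_check(heartbeat_t *hb)
{
    uint8_t now = hb->beats;

    if (now != hb->last_seen) {
        hb->last_seen = now;
        hb->misses = 0;
        return true;
    }
    if (hb->misses < hb->max_misses) {
        hb->misses++;
    }
    return hb->misses < hb->max_misses;
}
```

The client thread would call heartbeat_kick() next to its NutSleep(300), and the supervisor would keep resetting the watchdog only while heartbeat_check() returns true. The counter is a single byte, so exactly 256 kicks between two checks would look like a stall; with a 300 ms loop and a check period of a few seconds that should not occur in practice.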