[En-Nut-Discussion] Nutos 5.1 on Ethernut 1.3g with multiple threads: network freezes

Jonathan Woithe jwoithe at atrad.com.au
Thu Jun 11 10:04:12 CEST 2015


Hi everyone

I have run many tests today without coming up with a definitive answer.  I
did not get a chance to do pin toggle tests - that will come later if
needed.

The most interesting finding came when I instrumented only NutIpOutput(),
NutIpInput(), NutEtherOutput() and NicPutPacket() (in nicrtl.c).  In the
fault condition, NicPutPacket() appeared to be receiving the ICMP reply
packet for transmission, was submitting it and returning normally.  However,
no such packet appeared on the ethernet output.  Taken at face value this
suggested that the NIC itself had failed to transmit the packet for some
reason.
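
For anyone wanting to repeat this kind of check, entry/exit counters are one low-overhead way to confirm that a function was entered and returned normally. The following is only a sketch, not the instrumentation actually used here; trace_enter, trace_leave, the TP_* names and NicPutPacketStub() are all made up for illustration.

```c
/* Hypothetical trace points, one per instrumented function. */
enum trace_point {
    TP_IP_INPUT,
    TP_IP_OUTPUT,
    TP_ETHER_OUTPUT,
    TP_NIC_PUT,
    TP_COUNT
};

/* volatile so the counters can be inspected from another thread or a debugger */
static volatile unsigned int trace_enter[TP_COUNT];
static volatile unsigned int trace_leave[TP_COUNT];

#define TRACE_ENTER(tp) (trace_enter[(tp)]++)
#define TRACE_LEAVE(tp) (trace_leave[(tp)]++)

/* Stand-in for an instrumented function; the real NicPutPacket() differs. */
static void NicPutPacketStub(void)
{
    TRACE_ENTER(TP_NIC_PUT);
    /* ... packet would be copied to the NIC and transmission started ... */
    TRACE_LEAVE(TP_NIC_PUT);
}
```

If the enter and leave counts for a function match and keep increasing while nothing appears on the wire, the function itself is returning normally and the loss is further downstream.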

I had also noticed that compared to NutOS 4.4.1 the behaviour of the NIC
LEDs under NutOS 5.1.0 had changed (rather than both being mostly on, one
was now continuously off).  This got me thinking about the RTL8019AS
initialisation sequence.

Looking in nicrtl.c it is clear that EEPROM emulation is no longer being
done here.  A check of the svn repo shows that it was removed in r4711
(dated 7 Oct 2012).  Curiously, there was also a less sophisticated EEPROM
emulation routine originally in arch/avr/os/nutinit.c which was moved to
arch/avr/board/ethernut1.c in r3000.  This code remains in place, but as
mentioned it does not set the NIC up as completely as the deleted code.

The commit message in r4711 says:

  Remove EEPROM emulation from RTL8019AS driver. When compiling with
  avr-gccdbg, the driver crashes at NutDelay. EEPROM emulation is quite board
  specific and we may later implement a separate driver in the Ethernut 1
  board file. For now we live with the fact, that additional Ethernet
  collisions may appear. Much better than a driver which crashes in debug
  mode.

Given my experience chasing the present issue down (where precise symptoms
seemingly varied depending on what additional code was included in NutOS
functions) I wonder whether this crashing in NutDelay() was a symptom of the
same deeper problem that I have triggered.  It is certainly far from obvious
why the code removed in r4711 would crash inside NutDelay() only in debug
mode.  This kind of non-deterministic behaviour is exactly the sort of thing
I've been seeing.

Lacking anything else more concrete, I manually patched the EEPROM emulation
code from NutOS 4.4.1 back into nicrtl.c from NutOS 5.1.0 along with the
defines it required.  The call sequence

  if (DetectNicEeprom() == 0) {
      EmulateNicEeprom();
  }

was inserted into NicStart() just after the NicReset() clause.  When the
resulting NutOS was used with the test program posted previously, full
network functionality was restored (in so far as ICMP pings worked once
again).
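
To make the placement concrete, here is a self-contained sketch of the start-up order. It is not the real NicStart(): the functions below are stubs with assumed signatures, NicStartSketch(), detect_result and emulate_calls are hypothetical names, and the meaning of the DetectNicEeprom() return value in the real driver is not something this sketch tries to establish.

```c
/* Hypothetical stand-ins: the real routines in nicrtl.c talk to the RTL8019AS. */
static int detect_result = 0;   /* value the stubbed DetectNicEeprom() returns */
static int emulate_calls = 0;   /* counts how often the emulation was run */

static int NicResetStub(void)      { return 0; }  /* assumed: 0 means reset OK */
static int DetectNicEeprom(void)   { return detect_result; }
static void EmulateNicEeprom(void) { emulate_calls++; }

/* Sketch of the call order only: reset first, then the re-inserted sequence. */
static int NicStartSketch(void)
{
    if (NicResetStub() != 0) {
        return -1;              /* give up if the chip does not reset */
    }
    if (DetectNicEeprom() == 0) {
        EmulateNicEeprom();     /* call sequence taken from NutOS 4.4.1 */
    }
    /* ... the remaining register setup would follow here ... */
    return 0;
}
```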

I then recompiled and relinked our original application against this revised
NutOS and tested it.  It too worked, and full network connectivity (ICMP and
TCP) was observed.

Next up I temporarily changed the call sequence in NicStart() to

  if (DetectNicEeprom() == 0) {
      EmulateNicEeprom();
  }

This would effectively prevent the old EEPROM emulation code from running
while still keeping it in the program space (thus preserving program memory
layout).  The resulting firmware failed to respond to ICMP pings.

These observations suggest one of two possibilities.

 1) The old EEPROM emulation code is configuring the NIC in a certain way
    that is required for reliable operation under all possible
    circumstances.

 2) When it runs, the EEPROM emulation code leaves some values in RAM
    which happen to work around some other as yet unidentified problem
    within NutOS.

As I understand it, the default EEPROM emulation in nutinit.c simply holds
the data line high for the duration of the process.  In nicrtl.c, the only
data field not held high is CONFIG3 (0x30).  I changed this to 0xFF and
re-tested.  Networking worked and both LEDs operated.
If option 1 is the key to the problem, the critical configuration details
may not be in the emulated EEPROM contents themselves, but rather a side
effect of the process.
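
The reasoning can be illustrated with a small check. If every emulated byte is 0xFF, the data the NIC sees is the same as with the data line simply held high, so any remaining difference has to come from the process rather than the contents. The buffer length and the position of CONFIG3 below are invented for illustration and do not reflect the real EEPROM layout; all_high(), EMU_LEN and EMU_CONFIG3_IDX are hypothetical names.

```c
#include <stddef.h>

#define EMU_LEN         4   /* hypothetical length, for illustration only */
#define EMU_CONFIG3_IDX 2   /* hypothetical position of the CONFIG3 byte */

/* Emulated contents as described above: all 0xFF except CONFIG3 (0x30). */
static unsigned char emu_contents[EMU_LEN] = { 0xFF, 0xFF, 0x30, 0xFF };

/* Returns 1 if every byte is 0xFF, i.e. the data line would be high
 * for every bit clocked out, and 0 otherwise. */
static int all_high(const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] != 0xFF) {
            return 0;
        }
    }
    return 1;
}
```

With CONFIG3 at 0x30 the check returns 0; after setting that byte to 0xFF it returns 1, at which point the emulated contents no longer differ from the nutinit.c approach.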

This is as far as I got today.  Is there anything we can conclude from these
findings?

Regards
  jonathan

