Author Topic: Ethernet interface on SB65EC stops responding  (Read 24536 times)

peter_f

  • Full Member
  • ***
  • Posts: 18
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #31 on: October 28, 2008, 06:39:52 PM »
Just an update on my lockups:

===== peter_f =====
Lock up reported by peter_f

  • Description: Lock up with modified code, using FRAM
  • Model of SBC Board(s) that locks up: SBC65EC, HW:V3.01 SW:V3.06. Same error on multiple boards.
  • Lockup occurs on board with original or modified firmware: Both.  Is currently running modified code but I reverted to original image downloaded from Modtronix site and that locked up too.  Note: FRAM chip was still soldered to board but not being used by original code.
  • Details of error: Logging/displaying information from weather station. Worked fine for months, recently having issues with network not responding. Main program still works, but no more access via network. Reset cures problem, but after about a week it happens again.
  • How is board connected to network (direct to PC, via switch): Connected to home network, visible on the internet. Blocking internet access made no difference; it still happens.
  • Possible cause of error suggested by user #1:  I have an FRAM chip installed that I know shares I/O ports with the ethernet controller.  I have been careful with my code so as not to overlap FRAM operations and network operations but the issues started some time after I started using the FRAM chip.
  • Possible cause of error suggested by user #2: I had upgraded the MPLAB C Compiler to version 3.20 around that time as well (a minor upgrade; I can't remember what the previous version was). I'm fairly sure the earlier version of my code that I reverted to was compiled with the older compiler, but I'm not 100% sure, as I may have recompiled it with the new one.
  • Does activity and link LED still work: Yes, activity LED still flashes when lock up occurs
  • Does unplugging network cable fix problem: No
  • Make and model of switch board is connected to: Netgear DG384GT ADSL Modem/Router/Switch
  • DHCP Server on network: Yes, in Netgear DG384GT
  • Is the board connected to a network with multiple computers? If so, what operating system are they running: 4 x Windows XP PCs
  • Sending UDP packets to board when it locks up: Sending UDP packets does not seem to cause the lockup, and sending them while the board is locked up gets no response. When locked up, all ARP requests fail (if the entry has aged out of the ARP tables). I tried configuring a static ARP entry, but it does not help. The board will not respond via the network at all.
  • Is DHCP enabled on board: It was initially but after I started getting these lockups, I reverted to static IP address.
  • Has netBIOS and IP address of board been changed: IP address is 192.168.0.200.  Netbios name is still MXBOARD.

Although I haven't proved it yet, I suspect that the power supply may be a contributing factor.  I changed from a transformer type power pack to a switch mode power supply (plug pack type) and it locked up after less than a day.  I have since switched to a different transformer type power supply (I don't think it is regulated) and so far it has been running for several weeks.  This may be coincidental though.

Regards,

peter_f

AlexDavidson

  • Newbie
  • *
  • Posts: 1
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #32 on: November 01, 2008, 09:12:28 PM »
I had a very similar problem with an SB45EC, with only UDP, ARP, & ICMP enabled. I eventually found that received bytes were appearing at data memory location 0x100 regardless of stack version, thus corrupting whatever variable resided there. I worked around it by declaring a dummy variable at 0x100.

That improved, but didn't completely fix, the problem. I later discovered that other memory locations in the range 0x100 - 0x117 would sometimes get altered at random. Since reserving that range with a dummy array I haven't had any problems for 7 months.

I also had the same experience with the power supply - a switching one seemed to cause the problem to appear more frequently than a transformer type.

In my case the problem had the symptoms of a buffer overrun or a faulty/missing bank select instruction, but I was never able to find exactly what was causing it.
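
A minimal sketch of the dummy-array workaround described above, for anyone wanting to try it. It assumes the MPLAB C18 compiler (the `#pragma udata` section syntax and the `__18CXX` macro); the names `guard_sect`, `guardRegion`, `GuardInit` and `GuardCountCorrupted` and the fill pattern are made up for illustration and are not from the original project. Filling the reserved range with a known pattern also lets the main loop notice when something has written into it.

```c
#define GUARD_SIZE    0x18   /* covers 0x100 to 0x117 inclusive */
#define GUARD_PATTERN 0xA5   /* arbitrary known fill value (assumption) */

/* On MPLAB C18, place the array at absolute data address 0x100.
   Other compilers skip the pragma, so the sketch also builds on a PC. */
#ifdef __18CXX
#pragma udata guard_sect=0x100
#endif
volatile unsigned char guardRegion[GUARD_SIZE];
#ifdef __18CXX
#pragma udata
#endif

/* Fill the reserved range with the known pattern at start-up. */
void GuardInit(void)
{
    unsigned char i;
    for (i = 0; i < GUARD_SIZE; i++) {
        guardRegion[i] = GUARD_PATTERN;
    }
}

/* Return how many bytes no longer hold the pattern, restoring each one. */
unsigned char GuardCountCorrupted(void)
{
    unsigned char i;
    unsigned char changed = 0;
    for (i = 0; i < GUARD_SIZE; i++) {
        if (guardRegion[i] != GUARD_PATTERN) {
            changed++;
            guardRegion[i] = GUARD_PATTERN;
        }
    }
    return changed;
}
```

Calling `GuardCountCorrupted()` from the main loop and printing a non-zero result on the serial port would show how often the range gets hit. The linker script may also need a matching change, so treat this as a starting point only.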

sparkcatcher

  • Sr. Member
  • ****
  • Posts: 31
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #33 on: November 04, 2008, 01:31:06 PM »
Here's some information you had requested regarding the lockups:

===== sparkcatcher =====
Lock up reported by sparkcatcher
Description: When the error happens, the receiver module stops receiving packets from the Realtek chip. Neither TCP nor UDP comes in, even though the receive light blinks on the Ethernet connector.

Model of SBC Board that locks up: sbc45 boards running 3.06
Lockup occurs on board with original or modified firmware: modified firmware
How is board connected to network:  Via switch
Does activity and link LED still work:  Yes
Does unplugging network cable fix problem:  No
Make and model of switch board is connected to: LinkSys 10/100 4 port
DHCP Server on network: Yes
Is the board connected to a network with multiple computers? Yes
If so, what operating system are they running: XP, 98, Linux, FreeBSD
Sending UDP packets to board when it locks up: Don't Know
Is DHCP enabled on board: Yes
Has netBIOS and IP address of board been changed: NetBios Module Removed

I'd like to add that I think this is a stack timing issue.  I've monitored my application for over a year using the serial port as a debug output.  Every time the lockup occurred, the main application continued to run and the serial debug statements in the main loop continued to execute.  The debug statements I had at the MAC layer of the stack stopped executing after the lockup, even though the Ethernet connector LEDs showed packet activity.  So the MAC layer never received notification from the Realtek chip that a new packet was ready for processing.  This leads me to believe that the receive/transmit buffers are being corrupted.

If this were a power supply glitch problem, the main loop would glitch and would force a reset via the watchdog.  I never saw that, and that wouldn't cause a hangup because within 20 msec the board would be reset. 

That's what made the problem so difficult to engineer away - the processor continues executing like there is nothing wrong.

Because the main loop application continues to execute, it seems that one might be able to write a test application that queries the state of the network transmit and receive buffers.  These are on the Realtek chip, so that might be hard to do.  Finding this will take someone with very good knowledge of the way the Realtek chip interfaces with the controller.  I did study the Realtek interface and I have to say it is very unorthodox; you'd need the design engineer or a Realtek app engineer to help you. Good luck with that!
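
Since the main loop keeps running while the MAC layer goes silent, one simpler check than reading the NIC buffers is a receive-silence timer. The following is only a sketch under assumptions: the names `RxWatchdog`, `RxWatchdogKick` and `RxWatchdogExpired`, the tick source, and the timeout value are all hypothetical and not part of the stack. On a very quiet network it would fire without a real fault, so the limit is a guess.

```c
/* Hypothetical limit: 60 s of silence at an assumed 10 ms tick. */
#define RX_SILENCE_LIMIT_TICKS 6000UL

typedef struct {
    unsigned long lastRxTick;   /* tick of the most recent received frame */
    unsigned long recoveries;   /* how many times the timer has expired */
} RxWatchdog;

void RxWatchdogInit(RxWatchdog *wd, unsigned long now)
{
    wd->lastRxTick = now;
    wd->recoveries = 0;
}

/* Call from the MAC receive path whenever a frame is accepted. */
void RxWatchdogKick(RxWatchdog *wd, unsigned long now)
{
    wd->lastRxTick = now;
}

/* Call from the main loop. Returns 1 when nothing has been received
   for the whole limit; the caller would then re-initialise the NIC. */
unsigned char RxWatchdogExpired(RxWatchdog *wd, unsigned long now)
{
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    if ((now - wd->lastRxTick) >= RX_SILENCE_LIMIT_TICKS) {
        wd->lastRxTick = now;   /* restart the silence window */
        wd->recoveries++;
        return 1;
    }
    return 0;
}
```

The `recoveries` counter is there so the number of detected stalls can be printed alongside the other serial debug output.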

Hope this helps.
Now that the Aussie dollar has depreciated so much, I wonder if board prices in $US can return to previous levels? :wink:

sp,


g8kmh

  • Newbie
  • *
  • Posts: 1
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #34 on: November 29, 2008, 06:10:12 AM »
I have had this (or a very similar) problem. The application is stripped down to UDP for the home automation xAP protocol. If I generate thousands of packets externally, I see errors in the MAC debug output: overruns first, and then Invalid Packet Errors:

13:38:17.399 - Invalid Packet Error!
13:38:17.399 - Invalid Packet Error, PRX not set!
13:38:17.399 - Invalid Packet Error, Frame too large = 0x7d0a
13:38:17.415 - Invalid Packet Error, Next packet pointer too large = 0xd9


It appears the stack/chip never recovers from this. I can continue to send UDP packets from the board OK and main loop executes still, albeit much quicker.
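
The messages above appear to correspond to sanity checks on the 4-byte header that an NE2000-style NIC stores ahead of each received frame (status byte, next-page pointer, 16-bit length). The sketch below shows checks of that kind, assuming the Realtek chip is an NE2000-compatible part. The page limits, the maximum length and the names are assumptions for illustration; this is not the stack's actual MACGetHeader code.

```c
#define RSR_PRX        0x01    /* receive status bit 0: packet received intact */
#define RX_START_PAGE  0x46    /* assumed first page of the receive ring */
#define RX_STOP_PAGE   0x60    /* assumed page just past the receive ring */
#define MAX_RX_LENGTH  1522u   /* assumed: 1518-byte frame plus 4-byte header */

typedef enum {
    RX_HDR_OK = 0,
    RX_HDR_NO_PRX,             /* "PRX not set" */
    RX_HDR_TOO_LARGE,          /* "Frame too large" */
    RX_HDR_BAD_NEXT            /* "Next packet pointer too large" */
} RxHeaderResult;

/* Validate the three header fields read from the NIC ring buffer. */
RxHeaderResult CheckRxHeader(unsigned char status,
                             unsigned char nextPage,
                             unsigned int length)
{
    if ((status & RSR_PRX) == 0) {
        return RX_HDR_NO_PRX;
    }
    if (length > MAX_RX_LENGTH) {
        return RX_HDR_TOO_LARGE;
    }
    if (nextPage < RX_START_PAGE || nextPage >= RX_STOP_PAGE) {
        return RX_HDR_BAD_NEXT;
    }
    return RX_HDR_OK;
}
```

With these assumed limits, the logged values 0x7d0a (length) and 0xd9 (next page) are both rejected, which fits the idea that the ring contents are garbage by that point.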
My kludge (YMMV) is to reset the NIC in the code following the MAC error debug output in MACGetHeader. I'm sure with some time there may be a more elegant solution.
Regards,
Lehane
Board: SBC68EC/HW 2.22/BLN 1.00
Build info: UDP_SPEED_OPTIMIZE, STACK_USE_UDP, STACK_USE_FAST_NIC, MAC_SPEED_OPTIMIZE, NIC_DISABLE_INT0
Kludge:
        // Stop the NIC and abort any remote DMA in progress
        NICPut(CMDR, 0x21);
        DelayMs(2);

        // Clear the remote byte count registers
        NICPut(RBCR0, 0);
        NICPut(RBCR1, 0);

        // Put the NIC in loopback mode and issue the start command
        NICPut(TCR, 0x02);
        NICPut(CMDR, 0x22);         // HM: Modified, changed to 22 to re-start NIC

        // Ring now contains garbage
        // MACCurrRxbuf = NICWritePtr; // Empty ring buffer completely
        // MACDiscardRx();             // Discard the contents of the current RX buffer

        // Reset the ISR by writing FF to it
        NICPut(ISR, 0xff);

        // Initialize the CURRent pointer - this is the RX buffer page PUT pointer
        NICPut(CURRP, RXSTART);

        // TCP layer uses this value to calculate window size. Without init the
        // first window size value sent would be wrong if no packet has been received before.
        MACCurrRxbuf = RXSTART;
        MACDiscardRx();             // Discard the contents of the current RX buffer

        // Take the NIC out of loopback mode
        NICPut(TCR, 0x00);

sparkcatcher

  • Sr. Member
  • ****
  • Posts: 31
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #35 on: December 09, 2008, 03:19:42 PM »
g8:
Interesting post.  I note that in my application, when the board stops accepting packets, it does still issue DHCP requests, which are UDP packets, so your observations are very similar to what I have been seeing.

I'm wondering what your criterion is for resetting the NIC like this. If you reset it this way, it seems you would also have to re-initialize the TCP stack.

However, if your criterion detects the glitch 100% of the time, it would be very easy to just reset the whole controller when the detection condition is met.

Is there a repeatable set of error messages that you see before the corruption happens?
In my case the incoming packets simply stop being detected, so there is no get-MAC-header request and no sign that a packet has arrived from the Realtek chip.  Your detection criterion could be worth monitoring, so if you could expand on it I'd appreciate it.
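
One way to turn a detection criterion into a reset decision is to count consecutive bad receive headers and act only past a threshold. This is a sketch under assumptions, not anyone's actual code from this thread: the threshold and the names `RxErrorMonitor` and `RxErrorMonitorReport` are hypothetical. It would not catch the silent case described above, where no header is read at all.

```c
#define MAX_CONSECUTIVE_RX_ERRORS 3   /* arbitrary threshold (assumption) */

typedef struct {
    unsigned char consecutiveErrors;  /* bad headers seen in a row */
    unsigned int  resets;             /* times a reset was requested */
} RxErrorMonitor;

void RxErrorMonitorInit(RxErrorMonitor *m)
{
    m->consecutiveErrors = 0;
    m->resets = 0;
}

/* Report the outcome of each header read (headerOk != 0 means valid).
   Returns 1 when the caller should reset the NIC or the whole controller. */
unsigned char RxErrorMonitorReport(RxErrorMonitor *m, unsigned char headerOk)
{
    if (headerOk) {
        m->consecutiveErrors = 0;     /* a good frame clears the streak */
        return 0;
    }
    m->consecutiveErrors++;
    if (m->consecutiveErrors >= MAX_CONSECUTIVE_RX_ERRORS) {
        m->consecutiveErrors = 0;
        m->resets++;
        return 1;
    }
    return 0;
}
```

Requiring a streak rather than a single error avoids resetting on one damaged frame, at the cost of reacting a few packets later.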

sp,


 

peter_f

  • Full Member
  • ***
  • Posts: 18
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #36 on: February 26, 2009, 02:10:56 AM »
After putting up with my SBC65EC disappearing off the network frequently (on average, once a week), I thought I would try something simple.  I use my SBC65EC to collect data from a weather station via the Serial Port.  Every minute it receives the weather station's current time and date so I modified the code to call MACInit() whenever it receives a timestamp for midnight.  The idea was that if the NIC does stop responding then hopefully at midnight, the MACInit() would reset it and it would start working again.  That was over three weeks ago and so far, it has behaved itself perfectly.  I have yet to detect the NIC stop responding at all.

I know it has been only just over three weeks and it may be just pure coincidence but in the past, the board would very rarely run this long without issues.

I'll let it run hopefully a few weeks longer and see how it goes.
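
A minimal sketch of the midnight re-init described above. The handler name `OnStationTimestamp` and the once-per-day flag are assumptions for illustration, and `MACInit()` is replaced by a counting stand-in so the logic can be tried on a PC; on the board the stand-in would be dropped and the stack's own `MACInit()` used.

```c
/* Stand-in for the stack's MACInit(); counts calls instead of touching
   hardware. Remove it when building against the real stack. */
unsigned int macInitCalls = 0;
void MACInit(void)
{
    macInitCalls++;
}

/* Set once the midnight re-init has run, cleared by any later timestamp. */
unsigned char reinitDoneToday = 0;

/* Call whenever a weather-station timestamp arrives (about once a minute). */
void OnStationTimestamp(unsigned char hour, unsigned char minute)
{
    if (hour == 0 && minute == 0) {
        if (!reinitDoneToday) {       /* guard against a repeated 00:00 stamp */
            MACInit();
            reinitDoneToday = 1;
        }
    } else {
        reinitDoneToday = 0;          /* re-arm for the next midnight */
    }
}
```

The flag only matters if the station ever sends the 00:00 timestamp more than once; without it, each repeat would re-initialise the NIC again.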

Peter

zed984

  • Newbie
  • *
  • Posts: 3
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #37 on: March 05, 2009, 07:16:05 AM »
I've had much the same issue on two boards this week! It never happened before, and it coincides with my adding some bytes to the appcfg struct.
Like all the others, the board still works properly apart from the Ethernet. I'm using the SPI LCD with a 16-key keypad, along with 2 external SPI DACs, all on port F[6,0], so there is some talk between devices.
I use the appcfgPut/getc() functions to read and write a configuration in the EEPROM, which the user changes from the keypad. This works great without any problems. The reason I'm mentioning this is that, unlike all of you, even if I reset the board, unplug and replug the Ethernet cable, or boot in safe mode, I can no longer connect to the web server or bootloader. Maybe it has something to do with the bootloader?

One other thing while I think of it: I modified the .lkr file to reserve 512 bytes of memory so I could declare an LCD table larger than 256 bytes, as Modtro2 described to OmarZ in another post. It worked well, but I thought it was worth mentioning.
Here's my report:

===== zed984 =====
Lock up reported by zed984

    * Description: Lock up with modified code, using EEPROM to store custom config data
    * Model of SBC Board(s) that locks up: SBC65EC, HW:3.01 SW:3.06. Same error on 2 boards.
    * Lockup occurs on board with original or modified firmware: Modified
    * Details of error: Custom SPI functions still work. Device still communicates with the LCD, and saves to and reads from EEPROM. Ethernet stops responding even after reset, cable unplug or boot in safe mode.
    * How is board connected to network: XC cable to PC
    * Possible cause of error suggested by user #1: Appcfg struct modified; it seems to work well but may be interfering with the bootloader?
    * Possible cause of error suggested by user #2: Happened while working on a Vista PC with an XC cable. The board never hung when operated from Win XP machines over the company network with multiple computers
    * Does activity and link LED still work: Yes
    * Does unplugging network cable fix problem: No
    * Make and model of switch board is connected to:
    * DHCP Server on network:
    * Is the board connected to a network with multiple computers? If so, what operating system are they running: No
    * Sending UDP packets to board when it locks up:
    * Is DHCP enabled on board: No
    * Has netBIOS and IP address of board been changed: Yes, to the 192.168.1.0/24 subnet

I appreciate every effort you make in resolving this issue, as it is critical to me (our product being developed around it). I can send samples of custom code by email if needed

peter_f

  • Full Member
  • ***
  • Posts: 18
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #38 on: April 02, 2009, 03:17:17 AM »
Another five weeks have gone by and the board is still behaving beautifully.  I have yet to detect any form of issue with the NIC since changing my code to run a MACInit() at midnight.  It may be a hack but at least it works.

Peter

Danchoi-955

  • Sr. Member
  • ****
  • Posts: 26
    • View Profile
Re: Ethernet interface on SB65EC stops responding
« Reply #39 on: March 09, 2010, 11:16:45 AM »
I've designed a new GUI to interface with and control things through the SBC65EC using JavaScript.  I had the same problem, but only when using the Firefox browser; it was fairly random, and happened more often after clearing out the cache.  The other browsers (Chrome, IE, and Safari) all work.  Has anyone had this problem with the newer TCP/IP Stack 4.18?