Author Topic: Various bugs in the network stack  (Read 4722 times)

VincentLascaux

  • Full Member
  • ***
  • Posts: 18
    • View Profile
Various bugs in the network stack
« on: November 09, 2008, 11:54:46 PM »
I've been chasing various bugs in the stack, here is what I found:

- In StackTsk.c, the StackTask function has a loop (and it's also common to see in this forum people writing tight loops calling StackTask). I believe the original intent of the function was to fully process a packet. TCP doesn't process the packet (it only marks the socket as "ready to be read"). There is nothing to protect StackTask to overwrite the current Rx buffer when it calls MACGetHeader.
The fix I did for this one was to add a variable (I called it MACRXBufferDiscarded) that indicates if the MACDiscardRxBuffer function was called, and to return without doing anything before calling MACGetHeader when that flag is set (so init MACRXBufferDiscarded to TRUE in MACInit, set it to TRUE in MACDiscardRx, add a check for the variable before the call to MACGetHeader and set the flag to TRUE if MACGetHeader succeeds)

- TCPIsPutReady checks IPIsTxReady (which is defined to MACIsTxReady) if the socket doesn't have a buffer allocated for it yet. If the currently selected MAC Tx buffer is being transmitted, MACIsTxReady returns if the transmission is completed (which is not what is necessary for TCPIsPutReady, all we care about is if there are available MAC buffers)

- TCPPut and TCPPutArray don't ensure the buffer of the socket is the active MAC buffer. Proposed fix is to add a check:
    if (CurrentTxBuffer != ps->TxBuffer)
        IPSetTxBuffer(ps->TxBuffer, sizeof(TCP_HEADER) + ps->TxCount);

- TCPGetArray doesn't update the value of ps->RxCount as TCPGet does

VincentLascaux

  • Full Member
  • ***
  • Posts: 18
    • View Profile
Re: Various bugs in the network stack
« Reply #1 on: December 13, 2008, 12:26:13 PM »
So nobody from modtronix care about those bugs?

sparkcatcher

  • Sr. Member
  • ****
  • Posts: 31
    • View Profile
Re: Various bugs in the network stack
« Reply #2 on: December 14, 2008, 08:49:29 PM »
Vincent,

>>There is nothing to protect StackTask to overwrite the current Rx buffer when it calls MACGetHeader.

I wrote my app about a year ago and so don't remember the exact details, but I do recall in the in the documentation something about what you state.  Essentially once a header is received, the application has to retreive all of the bytes as quickly as possible to prevent what you are describing.   Just based on the documentation I just assumed that some overwrite might be possible.  If any overwrite were to happen it would seem that it would gum up the application and not cause corruption in the receiving buffers, though I'm not sure.  I haven't experenced an application "gum up" at this point.

I'm wondering if you could provide some results as to the benefit your modification has provided.

>>TCPPut and TCPPutArray don't ensure the buffer of the socket is the active MAC buffer.

I recall from the documentation that some function had to be called prior to TCPut and this function set the default buffer that TCPPut used.

Looks like you are diving into this hard core.  It gets very messy fast.  I haven't had a problem my my application receiving bad data or getting confused, I've only seen an annoying issue where sometimes the receiver stops receiving, though the application still works and the transmitter still transmits.  I've attributed this to some type of receiver corruption, though it is very interrmitent.  I'm still interested in tracking this down and wonder if you could characterize the issues that you were having that the mods you made fixed.

thanks,
sp,


VincentLascaux

  • Full Member
  • ***
  • Posts: 18
    • View Profile
Re: Various bugs in the network stack
« Reply #3 on: December 14, 2008, 11:11:35 PM »
I find it really hard to debug the card. I had some lost packet issue and dropped connection issues. I did a review of the code and fixed whatever looked suspicious without checking for sure if it was related to the bug I'm seeing.

Actually that's the main reason why I'd like to get feedback on the changes: I can't prove that there was an issue that got fixed by each of my fix. I can only reason about them and I'm looking for someone to tell me why I'm wrong and why the original modtronix code was right.

For the StackTask problem, I can't see what prevents the second pass of the loop to read another message. In other word, I can't see what prevents a single call to StackTask from reading two (or more) TCP packets. So even following the documentation, I think there can be packet loss.

For TCPPut and TCPPutArray, you have a point, I should have read the documentation. The fact that each function takes the socket you want to act on make it look like you can change socket without problem (but the documentation says it's not the case). I might keep my modification that allows me to do so because I find it less error prone, but since the documentation says it's not a bug, I guess it's not :)

I had another post where I questioned the correctness of the TCP timer. I still think it's wrong to reset the timer on reception of a message. We should reset the timeout when we send a message (waiting for the acknowledgment of the message). As it is, it the TCP connection goes silent, it times out.

sparkcatcher

  • Sr. Member
  • ****
  • Posts: 31
    • View Profile
Re: Various bugs in the network stack
« Reply #4 on: December 15, 2008, 04:20:51 PM »
Vincent:
The tcip stack has fairly long history and has gone through some major revs.  I think it orginated at some version of the Microchip stack, and was subsequently updated as Microchip updated their stack.  It's very possible because of this history it is very possible that what you think is logical might not actually be the case.

Also, because fitting the stack in what was originally a microcontroller with very limitied resource (ie, 32kROM, 1.5k RAM, 4k or 8k of receive/transmit buffer in the MAC chip) neccessitated some tradeoffs, and most likely robustness was involved in the tradeoff.  Now that many of the microcontrollers have 64k rom and 3k to 9k RAM,  it is most likely the case that the stack needs to be be re-architected, because with the increased resources, people will definitely be expecting  more robust performance.

If you are interested in doing more in-depth bug fixing you might want to start scanning the Microchip Ethernet Stack forum.  There are other stacks that have split from the main direction for various reason.  There is a stack, I seem to recall a Version 3.75, forgot the name, that is said to be fairly robust, though that depends on how one is using it.  These most likely need porting to work on Modtronix product, but you could review certain modules to look to see how different they are from the Modtronix stack.

I think most here are expecting some degree of operability in the provided stack, though certianly not the same level of resilience provided by a more resource full client adaptor.   Most aren't going to look into the stack to try to troubleshoot it.  I think doing so most likely should be a paying position somewhere, because whoever can do it, would most likely end up learning how to write a complete tcp stack. Plus, to get it right you are going to need a well equipped testing lab and reams of changelogs. :cry:

sp,

modtro2

  • Administrator
  • Hero Member
  • *****
  • Posts: 564
    • View Profile
Re: Various bugs in the network stack
« Reply #5 on: December 15, 2008, 08:44:45 PM »
Hello Vincent

Sorry for late reply! Thanks for your suggestions! I have added them to the list of things to check (possible bug fixes) for the next version of the stack. The current beta version of the stack will soon be officially released, I hope to have time to check it with your suggestions. It is very time consuming integrating changes into the stack, and ensuring it works with all possible equipment. But, all input and bug reports are highly appreciated!