Downtime

Moderator: Officers

Oz
Site Admin

Post by Oz »

Mar 02, 2004 (2:30AM EST) - Beginning on Saturday, Feb. 28th, and continuing through Monday, March 1st, the ServerFly network experienced extended periods of downtime. The outages appear to have been caused by multiple factors across several facilities.

During the outages, extensive testing, maintenance, and replacement of gear were performed in our Scranton facility, in our Philadelphia facility, and in our fiber provider's facilities. Much of this maintenance is only possible during a total outage such as this one. Potentially problematic equipment and cabling were found in all facilities, including the apparent cause of the packet loss issues of the last 2 months (my tests show 0% packet loss since service was restored this evening).

In summary, our fiber provider had gear that needed to be swapped out due to errors, and we had one piece of network gear and 2 fiber cables that also showed errors. It is essentially impossible to pinpoint exactly which piece caused the issue. Most likely, any one of these by itself would have caused issues that are difficult to track and diagnose (such as low-level packet loss), but combined they seem to have triggered this weekend's outages.

Due to customer feedback, ServerFly Management realizes that many of our customers were not satisfied with the perceived handling of this crisis. We will be meeting to seriously evaluate and discuss the outage and our response to it, as well as to discuss possible solutions for avoiding further problems of this magnitude.

All services should now be fully restored network-wide, including network connectivity, software, and servers. If you are still experiencing any issues, please contact our support team through our Technical Helpdesk, which is available 24x7.

Your support and encouragement during the outage have been much appreciated. We appreciate your past business, and thank you for sticking with us during these issues. We look forward to providing a higher level of service and support to you now that the repairs have been completed.

We apologize for the significant inconvenience, and thank our customers for their patience.

Warmest Regards,
ServerFly Management Team


--------------------------------------------------------
Mar 01, 2004 (7:30PM EST) - We will have an official statement on the outages and repairs later this evening.

Please check back later this evening.

Thank you all for your support throughout this stressful ordeal. Most of you have been very patient and understanding -- and we thank you so much for that.


--------------------------------------------------------
Mar 01, 2004 - CTSI (our telco) explained that the earlier network outage was caused by faulty GigE cards on their equipment in one of their POPs, and that they had to wait for the replacement hardware to arrive.

Finer details will be added here once we have a full report from the telco. We sincerely apologize for the outage.


--------------------------------------------------------
On Feb 29, 2004, we experienced yet another outage on our fiber path between Scranton and Philadelphia.

At this point, we are back up and running. The problem was discovered to be a multi-level issue within the provider's network. More details on the exact cause should be available tomorrow. Please check back on this thread then.


--------------------------------------------------------
On Feb 26, 2004, at approximately 11:39AM EST, ServerFly lost transport connectivity between our Scranton data center and Philadelphia.

While both sides initially reported that traffic was flowing normally, the receiving side was seeing garbage instead of properly formatted frames. At approximately 12:15PM EST, equipment on both sides of the GigE transport circuit stopped receiving packets.

The telco dispatched a repair tech to our Scranton facility, who arrived at our data center for repairs at 1:07PM EST. At approximately 1:31PM EST, connectivity was fully restored.

We apologize for any inconvenience caused.