Data Center Horror Stories
With Halloween coming up in just a few days, this week will most likely be dominated by trick-or-treaters, candy, and cheesy costumes. While I won’t get into things like the best decorations or most unique Halloween costumes this year (we don’t get many trick-or-treaters at Data Cave), I do have a few scary stories to share.
For Data Cave, as for any other data center, our number one enemy is downtime. All of the equipment, security policies, and infrastructure we have in place are centered on ensuring 100% uptime, all the time. Any level of downtime is unacceptable, but when a data center experiences significant downtime from an outage, it’s downright scary.
With that, I give you the scariest data center “horror stories” of 2013:
1) Vampires suck all power from Sears data center
Shortly after the 2012 holiday season, Sears experienced not one, but two major outages at its primary data center in Michigan. It began when all four of its UPS (uninterruptible power supply) units failed nearly simultaneously. In addition, several hours passed before staff could start up the backup generators to restore power. This caused the company’s e-commerce website and all of its internal systems to go down for several hours (not good for its online sales efforts).
Later in the month, the company experienced virtually the same outage at the data center, only this time one of the backup generators failed in addition to the UPS units. This led to several more hours of downtime and required Sears to rent a generator from an outside vendor (to the tune of $13,500 per week). Sears has since filed lawsuits against several of its equipment maintenance providers for lost sales as well as the cost to restore service at the data center.
The fact that so many different pieces of their equipment failed at virtually the same time (and at separate times in January) seems to indicate that the equipment either wasn’t tested regularly, or wasn’t tested adequately. Since a data center’s goal is to ensure a maximum level of uptime, regularly testing the backup power equipment such as the UPS units and backup generators is crucial.
2) Double Danger: Multiple switch failures slash connectivity to websites
In August of this year, a data center in Utah that houses web servers for several of the country’s largest web hosting companies experienced simultaneous failures on two of its core network switches. This led to several hours of downtime and rendered the websites hosted with each provider (thousands of them) completely inaccessible. By the time service was fully restored, thousands of hosting customers had complained that they lost money while their websites were down.
While the data center staff worked to restore service as quickly as they could, this huge outage reinforces the need for data center providers to take every measure necessary to not only test all of their equipment regularly, but to also create contingency plans that will ensure uptime, even in the event of a hardware failure.
3) Power outage drops data center into darkness
In early September, the state of New Jersey suffered an outage at its primary data center. Unlike the others, though, this outage was not caused by an equipment failure inside the facility, but by a loss of power to the building itself. The power provider, Public Service Electric and Gas Company, had a temporary outage affecting the data center, and this resulted in several hours of downtime for state-run services, including all of the state’s websites and motor vehicle services at each state license branch.
The real issue with this outage is that the data center appears to have been completely dependent on outside power, with very little (if any) backup power in place. When utility power to the data center was interrupted, the facility lost power completely. Since the goal for any data center is ensuring 100% uptime, this is a HUGE issue that makes that goal virtually impossible to achieve, since the data center can’t control the company providing it with power. What the data center can control, though, is whether a sufficient level of backup power is in place should an outage occur. This should include redundancies like UPS units and backup generators (ideally not the same equipment that Sears was using, though).
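To put rough numbers on why backup power matters, here is a minimal sketch of the arithmetic in Python. The availability figures are hypothetical and chosen only for illustration (they are not measurements from any of the facilities above), and the function names are my own. The calculation also assumes the power sources fail independently of one another, which the Sears outages suggest is not a safe assumption when equipment is poorly maintained.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def availability_from_downtime(downtime_hours: float) -> float:
    """Fraction of a year a facility was up, given its total hours of downtime."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def parallel_availability(sources: list[float]) -> float:
    """Availability of redundant power sources, assuming independent failures.

    Power is lost only when every source is down at the same time, so the
    combined unavailability is the product of the individual unavailabilities.
    """
    return 1.0 - prod(1.0 - a for a in sources)


# Hypothetical figures, for illustration only.
utility_only = 0.999   # utility feed with no backup
generator = 0.99       # chance the backup generator works when called on

combined = parallel_availability([utility_only, generator])

print(f"Utility only:   {(1 - utility_only) * HOURS_PER_YEAR:.2f} hours down per year")
print(f"With generator: {(1 - combined) * HOURS_PER_YEAR:.2f} hours down per year")
```

Under these made-up numbers, adding a second power source cuts expected downtime by about two orders of magnitude. That benefit only holds if the backup actually starts when it is needed, which is why regular testing matters as much as the equipment itself.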
While these horror stories may not be scary in a Freddy Krueger kind of way, they embody the one thing that can be horrifying to data center operators and customers: downtime. Unlike most things that go bump in the night though, downtime is a risk that can either be avoided completely or mitigated, with the right measures, equipment, and policies in place.
Do you have your own technology “horror story” that you’d like to share? If so, post a comment below or contact us. We’d love to hear it!