The Recent Amazon Outage

July 10, 2012 by David Krider

Amazon’s Elastic Compute Cloud (EC2) recently suffered an outage. In summary, Amazon’s explanation was that power to the facility failed, their generators failed to provide “stable” power, and the energy stored in their UPSs was exhausted before power could be restored. It took only about 20 minutes to fix the generators and get power back online, but as anyone who has supported large distributed systems knows, recovery is not that easy. Getting services back online involves more than just restoring power, and the outage caused serious interruptions for several popular web sites, such as Pinterest and Instagram. Netflix, for instance, was down for three hours.

The Hacker News community discussed the event at length. There are a couple of takeaways that I would like to point out.

First, it’s not common for a data center, even a large one, to have fully redundant power equipment. Data Cave does. We have two power feeds and two generators, all of which feed into electrical switchgear that can select either utility power or generator power for either the “A” or “B” side. In addition, each side is sized to carry the entire demand alone, should the need arise, and the switchgear can feed both sides from any one of the four sources. Each side then feeds its own dedicated flywheel UPS system, which in turn supplies the A/B PDUs, which break power down by rack through breakers.

The main weakness in this setup is customer equipment. Most “enterprise”-level equipment has redundant power supplies, but they are not always both connected. It’s also critical that the load is split evenly across the A and B sides of power, and that no more than 50% of the breaker capacity is used on either side. That way, if one side fails and the full load falls to the other, no breakers trip and no power is lost. At Data Cave, we help monitor this situation for our customers.
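The 50% rule comes down to simple arithmetic, which can be sketched in a few lines of Python. This is only an illustration, not Data Cave’s actual monitoring tooling: the function name, the amp figures, and the assumption that both sides use breakers of the same rating are all hypothetical.

```python
def failover_check(load_a_amps, load_b_amps, breaker_amps):
    """Check a rack's A/B power loading against the 50% rule.

    All names and figures here are illustrative assumptions; both sides
    are assumed to have breakers of the same rating.
    """
    half = breaker_amps / 2
    # Each side should stay at or below 50% of its breaker capacity.
    within_half = load_a_amps <= half and load_b_amps <= half
    # If one side fails, the surviving side carries the combined load.
    failover_load = load_a_amps + load_b_amps
    survives = failover_load <= breaker_amps
    return {
        "within_half": within_half,
        "failover_load_amps": failover_load,
        "survives_failover": survives,
    }


# Hypothetical rack on 20 A breakers: 8 A on side A, 7 A on side B.
print(failover_check(8, 7, 20))
# Hypothetical overloaded rack: 12 A on each side of 20 A breakers.
print(failover_check(12, 12, 20))
```

In the second case neither breaker is near its limit during normal operation, yet the combined 24 A would exceed a single 20 A breaker the moment one side dropped out. That is exactly the situation the 50% rule is meant to prevent.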

Second, it’s not common for data centers, even large ones, to have people on staff who really understand generators. Data Cave does. Our affiliate company, located right next door, is a world leader in high-end, large-displacement diesel engine testing. As a result, we have at our disposal experts in both large diesel engines and the large electrical motors they drive. If something goes wrong with either one, someone can fix it immediately. We don’t need to call a service company to come fix “their” equipment; these are not “black boxes” to us.

Data Cave was designed from the ground up to be a world-class data center, and the implementation of our systems could serve as a textbook of best practices. Our staff includes subject matter experts in everything related to building and running a facility like this. If you’re in the market for a colocation or disaster recovery site, you owe it to yourself to come see the features of Data Cave in person.

Redundancy and Uptime

January 5, 2010 by Caleb Tennis

Is your data center maximally redundant?

October Updates

October 30, 2009

Here’s a roundup of some of the work that has been completed in October.

Additional UPS Capacity

September 15, 2009

A few weeks ago, we ordered additional flywheels for our UPS systems.

Flywheel UPSs

July 21, 2009

In late June, the UPSs were set into place, wired up, and installed.

Power Distribution Units

July 14, 2009

Last week, two power distribution units (PDUs) were installed in the data center.

Electrical Switch Gear

April 29, 2009

Last week, we moved the electrical switch gear for the first quadrant of the building into place.

Block and the Office Area

April 2, 2009

Block work continues on the east and west interior customer hallway walls.

Electrical Service

March 5, 2009

Over the past few days, we’ve been working to bring the electrical service for Data Cave online.

Electrical Gear

January 30, 2009

The electrical gear for quadrant one arrived today.