To Err is Human; to Forgive is Against Company Policy
In the data center world, we often discuss redundancy and preventative maintenance. Typically, conversations about redundancy revolve around natural disasters, power outages, and other uncontrollable forces. We appear as the noble data center, which remains faultless at all turns.
But research demonstrates that redundancy often protects data centers and their customers from human errors. In fact, a study by the Uptime Institute shows that human error causes roughly 70% of data center issues. An outage at Hosting.com that lasted between eleven minutes and five hours was due to human error. According to their CEO Art Zeile, “This was not a failure of any critical power system or backup power system and is entirely a result of human error.” We recognize that human error often leads to downtime, so we have taken many precautionary measures to mitigate these risks, including regular maintenance testing, documentation, and strong communication between team members and customers.
It’s important to remember that human errors occur in every industry—not only in data centers. In fact, consider these famous debacles in which human error led to a disaster.
- In 1979, a secondary feed water pump was manually disabled and valves closed during preventative maintenance at a nuclear plant. And thus, America was introduced to the Three Mile Island nuclear meltdown.
- A Midwestern nurse accidentally threw out a kidney, which ended the potential transplant.
- The US Defense Department reported that the crash of a Marine Corps MV-22 Osprey aircraft this April was due to human error.
- In Lebanon, Ohio, officials cite human error as the cause of a 911 outage that lasted 14 hours this past June. (Does this remind you of Verizon and Virginia this year?)
Human error is ubiquitous, so it is vital to be aware of your organization’s areas of weakness. Creating systems to protect your most crucial functions will help mitigate the risks to them. Effective communication also prevents many of these errors. Communicating and controlling your risks will help prevent your own Three Mile Island nuclear meltdown.