Congratulations! You’ve been given the task of researching and finding a data center for your company’s IT equipment. Where do you even start?
Many of the people I talk to feel like Goldilocks. Don’t remember the story? Goldilocks breaks into the Bears’ house and tries different beds, chairs, and porridge. Two of the three were too… something. Hard or soft. Big or small. Hot or cold. She struggled until she found the bed (or chair or porridge) that was just right.
Location makes many IT decision makers feel like Goldilocks. This data center is too close, and my equipment is at risk. This data center is too far, and it will be tough to maintain my equipment. What is the location that is just right?
When making a location decision, ask yourself the following questions. Your answers will help you select an appropriate location and to determine your distance threshold.
- Will the equipment in the data center be focused on production or disaster recovery?
- Does your equipment require heavy management?
If your equipment is for disaster recovery, choose a data center at least 50 miles from your production site. I talk to many CIOs, network administrators, and IT professionals who struggle with this. It’s tough to imagine your babies (your equipment) so far from you and your attentive care, but I urge you not to be what some call a “server hugger.” If you need disaster recovery, it’s best to have geographic redundancy. By nature, disaster recovery is intended to protect you should your primary equipment meet with unforeseen circumstances. If your data center is too close, your equipment will be at risk, and you’ll have defeated the purpose of having a disaster recovery site.
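If it helps to make that 50-mile threshold concrete, here is a minimal sketch for checking a candidate site against your production location. The coordinates are made up for illustration, and the great-circle math ignores driving distance, so treat it as a rough screen rather than a site survey.

```python
# Minimal sketch: is a candidate DR site at least 50 miles from production?
# Coordinates below are hypothetical examples, not real site locations.
from math import radians, sin, cos, asin, sqrt

def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 3959 * 2 * asin(sqrt(a))  # Earth's mean radius is roughly 3,959 miles

production_site = (39.77, -86.16)    # hypothetical production location
candidate_dr_site = (39.20, -85.92)  # hypothetical data center location

d = distance_miles(*production_site, *candidate_dr_site)
print(f"{d:.0f} miles apart -> {'OK for DR' if d >= 50 else 'too close for DR'}")
```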
If your equipment is for production, choose a data center that is accessible for regular maintenance and meets your quality standards. For production servers, location is a less important criterion. It is more important to focus on choosing the highest-quality data center that meets your needs.
If your equipment requires heavy management, you may believe that a close location is just right. But with options like remote hands and managed service providers, companies can reap the benefits of geographic redundancy even for their high-maintenance equipment. Using additional support for server maintenance gives your organization flexibility and the option to focus on other high-priority items.
Selecting a data center location is not an easy task. Hopefully, after asking yourself these questions, you’ll have selected the geographic location that is just right for your data center.
Still looking for more guidance on how to choose a data center? Check out the following resources, or feel free to contact me at firstname.lastname@example.org.
- 10 Things to Consider When Choosing a Data Center
- 7 Critical Things to Look for When Touring a Data Center
- Top 10 Tips for Disaster Recovery
- Why Columbus, Indiana is a Great Data Center Location
Sunday marked one of the most important days of the year (for us, anyway). March 31, 2013 was World Backup Day, a campaign recently founded to remind computer users around the globe of the importance of backing up their data. What would you do if you lost everything on your computer tomorrow? What would your business do if it were to suffer a natural disaster or power failure?
Did you know…
- More than 60 million computers will fail worldwide in 2013.
- Companies that aren’t able to resume their operations within 10 days after a disaster are not likely to survive.
- 90% of small companies spend less than 8 hours planning/managing their continuity plans.
- Between 60% and 70% of problems that hurt business are due to internal malfunctions of hardware or software.
- 80% of businesses that suffer a major disaster go out of business within one year.
- Over 50% of businesses experienced an unforeseen interruption. The majority of the interruptions caused the business to be closed one or more days.
- Only 1 in 4 people back up their information regularly.
- 113 cell phones are lost or stolen every minute in the U.S. alone.
Companies can choose from several options when evaluating their backup strategy. One option is to use comprehensive offsite backup services, which are designed to run continuously in the background on your computer or server and provide your company with real-time data replication to a secure server within the data center. Another option to consider is colocation, which houses your IT infrastructure at a data center to maximize reliability and uptime. Colocation is maintained at Data Cave, our fully redundant Indiana data center, with on-site technicians who can manage any unforeseen crises.
In the spirit of World Backup Day 2013, we have put together some questions for you to consider while examining your own backup routine.
- Are you backing up every database that is important to you?
- Do you double-check that your backups are working? Check your backed-up data periodically to ensure each backup is complete and successful (a small verification sketch follows this list).
- Do you have multiple copies of your data? If you back up your data (photos, files, etc.) and then remove it from your primary computer, you may want to consider redundant backups.
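As a quick illustration of the second point, here is a minimal sketch, assuming a simple copy-style backup to another directory or drive (the paths are hypothetical and your backup tool may work differently): it compares checksums between the source and the backup and reports anything missing or different.

```python
# Minimal sketch: spot-check that a backup actually matches the source
# by comparing file checksums. Paths below are hypothetical.
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    """Hash a file in chunks so large files don't load into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source_dir: str, backup_dir: str) -> list[str]:
    """Return a list of files that are missing or differ in the backup."""
    problems = []
    source, backup = Path(source_dir), Path(backup_dir)
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        copy = backup / src_file.relative_to(source)
        if not copy.exists():
            problems.append(f"missing: {copy}")
        elif sha256(src_file) != sha256(copy):
            problems.append(f"differs: {copy}")
    return problems

if __name__ == "__main__":
    for issue in verify_backup("/data/photos", "/mnt/backup/photos"):
        print(issue)
```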
We challenge you to take the pledge to back up your files in celebration of World Backup Day.
We recently posted a new whitepaper to our website with tips on moving IT equipment. Be sure to check out this great resource with insightful guidelines on determining critical requirements, developing a detailed data center move and relocation project plan, creating an exhaustive inventory list, making final preparations and communicating readiness with your new data center. There are also several IT moving companies that can assist with your move to a new data center. We can even give you some company names if you’re in the Indianapolis or Louisville area.
Data Center Relocation 101
As I’ve said before, what I think makes us different from other data centers is our team. If someone is working at a “job” and they clock in and clock out and think nothing more of the industry we’re in other than the 8 hours a day, 5 days a week that they’re at work, then I don’t really want them handling my crucial data. We’re not like that. Let’s see why…
Let’s get to know David, our Director of Network Operations, a little better. David spent many years in the trenches of a Fortune 200 company, handling large amounts of data and strained-to-the-max networks. This helped him get his hands filthy with networking and computer knowledge, which built on his already robust knowledge of programming and love for everything tech.
Like many kids in 1978, David begged his father for an Atari 2600. His father resisted, preferring to wait until he could afford a home computer, and eventually bought a Commodore VIC-20. David grew bored with it and wanted to run better games, so he asked his father to “upgrade” to the Commodore 64. His father told him that if David could write a program that ran the VIC out of memory, he would. David undertook writing a Dungeons & Dragons character generator and filled up the 4.5 KB of usable RAM on the VIC within months. (It took 20 minutes to load the program from the cassette tape drive!) True to his word, David’s father bought a C64. And, of course, David became bored with the tedious nature of the character-generation program and never finished it.
So two things were immediately evident to him: he loved gaming, and he had a knack for programming. His love for both blossomed from there, and he began to follow that path. He went to Purdue University to pursue a mechanical engineering degree. Purdue at the time had a Unix lab, and David would spend hours working and learning languages such as Fortran, Pascal, and C. There were several PDP-11s for the engineering department to use, and David found an administrator there whom he was able to shadow and learn more about programming from.
David did a brief stint out of college working for a company doing welding, but shortly after found his place in the M.E. world. There is a bit of checkered-ness in his past, though: David thought that Windows NT was going to be all the rage and completely wipe out the competition for years to come. We’ll forgive him for that, especially because he has been a Linux evangelist ever since and has done much work with Red Hat, SuSE, Gentoo, and Ubuntu.
He has a starship’s fleet worth of computers at home (pun intended) and has recently signed up for the Steam on Linux beta. You can also find him (maybe a little too often :)) on Battlefield 3 for Xbox (gasp! Still PC derivative – it counts!) under his gamertag TheRealDunkirk.
The guy lives and breathes technology, and that’s part of what makes the team at Data Cave so awesome and unique in the data center culture; we LOVE this stuff!
Don’t forget to check out the blog on Caleb.
Did you know that McDonald’s feeds more than 46 million people every day? That’s more than the population of Spain! Additionally, McDonald’s represents 43% of the United States fast food market. One would think that a company like McDonald’s would practice appropriate server maintenance. We were horrified when a friend of Data Cave sent us this picture they snapped through the window of a local McDonald’s drive-thru.
So let’s play a game. What’s wrong with this picture?
1. Kitchens and Technology are a Recipe for Disaster
This McDonald’s chose to locate their servers near the kitchen. It doesn’t take a data center expert to see that this is not an effective strategy. Consider your personal cell phone, for example: SquareTrade’s research found that 21% of all iPhone accidents occur in the kitchen. An iPhone is a critical device for many, but most of its vital information is backed up to iCloud, and while replacing an iPhone isn’t cheap, the price is nowhere near as prohibitive as purchasing and implementing a new server. Being near food and drink can only end in terrible technology tragedies.
2. Exposure to the Elements
Not only did this McDonald’s choose to place their servers near the kitchen, they also exposed them to the elements by putting them in the drive-thru area. It is estimated that an average McDonald’s serves 1,584 customers daily. If half of those customers come through the drive-thru and the window is open for an average of 10 seconds per customer, those servers are exposed to outside conditions for two hours and twelve minutes each day. This takes the idea of an uncontrolled environment to the extreme.
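For the curious, the rough arithmetic behind that estimate: 1,584 customers ÷ 2 = 792 drive-thru customers, and 792 × 10 seconds = 7,920 seconds, or about 132 minutes – roughly two hours and twelve minutes of open-window exposure per day.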
3. Crossed Wires
Messy wires aren’t just aesthetically unpleasing; they are also dangerous. Tangled wires pose a fire threat (and we are willing to bet that McDonald’s didn’t employ a fire suppression system exclusively for its servers). Because of this cabling, it doesn’t even appear that they can shut the door (see #4). In fact, the picture below details the challenges of having messy wires.
4. An Open Door Policy
Open door policies are great for dealing with employees, but they are less than optimal when it comes to technology. Having an open door to their servers poses many security risks. Damage could be done, both intentionally and unintentionally. McDonald’s has employed one in every eight American workers, which is indicative of high employee turnover. A disgruntled employee could easily wreak havoc on McDonald’s because the technology is so readily accessible. Additionally, accidents happen; by leaving the door open, the chance of accidents increases.
5. The Data Closet
Finally, it goes without saying that we encourage all organizations to protect their valuable technology (especially offsite). McDonald’s has its main data center in Dallas, but its restaurants obviously still need local equipment. There are many risks that come with housing an internal data center, especially one in a closet with no ventilation or cooling. If you want cost savings and increased protection, it only makes sense to outsource your data center.
McDonald’s, we urge you to clean up your technology act! It is inevitable that something will happen, and you will suffer!
If you are a fan of disaster films, all you have to do is turn on the national news and grab a bowl of popcorn! The Southwest has hundreds of square miles on fire in Colorado, Utah, Arizona, and California; the Midwest is experiencing the worst drought in decades, all-time record temperatures, violent storms, and tornadoes; and the Southeast is undergoing major flooding, all combining to leave thousands of businesses and private homes without power for extended periods of time. Wow!
Luckily, some of the fires have been contained, and while it isn’t enough, we’ve finally seen some rain. But all that disaster coverage got me thinking about how I would personally deal with a disaster. Now, keep in mind that just about anything that can interrupt power or operations is categorized as a disaster, but I am not referring to a blown transformer that takes out power for six hours. I am talking about the kind of disaster that the recent June 29th storms brought to the Northeast US: high winds, hail, rain, trees down, widespread power outages, and extremely limited travel due to blocked roads. People were without power for up to seven days in some areas. Seven days in 100°F+ temperatures. Again, wow!
If a massive storm hit Indiana, I would have the enormous advantage of living only two miles from where I work at Data Cave. A typical data center has enough fuel on site to maintain operations for two days or so, and then it is dependent on fuel deliveries, which can be a very dicey proposition in disaster areas with flood waters, downed trees, and restricted travel. Not to mention that hospitals and other such locations take delivery priority over a data center. Data Cave, on the other hand, keeps sufficient fuel on site to maintain operations for about three weeks – a full 21 days before needing a fuel delivery.
I know where I will be heading when the storm hits, and it’s a comforting thought.
We all know that backups are important. But beyond just having backups, having a comprehensive validation and restoration strategy is paramount as well.
Case in point: Toy Story 2, which was accidentally deleted by an administrator’s errant command. But beyond that, the admins found that the backups were bad. Think about that for a second – they went to their backups and found out they were worthless. Nobody had thought to check that stuff ahead of time. Only when they really needed the backups did they realize they weren’t going to work.
Luckily, someone had archived a backup copy on a computer at home, which ultimately kept the movie from disappearing forever. But the lesson is clear: the Popeil method of “set it and forget it” isn’t sufficient for computer data backups.
Periodic restoration and checking of backups is essential to make sure you know you’ll have what you need when you need it.
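To make that periodic check concrete, here is a minimal sketch, assuming your backups land as ordinary tar.gz archives (your backup tool and the paths will differ): it restores the latest archive into a scratch directory and sanity-checks that files actually come back.

```python
# Minimal sketch: periodically prove that a backup archive can be restored.
# The archive path and expected file count are hypothetical.
import tarfile
import tempfile
from pathlib import Path

def test_restore(archive_path: str, min_expected_files: int = 1) -> bool:
    """Restore the archive into a temporary directory and sanity-check the result."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(scratch)
        restored = [p for p in Path(scratch).rglob("*") if p.is_file()]
        return len(restored) >= min_expected_files

if __name__ == "__main__":
    ok = test_restore("/backups/nightly/latest.tar.gz", min_expected_files=100)
    print("restore test passed" if ok else "RESTORE TEST FAILED - investigate now")
```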
We at Data Cave think a large portion of what sets us apart from other data centers is our people. We truly love what we do, which we’ve talked about before. We thought it would be nice for you to get to know us a little better.
This is the first in a series of blogs that will let you into our lives a little more so you can see what we’ve worked on in the past, what we’re doing right now, and what we’re focused on for the future.
Let’s start at the top with our president, Caleb Tennis. Caleb has been interested in computers and everything surrounding them for almost his entire life. When he was only three, Caleb’s family got a Tandy TRS-80 and he fell “in like” (because who loves a Tandy, really?). At five years young, Caleb was hacking with BASIC on MS-DOS, and by 10 he had written his first commercial program, which was sold to the online service Delphi.
Throughout his life, Caleb has had a huge interest in electronics; it fascinated him and became one of his passions. During high school, he took four years of electronics classes and also did electronics repair work, along with computer and network support, for his high school. At the time, the school corporation didn’t have the resources to hire someone who could, or would, do that sort of thing. Caleb fit the job perfectly.
He went on to study computer science at the Rose-Hulman Institute of Technology but quickly changed to an electrical engineering major when he decided it was quite a bit more interesting to him. He felt he already knew a lot about all things computer-related but lacked an understanding of the deeper fundamentals of engineering. Ever the knowledge seeker and one for a challenge, Caleb immersed himself in the major.
After graduating from college, Caleb landed a full-time position at Analytical Engineering, Inc. (AEI), where he had interned in 1999 and 2000 during his college career. AEI is an engine testing facility here in Columbus, Indiana. After recognizing the company’s need for it, Caleb began developing a complex network control system for running diesel engine tests and analyzing performance data. The program is still the core of the system they use today.
Caleb decided he wasn’t busy enough and, while working full-time and teaching part-time, spent two and a half years pursuing his master’s degree in Electrical Engineering at Purdue University.
Over the years, he has worked in a variety of roles, including teaching Linux classes at Ivy Tech Community College. He has also written two books: Rapid GUI Development with QtRuby, which shows how to use the powerful Qt3 library to create cross-platform GUI applications for Linux and OS X in Ruby, and A Peek at Computer Electronics, in which “from basic electronics to advanced computer hardware, you’ll learn the magic behind the gear that makes it all run.” He was also a Gentoo developer who helped maintain hundreds of packages within the Gentoo Portage tree – no easy task.
Caleb also keeps himself busy with speaking appearances and by hacking on various open source projects, which you can see and get involved with on GitHub. He also has a passion for flying and received his pilot’s license in 2010. He has been flying his single-engine plane around the counties of Indiana, which he blogs about. In 2012, he became a member of the local airport aviation board.
Caleb is married with a son and most recently welcomed a baby girl in April 2012.
Amazon Web Services had an outage last week that again got the media into a frenzy. There is some good analysis at Hacker News.
Rather than point fingers, I wanted to spend a minute writing about the particular failure mode that AWS experienced. From the RCA:
“Unfortunately, one of the breakers on this particular back-up power distribution circuit was incorrectly configured to open at too low a power threshold and opened when the load transferred to this circuit.”
The overall gist here is that a primary generator stopped due to overheating, and when the secondary generator had to carry the load, the breaker tripped, causing a total loss of power. This, in particular, is what I want to go into more detail on.
From the time the power enters the data center facility until it gets to the server, it travels through a number of circuit breakers. Entering the facility, it likely goes through a large main breaker, much larger than anything you’d have in your home, something like this:
From there it may be distributed to subpanels, with progressively smaller breakers, until it gets to, say, a standard 20 or 30 amp breaker that connects to your equipment rack. These breakers each have one main purpose: to protect equipment and people. If too much electricity attempts to flow, the breaker is designed to trip. Plain and simple.
A big issue, however, comes with what’s known as “breaker coordination.” If you have a chain of circuit breakers, the idea is that you want the one closest to the “fault” to be the one that trips. If I accidentally overpower my computer rack, then we want the circuit breaker closest to my rack, the 20 or 30 amp breaker noted above, to trip. We most certainly DON’T want the main circuit breaker for the building to trip. In fact, that’s likely impossible, since it’s rated for a much higher current than the 20 or 30 amp breaker in my cabinet, so it would never trip, even though there was a problem.
However, these big circuit breakers have a number of custom settings that allow you to change the dynamics of their “trip curves.” That is, you can set values for instantaneous trip, long-term current trip values, and slopes of amps versus time for tripping. Why would you want to do this? For two reasons: one, so you can precisely ensure that the larger breakers don’t trip before the smaller ones do, and two, to ensure your current trip values make sense. For example, if you have a 4000A breaker but are only ever going to be able to pull 3000A of current due to the way your systems are designed, then you can reduce the trip setting of the breaker down to 3200A so it trips at a more reasonable level. It’s the same reason you wouldn’t put a 100 amp breaker on a 20 amp wall outlet in your living room – you want the breaker size to be as close to the actual electrical load as possible.
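To make the coordination idea concrete, here is a minimal sketch with hypothetical breaker values. Real coordination studies compare full time-current curves, not just single thresholds, but the basic check is the same: every breaker upstream should be set to trip at a higher current than the one below it.

```python
# Minimal sketch: a simplified coordination check over a chain of breakers,
# from the rack back toward the utility feed. Trip settings are hypothetical.
breaker_chain = [
    ("rack breaker",        30),   # amps, closest to the equipment
    ("subpanel breaker",   225),
    ("distribution main", 1200),
    ("building main",     3200),   # e.g. a 4000A frame derated toward the expected load
]

def check_coordination(chain):
    """Flag any breaker whose trip setting is not above the breaker downstream of it."""
    ok = True
    for (down_name, down_amps), (up_name, up_amps) in zip(chain, chain[1:]):
        if up_amps <= down_amps:
            print(f"coordination problem: {up_name} ({up_amps}A) "
                  f"could trip before or with {down_name} ({down_amps}A)")
            ok = False
    return ok

print("coordinated" if check_coordination(breaker_chain) else "needs re-coordination")
```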
In addition, there is a programmable instantaneous trip value, and it’s usually set much higher than the rating of the breaker, perhaps 2-3x higher. When transitioning between power systems (from grid power to generator power, for example), as the current turns off momentarily and then back on again, there is a quick inrush of current that may very rapidly exceed the normal steady-state current value for a few moments. We need to make sure our breaker settings account for this and don’t prematurely trip the breaker.
In Amazon’s case, I don’t know exactly what happened, but for this example I will speculate. The timing from the RCA indicates that secondary generator power was provided at 8:53pm and that the breaker tripped at 8:57pm, so we can likely conclude it was not an instantaneous trip; otherwise it would have happened within moments.
My best guess is that Amazon’s normal running setup relies on multiple power feeds, and potentially multiple generators when the main power feeds go out. In the normal scenario, the load to the end servers is balanced across a number of feeds. In this scenario, with the failures, the number of power paths to the servers was reduced, which means that the amount of current through the remaining path(s) increased. In this case, it increased enough to trip a breaker whose threshold had been configured too low.
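Here is that failure mode with hypothetical numbers (these are not Amazon’s actual values), just to show how a breaker can trip well below its frame rating when a misconfigured setting meets a redistributed load.

```python
# Hypothetical numbers only: a load shared across two feeds shifts entirely
# onto one feed when the other fails, and a too-low trip setting opens the breaker.
total_load_amps = 3000
feeds_normal, feeds_after_failure = 2, 1

per_feed_normal = total_load_amps / feeds_normal         # 1500 A per feed
per_feed_after = total_load_amps / feeds_after_failure   # 3000 A on the survivor

breaker_frame_rating = 4000    # what the breaker could safely carry
misconfigured_setting = 2500   # the (too low) long-time trip setting

print(f"normal: {per_feed_normal:.0f} A per feed")
print(f"after failure: {per_feed_after:.0f} A on the remaining feed")
print("breaker trips" if per_feed_after > misconfigured_setting else "breaker holds",
      f"(setting {misconfigured_setting} A, frame {breaker_frame_rating} A)")
```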
Could this have been caught earlier? Absolutely. When new electrical distribution systems are added to the fold, a re-coordination of the entire system should be done. More thorough testing methodologies probably could have caught the issue sooner, too. However, a test might have caused the same outage by tripping the same breaker – which may be why Amazon doesn’t include it in its test schedule. In most colocation data centers, customers are notified of test windows in case something like this were to happen (which, in general, is exactly what such testing is designed to catch), but since AWS doesn’t notify its end customers of its test windows, there’s no added customer value in testing some of these scenarios.
I think the biggest takeaway is that you need to ensure your data center provider has a solid grasp on what the testing schedule looks like, and the reasons why the tests are conducted. If that’s outsourced, or automated, it may be a sign that the operator doesn’t have the technical resources to understand the overall electrical system – and respond to issues quickly.
The number of Cisco and Microsoft certified guys on staff doesn’t matter a bit if nobody can troubleshoot why the power goes out.
That’s why we keep technically knowledgeable staff onsite to maintain and test our equipment. We don’t have to wait on someone else to troubleshoot for us. We are data center people; this is what we do, and it’s why we’ve experienced 100% uptime.
Data Cave is a privately owned and operated Tier IV Midwest data center located in Columbus, Indiana convenient to Indianapolis, Louisville and Cincinnati. Please contact us for more information at 866-514-2283.