Sunday 1 March 2015

Why should you expect failures in your data centre?

Not only should you expect failures but you should expect FREQUENT failures.

Suppose you estimate that each server in your data centre will fail once every ten years. That sounds pretty good, right?

Failure rate per server = once / 10 years = once / 120 months

But... if you have 120 servers, then on average you should expect a failure every month!

According to this article, in 2010 Facebook was running at least 60,000 servers across its data centres. 

If each of these 60,000 servers is expected to fail once every 10 years, then at that time Facebook would have expected a server failure about every hour and a half (120 months / 60,000 servers = 0.002 months ~= 1.46 hr)

Eeeek. 
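
The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the function name and the 365-day year are my own assumptions, and it treats every server as failing independently at the same average rate.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap years


def mean_hours_between_failures(num_servers, years_per_failure=10.0):
    """Average hours between failures somewhere in the fleet.

    Assumes each server fails once every `years_per_failure` years on
    average, independently of the others (hypothetical helper).
    """
    fleet_failures_per_year = num_servers / years_per_failure
    return HOURS_PER_YEAR / fleet_failures_per_year


# 120 servers: roughly one failure a month
print(mean_hours_between_failures(120) / 24, "days")

# 60,000 servers: roughly one failure every hour and a half
print(mean_hours_between_failures(60_000), "hours")
```

Plug in your own server count and failure estimate to see how often you should expect something to break.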
