Redundancy and power outages

Scott Beale reports that many Web 2.0 websites were affected by today’s power outage at 365 Main in San Francisco. While unfortunate, as a systems guy I have to assume things like this are going to happen. They shouldn’t happen, but they can and they will. At the data center level, there should be multiple levels of redundancy that minimize the probability of a power outage. Things such as multiple power circuits, redundant UPSes, and generators are standard. For a complete power outage to occur there should have to be multiple simultaneous system failures. I looked for a statement from 365 Main as to what the problem was, but couldn’t find one.
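To put a rough number on the "multiple simultaneous failures" point, here is a minimal sketch. The per-layer failure probabilities are made-up illustrative values, not figures from 365 Main, and the calculation assumes the layers fail independently, which is optimistic because real failures are often correlated.

```python
from math import prod


def outage_probability(layer_failure_probs):
    """Chance that every redundant layer fails at once, assuming independence."""
    return prod(layer_failure_probs)


# Hypothetical failure probabilities for each layer (illustrative only).
layers = {
    "utility_power": 0.01,
    "ups": 0.001,
    "generator": 0.01,
}

p = outage_probability(layers.values())
print(f"Chance of a complete outage under these assumptions: {p:.1e}")
```

The product shrinks quickly as layers are added, but only while the independence assumption holds. A single shared component, such as a transfer switch, can undo most of the benefit.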

The system architecture behind WordPress.com and Akismet is designed to take entire data center failures into account. For WordPress.com, we serve live content in real-time from 3 data centers (33% from each data center) and in the event of a data center failure, traffic is automatically re-routed to the 2 remaining data centers. Syncing content in real-time between multiple data centers has not been easy, but at times like this I am sure that we made the right decision.
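The traffic split described above can be sketched as follows. This is only an illustration of the idea, not how WordPress.com actually routes requests. The data center names and the `route_shares` helper are hypothetical.

```python
def route_shares(datacenters, healthy):
    """Split traffic evenly across the data centers that are currently healthy."""
    live = [dc for dc in datacenters if dc in healthy]
    if not live:
        raise RuntimeError("no healthy data centers left to serve traffic")
    share = 1.0 / len(live)
    # Failed data centers get no traffic; the rest share it equally.
    return {dc: (share if dc in healthy else 0.0) for dc in datacenters}


# Hypothetical data center names.
datacenters = ["dc1", "dc2", "dc3"]

# Normal operation: each data center serves about a third of the traffic.
print(route_shares(datacenters, healthy={"dc1", "dc2", "dc3"}))

# One data center fails: the remaining two split the traffic between them.
print(route_shares(datacenters, healthy={"dc1", "dc3"}))
```

Each surviving data center has to absorb half of the total load after a failure, so each one needs spare capacity well above its normal one-third share.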

Author: Barry

21 thoughts on “Redundancy and power outages”

  1. I just “outsourced” Cernio’s blog to Automattic last night. TypePad was one of my alternatives, but I ruled them out for various reasons. Today, I see that I chose wisely. 🙂

    (Of course, our San Francisco facility stayed online, but our utility power wasn’t affected by the explosion so our datacenter wasn’t really put to the test today. It just looked good by comparison.)

    Graham

  2. I read 365 Main’s explanation of what happened, and they imply that in the event of a power outage, they switch instantly to back-up generator power. This is, of course, not possible unless the generator is running, and even then it’s a problem. They have banks of batteries, I have to assume.

    My colo here in Seattle, digital forest, has two redundant and independent battery backups that can run the facility until the diesel (the size of a semi truck) fires up.

    I have nearly 600 days of uptime on one of my colo servers, and that’s only because I had to replace a drive 600 days ago. It had 200 days before that, with downtime only due to a facility move that digital forest made to a swank new location.

    (I have no financial interest in digital forest. I just dig ’em.)

  3. 365 main’s power and colo power

    The power redundancy at 365 Main is based on four or six flywheel UPS systems connected directly to a generator. This setup is really quite nice. It is not clear what happened, and I look forward to finding out. The fact is that not all of the colo went down. I have a computer located there and I experienced zero downtime through this.

    I think the more general problem for colo centers is that power requirements are going up: more CPU density and hotter CPUs. They may not be engineered to handle the load.

    Making a data center stay up is also hard. This is not the first time 365 Main went down. (The last time was due to the failure of a fire sensor, which caused an automated shutdown of the power.) I have also been colocated in other facilities that have had full outages (generator failures) or partial outages (power distribution line failures). This stuff is hard. It may be that 365 Main is not as good as others, which is still to be seen, but expecting 100% uptime in one data center is a bad gamble.

  4. The outages were a problem for a lot of people, especially if you use TypePad and a few other places. It didn’t affect me any, though. Thank goodness. Thinking about getting a dedicated server myself.

  5. I feel that without multiple simultaneous system failures, it would not happen. Protective measures are taken so that it never happens. It may happen once in a blue moon, but I feel it’s rare.
