WordPress.com DDoS Details

As you may have heard, on March 3rd and into the 4th, 2011, WordPress.com was targeted by a rather large Distributed Denial of Service Attack. I am part of the systems and infrastructure team at Automattic and it is our team’s responsibility to a) mitigate the attack, b) communicate status updates and details of the attack, and c) figure out how to better protect ourselves in the future.  We are still working on the third part, but I wanted to share some details here.

One of our hosting partners, Peer1, provided us these InMon graphs to help illustrate the timeline. What we saw was not one single attack, but 6 separate attacks beginning at 2:10AM PST on March 3rd. All of these attacks were directed at a single site hosted on WordPress.com’s servers. The first graph shows the size of the attack in bits per second (bandwidth), and the second graph shows packets per second. The different colors represent source IP ranges.

The first 5 attacks caused minimal disruption to our infrastructure because they were smaller in size and shorter in duration. The largest attack began at 9:20AM PST and was mostly blocked by 10:20AM PST. The attacks were TCP floods directed at port 80 of our load balancers. These types of attacks try to fill the network links and overwhelm network routers, switches, and servers  with “junk” packets which prevents legitimate requests from getting through.

The last TCP flood (the largest one on the graph) saturated the links of some of our providers and overwhelmed the core network routers in one of our data centers. In order to block the attack effectively, we had to work directly with our hosting partners and their Tier 1 bandwidth providers to filter the attacks upstream. This process took an hour or two.

Once the last attack was mitigated at around 10:20AM PST, we saw a lull in activity.  On March 4th around 3AM PST, the attackers switched tactics. Rather than a TCP flood, they switched to a HTTP resource consumption attack.  Enlisting a bot-net consisting of thousands of compromised PCs, they made many thousands of simultaneous HTTP requests in an attempt to overwhelm our servers.  The source IPs were completely different than the previous attacks, but mostly still from China.  Fortunately for us, the WordPress.com grid harnesses over 3,600 CPU cores in our web tier alone, so we were able to quickly mitigate this attack and identify the target.

We see denial of service attacks every day on WordPress.com and 99.9% of them have no user impact. This type of attack made it difficult to initially determine the target since the incoming DDoS traffic did not have any identifying information contained in the packets.  WordPress.com hosts over 18 million sites, so finding the needle in the haystack is a challenge. This attack was large, in the 4-6Gbit range, but not the largest we have seen.  For example, in 2008, we experienced a DDoS in the 8Gbit/sec range.

While it is true that some attacks are politically motivated, contrary to our initial suspicions, we have no reason to believe this one was.  We are big proponents of free speech and aim to provide a platform that supports that freedom. We even have dedicated infrastructure for sites under active attack.  Some of these attacks last for months, but this allows us to keep these sites online and not put our other users at risk.

We also don’t put all of our eggs in one basket.  WordPress.com alone has 24 load balancers in 3 different data centers that serve production traffic. These load balancers are deployed across different network segments and different IP ranges.  As a result, some sites were only affected for a couple minutes (when our provider’s core network infrastructure failed) throughout the duration of these attacks.  We are working on ways to improve this segmentation even more.

If you have any questions, feel free to leave them in the comments and I will try to answer them.

New Datacenter for WordPress.com

Towards the end of 2008, we brought online a new datacenter to serve the over 5.5 million blogs now hosted on the WordPress.com platform.  Adding the data center in Chicago, IL gives us a total of 3 data centers across the US which serve live content at any given time.  We have decommissioned one of our facilities in the Dallas, TX area.  Our friends at Layered Technologies were kind enough to shoot this footage for us (think The Blair Witch Project) and the always awesome Michael Pick took care of the editing.  Here’s a peak at what a typical WordPress data center installation looks like…

For those interested in technical details here is a hardware overview of the installation:

150 HP DL165s dual quad-core AMD 2354 processors 2GB-4GB RAM
50 HP DL365s dual dual-core AMD 2218 processors 4GB-16GB RAM
5 HP DL185s dual quad-core AMD 2354 processors 4GB RAM

And here is a graph of what the current CPU usage looks like across about 700 CPU cores.  As you can see there is plenty of idle CPU for those big spikes or in case one of the other 2 data centers fail and we have to route more traffic to this one.

cpuusage-chicago

Anatomy of a Denial of Service Attack

Running one of the largest websites on the internet with about 5 million unique sites hosted exposes you to all sorts of issues.  There are constant events to deal with, some internal, some external.  This morning, one of the more common external events, a Distributed Denial of Service Attack occurred.  We experience these types of attacks rather frequently, but most are easily mitigated and have no user impact.  One this morning, however, was rather large and thus impacted some users.

Here is a timeline and description of this morning’s events:

9:40 AM EST — Our internal monitoring systems alerted us to unusual activity in one of the four geographically diverse datacenters which serve WordPress.com traffic.  Here is what that anomaly looks like in graphical terms:

10:00 AM EST — The target of the attack was identified and removed from our network.  The attack, however continued.  This is because the attacker had hijacked tens of thousands of computers (probably by installing a virus which was spread via email) and these computers had no idea the site was no longer there.  A small log sample shows over 8 million requests for this one site from over 10,000 unique IP addresses.

10:20 AM EST — Since we have servers in multiple data centers throughout the United States which serve traffic for WordPress.com all the time, we were able to route all legitimate traffic out of the affected data center, and let the single affected data center deal with the attack.   

11:30 AM EST — The IPs targeted in the attack were null routed at this point which allowed us to bring all datacenters back online to serve normal traffic.

We keep hourly traffic metrics and based on those numbers, it looks like during the attack there was about a 5% decrease in overall pageviews during the 40 minutes before traffic was re-routed.  All things considered, not a bad outcome for an attack this size.  Looking at bandwidth graphs, this attack was in the 500Mbit – 750Mbit/sec range.  

One milllllliion blogs on WordPress.com

picture-6.pngAt 10:56:22PM PDT on 5/23/2007, the 1 millionth active blog was registered on WordPress.com. And the winner is…..

claudiacanals.wordpress.com

Not much there right now, but hopefully there will be soon. Maybe head over and leave a comment on their about page to let them know!

Predictions on how long it will take to get to 2 million?