Author: Barry

  • Moving to NYC

    San Francisco is a great city and I have thoroughly enjoyed my time here. Time for something different, however. I am moving to New York City in January. I will be leaving San Francisco on Thursday afternoon and spending some time traveling between now and January 26th, when I head to NYC. Goodbye SF, it was fun.

  • Road trip to Texas

    Tomorrow morning I am leaving for Texas. When I drove from Texas to San Francisco in January of 2007, I said to myself I would never do it again, but in today’s world I guess never < 2 years. Going to take a bit of a scenic route this time. Here is the map. It breaks down as follows:

    Day 1: San Francisco, CA to Pismo Beach, CA (via Hwy 1)

    Day 2: Pismo Beach, CA to Grand Canyon, AZ

    Day 3: Grand Canyon, AZ to Albuquerque, NM

    Day 4: Albuquerque, NM to Lubbock, TX

    Day 5: Lubbock, TX to Houston, TX

    Day 6: Houston, TX to San Antonio, TX

    Total trip is 2383 miles. Will try to post some pictures along the way.

  • Static hostname hashing in Pound

    WordPress.com just surpassed its 300th server today. How do we distribute requests to all those servers? We use Pound, of course. For those of you not familiar with Pound, it is an open source software load balancer that is easy to set up and maintain, flexible, and fast!

    In general, we do not stick individual sessions to particular backend servers because WordPress uses HTTP cookies to keep track of users and is therefore not dependent on server sessions. Any web server can process any request at any given point in time and the correct data will be returned. This is important since we serve traffic in real time across three data centers.

    There is one exception to this rule, however, and it has to do with the way we serve images. As Demitrious explained in his detailed post, when a request for an image is made, Pound sends the request to a cache server running Varnish. How does it decide which server to send the request to? Well, it looks at the hostname of the request, hashes it, and then assigns that hostname to a particular cache server. By default, Pound supports sessions based on any HTTP header, so we could easily use the hostname as the determining factor, but the mapping is not static. In other words, when we restart Pound, all the hostname assignments are reset and we effectively invalidate a large portion of our cache.

    To circumvent this problem, we use the following patch. What the patch does is statically hash hostnames so a given hostname is sent to the same server all the time, even across restarts. If the backend server happens to go down, requests will be sent to another server in the pool until the server is back up, at which point requests will be sent to the original server again. This allows us to restart Pound without invalidating our image cache. We have been using this in production for a couple of months now and everything is working great. The patch is written against Pound 2.3.2, and to use the static mapping you would add the following to the end of the Service directive in your Pound configuration file:

    Session
        Type hostname
    End
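
    Conceptually, the static mapping boils down to deriving the backend index directly from the hostname, so the assignment never depends on the order in which requests happen to arrive. Here is a rough C sketch of the idea (not the actual patch; the hash function and backend count are just illustrative):

    /* Rough sketch of the idea behind the patch (not the actual code):
     * derive a backend index purely from the hostname, so the mapping
     * is the same before and after a restart. */
    #include <stdio.h>

    #define N_BACKENDS 16   /* number of BackEnd directives in the Service */

    /* Simple deterministic string hash (djb2); any stable hash works. */
    static unsigned long hash_hostname(const char *host)
    {
        unsigned long h = 5381;
        for (; *host; host++)
            h = h * 33 + (unsigned char)*host;
        return h;
    }

    int main(void)
    {
        const char *host = "barry.files.wordpress.com";
        unsigned long idx = hash_hostname(host) % N_BACKENDS;

        /* In Pound itself, if backend idx is down, the request falls
         * through to another live backend until idx comes back up. */
        printf("%s -> backend %lu\n", host, idx);
        return 0;
    }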

    One thing to keep in mind is that if you add or remove servers from the Service definition, you will change the mapping, so I would recommend adding a few more BackEnd directives than you need right away to allow for future growth without complete cache invalidation. For example, we currently have 4 caching servers, but 16 BackEnds listed (4 instances of each server). This will allow us to add more cache servers and only invalidate a small portion of the cache each time.
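
    For illustration, a Service definition along those lines might look like this (the addresses are made up; in practice each of the 4 cache servers would appear 4 times, for 16 BackEnds total):

    Service
        BackEnd
            Address 10.0.0.1
            Port    80
        End
        BackEnd
            Address 10.0.0.2
            Port    80
        End
        # ... repeat so that every cache server is listed 4 times ...
        Session
            Type hostname
        End
    End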

    Of course this works for us because each blog has a unique hostname from which images are served (mine is barry.files.wordpress.com). If all of your traffic is served from a single domain name, this strategy won’t do you much good.

  • Making Gravatar fast again

    As Matt blogged, Automattic recently purchased Gravatar. The first thing we did was move the service onto the WordPress.com infrastructure. Since the application is very different from WordPress.com, what this really means is applying what we have learned from scaling WordPress.com to improve both the speed and reliability of the service, and leveraging our existing hardware and network infrastructure to stabilize it. The current infrastructure is laid out as follows:

    • 2 application servers (in 2 different data centers for redundancy). One of these servers primarily handles the main Gravatar website, which is Ruby on Rails, while the other serves the images themselves. If either of these servers or data centers were to fail, we could easily switch things around to work around the outage.
    • 2 cache servers (1 in each datacenter). These servers are running Varnish. They cache requested images for 10 minutes, so frequently requested images do not have to be fetched from the application servers over and over. We are seeing about a 65% cache hit rate and about 1000 requests/second at peak times, although as adoption of the service increases, we expect this number to go up significantly. A single server running Varnish can serve many thousands of requests/sec. The amount of data we are caching is small enough to fit in RAM, so disk I/O is not currently an issue. (A sketch of this caching policy follows this list.)
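
    As a sketch of that caching policy, VCL along these lines would pin cached objects to a 10 minute TTL. This is written against modern Varnish (VCL 4.0) syntax rather than whatever version we were running at the time, and the backend address is made up:

    vcl 4.0;

    # Hypothetical application server that actually generates the images.
    backend app {
        .host = "192.0.2.10";
        .port = "80";
    }

    sub vcl_backend_response {
        # Keep every fetched object in cache for 10 minutes.
        set beresp.ttl = 10m;
    }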

    On the hardware side, for those of you who are curious, we are using HP DL365s for the application servers and HP DL145s for the caching servers. 4GB of RAM and 2 x AMD Opteron 2218s all around. The application servers have 4 x 73GB 15k SAS drives in a RAID 5, while the caching servers each have a single 80GB SATA drive. We use the same hardware configurations extensively for WordPress.com and they work well.

    Previously, the service was using Apache2 + Mongrel to serve the main site and lighttpd + mod_magnet to serve the images. We decided to simplify this, and we are currently using lighttpd to serve everything; it is working well for the most part. We seem to have a memory usage issue with lighttpd, which may be related to this long-standing bug. For now, we are just monitoring memory usage of the application with monit and restarting the service before memory usage gets too high.
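
    The monit check is nothing fancy; something along these lines does the job (the pidfile path, memory threshold, and init scripts here are illustrative, not our exact configuration):

    check process lighttpd with pidfile /var/run/lighttpd.pid
        start program = "/etc/init.d/lighttpd start"
        stop program  = "/etc/init.d/lighttpd stop"
        # Bounce the process before its memory usage gets out of hand.
        if totalmem > 500 MB for 5 cycles then restart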

  • WordPress.com using S3

    Demitrious has a great post explaining how we are using S3, Varnish, and Pound to serve 60 million image requests per day on WordPress.com.

    UPDATE: Almost forgot, but Matt reminded me that he has a really super duper awesome post about WordPress.com and S3 too!