Load Balancer Update

A while back, I posted about some testing we did of various software load balancers for WordPress.com.  We chose Pound and have been using it for the past two years or so.  Recently we started to run into some issues, however, so we began looking elsewhere.  Some of these problems were:

  • Lack of true configuration reload support made managing our 20+ load balancers cumbersome.  We had a solution (hack) in place, but it was getting to be a pain.
  • When something broke on the backend and caused 20-50k connections to pile up, the resulting thread creation would cause huge load spikes and sometimes render the servers useless.
  • As we started to push 700-1000 requests per second per load balancer, things seemed to slow down.  It is hard to get quantitative data on this because page load times depend on so many things.

So…  A couple of weeks ago we finished converting all of our load balancers to Nginx.  We have been using Nginx for Gravatar for a few months and have been impressed with its performance, so moving WordPress.com over was the obvious next step.  Here is a graph that shows CPU usage before and after the switch.  Pretty impressive!

Before choosing nginx, we looked at HAProxy, Perlbal, and LVS. Here are some of the reasons we chose Nginx:

  • Easy and flexible configuration (true config “reload” support has made my life easier)
  • Can also be used as a web server, which allows us to simplify our software stack (we are not using nginx as a web server currently, but may switch at some point).
  • Only software we tested which could handle 8000 (live traffic, not benchmark) requests/second on a single server

We are currently using Nginx 0.6.29 with the upstream hash module, which gives us the static hashing we need to proxy to Varnish.  We are regularly serving about 8-9k requests/second and about 1.2 Gbit/sec through a few Nginx instances and have plenty of room to grow!
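As a rough sketch of that setup (all addresses and names here are hypothetical; the `hash` directive comes from the third-party upstream hash module, not stock Nginx 0.6):

```nginx
# Sketch only: static hashing onto Varnish backends via the
# third-party upstream hash module.
upstream varnish_pool {
    hash $request_uri;        # same URI always maps to the same Varnish box
    server 10.0.0.11:6081;    # hypothetical Varnish instances
    server 10.0.0.12:6081;
}

server {
    listen 80;
    location / {
        proxy_pass http://varnish_pool;
    }
}
```

Because the hash is static, cache hit rates on each Varnish instance stay high even as requests are spread across boxes.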

Author: Barry

126 thoughts on “Load Balancer Update”

  1. Barry –

    One question: How are you guys doing failover with the nginx box? Nginx has been on my list of things to look at, but so far no time in the R&D bank.

    Mike

  2. Thanks for the rundown Barry. We’re about to go live with nginx in a similar role, it’s a really nice piece of software. We’ve got it behind ipvs / keepalived to handle simple layer 4 load balancing and failover, the combination works well.

    Have you seen any issues with ssl or ssl+gzip? This seems to be an area where 0.5 and 0.6 have both had a few bugs recently — and something that seems not too easy to exercise without real traffic. Thanks!
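For anyone reproducing the keepalived part of this setup, a minimal hypothetical VRRP fragment looks something like the following (a second box would run the same block with `state BACKUP` and a lower priority):

```
# Hypothetical keepalived.conf fragment: a VRRP floating IP in front of nginx.
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        203.0.113.10        # the VIP clients connect to
    }
}
```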

  3. From everything I’m reading, there aren’t many reasons *not* to switch to nginx. I’m building my network with it from the start, so I can use its various capabilities in the future. What kind of load balancing does it do? It has built-in round-robin with weighting, right? It doesn’t have anything to check the upstream servers’ health, as far as I know. I’m especially interested in the static gzip module and passing things off to Varnish – can you explain more about how those tie together? I assume Varnish is upstream from the nginx load balancer?

    Thanks

  4. What’s the max *safe* amount of traffic a single Nginx instance is load balancing for you and how much memory and cpu are being utilized?

    🙂

  5. We have tested it up to about 10k req/sec. Memory footprint is minimal, and Nginx doesn’t use much CPU time. Where you end up with problems is in the TCP overhead and the time spent handling software interrupts. It gets much worse with iptables and connection tracking. Performance here is probably better on FreeBSD than Linux (we run Linux), but I haven’t tested it.
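For readers tuning along these lines, a hypothetical sysctl fragment is a common starting point (values are illustrative, not production settings from this post):

```
# /etc/sysctl.conf fragment (illustrative values only)
net.core.somaxconn = 8192                   # larger accept queue
net.ipv4.tcp_max_syn_backlog = 8192         # more half-open connections allowed
net.ipv4.ip_local_port_range = 1024 65535   # more ephemeral ports for proxying
# If iptables must be loaded, keeping connection tracking off the proxy
# traffic (e.g. NOTRACK rules in the raw table) avoids the conntrack
# overhead mentioned above.
```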

  6. Hi. Nice article – thanks. Not sure if I understand where varnish fits in. Does it work like this?

    Internet Client –> Varnish(s) –> NGINX(s) –> Webservers

  7. Our setup is:

    Client –> Nginx –> (Varnish|Webserver) –> [Webserver]

    Depending on the request type, some requests are passed to Varnish and others are sent directly to the web servers. We currently use Varnish only to serve static images and video content (as a reverse caching proxy in front of Amazon’s S3).
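    A minimal sketch of that kind of split (paths and addresses are hypothetical, not the actual production config):

```nginx
# Sketch of routing by request type.
server {
    listen 80;

    # static images and video go through Varnish
    location ~* \.(gif|jpe?g|png|flv)$ {
        proxy_pass http://10.0.0.11:6081;   # Varnish
    }

    # everything else goes straight to the web servers
    location / {
        proxy_pass http://10.0.0.21:80;     # web server
    }
}
```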

  8. Barry,

    This is a follow up to Mike’s question on 4/28 about the failover configuration of nginx. We are specifically interested in understanding if and how nginx can be configured for a traditional active/active failover pair. We want to know if nginx supports state sharing between the failover pair so as to maintain continuation of service for such features as server affinity.
    Any light and/or guidance you can share is greatly appreciated.

    Matthew
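    For context on the affinity part of this question: nginx’s built-in `ip_hash` gives a deterministic client-to-backend mapping without any shared state, so two active balancers will independently pick the same backend for a given client. A sketch (hypothetical addresses):

```nginx
upstream backends {
    ip_hash;            # hash client IP -> backend; deterministic, no shared state
    server 10.0.0.21;
    server 10.0.0.22;
}
```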

  9. Hi, you said ..

    “Only software we tested which could handle 8000… ”

    so you mean LVS-style kernel-level load balancing is even slower than nginx?

  10. Wackamole would be a decent replacement for heartbeat for managing IPs it sounds like.

    So in theory you could use Wackamole+nginx for Active/Passive(+more) nginx instances and Wackamole would handle all the IP switching and skip using LVS/ldirectord|keepalived/heartbeat, right?

  11. Barry – I appreciate your answers, having real-world examples of nginx and varnish give us the answers we need – this is a great resource.

  12. Pingback: Nginx Hacking Tips
  13. Hi,
    We are planning a website with around 10,000-50,000 online users.
    We plan to use nginx as a load balancer and will have the web servers on an internal IP network.
    My question is: if the nginx LB has to route and NAT all the users to the internal web servers, how much load will that put on the nginx server? Is it possible?
    Thank you very much for your help!

  14. Hi,
    Really interesting post. Like Matthew Porter, I would also like to know the pros and cons versus HAProxy. As far as I know, HAProxy also supports hot reconfiguration and can take a pretty heavy load.

    Another thing that’s really interesting is how many servers you need to serve that kind of traffic. In particular, how many do you need as proxy servers?

    Thanks for sharing!!

  15. @Mathias – reading this document, it looks like nginx can only do simple round-robin. Nothing fancy yet, such as HAProxy’s intelligent request queueing.

    Also, I noticed that HAProxy can handle 34,000+ connections per second, as shown on this page. That is well beyond WP.com’s 10,000 conn/sec.

    Could HAProxy’s lack of performance in this post be explained by the request-queue bug, which was fixed after this post was published? A test of the new version of HAProxy shows it beating nginx, CPU-load wise. There is no benchmark on connections/sec, though, so it may be completely irrelevant, but it might still be of interest.

    Hopefully we’ll be able to find out even more about these great pieces of software.

    Thanks.

  16. I can’t believe your comment that nginx was the only solution that could reach 8,000 conns/sec. I’ve had HAProxy (latest) doing full cookie inserts at 27,000 conns/sec. A graph comparing connections/sec on the Kemp 1500 and the Loadbalancer.org R16, which are both based on LVS, is here: http://www.loadbalancer.org/whyr16.html (we also use Pound & HAProxy). Blatant commercial link, but still relevant.

  17. I’m just learning about load balancing WordPress. I was wondering how load balancing deals with MySQL and how data would replicate between the different servers.

    The idea is to have 2 data centers; each data center would have 1 load balancer, 2 web servers, and 1 MySQL server.

    Scott.

  18. Barry,

    Thanks. Though I found this post a bit late, it saved my job. We have decided to port our latest WordPress news site to nginx. We are already getting 10K hits per day, and expect around 50K once new features and channels are added.

  19. I’m definitely going to download it now. I’ve been looking for a small load-balancing solution for a long time; I just wish it wasn’t all Russian documentation…

  20. Also give Crossroads a try (crossroads.e-tunity.com). It has a very small footprint but still a lot of powerful features, like access control and DoS prevention.

  21. Pingback: Nginx « clragon
  22. Great post!
    You indicated that you might want to use nginx for content serving, too. What has come of that, and why or why not?

    1. So how does nginx handle massive 1000+ PPS DDoS attacks, especially the HTTP ones? In that case you would put a filtering device in front of it to stop the “bad” packets, but I’m curious how it deals with them by itself.

  23. Interesting write-up. I understand that you use quite a few Nginx LBs, and even more backends. How do you spread the load across the Nginx instances?

      1. I know this item is very old, but I have a question.
        I have 2 nginx load balancers, with DNS spreading the load between them. If one of the load balancer servers stops, what will happen? Will half of the users get a 404?

        1. Nope, half of the users will get an error page from their browser saying that the connection is not possible, because your nginx no longer answers.
          That’s the problem with DNS…
          In this case you should set a very low TTL on your DNS records, in order to switch quickly if needed 😉
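          To illustrate the low-TTL suggestion, a hypothetical BIND-style zone fragment (addresses made up):

```
; 60-second TTL so a dead balancer's record can be pulled quickly
www  60  IN  A  203.0.113.10   ; nginx LB 1
www  60  IN  A  203.0.113.11   ; nginx LB 2
```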

  24. Pingback: Hey world! | heyWP
  25. Thanks, Barry, for sharing how you handle WordPress.com’s clients. I am also an nginx user, but I haven’t tried Varnish. I’ll try to use it in some of my web applications. More power!

  26. Against this background, it is quite understandable why news magazines and news programs clearly favor professional pest controllers.
