HyperDB Replication Lag Detection

Howdy – Iliya here again. Seems like I am taking over Barry’s blog. Hopefully this will motivate him to blog more.

On WordPress.com we have over 218 million tables and perform tens of thousands queries per second. To scale all of this, we shard our 24 million blogs across more than 550 MySQL servers. This allows us to cope with load bursts and to handle database servers failures.

For those who are unfamiliar, MySQL data replication is asynchronous and works as follows:

  1. [Master] Receives a query that modifies database structure or content (INSERT, UPDATE, ALTER etc.)
  2. [Master] The query is written to a log file (aka the binlog).
  3. [Master] The query is executed on the master.
  4. [Slaves] Create a “Slave I/O” thread that connects to the [Master] and requests all new queries from the mater’s binlog.
  5. [Master] Creates a “Binlog dump” thread for each connected slave, that reads the requested events from the binlog and sends them to the slave.
  6. [Slaves] Start a “Slave SQL” thread which reads queries from the log file written by the “Slave I/O” thread and executes them

There are a number of things to be considered in this scenario, which can lead to a condition known as replication lag where the slaves have older data then the master:

  • Since only one thread on the slave executes write queries, and there are many execution threads on the master, there is no guarantee that the slave will be able to execute queries with the same speed as the master.
  • Long running SELECTs or explicit locks on the slave, will cause the “Slave SQL” thread to wait, thus slowing it down.
  • Long running queries on the master would take at least the same amount of time to run on the slave, causing it to fall behind the master
  • I/O (disk or network) issues can prevent or slow down the slave from reading and replaying the binlog events

In order to deal with this, we needed a way to avoid connections to lagged slaves as long as there are slaves that are current. This would allow for the lagged ones to recover faster and avoid returning old data to our users. It also had to be something flexible enough, so we could have different settings for acceptable replication lag per dataset or stop tracking it altogether. Since we use the advanced database class, HyperDB, for all our database connections, it was the obvious place to integrate this.

We implemented it  in the following steps:

  • If a connection modifies data in a given table, then all subsequent SELECTs on the same connection for that table are sent to the master. Chances are replication won’t be fast enough to propagate the changes to the slaves on the same page load.  This logic has existed in HyperDB for a while.
  • Before we make a connection to a slave, we use a callback, to check if we have information for this slave’s lag in the cache and we skip it based on that, unless all slaves in the dataset are considered lagged.  In case replication breaks on all slaves, we would rather return old data then overload the master with read queries and cause an outage.
  • After a successful connection to a slave, if there was nothing in the cache regarding its lag status and not all slaves are considered lagged, we execute a second callback that checks whether this slave is lagged and updates the cache.

A slave is considered lagged when it has a “lag threshold” defined in it’s dataset configuration and the current lag is more than this threshold.

We considered the following options for checking if a slave is lagged.  No MySQL patches are required for any of them:

  • Checking the value of Seconds_Behind_Master from the SHOW SLAVE STATUS statement executed on the slave. It shows the difference between the timestamp of the currently executed query and the latest query we have received from the master. Although it is easy to implement and has low overhead, the main problem with using this option is that it is not completely reliable, as it can be tricked by IO latency and/or master connection problems.
  • Tracking the “File” and “Position” on SHOW MASTER STATUS executed on the master and comparing it to Relay_Master_Log_File and Exec_Master_Log_Pos of SHOW SLAVE STATUS on the slave. This way we can wait until the slave executes the queries from binlog “file” and position “position” before send certain queries to that slave and thus effectively we wait for the data to be replicated to the point where we need it. While very reliable, this option is more complex, has lots of overhead and doesn’t give us clock time value which we can track and set between servers.
  • Tracking the difference between the current time on the slave and the replication of a timestamp update from the master, which runs every second. This is basically what mk-heartbeat does. It requires proper time sync between the master and the slave servers but is otherwise very reliable.

The third option fit our needs best, however the code is flexible enough to easily support any of these. For caching, we decided to go with memcached, since it works well in our distributed, multi-server, multi-datacenter environment, but other methods (APC cache, shared memory, custom daemon etc.) would work just fine.

HyperDB is free, open-source and easy to integrate in your WordPress installation. You can download it here.  We hope you enjoy this new functionality and please let us know if you have any questions in the comments.

6 thoughts on “HyperDB Replication Lag Detection”

  1. Sounds very cool.

    What made you guys come to the decision to compensate for replication lag, rather than just let it play out and occasionally have out of date info? Is there major replication lag you saw? We sometimes see replication lag of 10-20 seconds, but that’s less than our cache TTL so we just don’t worry about it — although it does indicate the read slave(s) running slower-than-normal.

    I haven’t looked at the code yet, how often do you check for replication lag?

    1. Hi Gabriel,

      Sorry for the delayed reply. Somehow I missed this comment. Replication lag is bad even if it is less than the cache TTL because you could read old data from the database and use it to populate the cache. Next time you request that data, you get it from the cache, so even if there is no lag at that point the data will be old. In some cases, you could even then write that old data back to the database, overwriting the new data forever.

      More common in our environment than transient lag because of load issues is that something breaks – server, network, etc which causes a large amount of lag on a single server. Previously we would just remove the server from production manually, but now it happens automatically until the server has caught up.

      The lag state is cached:

      $wpdb->lag_cache_ttl = 30;

      You can adjust that to your liking and it will determine how often you check for replication lag.

  2. Hi there,
    First thanks a lot for all of the useful information. i am glad i found your blog.
    But one question concerning the multisite storage performance.
    if i have 1 million user, i will have a database holding more than one million tables as each site/blog have their respective post, comments etc…
    would it be better to seperate users and their content ?

    Thanks a lot again for your input and help.

  3. This is new territory for me and I need to connect to a SQL server. The plan is to have the most of the WordPress tables on a MySQL database but then get some data from the MS SQL database. Is this possible or am I wasting my time?

    Any advice is appreciated.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s