Kristian Nielsen (kristiannielsen) wrote,

Using MASTER_GTID_WAIT() to avoid stale reads from slaves in replication

I have just implemented MASTER_GTID_WAIT() in MariaDB 10.0. This can be used to give a very elegant solution to the problem of stale reads in replication read-scaleout, without incuring the overheads normally associated with synchronous replication techniques. This idea came up recently in a discussion with Stephane Varoqui, and is similar to the concept of Lamport logical clock described in this Wikipedia article.

I wanted to describe this, hoping to induce people to test and maybe start using this, as it is a simple but very neat idea, actually.

A very typical use of MariaDB/MySQL replication is for read-scaleout. The application does all updates against a single master, which replicates to a set of slaves. The application can then distribute its reads among the slaves. If most of the database load is from reads, then this is an effective way to scale beyond what a single database server can handle.

The problem of stale reads occurs since MariaDB/MySQL replication is asynchronous. There is some delay from the commit of a change on the master server until that change becomes committed and visible on a slave server. This problem can be quite annoying in many cases, eg. user changes some preferences and reloads the page (read goes to a different slave server which is not caught up); now the change looks as if it was lost.

The idea is to use the MariaDB Global Transaction ID (GTID for short) as a logical clock. When we do an update on the master, we can obtain the associated GTID from @@LAST_GTID. Then on the slave, we can execute a MASTER_GTID_WAIT(<GTID from master>, <timeout>) before doing a query for which read consistency is critical; this will ensure that the slave will catch up before the query is run, and that no stale read results. Or if we get a timeout, we can try another slave server, which hopefully is not lagging as much behind.

Typically, we would pass around the GTID (the logical clock value) between different parts of the system to maintain consistency where it is needed and avoid stale reads. Eg. in a web application, we would set a cookie with the GTID when an update is made to the database. Then on the next HTTP request, we have the GTID which is needed in MASTER_GTID_WAIT(), even if it goes to a completely different web server. Or if we send an email, we can encode the GTID in any contained link, to be sure that it will work whichever slave server it might end up in, possibly in a different data center on a different continent.

By passing around the GTID whenever we communicate between parts of the system, we can ensure that if transaction A is known to have occured before transaction B, then any change made by A will also be visible to B. If there was no communication (direct or indirect), then B cannot know that A came before - and if there was communication, we can avoid stale reads by passing the GTID as part of that communication.

The great thing about this technique is that it is optional. We can use it for just the queries where avoiding stale reads is critical, and only there take the associated penalty (in the form of increased latency). Other parts of the database or application will not be negatively affected. This is much more flexible than using some form of synchronous replication, which will incur some penalty for all parts of the system.

So for applications that require more performance than what is currently possible to get from fully synchronous replication, we can choose consistency in a few critical places where it matters, and go for the performance of asynchronous replication in the rest.

I am hoping that some users will start to do tests with this, and I am very eager to hear any feedback and suggestions for further improvements (the maria-developers@ list is a good place for those). I think this can become a very useful technique in many cases.

By the way, note that it is not necessary to actually switch the replication to GTID to use this technique, as in MariaDB the GTIDs are always generated and replicated, even if they are not used for the slave when connecting to the master. This should make it simple for people to start testing. One will need to run MariaDB 10.0 on both the master and the slave, of course - but this could be achieved by setting up one MariaDB server as a slave of the main system (which could be running MySQL or Percona Server or MariaDB 5.5 or whatever), and then setting up another MariaDB 10.0 slave using the first slave as master.

If this turns out to be something that people find generally useful, it would be interesting to integrate this more tightly into the client protocol and client API. The @@LAST_GTID could be sent automatically back with the query results, the same way that @@LAST_INSERT_ID is done. And another option could enable the automatic execution of the MASTER_GTID_WAIT() call before queries sent to the slave servers. This could make the technique almost transparent for the end application, and could help avoid the overhead of extra client-server roundtrips and extra SQL parsing.

However, that is only worth doing if this turns out to be something that will actually be shown to be useful in practice. So if you want to see that happen eventually, then get testing and send in that feedback!

The code is available in the 10.0 bzr tree now, and will be in the next (10.0.9) release of MariaDB.

Tags: database, mariadb, mysql, performance, programming, replication
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

  • 4 comments