I got access to our 12-core Intel server, so I was able to run some better benchmarks of the different group commit thread scheduling methods:
This graph shows queries-per-second as a function of number of parallel connections, for three test runs:
- Baseline MariaDB, without group commit.
- MariaDB with group commit, using the simple thread scheduling, where each thread does its own share of the serial part of the group commit algorithm and then signals the next thread.
- MariaDB with group commit and optimised thread scheduling, where the first thread does the serial group commit processing for all transactions at once, in a single thread.
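The difference between the two scheduling methods can be sketched in a few lines. This is a simplified illustration, not the MariaDB code: the class names `SignalNextCommit` and `LeaderCommit` are hypothetical, a Python list stands in for the binlog, and `group_sizes` stands in for one log sync per group. Because of Python's GIL it shows only the control flow, not the performance difference. In the naive version every transaction needs its own wakeup before it can do its serial step. In the optimised version the first thread to queue becomes the leader and does the serial step for the whole group in its own thread, so followers only need one wakeup at the end.

```python
import threading


class SignalNextCommit:
    """Naive scheduling (sketch): each committing thread runs its own serial
    step, then hands the turn to the next thread in line."""

    def __init__(self):
        self._turn = threading.Condition()
        self._next_ticket = 0   # next queue position to hand out
        self._serving = 0       # position whose serial step may run now
        self.log = []           # stands in for the binlog

    def commit(self, txn_id):
        with self._turn:
            ticket = self._next_ticket
            self._next_ticket += 1
            # Wait until every earlier transaction has done its serial step.
            while self._serving != ticket:
                self._turn.wait()
            self.log.append(txn_id)    # this thread's share of the serial part
            self._serving += 1
            self._turn.notify_all()    # wake the next thread in line


class LeaderCommit:
    """Optimised scheduling (sketch): the first thread to queue becomes the
    leader and runs the serial step for the whole group in its own thread."""

    def __init__(self):
        self._queue_lock = threading.Lock()
        self._queue = []                      # (txn_id, Event) pairs awaiting a leader
        self._commit_lock = threading.Lock()  # only one group commits at a time
        self.log = []                         # stands in for the binlog
        self.group_sizes = []                 # one simulated log sync per group

    def commit(self, txn_id):
        done = threading.Event()
        with self._queue_lock:
            is_leader = not self._queue       # empty queue: no leader pending yet
            self._queue.append((txn_id, done))
        if not is_leader:
            done.wait()                       # follower: the leader commits for us
            return
        with self._commit_lock:               # wait for the previous group to finish
            with self._queue_lock:
                group, self._queue = self._queue, []
            for txn, _ in group:              # serial part for all, in one thread
                self.log.append(txn)
            self.group_sizes.append(len(group))
        for _, ev in group:                   # wake the followers (and ourselves, harmlessly)
            ev.set()
```

Either object is used by starting one thread per transaction that calls `commit(i)`. Both end with every id in `log` exactly once; `LeaderCommit` can additionally batch several transactions into one entry of `group_sizes`, which is where the saving in wakeups and log syncs comes from.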
This test was run on a 12-core server with hyper-threading and 24 GByte of memory. MariaDB was running with its datadir in /dev/shm (a Linux RAM disk), to simulate a really fast disk system and maximise the stress on the CPUs. The binlog is enabled, with innodb_flush_log_at_trx_commit=1. The table type is InnoDB.
I use Gypsy to generate the client load, which consists of simple auto-commit primary key updates:
REPLACE INTO t (a,b) VALUES (?, ?)
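To make the shape of this load concrete, here is a rough sketch of a driver that produces a queries-per-second figure for a given number of parallel connections. It is not Gypsy and does not reproduce its options: `run_load` and the `connect` factory are hypothetical, `connect()` is assumed to return an object with a DB-API style `execute(sql, params)` method on an auto-commit connection, and the `?` placeholders assume a driver using the qmark parameter style.

```python
import random
import threading
import time

# The benchmark statement; "?" placeholders assume a qmark-style driver.
QUERY = "REPLACE INTO t (a,b) VALUES (?, ?)"


def run_load(connect, connections, queries_per_conn, key_range=1_000_000, seed=0):
    """Run `connections` parallel clients, each issuing `queries_per_conn`
    auto-commit REPLACE statements, and return the measured queries/second.

    `connect()` is a hypothetical factory returning an object with a
    DB-API style `execute(sql, params)` method, already in auto-commit mode.
    """

    def client(client_id):
        cur = connect()                        # one connection per client thread
        rng = random.Random(seed + client_id)  # per-thread RNG, no shared state
        for _ in range(queries_per_conn):
            a = rng.randrange(key_range)       # primary key to update
            b = rng.randrange(key_range)       # new value for column b
            cur.execute(QUERY, (a, b))

    threads = [threading.Thread(target=client, args=(i,)) for i in range(connections)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start      # includes connect() time in this sketch
    return connections * queries_per_conn / elapsed
```

Calling `run_load(my_connect, n, 10_000)` for increasing `n` (with `my_connect` being whatever hypothetical factory opens a connection to the server under test) gives one point per connection count, i.e. the kind of curve described above.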
The graph clearly shows that the optimised thread scheduling algorithm improves scalability. As expected, the effect is more pronounced on the 12-core server than on the 4-core machine I tested on previously. The optimised thread scheduling has around 50% higher throughput at higher concurrencies, while the naive thread scheduling algorithm suffers from scalability problems to the degree that it is only slightly better than no group commit at all (but remember that this is on a RAM disk, where group commit is hardly needed in the first place).
There is no doubt that this kind of optimised thread scheduling involves some complications and trickery. Running one part of a transaction in a different thread context from the rest does have the potential to cause subtle bugs.
On the other hand, we are moving fast towards more and more CPU cores and more and more I/O resources, and scalability just keeps getting more and more important. If we can scale MariaDB/MySQL with the hardware improvements, more and more applications can make do with scale-up rather than scale-out, which significantly simplifies the system architecture.
So I am just not comfortable introducing more serialisation (e.g. more global mutex contention) into the server than absolutely necessary. That is why I implemented the optimisation in the first place, even before I had tested it. Still, the question is whether an optimisation that only has an effect above 20,000 commits per second is worth the extra complexity. I still need to think this over before making up my mind, and to discuss it with other MariaDB developers, but at least now we have a good basis for such a discussion (and fortunately, the code is easy to change one way or the other).