Kristian Nielsen - First steps with MariaDB Global Transaction ID
February 14th, 2013, 04:23 pm

My previous writings were mostly theoretical, so I wanted to give a more practical example, showing the actual state of the current code. I also wanted to show how I have tried to make the feature fit well into the existing replication features, without requiring the user to enable lots of options or understand lots of restrictions before being able to use it.

So let us start! We will build the code from lp:~maria-captains/maria/10.0-mdev26, which at the time of writing is at revision knielsen@knielsen-hq.org-20130214134205-403yjqvzva6xk52j.

First, we start a master server on port 3310 and put a bit of data into it:

    server1> use test;
    server1> create table t1 (a int primary key, b int) engine=innodb;
    server1> insert into t1 values (1,1);
    server1> insert into t1 values (2,1);
    server1> insert into t1 values (3,1);
To provision a slave, we take a mysqldump:
    bash$ mysqldump --master-data=2 --single-transaction -uroot test > /tmp/dump.sql
Note that with --master-data=2 --single-transaction we obtain the exact binlog position corresponding to the data in the dump. Since MariaDB 5.3, this is completely non-blocking on the server (it does not do FLUSH TABLES WITH READ LOCK):
    bash$ grep "CHANGE MASTER" /tmp/dump.sql
    -- CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000001', MASTER_LOG_POS=910;
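Under the hood, this relies on the consistent snapshot support in the server. As a rough sketch of the idea (assuming the Binlog_snapshot_file / Binlog_snapshot_position status variables introduced in MariaDB 5.3; the exact statements mysqldump issues may differ), a consistent binlog position can be read from inside a snapshot without blocking anyone:
    server1> start transaction with consistent snapshot;
    server1> show status like 'binlog_snapshot_%';
    server1> commit;
The Binlog_snapshot_file and Binlog_snapshot_position values reported by the SHOW STATUS are consistent with the data visible inside the snapshot, which is what allows the exact binlog position to be recorded without FLUSH TABLES WITH READ LOCK.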
Meanwhile, the master server has a couple more transactions:
    server1> insert into t1 values (4,2);
    server1> insert into t1 values (5,2);
Now let us start up the slave server on port 3311, load the dump, and start replicating from the master:
    bash$ mysql -uroot test < /tmp/dump.sql
    server2> change master to master_host='127.0.0.1', master_port=3310,
        master_user='root', master_log_file='master-bin.000001', master_log_pos=910;
    server2> start slave;
    server2> select * from t1;
    +---+------+
    | a | b    |
    +---+------+
    | 1 |    1 |
    | 2 |    1 |
    | 3 |    1 |
    | 4 |    2 |
    | 5 |    2 |
    +---+------+
    5 rows in set (0.00 sec)
So the slave is up to date. In addition, when the slave connects to the master, it downloads the current GTID replication state, so everything is now ready for using global transaction ID. Let us promote the slave to be the new master, and then later make the old master a slave of the new master. So stop the slave thread on the old slave, and run another transaction to simulate it being the new master:
    server2> stop slave;
    server2> insert into t1 values (6,3);
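As an aside, the GTID replication state that the slave downloaded can be inspected on server2 at this point; a quick sketch (the gtid_slave_pos name follows later MariaDB 10.0 builds and may differ in this preview code):
    server2> show variables like 'gtid%';
    server2> select @@global.gtid_slave_pos;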
Finally, let us attach the old master as a slave using global transaction ID:
    server1> change master to master_host='127.0.0.1', master_port=3311,
        master_user='root', master_gtid_pos=auto;
    server1> start slave;
    server1> select * from t1;
    +---+------+
    | a | b    |
    +---+------+
    | 1 |    1 |
    | 2 |    1 |
    | 3 |    1 |
    | 4 |    2 |
    | 5 |    2 |
    | 6 |    3 |
    +---+------+
    6 rows in set (0.01 sec)
The old master is now running as a slave and is up to date with the new master.

So that is it! A short post from me for once, but that is the whole point. Replication with MariaDB Global Transaction ID works much as it always did. The only new thing here is when we issue the CHANGE MASTER to make the old master a slave of the new master. We do not have to manually try to compute or guess the correct binlog position on the new master; we can just specify MASTER_GTID_POS=AUTO, and the servers figure out the rest for themselves.

I hope I managed to show more concretely my ideas with MariaDB Global Transaction ID. Comments and questions are most welcome, as always. Everything above consists of actual commands that work on the current code on Launchpad. Everything else may or may not work yet, as this is work in progress, just so you know!

Comments

From: (Anonymous)
Date: February 15th, 2013 04:20 am (UTC)
Very nice and simple.

From: Henrik Ingo
Date: February 15th, 2013 07:07 am (UTC)
Another beautiful design from you. I like the MASTER_GTID_POS=AUTO! Congratulations.

Doing a completely non-blocking mysqldump had somehow escaped me! I followed the link to the knowledgebase. Do I understand correctly that this works with the following assumptions:
- you must not run out of InnoDB redo-log space (or equivalent for other engines)
- you must not be doing DML against MyISAM or other engines that don't support consistent snapshot

From: kristiannielsen
Date: February 15th, 2013 07:18 am (UTC)
> Another beautiful design from you. I like the MASTER_GTID_POS=AUTO!
> Congratulations.

Thanks! I'm glad you liked it.

> - you must not run out of InnoDB redo-log space (or equivalent for other
> engines)

There is no such limitation. InnoDB never runs out of redo-log space, it just
forces a checkpoint (are you thinking of Oracle RDBMS "snapshot too old" woes?
InnoDB does not have that).

> - you must not be doing DML against MyISAM or other engines that don't
> support consistent snapshot

Right. Things will still work if you do. However, the obtained binlog position
may not be consistent with the dumped data for those tables. Same with DDL.

From: Henrik Ingo
Date: February 15th, 2013 07:40 am (UTC)
>> - you must not run out of InnoDB redo-log space (or equivalent for other
>> engines)
>
> There is no such limitation. InnoDB never runs out of redo-log space, it just
> forces a checkpoint (are you thinking of Oracle RDBMS "snapshot too old" woes?
> InnoDB does not have that).

I wasn't thinking of anything in particular, just intuitively poking in some direction.

I suppose what I'm really asking is: Clearly it can't be possible to keep a consistent snapshot open indefinitely? Wouldn't InnoDB at some point either block new updates or alternatively have to release the old snapshot? Or will it happily keep around old MVCC copies of records indefinitely?

From: kristiannielsen
Date: February 15th, 2013 07:54 am (UTC)
> Clearly it can't be possible to keep a consistent snapshot open
> indefinitely?

I believe InnoDB will try. Of course, the undo tablespace will grow if there
are continuously new updates happening. Basically, you will get purge lag.

Mysqldump is a good option for smaller databases (where taking and restoring a
dump can be done in reasonable time), due to its simplicity and
flexibility. For bigger data sets, XtraBackup is a better option. It also
provides an option for obtaining a non-blocking, consistent binlog position,
and does not cause purge lag.

From: brychev
Date: February 15th, 2013 09:50 am (UTC)
Seems very nice and good =) Waiting for a stable release.
One question: in my work I have to use a chained replication topology.
Imagine there is a chain like this: A->B->C->D
and at some point B falls down (dead).
Will I be able to connect D or C to A easily, without computing and selecting the right binlog position?
In this topology, C and D know nothing about the position where replication was stopped on B.

From: kristiannielsen
Date: February 15th, 2013 10:25 am (UTC)
> Imagine there is a chain like this: A->B->C->D

> Will I be able to connect D or C to A easily, without computing and selecting
> the right binlog position?

Yes. Just use CHANGE MASTER TO ... MASTER_GTID_POS=AUTO on D or C, specifying
the connection information for A.

This will work in general for any server in any correctly setup replication
topology. I spent a lot of effort in the design to make this work properly and
automatically.
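
Concretely, on D it would be something like this (host name, port, and user here are just placeholders for the sketch):
    serverD> change master to master_host='a.example.com', master_port=3306,
        master_user='repl', master_gtid_pos=auto;
    serverD> start slave;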

From: maxmether
Date: February 22nd, 2013 08:04 pm (UTC)
Hello Kristian,

I've been enjoying reading all your blog postings about this and I'm looking forward to testing the feature. I think one of the most useful aspects for this feature is in a multi-master approach with multiple data centers. Say that you have 4 data centers (A,B,C,D) and instead of replicating in a chain each server directly replicates each other server, ie you have 4 domains where each domain could actually be multiple servers in a local replication hierarchy (say X1,X2,X3,X4). So here the question is, what if one of the masters (say B1) would fail. Now B2 would be upgraded to become the master of the B data center and B3 and B4 would start replicating from B2. At the same time B2 would start replicating to and from the other masters, ie A1, C1 and D1. All of this should be pretty much automatic based on the GTID, right?

Now the real problem is, because of the asynch nature of replication, what if an event was in B1 but didn't yet get to the other B servers? or if B3 had the latest event of the A domain but B2 had the latest event of the B domain etc. How do all the servers now make sure they are all synchronized for each domain? In particular as you only want 1 master per domain. Or can this not be a problem as all events on B1 were locally serialized?

So can an event be lost and could there be a synchronization problem?

From: kristiannielsen
Date: February 25th, 2013 01:23 pm (UTC)
> aspects for this feature is in a multi-master approach with multiple data
> centers. Say that you have 4 data centers (A,B,C,D) and instead of
> replicating in a chain each server directly replicates each other server,
> ie you have 4 domains where each domain could actually be multiple
> servers in a local replication hierarchy (say X1,X2,X3,X4). So here the

Are you going to have parallel updates in all four data centers A, B, C, and
D? Then you need 4 domain_ids and to ensure that updates are not
conflicting. Or will you ensure that only one datacenter at a time does
updates? Then you only need a single domain.
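
For the four-domain setup, the idea would be to give every server in a data center that data center's domain id, along these lines (the gtid_domain_id variable name follows later MariaDB 10.0 builds and may differ in the preview code; it would normally also go into my.cnf):
    serverA1> set global gtid_domain_id = 1;
    serverB1> set global gtid_domain_id = 2;
    serverC1> set global gtid_domain_id = 3;
    serverD1> set global gtid_domain_id = 4;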

The primary issue here is that events can take multiple paths to the same
destination, so duplicate application of events somehow needs to be
avoided. I.e. if an event originates at A, it can come to B three times: from
A directly, and via C and D. This could perhaps be handled by comparing GTIDs
and only applying the event if a more recent event has not yet been
applied. Or perhaps by only sending from B to D events originating at B.

> question is, what if one of the masters (say B1) would fail. Now B2 would
> be upgraded to become the master of the B data center and B3 and B4 would
> start replicating from B2. At the same time B2 would start replicating to
> and from the other masters, ie A1, C1 and D1. All of this should be
> pretty much automatic based on the GTID, right?

B3 and B4, as well as A1, C1, and D1, would all be able to use
MASTER_GTID_POS=AUTO to resume at the correct places in the binlog of B2, yes.

> Now the real problem is, because of the asynch nature of replication,
> what if an event was in B1 but didn't yet get to the other B servers? or

For example, the last event of B1 may have reached A but not B2. So you would
need to bring B2 up to date before starting to use it as master. You do this
by making it replicate briefly from A, C, and D until it is up-to-date with
their current state. Then you can remove the read-only status of B2 and let
B3, B4, A, C, and D connect to it as slaves.
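
As a sketch (connection details here are placeholders), bringing B2 up to date could look like this:
    serverB2> change master to master_host='a1.example.com', master_port=3306,
        master_user='repl', master_gtid_pos=auto;
    serverB2> start slave;
    serverB2> show slave status\G
    serverB2> stop slave;
You would watch SHOW SLAVE STATUS until B2 has caught up with A1 before stopping the slave, and then repeat the same briefly for C1 and D1.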

> if B3 had the latest event of the A domain but B2 had the latest event of
> the B domain etc. How do all the servers now make sure they are all
> synchronized for each domain? In particular as you only want 1 master per
> domain. Or can this not be a problem as all events on B1 were locally
> serialized?

If you have 4 domains, each server remembers its last applied GTID within each
domain. So B3 can start replicating each domain at different positions inside
the binlog of B2. That is the main purpose of the domain_ids: to allow keeping
track of multiple independent streams within a single binlog.
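
To make this concrete: a MariaDB GTID has the form domain_id-server_id-sequence_number, and a replication position is a list with at most one GTID per domain, e.g. 1-101-2340,2-201-1982,3-301-77,4-401-560 (made-up values). On B3 it could be inspected with something like this (the gtid_slave_pos name follows later 10.0 builds and may differ in the preview code):
    serverB3> select @@global.gtid_slave_pos;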

> So can an event be lost and could there be a synchronization problem?

I think it should be possible to set this up so that no events are lost and
synchronization is done correctly. The main issue is to solve the
multiple-path problem, which I have not considered before but sounds doable.

Note that global transaction ID provides only the means for doing this by
keeping track of positions. It does not include automated failover and
synchronization; that must be scripted on top.