|
Kristian Nielsen
[Recent Entries][Archive][Friends][User Info]
Below are the 10 most recent journal entries recorded in the "Kristian Nielsen" journal:[<< Previous 10 entries]
12:33 am
[Link] |
MySQL/MariaDB replication: applying events on the slave side
Working on a new set of replication APIs in MariaDB, I have
given some thought
to the generation of replication events on the master server.
But there is another side of the equation: to apply the generated events on a
slave server. This is something that most replication setups will need (unless
they replicate to non-MySQL/MariaDB slaves). So it will be good to provide a
generic interface for this, otherwise every binlog-like plugin implementation
will have to re-invent this themselves.
A central idea in the current design for generating events is that we do not
enforce a specific content of events. Instead, the API provides accessors for
a lot of different information related to each event, allowing the plugin
flexibility in choosing what to include in a particular event format. For
example, one plugin may request column names for a row-based UPDATE event;
another plugin may not need them and can avoid any overhead related to column
names simply by not requesting them.
To get the same flexibility on the slave side, the roles of plugin and API are
reversed. Here, the plugin will have a certain pre-determined (by the
particular event format implemented) set of information related to the event
available. And the API must make do with whatever information it is provided
(or fail gracefully if essential information is missing).
My idea is that the event application API will provide corresponding events to
the events in the generation API. Each application event will have "provider"
methods corresponding to the accessor methods of the generator API. So the
plugin that wants to apply an event can obtain an event generator object, call
the appropriate provider methods for all the information available, and
finally ask the API to execute the event with the provided context
information.
This is only an abstract idea at this point; there are lots of details to take
care of to make this idea into a concrete design proposal. And I have not
fully decided if such an API will be part of the replication project or
not. But I like the idea so far.
Understanding how MySQL binlog events are applied on the slave
I wanted to get a better understanding of what is involved in an event
application API like the one described above. So I did a similar exercise to
the one I wrote about in my
last post,
where I went through in detail all the information that the existing MySQL
binlog format includes. This time I went through the details in the code that
applies MySQL binlog events on a slave.
Again, I concentrate on the actual events that change the database, ignoring
(most) details that relate only to the particular binlog format used by MySQL
(and there are quite a few :-).
At the top level, the slave SQL thread
(in exec_relay_log_event()) reads events
(next_event()) from the relay logs and executes
(apply_event_and_update_pos()) them.
There are a number of intricate details here relating to switching to a new
relay log (and purging old ones), and re-winding in the relay log to re-try a
failed transaction (eg. in case of deadlock or the like). This is mostly
specific to the particular binlog implementation.
The actual data changes are mostly done in the do_apply_event()
methods of each event class in sql/log_event.cc. I will go
briefly through this method for the events that are used to change actual data
in current MySQL replication. It is relatively easy to read a particular
detail out of the code, since it is all located in
these do_apply_event() methods.
(One problem however is that there are special cases sprinkled across the code
where special action is taken (or not taken) when running in the slave SQL
thread. I have not so far tried to determine the full list of such special
cases, or access how many there are).
Query_log_event
The main task done here is to set up various context in the THD,
execute the query, and then perform necessary cleanup. If the query throws an
error, there is also some fairly complex logic to handle this error correctly;
for example to ignore certain errors, to require same error as on master (if
the query failed on the master in some particular way), and to re-try the
query/transaction for certain errors (like deadlock).
There are also some hard-coded special cases for NDB (this seems to be a
common theme in the replication code).
The main think that to my eyes make this part of the code complex is the set
of actions taken to prepare context before executing the query, and to clean
up after execution. Each individual step in the code is in fact relatively
easy to follow (and often the commenting is quite good). The problem is that
there are so many individual steps. It is very hard to feel sure that exactly
this set of actions is sufficient (and that none are redundant for that
matter).
For example, code like this (not complete, just a random part of the setup):
thd->set_time((time_t)when);
thd->set_query((char*)query_arg, q_len_arg);
VOID(pthread_mutex_lock(&LOCK_thread_count));
thd->query_id = next_query_id();
VOID(pthread_mutex_unlock(&LOCK_thread_count));
thd->variables.pseudo_thread_id= thread_id;
It seems very easy to forget to assign query_id or whatever if
one was to write this from scratch. That is something I would really like to
improve in an API for replication plugins: it should be possible to understand
completely exactly what setup and cleanup is needed around event execution,
and there should be appropriate methods to achieve such setup/cleanup.
Another thing that is interesting in the code
for Query_log_event::do_apply_event() is that the code
does strcmp() on the query against keywords like COMMIT,
SAVEPOINT, and ROLLBACK, and follows a fairly different execution path for
these. This seems to bypass the SQL parser! But in reality, these particular
queries are generated on the master with special code in the server that
carefully restricts the possible format (eg. no whitespace or comments
etc). So in effect, this is just a way to hack in special event types for
these special queries without actually adding new binlog format events.
Rand_log_event, Intvar_log_event, and User_var_log_event
The action taken for these events is essentially to update
the THD with the information in the event: value of random
seed, LAST_INSERT_ID/INSERT_ID,
or @user_variable.
Xid_log_event
On the slave side, this event is essentially a COMMIT. One thing that springs
to mind is how different the code to handle this event is from the special
code in Query_log_event that handles a query "COMMIT". Again, it
is very hard to tell from the code if this is a bug, or if the
different-looking code is in fact equivalent.
Begin_load_query_log_event, Append_block_log_event, Delete_file_log_event, and Execute_load_query_log_event
This implements the LOAD DATA INFILE query, which needs special
handling as it originally references a file on the master server (or on the
client machine). The actual execution of the query is handled the same way as
for normal queries (Execute_load_query_log_event is a sub-class
of Query_log_event), but some preparation is needed first to
write the data from the events in the relay log into a temporary file on the
slave.
The main thing one notices about this code is how it handles re-writing the
LOAD DATA query to use a new temporary file name on the
slave. Take for example this query:
LOAD DATA CONCURRENT /**/ LOCAL INFILE 'foobar.dat' REPLACE INTO /**/ TABLE ...
The event includes offsets into the query string of the two places
marked /**/ in the example. This part of the query is then
re-written in the slave code (so it is not just replacing the
filename). Again, the code by-passes the SQL parser, it just so happens that
the SQL syntax in this case is sufficiently simple that this is not too
hackish to do. If one were to check, one would probably see that any user comments
in this particular part of the query string disappear in the slave
binlog if --log-slave-updates is enabled.
Table_map_log_event
This just links a struct RPL_TABLE_LIST into a list, containing
information about the tables described by the event. The actual opening of the
table is done when executing the first row event (WRITE/UPDATE/DELETE).
Write_rows_log_event, Update_rows_log_event, and Delete_rows_log_event
These are what handle the application of row-based replication events on a slave.
The first of the row events causes some setup to be done, a partial extract is this:
lex_start(thd);
mysql_reset_thd_for_next_command(thd, 0);
thd->transaction.stmt.modified_non_trans_table= FALSE;
query_cache.invalidate_locked_for_write(rli->tables_to_lock);
(As remarked for Query_log_event, while row event application
setup is somewhat simpler, it still appears a bit magic that exactly these
setups are sufficient and necessary).
The code also switches to row-based binlogging for the following row
operations (this is for --log-slave-updates, as it is
not possible to binlog the application of row-based events as statement-based
events in the slave binlog). This is by the way an interesting challenge for a
generic replication API: how does one handle binlogging of events applied on a
slave, for daisy-chaining replication servers? This of course gets more
interesting if one were to use a different binlogging plugin on the slave than
on the master. I need to think more about it, but this seems to be a pretty
strong argument that a generic event application API is needed, which hooks
into event generators to properly generate all needed events for the updates
done on slaves. Another important aspect is the support of a global
transaction ID, that will identify a transaction uniquely across an entire
replication setup to make migrating slaves to a new master easier. Such a
global transaction ID also needs to be preserved in a slave binlog when
replicating events from a master.
During execution of the first row event, the code also sets the flags for
foreign key checks and unique checks that is included in the event from the master.
And it checks if the event should be skipped
(for --replicate-ignore-db and friends).
A check is made to ensure that the table(s) on the slave are compatible with
the tables on the master (as described in the table map event(s) received just
before the row event(s)). The MySQL row-based replication has a fair bit of
flexibility in terms of allowing differences in tables between master and
slave, such as allowing different column types, different storage engine, or
even extra columns on the slave table. In particular allowing extra columns
raises some issues about default values etc. for these columns, though I did
not really go into details about this.
Again, there are a number of hard-coded differences for NDB.
For Write_rows_log_event, flags need to be set to ensure that
column values for AUTO_INCREMENT and TIMESTAMP
columns are taken from the supplied values, not
auto-generated. If slave_exec_mode=IDEMPOTENT, an INSERT that
fails due to already existing row does not cause replication to fail; instead
an UPDATE is tried, or in some cases (like if there are foreign keys) a DELETE
+ re-tried INSERT. There is also code to hint storage engines about the
approximate number of rows that are part of a bulk insert.
For Update_rows_log_event and Delete_rows_log_event,
the code needs to locate the row to update/delete. This is done by primary key
if one exists on the slave table (the current binlogging always includes every
column in the before image of the row, so the primary key value of the row to
modify is always available). But there is also support for tables with no
primary key, in which the first index on the table is used to locate the row
(if any), or failing that a full table scan. This btw. is a good reminder
to not use row-based replication with primary-key-less tables with
non-trivial amount of rows: every row operation applied will need a
full table scan!
Finally, the values from the event are unpacked into a row buffer in the
format used by MySQL storage engines, and the read_set
and write_set are set up (current replication always includes all
columns in row operations), before the
actual ha_write_row(), ha_update_row(),
or ha_delete_row() call into the storage engine handler is made
to perform the actual update. Note that a single row event can include
multiple rows, which are applied one after the other.
Final words
And that is it! Quite a bit of detail, but again I found it very useful to
create this complete overview; it will make things easier when re-implementing
this in a new replication API.
Tags: freesoftware, mariadb, mysql, programming, replication
|
02:16 pm
[Link] |
Dissecting the MySQL replication binlog events
For the replication
project that I am currently working on in MariaDB, I wanted to understand
exactly what information is needed to do full replication of all MySQL/MariaDB
statements on the level of completeness that existing replication does. So I
went through the code, and this is what I found.
What I am after here is a complete list of what the execution engine needs to
provide to have everything that a replication system needs to be able to
completely replicate all changes made on a master server. But not
anything specific to the particular implementation of replication used, like
binlog positions or replication event disk formats, etc.
The basic information needed is of course the query (for statement-based
replication), or the column values (for row-based replication). But there are
lots of extra details needed, especially for statement-based replication. I
need to make sure that the replication API we are designing will be able to
provide all needed information, and it was always nagging in the back
of my head that there would be lots and lots of small bits in various corners
that would be missing and cause problems. So it was good to get this
overview. Turns out that there are a lot of details, but not
that many, and it should be manageable.
All of the events that are used in replication are listed in enum
Log_event_type in sql/log_event.h. So anything needed for
complete replication can be found here, but mixed up with lots of other
details about the MySQL binlog implementation, backwards compatibility,
etc. So what follows is an extract from log_event.cc of the
actual change information contained in those events.
Statement-based replication
QUERY_EVENT
The main event for statement-based replication is QUERY_EVENT. It contains the
query to be executed (as a string) and some information to provide the context
for correct execution. Here is the list of information:
- SQL query.
- Default database for the query (eg. from
USE statement).
- The setting of some server variables in effect at the time the query
was run:
-
sql_mode.
-
autocommit (whether autocommit is enabled).
- Character set and collation at various levels (see the
section 9.1.4. Connection
Character Sets and Collations in the MySQL manual for background on these):
- Client (
character_set_client).
- Connection (
character_set_connection).
- Server (
character_set_server).
- Current default database (
character_set_database;
note that there are few statements that rely on this, comments in
the code say it is only LOAD DATA).
-
foreign_key_checks (whether foreign keys are checked).
-
unique_checks (whether unique constraint checks are
enforced).
-
auto_increment_offset and auto_increment_increment.
- Time zone of the master database server.
- Names to use for days and months; this is identified by a code that
is mapped to a table of names to use in
sql/sql_locale.cc.
-
sql_auto_is_null (whether SELECT ... WHERE
autoinc IS NULL returns last insert id for ODBC compatibility).
-
- Error code from executing the query on the master (for non-transactional
statements that may still make permanent changes even though they fail
mid-way; on the slave the query should fail with the same error).
- Connection ID (this is used to correctly distinguish
TEMPORARY
TABLEs with same name used in different connections on the master
simultaneously).
Note that not all of this information is replicated in all query events, as
not all of it is needed for a given query. But a replication API must make the
information available for the queries where it is needed.
INTVAR_EVENT, RAND_EVENT, and USER_VAR_EVENT
These events provide additional context for executing a query on the slave:
- Value of
LAST_INSERT_ID (for queries that reference it).
- Value of
INSERT_ID (to get same auto_increment numbers for
inserts on the slave as on the master).
- The random seed (so RAND() can return same values in queries on slaves
as on the master).
- The values for any
@user_variables referenced in a query
BEGIN_LOAD_QUERY_EVENT, APPEND_BLOCK_EVENT, EXECUTE_LOAD_QUERY_EVENT, and DELETE_FILE_EVENT
These four events are used to do statement replication of LOAD DATA
INFILE. The contents of the file to be loaded is sent in blocks
in BEGIN_LOAD_QUERY_EVENT followed by zero or
more APPEND_BLOCK_EVENT. Then the actual query is sent
in EXECUTE_LOAD_QUERY_EVENT, which is a variant
of QUERY_EVENT that replaces the original filename with the name
of a temporary file on the slave and deletes the temporary file afterwards
(DELETE_FILE_EVENT is used in certain error cases).
This is the complete story of exactly how much information needs to be
provided on the master to make statement replication work as it does currently
in MySQL. If you get the thought that this is a little bit scary in terms of
complexity I tend to agree with you ;-). There is a lot to be said for the
comparative simplicity of row-based replication (and it is also interesting to
see the history of bug fixes in MySQL 5.1 that gradually have moved more and
more statements to be replicated row-based (in mixed-mode binlogging) due to
corner cases where statement-based replication can fail).
Still, once we have the list of information, it is not that hard to provide
the information in a pluggable replication API for any implementations that
want to try their luck with statement-based replication. And of course,
row-based replication only
handles INSERT/UPDATE/DELETE! We also
need to support CREATE TABLE and similar statements, for which it
is still useful to know the above exhaustive list of information that may be
needed in one form or another.
Row-based replication
In row-based replication, each DML statement is binlogged in two parts. First
the tables modified in the query are described
with TABLE_MAP_EVENT, and second the row values changed are
logged with WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT,
or DELETE_ROWS_EVENT.
TABLE_MAP_EVENT
The information describing modified tables in row-based replication is as
follows:
- Database name.
- Table name.
- List of columns in the table. For each column, the following
information is included:
- Column type (this is
field->type()).
- Column metadata (this is what is returned
by
field->save_field_metadata(); this is for example the
maximum length of a VARCHAR, the precision and number of
decimals in DECIMAL, etc.)
- Whether the column is NULL-able.
- Table map id; this is just an internally generated uniqie number for
subsequent events to refer to the table described.
Note in particular that column names are not used/needed in current
MySQL/MariaDB row-based replication. I personally think this is a good way to
do it. However, in a generic API, it will make sense to make the full table
definition available to implementations, each of which can choose what and how
to log in terms of table metadata.
WRITE_ROWS_EVENT, UPDATE_ROWS_EVENT, and DELETE_ROWS_EVENT
These events handle replication of
respectively INSERT, UPDATE, and DELETE
(and similar statements like REPLACE etc.) They contain the
following information:
- Table map id, referencing a table previously described
with
TABLE_MAP_EVENT.
- Value of
foreign_key_checks and unique_checks,
similar to statement-based binlogging (but for row-based, those two are
all the context storedm though see remarks below).
- List (bitmap really) of columns updated. This is essentially
the
write_set that is used in the storage engine API (but see
below for explanation). For UPDATE, there are two bitmaps, one
for the before image and one for the after image.
- List of records containing the values of each column modified. There is
one such record for every row update logged. For
UPDATE there
are two records for each row update, one for the before image (values before
the update was done) and one for the after image (values after the update
was done).
I must say, investigating how these row-based events are implemented in MySQL
really makes the feature seem rather half-baked. There are several issues:
- The lists/bitmaps of column updated sound useful, but in reality they
are set unconditionally to include all columns! Except for NDB).
- This also means all columns
in a row are always sent, even for
DELETE
and UPDATE. Except for NDB, which only logs needed
columns.
- Some of the "extra" flags in the storage engine API are not included,
such as
HA_EXTRA_WRITE_CAN_REPLACE. This is actually a bug, as
it means that a storage engine using such flags to optimise its operation
will not replicate correctly. In the existing MySQL source, only NDB uses
this flag, but NDB does special tricks for binlogging and slave replication
which avoids this particular issue in most cases.
I strongly suspect that some of this half-baking was done in a quick-and-dirty
attempt to squeeze NDB replication in. At least, there are several "this is
only used by NDB" type comments in the vicinity of these things in the source code.
In any case, for the replication API, it is probably a good idea to re-think
this part and make sure that the information logged for row updates is
complete and sane for all reasonable use cases.
Thoughts
My idea is to have a replication API that provides for generation and
consumption of events completely separate from any details of the actual
format of events in the binlog or any other method used to store or process
the events. This will allow replication plugins that use a completely
different binlog implementation, or even has no binlog at all.
So such an API needs to provide all of the above information (to allow
re-implementing the existing binlog/replication as a plugin, if for no other
reason), but need not provide such information in any particular event
format. In fact, I am trying to make the API so that such information need not
be materialised in structures or memory buffers at all; instead relying on
providing accessor methods, so that an implementation can request just the
information it needs, and materialise it as or if needed.
On top of this I still think it makes sense to define a standard (but
optional) materialised event format, so that more light-weight plugins can be
written that can do interesting things with replication without having to
implement a full new event format each time. I am still considering whether to
extend the existing binlog format (which is not all that attractive, as it is
not very easily extensible), or whether to define a new more flexible format
(for example based on the Google protobuffer library).
More on the existing binlog format
Just for completeness, here is some additional description of the existing
MySQL/MariaDB 5.1 binlog format. These are things that I believe are not
required in a new API, as they are mostly internal implementation
details. However, as I had to go through them anyway while finding the stuff
that does need to be in the API, I will include a brief description here.
Additional query information
Some additional information, which is mostly redundant, is included with query
events for statement-based binlogging:
- Bitmap of tables affected by multi-table update (this allows to know
which tables will be updated without parsing the query, eg. for filtering
events based on database/table name.)
- Time spent in query on master.
- Catalog (I believe this is old unused stuff. Idea is that each database
belongs to a catalog, but I have never seen this actually used anywhere).
- A flag
LOG_EVENT_THREAD_SPECIFIC_F which is set if the
query uses TEMPORARY table (allows to get this information
without parsing the query).
- A flag
LOG_EVENT_SUPPRESS_USE_F set in a few cases when the
master knows that the query is independent of what the current database is
(so that a possible USE statement can be optimised away).
Binlog specific events
These are events that are specific to the binlog implementation:
XID_EVENT
-
This is used to record a transaction ID for each transaction written to
the binlog in 2-phase commit. This recorded ID is needed during crash
recovery on the master to know which prepared transactions in
transactional engines need to be recovered to get consistency with what is
in the binlog. It is not used on the slave in replication (though this
events implies a
COMMIT, which _does_ have effect on the
slave, of course.)
FORMAT_DESCRIPTION_EVENT
-
This event is written at the start of every binlog file. It provides to
slaves reading the binlog the master server version and the event size of
all following events, thereby providing some facilities for extending
event formats while maintaining backwards compatibility.
STOP_EVENT
-
This is logged when the master shuts down gracefully (though I do not
think this is used much, if any)
ROTATE_EVENT
-
This is logged at the end of a binlog file when the master starts a new
binlog file. It is needed by the slave to reset it's master binlog
position so that the IO thread can proceed correctly from the next binlog
file (incidentally, it is a clear weakness in the binlog implementation
that the slaves need knowledge about binlog file names and data offsets on
the master server, and is a cause of much complexity when switching
masters in advanced replication topologies. Something that really needs
improvements in the near future).
INCIDENT_EVENT
-
This is logged by the master when something bad happens that may cause
replication to fail/diverge, so that the slave can be notified of the
problem and stop, informing the DBA/sysadm to resolve the issue.
Obsolete events
Finally there are a number of events that are no longer generated (but which
are still important for the slave replication code to handle to be able to
work with masters of older versions):
LOAD_EVENT, NEW_LOAD_EVENT, CREATE_FILE_EVENT,
and EXEC_LOAD_EVENT
-
Various old events for handling
LOAD DATA INFILE (as can be
seen, LOAD DATA INFILE has had some changes in replication
over the years :-).
START_EVENT_V3
-
Old version of
FORMAT_DESCRIPTION_EVENT.
PRE_GA_WRITE_ROWS_EVENT, PRE_GA_UPDATE_ROWS_EVENT,
and PRE_GA_DELETE_ROWS_EVENT
-
Old versions of the row-based replication binlog events.
SLAVE_EVENT
-
Not used, I think it may have been related to some feature that was never
completed.
Tags: freesoftware, mariadb, mysql, programming
|
04:17 pm
[Link] |
Fixing MySQL group commit (part 4 of 3)
(No
three-part series is complete without a part 4, right?)
Here is an analogy that describes well what group commit does. We have a bus
driving back and forth transporting people from A to B (corresponding
to fsync() "transporting" commits to durable storage on
disk). The group commit optimisation is to have the bus pick up everyone that
is waiting at A before driving to B, not drive people one by one. Makes sense,
huh? :-)
It is pretty obvious that this optimisation of having more than one person in
the bus can dramatically improve throughput, and it is the same for the group
commit optimisation. Here is a graph from a benchmark comparing stock MariaDB
5.1 vs. MariaDB patched with a proof-of-concept patch that enables group
commit:
When group commit is implemented, we see clearly how performance (measured in
queries per second) scales dramatically as the number of threads
increases. Whereas with stock MariaDB with no group commit, there is no
scaling at all. We also see that SSD is better than HDD (no surprise there),
but that with sufficient parallelism from the application, group commit can to
a large extent compensate for the slower disks.
This is the same benchmark as in
the first part
of the series. Binlog is enabled. Durability is enabled
with sync_binlog=1 and flush_log_at_trx_commit=1
(and disk cache disabled to prevent the disks lying about when data is
durable). The load is single-row transactions against a 1000000-row XtraDB
table. The benchmark is thus specifically designed to make
the fsync() calls at the end of commit the bottleneck.
I should remark that I did not really tune the servers used in the benchmark
for high parallelism (except for raising max_connections :-), and
I ran the client on the same machine as the server. So it is likely that there
are other effects than group commit influencing the performance at high
parallelism (especially on the SSD results, which I ran on my laptop). But I
just wanted to see if my group commit work scales with higher parallelism, and the
graphs clearly shows that it does!
Architecture
For this work, I have focused a lot on the API for storage engine and binlog
plugins (we do not have binlog plugins now, but this is something that we will
be working on in MariaDB later this year). I want a clean interface that
allows plugins to implement group commit in a simple and efficient manner.
A crucial point is the desire to get commits ordered the same way in the
different engines (ie. typically in InnoDB and in the binlog), as I discussed
in previous articles. As group commit is about parallelims, and ordering is
about serialisation, these two tend to get into conflict. My idea is to
introduce new calls in the interface to storage engines and the XA transaction
coordinator (which is how binlog interacts with commit internally in the
server). These new calls allow plugins that care about commit order to
cooperate on getting correct ordering without getting in each others way and
killing parallelims. Plugins that do not need any ordering can ignore the
new calls, which are optional (for example the transaction coordinator that
runs when the binlog is disabled does not need any ordering).
The full architecture is written up in detail in the
MariaDB Worklog#116.
But the basic idea is to introduce a new handlerton method:
void commit_ordered(handlerton *hton, THD *thd, bool all);
This is called just prior to the normal commit() method, and is
guaranteed to run in the same commit order across all engines (and binlog)
participating in the transaction.
This allows for a lot of flexibility in plugin implementations. A typical
implementation would in the commit_ordered() method write the
transaction data into its in-memory log buffer, and delay the
time-consuming write() and fsync() to the
parallel commit() method. InnoDB/XtraDB is already structured in
this way, so fits very well into this scheme.
But if an engine wants to use another approach, for example a
ticket-based approach
as Mark
and Mats
suggested, that is easy to do too. Just allocate the ticket
in commit_ordered(), and use it in commit(). I
believe most approaches should fit in well with the proposed model.
I also added a corresponding prepare_ordered() call, which runs
in commit order during the prepare phase. The intension is to provide a place
to release InnoDB row locks early for even better performance, though I still
need to get the Facebook people to explain exactly what they want to do in
this respect ;-)
I also spent a lot of thought on getting efficient inter-thread
synchronisation in the
archtecture. As Mats
mentioned, if one is not careful, it is easy to end up
with O(N2) cost of thread wake-up, with N
the number of transactions participating in group commit. As the goal is to get N
as high as possible to maximise sharing of the expensive fsync()
call, such O(N2) cost is to be avoided.
In the architecture
described in MariaDB
Worklog#116, there should in the normal case only be a single highly
contested lock, the one on the binlog group commit (which is inherent to the
idea of group commit, one thread does the fsync() while the rest
of participating threads wait). I use a lock-free queue to make threads
in prepare_ordered() not block threads
in commit_ordered() and vice
versa. The prepare_ordered() calls runs under a global lock, but
as they are intended to execute very quickly there should ideally be little contention
here. The commit_ordered() calls run in a loop in a single
thread, also avoiding serious lock contention as long as commit_ordered() runs
quickly as intended.
In particular, running the commit_ordered() loop in a single
thread for each group commit avoids high cost of thread wake-up. If we were to
try to run the sequential part of commit in different threads in a specific
commit order, we would need to switch execution context from one thread to the
next, bouncing the thread of control all over the cores in an SMP
system. Which takes lots of context switches, and could potentially be
costly. In the proposed architecture, a single thread runs
all commit_ordered() method calls and wakes up the other waiting
threads individually, each free to proceed immediately without any more
waiting for one another.
Of course, an engine/binlog plugin that so desires is free to implement such
thread-hopping itself, by allocating a ticket in one of
the _ordered() methods, and doing its own synchronisation in
its commit() method. After all, it may be beneficial or necessary in
some cases. The point is that different plugins can use different methods,
each using the one that works best for that particular engine without getting
in the way of each other.
Further improvements
If we implement this approach, there are a couple of other interesting
enhancements that can be implemented relatively easy due to the commit
ordering facilities:
- Currently, we sync to disk three times per commit to ensure consistency
between InnoDB and binlog after a crash. But if we know the commit order is
the same in engine and in binlog, and if we store in the engine the
corresponding binlog position (which InnoDB already does), then we need only
sync once (for the binlog) and can still recover reliably after a
crash. Since we have a consistent commit order, we can during crash recovery replay
the binlog from the position after the last not lost commit inside InnoDB
(just like we would apply the binlog on a slave).
- Currently, the
START TRANSACTION WITH CONSISTENT SNAPSHOT,
which is supposed to run a transaction with a consistent view in multiple
transactional engines, is not really all that consistent. It is quite
possible to see a transaction committed in one engine but not in another,
and vice versa. However, with an architecture like the one proposed here, it
should be easy to just take the snapshot under the same lock
that commit_ordered() runs under, and the snapshot will be
really consistent (on engines that support commit order). As a bonus, it
would also be possible to provice a binlog position corresponding to the
consistent snapshot.
- XtraDB (and similar backup solutions) should be able to create a backup
which includes a binlog position (suitable for provisioning a new slave)
without having to run
FLUSH TABLES WITH READ LOCK, which can be
quite costly as it blocks all transaction processing while it runs.
-
As already mentioned, the Facebook group has some ideas for releasing InnoDB
row locks early in order to reduce the load on hot-spot rows; this requires
consistent commit order.
Implementation
If anyone is interested in looking at the actual code of the proof-of-concept
implementation, it is available as
a quilt patch
series and as
a Launchpad
bzr tree (licence is GPLv2).
Do be aware that this is work in progress.
Tags: freesoftware, mariadb, mysql, performance, programming
|
11:18 am
[Link] |
Fixing MySQL group commit (part 3)
This is the third and final article in a series about group commit in
MySQL. The first article discussed the background: group
commit in MySQL does not work when the binary log is
enabled. The second article explained the part of the
InnoDB code that is responsible for the problem.
So how do we fix group commit in MySQL? As we saw in the second
article of this series, we can just eliminate
the prepare_commit_mutex from InnoDB, extend the binary logging
to do group commit by itself, and that would solve the problem.
However, we might be able to do even better. As explained in
the first article, with binary logging enabled we need XA
to ensure consistency after a crash, and that requires to do three fsyncs for
a commit. Even if each of those can be shared with other transactions using
group commit, it is still expensive. During a discussion on
the maria-developers@ mailing list, an idea came up for how
to do this with only a single (shared) fsync() for a commit.
The basic idea is to only do fsync() for the binary log, not for
the storage engine, corresponding to running
with innodb_flush_log_at_trx_commit set to 2 or even 0.
If we do this, we can end up in the following situation: some transaction A is
written into the binary log, and fsync() makes sure that is
stored durably on disk. Then transaction A is committed in InnoDB. And before
the operating system and hardware gets around to store the InnoDB part of A
durably on disk, we get a crash.
Now on crash recovery, we will have A in the binary log, but in the engine A
may be lost, causing an inconsistency. But this inconsistency can be resolved
simply by re-playing the transaction A against InnoDB, using the data for A
stored in the binary log. Just like it would normally be applied on a
replication slave. After re-playing the transaction, we will again be in a
consistent state.
In order to do this, we need two things:
- For each transaction, we need to store in the InnoDB engine
information about which is the corresponding position in the binary log, so
that at crash recovery we will know from which position in the binary log to
start re-playing transactions from.
- We also need to ensure that the order of commits in the binary log and in
InnoDB is the same! Otherwise, after a crash we could find ourselves in the
situation that the binary log has transaction A followed by transaction B,
while the InnoDB storage engine contains only transaction B committed, not
transaction A. This would leave us with no reliable place in the binary log
to start re-playing transactions from.
Now, for ensuring same commit order, we do not want to re-introduce
the (by now) infamous prepare_commit_mutex, as that would make it
impossible to have group commit for the binary log. Instead we should use
another way to ensure such order. There are several ways to do this. Mark
Callaghan explained one such way to do this at the MySQL conference, described
further in this article.
The basic idea is that when writing transactions into the binary log, we
remember their ordering. We can do this by putting the transactions into a
queue, by assigning them a global transaction id in monotonic sequence, or by
assigning them some kind of ticket as Mark suggests. Then
inside innobase_commit(), transactions can coordinate with each
other to make sure they go into the engine in the order dictated by the queue,
global transaction id, or ticket.
I think I have a working idea for how to extend the storage engine API to be
able to do this in a clean way for any transactional engine. We can Introduce an
optional handler call commit_fast() that is guaranteed to be called
in the same order as transactions are written to the binary log, prior to the
normal commit handler call. Basically it would be called under a binary log
mutex. The idea is that commit_fast() will contain the "fast" part
of innobase_commit(), as explained in the previous
article. Then in commit_fast(), the engine can do the
assignment of a ticket or insertion into a queue, as needed.
I think possibly for symmetry we would want to also add a
similar xa_prepare_fast() handler call that would be
invoked after the normal xa_prepare() and similarly be
guaranteed to be in the same order as binary log commit, though I need to
consider this a bit more to fully make up my mind.
I believe such an addition to the storage engine API would allow to implement
in a clean way for all engines the method of re-playing the binary log at
crash recovery to avoid more than a single fsync() at commit.
So this concludes the series. Using these ideas, I hope we will soon see
patches for MySQL and MariaDB that greatly enhances the performance for
durable and crash-safe commits, so that we can finally declare Peter's
original Bug#13669 for
fixed!
Tags: freesoftware, mariadb, mysql, performance, programming
|
10:51 am
[Link] |
Fixing MySQL group commit (part 2)
This is the second in a series of three articles about ideas for implementing
full support for group commit in MariaDB. The first
article discussed the background: group commit in MySQL does not work when
the binary log is enabled. See also the third article.
Internally, InnoDB (and hence XtraDB) do support group commit. The way this
works is seen in the innobase_commit() function. The work in this
function is split into two parts. First, a "fast" part, which registers the commit in
memory:
trx->flush_log_later = TRUE;
innobase_commit_low(trx);
trx->flush_log_later = FALSE;
Second, a "slow" part, which writes and fsync's the commit to disk to make
it durable:
trx_commit_complete_for_mysql(trx)
While one transaction is busy executing the "slow" part, any number of later
transactions can complete their "fast" part, and queue up waiting for the
running fsync() to finish. Once it does finish, a
single fsync() of the log is now sufficient to complete the slow
part for all of the queued-up transactions. This is how group commit
works in InnoDB when the binary log is disabled.
When the binary log is enabled, MySQL uses XA/2-phase commit to ensure
consistency between the binary log and the storage engine. This means that a
commit now takes three parts:
innobase_xa_prepare()
write() and fsync() binary log
innobase_commit()
Now, there is an extra detail to the prepare and commit code in InnoDB. InnoDB
locks the prepare_commit_mutex
in innobase_xa_prepare(), and does not release it until after the
"fast" part of innobase_commit() has completed. This means that
while one transaction is executing innobase_commit(), all
subsequent transactions will be blocked
inside innobase_xa_prepare() waiting for the mutex. As a result,
no transactions can queue up to share an fsync(), and group
commit is broken with the binary log enabled.
So, why does InnoDB hold the problematic prepare_commit_mutex
across the binary logging? That turns out to be a really good question. After
extensive research into the issue, it appears that in fact there is no good
reason at all for the mutex to be held.
Comments in the InnoDB code, in the bug tracker, and elsewhere, mention that
taking the mutex is necessary to ensure that commits happen in the same order
in InnoDB and in the binary log. This is certainly true; without taking the
mutex we can have transaction A committed in InnoDB before transaction B, but
B written to the binary log before transaction A.
But this just raises the next question: why is it necessary to ensure the same
commit order in InnoDB and in the binary log? The only reason that I could
find stated is that this is needed for InnoDB hot backup and XtraBackup to be able
to extract the correct binary log position corresponding to the state of the
engine contained in the backup.
Sergei Golubchik investigated this issue during the 2010 MySQL conference,
inspired by the many discussions of group commit that took place there. It
turns out that XtraDB does a FLUSH TABLES WITH READ LOCK when it
extracts the binary log position. This statement completely blocks the
processing of commits until released, removing any possibility of
different commit order in engine and binary log (InnoDB hot backup is closed
source, so difficult to check, but presumably works in the same way). So there
certainly is no need for holding the prepare_commit_mutex to
ensure consistent binary log position for backups!
There is another popular way to do hot backups without using FLUSH
TABLES WITH READ LOCK: LVM snapshots. But an LVM snapshot essentially
runs the recovery algorithm at restore time. In this case, XA is used to
ensure that engine and binary log are consistent at server start, eliminating
any need to enforce same ordering of commits.
So it really seems that there just is no good reason for
the prepare_commit_mutex mutex to exist in the first
place. Unless someone can come up with a good explanation for why it should be
needed, I am forced to conclude that we have lived with 5 years of broken
group commit in MySQL solely because of incorrect hearsay about how things
should work. Which is kind of sad, and suggest that no-one at MySQL or InnoDB
ever cared sufficiently to take a serious look at this important issue.
(In order to get full group commit in MySQL there is another issue that needs
to be solved. The current binary log code does not include implementation of
group commit, so this also needs to be implemented. Such an implementation
should be possible to do using standard techniques, and is independent of
fixing of group commit in InnoDB).
This concludes the second part of the series, showing that group commit can be
restored simply by removing the offending prepare_commit_mutex
from InnoDB. The third and final article in the series will discuss some
deeper issues that arise from looking into this part of the server code, and
some interesting ideas for further improving things related to group commit.
Tags: freesoftware, mariadb, mysql, performance, programming
|
10:48 am
[Link] |
Fixing MySQL group commit (part 1)
This is the first in a series of three articles about ideas for implementing
full support for group commit in MariaDB (for the other parts see the second and third articles). Group commit is an
important optimisation for databases that helps mitigate the latency of
physically writing data to permanent storage. Group commit can have a dramatic
effect on performance, as the following graph shows:
The rising blue and yellow lines show transactions per second when group
commit is working, showing greatly improved throughput as the parallelism
(number of concurrently running transactions) increases. The flat red and
green lines show transactions per second with no group commit, with no scaling
at all as parallelism increases. As can be seen, the effect of group commit on
performance can be dramatic, improving throughput by an order of magnitude or
more. More details of this benchmark below, but first some background
information.
Durability and group commit
In a fully transactional system, it is traditionally expected that when a
transaction is committed successfully, it becomes durable. The
term durable is the "D" in ACID, and means that even if the system
crashes the very moment after commit succeeded (power failure, kernel panic,
server software crash, or whatever), the transaction will remain committed
after the system is restarted, and crash recovery has been performed.
The usual way to ensure durability is by writing, to a transactional log file,
sufficient information to fully recover the transaction in case of a crash,
and then use the fsync() system call to force the data to be
physically written to the disk drive before returning successfully from the
commit operation. This way, in case crash recovery becomes necessary, we know
that the needed information will be available. There are other methods
than fsync(), including calling the fdatasync()
system call or using the O_DIRECT flag when opening the log file,
but for simplicity we will use fsync() to refer to any method for
forcing data to physical disk.
fsync() is an expensive operation. A good traditional hard disk
drive (HDD) will do around 150 fsyncs per second (10k rotation per minute
drives). A good solid state disk like the Intel X25-M will do around 1200
fsyncs per second. It is possible to use RAID controllers with a battery
backed up cache (which will keep data in cache memory during a power failure
and physically write it to disk when the power returns); this will reduce the
overhead of fsync(), but not eliminate it completely.
(There are other ways than fsync() to ensure durability. For
example in a cluster with synchronous replication
(like NDB
or Galera),
durability can be achieved by making sure the transaction is replicated fully
to multiple nodes, on the assumption that a system failure will not take out
all nodes at once. Whatever method used, ensuring durability is usually
signficantly more expensive that merely committing a transaction to local
memory.)
So the naive approach, which does fsync() after every commit,
will limit throughput to around 150 transactions per second (on a standard
HDD). But with group commit we can do much better. If we have several
transactions running concurrently, all waiting to fsync their data in the
commit step, we can use a single fsync() call to flush them all
to physical storage in one go. The cost of fsync() is often not
much higher for multiple transactions than for a single one, so as the above
graph shows, this simple optimisation greatly reduces the overhead
of fsync() for parallel workloads.
Group commit in MySQL and MariaDB
MySQL (and MariaDB) has full support for ACID when using the popular InnoDB
(XtraDB in MariaDB) storage engine (and there are other storage engines with
ACID support as well). For InnoDB/XtraDB, durability is enabled by
setting innodb_flush_log_at_trx_commit=1.
Durability is needed when there is a requirement that committed
transactions must survive a crash. But this is not the only reason
for enabling durability. Another reason is when the binary log is enabled, for
using a server as a replication master.
When the binary log is used for replication, it is important that the content
of the binary log on the master exactly match the changes done in the storage
engine(s). Otherwise the slaves will replicate to different data than what is
on the master, causing replication to diverge and possibly even break if the
differences are such that a query on the master is unable to run on the slave
without error. If we do not have durability, some number of transactions may
be lost after a crash, and if the transactions lost in the storage engines are
not the same as the transactions lost in the binary log, we will end up with
an inconsistency. So with the binary log enabled, durability is needed in
MySQL/MariaDB to be able to recover into a consistent state after a crash.
With the binary log enabled, MySQL/MariaDB uses XA/2-phase commit between the
binary log and the storage engine to ensure the needed durability of all
transactions. In XA, committing a transaction is a three-step process:
- First, a prepare step, in which the transaction is made durable in
the engine(s). After this step, the transaction can still be rolled
back; also, in case of a crash after the prepare phase, the transaction
can be recovered.
- If the prepare step succeeds, the transaction is made durable in the
binary log.
- Finally, the commit step is run in the engine(s) to make the
transaction actually committed (after this step the transaction can no
longer be rolled back).
The idea is that when the system comes back up after a crash, crash recovery
will go through the binary log. Any prepared (but not committed) transactions
that are found in the binary log will be committed in the storage
engine(s). Other prepared transactions will be rolled back. The result is
guaranteed consistency between the engines and the binary log.
Now, each of the three above steps requires an fsync()
to work, making a commit three times as costly in this respect compared to a
commit with the binary log disabled. This makes it all the more important to
use the group commit optimisation to mitigate the overhead from
fsync().
But unfortunately, group commit does not work in MySQL/MariaDB when the binary
log is enabled! This is the
infamous Bug#13669,
reported by Peter Zaitsev back in 2005.
So this is what we see in the graph and benchmark shown at the start. This is
a benchmark running a lot of very simple transactions
(a single REPLACE statement on a smallish XtraDB table)
against a server with and without the binary log enabled. This kind of
benchmark is bottlenecked on the fsync throughput of the I/O system when
durability is enabled.
The benchmark is done against two different servers. One has a pair of Western
Digital 10k rpm HDDs (with binary log and XtraDB log on different drives). The
other has a single Intel X25-M SSD. The servers are both running MariaDB
5.1.44, and are configured with durable commits in XtraDB, and with drive
cache turned off (drives like to lie about fsync to look better in casual
benchmarks).
The graph shows throughput in transactions per second for different number of
threads running in parallel. For each server, there is a line for results with
the binary log disabled, and one with the binary log enabled.
We see that with one thread, there is some overhead in enabling the binary
log, as is to be expected given that three calls to fsync() are
required instead of just one.
But much worse, we also see that group commit does not work at all when the
binary log is enabled. While the lines with binary log disabled show excellent
scaling as the parallelism increases, the lines for binary log enabled are
completely flat. With group commit non-functional, the overhead of enabling
the binary log is enourmous at higher parallelism (at 64 threads on HDD it is
actually two orders of magnitude worse with binary log enabled).
So this concludes the first part of the series. We have seen that if we can
get group commit to work when the binary log is enabled, we can expect a huge
gain in performance on workloads that are bottlenecked on the fsync throughput
of the available I/O system.
The second part will go into detail with why the code for group commit does
not work when the binary log is enabled. The third (and final) part will
discuss some ideas about how to fix the code with respect to group commit and
the binary log.
Tags: freesoftware, mariadb, mysql, performance, programming
|
10:38 am
[Link] |
Debugging memory leaks in plugins with Valgrind
I had an interesting IRC discussion the other day with Monty Taylor about what
turned out to be a limitation in Valgrind
with respect to debugging memory leaks in dynamically loaded plugins.
Monty Taylor's original problem was
with Drizzle, but as it turns out, it is
common to all of the MySQL-derived code bases. When there is a memory leak
from an allocation in a dynamically loaded plugin, Valgrind will detect the
leak, but the part of the stack trace that is within the plugin shows up as an
unhelpful three question marks "???":
==1287== 400 bytes in 4 blocks are definitely lost in loss record 5 of 8
==1287== at 0x4C22FAB: malloc (vg_replace_malloc.c:207)
==1287== by 0x126A2186: ???
==1287== by 0x7C8E01: ha_initialize_handlerton(st_plugin_int*) (handler.cc:429)
==1287== by 0x88ADD6: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1033)
Which tells you little more than that there is a leak in one of your plugins.
After trying a couple of things, we found that this is a known limitation in
Valgrind in relation to code that is loaded with dlopen() and
later unloaded with dlclose():
http://bugs.kde.org/show_bug.cgi?id=79362
The basic problem is that Valgrind records the location of
the malloc() call as just a memory address. And when the memory
leak check is performed after the end of program execution, the plugin has
been unloaded with dlclose(), and the recorded memory address is
therefore no longer valid.
The problem is specific to memory leak checks, which are done only after the
code has been unloaded. Other checks (like use of uninitialised values and
use-after-free) work fine with full information in the stack traces, as such
checks are done while the plugin code is still loaded into memory. But the
memory leak checks are arguably among the most useful cheks Valgrind does, as
Valgrind is often the only way to find and fix critical memory leaks
efficiently.
Fortunately, once the issue was understood, we had an easy work-around:
disable the dlclose() call in the server plugin code, and the
leak is then detected with full information in the stack trace. Unfortunately
this introduces a leak of its own, since now the memory allocated
in dlopen() is never freed, so we get another spurious Valgrind
memory leak warning.
Another possible way to get the same effect is to pass
the RTLD_NODELETE flag to dlopen() to achieve the
same effect, though I did not try this yet.
A possibly better work-around (which I also did not try yet) is one suggested
in the above referenced Valgrind feature request. By adding the offending
plugin(s) as LD_PRELOAD when starting the server, the plugin code
will not actually be unloaded in dlclose(), so stack traces
should be available without any spurious leak warnings from Valgrind. However,
this will not work well if some of the dynamic plugins need a particular load
order (according to the suggestion in the feature request). I also need to
check if this actually works for plugins (like storage engines) that has link
dependencies to symbols in the main program. But it might be a good option if
it can be made to work.
(At first I was surprised to learn that this was a problem in MySQL and
MariaDB, as I never saw it before. But I suppose the reason is that we so far
have built most plugins as built-in, rather than as dynamically
loaded .so files. The problem is likely to occur more frequently
as we are moving to do more and more with plugins in MariaDB, so it is nice to
know a work-around. Thanks, Monty!)
Tags: debugging, developmentprocess, drizzle, freesoftware, mariadb, mysql, programming, valgrind
|
11:03 am
[Link] |
MariaDB talk at the OpenSourceDays 2010 conference
Earlier this month, I was at
the OpenSourceDays 2010
conference, giving a talk on MariaDB
(the slides from the talk are
available).
The talk went quite well I think (though I probably talked way too fast as I usually
do; at least that means that I finished on time with plenty room for
questions..)
There was quite a bit of interest after the talk from many of the
people who heard it.
It was even
reported on
by the Danish IT media version2.dk (article in Danish).
Especially interesting to me was to discuss with three people
from Danish site komogvind.dk, who told
me fascinating details about their work keeping a busy site running; one of
them even went right home to benchmark against MariaDB. Thanks to you, and to
everyone else for your interest and time!
This time in the talk, I tried to also focus on the community and development
aspects of MariaDB (in addition to the mandatory feature list and benchmark
graphs, of course). To me, the most important thing about MariaDB is that we
now have the infrastructure and community for people outside of MySQL to do fullscale
development at the same level as inside MySQL. This was missing before. It is
a much less concrete thing than features and benchmarks, so I found it much
harder to present in a good way, without it turning into nothing but buzzwords. But from the
feedback I got afterwards, it seems I succeeded pretty well with this part
also, which I am especially happy about!
The talk was recorded on video by the organisers. The latest I heard was that
the video footage is still being edited though (I was kind of waiting, hoping
to be able to include a link to the video in this post). But if they do manage
to finish the editing and make the videos available later, I will post an update.
A big thanks to the organisers of the OpenSourceDays 2010 conference! I had
a great time, and hope to be back again next spring for OpenSourceDays 2011.
Tags: conference, freesoftware, mariadb, mysql
|
12:44 pm
[Link] |
Conference time!
It is conference time for me. I just came home
from FOSDEM 2010 where we had a booth
and I gave a talk. At the end of the month there will be a company meeting in
Iceland for Monty Program, followed
by Open Source Days 2010
where I will also be speaking. And then in April there is
the MySQL User Conference.
With two additional talks given at local user groups end of last year, I think
I've about filled my quota for now, I feel quite fortunate that it turned out
that I will not also be presenting at the UC! (I do not have a natural talent
for speaking, and tend to need to spend quite a lot of time in preparations.)
Having a booth at FOSDEM turned out really well I think, as I got to talk to a
lot of different people that passed by the booth. I also had a very nice
dinner with the PostgreSQL people where I learned a lot about the internals of
that database. As well as dinners with people from the MySQL world, also with
lots of interesting discussions.
Thanks to all of the people that I met at FOSDEM. It was fun and inspiring to
meet you, looking forward to the next time!
One thing strikes me as I am piecing together the mandatory "MariaDB feature
list" for my next talk. It seems people tend to focus a lot on the extra
features MariaDB has over MySQL. But there is another aspect that I think is
just as important: MariaDB is creating an open framework where community
developers can work together on new development. This is something that has
been missing in the past.
It is easy to focus on a concrete list of features, whereas the idea of an
abstract framework is much harder to present as more than buzzword talk. But I
will try to get it into my next talk, as I think ultimately both are of
equal importance.
Tags: developmentprocess, freesoftware, mariadb, mysql
|
11:50 am
[Link] |
Why I work on Free Software
I happened upon this old
LinuxJournal article about how the University of Zululand in South Africa
used MySQL and other Free Software to make
do with a 128 kbit (and later 768 kbit) internet connection for their staff
and students.
This made me remember the trip I made to another African
country, Burkina
Faso, 15 years ago:
With the huge amount of work and numerous difficult obstacles facing my work
on the MariaDB project, it can be hard to
keep up motivation at times. It helps to remember why I am doing this.
It must be about the time when some of these kids should go to University or
start up new projects. Maybe some of them will work with software. I want them
to be able to invest in local skills and infrastructure that they need, rather
than in software licenses funding nice houses and boats in other countries:
(House and yacht pictures courtesy of Wikipedia under
the Creative Commons Attribution ShareAlike 3.0)
Tags: freesoftware, mariadb, mysql
|
[<< Previous 10 entries] |