|
Kristian Nielsen
[Recent Entries][Archive][Friends][User Info]
Below are the 17 most recent journal entries recorded in the "Kristian Nielsen" journal:
12:49 pm
[Link] |
Oracle speculations
The Planet MySQL has been abuzz with
opinions for or against the acquisition of Sun (and in particular MySQL) by
Oracle, but I do not have a strong opinion to chime in with in support of
either groups.
The reason is that I do not know anything about antitrust laws, which is the
legal basis for the EC blocking or not blocking the deal; and also I do not
know what the alternative is to Oracle buying the MySQL part of Sun.
However, that does not mean that I can not join in the speculations about
Oracles reasons for wanting MySQL in the first place ;-)
I think it is basically a matter of obtaining control over MySQL.
The horror scenario for Oracle is that MySQL (or Postgress or another Free
Software program) does to the proprietary databases what Linux has done to the
proprietary Unixes. Which is essentially to kill them, slowly but surely. This
is not an immediate threat to Oracle, but it is a real long-term
threat given how similar the technical challenges are of developing a
kernel/OS and a database/RDBMS. And should it happen, the impact on
the license revenues from the Oracle RDBMS would be devastating.
Compared to that horror scenario, the potential loss of some fraction of
Oracle license sales which will stem from the continued development of MySQL
is of less consequences. And this loss will
happen in any case, if not to MySQL then to Postgress, to a
MySQL fork, or to another free database.
So from this reasoning, it makes the most sense for Oracle to continue
development of MySQL more or less unchanged from what it was in MySQL AB and
later at Sun. At a sufficiently high level (in terms of bug fixes, features,
etc.) that most of the community interest will remain on MySQL and not turn to
forks or other Free databases that are outside of Oracles control. And keeping
the very tight control over the development that MySQL AB and Sun also had,
with the community having basically no influence over what goes into the code
or not. Then, should it ever become necessary, Oracle has the control it needs
to prevent or at least manage the above-mentioned horror scenario.
In fact, this is exactly what Oracle has done with InnoDB since
buying them four years ago. The development has continued more or less as
before, as far as I can tell at a similar pace and with the same team. And
while there is a fork
in XtraDB,
the Oracle-controlled InnoDB is good enough that by far the majority
of the community is using it and not XtraDB.
So basically, after buying InnoDB Oracle has done essentially nothing with it,
one way or the other. Notably, Oracle has not done any of the
following or similar bold and visionary steps with InnoDB:
- Transfered some of the technology from the Oracle RDBMS product into
InnoDB which is currently missing. Like multiple tablespaces which can be
assigned independently to tables (even though
MySQL already
has the syntax support for this). Or the use of multiple buffer pools to
control page eviction. Or online backup based on the transaction log
(innobackup is not part of the GPL InnoDB version, and there is no
integration with the new MySQL backup interface). Etc. etc.
- Opened up the development process like other Open Source projects, with
public revision control repository (as far as I know they have not even
switched to using bzr like the rest of the MySQL world), public mailing
lists for reviews and discussions, public bug tracker, etc.
- Taken steps to integrate the development process better with MySQL AB and
later Sun. As far as I know, changes to the InnoDB included in the MySQL
source tree still happens by manually sending patches from Oracle to MySQL
which are then manually committed to the MySQL tree! Even worse, much of the
development of new features has taken place on a separate product, the
InnoDB plugin, which is not enabled by default (it was not even in the MySQL
source tree until this Summer), and used by only a minority of the community.
- Given up any kind of control of the development to the community
(eg. as far as I know Oracle has taken no steps to work together with
XtraDB).
Nothing I have seen in the statements or discussions so far seems to suggest
that Oracle would treat an acquisition of MySQL differently.
Will this be good or bad for MySQL? Without an alternative scenario to compare
with, I do not know. But it certainly is not sufficient! MySQL development has
been stalling for several years, and we need to invigorate it to make MySQL
meet the new challenges facing existing applications, and to improve MySQL for
use in applications where it is currently weak.
We need improved management of huge databases: tablespace management; buffer
pool control; backup infrastructure; etc. We need replication improvements:
binlog storage inside default storage engine for improved transaction
handling; interleaved logging of transactions; multi-threaded application of
row-based events in MVCC engines; robust automatic handling of fail-over
scenarios, etc. We need refactoring of the core server to enable future
development: new parser; separate abstract syntax tree; move to modern
multi-threading architecture with lock-free operations
and RCU;
etc. We need scalability improvements to multi-core computers. We need
versioned metadata for better support of on-line DDL. We need
server-side cache of already executed statements with access to per-statement
statistics and execution plans. We need merge and hash joins. We need better
integration of the many new storage engines being developed: inclusion by
default in source and binaries to make them easy to try and use; extensions to
the storage engine API to better interface to the new engines and fully
exploit each of their unique features. And lots more.
Will Oracle take the lead on some of these, and give up sufficient control for
the community to take the lead on the rest? Well, we do not know. It does seem
hard to find a motivation for Oracle to drive the development of MySQL into
new areas that will necessarily cannibalise their huge license revenues. On
the other hand, if they do, they will be most welcome, and we look forward to
hopefully working together with them. In any case, we must not blindly rely on
this to happen, not from Oracle or any other company which may end up with the
MySQL assets!
I sincerely hope that whatever happens to MySQL the company, a sufficient part
of the community will remember that we need not just a MySQL (under whatever
name) that is "good enough" today, we also need a MySQL for tomorrow. And for
this the community needs to support those that step up to lead future MySQL
development, whoever it will be.
So will it be MariaDB leading? We still have a way to go before we
have proven ourselves worthy to saying this. But what I can say is that
we are trying!
Tags: developmentprocess, freesoftware, mariadb, mysql, oracle
|
12:09 pm
[Link] |
MariaDB Buildbot configuration file published
I have now published the
Buildbot
configuration file
that we use for our continuous integration tests in
our Buildbot setup.
Every push into main and development branches of MariaDB is built and tested
on a range of platforms to catch and fix any problems early (and we
also test
MySQL releases before merging to easily see whether any new problems already
existed in MySQL or were introduced by something specific to MariaDB).
The configuration is included in
the Tools for MariaDB
Launchpad project.
Now, the Buildbot configuration file is not something that most MariaDB users
will need or want to care about, of course. But I think it is still very
important to have it publicly available, not sitting on some private server of
the company Monty Program AB.
The reason is that the whole idea with MariaDB is to make a community
branch of MySQL, developed by the community and for the community. We want
MariaDB the project to be bigger than Monty Program AB the company. And since
the Buildbot testing is so central to the whole MariaDB development process,
the Buildbot setup also needs to be available for the community. Want to
improve the setup, just see what it is doing, or even set up your own master
to show you can do a better job (and yes our Windows setup currently really
suck)? Just go ahead! Wondering how the Buildbot setup can be continued if
Monty Program AB disappears or turns fascist? Now there is an answer.
Hopefully the configuration can also be useful as an example for people doing
fancy things with Buildbot. There is some cool stuff in there. Like creating a
source tarball on a linux host, and uploading it to be built on a Windows host
(this is how releases are done, so important to check that no files are
missing from the source tarball). Another cool thing is the builders that
boots op KVM virtual machines on demand to build and test binary packages
(.deb, .rpm, and .tar.gz) on all of the 18 Linux platforms we currently
release for.
BTW, you do not get the miscellaneous passwords in the published configuration
file, sorry! :-)
[The license for the configuration file (which is in fact a sizable Python
script, as this is the way Buildbot is configured) is GPL.]
Tags: debugging, developmentprocess, freesoftware, mariadb, mysql, testing
|
01:59 pm
[Link] |
Fixing a MariaDB package bug
One of the things that I am really happy about in MariaDB is that we have our
releases available as apt (and yum for Centos) repositories. This is
largely thanks to being able to build this on
the OurDelta package build infrastructure
(which again builds on things like the Debian packaging scripts for MySQL).
Something like the Debian apt-get package system (which is also
used by Ubuntu) is one of the major innovations in the Free Software world in
my opinion. Debian has spent many years refining this system to where it is
today. Want to run the mysql client, but it isn't installed? Just try to run
it on your local Ubuntu host:
$ mysql
The program 'mysql' can be found in the following packages:
* mysql-client-5.0
* mysql-client-5.1
Try: sudo apt-get install <selected package>
-bash: mysql: command not found
Installing software does not get much easier than that!
Now, MariaDB is not in the distributions yet. However, it is easy to add
external repositories like OurDelta into your system, after which packages
from the external repositories are available fully integrated with the package
system:
wget -O- http://ourdelta.org/deb/ourdelta.gpg | sudo apt-key add -
sudo wget http://ourdelta.org/deb/sources/karmic-mariadb-ourdelta.list \
-O /etc/apt/sources.list.d/ourdelta.list
(there are also GUI ways to do this of course, for those who prefer that).
After this is done, installation is just a sudo apt-get install mariadb-server-5.1
away, security updates will appear automatically for MariaDB just like any
other package, etc. It will even upgrade an existing MySQL 5.0 installation
automatically (but do take a backup first).
In order to make all this work, there is a lot of work going on behind the
scenes in the scripts that make up the .deb packaging. I think most people
underestimate the amount of work and clever engineering that goes into making
a well-working .deb package. It is easy to laugh at how behind the latest
stable Debian release is on software versions (and I am happy to do this on
occasion as well). But Debian is still unique in the sheer amount of software
it contains and the level of integration of each package in the whole
system. And it is this work through the last more than 15 years that allows
something like Ubuntu to exists, with an upgrade system that allows it to do
6-month release cycles to provide up-to-date software to its users.
As an example of the kind of details that needs to be dealt with, I wanted to
explain a tricky packaging bug I fixed recently.
The bug was with installing MariaDB 5.1.39 on top of an existing MySQL 5.0
installation in Debian and Ubuntu. This is supposed to automatically run the
mysql_upgrade program to upgrade all tables from the 5.0 format to the 5.1
format. The symptoms were that the upgrade was not performed correctly, for
example the system tables in the mysql database were missing some
of the columns added in the 5.1 version.
What made this tricky was that the bug was quite elusive. It did not happen
always, and some platforms were ok while others (eg. Ubuntu Hardy amd64) seemed to
have it more or less repeatedly. Even worse, even when it did occur, it went
away by itself (this is because the .deb MariaDB and MySQL packages actually
check for the need to upgrade the database on every server start, and the
upgrade procedure always worked when the server was restarted after
installation).
After poking for this for some hours, I managed to get it reproducibly on a
KVM virtual machine containing Ubuntu Hardy. I then traced the problem into
the upgrade procedure, which happens in
the /etc/mysql/msyql_upgrade script which is run
from /etc/init.d/mysql start. Comparing the log from this (in
syslog) with the corresponding log from a successful installation showed that
the upgrade procedure seemed to be aborted half-way through, but with
absolutely no output (like an error message) to indicate any reason for this.
Now the fact that running the upgrade procedure (or just server start)
manually did not exhibit any problems made it quite puzzling to
figure out what was going on. Looking through the source code
of mysql_upgrade and mysqlcheck (which is called
from mysql_upgrade) did not reveal anything that would indicate a
problem like this. One step would be to start instrumenting the code with
printouts, but this would require building a full set of packages and
installing them inside a virtual machine for every iteration, which would have
been quite time-consuming.
A better approach turned out to be to install the package running
under strace -f. This generates a log (45 MByte of it!) of all
system calls made by the installation process, including child processes
like the mysql_upgrade invocation. Digging through this for a
while, I finally discovered that the upgrade process was being terminated
because it received a SIGHUP (hangup) signal!
Now why would it be killed with SIGHUP? Turns out it is due to this snippet
from the /etc/mysql/debian-start bash script:
(
upgrade_system_tables_if_necessary;
check_root_accounts;
check_for_crashed_tables;
) <&2 &
The upgrade procedure is run in the background (as it may take a long time,
and since this is run on every server restart, it could also happen eg. during
host boot, which we do not want to block for long periods of time). But there
is no protection against the controlling terminal going away! My guess is
that apt-get in some cases will allocate a pseudo-tty to deal
with package configuration input, and this is closed when installation is
done, causing the background upgrade procedure to be killed with SIGHUP.
Now finally the problem is understood, and the fix is a one-liner: just add
this before starting the background job:
trap "" SIGHUP
Problem solved!
[Incidentally, if you installed MariaDB 5.1.39 .debs and experienced this
problem, there is an easy workaround for this bug: just restart the MariaDB
server once after installation (sudo /etc/init.d/mysql restart).
This will make the upgrade procedure go through if it did not already. This
fix will be included in the upcoming MariaDB 5.1.41 release.]
Tags: debian, debugging, freesoftware, mariadb, mysql, programming, ubuntu
|
11:07 am
[Link] |
Building MariaDB/MySQL with Buildbot and KVM
Testing and automation. These two are key to ensuring high quality of software
releases.
Ever since I worked briefly in the team at MySQL AB that is responsible for
creating the binary (and source) packages of MySQL releases, I have had the
vision of a fully automated release procedure. Whenever someone pushes a new
commit to the release branch revision control tree, the
continuous
integration test framework should kick in and do all the steps needed for
producing release packages:
- Checkout the new revision.
- Build a source tarball, and save it.
- For each platform, build a binary package from the source tarball. The
build should be done in a freshly installed machine without any revision
control checkouts, previous build trees, or extra installed software, to
ensure that no unwanted dependencies or stray references to other files or
packages are introduced.
- For each platform, install the binary packages, this time on freshly
installed machines with also no build tools (compilers, development
packages, etc.) installed, to check that they install correctly without
unexpected dependencies. Run tests of the installed server, including
starting the server and running basic queries and test suites.
- Upgrade testing, installing the new packages on machines prepared with
earlier installations, and testing that the upgrade procedure works and
preserves old data.
To do this efficiently, clearly the use of virtual machines is needed. This
weekend I played with KVM and
Buildbot, and managed to set up a
proof-of-concept of this that I am really pleased with.
KVM
There are lots of options for virtualisation these days, including KVM, Xen,
VirtualBox, and Vmware. I use KVM, and I really like it. The integration into
the distributions is excellent (sudo apt-get install kvm and
you're up and running). The interface is powerful and flexible, and at the
same time really simple to learn and use. Just a couple of commands with man
pages, like it should be in a Unix system. Basically, it just works!
I started by installing a basic ubuntu Jaunty server in a virtual machine:
qemu-img create -f qcow2 vm-jaunty-i386-base.qcow2 8G
kvm -m 2047 -hda vm-jaunty-i386-base.qcow2 -cdrom ubuntu-9.04-server-i386.iso \
-boot d -smp 2 -cpu qemu32,-nx -net nic,model=virtio -net user -redir tcp:2222::22
I use the user mode network stack with port forwarding for ssh access. This
allows to run kvm without root privileges, avoids any need to manage different
MAC addresses, avoids the need for routing or configuring interfaces, etc.
Using the virtio network driver greatly improved throughput for
me when copying things into and out of the virtual machine. The -cpu
qemu32,-nx (disable "No eXecute" support) is needed in this case due to
some bug or incompatibility, or the installation hangs upon reboot. As usual
Google is your friend in cases like this:
Incidentally, I did this using remote X over an SSH connection. This works
fine, no need for physical access to the host server. After installation we
will run the virtual machine without a graphic console, but it was just easier
to use the stock Ubuntu installer than trying to find a way to install over
the emulated serial port.
Initial setup
Next I did some basic preparation to make the installed virtual machine work
well for command line and script usage. However, the amount of extra packages
installed is kept to a minimum to get proper testing against unwanted
dependencies.
I Installed ssh server for remote access. I then set it up to use the serial
console (as we will be running kvm in -nographic mode). To get a login prompt
on serial port 0, create /etc/event.d/ttyS0:
start on stopped rc2
start on stopped rc3
start on stopped rc4
start on stopped rc5
stop on runlevel 0
stop on runlevel 1
stop on runlevel 6
respawn
exec /sbin/getty 115200 ttyS0
To get the kernel to output its boot log to the serial port, edit the kernel
line in /boot/grub/menu.lst, removing quiet splash
and adding console=ttyS0,115200n8 console=tty0. To get Grub to
use the serial port, add these lines to /boot/grub/menu.lst:
serial --unit=0 --speed=115200 --word=8 --parity=no --stop=1
terminal --timeout=3 serial console
Next, we need a user account inside the virtual machine that we can use from
the outside with passwordless login and sudo access. Inside the
guest, create the account and grant passwordless sudo:
sudo adduser --disabled-password buildbot
sudo adduser buildbot sudo
sudo visudo
# uncomment `%sudo ALL=NOPASSWD: ALL'
Then, in the host create an SSH public/private key pair without passphrase:
ssh-keygen -t dsa
Copy the resulting ~/.ssh/id_dsa.pub from the host
into ~/.ssh/authorized_keys in the guest.
Now we should be able to test that things work:
kvm_pid_2222' ; exec kvm -m 2047 -hda /kvm/vms/vm-jaunty-i386-makedist.qcow2 \
-redir 'tcp:2222::22' -boot c -smp 2 -cpu qemu32,-nx -nographic \
-net nic,model=virtio -net user
# We should get a login prompt in the terminal window
ssh -p 2222 buildbot@127.0.0.1 'sudo id'
# We should get root access without login or sudo asking for password.
We now have the basis for scripting actions against the virtual machine: We
can start up the guest from the command line (and shutdown
with kill from the host or sudo shutdown -h now from
the guest). And we can run commands inside the guest using
ssh -p 2222 buildbot@127.0.0.1. The next step is to create
variants of this base virtual installation for the different purposes we need.
qemu-img create -b base_image.qcow2
The qcow2 virtual hard disk image format used by qemu (and kvm) has a very
powerful feature, activated with the -b option of
qemu-img create:
qemu-img create -b vm-jaunty-i386-base.qcow2 -f qcow2 vm-jaunty-i386-makedist.qcow2
This creates a new image vm-jaunty-i386-makedist.qcow2, which is
initially a clone of the base image vm-jaunty-i386-base.qcow2
that takes up (almost) no extra space. But as we use this new image, changes
are added in the new image (copy-on-write), without modifying the original
base image. This allows painless mass cloning and modification of virtual
machines without having to re-install, and without taking up unnecessary extra
disk space and I/O for copying images.
We use this to create a virtual machine that we will use to produce the source
tarball from bzr sources. This needs installing bzr and some development
packages (compilers etc).
sudo apt-get install bzr
sudo apt-get build-dep mysql-5.1-server
Note the very nice build-dep feature of apt-get, it
actually installs a ton of packages needed to build the MySQL server (and
MariaDB has the save dependencies). I also copied in an existing shared bzr
repository; this is not strictly necessary, but saves a very painful initial
cloing of the entire MariaDB repository from Launchpad (bzr is just painfully
slow on source trees of the size of MariaDB/MySQL):
scp -rp -P 2222 .bzr buildbot@127.0.0.1:
Another virtual machine image is set up for building the binary packages (this
does not need bzr):
qemu-img create -b vm-jaunty-i386-base.qcow2 -f qcow2 vm-jaunty-i386-build.qcow2
(with a bit more planning, I could have cloned -makedist
from -build; now I just repeated the install
of mysql-server-5.1 dependencies, but not the bzr install. Also,
a refinement wouldbe to setup the -build guest without autotools
and bison, to check that build is possible without those installed).
Finally, a third image for testing installation:
qemu-img create -b vm-jaunty-i386-base.qcow2 -f qcow2 vm-jaunty-i386-install.qcow2
I will be testing a bintar package install, so create the mysql user and
group:
sudo adduser --system --group mysql
With these preparations, we should be ready to put the pieces together:
Buildbot
For MariaDB, we use Buildbot for continuous integration testing. The Pushbuild
system I developed at MySQL was never released publicly, and in any case it is
better to use a general tool like Buildbot that is widely used and maintained
by a large community.
I have been very satisfied with Buildbot. It has its quirks and bugs, but we
can fix those over time (and have fixed a number of them already, as well as
added extra features we needed). I think Buildbot has all of the right ideas
for doing serious continuous integration testing. As I read in some
presentation, running the builds and tests is the easy part. The hard part is
providing the information and tools needed by developers to fix problems that
are found by testing. Fixing these problems is what it is all about, after
all, not just producing pretty status reports.
First, I installed a buildbot slave on the host machine:
sudo apt-get install buildbot
sudo addgroup buildbot kvm # To allow buildbot to run kvm
sudo -u buildbot buildbot create-slave --usepty=0 /var/lib/buildbot/maria-slave \
hasky.askmonty.org:9989 knielsen-kvm-x86 <password>
Then I set up an account for this in the Buildbot master, and configured the
builder.
With the above preparation, configuring the build is just setting up the
proper shell commands to be run against the slave, although it is of course a
bit more involved than for a normal configure+make. I really like
the simplicity of this. Basically, after initial preparation of the KVM
images, there is very little setup required on the buildbot slave host, it is
all just normal shell commands configured on the master. Of course going
forward we can refine some of this and maybe put some of it into generic
scripts called from the main config, but for a proof-of-concept I think it is
brilliant that one can see exactly which commands are run.
I included the complete config in all detail at the end of this post, but here
are the main points.
f_kvm_jaunty_x86.addStep(Compile(
logfiles={"kernel": "kernel_2222.log"},
command=["sh", "-c", """
kill -9 "$(cat kvm_pid_2222)"
(exec sh -c "echo \$\$ > 'kvm_pid_2222' ; exec kvm -m 2047 \
-hda /kvm/vms/vm-jaunty-i386-makedist.qcow2 -redir 'tcp:2222::22' -boot c \
-smp 2 -cpu qemu32,-nx -nographic -net nic,model=virtio -net user" \
</dev/null >kernel_2222.log 2>&1) &
sleep 15
while : ; do ssh -o ConnectTimeout=4 -p 2222 buildbot@127.0.0.1 true && break; sleep 2; done
ssh -p 2222 buildbot@127.0.0.1 'mkdir -p buildbot && cd buildbot && \
rm -Rf build && bzr co "lp:~maria-captains/maria/mariadb-5.1-knielsen" build && \
cd build && BUILD/compile-dist && make dist && \\
mv "$(make show-dist-name).tar.gz" ..'
"""]))
The kill command removes any previous left-over kvm process
(better safe than sorry). We run kvm in the backgroud, getting the console
output through a log file. Note that redirecting the kvm output is necessary,
as the buildstep will wait for all processes to close the stdout before
considering the buildstep done.
After starting the virtual machine, we wait for boot to have completed by
checking for successful ssh connection in the while loop. Once it
is ready, we send the commands to build the source tarball into the guest
using ssh.
</p>
f_kvm_jaunty_x86.addStep(SetProperty(
property="distname",
command=["ssh", "-p", "2222", "buildbot@127.0.0.1", "cd buildbot/build && make show-dist-name"],
))
This gets the base name of the source tarball into a Buildbot build
property, an essential feature of Buildbot for more advanced usage. We
will need this name in the following build steps (the name depends on the
version of the MariaDB server code).
f_kvm_jaunty_x86.addStep(ShellCommand(
command=["sh", "-c", WithProperties("""
scp -P 2222 buildbot@127.0.0.1:buildbot/%(distname)s.tar.gz .
ssh -p 2222 buildbot@127.0.0.1 'sudo shutdown -h now'
while : ; do sleep 5; kill -0 "$(cat kvm_pid_2222)" || break; done
rm -f kvm_pid_2222
""")],))
We copy out the generated source tarball (we will need it in the next
buildstep, which runs in a different virtual machine). We then shutdown this
guest, and wait for it to finish with another while loop. Note
the use of WithProperties to interpolate the source tarball name
obtained in the previous build step.
f_kvm_jaunty_x86.addStep(Compile(
command=["sh", "-c", WithProperties("""
qemu-img create -b /kvm/vms/vm-jaunty-i386-build.qcow2 -f qcow2 vm-tmp-2222.qcow2
kill -9 "$(cat kvm_pid_2222)"
(exec sh -c "echo \$\$ > 'kvm_pid_2222' ; exec kvm -m 2047 -hda vm-tmp-2222.qcow2 \
-redir 'tcp:2222::22' -boot c -smp 2 -cpu qemu32,-nx -nographic -net nic,model=virtio \
-net user" </dev/null >>kernel_2222.log 2>&1) &
# ...
ssh -p 2222 buildbot@127.0.0.1 'rm -Rf buildbot && mkdir buildbot'
scp -P 2222 %(distname)s.tar.gz buildbot@127.0.0.1:buildbot/
ssh -p 2222 buildbot@127.0.0.1 'cd buildbot && tar zxf %(distname)s.tar.gz && \
cd %(distname)s && ./configure ...'
# ...
""")],))
Here (and in the following install step), we use
qemu-img create -b to create a new, temporary image to work
in. This ensures that each build will run in a clean, fresh install, without
any risk of contamination from previous builds. (The reason we did not do this
for the initial step is that we want to save the bzr revisions pulled from
Launchpad so we do not have to keep repeatedly pulling the old ones over for
each new build. An alternative would be to keep the permanent shared
repository on the host machine and export from that inside the virtual
machine).
And that's it! Full config details below, but it is basically the same, just
with different commands run in the different steps. The result is a builder
that fully automatically tests build and install on real machines with the
correct setup, 100% repeatable between builds.
The results from this can be seen on the
MariaDB Buildbot pages. Things are likely to to shuffle around as we extend
and refine this, but for now an example build can be seen here:
This is just a quick proof of concept, but I think all of the essential
ingredients are in there. I am hoping that in the not too distant future we will be
using something like this regularly to check MariaDB release builds, which
should be very good for getting ensuring both the quality and efficiency of
future MariaDB releases!
Full config
f_kvm_jaunty_x86= factory.BuildFactory()
f_kvm_jaunty_x86.addStep(Compile(
description=["making", "dist"],
descriptionDone=["make", "dist"],
logfiles={"kernel": "kernel_2222.log"},
command=["sh", "-c", """
kill -9 "$(cat kvm_pid_2222)"
(exec sh -c "echo \$\$ > 'kvm_pid_2222' ; exec kvm -m 2047 \
-hda /kvm/vms/vm-jaunty-i386-makedist.qcow2 -redir 'tcp:2222::22' -boot c -smp 2 \
-cpu qemu32,-nx -nographic -net nic,model=virtio -net user" \
</dev/null >kernel_2222.log 2>&1) &
sleep 15
while : ; do ssh -o ConnectTimeout=4 -p 2222 buildbot@127.0.0.1 true && break; sleep 2; done
ssh -p 2222 buildbot@127.0.0.1 'mkdir -p buildbot && cd buildbot && \
rm -Rf build && bzr co "lp:~maria-captains/maria/mariadb-5.1-knielsen" build && \
cd build && BUILD/compile-dist && make dist && \
mv "$(make show-dist-name).tar.gz" ..'
"""
],
))
f_kvm_jaunty_x86.addStep(SetProperty(
property="distname",
command=["ssh", "-p", "2222", "buildbot@127.0.0.1",
"cd buildbot/build && make show-dist-name"],
))
f_kvm_jaunty_x86.addStep(ShellCommand(
description=["copying", "tarball"],
descriptionDone=["copying", "tarball"],
logfiles={"kernel": "kernel_2222.log"},
command=["sh", "-c", WithProperties("""
scp -P 2222 buildbot@127.0.0.1:buildbot/%(distname)s.tar.gz .
ssh -p 2222 buildbot@127.0.0.1 'sudo shutdown -h now'
while : ; do sleep 5; kill -0 "$(cat kvm_pid_2222)" || break; done
rm -f kvm_pid_2222
""")],
))
f_kvm_jaunty_x86.addStep(Compile(
description=["making", "bintar"],
descriptionDone=["make", "bintar"],
logfiles={"kernel": "kernel_2222.log"},
command=["sh", "-c", WithProperties("""
qemu-img create -b /kvm/vms/vm-jaunty-i386-build.qcow2 -f qcow2 vm-tmp-2222.qcow2
kill -9 "$(cat kvm_pid_2222)"
(exec sh -c "echo \$\$ > 'kvm_pid_2222' ; exec kvm -m 2047 -hda vm-tmp-2222.qcow2 \
-redir 'tcp:2222::22' -boot c -smp 2 -cpu qemu32,-nx -nographic -net nic,model=virtio \
-net user" </dev/null >>kernel_2222.log 2>&1) &
sleep 15
while : ; do ssh -o ConnectTimeout=4 -p 2222 buildbot@127.0.0.1 true && break; sleep 2; done
ssh -p 2222 buildbot@127.0.0.1 'rm -Rf buildbot && mkdir buildbot'
scp -P 2222 %(distname)s.tar.gz buildbot@127.0.0.1:buildbot/
ssh -p 2222 buildbot@127.0.0.1 'cd buildbot && tar zxf %(distname)s.tar.gz && \
cd %(distname)s && CC="gcc -static-libgcc" CXX="gcc -static-libgcc" \
CFLAGS="-O2 -fno-omit-frame-pointer -g" CXXFLAGS="-O2 -fno-omit-frame-pointer -g" \
./configure --prefix=/usr/local/mysql --exec-prefix=/usr/local/mysql \
--libexecdir=/usr/local/mysql/bin --localstatedir=/usr/local/mysql/data \
--with-server-suffix=1 --with-comment="(MariaDB - http://askmonty.org/)" \
--with-system-type=linux-gnu --enable-shared --enable-static --enable-thread-safe-client \
--enable-local-infile --with-big-tables --with-libwrap --with-ssl --without-docs \
--with-readline --with-extra-charsets=all --with-embedded-server --with-libevent \
--with-partition --with-zlib-dir=bundled --with-plugins=max-no-ndb && make -j3 && \
sudo rm -Rf /usr/local/mysql && sudo make install && \
sudo mv /usr/local/mysql /usr/local/%(distname)s-Linux-x386 && \
tar zcf ../%(distname)s-Linux-x386.tar.gz -C /usr/local %(distname)s-Linux-x386/'
scp -P 2222 buildbot@127.0.0.1:buildbot/%(distname)s-Linux-x386.tar.gz .
ssh -p 2222 buildbot@127.0.0.1 'sudo shutdown -h now'
while : ; do sleep 5; kill -0 "$(cat kvm_pid_2222)" || break; done
rm -f kvm_pid_2222
""")],
))
f_kvm_jaunty_x86.addStep(Test(
description=["testing", "bintar"],
descriptionDone=["test", "bintar"],
logfiles={"kernel": "kernel_2222.log"},
command=["sh", "-c", WithProperties("""
qemu-img create -b /kvm/vms/vm-jaunty-i386-install.qcow2 -f qcow2 vm-tmp-2222.qcow2
kill -9 "$(cat kvm_pid_2222)"
(exec sh -c "echo \$\$ > 'kvm_pid_2222' ; exec kvm -m 2047 -hda vm-tmp-2222.qcow2 \
-redir 'tcp:2222::22' -boot c -smp 2 -cpu qemu32,-nx -nographic -net nic,model=virtio \
-net user" </dev/null >>kernel_2222.log 2>&1) &
sleep 15
while : ; do ssh -o ConnectTimeout=4 -p 2222 buildbot@127.0.0.1 true && break; sleep 2; done
ssh -p 2222 buildbot@127.0.0.1 'rm -Rf buildbot && mkdir buildbot'
scp -P 2222 %(distname)s-Linux-x386.tar.gz buildbot@127.0.0.1:buildbot/
ssh -p 2222 buildbot@127.0.0.1 'cd buildbot && \
sudo rm -Rf /usr/local/mysql /usr/local/%(distname)s-Linux-x386 && \
sudo tar zxf %(distname)s-Linux-x386.tar.gz -C /usr/local/ && \
sudo ln -s %(distname)s-Linux-x386 /usr/local/mysql && cd /usr/local/mysql && \
sudo sudo chown -R mysql . && sudo chgrp -R mysql . && \
sudo bin/mysql_install_db --user=mysql && sudo chown -R root . && \
sudo chown -R mysql data mysql-test && \
cd mysql-test && sudo su -s /bin/sh -c "perl mysql-test-run.pl alias" mysql'
ssh -p 2222 buildbot@127.0.0.1 'sudo shutdown -h now'
while : ; do sleep 5; kill -0 "$(cat kvm_pid_2222)" || break; done
rm -f kvm_pid_2222
""")],
))
bld_kvm_jaunty_x86 = {'name': 'kvm-jaunty-x86',
'slavename': 'knielsen-kvm-x86',
'builddir': 'kvm-jaunty-x86',
'factory': f_kvm_jaunty_x86,
}
c['builders'].append(bld_kvm_jaunty_x86)
Tags: developmentprocess, mariadb, mysql, programming, testing
|
09:57 pm
[Link] |
Valgrinding Drizzle
Like so many others, I got interested in the Drizzle project when it
started. Some good ideas, lots of enthusiasm, and just pure GPL license, no
"yes, we will take your work for free and sell proprietary licenses to it"
SCA.
I even started contributing some development, fixing a number of
Valgrind-detected bugs in Drizzle. I am proud that we kept the MySQL code 100%
free of Valgrind errors, and wanted to help keep the same in Drizzle. So I
debugged and fixed quite a few of the Valgrind-detected bugs that had crept in
since forking from MySQL.
As I remember, I got down to two or three remaining or so. However, I it did
discourage me somewhat to see how quickly these bugs had been allowed to enter
the code. I remember one case where there was a Drizzle patch that had tried
to simplify some field types. As I remember, the patch tried to simplify the
code by eliminating some of multiple variants of string types. All well and
good, but then there was one place where this elimination was a bit tricky,
and the patch just #ifdef'ed out the offending part of the code, leaving the
resulting code completely broken, as detected by Valgrind. And this had been
in the source for 4 months! Cleaning up code is good, but not if only the easy
90% is done, and the rest is left undone. [Later the Drizzle people started
the nonsense with "Drizzle is GPL, but contributions are considered licensed
to Sun under BSD", and I kind of lost interest.]
Anyway, so half a year later I though it would be interesting to see how the
state of Valgrind is for Drizzle nowadays. So I branced the latest lp:drizzle
(and lp:libdrizzle), built it, and ran the test suite under Valgrind:
(cd tests && ./mtr --valgrind --force)
Unfortunately, the results are not good: 1900 Valgrind warnings!
The warnings are all kinds: Memory leaks,
mismatched free()/delete, uninitialised memory
accessed from system calls or conditional jumps, etc. Some are probably benign
or even false positives from Valgrind. Some are probably minor bugs, like tiny
memory leaks in seldom-used features or garbage in log output. And some are
most definitely serious bugs in the code that need to be fixed. With a flood
of 1900 errors, it is impossible to tell which is which without days of
careful study of the errors and debugging of the code.
I hope Drizzle will fix these issues. I have a lot of experience with
Valgrind, and I know how hard it is both to debug and fix the issues reported,
and also to get all developers to understand the importance of not allowing
code into the tree with Valgrind problems. But I have also learned how many
real, serious problems Valgrind can detect, problems that are often impossible
to otherwise catch during development. Valgrind warnings can be caused by
benign problems, but they are very rarely false alarms. But it is
important to fix problems quickly, otherwise the number of problems will pile
up until the sheer mass of issues makes it impossible to ever get back to a
clean state with zero warnings in the test suite.
(Drizzle has done other good stuff. Like Building with -Werror -pedantic
-Wall. This is something I hope we can soon duplicate in MariaDB. We do
have a clean Valgrind test run in MariaDB, and make sure we keep it by running
every push through Valgrind in Buildbot).
Tags: developmentprocess, drizzle, mariadb, mysql, programming, valgrind
|
09:27 am
[Link] |
Learning Python
Among other things, these past few months I have been working on setting up
Buildbot, including adding various
enhancements and bug
fixes that are needed to properly build and test the MariaDB and MySQL
code base.
Since Buildbot is written in Python, this means I have also had to learn
Python. I am an old-time Perl hacker, so this exercise feels a bit like living
in enemy territory ;-)
Anyway, Python is often touted as a more "pretty" language. And in many ways
it is. Still, it is not without its own gotchas. Think "scope
rules". Obviously someone haven't been reading up on the subject before
implementing things in Python, causing the language to behave stupidly (and
certainly different from what one expects) in the following three cases that I
hit during my Buildbot work.
First assignment is implicit scope declaration
def foo():
s = 0
def inc():
s = s + 1
print s
inc()
print s
This results in this error:
UnboundLocalError: local variable 's' referenced before assignment
Why? Because assigning to `s' declares a new variable. Yep, that's
right, a nested scope can read the value of a variable in an outer scope, but
it cannot assign it!
This is the work-around:
def foo():
s = { 'blarg': 0 }
def inc():
s['blarg'] = s['blarg'] + 1
print s['blarg']
inc()
print s['blarg']
Now the inner scope in inc() does not assign to the
outer variable `s'. It merely reads the value, and updates the dictionary it
contains. So now things work. Hm ...
Class vs. instance members
class Bar:
s = 0
def foo(self, x):
self.s += x
print self.s
a = Bar()
a.foo(5)
b = Bar()
b.foo(8)
So this example actually works as one would expect from first glance (it
prints "5" then "8"). But then when I looked closer, I did not understand how
it could work. That s = 0 creates a class member, shared
by all instances of the class. So how can each instance still get their own
private copy, each correctly initialised to 0?
Ah, the answer is another variant of assignment creating a new scope. Look
at self.s += x. This statement first reads s.self,
which provides the value of the class member. It then assigns the new
value to s.self, but since this is assignment, it now refers to
an instance member, so it creates a new instance member! I don't know
what those Python guys where thinking when they made self.s refer
to two different variables in a single statement ...
So this means that while the above example works as expected, this very
similar one does not:
class Bar:
s = []
def foo(self, x):
self.s.append(x)
print self.s
a = Bar()
a.foo(5)
b = Bar()
b.foo(8)
The last statement prints [5,8] as self.s is now a
class member shared among all instances.
The work-around here is to initialise member variables in the
constructor __init__(), not in the class declaration.
class Bar:
def __init__(self):
self.s = []
def foo(self, x):
self.s.append(x)
print self.s
a = Bar()
a.foo(5)
b = Bar()
b.foo(8)
Late-binding closure construction
b = []
for i in range(10):
b.append(lambda x: i)
b[0](42)
b[3](42)
This outputs the same value "9" twice. All of the functions in the list
return 9! Oops.
The reason is apparently that the closure created by (lambda ...)
does late binding of captured outer variables, meaning that it refers to the
name, not to the value at the time of closure construction. This is unlike any
other language I have ever seen that has lexical scoping, so quite confusing.
I know of two work-arounds in this case, neither of them pretty.
One is to use a dummy extra parameter with a default value:
b = []
for i in range(10):
b.append(lambda x, dummy=i: dummy)
b[0](42)
b[3](42)
See, when the variable i appears in the default value of a
parameter, it is bound early (so to the value of i is used, not
the name), different from when the variable appears in the body of the lambda
expression.
The other work-around is to build and call an extra closure to force binding to the
correct value:
b = []
for i in range(10):
b.append( (lambda j: (lambda x: j)) (i) )
b[0](42)
b[3](42)
This time, passing the value of i to the outer lambda forces
early binding, so we get the expected results.
Something to be aware of for an old-time Perl hacker like me, used to using
functional style when programming...
Tags: mariadb, mysql, programming, python
|
08:10 am
[Link] |
Placeholders and SQL injection, part 2
Actually, what I really wanted to blog about before getting
carried
away with irony yesterday was an old idea on how to force my developers to
use placeholders exclusively for SQL queries in applications. As should be
apparent from yesterdays blog entry, I am strongly in favour of using
placeholders for interpolating values into SQL queries, due to the great
reduction in potential bugs (including, but not limited to, SQL injections).
Basically, wrap the database API so that all database access passes through
the wrapper. This can usually be achieved, for example by subclassing DBI (for
Perl) and returning such subclasses from the application connection pool, or
other similar methods. Probably many large web applications already have such
wrappers or use APIs that can be patched or extended appropriately.
Now add code that basically bombs out with a big error message if any SQL
query contains a quote character. Something like "Always use placeholders
for interpolating values into SQL queries! If in disagreement, go see your
development lead for your regular spanking!" or words to the same effect.
Sometimes, the wrapper may sit below some code in the database API that
emulates placeholders (for example, DBD::mysql used to emulate placeholders in
the client using mysql_real_quote_string() or equivalent, since
real server-side placeholders are only available with the newer version of the
MySQL protocol for prepared statements). But even in this case, the wrapper
can still force the use of placeholders by exploiting the fact that MySQL
supports both single (')and double (") quotes. Basically, the wrapper would
set some private global variable at random to either a single or a double
quote, make placeholder emulation use one, and bomb out if the other is
detected in query strings. Then any developer trying to sneak manual quoting
into the application would quickly be caught, and subsequently taught.
The technique is not perfect. It does not catch completely unquoted number
interpolation (shudder). It will also be somewhat of an annoyance to have
to specify all string constants as placeholders (there is nothing wrong with
"SELECT value FROM t WHERE id = ? AND color = 'red'"). In the
end, I never got to implement it, also because my team was small enough and
clue-full enough that normal face-to-face talk was sufficient to make
placeholders be used throughout.
But if I ever find myself as lead or architect for a web application team, I
will be sorely tempted to implement it, as an educational means for the
developers and just to see what reactions it will cause.
Tags: developmentprocess, mysql, perl, security
|
04:06 pm
[Link] |
Placeholders and SQL injection
It is sad to see how 9X% (or should that be 99.X%?) of SQL applications are
riddled with
SQL
injection bugs.
There really is no excuse for this. Nobody writes code like this:
sub stupid_sum {
my ($list) = @_;
my $string = shift @$list;
for (@$list) {
$string .= " + " . $_;
}
my $sum = eval($string);
return $sum;
}
Right? Just because our computers use the
Von
Neumann architecture, where CPU instructions and data is stored in the
same memory, does not mean that we cannot distinguish between code and data
(ok, so in TeX
we do not, but there is a reason TeX is not pleasant to write applications
in).
So when we use functions to group our code into logical units, we have this
fancy syntax for something called parameters. And we can write clever
stuff like this:
int foo(int a) {
return a + 1;
}
And so the "+" and the "1" are part of the code for the function
foo(), while the other value "a" to be added is
data which comes from another part of the program. Great stuff!
In fact, in the old days, people were using something called
embedded
SQL, which tries to keep this distinction for using SQL with another
language. Though I have to admit, having used Oracle ProC, that this was quite
horrible.
But there is no need for embedding the SQL into the syntax of every language,
because these brilliant people invented the placeholder! So now we
can also have parameters for SQL, an SQL string holds the code to be executed,
and placeholders supply the values to be used from other parts of the program.
And it is so easy. No need for mysql_real_quote_string() and
other horrors. Just do like this, using Perl and DBI as example:
sub mark_items {
my ($dbh, $mark, @keys_list) = @_;
my $placeholders = join(",", map("?", @keys_list));
$dbh->do("UPDATE t SET flag = ? WHERE id IN (" . $placeholders . ")",
undef,
$mark, @keys_list);
}
And then you can sleep well at night not worrying about which kind of values
are passed to your SQL, or whether "+" can maybe format your hard disk if
someone passes in the wrong argument.
Just because modern dynamic languages make string concatenation easy, does not
mean that confusing code and data is a good thing. Von Neumann architecture is
good for CPUs, but at higher levels of abstraction we have moved on.
As Bjarne
Stroustrup often says, just as plumbers need an education to be allowed to
mess up pipes, why doesn't a programmer need an education that makes him or her
understand these things before being allowed to release software?
Tags: mysql, perl, security
|
11:51 pm
[Link] |
Selecting rows holding group-wise maximum of a field, part two
Selecting rows holding group-wise maximum is a favorite problem of mine, but
one which only rarely pops up. But for some reason, after my last blog post on
the subject, it seems to be mentioned almost daily around here.
Something that I forgot to mention in the previous post is that most of the
examples there assume suitable indexing is available to get decent
performance. Basically a composite index on both the column(s) in the GROUP BY
and the column over which MAX is computed is needed. In the example I gave,
such an index is available throught the primary key.
However, such an index may not be available in all cases. Maybe maintaining it would be too
expensive, or maybe the data the max is computed over is itself the result of
a (sub-)query, and no indexing is available. So it is worth it also to
understand this case, as the performance of the different possible queries
differ greatly from the indexed case.
So let us modify the original example to not have any useful indexes:
CREATE TABLE object_versions (
id INT PRIMARY KEY AUTO_INCREMENT,
object_id INT NOT NULL,
version INT NOT NULL,
data VARCHAR(1000)
) ENGINE=InnoDB;
This time, I will use a data set of size only 1% of the previous example, as without
indexes some of the queries get ridiculously poor performance. So let us take
10,000 rows, 1000 object each with 10 versions. I use this Perl
long^H^H^H^Hone-liner to load the data:
perl -MDBI -le '$vers=10; $groups=1000; $dbh=DBI->connect("dbi:mysql:", "test",
"pass", {RaiseError => 1}); $dbh->do("USE test"); foreach $o (1..$groups)
{ $dbh->do("INSERT INTO object_versions VALUES " . join(", ", map("(null, ?,?,?)",
1..$vers)), undef, map( ($o, $_, "data_${o}_$_"), 1..$vers)); }'
(Yes, I know... but I have a strange love for Perl one-liners).
Here are the results:
mysql> SELECT data FROM object_versions o1
WHERE version = (SELECT MAX(version) FROM object_versions o2 WHERE o1.object_id = o2.object_id);
1000 rows in set (25.55 sec)
mysql> SELECT o1.data FROM object_versions o1
INNER JOIN (SELECT object_id, MAX(version) AS version FROM object_versions GROUP BY object_id) o2
ON (o1.object_id = o2.object_id AND o1.version = o2.version);
1000 rows in set (0.72 sec)
mysql> SELECT o1.data FROM object_versions o1
LEFT JOIN object_versions o2 ON o1.object_id = o2.object_id AND o1.version < o2.version
WHERE o2.object_id IS NULL;
1000 rows in set (15.44 sec)
mysql> SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
1000 rows in set (0.52 sec)
mysql> SELECT data FROM
(SELECT * FROM object_versions ORDER BY object_id DESC, version DESC) t GROUP BY object_id;
1000 rows in set (0.04 sec)
The only query that has any kind of decent performance here is last one using
the "evil trick" of abusing the MySQL GROUP BY extensions in a way that is
explicitly documented to not produce well-defined results. Which is really
sad, since it is the only way I know of of getting the database to make the
obvious execution plan in this case, which is to simply sort the data on the
GROUP BY expression, and then loop over the rows picking the max row in each
group on the way.
In fact, in many cases I think a decent alternative is to just select
all rows into the client using ORDER BY, and do the aggregation there.
Now I just need someone to implement my SELECT MAX(object_id,
version) ...
Tags: mysql, sql
|
11:36 pm
[Link] |
Selecting rows holding group-wise maximum of a field
Today there was a question on the Freenode MySQL channel about a classical
problem: Rows
holding group-wise maximum of a column. This is a problem that I keep
encountering every so often, so I thought I would write up something about it.
A good example of the problem is a table like the following holding versioned
objects:
CREATE TABLE object_versions (
object_id INT NOT NULL,
version INT NOT NULL,
data VARCHAR(1000),
PRIMARY KEY(object_id, version)
) ENGINE=InnoDB
Now it is easy to get the latest version for an object:
SELECT data FROM object_versions WHERE object_id = ? ORDER BY version DESC LIMIT 1
The query will even be very fast as it can use the index to directly fetch the
right row:
mysql> EXPLAIN SELECT data FROM object_versions
WHERE object_id = 42 ORDER BY version DESC LIMIT 1;
+----+-------------+-----------------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | object_versions | ref | PRIMARY | PRIMARY | 4 | const | 3 | Using where |
+----+-------------+-----------------+------+---------------+---------+---------+-------+------+-------------+
1 row in set (0.00 sec)
But what if we want to select the latest version of all (or some range
of) objects? This is a problem that I think standard SQL (or any SQL dialect
that I know of, including MySQL) has no satisfactory answer to.
Intuitively, the problem should be simple for the database engine to
solve. Just traverse the BTree structure of the primary key (assume InnoDB
clustered index storage here), and for each value of the first part of the
primary key (object_id), pick the highest value of the second
part (version) and return the corresponding row (this is similar
to what I believe is sometimes called index skip scan). However, this
idea is surprisingly difficult to express in SQL.
The first method suggested in the above link to the MySQL manual works in this
case, but it is not all that great in my opinion. For example, it does not work
well if the column that MAX is computed over is not unique per group (as it is
in this example with versions); it will return all of the maximal rows which
may or may not be what you wanted. And the query plan is not all that great either:
mysql> EXPLAIN SELECT data FROM object_versions o1 WHERE version =
(SELECT MAX(version) FROM object_versions o2 WHERE o1.object_id = o2.object_id);
+----+--------------------+-------+------+---------------+---------+---------+-----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+---------+---------+-----------------------+------+-------------+
| 1 | PRIMARY | o1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where |
| 2 | DEPENDENT SUBQUERY | o2 | ref | PRIMARY | PRIMARY | 4 | einstein.o1.object_id | 1 | Using index |
+----+--------------------+-------+------+---------------+---------+---------+-----------------------+------+-------------+
2 rows in set (0.00 sec)
It is apparently doing a full table scan with an index lookup for every row in the table,
which is not that bad, but certainly more expensive than necessary,
especially if there are many versions per object. Still, it is probably the
best method in most cases (or so I thought first, but see benchmarks below!).
The two other suggestions from the MySQL manual are not perfect either (though
the first one is blazingly fast, see benchmarks below). One is
to use an uncorrelated subquery with a join:
mysql> EXPLAIN SELECT o1.data FROM object_versions o1
INNER JOIN (SELECT object_id, MAX(version) AS version FROM object_versions GROUP BY object_id) o2
ON (o1.object_id = o2.object_id AND o1.version = o2.version);
+----+-------------+-----------------+--------+---------------+---------+---------+-------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+---------------+---------+---------+-------------------------+------+-------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 3 | |
| 1 | PRIMARY | o1 | eq_ref | PRIMARY | PRIMARY | 8 | o2.object_id,o2.version | 1 | |
| 2 | DERIVED | object_versions | index | NULL | PRIMARY | 8 | NULL | 6 | Using index |
+----+-------------+-----------------+--------+---------------+---------+---------+-------------------------+------+-------------+
At first, I actually did not know exactly how to interpret this plan
output. After the benchmarks given below, I now think this plan is actually
very good, apparently it is first using something like an index skip scan to
compute the MAX() in the uncorrelated subquery, and then looking
up each row using the primary key. It still has the issue with multiple rows
if version was not unique per object.
The other suggestion uses an outer self-join:
mysql> EXPLAIN SELECT o1.data FROM object_versions o1
LEFT JOIN object_versions o2 ON o1.object_id = o2.object_id AND o1.version < o2.version
WHERE o2.object_id IS NULL;
+----+-------------+-------+------+---------------+---------+---------+-----------------------+------+--------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+---------+---------+-----------------------+------+--------------------------------------+
| 1 | SIMPLE | o1 | ALL | NULL | NULL | NULL | NULL | 6 | |
| 1 | SIMPLE | o2 | ref | PRIMARY | PRIMARY | 4 | einstein.o1.object_id | 1 | Using where; Using index; Not exists |
+----+-------------+-------+------+---------------+---------+---------+-----------------------+------+--------------------------------------+
2 rows in set (0.00 sec)
The plan again looks reasonable, but not optimal. And somehow, all three
methods feel unnatural for something that ought to be simple to express.
And in fact, there is a nice way to express this in SQL, except that it does
not work (at least not in MySQL):
SELECT MAX(version, data) FROM object_versions GROUP BY object_id;
If there was just support for computing MAX() over multiple
columns like this, this query would be a nice, natural, and simple way to
express our problem. And it would be relatively easy for database engines to
create the optimal plan, I think index skip scan is fairly standard
already for single-column MAX() with GROUP BY. And
the syntax feels very natural, even though it does bend the rules somehow by a
single expression (MAX(version, data)) returning multiple
columns. I have half a mind to try to implement it myself in MySQL or Drizzle
one day ...
In fact, one can almost use this technique by an old trick-of-the-trade:
mysql> SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
+---------------------------------+
| max(concat(version, ":", data)) |
+---------------------------------+
| 2:foo2 |
| 1:bar |
| 3:baz2 |
+---------------------------------+
3 rows in set (0.00 sec)
mysql> EXPLAIN SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------+
| 1 | SIMPLE | object_versions | index | NULL | PRIMARY | 8 | NULL | 6 | |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
Though I consider this (and variations thereof) a hack with limited practical
usage.
And speaking of hacks, there is actually another way to solve the
problem, one which I learned about recently at a customer:
mysql> EXPLAIN SELECT data FROM
(SELECT * FROM object_versions ORDER BY object_id DESC, version DESC) t GROUP BY object_id;
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 6 | Using temporary; Using filesort |
| 2 | DERIVED | object_versions | index | NULL | PRIMARY | 8 | NULL | 6 | |
+----+-------------+-----------------+-------+---------------+---------+---------+------+------+---------------------------------+
Get it? This cleverly/evilly (ab)uses the MySQL non-standard extension which
allows SELECT of columns not in the GROUP BY clause even without using
aggregate functions. MySQL will return a value for the column from an
"arbitrary" row in the group. In practise, it chooses it deterministically
from the first row in the group, which is why this trick seems to work well in
practise. But it is clearly documented to not be supported, so not
really something to recommend, though interesting to see.
Bonus benchmark
As a free bonus, I decided to run some quick benchmarks. As it turns out, the
results are quite surprising!
So I filled the table above with 1,000,000 rows, 1000 objects each with 1000
versions. Total table size is about 50Mb or so. I then ran each of the five
above queries:
mysql> SELECT data FROM object_versions o1
WHERE version = (SELECT MAX(version) FROM object_versions o2 WHERE o1.object_id = o2.object_id);
1000 rows in set (4 min 22.86 sec)
mysql> SELECT o1.data FROM object_versions o1
INNER JOIN (SELECT object_id, MAX(version) AS version FROM object_versions GROUP BY object_id) o2
ON (o1.object_id = o2.object_id AND o1.version = o2.version);
1000 rows in set (0.01 sec)
mysql> SELECT o1.data FROM object_versions o1
LEFT JOIN object_versions o2 ON o1.object_id = o2.object_id AND o1.version < o2.version
WHERE o2.object_id IS NULL;
1000 rows in set (2 min 42.72 sec)
mysql> SELECT MAX(CONCAT(version, ":", data)) FROM object_versions GROUP BY object_id;
1000 rows in set (0.63 sec)
mysql> SELECT data FROM
(SELECT * FROM object_versions ORDER BY object_id DESC, version DESC) t GROUP BY object_id;
1000 rows in set (15.61 sec)
The differences are huge!
The clear winner is query 2, the uncorrelated subquery. Apparently it can do
an index skip scan for the inner MAX/GROUP BY query, followed with a primary
key join, so it only ever has to touch 1000 rows. An almost optimal plan.
Query 1 and 3 (correlated subquery and outer join) are spectacularly bad. It
looks as if they are doing something like for each 1,000,000 rows in the full
table scan, do a 1000 row index range scan in the subquery/join, for a total
of 1 billion rows examined. Or something. Not good.
Query 4 and 5, the trick queries, are doing so-so, probably they get away with
a sort of a full table scan of 1,000,000 rows.
Conclusions: Uncorrelated subquery is the undisputed winner!
Tags: mysql, sql
|
06:02 am
[Link] |
Slides for my lightning talks at Open Source Days 2008
In case anyone is interested in a copy of my slides for the two lightning
talks I gave at the Open Source Days 2008 conference, I have
made them available here:
- "Optimizing Large Databases Using InnoDB Clustered Indexes:"
HTML
and
PDF.
- "Profiling with OProfile and Intel Core 2 performance counters:"
HTML
and PDF.
I waqs quite pleased with the benchmark that I prepared for the InnoDB
mini-talk, where I measure the performance difference between clustered and
auto_increment primary key, both with data that fits in memory and with data
that does not. I have wanted to do this benchmark for quite some time, as I
have not really seen real results for this before, though the technique of
using clustered primary keys for performance is well-known.
The results are quite interesting, with clustered indexes being faster for I/O
bound load with more than an order of magnitude. It is also interesting to see
how dead-cheap hardware can do 23k queries/second and read 1000000
rows/second with a properly tuned database.
Tags: innodb, linux, mysql, oprofile, talk
|
09:54 pm
[Link] |
25 years of Open Source
Today I realised that I have been programming for 25 years. Quite an
anniversary! Of course, it does not really feel like that long, probably
because a bit more than the first half of the time was really 'just for fun',
as a kid's hobby and later during University.
I also realised that right from the beginning and through all the years I have
had the Open Source mindset, even though I did not learn about Free Software
until much later.
I believe there are two reasons:
- I love reading source code.
- I hate the wasted effort when code could be reused, but is not.
Right from the start I have been reading code. This was when software was
distributed as BASIC listings in magazines that one would manually type into
the computer. And I remember reading and modifying the dis-assembly of machine
code programs, getting 'infinite lives' in games and learning the techniques
they used to produce their state-of-the-art graphics.
I doubt there was ever a great novelist that did not learn from reading lots
of literature. And I believe the same is true for coding, great programmers
learn from reading lots of code.
And as soon as I could, I moved towards more portable languages and platforms,
wanting my code to be used and reused as much as possible. I distributed my
first GPL'ed program 17 years ago (XFH), after getting Internet access at the
University.
It is the same mindset that made me decide to join MySQL, really.
Previously, I worked with applications on top of Oracle. Now in many ways,
Oracle sucks less than most other software. But it is not free software. When
we started using MySQL (version 4.1 beta I think), I was really pleased with
it.
While Oracle had some advantages in terms of features and usability, MySQL
had the advantage of being Open Source, and to me that made them about
equal. And improving MySQL features and usability seemed rather more feasible
to me than getting Oracle open-sourced. So I wanted to contribute my skills
towards the first of these goals. Improving MySQL, enabling the use and reuse
of the code by me and others. And getting to know the source code of the M in
LAMP.
On to the next 25 years of Open Source, I guess ...
Tags: freesoftware, mysql
|
12:57 pm
[Link] |
DGCov: A tool for checking test coverage of a patch Today I published the DGCov tool on the MySQL Forge.
DGCov is a neat tool that I implemented last year for use internally at MySQL, an old idea of Monty's.
The idea is to help developers to check that a new patch for the server code has received adequate testing before pushing it to the main tree. The GCC compiler has the gcov tool that can check which lines of the source code were never executed even once. But suppose you change a few 1000 lines across a big source base like MySQL. Then GCov output is not all that useful, since it will report tons of lines as not executed, and it is difficult to manually check which of those lines were touched by your patch.
The DGCov tool takes the GCov output and filters it so that it only shows those lines that were touched by the patch being checked. This output is immediately applicable to the work done by the developer, and very useful to check what the test coverage is, especially when doing patches that make smaller changes across a big part of the source.
Of course, just checking code coverage is not sufficient to ensure that good testing has been done of the code! But it is very good for making sure that one's changes have at least been run once before releasing them. This way one can avoid embarrasments like publishing infinite loops or uncontitional crashes in the code.
Hopefully, others will be able to use this for MySQL or other development projects (it is not specific to MySQL at all really).
Tags: mysql, perl, testsuite
|
09:01 pm
[Link] |
Partitioned archive tables
Is there anyone using partitioned archive tables in MySQL 5.1 for storing logs (or other voluminous data)?
Storing large amounts of logs in a relational database can bring special challenges.
Logs can take up huge amounts of space on disk, and while disk space is cheap, disk I/O can be expensive, performance-wise. But many logs compress really well, and for this the MySQL archive storage engine (insert-only, no indexes, gzip'ed storage) can be used.
You often want to scan across a few hours or days worth of logs, and indexes are poor for this purpose as the large number of disk seeks can kill performance. And full table scans of years of logs is not all that much fun either. For this, table partioning (supported in MySQL 5.1) is very useful. Store each day or week of logs in a separate table partition, partitioned on day (maybe use the 5.1 event sceduler, or simply a cron job, to add new partitions automatically). Then efficient full-partition scans of just the days or weeks in question can be used.
So why not combine both approaches? MySQL 5.1 supports partitioned archive tables just fine:
CREATE TABLE mylog (unix_time INT, msg VARCHAR(1000))
ENGINE=archive
PARTITION BY RANGE(unix_time)
(PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP('20060902000000')),
PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('20060903000000')));
This has the potential to really boost performance of relational log management. Very low space usage, and still full SQL query access to the data, with good performance.
A quick test confirms this (after adding some more partitions). First load some rows:
perl -MDBI -e '$dbh=DBI->connect("dbi:mysql:"); $dbh->do("INSERT INTO mylog values($_, \"log message on $_\")") for (1157148000 .. 1157148000 + 86400*5 - 1)'
Only about 10 bytes used per row, great (archive storage engine doing it's bit)! And queries in limited date ranges are faster:
mysql> select * from mylog where msg LIKE '%71234%';
+------------+---------------------------+
| unix_time | msg |
+------------+---------------------------+
| 1157171234 | log message on 1157171234 |
| 1157271234 | log message on 1157271234 |
| 1157371234 | log message on 1157371234 |
| 1157471234 | log message on 1157471234 |
| 1157571234 | log message on 1157571234 |
+------------+---------------------------+
5 rows in set (2.15 sec)
mysql> select * from mylog where unix_time >= unix_timestamp('20060903100000') and unix_time < unix_timestamp('20060903105959') and msg LIKE '%71234%';
+------------+---------------------------+
| unix_time | msg |
+------------+---------------------------+
| 1157271234 | log message on 1157271234 |
+------------+---------------------------+
1 row in set (0.66 sec)
So partition pruning is working as expected.
Tags: mysql
|
11:07 pm
[Link] |
Data Corruption!
At the start of this week, we suffered a corruption of our main 5.1 source
code repository at MySQL. No data was lost, but I spent most of four working
days on cleaning up the corruption, Monty spent one day, and many other people
had to spend time on this or were stalled in their work while the problem was
being resolved. Including the usual stories with fetching off-site backup
tapes only to find them broken, etc.
Our source code repository is the centre that all our work in Engineering
revolves around, and it just has to be stable. The confidence in the
revision control software that we use suffers greatly from such an experience,
and the lost confidence can never really be restored.
But there is a good lesson in this for MySQL, I think.
Like revision control software, MySQL is used by our users to store their
valuable data. The database is the center around which applications revolve,
and it must be stable. If our customers suffer loss or corruption of
data due to bugs in MySQL, the consequential loss of confidence will be
impossible to restore.
There are a number of tools and procedures used to keep tight control of
the quality of the server code. For example:
- New code undergoes code review by two other developers before being
accepted into the main repositories.
- We have an open bug database. Everyone can open a bug, and everyone can
see bugs that are open, or that were fixed in the past.
- The server is available for community testing right from the early alpha
versions. Users can test new versions early (and are rewarded for doing so;
MySQL has a wow to fix every repeatable bug reported in the bug DB, so by
testing early releases users can make sure that the server will work in their
applications once it reaches GA).
- We have a very comprehensive automatic test suite. For every bug fixed,
we add a test to the test suite so that the same bug will never sneak in again
un-noticed.
- We have a tool 'autopush' that runs the entire test suite
before code is pushed to the main repository, and rejects new code if the
testsuite fails even a single test.
- The test suite runs with a custom debugging memory allocator
my_malloc that tests for memory leaks. Of course, a single missed
my_free during the test suite is considered a testsuite
failure. (Since a database server must be able to run uninterrupted for
indefinite periods of time, any memory leak is a serious error).
- We use Valgrind to catch memory leaks
in third-party libraries (which do not use
my_malloc) and to
catch pointer errors and other memory related errors.
- We use the GCov
program to check that newly added code includes sufficient test cases to cover
all aspects of new functionality (GCov is a tool for GCC that reports how many
times each line of code is executed during a test program run).
- We have a tool 'Pushbuild' that builds and tests the server
source every time new code is pushed. Builds in pushbuild include multiple
processors and OS'es (Pentium, Opteron, Sparc, PowerPC, ...; Linux, Windows,
Solaris, HPUX, QNX, ...); building with full feature set or with just a few
features enabled; debug and optimized builds; Valgrind tests; GCov tests; and
others. (There has been talk about making reports from pushbuild available
externally; if you think this is a good idea drop a comment, and it may happen
sooner).
So overall, MySQL is in very good shape, quality wise. But it is still good
to remind ourselves from time to time why we do this, and why it is
important.
Tags: mysql, scm
|
10:36 pm
[Link] |
Full-index scan faster than full-table scan
At this years LinuxForum I was
manning the MySQL booth together with Carsten Pedersen. We were kept
quite busy with lots of people coming to tell about their use of the MySQL
database for their particular project and ask about or discuss a particular
issue of theirs. Which was fine, since the talks did not appeal a lot to me
anyway.
One guy (I forgot who) had a small performance problem in his
application. The application is a database of about 550,000 companies, storing
name and various other bits of information about each company. What I would
call a "small" database (since it is easily kept completely cached
in ram), though not a trivial one.
This application has a facility to search for a company using any part of
the company name:
SELECT ... FROM company WHERE name LIKE '%pattern%&apos
The query was taking too long, I think something like 10 or 15
seconds. Clearly a full table scan is needed, since neither BTREE or full-text
indexes can be used, but still >10 seconds seems way over the top on modern
hardware.
He was using InnoDB, so the obvious suggestion was to make sure that InnoDB
was configured with sufficiently large buffer pool to cache all of the
database in ram (the machine had sufficient memory for that, but it is
necessary to set the innodb_buffer_pool_size sufficiently large in order for
MySQL to actually cache the data).
Still, the problem got me thinking: How fast can one actually do full table
scans in MySQL on modern hardware? 15 seconds for a measly 550.000 records
seems pretty poor given that modern CPUs will execute billions of instructions
each second. Probably in this particular case insufficient caching was causing
disk I/O, reducing the speed. But another issue might be that this table has
lots of fields with different information, so the scan needs to not only
search the 'apos' column, but also skip over all of the
other fields in the table.
In this case, an index on the 'name' column can help. Normally
you are told that in a query using LIKE '%...%', an
index can not be used. But if all columns used in the query are available from
a single index, MySQL can use an index scan instead of a table scan. Even
though the whole of the index must be scanned, this can still be beneficial if
the index is smaller than the table.
So lets run a quick test on some actual data. First a table definition:
CREATE TABLE company (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255),
addr VARCHAR(255),
phone VARCHAR(64),
status INT
) ENGINE=innodb;
The table in the original application undoubtedly had many more columns, but
this will do for a quick test.
Now some Perl code to fill the table with some realistic data:
for (1..550000) {
$dbh->commit() if($_ % 500 == 0);
my $words = int(rand(3)) + 1;
my @word = ();
for (1..$words) {
my $x = '';
my $length = int(rand(13)) + 2;
for (1..$length) {
$x .= chr(int(rand(26)) + ord('a'));
}
push @word, $x;
}
my $name = join(' ', @word);
$dbh->do(<<SQL, undef, $name, "foobar", "+1 234 56", int(rand(1000)));
INSERT INTO company(name, addr, phone, status)
VALUES (?, ?, ?, ?)
SQL
}
$dbh->commit();
And lets set a reasonable InnoDB buffer pool size in my.cnf:
innodb-buffer-pool-size = 262144000
Now let us try it:
mysql> CREATE UNIQUE INDEX cover1 on company(name, id);
mysql> EXPLAIN SELECT id FROM company WHERE name LIKE '%xgef%';
+----+-------------+---------+-------+---------------+--------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+--------+---------+------+--------+--------------------------+
| 1 | SIMPLE | company | index | NULL | cover1 | 262 | NULL | 548220 | Using where; Using index |
+----+-------------+---------+-------+---------------+--------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
mysql> SELECT id FROM company WHERE name LIKE '%xgef%';
+--------+
| id |
+--------+
| 310501 |
| 408997 |
| 274935 |
| 418352 |
| 335782 |
| 98100 |
| 239031 |
| 454742 |
+--------+
8 rows in set (0.52 sec)
So half a second to search all 550,000 names. Much better than 15 seconds, and
completely adequate for the application in question, where such a search is
performed perhaps once every minute.
Of course half a second for a search would not be acceptable for all
applications. To speed it up more, one idea might be to store all suffixes of
company names in a separate table, along with the company id. This table would
be quite big, but still of reasonable size: If names are on average 25
characters, the number of rows would be about 14 million. Then with an index
on the column of name suffixes, a query like this:
SELECT id FROM suffix_table WHERE name_suffix LIKE 'pattern%'
could use the index to immediately look up the required data with millisecond
response time. But that kind of complexity was not required for the problem at
hand.
Tags: mysql
|
01:41 pm
[Link] |
Visual Studio limitations
MySQL 5.1 is moving to a new system for building the source code on
Windows, based on CMake (more on this
another time).
While finishing up the last bits of this, I ran into a very annoying
limitation in Visual Studio (version 7.1 I think). The library include path in
Visual Studio project files (.vcproj) has a very tight limit on
the maximum length, around 2000 characters it seems. Put a path longer than
that in the AdditionalLibraryDirectories section, and you will
not get an error but the build will fail to find required libraries. This
problem is made more serious by the fact that CMake tends to use absolute
paths, leading to longer paths in the project files.
Googling a bit for this problem turned up nothing, so in the end I had to
resort to some extremely ugly Perl one-liner build cleanup:
perl -i.bak -pe "s/[a-z]:[^,; ]+\\\\$BUILD_DIR/../gi if /AdditionalLibraryDirectories/" sql/mysqld.vcproj
Not very nice :-(
I think there is a lesson here: Do not put arbitrary limits into
development tools. People will try to use the tools in unexpected ways, and
fail.
Tags: mysql, windows
|
|