I had an interesting IRC discussion the other day with Monty Taylor about what turned out to be a limitation in Valgrind with respect to debugging memory leaks in dynamically loaded plugins.
Monty Taylor's original problem was with Drizzle, but as it turns out, it is common to all of the MySQL-derived code bases. When there is a memory leak from an allocation in a dynamically loaded plugin, Valgrind will detect the leak, but the part of the stack trace that is within the plugin shows up as an unhelpful three question marks "???":
==1287== 400 bytes in 4 blocks are definitely lost in loss record 5 of 8
==1287==    at 0x4C22FAB: malloc (vg_replace_malloc.c:207)
==1287==    by 0x126A2186: ???
==1287==    by 0x7C8E01: ha_initialize_handlerton(st_plugin_int*) (handler.cc:429)
==1287==    by 0x88ADD6: plugin_initialize(st_plugin_int*) (sql_plugin.cc:1033)
Which tells you little more than that there is a leak in one of your plugins.
After trying a couple of things, we found that this is a known limitation in
Valgrind in relation to code that is loaded with dlopen() and
later unloaded with dlclose().
The basic problem is that Valgrind records the location of the
malloc() call as just a memory address. By the time the memory
leak check is performed after the end of program execution, the plugin has
already been unloaded with dlclose(), so the recorded memory address is
no longer valid and cannot be mapped back to a function name and source line.
The problem is specific to memory leak checks, which are done only after the code has been unloaded. Other checks (like use of uninitialised values and use-after-free) work fine with full information in the stack traces, as such checks are done while the plugin code is still loaded into memory. But the memory leak checks are arguably among the most useful checks Valgrind does, as Valgrind is often the only way to find and fix critical memory leaks efficiently.
Fortunately, once the issue was understood, we had an easy work-around: leave
out the dlclose() call in the server plugin code, and the
leak is then detected with full information in the stack trace. Unfortunately
this introduces a leak of its own, since now the memory allocated by
dlopen() is never freed, so we get another spurious Valgrind
memory leak warning.
Another possible way to get the same effect is to pass the
RTLD_NODELETE flag to
dlopen(), though I did not try this yet.
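
For reference, passing the flag could look roughly like the sketch below. The helper name plugin_open_nodelete() is invented for this example, and it has not been checked against Valgrind here. RTLD_NODELETE itself is a documented dlopen() flag that asks the dynamic loader not to unload the library during dlclose().

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the server source): open a plugin so that
   a later dlclose() does not unmap its code from the process. */
static void *plugin_open_nodelete(const char *path)
{
    assert(path != NULL);

    /* RTLD_NOW resolves all symbols immediately; RTLD_NODELETE keeps the
       library mapped even after dlclose() has been called on the handle. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

The rest of the plugin code can then call dlclose() on the returned handle as usual. On older glibc versions the program needs to be linked with -ldl.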
A possibly better work-around (which I also did not try yet) is one suggested
in the above referenced Valgrind feature request. If the offending plugins
are added to LD_PRELOAD when starting the server, the plugin code
will not actually be unloaded by
dlclose(), so stack traces
should be available without any spurious leak warnings from Valgrind. However,
this will not work well if some of the dynamic plugins need a particular load
order (according to the suggestion in the feature request). I also need to
check if this actually works for plugins (like storage engines) that have link
dependencies on symbols in the main program. But it might be a good option if
it can be made to work.
(At first I was surprised to learn that this was a problem in MySQL and
MariaDB, as I never saw it before. But I suppose the reason is that we so far
have built most plugins as built-in, rather than as dynamically loaded
.so files. The problem is likely to occur more frequently
as we are moving to do more and more with plugins in MariaDB, so it is nice to
know a work-around. Thanks, Monty!)