Autotest framework in RonDB#
MTR is not designed to handle the most important part of testing RonDB: restart testing. RonDB is designed to survive failures during any type of operation in a distributed system, so the most important thing to test is how it behaves while crashes of various kinds are happening in the system.
Another aspect is that the main interface to RonDB is the NDB API. MTR only tests RonDB using the NDB storage engine API. The Autotest framework focuses all tests on using the NDB API. Autotest also contains a framework to inject errors in the code, to restart nodes in various ways and in general to modify RonDB to ensure that critical code paths are entered and verified to be working.
To handle this testing we have built something we call Autotest. This is a framework for automated test runs, such that one can run a test for about 36 hours automatically where a battery of around 500 tests is executed one at a time. Each of those test cases can contain many different crashes. Given that crashes and failures usually don't occur in the most critical places, these tests can inject errors in the code to ensure that we can test failures in all sorts of scenarios.
The test cases executed in Autotest are to a great extent developed inside the Hugo framework. The Hugo framework makes it easy to work with a predefined set of tables and quickly write code that creates those tables, fills them, reads them, updates them, deletes them and so forth.
Autotest is a functional test framework, although it is focused on testing restart functions and functions using the NDB API directly, whereas MTR is more focused on functions used in normal operation with SQL as the main interface to RonDB.
Given that Autotest is a generic framework for testing using the NDB API and the Hugo API, it is possible to run it in many different configurations. This ensures that we can handle all sorts of configurations of RonDB.
Setting up the Autotest framework#
First, to execute Autotest one needs to use Linux. Autotest so far fails on Mac OS X. It is probably possible to get it to work, but Autotest probably interacts too much with the Mac OS X security layers.
To execute a set of tests using Autotest one needs to set up a few things first.
Setting up ndb_cpcd#
The first thing that is required is a running ndb_cpcd process. This is a simple program used to start and stop RonDB processes using an extremely simple communication protocol. It is only designed for test execution and was designed without any security in mind, so it cannot be used for management of RonDB in a production setting. But for executing automatic tests it is a very useful tool.
What I do to set up a simple way of starting ndb_cpcd is to create a new directory called cpcd, put the ndb_cpcd binary in this directory and create a short startup script to start it.
The commands below show how to create the contents of this directory. They assume that you have placed the RonDB tree under the directory /home/user/rondb_2104 and used debug_build as the directory where you built it.
cd /home/user
mkdir cpcd
cd cpcd
cp /home/user/rondb_2104/debug_build/storage/ndb/src/cw/cpcd/ndb_cpcd .
Now in this directory create a file called start_cpcd.sh with the following content.
cat start_cpcd.sh
#!/bin/bash
/home/user/cpcd/ndb_cpcd --work-dir=/home/user/cpcd --debug --user=user --logfile=/home/user/cpcd/cpcd.log &
Make sure that this shell script has execute privileges using the chmod command. E.g. as below.
chmod a+x start_cpcd.sh
Now all you have to do before starting your tests is to ensure that the ndb_cpcd is running by executing the start_cpcd.sh script.
The above works for RonDB 21.04. A minor change is required to execute the same test on RonDB 22.01: ndb_cpcd must listen on a different port. Each test run needs a separate ndb_cpcd program to connect to. RonDB 21.04 defaults to connecting to port 1234 and RonDB 22.01 defaults to connecting to port 1235. This makes it possible to execute tests on RonDB 21.04 and RonDB 22.01 in parallel on the same machine.
cd /home/user
mkdir cpcd_1235
cd cpcd_1235
cat start_cpcd.sh
#!/bin/bash
/home/user/cpcd/ndb_cpcd --work-dir=/home/user/cpcd_1235 --port=1235 --debug --user=user --logfile=/home/user/cpcd_1235/cpcd.log &
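Since the two startup scripts differ only in work directory and port, they can be generated from one helper. The following is a minimal sketch, not part of RonDB: the function name write_cpcd_start_script and its argument order are made up for illustration, and it only uses the ndb_cpcd options already shown above.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with RonDB): write a start_cpcd.sh into
# WORK_DIR that starts ndb_cpcd on PORT. Only the ndb_cpcd options used
# earlier in this chapter appear in the generated script.
write_cpcd_start_script() {
    local work_dir="$1" port="$2" run_as="$3" cpcd_bin="$4"
    mkdir -p "$work_dir" || return 1
    cat > "$work_dir/start_cpcd.sh" <<EOF
#!/bin/bash
$cpcd_bin --work-dir=$work_dir --port=$port --debug --user=$run_as --logfile=$work_dir/cpcd.log &
EOF
    chmod a+x "$work_dir/start_cpcd.sh"
}

# Example, using the paths assumed in this chapter:
# write_cpcd_start_script /home/user/cpcd_1235 1235 user /home/user/cpcd/ndb_cpcd
```

The generated script always passes --port explicitly, so the same helper covers both the 1234 and the 1235 setup.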
Setting up the Autotest directory#
Each autotest run requires a separate directory where all the test input and output is created.
In my case I create the following directories for testing RonDB 21.04. Replace gituser with your username at github.com where you have forked your own version of RonDB.
cd /home/user
mkdir autotest_2104
cd autotest_2104
mkdir results
mkdir run
mkdir rondb_2104_build
mkdir rondb_2104_install
git clone https://github.com/gituser/rondb rondb_2104
Now we need to set up the run directory to be able to execute the Autotest tests. First you need to copy the autotest-boot.sh script from the RonDB tree to here.
cd /home/user/autotest_2104/run
cp /home/user/rondb_2104/storage/ndb/test/run-test/autotest-boot.sh .
Next you need to create a shell script that executes Autotest by calling autotest-boot.sh once for each test suite to run.
This shell script will execute 2 of the 40 test suites in Autotest. The first execution will clone the rondb_2104 tree into a local tree in the rondb_2104_build directory and execute the build in this directory. After finishing the build, the installation will be placed in the rondb_2104_install directory.
The second execution requires neither clone nor build since it will continue to use the installation in the rondb_2104_install directory.
There are 8 test suites called daily-devel and 15 test suites called daily-basic, 2 test suites called weekly-basic and 14 test suites called weekly-devel. There is also a test suite called daily-perf, but it is rarely used since the main focus on performance testing is using the DBT2 benchmark scripts.
Finally it is possible to execute upgrade tests as well. These require even more setup to be performed which I won't go into here.
Now before the tests can be executed we also need to create a configuration file for the test runs. This is placed in a file called autotest.conf with the below content.
cd /home/user/autotest_2104/run
cat autotest.conf
install_dir="/home/user/autotest_2104/rondb_2104_install"
build_dir="/home/user/autotest_2104/rondb_2104_build"
git_local_repo="/home/user/autotest_2104/rondb_2104"
git_remote_repo="/home/user/autotest_2104/rondb_2104"
base_dir="/home/user/autotest_2104/results"
baseport="14000"
clustername=".2node"
target="x86_64_Linux"
hosts="hostname hostname hostname hostname hostname hostname hostname hostname"
report=
MAKEFLAGS="-j32"
export MAKEFLAGS
BOOST_ROOT="/home/user/boost_1_73_0"
export BOOST_ROOT
WITH_NDB_JAVA_DEFAULT="0"
export WITH_NDB_JAVA_DEFAULT
WITH_NDB_NODEJS_DEFAULT="0"
export WITH_NDB_NODEJS_DEFAULT
The directories are the ones we created above. Replace hostname with the hostname of the computer you are using for the tests. It is possible to run the tests in a distributed setup, but I usually prefer to run tests on one machine. The baseport is normally set to something like 14000, 15000 or similar. Using a different baseport for each run enables multiple Autotests to run in parallel on one machine.
BOOST_ROOT, MAKEFLAGS and the WITH flags are all used by the build process, and BOOST_ROOT is required to build RonDB. There is no need to build with support for Java and NodeJS when executing Autotest, since Autotest focuses only on the C++ NDB API.
The actual build is performed by a script called compile-cluster that resides in the storage/ndb directory. It ensures that the build uses the error injection build flags with a somewhat optimised code build.
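A misspelled directory in autotest.conf is easy to miss until a long run fails. The sketch below is a hypothetical pre-flight check, not part of Autotest: it assumes autotest.conf is plain shell syntax as shown above, sources it in a subshell, and verifies that the directories it names exist. The function name check_autotest_conf is made up for illustration.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of Autotest). Assumes autotest.conf
# is plain shell variable assignments, as in the example above.
check_autotest_conf() {
    local conf="$1"
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    (
        # Source in a subshell so the settings do not leak into the caller.
        . "$conf" || exit 1
        status=0
        for dir in "$install_dir" "$build_dir" "$git_local_repo" "$base_dir"; do
            if [ ! -d "$dir" ]; then
                echo "missing directory: $dir" >&2
                status=1
            fi
        done
        exit "$status"
    )
}

# Example:
# check_autotest_conf /home/user/autotest_2104/run/autotest.conf && echo "conf looks sane"
```

Pass the configuration file with a full path, since sourcing a bare file name may search PATH rather than the current directory.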
The clustername needs to be one of the clusters found in the file called conf-autotest.cnf found in the storage/ndb/test/run-test directory. If you want to change the configuration to a new one not mentioned in this file, then change the file and commit in the Git clone or simply go into autotest_2104/rondb_2104_install/mysql-test/ndb and directly edit the conf-autotest.cnf file found there.
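A quick way to catch a mistyped clustername before a run is to search for it in conf-autotest.cnf. This is only a rough sketch: it assumes the cluster name appears literally somewhere in that file and does not parse the file's section syntax. The function name cluster_defined is made up for illustration.

```shell
#!/bin/bash
# Hypothetical check: succeed if NAME occurs literally anywhere in the given
# conf-autotest.cnf. This is a substring match, so ".2node" would also match
# a longer name that merely starts with ".2node".
cluster_defined() {
    local name="$1" cnf="$2"
    grep -qF -- "$name" "$cnf"
}

# Example, using the installed copy mentioned above:
# cluster_defined ".2node" /home/user/autotest_2104/rondb_2104_install/mysql-test/ndb/conf-autotest.cnf \
#     || echo "clustername not found in conf-autotest.cnf"
```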
Now you are all set up to execute the tests.
How to check the results#
The results are created in a tarball file placed in the directory:
ls -la /home/user/autotest_2104/results/saved_results
-rw-r--r-- 1 user user 4573 Jan 29 02:08 res.daily-devel--01..x86_64_Linux.2022-01-29.hostname.30335.tgz
The tarball has a long name concatenating the name of the test suite, the target platform, the date, the hostname it was executed on and the PID (in this case 30335) of the executed test.
tar xfz res.daily-devel--01..x86_64_Linux.2022-01-29.hostname.30335.tgz
cd result-daily-devel--01--x86_64_Linux
cd 2022-01-29
ls
The most important thing to look into in this directory is the result.12 directory, if there is one, where 12 is the number of the failed test case. The high-level output is found in the log.txt file. In this file you will see what the failed test case is and what error code it produced. The most common error codes are 101, which is a crash in a data node, 256, which is a test failure found by the test program, and 103, which means that the test case timed out before completing. OK(0) means a successful test.
Under the result.12 directory you will find all log files produced by the cluster during the test run and even some log files from previous test runs leading up to the failure. See the chapters on log files and trace files to find out how to interpret those files.
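When many tarballs accumulate in saved_results, it can help to pull the fields out of the file name. The functions below are a small sketch based only on the example name shown above: the function names are made up, and the parsing assumes the name starts with res., ends with the PID followed by .tgz, and contains a date in YYYY-MM-DD form.

```shell
#!/bin/bash
# Hypothetical helpers for picking apart a result tarball name such as
# res.daily-devel--01..x86_64_Linux.2022-01-29.hostname.30335.tgz

# Test suite: the text between the leading "res." and the next dot.
result_suite() {
    local name="${1##*/}"      # strip any leading directory
    name="${name#res.}"        # strip the "res." prefix
    printf '%s\n' "${name%%.*}"
}

# Date: the first YYYY-MM-DD pattern in the file name.
result_date() {
    printf '%s\n' "${1##*/}" | grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}' | head -n 1
}

# PID: the last dot-separated field before the .tgz suffix.
result_pid() {
    local name="${1##*/}"
    name="${name%.tgz}"
    printf '%s\n' "${name##*.}"
}

# Example:
# for f in /home/user/autotest_2104/results/saved_results/*.tgz; do
#     echo "$(result_date "$f") $(result_suite "$f") pid=$(result_pid "$f")"
# done
```

The hostname is deliberately not extracted, since a hostname may itself contain dots.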
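The result codes listed above can be kept at hand as a small lookup function. This is just a convenience sketch: the function name explain_result is made up, and it only covers the codes mentioned in this chapter.

```shell
#!/bin/bash
# Hypothetical lookup of the Autotest result codes described in this chapter.
explain_result() {
    case "$1" in
        0)   echo "OK: test succeeded" ;;
        101) echo "crash in a data node" ;;
        103) echo "test case timed out before completing" ;;
        256) echo "test failure found by the test program" ;;
        *)   echo "result code $1 is not covered in this chapter" ;;
    esac
}

# Example:
# explain_result 101
```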
How to debug a failed test#
Now you have developed your contribution, you have tested it in MTR and you have tested it using Autotest. You found a failure, but couldn't discover the bug based on the log files. So what do you do next?
The next step I normally perform is to move the test to MTR again. It is very easy to more or less replicate the test execution using MTR in the following manner:
cd /home/user/rondb_2104/debug_build/mysql-test
./mtr --suite=ndb --start-and-exit
The above command starts up a default configured cluster using port 13000 for the RonDB management server and port 13001 for the MySQL server. Remember that the default configuration is found in mysql-test/include/default_ndbd.cnf. The default configuration is a 2-node cluster. An easy manner to change to a default 3-node cluster is to instead start the cluster with the following command:
cd /home/user/rondb_2104/debug_build/mysql-test
./mtr --suite=ndb --start-and-exit ndb_basic_3rpl
Similarly ndb_basic_4rpl sets up a default cluster with 4 nodes and 4 replicas.
Now, to execute the offending test case, run the following commands after starting up the cluster using MTR. Setting NDB_CONNECTSTRING is a quick way to tell the test case where the RonDB management server resides so that it can connect to the RonDB cluster.
export NDB_CONNECTSTRING=localhost:13000
cd /home/user/rondb_2104/debug_build/bin
./testSystemRestart -n SR1 T1
The above command runs the test that failed in Autotest. Hopefully the failure is now reproducible. Sometimes it isn't, and reproducing it often requires a bit of fiddling with the default configuration. It could also be related to the environment: sometimes putting some load on the machine while executing the test case causes new code paths to be executed. Most of the time the failures are reproducible, and this gives you the possibility to add various printouts to your code to understand what is going on and debug the failure.
Finally, when you need to start up the cluster from scratch again, you need to stop the MTR cluster. Since you started and exited, MTR can no longer assist in stopping it. In a normal MTR execution, MTR will ensure that the test cleans up after itself. So here I usually use killall -9 to shut down the cluster using the following command:
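When a failure only shows up now and then, it can save time to rerun the test in a loop until it fails. The following is a generic sketch, not an Autotest feature: the function name repeat_until_fail is made up, and it assumes the test program exits with a non-zero status on failure.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to MAX times and stop at the first
# failing run. Assumes the command signals failure with a non-zero exit status.
repeat_until_fail() {
    local max="$1" i=1
    shift
    while [ "$i" -le "$max" ]; do
        echo "run $i of $max: $*"
        "$@" || { echo "failed on run $i" >&2; return 1; }
        i=$((i + 1))
    done
    echo "no failure in $max runs"
}

# Example, with the cluster started by MTR and NDB_CONNECTSTRING exported:
# repeat_until_fail 20 ./testSystemRestart -n SR1 T1
```

Stopping at the first failure leaves the cluster logs in the state produced by the failing run, which makes them easier to inspect.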
killall -9 mysqld ndbmtd ndb_mgmd