Release Notes RonDB 21.04.5#
RonDB 21.04.5 is the fifth release of RonDB 21.04. RonDB 21.04.4 was only released internally, and the fixes in it are listed in the release notes for RonDB 21.04.5.
RonDB 21.04 is based on MySQL NDB Cluster 8.0.23. RonDB 21.04.5 is a bug fix release based on RonDB 21.04.3.
RonDB 21.04.5 is released as open source software with binary tarballs for use on Linux. It is developed on Linux and Mac OS X, and on Windows using WSL 2 (Linux on Windows).
RonDB 21.04.5 can be used with both the x86_64 and ARM64 architectures, although ARM64 support is still in beta.
RonDB 21.04.5 is tested and verified on both x86_64 and ARM platforms using both Linux and Mac OS X. It is, however, only released as a binary tarball for x86_64 on Linux.
There are four ways to use RonDB 21.04.5:
- You can use the managed version available on hopsworks.ai. This sets up a RonDB cluster given a few details on the HW resources to use. The RonDB cluster is integrated with Hopsworks and can be used both for RonDB applications and for Hopsworks applications. Currently AWS is supported; Azure support will be available soon.
- You can use the cloud scripts that make it easy to set up a cluster on Azure or GCP. This requires no previous knowledge of RonDB; the script only needs a description of the HW resources to use, and the rest of the setup is automated.
- You can use the open source version with the binary tarball and set it up yourself.
- You can use the open source version and build and set it up yourself.
RonDB 21.04 is a Long Term Support version that will be maintained until at least 2024.
Maintaining 21.04 mainly means fixing critical bugs and minor change requests. It does not involve merging with any future release of MySQL NDB Cluster; this will be handled in newer RonDB releases.
Backports of critical bug fixes from MySQL NDB Cluster will happen.
Summary of changes in RonDB 21.04.5#
RonDB 21.04.5 contains 9 bug fixes since RonDB 21.04.3. In total, RonDB 21.04 contains 15 new features on top of MySQL Cluster 8.0.23 and a total of 96 bug fixes.
Test environment#
RonDB has a number of unit tests that are executed as part of the build process.
MTR testing#
RonDB has a functional test suite using MTR (MySQL Test Run) that executes more than 500 RonDB-specific test programs. In addition, there are thousands of test cases for the MySQL functionality. MTR is executed on both Mac OS X and Linux.
We also have a special mode of MTR testing where we can run with different versions of RonDB in the same cluster to verify our support of online software upgrade.
Autotest#
RonDB is very focused on high availability. This is tested using a test infrastructure we call Autotest. It contains many hundreds of test variants; executing the full set takes around 36 hours. One test run with Autotest uses a specific configuration of RonDB. We execute multiple such configurations, varying the number of data nodes, the replication factor and the thread and memory setup.
An important part of this testing framework is that it uses error injection. This means that we can test exactly what will happen if we crash in very specific situations, if we run out of memory at specific points in the code, and if the timing changes due to small sleeps inserted in critical paths of the code.
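To illustrate the principle, here is a minimal C++ sketch of the error-injection idea; the global flag, function names and error codes are illustrative assumptions and not the actual RonDB test macros.

```cpp
#include <cstdlib>
#include <chrono>
#include <thread>

// Sketch only: the test driver sets a per-node error-insert code, and critical
// code paths check it to provoke a crash or to perturb timing with a short sleep.
static int g_error_insert_code = 0;     // set from the test framework (assumed mechanism)

static bool error_inserted(int code) { return g_error_insert_code == code; }

void write_checkpoint_fragment()
{
  if (error_inserted(5004))             // crash exactly at this point in the code
    std::abort();
  if (error_inserted(5005))             // change timing in a critical path
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
  // ... normal checkpoint code ...
}
```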
During one full test run of Autotest RonDB nodes are restarted thousands of times in all sorts of critical situations.
Autotest currently runs on Linux with a large variety of CPUs, Linux distributions and even on Windows using WSL 2 with Ubuntu.
Benchmark testing#
We test RonDB using the Sysbench test suite, DBT2 (an open source variant of TPC-C), flexAsynch (an internal key-value benchmark), DBT3 (an open source variant of TPC-H) and finally YCSB (Yahoo Cloud Serving Benchmark).
The focus is on testing RonDB's LATS capabilities (low Latency, high Availability, high Throughput and scalable Storage).
Hopsworks testing#
Finally we also execute tests in Hopsworks to ensure that it works with HopsFS, the distributed file system built on top of RonDB, and HSFS, the Feature Store designed on top of RonDB, and together with all other use cases of RonDB in the Hopsworks framework.
BUG FIXES#
RONDB-114: Missing sync on initialisation of files#
During initial start of RonDB and during creation of tablespace files there was no sync call to the file system. This led to lots of dirty pages filling up memory in the OS, and subsequently the data node was killed due to an out-of-memory condition in the OS.
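As an illustration of the fix idea, the following C++ sketch initialises a file while syncing at regular intervals so that dirty pages cannot pile up in the OS page cache; the function name, chunk size and sync interval are assumptions and this is not the actual NDB file system code.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstring>

// Sketch: write the file in chunks and fsync() periodically so the amount of
// dirty page cache data is bounded during initialisation.
static int initialise_file(const char *path, unsigned long long total_bytes)
{
  const unsigned long long sync_interval = 256ULL * 1024 * 1024; // assumed interval
  char zeros[65536];
  memset(zeros, 0, sizeof(zeros));

  int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0640);
  if (fd < 0)
    return -1;

  unsigned long long written = 0, since_last_sync = 0;
  while (written < total_bytes)
  {
    unsigned long long left = total_bytes - written;
    size_t chunk = left < sizeof(zeros) ? (size_t)left : sizeof(zeros);
    ssize_t res = write(fd, zeros, chunk);
    if (res < 0)
    {
      close(fd);
      return -1;
    }
    written += (unsigned long long)res;
    since_last_sync += (unsigned long long)res;
    if (since_last_sync >= sync_interval)
    {
      fsync(fd);        // the missing sync: flush dirty pages before continuing
      since_last_sync = 0;
    }
  }
  fsync(fd);            // final sync before the file is considered initialised
  close(fd);
  return 0;
}
```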
RONDB-115: Change hostname also for MGM server when set hostname called#
The new set hostname functionality requires the management server to also set the new hostname as part of the command.
RONDB-116: Set hostname in transporter sections when set hostname called#
When setting a new configuration using the NDB MGM API it was necessary to set the new hostname also in the transporter sections, not only in the node section of the configuration.
RONDB-117: Changed connect timeout for management servers#
The connect timeout for management servers reused the 60-second command timeout for management client commands. This led to overly long delays when a management server had a correct IP address but no computer was using that IP address. It has been replaced with a connect timeout of 3 seconds.
RONDB-118: Ensure that active node status is properly set in API/MGMD nodes#
When setting the hostname and active status, ensure that API nodes and management servers are also updated. To ensure that the update reaches all nodes, each data node sends the information to all API nodes and management servers it is connected to.
RONDB-119: Ensure that all nodes can be activated and hostname changed online#
RONDB-120: Make it possible to use IPv4 sockets between ndbmtd and API nodes#
When using Dolphin SuperSockets it was necessary to use sockets that could only use IPv4 addresses. A new configuration variable, UseOnlyIPv4, was introduced; it is set on node level to ensure that all connections use IPv4 sockets. It can only be set on data nodes and API nodes.
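As an illustration, a hypothetical config.ini fragment using the new parameter could look as follows; the node ids and hostname are placeholders, only the UseOnlyIPv4 parameter itself comes from this fix.

```ini
# Placeholder node ids and hostname; UseOnlyIPv4 is set per node and is only
# valid for data nodes and API nodes.
[ndbd]
NodeId=1
HostName=192.168.1.10
UseOnlyIPv4=1

[api]
NodeId=51
UseOnlyIPv4=1
```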
RONDB-121: EnableRedoControl won’t work when #LDMs > #LogParts and get_total_memory incorrect#
This bug is actually two bugs.
Previously there was an assumption that each LDM thread had access to its own log parts. Without this, the LDM thread simply ignored any attempt to set a proper disk write speed and instead used the static checkpoint speed.
Now, if #LDMs > #LogParts, the LDM thread looks at all log parts it can use, adds their usage together and divides it by the number of LDM threads using those log parts. The REDO log usage is thus indicative of the combined REDO log usage of multiple LDM threads.
This also required changing the calculation of the percentage of REDO log used. We now calculate this in get_redo_stats, where we check the REDO log part with the highest REDO percentage.
The function get_total_memory assumed that memory usage was evenly distributed among the LDM threads, which is not always correct. With uneven use of the LDM threads this led to estimating the proposed disk write speed too low or too high, and thus the checkpoint time kept increasing. It has been replaced by getting the number of pages allocated in the LDM thread from DBTUP via the variable m_pages_allocated, which is already maintained.
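The following C++ sketch illustrates these two corrections under stated assumptions: the struct, the helper names and the 32 KiB page size are illustrative, while get_redo_stats and m_pages_allocated are the names mentioned above; this is not the actual DBLQH/DBTUP code.

```cpp
struct LogPart
{
  unsigned long long bytes_used;    // current REDO bytes in this log part
  unsigned long long bytes_total;   // configured size of this log part
};

// When #LDMs > #LogParts: sum the usage of all log parts this LDM thread can
// use and attribute an equal share to each LDM thread sharing them.
static unsigned long long
estimate_redo_usage_per_ldm(const LogPart *parts, unsigned num_parts,
                            unsigned num_ldm_threads_sharing)
{
  unsigned long long combined = 0;
  for (unsigned i = 0; i < num_parts; i++)
    combined += parts[i].bytes_used;
  return combined / num_ldm_threads_sharing;
}

// The fill level that drives EnableRedoControl is taken from the log part with
// the highest REDO percentage (cf. get_redo_stats).
static unsigned
max_redo_fill_percentage(const LogPart *parts, unsigned num_parts)
{
  unsigned max_pct = 0;
  for (unsigned i = 0; i < num_parts; i++)
  {
    unsigned pct = (unsigned)((parts[i].bytes_used * 100) / parts[i].bytes_total);
    if (pct > max_pct)
      max_pct = pct;
  }
  return max_pct;
}

// Memory in use by one LDM thread, derived from the pages it has actually
// allocated in DBTUP (m_pages_allocated) instead of assuming an even split.
static const unsigned long long NDB_PAGE_SIZE = 32 * 1024;  // assumed page size

static unsigned long long
ldm_memory_in_use(unsigned long long m_pages_allocated)
{
  return m_pages_allocated * NDB_PAGE_SIZE;
}
```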
An extra increase is added when the checkpoint time goes beyond 1 minute, 2 minutes and 3 minutes, to ensure that we keep checkpoint times fairly low. No more than 15% is added in this manner.
When we execute in a high CPU load state for an extended time, we use very low disk write speeds. Previously this would not change until the REDO log reached a 25% fill level. A check has now been added so that we slowly start writing checkpoints at the proposed disk write speed once the lag grows beyond 1 GByte. Thus we allow temporary high loads to create a lag of up to 1 GByte, but after that we try hard to keep up with the proposed disk write speed even under high load.
This is 1 GByte per LDM thread.
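A rough C++ sketch of these two adjustments follows; the 5% steps and the linear ramp towards the proposed speed are assumptions made for the example, not the actual adaptive checkpoint speed code.

```cpp
#include <algorithm>

static const unsigned long long MAX_LAG_PER_LDM = 1ULL * 1024 * 1024 * 1024; // 1 GByte

// Boost the proposed disk write speed when the checkpoint time passes 1, 2 and
// 3 minutes, adding no more than 15% in total this way.
static double
adjust_for_long_checkpoints(double proposed_speed, unsigned checkpoint_seconds)
{
  double factor = 1.0;
  if (checkpoint_seconds > 60)  factor += 0.05;   // beyond 1 minute (assumed 5% step)
  if (checkpoint_seconds > 120) factor += 0.05;   // beyond 2 minutes
  if (checkpoint_seconds > 180) factor += 0.05;   // beyond 3 minutes
  return proposed_speed * std::min(factor, 1.15);
}

// Allow a temporary lag under high CPU load, but once the lag per LDM thread
// passes 1 GByte, move towards the proposed disk write speed.
static double
effective_disk_write_speed(double proposed_speed, double throttled_speed,
                           unsigned long long lag_bytes_per_ldm)
{
  if (lag_bytes_per_ldm <= MAX_LAG_PER_LDM)
    return throttled_speed;
  double excess = double(lag_bytes_per_ldm - MAX_LAG_PER_LDM) / double(MAX_LAG_PER_LDM);
  double weight = std::min(excess, 1.0);
  return throttled_speed + weight * (proposed_speed - throttled_speed);
}
```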