Release Notes RonDB 22.10.1#

RonDB 22.10.1 is still in active development and is thus not fully supported yet.

RonDB 22.10.1 is based on MySQL NDB Cluster 8.0.34 and RonDB 21.04.15.

RonDB 21.04 is a Long-Term Support version of RonDB that will be supported at least until 2024.

RonDB 22.10 is a new Long-Term Support version and will be maintained at least until 2025.

RonDB 22.10 is released as open source software with binary tarballs for use on Linux and Mac OS X. It is developed on Linux and Mac OS X, and on Windows using WSL 2 (Linux on Windows).

From RonDB 22.10.1 onwards, RonDB is supported on Linux/x86_64 and Linux/ARM64.

The other platforms are currently used for development and testing. Mac OS X is a development platform and will continue to be so.

Description of RonDB#

RonDB is designed to be used in a managed cloud environment where the user only needs to specify the type of the virtual machine used by the various node types. RonDB has the features required to build a fully automated managed RonDB solution.

It is designed for applications requiring the combination of low latency, high availability, high throughput and scalable storage (LATS).

You can use RonDB in a Serverless version on app.hopsworks.ai. In this case Hopsworks manages the RonDB cluster and you can use it for your machine learning applications. You can use this version for free, with certain quotas on the number of Feature Groups (tables) you are allowed to add and on the memory usage. You can get started in a minute: there is no need to set up any database cluster or worry about its configuration, it is all taken care of.

You can use the managed version of RonDB available on hopsworks.ai. This sets up a RonDB cluster in your own AWS, Azure or GCP account using the Hopsworks managed software, given a few details on the HW resources to use. These details can be provided either through a web-based UI or using Terraform. The RonDB cluster is integrated with Hopsworks and can be used both for RonDB applications and for Hopsworks applications.

You can use the cloud scripts to set up a cluster on AWS, Azure or GCP in an easy manner. This requires no previous knowledge of RonDB; the scripts only need a description of the HW resources to use and the rest of the setup is automated.

You can use the open source version, download the binary tarball and set it up yourself.

You can use the open source version, build it from source and set it up yourself.

These are the commands you can use to retrieve the binary tarball:

# Download x86_64 on Linux
wget https://repo.hops.works/master/rondb-22.10.1-linux-glibc2.17-x86_64.tar.gz
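
# Extract the downloaded tarball (same file name as above)
tar xzf rondb-22.10.1-linux-glibc2.17-x86_64.tar.gz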

Summary of changes in RonDB 22.10.1#

RonDB 22.10.1 is based on MySQL NDB Cluster 8.0.34 and RonDB 21.04.15.

RonDB 22.10.1 adds 13 new features on top of RonDB 21.04.15, and adds 48 new features on top of MySQL NDB Cluster 8.0.34.

RonDB 22.10.1 fixes 6 bugs. In total this means that 8 bugs have been fixed in RonDB 22.10, and 156 bugs have been fixed in RonDB overall.

Test environment#

RonDB uses four different ways of testing. MTR is a functional test framework built using SQL statements to test RonDB.

The Autotest framework is specifically designed to test RonDB using the NDB API. Autotest is mainly focused on testing high availability features and performs thousands of restarts using error injection as part of a full test suite run.

Benchmark testing ensures that we maintain the throughput and latency that is unique to RonDB. The benchmark suites used are integrated into the RonDB binary tarball making it very straightforward to run benchmarks for RonDB.

Finally we also test RonDB in the Hopsworks environment where we perform both normal actions as well as many actions to manage the RonDB clusters.

RonDB also has a number of MTR tests that are executed as part of the build process to verify the quality of each RonDB build.

MTR testing#

RonDB has a functional test suite using MTR (MySQL Test Run) that executes more than 500 RonDB-specific test programs. In addition there are thousands of test cases for the MySQL functionality. MTR is executed on both Mac OS X and Linux.

We also have a special mode of MTR testing where we can run with different versions of RonDB in the same cluster to verify our support of online software upgrade.

Autotest#

RonDB is very focused on high availability. This is tested using a test infrastructure we call Autotest. It contains many hundreds of test variants; executing the full set takes around 36 hours. One test run with Autotest uses a specific configuration of RonDB. We execute multiple such configurations, varying the number of data nodes, the replication factor and the thread and memory setup.

An important part of this testing framework is that it uses error injection. This means that we can test exactly what will happen if we crash in very specific situations, if we run out of memory at specific points in the code, or if the timing changes because small sleeps are inserted in critical paths of the code.
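As a rough illustration of how such tests drive error injection (this is a sketch, not an actual RonDB test case; the NdbRestarter helper class is part of the NDB API test framework, and the error code used below is a placeholder), a test can arm an error insert in a data node and then restart it:

// Sketch only: not an actual RonDB test case. Error code 1000 is a
// placeholder; real tests use codes tied to specific crash points.
#include <NdbApi.hpp>        // for ndb_init() / ndb_end()
#include <NdbRestarter.hpp>  // NDB test framework helper class

int main()
{
  ndb_init();
  NdbRestarter restarter;                        // uses the default connect string
  const int nodeId = restarter.getDbNodeId(0);   // pick the first data node

  // Arm an error insert in the chosen node; the node acts on it when the
  // corresponding code path is reached (often by crashing on purpose).
  int res = restarter.insertErrorInNode(nodeId, 1000);

  // Restart the node with abort and wait for the cluster to recover.
  if (res == 0)
    res = restarter.restartOneDbNode(nodeId, false /*initial*/, false /*nostart*/, true /*abort*/);
  if (res == 0)
    res = restarter.waitClusterStarted();

  ndb_end(0);
  return res;
}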

During one full test run of Autotest, RonDB nodes are restarted thousands of times in all sorts of critical situations.

Autotest currently runs on Linux with a large variety of CPUs, Linux distributions and even on Windows using WSL 2 with Ubuntu.

Benchmark testing#

We test RonDB using the Sysbench test suite, DBT2 (an open source variant of TPC-C), flexAsynch (an internal key-value benchmark), DBT3 (an open source variant of TPC-H) and finally YCSB (Yahoo Cloud Serving Benchmark).

The focus is on testing RonDB's LATS capabilities (low Latency, high Availability, high Throughput and scalable Storage).

Hopsworks testing#

Finally, we also execute tests in Hopsworks to ensure that RonDB works with HopsFS, the distributed file system built on top of RonDB, with HSFS, the Feature Store designed on top of RonDB, and with all other use cases of RonDB in the Hopsworks framework.

New features#

RONDB-479: Improved description of workings of the RonDB Memory manager#

Changed name of alloc_spare_page to alloc_emergency_page to make it easier to understand the difference between spare pages in DataMemory and allocation of emergency pages in rare situations.

Added many comments to make it easier to understand the memory manager code.

RONDB-474: Local optimisations#

1: Continue running if BOUNDED_DELAY jobs are around to execute

2: Local optimisation of Dblqh::execPACKED_SIGNAL

3: Inline Dbacc::initOpRec

4: Improve seize CachedArrayPool

RONDB-342: Replace md5 hash function with XX3_HASH64#

This change decreases the overhead for calculating the hash function by about 10x since the new hash function is better suited for SIMD instructions.
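For illustration, XX3_HASH64 appears to correspond to the 64-bit XXH3 hash from the open source xxHash library (an assumption based on the name). A minimal standalone sketch of hashing a key, using the public xxHash API rather than RonDB's internal hash wrapper, looks like this:

// Minimal sketch assuming the xxHash library (xxhash.h) is installed;
// this uses the public XXH3 API, not RonDB's internal wrapper.
#include <xxhash.h>
#include <cstdio>
#include <cstring>

int main()
{
  const char key[] = "example-primary-key";
  // XXH3_64bits computes a 64-bit hash; it is heavily vectorised
  // (SSE2/AVX2/NEON), which is the main reason it is so much cheaper than md5.
  XXH64_hash_t h = XXH3_64bits(key, strlen(key));
  printf("hash = %016llx\n", (unsigned long long)h);
  return 0;
}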

Bug Fixes#

RONDB-491: Fix for testNodeRestart -n Bug34216, bug in test case setup#

RONDB-490: Fixed testNodeRestart -n WatchdogSlowShutdown#

The delay in finishing the watchdog shutdown was so long that the other nodes decided to finish it using a heartbeat error.

Also removed a crash that occurred when nodes wanted to connect in the wrong state. This could happen in shutdown situations.

RONDB-486: Map node group to start from 0 and be consecutive#

It is allowed to set the Nodegroup id on a node. However, for DBDIH to work these ids must start at 0 and be consecutive; otherwise things will fall apart in lots of places.

To make it easier for the user, we add a function that maps the configured node groups to node groups that start at 0 and continue consecutively. For example, if the user sets Nodegroup to 1 and 2, these will be mapped to 0 and 1.
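As an illustration of the mapping (a sketch only, not the actual DBDIH code, and the function name is hypothetical), the configured node group ids are collected, ordered and renumbered consecutively from 0:

// Illustrative sketch only: map user-configured Nodegroup ids (e.g. {1, 2})
// to consecutive ids starting at 0 (e.g. 1 -> 0, 2 -> 1).
#include <map>
#include <vector>
#include <cstdio>

std::map<unsigned, unsigned> map_node_groups(const std::vector<unsigned>& configured)
{
  std::map<unsigned, unsigned> mapping;   // std::map keeps keys sorted
  for (unsigned ng : configured)
    mapping[ng] = 0;                      // collect distinct node group ids
  unsigned next = 0;
  for (auto& entry : mapping)
    entry.second = next++;                // renumber consecutively from 0
  return mapping;
}

int main()
{
  const std::vector<unsigned> configured = {1, 2};
  for (const auto& e : map_node_groups(configured))
    printf("Nodegroup %u -> %u\n", e.first, e.second);
  return 0;
}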

RONDB-155: Fix for erroneous ndbassert#

After adding a lot of debug statements it finally became clear that the ndbassert on m_scanFragReqCount > 0 was incorrect for ordered index tables, but still correct for base tables.

Thus the bug actually had nothing to do with RONDB-155.

RONDB-479: Double mutex lock led to watchdog failure#

When an LCP needs to allocate a page to handle a DELETE, it could run out of Undo pages. To avoid that, the page is released before the copy page is allocated, and this needs to be done under mutex protection. The handling of this was buggy such that the mutex was taken twice, leading to a watchdog crash.

RONDB-398: Ensure that we call do_send each loop to keep latency low#