Release Notes RonDB 21.04.1#

RonDB 21.04.1 is the second RonDB release. It is based on MySQL NDB Cluster 8.0.23. It is a bug fix release based on RonDB 21.04.0.

RonDB 21.04.1 is released as open source software with a binary tarball for Linux. It is developed on Linux and Mac OS X and is occasionally tested on FreeBSD.

There are four ways to use RonDB 21.04.1:

You can use the managed version available on hopsworks.ai. Given a few details on the HW resources to use, it sets up a RonDB cluster. The RonDB cluster is integrated with Hopsworks and can be used for both RonDB applications and Hopsworks applications. Currently AWS is supported; Azure support will be available soon.
You can use the cloud scripts, which make it easy to set up a cluster on Azure or GCP. This requires no previous knowledge of RonDB: the script only needs a description of the HW resources to use, and the rest of the setup is automated.
You can use the open source version with the binary tarball and set it up yourself.
You can use the open source version, build it and set it up yourself.

RonDB 21.04 is a Long Term Support version that will be maintained until at least 2024.

Maintaining 21.04 mainly means fixing critical bugs and handling minor change requests. It doesn’t involve merging with any future release of MySQL NDB Cluster; that will be handled in newer RonDB releases.

Summary of changes in RonDB 21.04.1#

3 new features and 18 bug fixes.

New features:

Add support for Longvarchar data types in ClusterJ.
Handling Auto Increment in the ndb_import program.
kill -TERM causes a graceful stop of the data node.

New features#

HOPSWORKS-2524#

Add support for Longvarchar data types in ClusterJ. This is necessary to handle primary keys that are defined as VARCHAR(x) where the maximum size of the column is larger than 255 bytes. Thus even VARCHAR(64) can be a problem with character sets that use 4 bytes per character.

Also added tests for Varchar as primary key, which were previously missing although the type was supported.
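
The size limit above is simple arithmetic: the worst-case byte size of a VARCHAR column is its character length times the maximum bytes per character of its character set. The following is a minimal sketch of that check; the function names and the small character set table are illustrative assumptions and are not part of ClusterJ.

```python
# Sketch: decide whether VARCHAR(n) exceeds the 255-byte limit mentioned above.
# All names here are hypothetical and chosen for illustration only.

# Maximum bytes per character for a few MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}

SHORT_VARCHAR_LIMIT = 255  # largest maximum size, in bytes, of a short Varchar


def max_varchar_bytes(char_length: int, charset: str) -> int:
    """Return the worst-case byte size of VARCHAR(char_length)."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


def needs_longvarchar(char_length: int, charset: str) -> bool:
    """True when the column's maximum size is larger than 255 bytes."""
    return max_varchar_bytes(char_length, charset) > SHORT_VARCHAR_LIMIT
```

With a 4-byte character set, VARCHAR(63) has a maximum size of 252 bytes and still fits, while VARCHAR(64) reaches 256 bytes and crosses the limit.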

HOPSWORKS-2565#

ndb_import uses the values from the CSV files to set auto increment columns. However, it does not ensure that the auto increment value stored in RonDB is correct after the ndb_import process.

To solve this we introduce a new option to ndb_import called --use-auto-increment. With this option ndb_import ignores the value from the CSV file and sets an auto increment value instead. This ensures that INSERTs to the table work correctly after importing data using ndb_import.

If this option isn’t set it is necessary to set a new starting value for the auto increment. This can be done e.g. by the following transaction:

BEGIN; INSERT INTO table (column list) VALUES (auto_increment_value, values); ROLLBACK;

This will set the new auto increment value in the table.
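
To use the transaction above, one needs to know which values the CSV file already occupies. The sketch below finds the first value past the largest one in the file. It assumes, purely for illustration, that the auto increment column is a plain integer column at a known position and that any value beyond the largest imported one is a suitable starting point; the function name is hypothetical and not part of ndb_import.

```python
import csv
import io


def first_free_auto_increment(csv_text: str, column_index: int = 0) -> int:
    """Return the largest value in the given CSV column plus one.

    Returns 1 when the input contains no rows.
    """
    largest = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        largest = max(largest, int(row[column_index]))
    return largest + 1
```

For a file whose first column holds the values 1, 5 and 3, the function returns 6.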

This version of ndb_import is integrated in the DBT2 benchmark provided with the RonDB release.

HOPSWORKS-2610#

For RonDB to interact well with systemd we need to support graceful shutdown even when the shutdown is initiated using kill -TERM ndbmtd.

The method in NDB to handle shutdown doesn’t work very well, since it is hard to know which processes to stop to really stop ndbmtd. However, most use cases can use killall -TERM ndbmtd to kill all ndbmtd processes in the VM. This patch ensures that the above command leads to a graceful shutdown that completes within 30 seconds.
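
The general pattern behind a graceful stop on SIGTERM can be sketched as follows. This is a generic illustration of the technique and not the ndbmtd implementation: the handler only records the request, and the main loop notices it and shuts down in an orderly way. All names are hypothetical.

```python
import signal
import time

stop_requested = False  # set by the signal handler, read by the main loop


def handle_sigterm(signum, frame):
    """Only record the request; the main loop performs the shutdown."""
    global stop_requested
    stop_requested = True


def install_handler() -> None:
    """Register the handler so that kill -TERM no longer kills the process."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run(max_iterations: int = 1000) -> str:
    """Do units of work until a stop is requested or the work runs out."""
    for _ in range(max_iterations):
        if stop_requested:
            # A real server would flush state and close connections here.
            return "graceful stop"
        time.sleep(0.01)  # stand-in for one unit of work
    return "finished without stop request"
```

After install_handler() has been called, sending SIGTERM to the process makes the next loop iteration in run() return instead of terminating the process abruptly.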

BUG FIXES#

Fixed NPE caused after unload schema#

The new improvements in RonDB 21.04.0 had a problem with unload schema.

HOPSWORKS-2532#

When block threads assisted send threads, they didn’t transfer the responsibility back to the send thread if they didn’t complete their assistance. This only applied when the assistance failed to send all data to a certain transporter. The send was usually picked up by other threads quite quickly, so the impact was some outliers in latency.

HOPSWORKS-2547#

During setup of multiple transporters between two data nodes in a node group, we can miss sending the ACTIVATE_TRP_REQ signal.

This was reproducible with some changes that made receive threads more active in executing signals. It wasn’t reproducible in RonDB 21.04.

HOPSWORKS-2365 fix#

Don’t report that a node is deactivated until we have checked that the node is part of the configuration.

HOPSWORKS-2407#

We don’t support --continue for ndb_import any more. We added support for handling rejected lines in CSV files. We still don’t support any errors in NDB API processing. Fixed ndb_import test programs to reflect this.

Fix memory leak in ndb_mgmd#

When a new configuration was created we removed all LogHandlers currently in use and replaced them with the new ones as decided by the new configuration. In doing so we created a new FileLogHandler object, but when the old objects were removed this object was not destroyed. This leaked an open file descriptor, eventually leading to running out of open file descriptors.

Fixed by calling the destructor of FileLogHandler as part of the destructor of the BufferedFileHandler object, which is the one placed in the list of LogHandlers on g_eventLogger.
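
The ownership rule behind this fix can be illustrated with a small sketch. It is written in Python with explicit close() calls in place of C++ destructors, and the class and function names are hypothetical stand-ins, not the ndb_mgmd classes: a wrapper that owns an inner file handler must release it when the wrapper itself is released.

```python
import os


class FileHandler:
    """Hypothetical stand-in for a log handler that owns an open file."""

    def __init__(self, path: str = os.devnull) -> None:
        self._file = open(path, "a")

    @property
    def closed(self) -> bool:
        return self._file.closed

    def close(self) -> None:
        self._file.close()  # releases the file descriptor


class BufferedHandler:
    """Hypothetical wrapper; this is the object kept in the handler list."""

    def __init__(self, inner: FileHandler) -> None:
        self._inner = inner

    def close(self) -> None:
        # The wrapper owns the inner handler, so it must close it too.
        # Leaving this call out reproduces the kind of leak described above.
        self._inner.close()


def replace_handlers(active: list, new_handlers: list) -> None:
    """Close every active handler, then install the new ones in place."""
    for handler in active:
        handler.close()
    active[:] = new_handlers
```

Each configuration change then closes the file of every handler it removes, so the number of open descriptors stays constant across reloads.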

This fixes the test case test_mgm.

Fix 2 test cases in testMgmd#

UnresolvedHosts2 assumed that the node would wait for 30 seconds before failing when connecting from the wrong hostname, but the connection attempt failed instantly.

Bug45495 test case failed due to a check that should have been removed more than 10 years ago when fixing Bug45495. It assumed that reload in parallel could only be applied when nodes were starting in node id order.

Fix Bug56844 test case in testMgmd#

If 2 management servers are restarted simultaneously to change the config and they both intend to use the same config we will allow this change to go through even if we already prepared the configuration change.

HOPSWORKS-2570#

seizeLogpage fails in ptrCheckGuard on properly allocated log pages. It fails because we assume that all the log pages are consecutive while allowing them to be allocated in chunks.

The simple solution is to ensure that they are actually allocated consecutively, which is what happens when we require alloc_pages in ndbd_malloc.cpp to return all pages in 1 chunk from the allocChunks method.

HOPSWORKS-2571#

Minor changes to automatic thread configuration. More tc threads, fewer receive and send threads. Colocate tc threads with send, recv, main and rep threads on the same CPU cores.

Fix ClusterJ default cached sessions#

HOPSWORKS-2579#

ClusterJ uses a scratch buffer for primary key hash calculations which is limited to 10000 bytes. This is too small in some cases.

The solution is to malloc a buffer in computeHash if the scratch buffer isn’t big enough. This also solves some application problems.

This patch contains a test for this bug that uses a primary key requiring a buffer larger than 10000 bytes.

Fix ARM HW configuration and ARM HW on OCI#

This is a first step toward supporting ARM HW for RonDB. There are still known issues, but it works for prototyping and testing purposes.

HOPSWORKS-2591#

releaseTcCon uses a block variable as input parameter. However, this input parameter is changed by the call to abortTransFromTrigger, which leads to the release of the wrong record and can later lead to a crash.

It only happens after a successful allocation of a TC connect record followed by an unsuccessful allocation of an API connect record. Thus it will almost never happen in practice, but it does happen in the test case ndb_transaction_memory_shortage.
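
The class of bug described above, a shared variable changing underneath the code that relies on it, can be shown with a small sketch. The class and method names are hypothetical, and the source does not state how the actual fix was made; taking a local copy before the call is one common way to avoid the problem.

```python
class Block:
    """Hypothetical block with a shared variable used as implicit input."""

    def __init__(self) -> None:
        self.current = None   # shared "block variable"
        self.released = []    # records released so far

    def abort(self) -> None:
        # Side effect: repoints the shared variable to another record.
        self.current = "other-record"

    def release_buggy(self) -> None:
        self.abort()
        # Reads the shared variable after it was changed: wrong record.
        self.released.append(self.current)

    def release_fixed(self) -> None:
        record = self.current  # local copy taken before the call
        self.abort()
        self.released.append(record)
```

The buggy variant releases "other-record" although the caller meant the record that was current on entry; the fixed variant releases the intended one.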

HOPSWORKS-2603#

In a single node setup all restarts are system restarts. In this case if the node fails in the last few seconds of the restart this leads to a node in the state not recoverable on its own. Obviously a node in a single node setup cannot rely on any other node for recovery so this isn’t correct.

The problem with nodes not being able to recover on their own is related to node restarts, as explained in a large comment in Backup.cpp. Thus system restarts should never set the state to NODE_NOT_RECOVERABLE_ON_ITS_OWN.

ndb_restore_conv_remap_debug fails due to blob backup log order#

Backport of bug fix from MySQL Cluster 8.0.24.

Fixes a corruption that could occur with replication to another RonDB cluster involving BLOBs.

HOPSWORKS-2651#

Receiving SCAN_NEXTREQ in the real-time break when waiting for ACC_ABORTREQ causes havoc since we call relinkScan without any context having been set up. Avoided by not calling relinkScan when coming to closeScanRequestLab in the state WAIT_CLOSE_SCAN.

This bug can crash in many different ways depending on which scan context was present before this signal was executed.

This bug hasn’t been reproduced in RonDB 21.04.0. It was, however, reproducible in MySQL 8.0.24 using the DBT2 benchmark.

HOPSWORKS-2652#

When an update causes a delete of a row in an ordered index we will move any scan pointers that currently point to the deleted row. We move this row using scanNext and after this we call relinkScan to move the position such that the scan continues from the correct position.

In doing so we used our own scan instance instead of the scan instance of the scan record we are moving.