Release Notes RonDB 21.04.2#

RonDB 21.04.2 is the third release of RonDB 21.04. It is based on MySQL NDB Cluster 8.0.23. It is a bug fix release based on RonDB 21.04.1.

RonDB 21.04.2 is released as open source software with binary tarballs for use on Linux and Mac OS X. It is developed on Linux and Mac OS X, and using WSL 2 on Windows (Linux on Windows).

RonDB 21.04.2 can be used on both x86_64 and ARM64 architectures, although ARM64 support is still in beta state.

There are four ways to use RonDB 21.04.2:

  1. You can use the managed version available on hopsworks.ai. This sets up a RonDB cluster given a few details about the HW resources to use. The RonDB cluster is integrated with Hopsworks and can be used both for RonDB applications and for Hopsworks applications. Currently AWS is supported; Azure support will be available soon.

  2. You can use the cloud scripts that make it easy to set up a cluster on Azure or GCP. This requires no previous knowledge of RonDB; the script only needs a description of the HW resources to use and the rest of the setup is automated.

  3. You can use the open source version with the binary tarball and set it up yourself.

  4. You can use the open source version, build it and set it up yourself.

RonDB 21.04 is a Long Term Support version that will be maintained until at least 2024.

Maintaining 21.04 means mainly fixing critical bugs and minor change requests. It doesn’t involve merging with any future release of MySQL NDB Cluster; this will be handled in newer RonDB releases.

Backports of critical bug fixes from MySQL NDB Cluster will happen.

Summary of changes in RonDB 21.04.2#

This release contains 6 new features, 29 bug fixes and 27 backports of bug fixes from MySQL 8.0.24 through 8.0.27. In addition, there are a number of minor fixes of test cases.

New features:

  1. Support larger transactions in RonDB

  2. Improved error handling when no contact with cluster (4009)

  3. Binary tarballs for Mac OS X on x86_64

  4. Beta support for Mac OS X on ARM64

  5. Beta support for Linux on ARM64

  6. Running RonDB on WSL 2 (Linux on Windows)

New features#

HOPSWORKS-2783 Support larger transactions#

Currently, large transactions can easily cause job buffer explosion or send buffer explosion. By introducing batching of abort, commit and complete requests, we ensure that no buffer explosions occur. We also introduced CONTINUEB handling when releasing records belonging to a large transaction, to sustain low latency even in the context of large transactions.

Previously, take over processing used one node operation at a time. This was safe against overload, but obviously can make large transactions take a very long time. A transaction with 1 million operations in a 3-replica cluster would have to do 3 million round trips, of which at least 2 million would be to remote nodes. This could easily take a couple of hours.

In this patch we instead send a batch and, when half of the batch has returned, start sending the next batch. Using batch sizes of around 1k operations, we ensure that even very large transactions can be committed within a few seconds.
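
As an illustration of the batching scheme, the sketch below (hypothetical names, not the actual DBTC code) keeps a window of outstanding COMMIT requests and refills the window once half of the batch has been acknowledged:

```cpp
#include <cstddef>
#include <vector>

struct Operation { /* operation to commit */ };

// Minimal sketch of half-batch pipelining: keep up to batch_size requests
// outstanding and refill as soon as half of them have been acknowledged.
class BatchedCommitter {
public:
  explicit BatchedCommitter(std::size_t batch_size) : batch_size_(batch_size) {}

  void start(const std::vector<Operation>& ops) {
    pending_ = ops;
    next_ = 0;
    outstanding_ = 0;
    fill();
  }

  // Called when one COMMIT has been acknowledged by a participant.
  void on_ack() {
    --outstanding_;
    // Refill once half of the batch has completed, so sending and
    // acknowledgements overlap instead of running strictly in sequence.
    if (outstanding_ <= batch_size_ / 2)
      fill();
  }

private:
  void fill() {
    while (outstanding_ < batch_size_ && next_ < pending_.size()) {
      send_commit(pending_[next_++]);   // placeholder for the real signal send
      ++outstanding_;
    }
  }

  void send_commit(const Operation&) { /* send COMMIT signal */ }

  std::vector<Operation> pending_;
  std::size_t next_ = 0;
  std::size_t outstanding_ = 0;
  std::size_t batch_size_;
};
```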

Much of the take over processing also used loops over all operations; this is changed to handle one batch per real-time break, thus ensuring that other transactions are not badly affected by the large transaction.

To simplify the code, we don’t handle timeouts while we are still sending batches, nor while we are transforming the transaction from normal transaction handling to the take over variant.

The take over variant is currently always used when a TC thread takes over transactions from a failed node. It is also used when we have a timeout in the commit and complete processing.

We can later introduce other reasons to use this code. It sends COMMIT and COMPLETE messages to all participants which increases load on CPUs and networks, but it also decreases latency of commit operations.

We also standardised timeout handling such that all handling of timeouts uses the take over processing path.

Introduced a new boolean configuration variable LowLatency that defaults to off. If not set, we get the normal commit behaviour with linear commit. When set, we use a commit protocol that sends the commit to all nodes in parallel, which should decrease latency at the expense of higher networking costs, both CPU-wise and bandwidth-wise.

This affects the COMMIT phase and the COMPLETE phase. It doesn’t affect the prepare phase. Using LowLatency means that we always wait until the COMPLETE phase before releasing locks.

It is mainly useful in clusters with high latency and clusters with 3-4 replicas.
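
As a sketch of how this might be enabled, assuming LowLatency is set like other data node parameters in the cluster configuration file (check the RonDB configuration documentation for the exact syntax):

```ini
# Hypothetical config.ini snippet; parameter placement is an assumption.
[ndbd default]
LowLatency=1
```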

HOPSWORKS-2756#

The error 4009 has the short text Cluster Failure. However, the error can have many different causes. To make this easier to troubleshoot, we extended those error messages to be more elaborate and added more error variants.

We also added more logging to the MySQL error log when nodes connect, disconnect, become alive and become dead, to show why they are available or unavailable.
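
As an example, an NDB API application could surface the extended message as in the sketch below (the surrounding application code is an assumption; only the standard NdbError fields are used):

```cpp
#include <NdbApi.hpp>
#include <cstdio>

// Print the extended 4009 message after a failed interaction with the cluster.
void report_cluster_failure(Ndb* ndb)
{
  const NdbError& err = ndb->getNdbError();
  if (err.code == 4009) {
    // With the extended error variants the message explains why the
    // cluster is unreachable rather than just saying "Cluster Failure".
    std::printf("Cluster failure: %s\n", err.message);
  }
}
```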

ARM64 support#

In this version we introduce binary tarballs for ARM64 platforms. These versions have been tested and verified both on Linux (using Oracle Linux) and on Mac OS X. The support for ARM64 is still in beta state. We will continue to add more testing and verification of our ARM64 binaries.

Mac OS X support#

RonDB is developed on a mix of Mac OS X and Linux platforms. Thus it is natural to extend the support for RonDB on Mac OS X. What is new is that now this has also been extended to the newest ARM64 Macs.

The x86_64 version was built on Mac OS X 11.6 using XCode 13.1. The ARM64 version was built on Mac OS X 12.1 using XCode 13.1.

WSL 2 support#

RonDB doesn’t support running directly on Windows. However, with the release of Windows 11 and WSL 2, we feel that the best way of running RonDB on Windows is to execute it in WSL 2, the new Linux subsystem on Windows.

We have added testing on Windows using both MTR and Autotest (described below). The testing on Windows using WSL 2 has been a great success, and using Remote Desktop it is possible to have Windows run the full test suite without manual intervention.

Test environment#

RonDB has a number of unit tests that are executed as part of the build process.

MTR testing#

RonDB has a functional test suite using MTR (MySQL Test Run) that executes several hundred RonDB-specific test programs. In addition there are thousands of test cases for the MySQL functionality. MTR is executed on both Mac OS X and Linux.

We also have a special mode of MTR testing where we can run with different versions of RonDB in the same cluster to verify our support of online software upgrade.

Autotest#

RonDB is very focused on high availability. This is tested using a test infrastructure we call Autotest. It contains many hundreds of test variants that take around 36 hours to execute in full. One test run with Autotest uses a specific configuration of RonDB. We execute multiple such configurations, varying the number of data nodes, the replication factor, and the thread and memory setup.

An important part of this testing framework is that it uses error injection. This means that we can test exactly what happens if we crash in very specific situations, if we run out of memory at specific points in the code, and if the timing changes due to small sleeps inserted in critical paths of the code.

During one full test run of Autotest RonDB nodes are restarted thousands of times in all sorts of critical situations.

Autotest currently runs on Linux with a large variety of CPUs, Linux distributions and even on Windows using WSL 2 with Ubuntu.

Hopsworks testing#

Finally, we also execute tests in Hopsworks to ensure that RonDB works with HopsFS, the distributed file system built on top of RonDB, with HSFS, the Feature Store designed on top of RonDB, and with all other use cases of RonDB in the Hopsworks framework.

Bug fixes#

HOPSWORKS-2885#

Summing values through a const char* isn’t portable: ARM64 treats plain char as unsigned while x86 treats it as signed. Fixed in ndbapi-examples.
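
A minimal sketch of the portability issue (illustrative code, not the actual ndbapi-examples change):

```cpp
#include <cstdio>

// Plain 'char' is signed on x86_64 but unsigned on ARM64, so summing bytes
// through a const char* gives different results on the two architectures
// unless the signedness is made explicit.
int checksum(const char* p, int len)
{
  int sum = 0;
  for (int i = 0; i < len; i++)
    sum += static_cast<signed char>(p[i]);  // portable: same result everywhere
  return sum;
}

int main()
{
  const char bytes[] = { '\xff', '\x01' };
  // Prints 0 on both architectures; without the cast the sum would be
  // 0 on x86_64 but 256 on ARM64.
  std::printf("%d\n", checksum(bytes, 2));
  return 0;
}
```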

HOPSWORKS-2889#

We keep GCC 8 for building releases on Red Hat 7. For ARM64 we always require GCC 10.

Fix some issues with testing memory manager on ARM due to 64k page size.

HOPSWORKS-2886#

Added a lot of debug code to understand how usageCountR is handled, by adding a linked list of TcConnectionrec records on the table object and some other debugging.

Query threads don’t clean up after scan operations when the TC node fails. For this to happen it is required that TC and LDMs are on separate nodes and that the scan operation requires going back to the API. The failure must also occur while the LDM thread is processing the scan, such that the SCAN_FRAGCONF is sent but not received due to the node failure. This is thus fairly rare, but still important to fix, since it means that the table can’t be dropped until the node with the scan is killed.

HOPSWORKS-2883#

It is possible for ALTER_TAB_REQ to arrive between GET_TABINFOCONF and BACKUP_FRAGMENT_REQ, thus changing maxRecordSize. Therefore we need to set maxRecordSize in BACKUP_FRAGMENT_REQ for LCPs.

HOPSWORKS-2874 Bugfix in Dbtc::initialiseTcConnect#

Use configured reserved record count in Dbtc::initialiseTcConnect rather than a constant.

Change default number of reserved records to a constant.

Add comments.

Fixes for handling out-of-memory alert message#

HOPSWORKS-2871#

Fix of the testOIBasic -case d test case.

The problem was in pkupdatescanread. This test case runs 4 threads, where some threads perform pkupdate, some perform scans using an index, and some make lookups using a unique hash index. The problem is related to performing updates in pkupdate concurrently with reads using unique keys.

pkupdate can update the rows in parallel with reads in hashindexread. This means that the result of the operation is random, and thus that the test case fails regularly.

To make the test case predictable in its result, we have to stop unique-key reads and pkupdate from running concurrently.

Solution: Introduce a new state StRead and a counter m_read_ref_count to keep track of the number of concurrent readers.

Ensure that writes will not be allowed when the state is StRead. Similarly ensure that reads will not be allowed to proceed when the state is not equal to StCommit.

Change hashindexread such that it calls a selrow method that sets the state to StRead and increments the reference counter. Ensure that the state is reset after completion in a new method called post_read. Ensure that both of those methods are called under mutex protection.
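
A simplified sketch of this bookkeeping is shown below; the names follow the description above, but the real testOIBasic code has more states and a different structure:

```cpp
#include <cassert>
#include <mutex>

// Two states are enough for the illustration; the real test has more.
enum RowState { StCommit, StRead };

struct Row {
  RowState m_state = StCommit;
  int m_read_ref_count = 0;
};

static std::mutex g_row_mutex;   // stands in for the test's row mutex

// Called by hashindexread before issuing the unique-key read.
bool selrow(Row& row)
{
  std::lock_guard<std::mutex> guard(g_row_mutex);
  if (row.m_state != StCommit && row.m_state != StRead)
    return false;                // a writer owns the row; the read must wait
  row.m_state = StRead;
  row.m_read_ref_count++;
  return true;
}

// Called when the read has completed.
void post_read(Row& row)
{
  std::lock_guard<std::mutex> guard(g_row_mutex);
  assert(row.m_state == StRead && row.m_read_ref_count > 0);
  if (--row.m_read_ref_count == 0)
    row.m_state = StCommit;      // last reader releases the row
}

// pkupdate checks the state under the same mutex and skips or retries
// the update while readers are present.
bool may_write(const Row& row)
{
  std::lock_guard<std::mutex> guard(g_row_mutex);
  return row.m_state != StRead;
}
```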

testOIBasic.cpp was quite hard to get into due to a great lack of comments. Added a number of new comments and fixed overly long lines.

Fix bug in JamEvent::isEmpty#

It is common for an empty JamEvent to be marked as the last event before the execution of the next incoming signal. In these cases, JamEvent::isEmpty used to erroneously return false, causing the Jam event log to contain lines for file "unknown_file_32767", line 65535.

HOPSWORKS-2862#

Fixed a crash when using NumCPUs=3 and added this configuration to test runs.

HOPSWORKS-2861#

Fixed a problem where a node restart hangs. This can happen in 3-replica setups where one node has decided to use only 1 transporter in the multi transporter setup while another node is using multi transporter setups.

Fixed by being done with a node immediately when the node has only 1 transporter and that transporter is already in place.

Removed some debug statements and added more debugging of multi transporter setups.

HOPSWORKS-2783#

Previously an interaction with RonDB would time out after 6 minutes of inactivity from the cluster. Committing and aborting massively large transactions can potentially exceed this limit. To avoid problems in this case, a heartbeat signal has been added that ensures that we don’t time out as long as the data nodes are still actively executing the request.

HOPSWORKS-2859#

Fix problem with using wrong variable for block reference in sending delayed LQH_TRANSREQ (only affects testing).

Fix issue where GCP records that are left overs from node failure handling cause us to send GCP_TCFINISHED before the GCP is actually done in the TC thread that just completed handling a node failure.

Fixed by removing those before processing the GCP_NOMORETRANS. Also ensured that the GCP records are in sorted order in the list.

HOPSWORKS-2848#

Add a read memory barrier before reading data in a signal. The problem was only seen on Apple M1 Pro CPUs, not on Apple M1 CPUs.
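
The pattern behind the fix, sketched with standard C++ atomics (the actual code uses the kernel’s own barrier primitives):

```cpp
#include <atomic>

struct Signal { int data[32]; };

std::atomic<bool> ready{false};
Signal shared_signal;

void writer()
{
  shared_signal.data[0] = 42;                    // fill in the payload first
  ready.store(true, std::memory_order_release);  // then publish it
}

// On weakly ordered CPUs the reader must issue an acquire barrier after
// observing the "ready" flag and before reading the signal payload.
bool reader(int& out)
{
  if (!ready.load(std::memory_order_relaxed))
    return false;
  std::atomic_thread_fence(std::memory_order_acquire);  // read barrier
  out = shared_signal.data[0];                          // payload is now safe to read
  return true;
}
```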

HOPSWORKS-2847#

On ARM64 we can have race conditions when performing NdbCondition_Signal that cause bus errors, since the condition variable no longer exists because it has been destroyed. By calling it with the mutex locked we avoid this race condition.
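
A sketch of the pattern using plain pthreads rather than the NdbCondition wrapper: signalling while the mutex is held guarantees that the waiter cannot wake up, observe completion and destroy the condition variable before the signal call has finished.

```cpp
#include <pthread.h>

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
static bool done = false;

// Signalling thread: signal while holding the mutex.
void* notifier(void*)
{
  pthread_mutex_lock(&mutex);
  done = true;
  pthread_cond_signal(&cond);    // safe: the waiter cannot destroy 'cond' yet
  pthread_mutex_unlock(&mutex);
  return nullptr;
}

// Waiting thread: waits, then tears down the condition variable.
void* waiter(void*)
{
  pthread_mutex_lock(&mutex);
  while (!done)
    pthread_cond_wait(&cond, &mutex);
  pthread_mutex_unlock(&mutex);
  // If the signal were sent after unlocking the mutex, this destroy could
  // race with it and crash; with the pattern above it cannot.
  pthread_cond_destroy(&cond);
  pthread_mutex_destroy(&mutex);
  return nullptr;
}
```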

HOPSWORKS-2846#

We missed some scans in complete_scan_on_table since the scan cookie didn’t change in accordance with the m_scan_count changes. To fix this, we use m_scan_reorg_flag as the scan cookie instead, since it changes when m_scan_count is changed.

Also fixed timeout printouts to cluster log.

HOPSWORKS-2851#

Dropped tables can cause a crash by calling complete_scan_on_table after the table has been dropped and a new table has been created using the same i-value as the dropped table.

To avoid this we ensure that all scans remember the schema version they started the scan against. We check that the schema version is not a schema version of a new table. It is ok that the schema version has changed due to an ALTER TABLE.

In the rare case that the table version is 0xFFFFFF in the 24 least significant bits (which is the value we set it to after dropping the table), we keep the values of m_scan_count to avoid crashes in debug/error build mode.

The schema version cookie is returned in DIH_SCAN_TAB_CONF and sent to DIH in DIH_SCAN_TAB_COMPLETE_REP. This signal is used by DBTC, DBSPJ, BACKUP and SUMA.

Make sure that scan completions are only handled with the correct table schema version. This ensures that left over scans won’t crash the node if a scan is not completed before the drop table is completed.

HOPSWORKS-2834#

In rare situations the multi socket setup can hang. To handle this better, more reporting of progress is required. A few fixes are also required to avoid race conditions.

More documentation of the code around multi sockets is also added.

Clusterj reconnection fixes#

HOPSWORKS-2831#

check_node_recovery_timers missed handling of INCLUDED_IN_HB_PROTOCOL when it comes directly from NODE_NOT_RESTARTED_YET.

HOPSWORKS-2830#

Sometimes we get more than 1 COPY_FRAGREQ signal in parallel to DBLQH. In this case we would often save a corrupt signal, since before saving it we had used the signal object to send another signal. Thus we need to save a copy of the signal object and use a pointer to the copy rather than to the signal object itself.

Use the same method on COPY_ACTIVEREQ although there is no bug in that code at the moment.
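
A sketch of the pattern with illustrative types (the real signal classes differ):

```cpp
#include <cstring>

// Illustrative stand-ins for the real signal classes.
struct CopyFragReq { unsigned data[16]; };
struct Signal      { CopyFragReq payload; };

static CopyFragReq savedCopyFragReq;          // stable copy owned by the block
static const CopyFragReq* savedReqPtr = nullptr;

void save_copy_fragreq(const Signal* signal)
{
  // Wrong: keeping a pointer into 'signal'; the shared signal object is
  // reused as soon as we send another signal, corrupting the saved request.
  // Right: copy the payload and point at the copy instead.
  std::memcpy(&savedCopyFragReq, &signal->payload, sizeof(savedCopyFragReq));
  savedReqPtr = &savedCopyFragReq;
}
```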

HOPSWORKS-2816#

Erroneous crash on GCP too new in commitGciHandling.

When handling node failures we would crash if, after node failure handling, we found a GCP record that is newer than a committed transaction. The correct behaviour is to crash only if this GCP has received GCP_NOMORETRANS.

HOPSWORKS-2691#

It was possible to mix TRANSID_AI signals sent at A-level (JBA) with those sent at B-level (JBB), and this could lead to TRANSID_AI arriving after SCAN_FRAGCONF. This should only happen with backups, and not when using multithreaded backup.

HOPSWORKS-2739#

The Date type needs to be supported also when using Dynamic classes as a primary key.

HOPSWORKS-2740#

Fix of an upgrade issue from 7.x to 8.x/RonDB.

HOPSWORKS-2747#

Missing return after send_scan_fragref. Fixed test case for ndb_transaction_memory_shortage.

HOPSWORKS-2749#

We can get constant failures due to error 1204 (distribution key errors) on node 1 with 3-4 replicas. We missed spreading the new distribution key to node 1 when other nodes are primary replicas and starting nodes for the fragment.

HOPSWORKS-2752#

Fix such that fragmented signals to other nodes are not sent to query threads in those nodes.

HOPSWORKS-2755#

After a node failure, Suma can resend buffers from a bucket that belongs to a subscription that has already been released.

Fixed by checking that the subscription isn’t released before use, and by introducing an autoincrement value on the subscription and on the buffer in the bucket that can be used at resend.

Backport of bug fixes from MySQL NDB Cluster#

For more details on these bug fixes, check the git log.

Bug 32198728#

Fix of ARM64 builds.

Bug 32381003#

Fix of ARM64 builds.

Bug 32891206#

Fix crash in ndb_dd_restart test case. Could cause crashes in schema operation in some situations.

Bug 32924533#

Improved condition pushdown to push conditions also to tables inside views and subqueries.

Bug 27538524#

Fix problem in ndb_rpl_conflict_read_tracking test case.

Bug 32886430#

Fix build issues involving Docker.

Bug 32774281#

Improved NULL handling of multi-range lookups.

Bug 13881465#

Delete by scan could cause rows in BLOB tables to be orphaned.

Bug 31958327#

Case is not properly reflected in RonDB dictionary.

Bug 32686116#

Fix crash in MySQL meta data layer.

Bug 32701583#

Fix crash when defining a node group that contains an already used node id.

Bug 32068551#

Increase the delay between empty checkpoints to decrease the load due to checkpoints.

Bug 32413458#

Fix arithmetic exception in ndb_init method.

Bug 31925977#

Fix of calculation of REDO log alert levels with more than 1 log part per LDM.

Bug 32354817#

Fix for a crash in NDB storage engine when preparing a pushed join query.

Bug 32459686#

Fix for error check of seize backup records.

Bug 32413686#

Fix error message of Job buffer full, mixed from/to.

Bug 32257374#

Fix crash in ndb_restore on wrong arguments.

Bug 33075828#

Fix crash in routines using CAST and DEFAULT.

Bug 33161080#

Ordered index scans can crash in certain situations.

Bug 33206293#

The backup block can receive signals out of order. The fix was already in RonDB; the corresponding change in MySQL NDB Cluster has been merged. See HOPSWORKS-2691 above.

Bug 33181964#

Fix incorrect result from pushed join query with OUTER JOIN inside an EXISTS subquery.

Bug 32593352#

DDL can hang if schema operation event is lost.

Bug 32997832#

Fix error code handling in global replication such that temporary errors don’t lead to broken replication.

Bug 33019959#

Added checks of array indexes in Suma.cpp.

Bug 32920099#

Fix initialisation of BIT columns.

Bug 32339789#

Added message printout when ndb_mgmd is waiting for second ndb_mgmd to start.