
Release Notes RonDB 22.05.0#

RonDB 22.05.0 is a BETA release of RonDB.

It is based on MySQL NDB Cluster 8.0.29, RonDB 21.04.6 and RonDB 22.01.2.

RonDB 21.04 is a Long-Term Support version of RonDB that will be supported at least until 2024.

RonDB 22.01 is a Development version which will be supported through 2022.

RonDB 22.05 will not be supported; it is mainly intended for early users that want to try out the new functionality. The aim is to use it as the base for the next long-term support version of RonDB, which we are planning to release some time later in 2022.

RonDB 22.05.0 is released as open source software with binary tarballs for use on Linux and Mac OS X. It is developed on Linux and Mac OS X, and using WSL 2 on Windows (Linux on Windows). The only supported platform is currently Linux/x86; the others are currently for development and testing. We plan to soon also support Linux/ARM. Mac OS X is a development platform and will continue to be so.

There are two ways to use RonDB 22.05.0:

  1. You can use the open source version, download the binary tarball and set it up yourself.

  2. You can use the open source version and build and set it up yourself.

    These are the commands you can use to retrieve the binary tarball (option 1):

    # Download x86_64 on Linux
    wget https://repo.hops.works/master/rondb-22.05.0-linux-glibc2.17-x86_64.tar.gz
    # Download ARM64 on Linux
    wget https://repo.hops.works/master/rondb-22.05.0-linux-glibc2.31-arm64_v8.tar.gz
    # Download x86_64 on Mac OS X (at least Mac OS X 11.6)
    wget https://repo.hops.works/master/rondb-22.05.0-macosx11.6-xcode-13.1-x86_64.tar.gz
    # Download ARM64 on Mac OS X (at least Mac OS X 12.3)
    wget https://repo.hops.works/master/rondb-22.05.0-macosx12.3-xcode-13.1-arm64_v8.tar.gz
    
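As a minimal sketch of what to do after the download (the tarball name and the extracted directory below are only illustrative; pick the tarball matching your platform), the binaries can be made available like this:

    # Unpack the downloaded tarball (names are examples only)
    tar xzf rondb-22.05.0-linux-glibc2.17-x86_64.tar.gz
    # Put the RonDB binaries on the PATH
    export PATH=$PWD/rondb-22.05.0-linux-glibc2.17-x86_64/bin:$PATH
    # Verify that the tools are found
    ndb_mgmd --version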

Summary of changes in RonDB 22.05.0#

7 new features and 4 bug fixes have been added on top of the latest stable RonDB version, RonDB 21.04.6. In addition, RonDB 22.05.0 is based on MySQL 8.0.29, whereas RonDB 21.04 was based on MySQL 8.0.23.

New features:

  1. Support variable sized disk rows

  2. Make query threads the likely scenario

  3. Move Schema Memory to global memory manager

  4. Improved placement of primary replicas

  5. More flexibility in thread configuration

  6. Removing index statistics mutex as bottleneck in MySQL Server

  7. Use Query threads also for Locked Reads

The flagship feature of this version is the promotion of disk columns to a fully supported, general-purpose feature in RonDB. Previously this was a feature mainly suitable for expert users. As an example, HopsFS has built the capability to store small files in RonDB using disk columns.

Disk columns previously had the limitation that disk rows always had a fixed size. Thus declaring a VARCHAR(100) column using UTF8MB4 meant that 400 bytes of space were used in each row, independent of what was actually stored in the column.

With RonDB 22.05, disk rows only use the space they require; how much storage is used thus depends on the actual data.

Feature Store applications often store arrays of numbers and arrays of status variables that are variable in size. For these, the new representation leads to significant storage savings, easily a factor of 10x.

Combined with the use of disk columns, this makes it possible to increase the storage capacity of a Feature Store by a factor of 10x while keeping the price similar.

The development of modern SSDs and NVMe drives is thus now fully exploited by RonDB. Modern NVMe drives can handle very large loads; main memory is still more capable and will deliver much better latency and throughput, but at a higher cost. With this new version of RonDB, users can thus decide based on their requirements whether to store features in in-memory columns or in disk columns.

New features#

Variable sized disk rows#

Storing columns on disk was introduced into MySQL NDB Cluster already in version 5.1. It has been constantly improved, and efforts have been made to make it much more stable and performant. Thus in RonDB 22.05 the quality of disk columns is on par with in-memory columns.

Benchmark experiments available on www.rondb.com show the latency and throughput of in-memory columns and disk columns. The section on Scalable Storage focuses on the performance and latency of disk columns using the YCSB benchmark.

On mikaelronstrom.blogspot.com there are blogs from October 2020 about the performance of large insert loads into tables using disk columns. These experiments show how RonDB can handle well over 1 GByte per second of insert load into disk columns, given sufficient HW to support it.

In RonDB 22.05 we add, on top of this performance and stability, a more compact representation of the data in disk columns. Previously each disk row had a fixed size; with RonDB 22.05 each disk row uses the same data structure as in-memory rows. This means that columns of variable size are stored in a variable-sized row. The disk rows also support storing dynamic columns, which means that we will also be able to support online add of disk columns. This is planned to be finished in the next long-term support version of RonDB; online add of disk columns is not supported in RonDB 22.05.
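
As a hedged illustration of how disk columns are used from SQL (the logfile group, tablespace, table and column names below are made up for this example), a table with a variable-sized disk column can be created through the MySQL client as follows; with RonDB 22.05 the disk part of each row only occupies the space its actual content needs:

    # Create disk data objects and a table with a variable-sized disk column
    # (all names, sizes and credentials are illustrative only)
    mysql -u root test <<'SQL'
    CREATE LOGFILE GROUP lg1
      ADD UNDOFILE 'undo_1.log'
      INITIAL_SIZE 128M
      ENGINE NDBCLUSTER;

    CREATE TABLESPACE ts1
      ADD DATAFILE 'data_1.dat'
      USE LOGFILE GROUP lg1
      INITIAL_SIZE 256M
      ENGINE NDBCLUSTER;

    CREATE TABLE feature_vectors (
      id BIGINT NOT NULL PRIMARY KEY,
      doc_name VARCHAR(100) STORAGE MEMORY,
      embedding VARBINARY(4096) STORAGE DISK
    ) ENGINE NDBCLUSTER TABLESPACE ts1;
    SQL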

Make query threads the likely scenario#

An optimisation was made in the code such that the path using query threads is the one the compiler will optimise for. This is a minor performance improvement.

Move Schema Memory to global memory manager#

This is the final step in a long series of developments to move all major memory consumers to the global memory manager. From RonDB 22.01 onwards, all major memory consumers use the global memory manager.

This change is mostly internal and ensures that all Schema Memory objects use the global memory manager. Memory configuration was already automatic in RonDB 21.04, so this change makes some memory more flexibly available.

Another major improvement in this change is the addition of a malloc-like interface to the global memory manager. This is used in a few places and means, for example, that a single table can now have any number of partitions, independent of the number of LDM threads in the data node.
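
As an illustration of what this allows (the table name and partition count below are arbitrary examples, not recommendations), a table can now be created with a partition count chosen independently of the number of LDM threads:

    # Create a table with an explicitly chosen number of partitions
    # (database, table name and partition count are examples only)
    mysql -u root test -e "
      CREATE TABLE many_parts (a INT PRIMARY KEY, b INT)
        ENGINE NDBCLUSTER
        PARTITION BY KEY() PARTITIONS 12;
    "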

Move c_descPagePool from ArrayPool to TransientPool

Introduced getUncheckedPtr for RWPool64 records. This required also adding check_ptr_rw since Magic was calculated differently in RWPool64 compared to TransientPool.

Reorganised variables in fragment record to save some space.

Moved DescPage from TransientPool to RWPool64 and ensured that records are returned to the pool when no longer used.

Fixed test case result for ndb_basic_3rpl and ndb_basic_4rpl after change to test only 144 partitions.

Moved DeleteLcpFile in Backup block to RWPool64.

Moved the Trigger record in Backup block to TransientPool; it needs 32-bit i-values since the trigger id is sent in signals.

Moved Fragment record in Backup to malloc'ed memory using Schema Memory.

Ensure that we avoid breaking up consecutive memory when not required to do so. This will ensure that we retain lots of consecutive memory even when we are close to running out of memory.

Introduced priorities in memory allocations and reorganised the memory configuration.

Added 3 levels of shared global memory: low prio, high prio and ultra prio.

Moved some regions from ReplicationMemory back to SchemaMemory

Major rewrite of alloc_page(s)/release_page(s) of the global memory manager.

New way of handling tight memory situations.

Fixed resources table in ndbinfo

Fixed so that DataMemory overflow pages are counted as reserved; this avoids using more than 100% of the reserved pages in a memory region when checked from blocks.

More work on get_resource_limit

Fixed problems in handling the free bitmap

Fixed a missing return after send_scan_fragref

Removed ndb_transaction_memory_shortage_dbacc since we no longer use ACC scans.

Removed test cases no longer valid

Fixed const declaration of check routine

Remove use of ndbd from ATRT

Fixed getFragPtr method in restore using query thread

Handle takeover actions from Query threads better

A leftover ndbabort was removed. In addition, a much more detailed comment on how the Query threads handle the takeover is provided.

HOPSWORKS-2875: Improve crashlog#

  1. Interleave Signal and Jam dumps. Signals are printed NEWEST first, and under each signal the corresponding Jam entries, OLDEST first.

  2. Let printPACKED_SIGNAL detect whether we're in a crashlog dump. If so, print the contained/packed signals NEWEST first and under each signal the corresponding Jam entries, OLDEST first. When not in a crashlog dump, print the contained signals NEWEST first without Jam entries.

  3. Better formatting and messages

    1. Cases with missing/unmatched signals and Jam entries are handled gracefully.

    2. Legend added

    3. Print signal ids in hexadecimal form

    4. Don't print block number in packed signals

  4. JamEvents can have five types

    1. LINE: As before, show the line number in the dump

    2. DATA: Show the data in the dump with a "d" prefix to distinguish it from a line number. This type of entry is created by *jam*Data* macros (or the deprecated *jam*Line* macros). The data is silently truncated to 16 bits.

    3. EMPTY: As before, do not show in the dump

    4. STARTOFSIG: Used to mark the start of a signal and to save the signal Id

    5. STARTOFPACKEDSIG: Used to mark the start of a packed signal and to save both its signal Id and pack index

  5. Update Jam macros

    1. Deprecate *jam*Line* macros and add *jam*Data* macros in their place

      1. jamBlockData

      2. jamData

      3. jamDataDebug

      4. thrjamData

      5. thrjamDataDebug

    2. Cleanup, add documentation, and add internal prefix to macros only used in other macro definitions

  6. Static asserts to make sure that

    1. EMULATED_JAM_SIZE is valid

    2. JAM_FILE_ID refers to the correct filename. This was previously tested occasionally, run-time in debugging builds. With this change the test is performed always and compile-time. The jamFileNames table and JamEvent::verifyId had to be moved from Emulator.cpp to Emulator.hpp in order to be available at compile-time.

    3. File id and line number fit in the number of bits used to store them

  7. Refactoring, comments etc.

  8. Refactor signaldata

    1. Introduce printHex function and use it to print Uint32 sequences

  9. jamFileNames maintenance

    1. Add test_jamFileNames.sh script to find problems in the jamFileNames[] table

    2. Add a unit test for test_jamFileNames.sh

    3. Correct the problems found

HOPSWORKS-2572: Improved placement of primary replicas#

The current distribution of primary replicas isn't optimal for the new fragmentation variants. With e.g. 8 fragments per table and 2 nodes, a 4-LDM setup can end up with one LDM thread acting as primary for two fragments whereas 2 of the LDM threads get no primary replicas to handle.

This is handled by a better setup at creation of the table. However, to also handle Not Active nodes, we need to redistribute the fragments at various events.

The redistribution is only allowed once all nodes in the cluster have upgraded to 22.01. Older versions of RonDB will not redistribute, and we need to ensure that all data nodes use the same primary replicas; otherwise we would cause a multitude of constant deadlocks.

There were issues in distributing the primary replicas at add fragment: the reuse of add_nodes_to_fragment required a minor modification, and the tracking of which primary to use next used an incorrect index variable.

Nodegroups are not necessarily numbered from 0 and onwards; calc_primary_replicas needs to take this into account.

This change improves performance by about 30% for the DBT2 benchmark.

HOPSWORKS-2525: More flexibility in thread configuration#

This patch series was introduced mainly to make it possible to use RonDB to experiment with various thread configurations that typically weren't supported in NDB. The main change is to enable the use of receive threads for all types of thread work.

With these changes it is possible to e.g. run with only a set of receive threads.

The long-term goal of this patch is to find an even better configuration for automatic thread configuration.
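
As a hedged sketch of the kind of configuration this enables (the section placement, thread counts and other parameter values below are illustrative assumptions, not recommendations), a data node that handles everything in its receive threads could be described in the cluster configuration file roughly like this:

    # Illustrative cluster configuration snippet (file name and values are examples only)
    cat >> config.ini <<'EOF'
    [ndbd default]
    NoOfReplicas=2
    # Only receive threads plus a main thread; no separate ldm/tc/query threads
    ThreadConfig=main={count=1},recv={count=8}
    EOF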

Step 1: Added a new socket to each receive thread. This socket is used to wake up the receive thread when another thread wants to make use of the receive thread.

This feature is important when the receive thread is used for activities other than receive handling. In this case the other thread needs the receive thread to react immediately. For other thread types this wakeup happens through a futex_wake on Linux and a condition signal on other platforms.

However, a receive thread sleeps in either epoll_wait or poll. Thus the only mechanism to wake such a thread is to send something to a socket that the receive thread listens to. This is where the extra socket comes into play: to wake the receive thread it is enough to send 1 byte to this socket, and the thread will immediately wake up.

To handle this conditional wakeup we added a boolean on the thread object indicating whether it is a receive thread or not; we also added a reference to the TransporterReceiveHandle of the receive thread.

This patch enables all sorts of experimentation with setting up threads in new manners with the receive thread as an active participant in both receive and other activities.

Step 2: When using ThreadConfig and not creating any TC threads, we will instead map the TC threads to the receive threads.

Step 3: When TCSEIZEREQ arrives we check which node sent the message and assign the DBTC instance that is colocated with the receive thread instance handling this node. This means that TCKEYREQ and SCAN_TABREQ signals will be sent locally to the same thread when received. This cuts away some latency and could potentially be a performance benefit.

Step 5: Removed the requirement that receive threads had the nosend flag set to 1. Updated the receive thread's main loop to reflect that it can also act as a block thread in addition to performing receive activities.

The receive thread does an extra flush after receiving data on transporters to ensure that execution of its own signals doesn't cause the receive thread to slow down flushing signals to other threads. This includes ensuring that the send buffer pool is filled before starting to execute signals.

Simplified the handling of alert_send_thread in that all send threads are woken up, including the one we will assist. This decreases the number of acquisitions of the send thread mutex and greatly simplifies the code.

do_send is called in the same manner as in block threads when no signals were executed.

Fixed a bug in missing wakeup of send threads in rare situations.

Clarified that the code handling load indicators is only required for LDM and Query threads.

Ensured that sendpacked is called in more situations to assist in NDBFS communication.

Step 7: Changed some parts of the automatic thread configuration. Now that we can handle TC and receive work in the recv thread, it is possible to make a bit more efficient use of CPU resources in smaller configurations.

However, separate tc threads are still used in larger configurations since that is still more efficient.

Step 8: This patch introduces one more variant of how to configure threads in RonDB data nodes. Previously the only configurations that didn't have specific LDM threads were a configuration with a single receive thread and a configuration with a single receive thread and a single main thread.

In this patch we enable a configuration with a large number of receive threads without any LDM threads. The idea is that the receive threads will be able to do all work from start to finish, thus executing without a thread pipeline. The only work in traffic execution that cannot be done in the local receive thread is the handling of non-committed READs and any WRITE queries; these can still only be handled by the LQH that owns the data.

This configuration cannot be combined with TC threads, Query threads or Recover threads. Thus in this configuration we only have receive threads and possibly main thread(s).

In this configuration each receive thread has 1 LQH worker, 1 query thread worker and 1 DBTC worker. This means that any Committed Read queries can be served fully in the receive thread.

Having send threads or not is still optional in this configuration.

In this configuration we don't activate any load distribution mechanisms to pick the right query thread. We always pick the local query thread worker.

We have made a stronger division in this patch between the use of globalData.ndbMtLqhThreads and globalData.ndbMtLqhWorkers, and similarly for ndbMtQueryThreads/ndbMtQueryWorkers. Likewise we previously did the same thing for ndbMtTcThreads/ndbMtTcWorkers.

There is no such distinction for ndbMtMainThreads and ndbMtReceiveThreads; there are no special variables for workers for these thread types, and similarly not for send threads.

Step 9: Make it possible to set nosend=1 also on Query threads.

Step 10: Use only LDM threads with 4 CPUs; there is no specific gain from using query threads with only 2 CPUs.

Step 11: Previously performReceive first read from all transporters and then looped over all transporters to unpack the read data. This means that we sweep through the data twice; it seems better to unpack the data immediately after receiving it. In the NDB API this even means that signal execution happens while the data is still in CPU caches, so it could potentially provide even bigger benefits for NDB API performance.

Step 12: Reorganised the code in performReceive a bit. Fixed a potential lost signal during activation of a multi transporter.

HOPSWORKS-2493: Index stat mutex bottleneck removed#

A major bottleneck in the MySQL Server is the index statistics mutex.

This is acquired 3 times per index lookup to gather index statistics. It becomes a bottleneck when Sysbench OLTP RW reaches around 10000 TPS with around 100 threads, and is thus a severe limitation on the scalability of the MySQL Server using RonDB.

To handle this we ensure that the hot path through the code doesn't need to acquire the global mutex at all. This is solved by using the NDB_SHARE mutex a bit more and making the ref_count variable an atomic variable.

We also needed to handle some global statistics variables. This was fixed by gathering them in a local object and transferring them to the global object every now and then.

HOPSWORKS-2573: Query thread improvement#

In MySQL Cluster 8.0.23, query threads were introduced; they could be used for READ COMMITTED queries. This feature extends query threads to also handle the PREPARE phase of locked reads using key-value lookups through LQHKEYREQ.

This means more concurrency and provides better scalability for applications that rely heavily on locked reads, such as the DBT2 benchmark.
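
To illustrate the kind of query that benefits (the database, table, column and key value are made up for this example), a locked key-value read whose PREPARE phase can now be executed by a query thread looks like this:

    # A locked key lookup; the prepare phase can now run in a query thread
    # (database, table and values are illustrative only)
    mysql -u root test -e "
      BEGIN;
      SELECT balance FROM accounts WHERE account_id = 42 FOR UPDATE;
      COMMIT;
    "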

Bug fixes#

HOPSWORKS-2934: Stabilize output from ndbinfo_plans test case#

HOPSWORKS-2525: Wrong assert in recv_awake method#

The method recv_awake asserted that it was always called in state FS_SLEEPING. This wasn't correct, so the assert was removed.

Use GCC 8 when compiling on Oracle Linux 7.9#

Our tests show that binaries compiled using GCC 8 outperform binaries compiled with GCC 10. Most likely GCC 10 is too aggressive in inlining. Until we have analysed this more extensively we will continue to use GCC 8 to compile RonDB binaries.

HOPSWORKS-2908: Enable GCP stop#

Ensure that GCP stop is enabled by default.

Ensure that DBTC tracks long running transactions to print out outliers that cause DBTC to block GCPs.

Added more printouts when GCP stop is close to happening.

Added code to check if DBTC for some reason is making no progress on handling a GCP. Printouts added to enable better handling of this issue.

Put back the inactive transaction timeout to 40 days