Release Notes RonDB 22.01.1#
RonDB 22.01.1 is the second release of RonDB 22.01. It replaces RonDB 22.01.0, which contained a bug that made data nodes with more than 25 GByte of memory fail to start.
It is based on MySQL NDB Cluster 8.0.28 and RonDB 21.04.3.
RonDB 21.04 is a Long-Term Support version of RonDB that will be supported at least until 2024.
RonDB 22.01 is a Development version which will be supported through 2022.
RonDB 22.01.1 is released as open source software with binary tarballs for use on Linux and Mac OS X. It is developed on Linux and Mac OS X, and using WSL 2 on Windows (Linux on Windows). The only supported version is currently the Linux/x86 version. The others are currently for development and testing. We plan to soon also support Linux/ARM. Mac OS X is a development platform and will continue to be so.
There are two ways to use RonDB 22.01.1:
You can use the open source version, download the binary tarball and set it up yourself.
You can use the open source version, build it and set it up yourself. These are the commands you can use:
# Download x86_64 on Linux
wget https://repo.hops.works/master/rondb-22.01.1-linux-glibc2.17-x86_64.tar.gz
# Download ARM64 on Linux
wget https://repo.hops.works/master/rondb-22.01.1-linux-glibc2.28-arm64_v8.tar.gz
# Download x86_64 on Mac OS X (at least Mac OS X 11.6)
wget https://repo.hops.works/master/rondb-22.01.1-macosx11.6-xcode-13.1-x86_64.tar.gz
# Download ARM64 on Mac OS X (at least Mac OS X 12.2)
wget https://repo.hops.works/master/rondb-22.01.1-macosx12.2-xcode-13.1-arm64_v8.tar.gz
Summary of changes in RonDB 22.01.1#
6 new features and 4 bug fixes have been added on top of RonDB 21.04.3.
Make query threads the likely scenario
Move Schema Memory to global memory manager
Improved placement of primary replicas.
More flexibility in thread configuration.
Removing index statistics mutex as bottleneck in MySQL Server
Use Query threads also for Locked Reads
Make query threads the likely scenario#
An optimisation was made in the code such that the path using query threads is the one the compiler will optimise for. This gives a minor performance improvement.
Move Schema Memory to global memory manager#
This is the final step in a long series of development steps to move all major memory consumers to the global memory manager. In RonDB 22.01 all major memory consumers now use the global memory manager.
This change is mostly an internal change that ensures that all Schema Memory objects use the global memory manager. Memory configuration was already automatic in RonDB 21.04, so this change makes some memory more flexibly available.
Another major improvement in this change is the addition of a malloc-like interface to the global memory manager. This is used in a few places, which e.g. means that a single table can now have any number of partitions, independent of the number of LDM threads in the data node.
Move c_descPagePool from ArrayPool to TransientPool
Introduced getUncheckedPtr for RWPool64 records. This required also adding check_ptr_rw since Magic was calculated differently in RWPool64 compared to TransientPool.
Reorganised variables in fragment record to save some space.
Moved DescPage from TransientPool to RWPool64 and ensured that records are returned to the pool when no longer used.
Fixed test case result for ndb_basic_3rpl and ndb_basic_4rpl after change to test only 144 partitions.
Moved DeleteLcpFile in Backup block to RWPool64.
Moved the Trigger record in Backup block to TransientPool, it needs 32-bit i-values since the trigger id is sent in signals.
Moved Fragment record in Backup to malloc:ed memory using Schema Memory.
Ensure that we avoid breaking up consecutive memory when not required to do so. This ensures that we retain lots of consecutive memory even when we are close to running out of memory.
Introduced priorities in memory allocations. Organised the memory configuration.
Added 3 levels of shared global memory: low prio, high prio and ultra prio.
Moved some regions from ReplicationMemory back to SchemaMemory
Major rewrite of alloc_page(s)/release_page(s) of the global memory manager.
New way of handling tight memory situations.
Fixed resources table in ndbinfo
Fixed such that DataMemory overflow pages are counted as reserved, this avoids using more than 100% of the reserved pages in a memory region when checked from blocks.
More work on get_resource_limit
Fixed problems in handling the free bitmap
Fixed a missing return after send_scan_fragref
Removed ndb_transaction_memory_shortage_dbacc since we no longer use ACC scans.
Removed test cases no longer valid
Fixed const declaration of check routine
Remove use of ndbd from ATRT
Fixed getFragPtr method in restore using query thread
Handle takeover actions from Query threads better
A leftover ndbabort was removed. In addition a much more detailed comment on how the Query threads handle the take over is provided.
HOPSWORKS-2875: Improve crashlog#
Interleave Signal and Jam dumps. Signals are printed NEWEST first, and under each signal the corresponding Jam entries, OLDEST first.
Let printPACKED_SIGNAL detect whether we're in a crashlog dump. If so, print the contained/packed signals NEWEST first and under each signal the corresponding Jam entries, OLDEST first. When not in a crashlog dump, print the contained signals NEWEST first without Jam entries.
Better formatting and messages
Cases with missing/unmatched signals and Jam entries are handled gracefully.
Print signal ids in hexadecimal form
Don't print block number in packed signals
JamEvents can have five types
LINE: As before, show the line number in the dump
DATA: Show the data in the dump with a "d" prefix to distinguish it from a line number. This type of entry is created by *jam*Data* macros (or the deprecated *jam*Line* macros). The data is silently truncated to 16 bits.
EMPTY: As before, do not show in the dump
STARTOFSIG: Used to mark the start of a signal and to save the signal Id
STARTOFPACKEDSIG: Used to mark the start of a packed signal and to save both its signal Id and pack index
Update Jam macros
Deprecate *jam*Line* macros and add *jam*Data* macros in their place
Cleanup, add documentation, and add internal prefix to macros only used in other macro definitions
Static asserts to make sure that
EMULATED_JAM_SIZE is valid
JAM_FILE_ID refers to the correct filename. This was previously tested only occasionally, at run time in debug builds. With this change the test is always performed, at compile time. The jamFileNames table and JamEvent::verifyId had to be moved from Emulator.cpp to Emulator.hpp in order to be available at compile time.
File id and line number fit in the number of bits used to store them
Refactoring, comments etc.
Introduce printHex function and use it to print Uint32 sequences
Add test_jamFileNames.sh script to find problems in the jamFileNames table
Add a unit test for test_jamFileNames.sh
Correct the problems found
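The interleaved layout described in this section can be illustrated with a small Python sketch. It is not the RonDB implementation: the tuple layout of the Jam buffer, the output format and the function name `format_crashlog` are assumptions made for illustration, and packed signals (STARTOFPACKEDSIG) are left out. Only the entry types and the ordering rules (signals newest first, Jam entries oldest first, data shown with a "d" prefix and truncated to 16 bits, signal ids in hexadecimal) are taken from the notes above.

```python
# Hypothetical Jam buffer: a list of (type, value) tuples, oldest entry first.
LINE, DATA, EMPTY, STARTOFSIG = "LINE", "DATA", "EMPTY", "STARTOFSIG"


def format_crashlog(jam_buffer):
    """Group Jam entries under their signal: newest signal first, Jam oldest first."""
    signals = []  # (signal_id, [formatted Jam entries]) in arrival order
    for kind, value in jam_buffer:
        if kind == STARTOFSIG:
            signals.append((value, []))
        elif kind == EMPTY:
            continue  # EMPTY entries are never shown in the dump
        elif signals:
            # Entries seen before the first signal marker are skipped in this sketch.
            if kind == DATA:
                text = "d%d" % (value & 0xFFFF)  # truncate data to 16 bits
            else:
                text = str(value)  # LINE: plain line number
            signals[-1][1].append(text)

    lines = []
    for signal_id, entries in reversed(signals):  # newest signal first
        lines.append("signal 0x%08x" % signal_id)
        lines.extend("  jam " + entry for entry in entries)  # oldest first
    return lines
```

Calling `format_crashlog` on a buffer holding two signals returns the lines for the second signal first, each followed by its own Jam entries in the order they were recorded.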
HOPSWORKS-2572: Improved placement of primary replicas#
The current distribution of primary replicas isn't optimal for the new fragmentation variants. With e.g. 8 fragments per table and 2 nodes in a 4-LDM setup, the same LDM thread becomes primary for two fragments whereas 2 LDM threads get no primary replicas to handle.
This is handled by a better setup at creation of the table. However, to address handling of Not Active nodes we also need to redistribute the fragments at various events.
The redistribution is only allowed if all nodes in the cluster have been upgraded to 22.01. Older versions of RonDB will not redistribute and we need to ensure that all data nodes use the same primary replicas. If not, we would cause a multitude of constant deadlocks.
There were issues in distributing the primary replicas at add fragment: the reuse of add_nodes_to_fragment required a minor modification and the tracking of which primary to use next used an incorrect index variable.
Node groups are not necessarily numbered from 0 and onwards. calc_primary_replicas needs to take this into account.
This change improves performance by about 30% for the DBT2 benchmark.
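The imbalance described above can be reproduced with a small Python sketch. This is only an illustration and not the algorithm used by RonDB: it assumes that fragment `f` is handled by LDM thread `f % num_ldms` on every node, that the old placement simply alternates the primary between nodes, and that a greedy choice of the least loaded LDM thread is an acceptable stand-in for the improved placement. All function names are hypothetical.

```python
from collections import Counter


def alternating_primaries(num_fragments, num_nodes):
    """Assumed old behaviour: the primary alternates between the nodes."""
    return [f % num_nodes for f in range(num_fragments)]


def balanced_primaries(num_fragments, num_nodes, num_ldms):
    """Greedy sketch: pick the node whose LDM thread has the fewest primaries."""
    load = Counter()
    primaries = []
    for f in range(num_fragments):
        ldm = f % num_ldms  # assumed fragment-to-LDM mapping
        node = min(range(num_nodes), key=lambda n: (load[(n, ldm)], n))
        load[(node, ldm)] += 1
        primaries.append(node)
    return primaries


def primaries_per_ldm(primaries, num_nodes, num_ldms):
    """Count primary replicas per (node, LDM thread)."""
    counts = {(n, l): 0 for n in range(num_nodes) for l in range(num_ldms)}
    for f, node in enumerate(primaries):
        counts[(node, f % num_ldms)] += 1
    return counts
```

With 8 fragments, 2 nodes and 4 LDM threads, the alternating placement gives two LDM threads per node two primaries each and leaves the other two without any, while the greedy placement gives every LDM thread exactly one.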
HOPSWORKS-2525: More flexibility in thread configuration#
This patch series was introduced mainly to make it possible to use RonDB to experiment with various thread configurations that typically weren't supported in NDB. The main change is to enable receive threads to be used for the work of all thread types.
With these changes it is possible to e.g. run with only a set of receive threads.
The long-term goal of this patch is to find an even better configuration for automatic thread configuration.
Step 1: Added a new socket to each receive thread. This socket is used to wake up the receive thread when another thread wants to make use of the receive thread.
This feature is important when the receive thread is used for other activities than the receive handling. In this case the other thread needs the receive thread to react immediately. For other threads this wakeup happens through a futex_wake on Linux and a condition signal on other platforms.
However, a receive thread sleeps on either epoll_wait or poll. Thus the only mechanism to wake those threads is to send something to a socket that the receive thread listens to. This is where the extra socket comes into play. To wake the receive thread it is enough to send 1 byte to it and it will immediately wake up.
To handle this conditional wakeup we added a boolean on the thread object indicating whether it is a receive thread or not. We also added a reference to the TransporterReceiveHandle of the receive thread.
This patch enables all sorts of experimentation with setting up threads in new manners with the receive thread as an active participant in both receive and other activities.
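The wakeup mechanism from Step 1 can be sketched in Python as follows. This is an illustration of the general technique rather than RonDB code: the class and function names are hypothetical, `select` stands in for epoll_wait/poll, and a `threading.Event` stands in for the futex or condition variable used by other threads.

```python
import select
import socket
import threading


class ReceiveThreadSketch:
    """Sleeps in select() and is woken by one byte on an extra socket."""

    def __init__(self):
        # The extra socket pair: the receive thread listens on one end,
        # other threads write to the other end.
        self.wakeup_read, self.wakeup_write = socket.socketpair()
        self.is_recv_thread = True  # the boolean on the thread object
        self.woken = threading.Event()

    def sleep_until_woken(self, timeout=5.0):
        # A real receive thread would also wait on its transporter sockets.
        readable, _, _ = select.select([self.wakeup_read], [], [], timeout)
        if self.wakeup_read in readable:
            self.wakeup_read.recv(1)  # drain the wakeup byte
            self.woken.set()


class BlockThreadSketch:
    """Stand-in for a thread that sleeps on a futex or condition variable."""

    def __init__(self):
        self.is_recv_thread = False
        self.woken = threading.Event()


def wakeup(thread):
    """Conditional wakeup: socket write for receive threads, signal otherwise."""
    if thread.is_recv_thread:
        thread.wakeup_write.send(b"\x01")  # 1 byte is enough
    else:
        thread.woken.set()
```

A caller starts `sleep_until_woken` in a thread and calls `wakeup` from another thread; the sleeping side returns from `select` as soon as the byte arrives instead of waiting for the timeout.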
Step 2: When using ThreadConfig and not creating any TC threads we will instead map the TC threads to the receiver threads.
Step 3: When TCSEIZEREQ arrives we will check which node sent the message and assign the DBTC instance that is colocated with the receive thread instance handling this node. This means that when receiving TCKEYREQ, SCAN_TABREQ, these signals will be sent locally to the same thread. This cuts some latency away and could potentially be a performance benefit.
Step 5: Removed the requirement that receive threads had the nosend flag set to 1. Updated the receive thread's main loop to reflect that it can also act as a block thread in addition to performing receive activities.
The receive thread does an extra flush after receiving data on transporters to ensure that execution of its own signals doesn't cause the receive thread to slow down flushing signals to other threads. This included ensuring that the send buffer pool is filled before starting to execute signals.
Simplified handling of alert_send_thread in that all send threads are woken up, also the one we will assist. This decreases the number of acquisitions of the send thread mutex and greatly simplifies the code.
do_send is called in the same manner as in block threads when no signals were executed.
Fixed a bug in missing wakeup of send threads in rare situations.
Clarified that the code handling load indicators is only required for LDM and Query threads.
Ensured that sendpacked is called in more situations to assist in NDBFS communication.
Step 7: Changed some parts of the automatic thread configuration. Now that we can handle TC and receive in the recv thread it is possible to make a bit more efficient use of CPU resources in smaller configurations.
However, TC threads are still used in larger configurations since this is still more efficient.
Step 8: This patch introduces one more variant of how to configure threads in RonDB data nodes. Previously the only configurations that didn't have specific LDM threads were a configuration with a single receive thread and a configuration with a single receive thread and a single main thread.
In this patch we enable a configuration with a large number of receive threads without any LDM threads. In this configuration the idea is that the receive threads will be able to do all work from start to end, thus executing without a thread pipeline. The only traffic that cannot be executed entirely in the local receive thread is non-committed READs and any WRITE queries. These can still only be handled by the LQH that owns the data.
This configuration cannot be combined with TC threads, Query threads and Recover threads. Thus in this configuration we only have receive threads and possibly main thread(s).
In this configuration each receive thread has one LQH worker, one Query thread worker and one DBTC worker. This means that any Committed Read queries can be served fully in the receive thread.
Having send threads or not is still optional in this configuration.
In this configuration we don't activate any load distribution mechanisms to pick the right query thread. We always pick the local query thread worker.
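The routing rule for this receive-thread-only configuration can be summarised in a few lines of Python. This is a sketch of the rule as described in these notes, not of the actual RonDB code; the operation type constants and the function name are hypothetical.

```python
COMMITTED_READ, LOCKED_READ, WRITE = "committed_read", "locked_read", "write"


def pick_executing_thread(op_type, local_recv_thread, owning_recv_thread):
    """Return the receive thread whose worker should execute the operation."""
    if op_type == COMMITTED_READ:
        # Served by the local query thread worker; no load distribution applies.
        return local_recv_thread
    # Non-committed reads and writes go to the LQH worker that owns the data.
    return owning_recv_thread
```

For example, a Committed Read arriving on receive thread 2 for data owned by receive thread 5 stays on thread 2, while a write for the same data is sent to thread 5.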
In this patch we have made a stronger division between globalData.ndbMtLqhThreads and globalData.ndbMtLqhWorkers, and similarly between ndbMtQueryThreads and ndbMtQueryWorkers. We previously did the same thing for ndbMtTcThreads/ndbMtTcWorkers.
There is no such distinction for ndbMtMainThreads and ndbMtReceiveThreads. There are no special worker variables for these thread types, and similarly none for send threads.
Step 9: Make it possible to set nosend=1 also on Query threads.
Step 10: Use only LDM threads with 4 CPUs, no specific gain with only 2 CPUs to use query threads.
Step 11: Previously performReceive first read from all transporters and then looped over all transporters to unpack the read data. This meant that we swept through the data twice; it seems better to unpack data immediately after receiving it. In the NDB API this even means that signal execution happens while the data is still in CPU caches, so it could potentially provide even bigger benefits for NDB API performance.
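The difference between the two loop structures in Step 11 can be sketched in Python. The function names and the `read`/`unpack` callbacks are hypothetical stand-ins; the sketch only shows the change in ordering, not the real performReceive code.

```python
def receive_then_unpack(transporters, read, unpack):
    """Old order: read from every transporter, then loop again to unpack."""
    buffers = [read(t) for t in transporters]  # first sweep over the data
    signals = []
    for buf in buffers:  # second sweep over the data
        signals.extend(unpack(buf))
    return signals


def receive_and_unpack(transporters, read, unpack):
    """New order: unpack each transporter's data right after reading it."""
    signals = []
    for t in transporters:
        signals.extend(unpack(read(t)))  # single sweep, data was just touched
    return signals
```

Both functions produce the same signals in the same order; the second one just touches each buffer once, right after it has been read.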
Step 12: Reorganised code in performReceive a bit. Fix of a potential lost signal during activation of multi transporter.
HOPSWORKS-2493: Index stat mutex bottleneck removed#
A major bottleneck in the MySQL Server is the index statistics mutex.
This mutex is acquired 3 times per index lookup to gather index statistics. It becomes a bottleneck when Sysbench OLTP RW reaches around 10000 TPS with around 100 threads, and is thus a severe limitation on the scalability of the MySQL Server using RonDB.
To handle this we ensure that the hot path through the code doesn't need to acquire the global mutex at all. This is solved by using the NDB_SHARE mutex a bit more and making the ref_count variable an atomic variable.
We also needed to handle some global statistics variables. This was fixed by keeping them on a local object and transferring them to the global object every now and then.
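The idea of keeping statistics on a local object and only occasionally transferring them to the global object can be sketched in Python. This is a generic illustration of the technique and not the MySQL Server code: the class names, the `lookups` counter and the `flush_every` threshold are all assumptions.

```python
import threading


class GlobalStats:
    """Shared statistics protected by a global mutex."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.lookups = 0


class LocalStats:
    """Local counter that only takes the global mutex when flushing."""

    def __init__(self, global_stats, flush_every=100):
        self.global_stats = global_stats
        self.flush_every = flush_every  # hypothetical flush threshold
        self.pending = 0

    def count_lookup(self):
        self.pending += 1  # hot path: no global mutex taken
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        # Rare path: transfer the local count to the global object.
        with self.global_stats.mutex:
            self.global_stats.lookups += self.pending
        self.pending = 0
```

With a threshold of 100, the global mutex is taken once per 100 lookups instead of on every lookup, at the cost of the global counter lagging slightly behind until the next flush.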
HOPSWORKS-2573: Query thread improvement#
In MySQL Cluster 8.0.23 query threads were introduced. This meant that query threads could be used for READ COMMITTED queries. In this feature this is extended to also handle the PREPARE phase of LOCKED reads using key-value lookup through LQHKEYREQ.
This means more concurrency and provides a better scalability for applications that rely heavily on locked reads such as the benchmark DBT2.
HOPSWORKS-2934: Stabilize output from ndbinfo_plans test case#
HOPSWORKS-2525: Wrong assert in recv_awake method#
The method recv_awake asserted that it was always called in state FS_SLEEPING. This wasn't correct, so the assert was removed.
Use GCC 8 when compiling on Oracle Linux 7.9#
Our tests show that binaries compiled using GCC 8 outperform binaries compiled with GCC 10. Most likely GCC 10 is too aggressive in inlining. Until we have analysed this more extensively we will continue using GCC 8 to compile RonDB binaries.
HOPSWORKS-2908: Enable GCP stop#
Ensure that GCP stop is enabled by default.
Ensure that DBTC tracks long running transactions to print out outliers that cause DBTC to block GCPs.
Added more printouts when GCP stop is close to happening.
Added code to check if DBTC for some reason is making no progress on handling a GCP. Printouts added to enable better handling of this issue.
Put back the inactive transaction timeout to 40 days