Release Notes RonDB 21.10.1#
RonDB 21.10.1 is a beta release of RonDB 21.10. It is based on MySQL NDB Cluster 8.0.24 and RonDB 21.04.1. RonDB 21.10.0 was an internal release not made public.
RonDB 21.10.1 is released as open source software with a binary tarball for Linux usage. It is developed on Linux and Mac OS X and is occasionally tested on FreeBSD.
There are three ways to use RonDB 21.10.1:
You can use the cloud scripts, which let you easily set up a cluster on Azure or GCP. This requires no previous knowledge of RonDB; the script only needs a description of the HW resources to use and the rest of the setup is automated.
You can use the open source version with the binary tarball and set it up yourself.
You can use the open source version, build it and set it up yourself.
RonDB 21.10 will be maintained for 9 months after its release in October 2021.
Summary of changes in RonDB 21.10.1#
4 new features and 2 bug fixes.
Make query threads the likely scenario
Improved placement of primary replicas.
More flexibility in thread configuration.
Removing index statistics mutex as bottleneck in MySQL Server
Use Query threads also for Locked Reads
Make query threads the likely scenario#
Improved placement of primary replicas.
The current distribution of primary replicas isn't optimal for the new fragmentation variants. For example, with 8 fragments per table and 2 nodes in a 4-LDM setup, the same LDM thread becomes primary for two fragments while 2 LDM threads get no primary replicas to handle.
This is handled by a better setup at creation of the table. However, to also handle Not Active nodes we need to redistribute the fragments at various events.
The redistribution is only allowed if all nodes in the cluster have upgraded to 21.10. Older versions of RonDB will not redistribute, and we need to ensure that all data nodes use the same primary replicas. Otherwise we would cause a multitude of constant deadlocks.
There were issues in distributing the primary replicas at add fragment: the reuse of add_nodes_to_fragment required a minor modification, and the tracking of which primary to use next used an incorrect index variable.
Nodegroups are not necessarily numbered from 0 and onwards. calc_primary_replicas needs to take this into account.
This change improves performance by about 30% for the DBT2 benchmark.
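The imbalance described in this section can be illustrated with a minimal Python sketch. The fragment-to-LDM mapping (`frag % num_ldms`) and both placement functions are assumptions made for illustration only; they are not RonDB's actual calc_primary_replicas logic. The sketch only shows how simply alternating the primary between nodes can leave some LDM threads without primary replicas, and how a rotated assignment evens this out.

```python
def primaries_per_ldm(num_fragments, num_nodes, num_ldms, pick_primary_node):
    """Count primary replicas per LDM thread on each node.

    Assumption: fragment `frag` is handled by LDM thread `frag % num_ldms`
    on every node that stores a replica of it.
    """
    counts = {node: [0] * num_ldms for node in range(num_nodes)}
    for frag in range(num_fragments):
        node = pick_primary_node(frag, num_nodes, num_ldms)
        counts[node][frag % num_ldms] += 1
    return counts


def alternate_nodes(frag, num_nodes, num_ldms):
    # Hypothetical naive placement: the primary alternates between nodes.
    return frag % num_nodes


def rotate_per_ldm_round(frag, num_nodes, num_ldms):
    # Hypothetical balanced placement: shift the starting node each time
    # all LDM threads have been visited once.
    return (frag + frag // num_ldms) % num_nodes


if __name__ == "__main__":
    print(primaries_per_ldm(8, 2, 4, alternate_nodes))
    print(primaries_per_ldm(8, 2, 4, rotate_per_ldm_round))
```

With 8 fragments, 2 nodes and 4 LDM threads, the naive placement gives two LDM threads on each node two primaries and the other two none, while the rotated placement gives every LDM thread exactly one.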
More flexibility in thread configuration.
This patch series was introduced mainly to be able to use RonDB to experiment with various thread configurations that typically weren't supported in NDB. The main change is to make it possible to use receive threads for all thread types.
With these changes it is possible to e.g. run with only a set of receive threads.
The long-term goal of this patch is to find an even better configuration for automatic thread configuration.
Step 1: Added a new socket to each receive thread. This socket is used to wake up the receive thread when another thread wants to make use of the receive thread.
This feature is important when the receive thread is used for other activities than the receive handling. In this case the other thread needs the receive thread to react immediately. For other threads this wakeup happens through a futex_wake on Linux and a condition signal on other platforms.
However, the receive thread sleeps on either epoll_wait or poll. Thus the only mechanism to wake those threads is to send something to a socket that the receive thread listens to. This is where the extra socket comes into play. To wake the receive thread it is enough to send 1 byte to this socket and the thread will immediately wake up.
To handle this conditional wakeup we added a boolean on the thread object indicating whether it is a receive thread or not. We also added a reference to the TransporterReceiveHandle of the receive thread.
This patch enables all sorts of experimentation with setting up threads in new manners with the receive thread as an active participant in both receive and other activities.
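The wakeup mechanism from Step 1 can be sketched in Python as follows. This is a simplified illustration, not the RonDB implementation: the class `ReceiveThreadSketch` and its methods are hypothetical names, and Python's `selectors` module stands in for the epoll_wait/poll call. The point it shows is that a thread sleeping in a poll call can be woken by writing a single byte to an extra socket it listens on.

```python
import selectors
import socket
import threading


class ReceiveThreadSketch:
    """Hypothetical stand-in for a receive thread that sleeps in poll/epoll."""

    def __init__(self):
        # Extra socket pair: the read end is polled like a transporter
        # socket, the write end is used by other threads to wake us up.
        self.wake_read, self.wake_write = socket.socketpair()
        self.wake_read.setblocking(False)
        self.selector = selectors.DefaultSelector()
        self.selector.register(self.wake_read, selectors.EVENT_READ, "wakeup")

    def wakeup(self):
        # Called from another thread: one byte is enough to end the sleep.
        self.wake_write.send(b"\x01")

    def wait(self, timeout):
        # Sleep until a socket is readable or the timeout expires.
        woken = False
        for key, _events in self.selector.select(timeout):
            if key.data == "wakeup":
                key.fileobj.recv(64)  # drain the wakeup byte(s)
                woken = True
        return woken

    def close(self):
        self.selector.close()
        self.wake_read.close()
        self.wake_write.close()


def demo():
    recv = ReceiveThreadSketch()
    try:
        idle = recv.wait(0.01)  # nothing sent yet: times out
        timer = threading.Timer(0.05, recv.wakeup)
        timer.start()
        woken = recv.wait(5.0)  # returns as soon as the byte arrives
        timer.join()
        return idle, woken
    finally:
        recv.close()
```

Calling `demo()` first lets the wait time out and then shows the wait returning early once another thread has sent the wakeup byte.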
Step 2: When using ThreadConfig and not creating any TC threads we will instead map the TC threads to the receiver threads.
Step 3: When TCSEIZEREQ arrives we will check which node sent the message and assign the DBTC instance that is colocated with the receive thread instance handling this node. This means that when receiving TCKEYREQ, SCAN_TABREQ, these signals will be sent locally to the same thread. This cuts some latency away and could potentially be a performance benefit.
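The colocation idea in Step 3 can be sketched as below. The node-to-receive-thread mapping (`node_id % num_recv_threads`) and the class `TcConnectionTable` are hypothetical and only serve the illustration; the actual assignment in RonDB may differ. The sketch shows that once the DBTC instance is chosen at TCSEIZEREQ based on the sending node, later signals on that connection stay in the thread that received them.

```python
def recv_thread_for_node(node_id, num_recv_threads):
    # Assumption: each node's transporter is served by one receive thread,
    # chosen here by a simple modulo.
    return node_id % num_recv_threads


class TcConnectionTable:
    """Hypothetical bookkeeping of which DBTC instance owns a connection."""

    def __init__(self, num_recv_threads):
        self.num_recv_threads = num_recv_threads
        self.conn_to_dbtc = {}

    def on_tcseizereq(self, conn_id, sender_node_id):
        # Pick the DBTC instance colocated with the receive thread that
        # handles the sending node.
        instance = recv_thread_for_node(sender_node_id, self.num_recv_threads)
        self.conn_to_dbtc[conn_id] = instance
        return instance

    def route(self, conn_id):
        # TCKEYREQ and SCAN_TABREQ on this connection go to the same instance.
        return self.conn_to_dbtc[conn_id]
```

For example, with 4 receive threads a connection seized from node 6 is assigned instance 2, which is also the receive thread that will receive the subsequent TCKEYREQ signals from that node under the assumed mapping.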
Step 5: Removed the requirement that receive threads had the nosend flag set to 1. Updated the receive thread's main loop to reflect that it can also act as a block thread in addition to performing receive activities.
The receive thread does an extra flush after receiving data on transporters to ensure that execution of its own signals doesn't cause receive thread to slow down flushing signals to other threads. This included ensuring that send buffer pool is filled before starting to execute signals.
Simplified handling of alert_send_thread in that all send threads are woken up, also the one we will assist. This decreases the number of acquisitions of the send thread mutex and greatly simplifies the code.
do_send is called in the same manner as in block threads when no signals were executed.
Fixed a bug in missing wakeup of send threads in rare situations.
Clarified that the code handling load indicators is only required for LDM and Query threads.
Ensured that sendpacked is called in more situations to assist in NDBFS communication.
Step 7: Changed some parts of the automatic thread configuration. Now that we can handle TC and receive in the recv thread, it is possible to make a bit more efficient use of CPU resources in smaller configurations.
However, TC threads are still used in larger configurations since this is still more efficient.
Step 8: This patch introduces one more variant of how to configure threads in RonDB data nodes. Previously the only configurations that didn't have specific LDM threads were a configuration with a single receive thread and a configuration with a single receive thread and a single main thread.
In this patch we enable a configuration with a large number of receive threads without any LDM threads. In this configuration the idea is that the receive threads will be able to do all work from start to end, thus executing without a thread pipeline. The only traffic that cannot be executed entirely in the local receive thread is READs that are not Committed Reads and any WRITE queries. These can still only be handled by the LQH that owns the data.
This configuration cannot be combined with TC threads, Query threads or Recover threads. Thus in this configuration we only have receive threads and possibly main thread(s).
In this configuration each receive thread has one LQH worker, one Query thread worker and one DBTC worker. This means that any Committed Read queries can be served fully in the receive thread.
Having send threads or not is still optional in this configuration.
In this configuration we don't activate any load distribution mechanisms to pick the right query thread. We always pick the local query thread worker.
We have made a stronger division in this patch between globalData.ndbMtLqhThreads and globalData.ndbMtLqhWorkers, and similarly between ndbMtQueryThreads and ndbMtQueryWorkers. We previously did the same thing for ndbMtTcThreads and ndbMtTcWorkers.
There is no such distinction for ndbMtMainThreads and ndbMtReceiveThreads. There are no special worker variables for these thread types, and similarly none for send threads.
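The execution rule for the receive-thread-only configuration in Step 8 can be summarised with a small Python sketch. The enum `Op` and the function `executing_thread` are hypothetical names introduced for illustration; the sketch only encodes the rule stated above, that Committed Reads run in the local receive thread while other reads and all writes go to the thread whose LQH owns the data.

```python
from enum import Enum


class Op(Enum):
    COMMITTED_READ = 1
    LOCKED_READ = 2
    WRITE = 3


def executing_thread(op, local_recv_thread, owner_recv_thread):
    """Return the receive thread that executes the operation.

    local_recv_thread: the thread that received the request.
    owner_recv_thread: the thread whose LQH worker owns the fragment.
    """
    if op is Op.COMMITTED_READ:
        # Served by the local Query thread worker, no thread pipeline.
        return local_recv_thread
    # Other reads and all writes need the LQH that owns the data.
    return owner_recv_thread
```

No load distribution is involved in this sketch, matching the statement that the local query thread worker is always picked in this configuration.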
Step 9: Make it possible to set nosend=1 also on Query threads.
Step 10: Use only LDM threads with 4 CPUs, no specific gain with only 2 CPUs to use query threads.
Step 11: Previously performReceive first read from all transporters and then looped over all transporters to unpack the read data. This meant that we swept through the data twice; it seems better to unpack the data immediately after receiving it. In the NDB API this even means that signal execution happens while the data is still in the CPU caches, so it could potentially provide even bigger benefits for NDB API performance.
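The difference between the two loop structures in Step 11 can be sketched as follows. The function names are hypothetical and the "work" is reduced to recording the order of operations; this is not the performReceive code itself, only an illustration of the change from two sweeps to one.

```python
def receive_then_unpack(transporters):
    """Old structure: read from all transporters, then unpack all of them."""
    order = []
    for t in transporters:
        order.append(("recv", t))
    for t in transporters:
        order.append(("unpack", t))
    return order


def receive_and_unpack(transporters):
    """New structure: unpack each transporter's data right after reading it."""
    order = []
    for t in transporters:
        order.append(("recv", t))
        order.append(("unpack", t))
    return order
```

In the second variant the data of each transporter is unpacked while it is likely still in the CPU caches, instead of after data from every other transporter has been read in between.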
Step 12: Reorganised code in performReceive a bit. Fix of a potential lost signal during activation of multi transporter.
Removing index statistics mutex as bottleneck in MySQL Server
A major bottleneck in the MySQL Server is the index statistics mutex.
This mutex is acquired 3 times per index lookup to gather index statistics. It becomes a bottleneck when Sysbench OLTP RW reaches around 10000 TPS with around 100 threads, and is thus a severe limitation on the scalability of the MySQL Server using RonDB.
To handle this we ensure that the hot path through the code doesn't need to acquire the global mutex at all. This is solved by using the NDB_SHARE mutex a bit more and making the ref_count variable an atomic variable.
We also needed to handle some global statistics variables. This was fixed by keeping them on a local object and every now and then transferring them to the global object.
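The approach of collecting statistics locally and only occasionally transferring them to the global object can be sketched in Python as below. The classes `GlobalStats` and `LocalStats` and the `flush_every` threshold are hypothetical; the MySQL Server code uses its own structures and transfer policy. The sketch shows how the global lock is taken once per batch rather than once per lookup.

```python
import threading


class GlobalStats:
    """Shared counters protected by a global lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.lookups = 0
        self.lock_acquisitions = 0  # only here to make the effect visible

    def add(self, count):
        with self.lock:
            self.lock_acquisitions += 1
            self.lookups += count


class LocalStats:
    """Per-object counters that are flushed to the global object in batches."""

    def __init__(self, global_stats, flush_every=100):
        self.global_stats = global_stats
        self.flush_every = flush_every  # assumed batch size
        self.pending = 0

    def count_lookup(self):
        # Hot path: no global lock is taken here.
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        if self.pending:
            self.global_stats.add(self.pending)
            self.pending = 0
```

With a batch size of 100, counting 250 lookups takes the global lock only twice, plus once more for a final explicit flush.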
Use Query threads also for Locked Reads
In MySQL Cluster 8.0.23 query threads were introduced. This meant that query threads could be used for READ COMMITTED queries. This feature extends that to also handle the PREPARE phase of LOCKED reads using key-value lookup through LQHKEYREQ.
This means more concurrency and provides better scalability for applications that rely heavily on locked reads, such as the benchmark DBT2.
The method recv_awake asserted that it was always called in state FS_SLEEPING. This wasn't correct, so the assert was removed.
Use GCC 8 when compiling on CentOS 7#
Our tests show that binaries compiled using GCC 8 outperform binaries compiled with GCC 10. Most likely GCC 10 is too aggressive in inlining. Until we have analysed this more extensively we will continue using GCC 8 to compile RonDB binaries.