Release Notes RonDB 22.10.2#
RonDB 22.10.2 is the second release of the RonDB 22.10 series. It is the first 22.10 release to be integrated into the Hopsworks platform, where it is used in Hopsworks version 3.7.
RonDB 22.10.2 is based on MySQL NDB Cluster 8.0.34 and RonDB 21.04.16.
RonDB 21.04 is a Long-Term Support version of RonDB that will be supported at least until 2024.
RonDB 22.10 is a new Long-Term Support version and will be maintained at least until 2025.
RonDB 22.10 is released as open source software with binary tarballs for use on Linux and Mac OS X. It is developed on Linux and Mac OS X, and using WSL 2 (Linux on Windows) on Windows.
RonDB 22.10.2 and onwards is supported on Linux/x86_64 and Linux/ARM64.
The other platforms are currently for development and testing. Mac OS X is a development platform and will continue to be so.
Description of RonDB#
RonDB is designed to be used in a managed cloud environment where the user only needs to specify the type of the virtual machine used by the various node types. RonDB has the features required to build a fully automated managed RonDB solution.
It is designed for applications requiring the combination of low latency, high availability, high throughput and scalable storage (LATS).
You can use RonDB in a Serverless version on app.hopsworks.ai. In this case Hopsworks manages the RonDB cluster and you can use it for your machine learning applications. You can use this version for free, with certain quotas on the number of Feature Groups (tables) you are allowed to add and on memory usage. You can get started in a minute; there is no need to set up a database cluster or worry about its configuration, it is all taken care of.
You can use the managed version of RonDB available on hopsworks.ai. This sets up a RonDB cluster in your own AWS, Azure or GCP account using the Hopsworks managed software, given a few details on the HW resources to use. These details can either be added through a web-based UI or using Terraform. The RonDB cluster is integrated with Hopsworks and can be used both for RonDB applications and for Hopsworks applications.
You can use the cloud scripts that enable you to easily set up a cluster on AWS, Azure or GCP. This requires no previous knowledge of RonDB; the script only needs a description of the HW resources to use and the rest of the setup is automated.
You can use the open source version and use the binary tarball and set it up yourself.
You can use the open source version and build and set it up yourself.
These are the commands you can use to retrieve the binary tarball:
# Download x86_64 on Linux
wget https://repo.hops.works/master/rondb-22.10.2-linux-glibc2.28-x86_64.tar.gz
# Download ARM64 on Linux
wget https://repo.hops.works/master/rondb-22.10.2-linux-glibc2.28-arm64_v8.tar.gz
Summary of changes in RonDB 22.10.2#
RonDB 22.10.2 is based on MySQL NDB Cluster 8.0.34 and RonDB 21.04.16.
RonDB 22.10.2 adds 13 new features on top of RonDB 21.04.16 and 48 new features on top of MySQL NDB Cluster 8.0.34. In addition it adds a new product, the REST API Server, which can be used to access data in RonDB using a REST protocol. The current version of this product is implemented in Go; a new version implemented in C++ is in development.
RonDB 22.10.2 fixes 23 bugs; in total 25 bugs have been fixed in the RonDB 22.10 series and 173 bugs have been fixed in RonDB overall.
Test environment#
RonDB uses four different ways of testing. MTR is a functional test framework built using SQL statements to test RonDB.
The Autotest framework is specifically designed to test RonDB using the NDB API. The Autotest is mainly focused on testing high availability features and performs thousands of restarts using error injection as part of a full test suite run.
Benchmark testing ensures that we maintain the throughput and latency that is unique to RonDB. The benchmark suites used are integrated into the RonDB binary tarball making it very straightforward to run benchmarks for RonDB.
Finally we also test RonDB in the Hopsworks environment where we perform both normal actions as well as many actions to manage the RonDB clusters.
RonDB also has a number of MTR tests that are executed as part of the build process to ensure the quality of RonDB.
MTR testing#
RonDB has a functional test suite using MTR (MySQL Test Run) that executes more than 500 RonDB-specific test programs. In addition there are thousands of test cases for the MySQL functionality. MTR is executed on both Mac OS X and Linux.
We also have a special mode of MTR testing where we can run with different versions of RonDB in the same cluster to verify our support of online software upgrade.
Autotest#
RonDB is very focused on high availability. This is tested using a test infrastructure we call Autotest. It contains many hundreds of test variants; executing the full set takes around 36 hours. One test run with Autotest uses a specific configuration of RonDB. We execute multiple such configurations, varying the number of data nodes, the replication factor and the thread and memory setup.
An important part of this testing framework is that it uses error injection. This means that we can test exactly what happens if we crash in very specific situations, if we run out of memory at specific points in the code, or if the timing changes due to small sleeps inserted in critical paths of the code.
During one full test run of Autotest, RonDB nodes are restarted thousands of times in all sorts of critical situations.
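To illustrate the error injection technique, here is a minimal sketch of the general pattern: a test harness arms an error code, and a hook in the code path forces a crash at a precise point so that recovery can be verified. The macro name, error code and function below are hypothetical illustrations, not the actual RonDB test infrastructure.

```cpp
// Minimal sketch of the error injection pattern; the macro name, error code and
// function below are illustrative, not the actual RonDB test infrastructure.
#include <cstdio>
#include <cstdlib>

static int g_injected_error = 0;  // set by the test harness before the scenario runs

#define ERROR_INJECTED(code) (g_injected_error == (code))

void write_redo_log_record() {
  if (ERROR_INJECTED(5001)) {
    // Simulate a crash at this exact point to exercise crash recovery.
    std::fprintf(stderr, "error injection 5001: crashing before REDO write\n");
    std::abort();
  }
  // ... normal REDO log write would happen here ...
}

int main() {
  g_injected_error = 5001;  // the test case injects error 5001 into the node
  write_redo_log_record();  // the "node" crashes here and restart handling is verified
  return 0;
}
```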
Autotest currently runs on Linux with a large variety of CPUs, Linux distributions and even on Windows using WSL 2 with Ubuntu.
Benchmark testing#
We test RonDB using the Sysbench test suite, DBT2 (an open source variant of TPC-C), flexAsynch (an internal key-value benchmark), DBT3 (an open source variant of TPC-H) and finally YCSB (Yahoo Cloud Serving Benchmark).
The focus is on testing RonDB's LATS capabilities (low Latency, high Availability, high Throughput and scalable Storage).
Hopsworks testing#
Finally we also execute tests in Hopsworks to ensure that RonDB works with HopsFS, the distributed file system built on top of RonDB, with HSFS, the Feature Store designed on top of RonDB, and with all other use cases of RonDB in the Hopsworks framework.
New features#
REST API Server in C++#
This is a new development in the RonDB tree. It implements the same REST API as the existing REST API Server implemented in Go. The idea is that it should have better performance. This is still in active development and not yet ready.
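As a rough illustration of what accessing RonDB through the REST API Server can look like, here is a minimal libcurl sketch of a primary-key read. The port, endpoint path and JSON payload are assumptions for demonstration purposes only; consult the REST API Server documentation for the actual protocol.

```cpp
// Illustrative sketch only: the endpoint path, port and JSON payload below are
// assumptions for demonstration, not the authoritative REST API specification.
#include <curl/curl.h>
#include <cstdio>

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL *curl = curl_easy_init();
  if (curl == nullptr) return 1;

  // Hypothetical primary-key read against a table stored in RonDB.
  const char *url = "http://localhost:4406/0.1.0/mydb/mytable/pk-read";
  const char *body = "{\"filters\":[{\"column\":\"id\",\"value\":1}]}";

  struct curl_slist *headers = nullptr;
  headers = curl_slist_append(headers, "Content-Type: application/json");

  curl_easy_setopt(curl, CURLOPT_URL, url);
  curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
  curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

  CURLcode res = curl_easy_perform(curl);  // response body is printed to stdout by default
  if (res != CURLE_OK)
    std::fprintf(stderr, "Request failed: %s\n", curl_easy_strerror(res));

  curl_slist_free_all(headers);
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return 0;
}
```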
RONDB-479: Improved description of workings of the RonDB Memory manager#
Changed the name of alloc_spare_page to alloc_emergency_page to make it easier to understand the difference between spare pages in DataMemory and the allocation of emergency pages in rare situations.
Added many comments to make the memory manager code easier to understand.
RONDB-474: Local optimisations#
1: Continue running if BOUNDED_DELAY jobs are around to execute
2: Local optimisation of Dblqh::execPACKED_SIGNAL
3: Inline Dbacc::initOpRec
4: Improve seize CachedArrayPool
RONDB-342: Replace md5 hash function with XX3_HASH64#
This change decreases the overhead for calculating the hash function by about 10x since the new hash function is better suited for SIMD instructions.
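As an illustration of the new hash function, the sketch below computes a 64-bit XXH3 hash over a key using the xxHash library (xxhash.h); how RonDB actually integrates XX3_HASH64 internally may differ.

```cpp
// Minimal sketch of computing a 64-bit XXH3 hash over a key, assuming the
// xxHash library is available as xxhash.h; RonDB's internal integration may differ.
#include <cstdio>
#include <cstring>
#define XXH_INLINE_ALL
#include "xxhash.h"

int main() {
  const char key[] = "primary-key-value";
  // XXH3 is SIMD-friendly, which is where the large speedup over md5 comes from.
  XXH64_hash_t hash = XXH3_64bits(key, strlen(key));
  std::printf("hash = %llu\n", (unsigned long long)hash);
  return 0;
}
```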
Bug Fixes#
RONDB-491: Fix for testNodeRestart -n Bug34216, bug in test case setup#
RONDB-490: Fixed testNodeRestart -n WatchdogSlowShutdown#
The delay in finishing the watchdog shutdown was so long that the other nodes decided to finish it using a heartbeat error.
Also removed a crash when nodes attempted to connect in the wrong state, which could happen in shutdown situations.
RONDB-486: Map node group to start from 0 and be consecutive#
It is allowed to set a Nodegroup id on a node. However, for DBDIH to work these ids must start at 0 and be consecutive; otherwise things break in many places.
To make it easier for the user, we add a function that maps the configured node group ids to node group ids that start at 0 and are consecutive. For example, if the user sets Nodegroup to 1 and 2, they will be mapped to 0 and 1.
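The mapping idea can be sketched as follows; the function and types are hypothetical illustrations rather than the actual DBDIH code.

```cpp
// Illustrative sketch of the node group mapping idea: the configured Nodegroup
// ids are remapped to ids that start at 0 and are consecutive. Function and
// type names are hypothetical, not the actual DBDIH code.
#include <cstdint>
#include <map>
#include <set>
#include <cassert>

std::map<uint32_t, uint32_t>
map_node_groups(const std::set<uint32_t> &configured_ng_ids) {
  std::map<uint32_t, uint32_t> mapping;
  uint32_t next_id = 0;
  for (uint32_t ng : configured_ng_ids) {  // std::set iterates in ascending order
    mapping[ng] = next_id++;
  }
  return mapping;
}

int main() {
  // Example from the release note: Nodegroup set to 1 and 2 maps to 0 and 1.
  auto mapping = map_node_groups({1, 2});
  assert(mapping[1] == 0 && mapping[2] == 1);
  return 0;
}
```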
RONDB-155: Fix for erroneous ndbassert#
After adding a lot of debug statements it was finally clear that the ndbassert on m_scanFragReqCount > 0 was incorrect for ordered index tables, but is still correct for base tables.
Thus the bug actually had nothing to do with RONDB-155.
RONDB-479: Double mutex lock led to watchdog failure#
When an LCP needs to allocate a page to handle a DELETE it could run out of Undo pages. To avoid that, we release the page before we allocate the copy page, and this needs to be done under mutex protection. The handling of this was buggy such that the mutex was taken twice, leading to a watchdog crash.
RONDB-398: Ensure that we call do_send each loop to keep latency low#
RONDB-585: SUMA can free pages still in use + a memory leak#
When a new chunk of pages is allocated, it is put into the free list in SUMA, but it is also kept in the free chunk list. This means that if we allocate half of the pages in the chunk, the first free_page call can at times call release_chunk and release the entire page chunk, since the chunk is in the free chunk list even though some of its pages are still in the free list.
Another thing fixed in this patch is to avoid calling remove on the free chunk list when the last page is allocated from a chunk. The reason is that the chunk is not in this list, and calling the remove function effectively makes the list empty, thus causing a memory leak.
RONDB-584: Make it possible to verify pages are allocated when used#
Added a method to verify that a page is not in the set of free pages in the global memory manager. To ensure that we are not working on free pages as if they were allocated, we added a call to the global memory manager that verifies that a page is allocated by checking that its bit in the bitmap of free pages is not set. Later we can add more calls to perform this check; in this patch we add it for the SUMA trigger pages.
Previously the free-page bitmap in the global memory manager only set the first and last bit of each page range. This meant that the bitmap could not be used to quickly verify whether a page had been allocated when it is used.
We changed this such that every bit in the bitmap now indicates whether the respective page is free or allocated. Thus the new method verify_page_allocated can be used to check whether a page is allocated. This will make it a lot easier to find cases where we use a page after we have released it. These types of bugs are otherwise extremely hard to find.
To handle this efficiently an implementation of clearRange of a bitmap was added together with a unit test for it. The test was executed and passed before committing the patch.
More verify calls were added in SUMA, and it was verified that the bitmap and the free page info are in sync.
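A simplified sketch of the idea is shown below, with one bit per page where a set bit means the page is free; the names and layout are illustrative and not the actual global memory manager implementation.

```cpp
// Simplified sketch of the idea behind verify_page_allocated and clearRange:
// one bit per page, a set bit means the page is free. Names and layout are
// illustrative, not the actual global memory manager implementation.
#include <cstdint>
#include <vector>
#include <cassert>

struct FreePageBitmap {
  std::vector<uint32_t> words;
  explicit FreePageBitmap(size_t num_pages) : words((num_pages + 31) / 32, 0) {}

  void setFree(size_t page) { words[page / 32] |= (1u << (page % 32)); }
  void clearRange(size_t first, size_t count) {       // mark pages as allocated
    for (size_t p = first; p < first + count; p++)
      words[p / 32] &= ~(1u << (p % 32));
  }
  bool verify_page_allocated(size_t page) const {     // true if the free bit is NOT set
    return (words[page / 32] & (1u << (page % 32))) == 0;
  }
};

int main() {
  FreePageBitmap bitmap(256);
  for (size_t p = 0; p < 256; p++) bitmap.setFree(p);  // initially all pages are free
  bitmap.clearRange(10, 4);                            // "allocate" pages 10..13
  assert(bitmap.verify_page_allocated(12));            // safe to use page 12
  assert(!bitmap.verify_page_allocated(20));           // page 20 is still free
  return 0;
}
```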
RONDB-583: Fixed a problem with memory management in SUMA#
In communicating between TUP and SUMA we use special trigger pages.
Those pages are allocated from TUP and released in SUMA. We could allocate 1 or 2 pages in TUP. The problem arises when we allocate 2 pages and only 1 remains in the chunk; in this case we will insert the page in the full list. When we release a page in SUMA we will move the chunk from full to free only when there is exactly 1 free page in the chunk. However, if there was one page remaining and one was freed, there will be 2 free pages in the chunk. Similarly, if the chunk was empty and we release 2 pages at once, we will not move it from full to free either. Finally, when we release the last page we release the whole chunk if there is one chunk in the free list. Thus if we release the chunk we remove it from the free list, but it is still in the full list, so we corrupt the lists; if we don't release it, we leak the entire chunk since there is no other way for it to be removed from the full list.
Fixed by always moving the chunk from the full list to the free list when releasing a page. Also increased the chunk size to align with the size used by the RonDB mallocator.
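The corrected rule can be sketched as follows: whenever a page is released into a chunk that is on the full list, the chunk is moved to the free list, regardless of the resulting number of free pages. The data structures below are hypothetical illustrations, not the actual SUMA code.

```cpp
// Illustrative sketch of the corrected rule; names are hypothetical, not the
// actual SUMA data structures.
#include <list>
#include <cstddef>

struct Chunk {
  size_t free_pages = 0;
  bool on_full_list = true;
};

struct ChunkLists {
  std::list<Chunk*> full_chunks;   // chunks with no pages on the free list
  std::list<Chunk*> free_chunks;   // chunks with at least one free page

  void release_page(Chunk *chunk) {
    chunk->free_pages++;
    if (chunk->on_full_list) {     // the old code only moved when free_pages == 1
      full_chunks.remove(chunk);
      free_chunks.push_back(chunk);
      chunk->on_full_list = false;
    }
  }
};

int main() {
  Chunk c;
  ChunkLists lists;
  lists.full_chunks.push_back(&c);
  lists.release_page(&c);          // chunk now has 1 free page and is on the free list
  lists.release_page(&c);          // stays on the free list; no list corruption
  return 0;
}
```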
RONDB-581: Set more reasonable memory settings#
Decreased the reserved part of ReplicationMemory significantly; it can still grow to use a major part of the shared global memory.
Only 25% of the calculated job buffer size is used as reserved area; another 25% of the computed size is added to the shared global memory, and the job buffer is still allowed to grow, as ultra high prio, to the maximum job buffer size.
Since we now have twice as many LDM threads we can decrease the size of the backup log buffer per LDM thread.
BackupSchemaMemory only gets 25% reserved and 25% added to shared global memory, this should be ok since it can use shared global memory up to its limit.
Ensure that the MaxNoOfConcurrentOperations setting means that we have memory to handle a large transaction with that many operations.
Decreased amount of memory that is reserved for ReplicationMemory, SchemaMemory, BackupSchemaMemory, JobBuffer and Send Buffer. Added memory from the previously reserved memory to shared global memory to ensure that there is more shared global memory to use when a resource runs out of reserved memory.
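The reserved versus shared global memory model referred to above can be sketched as follows, under the assumption that each resource first consumes its reserved quota and then grows into shared global memory up to its own maximum; the structures and names are illustrative, not the actual RonDB memory manager.

```cpp
// Minimal sketch of the reserved vs. shared global memory idea, under stated
// assumptions. Names are illustrative, not the actual RonDB memory manager.
#include <cstdint>
#include <cstdio>

struct Resource {
  uint64_t reserved;  // pages guaranteed to this resource
  uint64_t max;       // hard upper limit for this resource (0 = unlimited)
  uint64_t in_use;    // pages currently allocated by this resource
};

struct GlobalMemory {
  uint64_t shared_free;  // shared global memory pool (pages)

  bool allocate(Resource &r, uint64_t pages) {
    if (r.max != 0 && r.in_use + pages > r.max) return false;   // over its own limit
    uint64_t from_shared = 0;
    if (r.in_use + pages > r.reserved) {
      uint64_t already_shared = (r.in_use > r.reserved) ? r.in_use - r.reserved : 0;
      from_shared = (r.in_use + pages) - r.reserved - already_shared;
      if (from_shared > shared_free) return false;              // shared pool exhausted
    }
    shared_free -= from_shared;
    r.in_use += pages;
    return true;
  }
};

int main() {
  GlobalMemory gm{1000};
  Resource replication{100, 800, 0};
  // The first 100 pages come from the reserved quota, the rest from shared memory.
  bool ok = gm.allocate(replication, 250);
  std::printf("allocated: %s, shared_free = %llu\n",
              ok ? "yes" : "no", (unsigned long long)gm.shared_free);
  return 0;
}
```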
Move TransactionMemory from ultra high prio to high prio since having TransactionMemory run low doesn’t risk crashing the node as it does for JobBuffer, SendBuffer and ReplicationMemory.
Previously we crashed when an allocation from the global memory manager failed; now nullptr is returned instead.
Fixed the setting of memory priority levels. Also introduced a new priority level. This means we have 4 prio levels.
At the lowest level we have query memory, which is used by complex queries. By placing this at the lowest level it cannot disturb any other activities.
The second level is the medium level, used by transaction memory; thus overly large transactions should not disturb schema transactions, asynchronous replication or other even more important activities.
The next level is the schema memory level, which allows schema transactions to use memory.
The next level is used by ReplicationMemory; this ensures that we will hopefully not run out of memory for asynchronous replication even in the context of other large activities.
The final level is for actions that require memory to ensure nodes don't go down: job buffers, send buffers and also access records for disk pages.
Fixed an issue with setting of SchemaMemory in config.ini.
RONDB-567: Bugs in List management in SUMA#
As part of the poolification of ReplicationMemory, the free list of event pages in SUMA was changed from a singly linked list to a doubly linked list. However, the handling of the previous pointers was not done properly. An old bug in the handling of failures to allocate pages from global memory was also fixed.
RONDB-519: Change base RDRS image to Oracle Linux 8 for 22.10.X#
RONDB-547: Optimize error message in FileLogHandler#
In the FileLogHandler, only the ndb error has its description set. For the system errors, only the error number is set and it is not human-readable. Therefore, we need to set the error descriptions at the same time.
RONDB-540: Docker build fixes for 22.10#
RONDB-538: Build 22.10.1 with Oracle Linux 8#
RONDB-526: Hardened RonDB build. Inspired by MySQL NDB build parameters#
RONDB-524: Improve reporting of how TransactionMemory is set#
RONDB-517: Fix nodegroup 65536 handling from config variable Nodegroup#
RONDB-616: Fix wrong input to update_extent_pos in disk_page_free using old disk format#
calc_page_free_bits should return 2 if the page is full, not 3; it is also important for variable sized data to use index 2.
RONDB-620: Fix column ordering in Dbdict::buildFK_prepare#
RONDB-641: Fixed a number of problems with move of disk row#
With variable sized disk rows we have a possibility that the row must move to a new page when it is growing to a larger size. In this case we need to both log the UNDO of the removal of the old row and the UNDO of the allocation of the row.
In this case commit sends in diskPagePtr referring to the new row page. However this was mixed up in the commit handling and we instead used the old row page where the new row page should have been used and vice versa.
Also when freeing log space we used the wrong page to figure out whether to return log space.
A final complexity came from the fact that the UNDO of the allocation of a new row wasn't part of m_undo_buffer_space; this was accounted for in one place but not in another.
This patch also adds a bit more debugging support required to understand what was going on in the commit of a disk row that used DISK_REORG. Also some faulty comments and comments that weren't understandable were removed.
Also fix issues related to multi operations combined with DISK_REORG.
When calling load_diskpage we haven’t yet called prepareActiveOperation. This means that we will never have set the bit field m_load_extra_diskpage_on_commit. This bit exists on the previous operation record, so it must be fetched from there in load_diskpage if there is a previous operation.
Second, in load_extra_diskpage we need to get the pointer to the copy tuple. This is, however, not present at this early stage in the new operation record. Thus it must also be fetched from the previous operation record.
Finally, in handle_size_change_after_update we need to use the extra disk page when DISK_REORG is already set. This was the case, but we calculated some variables using the wrong page: we used the old page instead of the new disk page. Thus these variables needed to be recalculated when discovering that the DISK_REORG bit was set.
Also added some more jam calls, both in debug and in production builds.
Removed the declaration of a useless variable that had been forgotten and caused a crash.
Fixed a printout in debug mode that could cause crashes.