New features in RonDB 21.04#
Automated Thread Configuration#
Thread configuration defaults to automated configuration of threads based on the number of CPUs. This feature was introduced in NDB 8.0.23; in RonDB it is the default behaviour, and a number of bugs in this area have been found and fixed. RonDB makes use of all CPUs it has access to and also handles CPU locking automatically. It is possible to set NumCPUs to a specific number to override this default. It is also possible to use RonDB in a backwards compatible manner, although this isn't recommended.
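As a sketch, overriding the automatic CPU detection could look like this in the cluster configuration (the section placement and the value 16 are only illustrative):

```ini
[ndbd default]
# Override automatic CPU detection and let RonDB configure threads for 16 CPUs
NumCPUs=16
```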
Automated Memory Configuration#
NDB has historically required setting a large number of configuration parameters to size its various memory pools. In RonDB 21.04 the default behaviour is that no memory configuration parameter is required: RonDB data nodes use the entire memory available in the server/VM. The amount of memory used can be limited with the new configuration parameter TotalMemoryConfig.
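As a sketch, assuming the usual data node configuration syntax (the value and its unit suffix are illustrative):

```ini
[ndbd default]
# Cap the memory RonDB uses instead of taking all memory in the server/VM
TotalMemoryConfig=16G
```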
Automated CPU spinning#
NDB 8.0.23 contained support for CPU spinning. RonDB 21.04 defaults to using LatencyOptimisedSpinning.
Improved networking through send thread handling#
RonDB 21.04 improves the send thread handling, making it less aggressive and automatically activating send delays at high load.
Configurable number of replicas#
RonDB 21.04 solves another configuration problem present in NDB 8.0.23 and earlier releases: NoOfReplicas could only be set at the initial start of the cluster. The only method to increase or decrease the replication level was to perform a backup and restore.
In RonDB 21.04 one can set NoOfReplicas=3 even if one wants to run with a single replica. It is necessary to define the extra nodes in the configuration, but by setting the configuration parameter NodeActive=0 on those nodes, they are not really part of the cluster until they get activated.
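A sketch of what such an inactive node slot could look like in the cluster configuration (node id and hostname are placeholders):

```ini
[ndbd]
NodeId=4
# Node is defined but not part of the cluster until it is activated
NodeActive=0
HostName=192.168.1.114
```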
To support this feature, three new commands have been added to the management client.
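A sketch of how these commands look in the ndb_mgm client (the exact syntax is assumed here; node 2 and the hostname below match the description that follows):

```
ndb_mgm> 2 ACTIVATE
ndb_mgm> 2 DEACTIVATE
ndb_mgm> 2 HOSTNAME 192.168.1.113
```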
The commands above activate or deactivate node 2, and the HOSTNAME command changes the hostname of node 2 to 192.168.1.113. A node must be inactive before its hostname can be changed.
This feature also makes it very easy to move a node from one hostname to another hostname.
3x performance improvement in ClusterJ API#
A new addition to the ClusterJ API releases data objects and Session objects to a cache rather than releasing them fully. In addition, the garbage collection handling in ClusterJ was improved. These improvements led to a 3x improvement in a simple key lookup benchmark.
A number of bug fixes for this feature were introduced in RonDB 21.04.9.
Integrated benchmarking tools in RonDB binary distribution#
In the RonDB binary tarball we have integrated a number of benchmark tools to assess the performance of RonDB.
Improved reporting of memory resource usage in the ndbinfo.resources table#
A new row was added to the ndbinfo table resources which reports TOTAL_GLOBAL_MEMORY. This is the sum of the memory resources managed by the shared global memory.
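A quick way to inspect the new row (the column list follows the standard ndbinfo.resources layout and should be treated as an assumption):

```sql
-- Show the total global memory resource per data node
SELECT node_id, resource_name, reserved, used, max
FROM ndbinfo.resources
WHERE resource_name = 'TOTAL_GLOBAL_MEMORY';
```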
New configuration variable LowLatency#
A new configuration variable LowLatency was introduced; it is a boolean that defaults to off. If not set, the normal linear commit behaviour is used. When set, the commit is instead sent to all nodes in parallel (the so-called normal commit protocol), which should decrease latency at the expense of higher networking costs, both CPU-wise and bandwidth-wise.
This affects the COMMIT phase and the COMPLETE phase; it doesn't affect the prepare phase. Using LowLatency means that locks are always held until the complete phase.
It is mainly useful in clusters with high latency and clusters with 3-4 replicas.
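A minimal sketch of enabling it, assuming the usual boolean value syntax in the data node configuration:

```ini
[ndbd default]
# Send COMMIT/COMPLETE to all replicas in parallel: lower latency, higher network cost
LowLatency=1
```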
Improvements in ClusterJ to avoid single-threaded garbage collection#
Changes of defaults in RonDB#
A number of configuration variables have new defaults in RonDB 21.04.
ClusterJ lacked support for Primary keys using Date columns#
This limitation was removed by dropping the check that rejected Date columns as primary keys.
In addition, a new object was added to handle Date as a primary key, based on the existing handling of Date columns in non-key columns.
Support minimal memory configurations#
Increase SendBufferMemory default#
Place pid-files in configured location#
Add support for Longvarchar data types as primary key in ClusterJ#
This is necessary to handle primary keys defined as VARCHAR(x) where the maximum size of the column is larger than 255 bytes. Thus even a VARCHAR(64) can be a problem with character sets that use 4 bytes per character.
Tests were also added for Varchar as primary key; these were previously missing although the feature was supported.
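A sketch of a ClusterJ mapping that this enables, with hypothetical table and column names (a VARCHAR(300) key column is stored as a Longvarchar since its maximum byte length exceeds 255):

```java
import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;

// Hypothetical mapping of a table whose primary key column long_key is VARCHAR(300)
@PersistenceCapable(table = "long_key_table")
public interface LongKeyTable {
    @PrimaryKey
    @Column(name = "long_key")
    String getLongKey();
    void setLongKey(String value);

    @Column(name = "payload")
    String getPayload();
    void setPayload(String value);
}
```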
Improvements of ndb_import#
ndb_import uses values from CSV files to set the values of auto increment columns. It does not, however, ensure that the auto increment counter in RonDB is correct after the ndb_import process.
To solve this, a new option called --use-auto-increment was introduced in ndb_import. It ignores the values from the CSV file and instead assigns auto increment values, which ensures that INSERTs to the table are correct and working after importing data using ndb_import.
If this option isn't set, it is necessary to set a new starting value for the auto increment. This can be done, for example, with a transaction like the one sketched below:
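A transaction along the following lines (table and column names are placeholders) inserts a row just above the current maximum and removes it again, which advances the auto increment counter:

```sql
-- Hypothetical table t1 with auto increment column id
BEGIN;
SELECT MAX(id) + 1 INTO @next_id FROM t1;
INSERT INTO t1 (id) VALUES (@next_id);
DELETE FROM t1 WHERE id = @next_id;
COMMIT;
```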
This will set the new auto increment value in the table.
This version of ndb_import is integrated in the DBT2 benchmark provided with the RonDB release.
Graceful shutdown when using kill -TERM#
For RonDB to interact well with systemd, we need to enable graceful shutdown even when it is initiated using kill -TERM on the ndbmtd processes.
The NDB method of handling shutdown doesn't work very well, since it is hard to know which processes to stop to really stop ndbmtd. Most use cases can, however, use killall -TERM ndbmtd to stop all ndbmtd processes in the VM. This patch ensures that this command leads to a graceful shutdown that completes within 30 seconds.
Support larger transactions#
In NDB, large transactions can easily cause job buffer or send buffer explosion. By introducing batching of abort, commit and complete requests, we ensure that no buffer explosions occur. We also introduced CONTINUEB to handle releasing of records belonging to a large transaction. This sustains low latency even in the context of large transactions.
ARM64 support#
Mac OS X support for both x86 and ARM#
Two new ndbinfo tables to check memory usage#
Make it possible to use IPv4 sockets between ndbmtd and API nodes#
Use of realtime prio in NDB API receive threads#
Experiments show that it removes a lot of variance in benchmarks, decreasing variance by a factor of 3. In a Sysbench point select benchmark it improved performance by 20-25% while at the same time improving latency by 20%. One could also get the same performance at almost 4x lower latency (0.94 ms round-trip time for a PK read before the change compared to 0.25 ms after).
RONDB-167: ClusterJ supporting setting database when retrieving Session object#
ClusterJ has been limited to handle only one database per cluster connection. This severely limits the usability of ClusterJ in cases where there are many databases such as in a multi-tenant use case for RonDB.
At the same time the common case is to handle only one database per cluster connection. Thus it is important to maintain the performance characteristics for this case.
One new addition to the public ClusterJ API is a new getSession call that takes a String with the name of the database to be used by the Session object. Once a Session object has been created, it cannot change to another database. A Session object can have a cache of DTO objects, which would be fairly useless when used with many different databases, so changing databases isn't supported in this implementation. The limitation this brings is that a transaction is bound to a specific database.
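A sketch of how the new call can be used, assuming the overload is SessionFactory.getSession(String database) and using the standard ClusterJ connection properties (connect string and database names are placeholders):

```java
import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.SessionFactory;
import java.util.Properties;

public class MultiDatabaseExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("com.mysql.clusterj.connectstring", "mgmd_host:1186");
        props.setProperty("com.mysql.clusterj.database", "default_db");
        SessionFactory factory = ClusterJHelper.getSessionFactory(props);

        // Session bound to the default database, as before
        Session defaultSession = factory.getSession();
        // New: Session bound to another database for its whole lifetime
        Session tenantSession = factory.getSession("tenant_db");

        // ... transactions on tenantSession only touch tables in tenant_db ...

        tenantSession.close();
        defaultSession.close();
    }
}
```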
Sessions are cached in RonDB with one linked list of cached Session objects for the default database; other databases get a linked list created at the first use of the database in a SessionFactory object. The limit on the number of cached Session objects is maintained globally. Currently a Session is simply not put back on the list if the maximum has been reached. An improvement could be to maintain a global least-recently-used order of Session objects, but this hasn't been implemented here.
A Session object has a Db object that represents the Ndb object used by the Session. This Ndb object is bound to a specific database. For simplicity we store the database name and a boolean indicating whether the database is the default database; the database name could also have been retrieved from the Ndb object. The database name in the Db object is used when retrieving an NdbRecord for the table.
ClusterJ handles NdbRecord in an optimised manner that tries to reuse records as much as possible. Previously it created one Ndb object together with an NdbDictionary object to handle NdbRecord creation. Now this dictionary is renamed and used only for the default database. Each new database creates one more Ndb object together with an NdbDictionary object, which handles all NdbRecords for that database. For quick lookup, a ConcurrentHashMap keyed on the database name is used to find the NdbDictionary object.
Previously there was a single ConcurrentHashMap for all NdbRecords, both for tables and for indexes. It used a naming scheme of tableName only or tableName+indexName.
This map is kept, but the naming scheme is now either databaseName+tableName or databaseName+tableName+indexName.
Thus more entries are likely to be in the hash map, but this should not affect performance very much.
These maps are iterated over when unloading schemas and when removing cached tables.
With multiple databases in a cluster connection, the LRU list handling becomes more important to ensure that hot databases are cached more often than cold databases. A specific LRU list of Session objects was implemented, in addition to a queue per database.
A few more test cases for multiple databases in ClusterJ were added, as well as more tests for caching of dynamic objects and caching of Session objects.
Added support for running MTR with multiple versions of mysqld#
RONDB-169: Allow newer versions from 21.04 series to create tables recognized by older versions at least 21.04.9#
RONDB-171: Support setLimits for query.deletePersistentAll()#
This feature adds support for a limit when using the deletePersistentAll method in ClusterJ.
deletePersistentAll on a Query object makes it possible to delete all rows in a range or matching a search condition. However, a range could contain millions of rows, so a limit is a good idea to avoid huge transactions.
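A sketch of how this could look, with hypothetical domain interface and column names (Order is assumed to be a @PersistenceCapable interface mapped elsewhere):

```java
import com.mysql.clusterj.Query;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.query.QueryBuilder;
import com.mysql.clusterj.query.QueryDomainType;

public class LimitedDeleteExample {
    // Delete at most 1000 rows matching the condition in one call,
    // keeping the transaction small; returns the number of deleted rows.
    public static int deleteBatch(Session session, int customerId) {
        QueryBuilder builder = session.getQueryBuilder();
        QueryDomainType<Order> domain = builder.createQueryDefinition(Order.class);
        domain.where(domain.get("customerId").equal(domain.param("cust")));

        Query<Order> query = session.createQuery(domain);
        query.setParameter("cust", customerId);
        query.setLimits(0, 1000);   // skip 0 rows, delete at most 1000
        return query.deletePersistentAll();
    }
}
```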
RONDB-174: Move log message to debug when connecting to wrong node id using a fake hostname#
RONDB-184: Docker build changes#
New Docker build files to build experimental ARM64 builds and fixes to the x86 Docker build files.
Fixes to the Jenkins build files.

- Added Dockerfile with base image ubuntu:22.04 for experimental ARM64 builds
- Using caching of downloads and builds within Dockerfiles
- Bumped sysbench and dbt2 versions to accommodate ARM64 building (build_bench.sh)
- Dynamic naming of tarballs depending on build architecture (create_rondb_tarball.sh)
- Placed docker-build.sh logic into Dockerfiles
- Formatting of scripts
Removed a few printouts during restart that generated lots of output with little information#
Update versions for Sysbench and DBT2 in release script#
Updated to ensure that Sysbench and DBT2 work also on ARM64 platforms.
RONDB-199: Ensured that pid file contains PID of data node, not of angel#
The data node uses two processes: an angel process, which is the first process started. This process is daemonized, after which it forks another process, which is the real data node process.
When interacting with environments like systemd, this is easier to handle if the pid file contains the PID of the real data node process.
Stopping the angel process doesn’t stop the data node process. Thus keeping track of this PID isn’t of any great value.
When using RonDB data nodes in combination with systemd, it is recommended to not set StopOnError to 1, since this means that the angel will restart ndbmtd no matter how it stopped. It is better to set StopOnError=0 and to use the pid file (ndb_NODEID.pid in the same directory as the log files) to find the data node PID to stop or kill.
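For example, assuming the data node's file directory and node id 2, a graceful stop via the pid file could look like:

```bash
# Gracefully stop the data node with node id 2 using its pid file
kill -TERM "$(cat ndb_2.pid)"
```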
RONDB-224: Standardise error codes from MySQL server for temporary node failure/recovery errors#
Ensure that more error codes are handled as temporary errors using the common error codes for temporary errors in MySQL.
RONDB-230: Handle failure when IPv6 disabled at boot#
It is possible to disable IPv6 support at startup of Linux. This means that one cannot use IPv6 sockets in any fashion. This causes a number of issues in RonDB.
RonDB already handles data nodes that use only IPv4 sockets by default; this is used for communication over Dolphin SCI sockets, which don't support IPv6.
With this change, socket creation can be retried using IPv4 if the IPv6 setup fails. This is expected to fix connect and bind issues when IPv6 is disabled.
RONDB-282: Parallel copy fragment process#
The copy fragment process was previously limited to one fragment copy per LDM thread. It is also limited in parallelism such that at most 6000 words are allowed to be outstanding (a row counts as 56 words plus the row size in words).
This meant that initial node restarts in particular were fairly slow. The parallelism has therefore been improved, while maintaining protection against overloading the live node, which could be very busy serving readers and writers of the database.
RONDB-364: Add service-name parameter to ndb_mgmd and ndbmtd#
This feature adds a --service-name parameter to ndb_mgmd and ndbmtd. E.g. --service-name=ndbmtd sets the file name of the pid file to ndbmtd.pid and the node log to ndbmtd_out.log, and similarly for trace files, other log files and the error file. It also sets the directory name of the NDB file system to service_name_fs.
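For example (the connect string is a placeholder):

```bash
# Start a data node whose pid file, logs and file system directory are named after the service
ndbmtd --ndb-connectstring=mgmd_host:1186 --service-name=ndbmtd
```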
RONDB-385: Update dbt2 and sysbench versions to include new changes#
RonDB REST API Server#
This new feature is a major new open source contribution by the RonDB team. It has been in the works for almost a year and is now used by some of our customers.
The REST API Server has two variants. The first variant provides read access to tables in RonDB using primary key access. Reads can either fetch one row per request or use the batch variant that reads multiple rows from multiple tables in one request. This variant supports access using both REST and gRPC. The REST API is by default available on port 4406 and the gRPC API by default on port 5406.
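As an illustration, a single-row primary key read over REST could look roughly like the request below; the endpoint path and JSON format shown here are assumptions, so consult the REST API documentation referenced below for the exact interface:

```bash
# Hypothetical pk-read request against database "mydb", table "mytable"
curl -X POST http://localhost:4406/0.1.0/mydb/mytable/pk-read \
     -H "Content-Type: application/json" \
     -d '{"filters": [{"column": "id", "value": 1}]}'
```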
The second variant is to use the Feature Store REST API. This interface is used by Hopsworks applications that want direct read access to Feature Groups in Hopsworks.
The REST API server described above is just a first version. We aim to extend it with improved performance, more functionality and more advanced features.
The binary for the REST API server is called rdrs and is found in the RonDB binary tarballs.
The documentation of the REST API server is found here.
The documentation of the REST API for the Feature Store is found here.
RONDB-468: Separate connections for data and metadata operations#
The REST API server needs to read metadata to perform its services. This feature makes it possible to store the metadata in a separate RonDB cluster from where the feature data is stored.
RONDB-473: Minor adjustments of hash table sizes and hash functions#
RONDB-342: New hash function XX3_HASH64#
A new hash function is supported by newer 21.04 versions. 21.04 will not create any new tables with the new hash function, but it can use tables created by a newer RonDB version that uses the new hash function. This ensures that older NDB API clients can coexist in newer 22.10-based clusters, in particular supporting downgrade from 22.10 to 21.04.15.