If you are using the managed version of RonDB, or have installed a RonDB cluster using the cloud scripts, then the RonDB configuration is already done and no further configuration is required. However, you might still want to read this part to understand which configuration parameters can be changed through commands, and to understand the product and how it works under the hood.
Open source users that use their own tools to install and configure RonDB should read this section to understand how to set up the configuration of RonDB.
RonDB consists of a number of different node types: the RonDB management server, which handles the configuration of the RonDB cluster; the RonDB data nodes, which contain the actual data in RonDB; the MySQL servers, which enable users to access data in RonDB through an SQL interface; and the API nodes, from which users of the native RonDB APIs can access the data in RonDB.
Configuration of a RonDB cluster includes setting up which computers are part of the RonDB cluster and where they are located (usually their IP address and port number). There is also a wide variety of configuration parameters affecting the operation of the various nodes. Finally, there is the configuration of the MySQL Servers, which also have many different configuration parameters.
The aim in RonDB is to remove the need for most configuration handling, except for the requirement to define the nodes in the RonDB cluster and where they are located. At the same time, we want it to be possible to change all aspects of the configuration, even the set of nodes in the RonDB cluster. This means adding more replicas to the node groups and adding more node groups without any downtime for RonDB.
With this aim, we will focus on documenting those parameters that we think have the potential to be set differently by different users. Thus the RonDB source code may contain a large number of configuration parameters that can be set, but that are not intended for use with RonDB.
## Configuration of RonDB nodes
The configuration of RonDB defines the nodes in the RonDB cluster. It is managed by a RonDB management server that provides the configuration to all nodes in the cluster. In addition, the MySQL Servers in the RonDB cluster require their own configuration.
Managed RonDB and RonDB started from cloud scripts take care of the configuration of RonDB based on the VMs selected for the node types and the number of nodes of each type. Thus for users of the managed version of RonDB, this chapter is mostly of interest for understanding what goes on under the hood.
However, open source users that want to set things up according to their own decisions will make use of this documentation on how to configure the RonDB cluster and the MySQL Servers in it.
RonDB is based on MySQL NDB Cluster, and NDB supports a wide variety of configuration parameters. We document only a few of them for RonDB: those that we think RonDB users should care about. A large majority of the configuration parameters were added to provide backward compatibility, or simply to make it possible to test which configuration parameters actually work best.
A good example of this is thread configuration: the ThreadConfig configuration parameter supports almost any type of thread configuration. Not all of those configurations are beneficial, but being able to test almost any thread configuration made it possible for the designers of automatic thread configuration to learn which configurations work best.
After running NDB in production for 17 years, knowledge about which configuration parameters work best has been established through long experience. Thus in the RonDB documentation we can simply skip large parts of the configuration options available in NDB. In fact, we have even removed some of those configuration parameters and replaced them with adaptive algorithms that select the configuration based on the environment the node executes in.
Historically, two parts of the configuration that created a lot of friction were the configuration of threads and memory resources. These are now completely replaced by two configuration parameters. The first one, AutomaticThreadConfiguration, is by default set to 1. This means that the RonDB data nodes will use all the available CPU resources in the VM (or bare metal server) the data node runs in. In addition, the threads will be locked to CPUs to ensure that the CPU caches are used in an optimal manner.
Similarly, the configuration parameter AutomaticMemoryConfiguration defaults to 1. This means that RonDB will use all the memory in the VM for its memory resources.
Both of those parameters are set by default. If one runs RonDB data nodes in a shared environment, it is possible to provide hints to these automatic configurations: the number of CPUs can be set with the configuration parameter NumCPUs. In this case threads will not be locked to CPUs, since we assume a shared environment. This setup is very useful when testing RonDB, and when RonDB is a small part of the application setup.
The automatic memory configuration can be adapted by setting the configuration parameter TotalMemoryConfig, which cannot be set smaller than 2 GByte. Automatic memory configuration is designed to ensure that we avoid problems with lacking resources. If you know exactly how much memory you need for the various parts, it is possible to set a number of configuration parameters to fine-tune this. However, RonDB focuses on easy configuration rather than perfectly optimal use of memory resources. If such tuning is of interest, the RonDB support organisation at Logical Clocks AB can assist with these setups.
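As a sketch, hints for a shared environment could look like the following in the `[ndbd default]` section of the cluster configuration file; the values 4 and 4G are illustrative, not recommendations:

```ini
[ndbd default]
# Automatic configuration stays enabled (the default), but we give
# hints suitable for a shared environment.
AutomaticThreadConfiguration=1
NumCPUs=4                       # use 4 CPUs; threads are not locked to CPUs
AutomaticMemoryConfiguration=1
TotalMemoryConfig=4G            # must be at least 2 GByte
```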
Since both automatic memory and thread configuration are enabled by default, no configuration parameters need to be set for these if one uses full VMs for the RonDB data nodes.
One more important part of NDB configuration was defining the size of the REDO logs. This was especially important in versions up to MySQL Cluster 7.5. In RonDB, however, checkpoints use Partial LCP, which is enabled by default, together with adaptive control of checkpoint speed to make sure that we don't run out of REDO log. This means that if checkpoints don't cut the REDO log tail fast enough, the checkpoint speed is increased to ensure that we don't run out of REDO log.
Experiments have shown that 64 GByte of REDO log will support almost any RonDB setup. Thus only very extreme setups will require any configuration that affects the size of the REDO log. The REDO log defaults to 4 log parts with 16 log files per log part, each log file being 1 GByte in size. Thus no configuration is required for REDO log sizes either.
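The default total REDO log size follows directly from these defaults; a small sketch of the arithmetic:

```python
# Default RonDB REDO log sizing, as described above.
log_parts = 4        # NoOfFragmentLogParts
files_per_part = 16  # NoOfFragmentLogFiles
file_size_gb = 1     # FragmentLogFileSize, in GByte

total_gb = log_parts * files_per_part * file_size_gb
print(total_gb)  # -> 64 GByte of REDO log in total
```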
Given that we use automatic thread configuration, we make use of query threads. This means that we can also standardize the number of partitions in a table, which defaults to 2 partitions per data node in the RonDB cluster. This should be sufficient for almost every application. However, if an application is very update intensive and heavily focused on key lookups, more partitions per table could be used.
Thus the configuration parameter PartitionsPerNode can be useful in rare cases. In a 2-node setup with 4 partitions per table, one table should be able to handle about 400,000 updates per second. If this isn't enough, it can be useful to increase the PartitionsPerNode configuration parameter.
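If such an update-heavy, key-lookup-focused workload is expected, the setting could be sketched like this in the `[ndbd default]` section (the value 4 is illustrative):

```ini
[ndbd default]
# Default is 2 partitions per data node; raise it only for very
# update-intensive, key-lookup-heavy workloads.
PartitionsPerNode=4
```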
Thus, in principle, the only configuration that needs to be specified is the placement of the disk files and the hostnames of the nodes in the cluster.
## History of RonDB configuration
Configuration of RonDB is important. Since NDB was designed for predictable latency, the highest performance and the highest possible availability, it was important not to use very dynamic memory allocation. RonDB has strict control over all types of resources: at the startup of a data node, we set the limits on the various resources used.
This strict memory allocation behaviour made NDB a bit more challenging to configure. Since MySQL Cluster 7.5 there has been constant work to make memory allocation more flexible. This took a number of large development projects, finalised in RonDB 21.04.0 with the introduction of automatic memory configuration.
In NDB, a significant number of configuration parameters were developed over the years. There are a few parameters that should be set in most clusters, and a substantial number that will only be used by a small percentage of all users. Often these parameters were developed to solve a specific problem in some specific user setup.
I will briefly explain most of those configuration parameters in the chapters on advanced configuration of RonDB. Almost every user of RonDB should be able to ignore them.
## Basic configuration setup
When you look at the forums, it is quite obvious that the vast majority of NDB users are looking for a very basic high-availability setup with one VM per node in the cluster.
As an exercise, we will now set up the configuration for such a basic cluster. The previous chapters have given us a lot of input on which parameters matter the most. We want a memory-efficient setup that can handle some load, but not the very highest load. We are not aiming for the absolute highest performance here; it is enough to set things up for a user that wants a very basic HA setup.
The most basic HA setup has 5 VMs: one for the NDB management server, two for the data nodes and two for the MySQL servers. In addition, we will add one API node for use by NDB tools.
We assume that the management server is using IP address 192.168.1.100, the data nodes are set up on 192.168.1.101 and 192.168.1.102, and the MySQL servers are using 192.168.1.103 and 192.168.1.104.
As preparation for setting up the cluster, we assume that the VMs have opened port 1186 for communication to and from the NDB management server, port 3306 for the MySQL servers, and port 11860 for the data nodes. This is particularly important to consider when the OS closes all ports by default.
When setting things up in the cloud, it is very common that one can define which ports to open. In a cloud setup, almost all VMs open port 22 for SSH access, but in addition we need ports 1186, 3306 and 11860 opened for these VMs.
In /etc/my.cnf we store the configuration used to start the MySQL server nodes. In this file we need to ensure that the MySQL server can use the NDB storage engine and that it can connect to the management server.
In the NDB configuration file we need to define 6 nodes: the management server with node id 49, the two data nodes with node ids 1 and 2, and the two MySQL servers with node ids 51 and 52. In addition, we provide an API node with node id 231. This node is not bound to any specific host.
In this case we assume that the cluster is used to store a fairly small database of up to 200 MByte of data. It is enough to configure 256 MByte of DataMemory. In addition, we need a bit of IndexMemory to handle the hash index; 40 MByte is sufficient to store more than 2 million rows, so it should be sufficient for a database size of 200 MByte.
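A rough back-of-the-envelope sketch of this sizing, assuming about 20 bytes of hash index per row (an assumed figure for illustration, not an exact RonDB number):

```python
# Rough sizing for the basic setup above.
index_memory_bytes = 40 * 1024 * 1024  # IndexMemory=40M
bytes_per_row_hash = 20                # assumed hash index cost per row

max_rows = index_memory_bytes // bytes_per_row_hash
print(max_rows)  # -> 2097152, i.e. more than 2 million rows
```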
```ini
[ndbd default]
NoOfReplicas=2
NoOfFragmentLogParts=4
NoOfFragmentLogFiles=4
FragmentLogFileSize=64M
DataDir=/usr/local/ndb_data
DataMemory=256M
IndexMemory=40M
DiskPageBufferMemory=0
#DiskPageBufferMemory=256M
ServerPort=11860
MaxNoOfAttributes=10000
BatchSizePerLocalScan=64
StartPartitionedTimeout=0
TwoPassInitialNodeRestartCopy=1
BuildIndexThreads=128
ConnectCheckIntervalDelay=1
HeartbeatIntervalDbDb=1000
MinDiskWriteSpeed=1M
MaxDiskWriteSpeed=2M
BackupDataBufferSize=512k
BackupMaxWriteSize=512k
BackupWriteSize=256k
ODirect=1
IndexStatAutoCreate=1
IndexStatAutoUpdate=1
StringMemory=5

[ndb_mgmd default]
DataDir=/usr/local/ndb_data

[ndb_mgmd]
NodeId=49
Hostname=192.168.1.100

[ndbd]
NodeId=1
Hostname=192.168.1.101

[ndbd]
NodeId=2
Hostname=192.168.1.102

[mysqld]
NodeId=51
Hostname=192.168.1.103

[mysqld]
NodeId=52
Hostname=192.168.1.104

[api]
NodeId=231
```
We have commented out the higher value for DiskPageBufferMemory and instead set it to 0. This means no disk columns are allowed in any table. If the user wants to use disk columns, we create a properly sized page cache so that the user can have up to about 2 GByte of data in disk columns as well. If so, swap the two DiskPageBufferMemory rows.
This configuration allows us to create 320 table and index objects with up to 1000 columns per table. This should be sufficient for a small database.
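The figure 320 matches the sum of the default NDB object limits; the parameter names and default values below (MaxNoOfTables, MaxNoOfOrderedIndexes, MaxNoOfUniqueHashIndexes) are assumptions taken from NDB defaults, so check them against your version:

```python
# Table and index object budget, assuming NDB default limits.
max_tables = 128               # MaxNoOfTables (assumed default)
max_ordered_indexes = 128      # MaxNoOfOrderedIndexes (assumed default)
max_unique_hash_indexes = 64   # MaxNoOfUniqueHashIndexes (assumed default)

total_objects = max_tables + max_ordered_indexes + max_unique_hash_indexes
print(total_objects)  # -> 320
```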
We have changed the configuration parameters where it makes sense to change the default settings. As discussed in previous chapters, these are: BatchSizePerLocalScan, decreased to reduce memory usage for lock records; StartPartitionedTimeout, set to 0 to avoid ever starting up in a partitioned state; and NoOfFragmentLogParts, NoOfFragmentLogFiles and FragmentLogFileSize, set to get a properly sized REDO log. In this case we have 4 parts with 4 files each, and each file is 64 MByte, thus 1 GByte in total.
We have changed the defaults of the local checkpoint buffers to avoid large buffers that are not needed and that would even increase the latency of queries towards NDB. Given that the database is very small, it is a good idea to decrease the checkpoint write speed to avoid wasting disk bandwidth on writing checkpoints very fast.
Setting NoOfReplicas to 2 isn't absolutely necessary since it is the default, but it is good to be explicit, since this is a very basic configuration parameter.
We need to set DataDir to define the placement of the NDB files.
We use the ConnectCheckIntervalDelay parameter, which provides safer handling of heartbeat failures, and combine it with a shorter heartbeat timeout that should still be long enough.
We use the most efficient algorithm for initial node restart by setting BuildIndexThreads to 128 and TwoPassInitialNodeRestartCopy to 1.
Many of these defaults changed in MySQL Cluster 7.6. It is a good idea to use them also in MySQL Cluster 7.5.
Using ODirect is preferable on most file systems, so we set it here.
```ini
[mysqld]
ndbcluster
ndb-cluster-connection-pool-nodeids=51
ndb-connectstring=192.168.1.100
query-cache-size=0
query-cache-type=0
ndb-read-backup=1
ndb-allow-copying-alter-table=0
ndb-autoincrement-prefetch-sz=1024

[ndbd]
connect-string=192.168.1.100

[ndb_mgmd]
connect-string=192.168.1.100
config-file=/usr/local/ndb_data/config.ini
nodeid=49

[ndb_mgm]
connect-string=192.168.1.100
```
In the my.cnf file it is important to ensure that the query cache is entirely disabled, so that it doesn't prevent the MySQL server from scaling to a large number of CPUs.
Using NDB for SQL applications mostly means that the focus is on read scaling. We set the flag that ensures that all tables use the Read Backup feature. To ensure good scalability with autoincrement tables, we set the prefetch size to 1024, which avoids scalability issues when inserting rows with autoincrement keys.
To ensure that the user doesn't use a copying ALTER TABLE by mistake, we set the flag that disallows using a copying ALTER TABLE.
Whether to set the node ids of the MySQL server, data nodes and management server in my.cnf is a matter of choice. Here I have done so, but this means that the file must be changed in each VM. Otherwise, one can provide the node id in the startup command.