If you are using the managed version of RonDB, or if you have installed a RonDB cluster using the cloud scripts, then the RonDB configuration is already done and no further configuration is required. However, you might still want to read this part to understand which configuration parameters can be changed through commands, and to understand the product and how it works under the hood.
Open source users who use their own tools to install and configure RonDB need to read this section to understand how to set up the configuration of RonDB. However, one achievement in RonDB is that the configuration required to start a cluster is minimal. All the advanced configurations have been automated, so a normal user does not have to consider how to map the thread and memory configurations.
Minimum requirements on a configuration file
In RonDB we have removed the need for all configuration handling except the requirement to define the nodes in the RonDB cluster and where they are located.
Even when defining the nodes, it is possible to define nodes that are not currently part of the cluster by marking them as not active. The configuration requires us to choose the replication level, but with inactive nodes we can set the number of replicas to 3 even if we only want to start the cluster with 1 replica. The other replicas can be added later when we want to use them. Making those nodes active and setting their IP addresses can be done using commands in the management client, discussed in the chapter on management of RonDB.
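As a rough illustration of inactive nodes, a config.ini fragment along the following lines declares three replicas while only starting one of them. This is a sketch under assumptions: the NodeActive parameter name, the section layout and the host address are not given in the text above and should be checked against the RonDB configuration reference.

```ini
[ndbd default]
# The replication level is chosen in the configuration; 3 replicas are declared.
NoOfReplicas=3

[ndbd]
# The one data node that is actually started (hypothetical address).
NodeId=1
HostName=192.168.1.101

[ndbd]
# Declared but not active (assumed parameter name: NodeActive).
NodeId=2
NodeActive=0

[ndbd]
# Declared but not active; can be activated later from the management client.
NodeId=3
NodeActive=0
```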
RonDB consists of several kinds of nodes. We have the RonDB management server that handles the configuration of the RonDB cluster, the RonDB data nodes that contain the actual data in RonDB, the MySQL servers that enable users to access data in RonDB through an SQL interface, and API nodes from which users of the native RonDB APIs can access the data in RonDB. MySQL servers and all other users of the NDB API are considered to be API nodes from a RonDB point of view.
Thus we have three types of nodes: management servers, data nodes and API nodes. Users interact with RonDB through the API nodes and can manage the RonDB cluster through the management client that accesses the management server.
The API nodes could be providing an SQL interface, an LDAP interface, a REST API or even a file system API, depending on the application using RonDB.
Configuration of a RonDB cluster includes setting up which computers are part of the RonDB cluster and where they are located (usually their IP address and port number).
We focus on documenting the parameters that different users are likely to want to set differently. The RonDB source code contains many more configuration parameters that can be set, but those are not intended for use with RonDB.
Configuration of RonDB nodes
The configuration of RonDB describes the nodes in the RonDB cluster. It is managed by a RonDB management server that provides the configuration to all nodes in the RonDB cluster. In addition, the MySQL servers in the RonDB cluster require their own configuration.
Managed RonDB and RonDB started from cloud scripts take care of the configuration of RonDB based on the VMs selected for the node types and the number of nodes of each type. Thus for users of the managed version of RonDB this chapter is mostly of interest for understanding what goes on under the hood.
However, open source users who want to set things up according to their own decisions will make use of the documentation on how to configure the RonDB cluster and the MySQL servers in the RonDB cluster.
RonDB is based on MySQL NDB Cluster, and NDB supports a wide variety of configuration parameters. We document only a few of those parameters for RonDB, namely the ones that we think RonDB users should care about. A large majority of the configuration parameters were added to provide backward compatibility or simply to make it possible to test which configuration settings actually work best.
A good example of this is the thread configuration, which supports almost any type of thread setup. Not all of those setups have any benefit, but the ThreadConfig configuration parameter makes it possible to test almost any of them, which lets the designers of automatic thread configuration find out which setup works best.
After running NDB in production for 17 years, knowledge about which configuration settings work best has been established through long experience. Thus in the RonDB documentation we can simply skip large parts of the configuration options available in NDB. In fact we have even removed some of those configuration parameters and replaced them with adaptive algorithms that select the configuration based on the environment RonDB executes in.
Historically, two parts of the configuration that created a lot of friction were the configuration of threads and of memory resources. These are now completely replaced by two configuration parameters. The first one, AutomaticThreadConfig, is set to 1 by default. This means that the RonDB data nodes will use all the available CPU resources in the VM (or bare metal server) the data node runs in. In addition, the threads will be locked to CPUs to ensure that we use the CPU caches in an optimal manner.
Similarly, the configuration parameter AutomaticMemoryConfig defaults to 1. This means that we will use all the memory in the VM for the memory resources in RonDB.
Both of those parameters are set by default. If one runs RonDB data nodes in a shared environment, it is possible to provide hints to these automatic configurations. We can set the number of CPUs in the configuration parameter NumCPUs. In this case we will not lock threads to CPUs, since we assume a shared environment. This setup is very useful when testing RonDB and where RonDB is a small part of the application setup.
The automatic memory configuration can be adapted by setting the configuration parameter TotalMemoryConfig. It isn't possible to set this smaller than 2 GByte. Automatic memory configuration is designed to ensure that we avoid problems with lacking resources. If we know exactly how much memory we need for the various parts, it is possible to set a lot of configuration parameters to assist in this. However, RonDB is focused on easy configuration rather than perfectly optimal use of memory resources. If such tuning is of interest, the support organisation of RonDB at Hopsworks AB will be able to assist in these setups.
Since both automatic memory and automatic thread configuration are enabled by default, no configuration parameters need to be set for those things if one uses full VMs for the RonDB data nodes.
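For a shared environment, the two hints described above could be given roughly as follows. This is only a sketch: the values are hypothetical, and placing the parameters in the [ndbd default] section is an assumption based on how data node parameters are normally set in an NDB-style config.ini.

```ini
[ndbd default]
# AutomaticThreadConfig and AutomaticMemoryConfig both default to 1,
# so they do not need to be listed.

# Hint for a shared environment: use 4 CPUs.
# With NumCPUs set, threads are not locked to CPUs.
NumCPUs=4

# Hint for memory: let the data node use 8 GByte in total.
# Values below 2 GByte are not accepted.
TotalMemoryConfig=8G
```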
Another important part of NDB configuration was defining the size of the REDO logs. This was especially important in versions up to MySQL Cluster 7.5. In RonDB, however, checkpoints use Partial LCP, which is enabled by default, together with adaptive control of checkpoint speed to make sure that we don't run out of REDO log. This means that if checkpoints don't cut the REDO log tail fast enough, we increase the checkpoint speed.
Experiments have shown that 64 GByte of REDO log will support almost any setup for RonDB. Thus only very extreme setups require any configuration that affects the size of the REDO log. By default the REDO log uses 4 log parts with 16 log files per log part, and each log file is 1 GByte in size, which gives 64 GByte in total. Thus no configuration is required for setting REDO log sizes either.
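To make the default sizing concrete, the fragment below spells out the defaults described above. The parameter names are the ones used by NDB for REDO log sizing and are not mentioned in the text, so treat them as an assumption to verify; in a normal RonDB setup these lines would simply be left out.

```ini
[ndbd default]
# Defaults as described above: 4 log parts x 16 files x 1 GByte = 64 GByte.
# Listed only for illustration; normally omitted.
NoOfFragmentLogParts=4
NoOfFragmentLogFiles=16
FragmentLogFileSize=1G
```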
Since Automatic Thread Configuration makes use of query threads, we can also standardize on the number of partitions in a table. This defaults to 2 partitions per data node in the RonDB cluster, which should be sufficient for almost every application. However, if an application is very update intensive and heavily focused on key lookups, we could use more partitions per table.
The PartitionsPerNode configuration parameter is thus only needed in rare cases. In a 2-node setup with 4 partitions per table, one table should be able to handle about 400,000 updates per second. If this isn't enough, it can be useful to increase PartitionsPerNode.
Thus in principle the only configuration that needs to be specified is the placement of the disk files and the hostnames for the nodes in the cluster.
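For the rare update-heavy case, raising the partition count might look like the sketch below. The value 4 is a hypothetical choice, and placing the parameter in [ndbd default] is an assumption.

```ini
[ndbd default]
# The default is 2 partitions per data node, i.e. 4 partitions per table
# in a 2-node setup. Doubling it gives 8 partitions per table in that setup.
PartitionsPerNode=4
```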
History of RonDB configuration
Configuration of RonDB is important. Since NDB was designed for predictable latency, highest performance and highest possible availability, it was important to not use a very dynamic memory allocation. RonDB has strict control over all types of resources. At startup of a data node we set the limits on the various resources we are using.
This strict memory allocation behaviour made it a bit more challenging to configure NDB. Since MySQL Cluster 7.5 there has been constant work to make memory allocation more flexible. This involved a number of large development projects, which were finalised in RonDB 21.04.0 with the introduction of automatic memory configuration.
In NDB a significant number of configuration parameters were developed over the years. There are a few parameters that should be set in most clusters. There is a substantial number of configuration parameters that will only be used by a small percentage of all users. Often these parameters were developed to solve a specific problem in some specific user setup.
I will briefly explain most of those configuration parameters in the chapters on advanced configurations of RonDB. Almost every user of RonDB should be able to ignore those configuration parameters.
Basic configuration setup
When you look at the forums, it is quite obvious that the vast majority of NDB users are looking for a very basic high availability setup with one VM per node in the cluster.
We will now look at setting up the configuration for such a basic cluster as an exercise. The previous chapters have given us a lot of input on which parameters matter the most. We want a memory efficient setup that can handle some load, but not the very highest load. We are not aiming for the absolute highest performance here. It is enough to set things up for a basic user who wants a very basic HA setup.
The most basic HA setup has 5 VMs, one for the RonDB management server, two VMs for the data nodes and two VMs for the MySQL servers. In addition we will add one API node for use by NDB tools.
We assume that the management server is using IP address 192.168.1.100, the data nodes are set up on 192.168.1.101 and 192.168.1.102, and the MySQL servers are using 192.168.1.103 and 192.168.1.104.
As preparation for setting up the cluster we assume that the VMs have opened up port 1186 to communicate to and from the RonDB management server, 3306 to communicate to and from the MySQL servers and port 11860 to communicate to and from the data nodes. This is particularly important to consider when the OS by default closes down all ports.
When setting things up in the cloud, it is very common that one can define which ports to open. In a cloud setup almost all VMs open port 22 for SSH access, but in addition we need ports 1186, 3306 and 11860 opened for the VMs here.
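For the firewall rules to stay valid, the ports have to be fixed in the configuration as well. A minimal sketch is shown below; the PortNumber and ServerPort parameter names come from NDB and are not mentioned in the text above, so they are assumptions to verify. Port 3306 for the MySQL servers is set in my.cnf rather than in the RonDB configuration file.

```ini
[ndb_mgmd]
# Port the management server listens on.
PortNumber=1186

[ndbd default]
# Fixed port for connections to the data nodes, so that a static
# firewall rule for 11860 is sufficient.
ServerPort=11860
```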
In /etc/my.cnf we store the configuration used to start up the MySQL server nodes. In this file we need to ensure that the MySQL server can use the NDB storage engine and that it can connect to the management server.
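A minimal my.cnf covering those two requirements could look as follows. This is a sketch that assumes the standard MySQL NDB Cluster option names and the management server address used in this example; other settings a real server needs are left out.

```ini
[mysqld]
# Enable the NDB storage engine.
ndbcluster
# Where to find the RonDB management server.
ndb-connectstring=192.168.1.100:1186
```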
In the RonDB configuration file we need to create 6 nodes, the management server with node id 65, the two data nodes with node id 1 and 2 and the two MySQL servers with node id 67 and 68. In addition we provide an API node that is using node id 231. This node is not bound to any specific host.
The below configuration uses both AutomaticMemoryConfig and AutomaticThreadConfig. We have shown how to limit the amount of memory used through TotalMemoryConfig and how to limit the number of CPUs used through NumCPUs.
When using AutomaticMemoryConfig we calculate how much memory is required for a number of resources, and the remainder is used for DataMemory and DiskPageBufferMemory. 90% of the remainder is dedicated to DataMemory and the remaining 10% to DiskPageBufferMemory.
By default the amount of schema memory is sized to handle the maximum number of tables allowed in RonDB, which is 20320, plus a large number of columns and triggers to support those tables. We have shown how we can adapt the calculations for automatic memory configuration by setting MaxNoOfTables, MaxNoOfAttributes and MaxNoOfTriggers.
Similarly, we can also override the amount of TransactionMemory we use. This memory is used for all sorts of ongoing database operations. Thus the larger the transactions and the more concurrent transactions we run, the more TransactionMemory we need. SharedGlobalMemory is a resource that can be used to extend some kinds of memory resources to allow for more flexible memory allocation. This can also be adapted in the automatic memory configuration.
All these parameters have been commented out below.
We have prepared the configuration such that we only start with 2 replicas in 2 data nodes. But the configuration is already prepared to add a third replica if required. Similarly the configuration is prepared to add a second management server.
We need to set DataDir to ensure that the placement of NDB files is set.
Whether to set the node id of the MySQL server, data nodes and management server in the my.cnf is a matter of choice. Here I have done so, but this means that the file must be changed in each VM. Otherwise one can provide the node id in the startup command.