# Configurable number of replicas
The RonDB configuration requires setting the number of replicas. This configuration parameter cannot be changed once the cluster has been started.
However, to support changing the number of replicas, RonDB introduces a new configuration parameter, NodeActive. Setting it to 0 means that the node is present in the cluster configuration, but it cannot be started and no other node will wait for it during startup.
To start such a node, one must first set the configuration variable to 1. This is performed using a command in the management client, which also ensures that all running nodes in the cluster, including the API nodes, are informed of the node's new active state.
Similarly, to decrease the number of replicas, we can deactivate a node using a command in the management client.
Thus, when creating a cluster in Hopsworks we always set NoOfReplicas=3 to ensure that we can later scale up to 3 replicas. Hopsworks and RonDB currently do not support 4 replicas.
At least one node in each node group must have NodeActive set to 1.
Since the node is part of the configuration on all nodes of the cluster, it can always be activated without requiring a restart of the other nodes. Even the API nodes can immediately make use of a newly activated data node in the cluster.
The activate and deactivate feature is also supported for management servers and API nodes.
This feature is not limited to activating and deactivating nodes: we can also set a new hostname for a deactivated node. The hostname stored for an inactive node can therefore be an arbitrary placeholder address, since the actual IP address of a future data node is quite unlikely to have been known when the cluster was created. Before activating the node, one should set its hostname to the correct address of the new data node.
This means that we can also use this feature to move a node from one computer to another. In a cloud setting it can thus be used to move a node from one VM to another.
Another advantage of this feature is that it supports temporarily increasing the number of replicas. This means that we can perform online changes even with only 1 replica. We temporarily add a second replica to the RonDB cluster and perform the required changes. When the changes are done, we can remove the extra data node again.
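As a sketch of this workflow, the following commands temporarily raise a node group from 1 to 2 replicas and then shrink it back. The node id (4) is an assumption for illustration; it stands for a data node that exists in the configuration with NodeActive=0. This requires a live cluster and a management server reachable with the default connect string.

```shell
# Temporarily add a second replica, assuming node 4 is deactivated
# in the configuration.
ndb_mgm -e "4 ACTIVATE"           # make node 4 startable
ndbmtd --ndb-nodeid=4 --initial   # first start of a newly activated node
                                  # must use --initial

# ... perform the required online changes ...

# Remove the extra replica again.
ndb_mgm -e "4 STOP"               # stop the extra data node
ndb_mgm -e "4 DEACTIVATE"         # return the node group to 1 replica
```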
Thus RonDB supports a very flexible model of operation in a cloud setting.
To support this feature, three new commands have been added to the management client:
```
2 ACTIVATE
2 HOSTNAME 192.168.1.113
2 DEACTIVATE
```
The above commands activate/deactivate node 2, and the HOSTNAME command changes the hostname of node 2 to 192.168.1.113. A node must be deactivated before its hostname can be changed.
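The same commands can be issued non-interactively with the `-e` option of the management client. The sequence below respects the ordering rule above: deactivate first, then change the hostname, then activate. The connect string host `mgmd-host` is an assumption for illustration.

```shell
# Run the management client commands non-interactively, assuming
# a management server is reachable at mgmd-host.
ndb_mgm --ndb-connectstring=mgmd-host -e "2 DEACTIVATE"
ndb_mgm --ndb-connectstring=mgmd-host -e "2 HOSTNAME 192.168.1.113"
ndb_mgm --ndb-connectstring=mgmd-host -e "2 ACTIVATE"
```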
Here is the output of SHOW for a cluster with 3 replicas, 2 of them active. The cluster is configured for 2 management servers, of which only one is active, and for 4 API nodes, of which only one is active:
```
ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]     3 node(s)
id=3    @127.0.0.1  (RonDB-21.04.8, Nodegroup: 0, *)
id=4    @127.0.0.1  (RonDB-21.04.8, Nodegroup: 0)
id=5    (not connected, node is deactivated)

[ndb_mgmd(MGM)] 2 node(s)
id=1    @127.0.0.1  (RonDB-21.04.8)
id=2    (not connected, node is deactivated)

[mysqld(API)]   4 node(s)
id=6    (not connected, node is deactivated)
id=7    (not connected, node is deactivated)
id=8    (not connected, node is deactivated)
id=9    (not connected, accepting connect from any host)
```
For API nodes this feature also makes RonDB safer. It makes it possible to configure RonDB so that every API node can only connect from a specific hostname. Many more API nodes can be configured than are needed, but they cannot be used until someone activates them. Before activating them, one should set their hostnames such that only the selected VMs can connect to the cluster.
Thus we can ensure that no one can access the cluster unless they are using trusted servers.
Combining this feature with the ability to add new node groups online means we can scale from a single-node cluster running on 4 CPUs to a 64-data-node cluster where each node runs on 128 CPUs with 1 TByte of memory and 10 TByte of disk space. Every step of this scaling from small to extremely large can be performed as an online operation without any downtime. Thus a cluster can be up and running for decades without any downtime, growing and shrinking as required by the application.
A node that has recently been activated must be started for the first time using the --initial flag. Otherwise we could get into a situation where the nodes do not recognize that they are part of the same cluster.
When moving a node to a new VM there are two possibilities. The first is that we use cloud storage as disks, which means the new VM can reuse the file system of the old VM; in this case a normal restart can be performed. The node is first deactivated (which includes stopping it), the hostname is changed to that of the new VM, the node is activated, and finally the node is started again on the new VM.
The other option is that the node is moved to a VM without access to the file system of the old VM; in this case the node needs to be started using --initial. This ensures that the node retrieves the current version of its data from the other live nodes in the RonDB cluster. This works both for RonDB data nodes and for RonDB management nodes.
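The second variant can be sketched as the following sequence. The node id (3) and the new VM's IP address are assumptions for illustration; the management client commands are run from a host that can reach the management server, and the last command is run on the new VM.

```shell
# Move data node 3 to a new VM that has no access to the old file system.
ndb_mgm -e "3 DEACTIVATE"               # stops the node and marks it inactive
ndb_mgm -e "3 HOSTNAME 192.168.1.120"   # point node 3 at the new VM
ndb_mgm -e "3 ACTIVATE"

# On the new VM: no old file system is available, so start with --initial
# to resynchronize the node's data from the live replicas.
ndbmtd --ndb-nodeid=3 --initial
```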