Operating a Fault-tolerant Cluster
In contrast to the Local Quickstart Tutorial, this chapter shows how to operate a fault-tolerant cluster across multiple machines. This also means that we need to discuss setup parameters that could previously be ignored. These are:
- Availability Zone (AZ) setup
- NVMe drives vs. block storage
- Adding security measures
- Adding/removing machines
- Rolling software up-/downgrades
To make this more straightforward to implement, this chapter also gives basic guidelines on hardware requirements and example network setups. It can therefore also be viewed as a tutorial.
Managed RonDB
The following sections show how to operate and optimize a distributed and resilient RonDB cluster. All of this knowledge is used to run both Managed RonDB and RonDB on Kubernetes, which will be published soon.
Whilst Kubernetes uses a well-known operator-based approach, Managed RonDB runs a distributed agent on each machine. Like an operator, this agent works with a desired state and uses a reconciliation loop to make sure the cluster converges towards it. The agents use the Raft protocol both to elect the leader that runs the reconciliation loop and to share state between agents. The agent can be tested locally using RonDB Docker.
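To make the idea of a reconciliation loop more concrete, the following is a minimal Python sketch. It is not the Managed RonDB agent. The names `ClusterState`, `plan_actions`, `reconcile_once` and `converge` are hypothetical, the state is reduced to a mapping from hostname to software version, and Raft leader election is replaced by a plain `is_leader` flag. It only illustrates how a leader might compare desired and actual state and apply one change per pass until both match.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Step = Tuple[str, str]  # (action, hostname)


@dataclass
class ClusterState:
    # Hypothetical model: hostname -> software version on that machine.
    nodes: Dict[str, str] = field(default_factory=dict)


def plan_actions(desired: ClusterState, actual: ClusterState) -> List[Step]:
    """Return the steps that would move `actual` towards `desired`."""
    actions: List[Step] = []
    for host in sorted(desired.nodes):
        if host not in actual.nodes:
            actions.append(("add", host))
        elif actual.nodes[host] != desired.nodes[host]:
            actions.append(("change_version", host))
    for host in sorted(actual.nodes):
        if host not in desired.nodes:
            actions.append(("remove", host))
    return actions


def reconcile_once(desired: ClusterState, actual: ClusterState,
                   is_leader: bool) -> Optional[Step]:
    """One pass of the loop: only the leader acts, one step at a time."""
    if not is_leader:  # assumption: stand-in for the result of a Raft election
        return None
    actions = plan_actions(desired, actual)
    if not actions:
        return None  # already converged
    action, host = actions[0]
    if action == "remove":
        del actual.nodes[host]
    else:
        # "add" and "change_version" both end with the desired version running.
        actual.nodes[host] = desired.nodes[host]
    return (action, host)


def converge(desired: ClusterState, actual: ClusterState,
             is_leader: bool = True, max_passes: int = 100) -> List[Step]:
    """Repeat reconciliation passes until nothing is left to do."""
    applied: List[Step] = []
    for _ in range(max_passes):
        step = reconcile_once(desired, actual, is_leader)
        if step is None:
            break
        applied.append(step)
    return applied
```

As a usage example, calling `converge` with a desired state of three machines on version `v2` and an actual state that has one outdated machine and one surplus machine would upgrade the first, add the missing one and remove the surplus one, one step per pass. Applying a single step per pass is a design choice made here for illustration, since it mirrors how rolling changes avoid touching several machines at once.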