Operating a Fault-tolerant Cluster

In contrast to the Local Quickstart Tutorial, this chapter shows how to operate a fault-tolerant cluster across multiple machines. This means that we need to discuss setup parameters that could previously be ignored. These are:

Availability Zone (AZ) setup
NVMe drives vs. block storage
Adding security measures
Adding/removing machines
Rolling software upgrades and downgrades

To make the setup more straightforward to implement, this chapter also gives basic guidelines on hardware requirements and example network setups. It can therefore also be read as a tutorial.

Managed RonDB

The following sections show how to operate and optimize a distributed and resilient RonDB cluster. The same knowledge underlies both Managed RonDB and RonDB on Kubernetes; the latter will be published soon.

Whilst Kubernetes uses a well-known operator-based approach, Managed RonDB runs a distributed agent on each machine. This agent also works with a desired state and makes sure the cluster converges towards it using a reconciliation loop. It uses the Raft protocol both to elect the leader that runs the reconciliation loop and to share state between agents. It can be tested locally using RonDB Docker.
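
To illustrate the general idea of a desired-state reconciliation loop, here is a minimal Python sketch. It is not the Managed RonDB agent's code: the names `ClusterState`, `plan_actions`, `apply_action` and `reconcile`, the node names and the version labels are all hypothetical, and leader election is reduced to a boolean flag that, in a real system, would come from the Raft layer.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical, simplified cluster view: node name -> software version."""
    nodes: dict[str, str] = field(default_factory=dict)


def plan_actions(desired: ClusterState, actual: ClusterState) -> list[tuple[str, str]]:
    """Return the actions that would move `actual` towards `desired`."""
    actions = []
    for name, version in sorted(desired.nodes.items()):
        if name not in actual.nodes:
            actions.append(("add", name))
        elif actual.nodes[name] != version:
            actions.append(("change_version", name))
    for name in sorted(actual.nodes):
        if name not in desired.nodes:
            actions.append(("remove", name))
    return actions


def apply_action(desired: ClusterState, actual: ClusterState,
                 action: tuple[str, str]) -> None:
    """Apply a single action to the (simulated) actual state."""
    kind, name = action
    if kind == "remove":
        del actual.nodes[name]
    else:  # "add" and "change_version" both set the desired version
        actual.nodes[name] = desired.nodes[name]


def reconcile(desired: ClusterState, actual: ClusterState,
              is_leader: bool, max_rounds: int = 10) -> int:
    """Run the loop; only the leader acts. Returns the number of actions applied."""
    applied = 0
    for _ in range(max_rounds):
        if not is_leader:  # assumption: leadership is decided elsewhere (e.g. Raft)
            break
        actions = plan_actions(desired, actual)
        if not actions:  # converged: actual state matches desired state
            break
        # One action per round, then re-plan from the freshly observed state.
        apply_action(desired, actual, actions[0])
        applied += 1
    return applied


desired = ClusterState({"node-a": "v2", "node-b": "v2"})
actual = ClusterState({"node-a": "v1", "node-c": "v1"})
print(plan_actions(desired, actual))
print(reconcile(desired, actual, is_leader=True), actual.nodes)
```

Applying one action per round and then re-planning is a deliberate choice in this sketch: the loop never relies on a stale plan, which matches the step-by-step nature of changes such as rolling version changes or adding and removing machines.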