Managing Management Servers#
The management server has the important task of storing the cluster configuration and passing it on to other nodes. These nodes always request it at startup via a network call - this is why rolling restarts are required for configuration changes.
Management servers themselves use the following 3 ways to load the configuration:
- Configuration database (using the `--configdir` option)
- Configuration file (using the `-f` / `--config-file` option)
- Other running management servers (fetched over the network)
For any configuration change, there must be unanimous consensus amongst all management servers with `NodeActive=1`. This means that if one management server is down, no configuration changes can be made. Any starting MGMd will also not be available until it knows that it is part of the consensus. If one MGMd dies, another MGMd can, however, continue forwarding configurations, since it knows that it contains a valid configuration and no change can be agreed without it.
Any new agreement made via the consensus will be persisted in the configuration database. This is a binary file that is shared amongst all active MGMds. It has the format `ndb_<node id>_config.bin.<sequence number>`, whereby the sequence number is bumped for every configuration change.
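To make the sequence-number convention concrete, the following sketch creates a few dummy files following this naming scheme for node 65 in a temporary directory (not a real configdir) and picks the latest one:

```shell
# Simulate a configdir with a few cached configurations for node 65
dir=$(mktemp -d)
touch "$dir"/ndb_65_config.bin.1 "$dir"/ndb_65_config.bin.2 "$dir"/ndb_65_config.bin.10

# The '.'-separated third field is the sequence number; sort it numerically
# so that the latest configuration comes last
ls "$dir" | sort -t. -k3 -n | tail -1   # prints: ndb_65_config.bin.10
```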
Any start of an MGMd requires at least the argument telling it where to read and write the configuration database (`--configdir`).
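Such a start command could look roughly as follows (the paths and node ID are hypothetical examples; `--configdir` points `ndb_mgmd` at the configuration database directory):

```shell
# Start the management server; it reads/writes its configuration
# database under --configdir (hypothetical paths and node ID)
ndb_mgmd --ndb-nodeid=65 \
  -f /var/lib/rondb/config.ini \
  --configdir=/var/lib/rondb/mgmd
```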
If a configuration database exists and we are not using the `--reload` or `--initial` flag, the config.ini file will be ignored. When using the `--initial` flag, the sequence number will be reset to 0 and previous configuration databases will be removed.
Changes to the configuration are done either by restarting an MGMd with the `--reload` flag or by using the management client (`ndb_mgm`). Using the management client is particularly convenient, since it does not require a rolling cluster restart afterwards. As mentioned earlier, the management client supports:
- Activating and deactivating node slots
- Changing nodes’ hostnames (when inactive)
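As a sketch, the corresponding management client commands could look as follows (node ID 65 and the hostname are hypothetical examples):

```shell
ndb_mgm -e "65 deactivate"       # free the slot so it can be changed
ndb_mgm -e "65 hostname host-2"  # point the inactive slot at a new host
ndb_mgm -e "65 activate"         # bring the slot back into the cluster
```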
The `--reload` flag is, on the other hand, needed if one wishes to:
Add, remove or change node slots (beyond activating / deactivating)
Add node groups
Change fields of the
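As a sketch (hostnames and paths are hypothetical), a `--reload` based change involves persisting the new config.ini everywhere, restarting each MGMd, and then rolling-restarting the rest of the cluster:

```shell
# 1. Edit config.ini, then replicate it to every MGMd host
#    (hosts and paths are hypothetical)
for host in host-1 host-2; do
  scp config.ini "$host":/var/lib/rondb/config.ini
done

# 2. On each MGMd host: restart the MGMd so it re-reads config.ini
#    and proposes the change to the consensus
ndb_mgmd --reload -f /var/lib/rondb/config.ini \
  --configdir=/var/lib/rondb/mgmd

# 3. Perform a rolling restart of the remaining cluster nodes so they
#    fetch the new configuration at startup
```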
The following timeline shows the lifecycle of using multiple MGMds, whereby we run the following actions:

- Move MGMds between hosts
- Restart MGMds upon failure
- Change the cluster configuration via the management client
- Change the cluster configuration via the config.ini file and the `--reload` flag

1. Manual: Create `config_v0.ini` and replicate across hosts
2. 🔄 Consensus v0 🔄
3. ⬆️⬆️⬆️ Rest of cluster starts ⬆️⬆️⬆️
4. ❌ MGMd crashes unexpectedly
5. `ndb_mgm -e "65 deactivate"`
6. 🔄 Consensus v1 🔄
7. ⬇️ MGMd goes down
8. Manual: Persist change to `config_v1.ini` and replicate across hosts
9. 🔄 Consensus v3 🔄
10. 🔄 Consensus v4 🔄 (Note: edge case of consensus without agreement)
11. Manual: Persist change to `config_v4.ini` and replicate across hosts
12. 🔄 Consensus v4 🔄
13. Manual: Change `TotalMemoryConfig` in `config_v5.ini` and replicate across hosts
14. 🔄 Consensus v5 🔄
15. 🔄🔄🔄 Rolling cluster restart 🔄🔄🔄
One thing that may become apparent is that persisting and replicating changes to the config.ini file is very important. If any MGMd is ever restarted with the `--reload` parameter, it will use the config.ini file. If the changes have not been persisted, a reload may use an entirely outdated config.ini file, which could break the cluster.
Handling MGMd Machine Failures#
One may have noted that an irretrievable MGMd machine is problematic, since it blocks our ability to change the configuration. The only way out of this situation is to escape the consensus and start a new cluster configuration sequence. This is done via the `--initial` flag.
Continuing from the timeline above, this situation can be handled as follows:

1. ❌ Host crashes unexpectedly
2. Manual: Change `Hostname=HOST_1` for node 65 in `config_v6.ini` and replicate across hosts
3. Restart the surviving MGMd with `--initial` using `config_v6.ini`
4. 🔄 Consensus v0 🔄
5. 🔄🔄🔄 Rolling cluster restart 🔄🔄🔄
This shows how we use an old configuration file v6 to start a new cluster sequence v0.
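Assuming hypothetical paths, escaping the consensus amounts to an `--initial` start from the persisted configuration file:

```shell
# Wipe the local configuration database and start a new configuration
# sequence (v0) from the persisted config file (paths are hypothetical)
ndb_mgmd --initial -f /var/lib/rondb/config_v6.ini \
  --configdir=/var/lib/rondb/mgmd
```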
An issue with an irretrievable host, however, is that one may not know whether it is down or whether there is a network partition. One does not want two partitions running with different configurations.
Fortunately, RonDB uses an arbitrator to handle partitions. If a partitioned cluster has a minority of data nodes, they will simply fail directly. If both partitions contain 50% of data nodes, the first partition that contacts the arbitrator will survive.
Therefore, when using `--initial`, one can first check whether any data nodes are running. If so, one is in the winning partition and can continue changing the configuration. If not, one leaves the partition idle. To check whether any data nodes are running, one can run `ndb_mgm -e "show"`.
Adding a MGMd Slot#
In contrast to activating an MGMd node slot, adding a new MGMd node slot will also require an `--initial` restart of the live MGMd. This conforms to the consensus idea - the live MGMd should not be available until it knows that the other MGMd has agreed to the new configuration.
If one does not want two active MGMds, we therefore still recommend adding one inactive MGMd slot. This avoids rolling cluster restarts when moving the MGMd to another host.
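Such an inactive slot could be sketched in config.ini as follows (node ID and hostname are hypothetical; `NodeActive=0` marks the slot as inactive, as discussed above):

```ini
[ndb_mgmd]
NodeId=66
Hostname=HOST_2
NodeActive=0
```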
Number of Running Management Servers#
When deciding the number of running management servers, one should take into account that:
Data nodes cannot start up without a running MGMd
Live MGMds require an initial restart if another MGMd host is down
An initial MGMd restart requires a rolling restart of the cluster
Setting up a cluster with multiple running management servers therefore has both pros and cons:
+ Data node process recovery is more stable
- The cluster is more likely to require rolling restarts once in a while