Prepare primary cluster#

In this tutorial, we will discuss how to prepare a primary cluster to replicate its changes to a secondary cluster. We will not spin up the secondary cluster yet.

We will discuss how both starting and operating a cluster need to be altered. The regular startup and operation of a cluster are described in the chapters Local quickstart and Operating a cluster.

Starting a cluster#

The main difference from starting a standard cluster is that we now also start MySQL replication servers. In the case of the primary cluster, these are the binlog servers.

When replicating from a primary cluster, all data in the primary cluster must be available either as:

  • a backup of the primary cluster

  • binlog files in the MySQL replication servers of the primary cluster

  • a mixture of both; in this case, the backup is a condensed version of data to which the binlog is appended

Therefore, the primary binlog server must be started either:

  • before data is written to the cluster (restoring a backup does not count)

  • before a backup is taken

A backup binlog server, on the other hand, can also be started at a later point in time, assuming that the primary has not lost any binlog data. As mentioned in the section GAP events, any missing data in the binlog files of the primary binlog server will cause the replication channel to stop.

When setting up multiple binlog servers, each will attempt to persist all binlog files since its startup. This makes it acceptable for a given GAP event to occur in one binlog server, as long as it is not present in all binlog servers. Therefore, we will be setting up two MySQL replication servers in the primary cluster. In this example, we will be using server_id=100 for the primary binlog server and server_id=101 for the backup binlog server:

Primary binlog server:

mysqld \
    --server-id=100 \
    --ndbcluster \
    --log-bin=mysql-bin \
    --ndb-log-bin=1 \
    --sync-binlog=1 \
    --binlog-format=ROW \
    --binlog-cache-size=1M

Backup binlog server:

mysqld \
    --server-id=101 \
    --ndbcluster \
    --log-bin=mysql-bin \
    --ndb-log-bin=1 \
    --sync-binlog=1 \
    --binlog-format=ROW \
    --binlog-cache-size=1M

If one intends to apply active-active replication with only two clusters in total in the upcoming tutorials, one can also set the parameter log-replica-updates=OFF. In an active-active setup, the primary cluster will also contain replica appliers. With this parameter, the binlog servers will not write the replica appliers’ changes into the binlog. This saves disk space and I/O resources on the binlog servers.
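For instance, the primary binlog server invocation from above would then gain one extra flag; a sketch, with all other flags unchanged:

mysqld \
    --server-id=100 \
    --ndbcluster \
    --log-bin=mysql-bin \
    --ndb-log-bin=1 \
    --sync-binlog=1 \
    --binlog-format=ROW \
    --binlog-cache-size=1M \
    --log-replica-updates=OFF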

These and further configuration parameters are also listed here.

Confirm binlog is active#

To check if the binlog is active, one can use the following command in the MySQL Server:

SHOW VARIABLES LIKE 'log_bin';

This will return either ON or OFF.
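For example, on a binlog server where the binlog is enabled, this looks as follows:

> SHOW VARIABLES LIKE 'log_bin';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| log_bin       | ON    |
+---------------+-------+
1 row in set (0.00 sec)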

If the binlog is not active, one can still add a MySQL replication server using the --log-bin option. However, since the previous binlog files are missing, one must then run a backup of the cluster.

Heartbeat table (optional)#

Replication is only active when something is written to the database. To ensure that there is always activity, we define a custom heartbeat_table in the primary cluster. All writes to this table will be replicated to replica clusters. It can be a simple table that looks as follows:

CREATE DATABASE heartbeat_db;
CREATE TABLE heartbeat_db.heartbeat_table (
    server_id INT NOT NULL PRIMARY KEY,
    counter BIGINT NOT NULL
) ENGINE=NDB;

Here, every entity hosting a MySQL replication server can run a user-defined script that continuously inserts and updates a row corresponding to the MySQL server’s server_id. We increment a counter on each write to ensure that the row actually changes and is thus replicated to the other side:

Primary binlog server:

UPDATE heartbeat_db.heartbeat_table
    SET counter=UPDATE_COUNTER
    WHERE server_id=100;

Backup binlog server:

UPDATE heartbeat_db.heartbeat_table
    SET counter=UPDATE_COUNTER
    WHERE server_id=101;
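Here, UPDATE_COUNTER is a placeholder that the user-defined script replaces with an increasing value. A minimal sketch of a heartbeat write that needs no external counter state is an INSERT ... ON DUPLICATE KEY UPDATE, which creates the row on the first run and increments the counter on every subsequent run:

INSERT INTO heartbeat_db.heartbeat_table (server_id, counter)
    VALUES (100, 1)
    ON DUPLICATE KEY UPDATE counter = counter + 1;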

This will also require a corresponding MySQL user with the necessary privileges to be created on the primary cluster.
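A minimal sketch of creating such a user, assuming the hypothetical name heartbeat_user; the host pattern and password are placeholders to adjust:

CREATE USER 'heartbeat_user'@'%' IDENTIFIED BY '<password>';
GRANT INSERT, UPDATE ON heartbeat_db.* TO 'heartbeat_user'@'%';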

Purging MySQL binlog files#

MySQL Replication binlog files have a user-defined maximum size (the configuration parameter max_binlog_size). When a binlog file reaches this size, it is closed and a new one is started. Restarting the MySQL Server will also cause a new binlog file to be created.

Binlog files will either be purged by the DBA using the command:

PURGE BINARY LOGS TO <binlog_file>;

or using the MySQL configuration parameter binlog_expire_logs_seconds. Since the total size of the binlog files written over a given time period is not deterministic, it is important to implement a strategy for purging binlog files. As long as one is not replicating to a secondary cluster, this strategy can be as simple as regularly running:

  1. create backup

  2. purge all binlog files that are contained in the backup

We will show how to implement this strategy, but beware of changing it when setting up a replication channel. This is because replication will fail if binlog files are purged before they are replicated.
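For reference, the time-based alternative is a single configuration parameter, e.g. in my.cnf. This is a sketch; the seven-day value is an arbitrary choice and is only safe as long as no replication channel still depends on the purged binlog files:

[mysqld]
# Automatically purge binlog files older than 7 days (604800 seconds)
binlog_expire_logs_seconds=604800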

Identify purgeable binlog files#

When creating a backup, one wants to know which epochs were contained in it and therefore which binlog files can be purged. Here, there are several solutions:

  1. (Simple but inaccurate) Check the current binlog file before triggering a backup

    For this solution, one runs this command on the binlog server before triggering a backup:

    > SHOW MASTER STATUS;
    +------------------+----------+--------------+------------------+-------------------+
    | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
    +------------------+----------+--------------+------------------+-------------------+
    | mysql-bin.000001 |     5847 |              |                  |                   |
    +------------------+----------+--------------+------------------+-------------------+
    1 row in set (0.00 sec)
    

    Depending on how much time passes between this command and the start of the backup, the binlog may have moved beyond mysql-bin.000001. Hence, this solution can be inaccurate.

  2. (Complex but accurate) Extract epoch number from backup

    When running the backup command via the ndb_mgm client, the command output will contain the StopGCP number. This can be used to calculate the epoch number of the backup.

    In the example below the StopGCP is 3845:

    ndb_mgm> START BACKUP
    Waiting for completed, this may take several minutes
    Node 2: Backup 1 started from node 1
    Node 2: Backup 1 started from node 1 completed
        StartGCP: 3839 StopGCP: 3845 
        #Records: 193234 #LogRecords: 12349
        Data: 20519789 bytes Log: 1334560 bytes
    

    A 64-bit epoch number can then be formed as follows:

    BACKUP_EPOCH = (StopGCP << 32) | 0xFFFFFFFF
    

    This epoch number is the last epoch number that was backed up. Using this, one can now identify which binlog files can be purged. This can be done by querying the ndb_binlog_index table in the primary cluster:

    SELECT SUBSTRING_INDEX(next_file, '/', -1)
        FROM mysql.ndb_binlog_index
        WHERE epoch <= $BACKUP_EPOCH
        ORDER BY epoch DESC LIMIT 1;
    
  3. (Next tutorial) Extract epoch number from secondary cluster after restoring backup

  4. (Next tutorial) Extract replicated epoch number from secondary cluster without restoring backup

In all cases, we store the result in the shell variable BINLOG_FILE. Now, all binlog files before this one can be deleted. We run:

PURGE BINARY LOGS TO $BINLOG_FILE;
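Putting solution 2 together, the whole backup-then-purge cycle can be scripted. The following is a minimal sketch, assuming the ndb_mgm and mysql clients run on a host that can reach the cluster; connection options and credentials are omitted:

#!/usr/bin/env bash
set -euo pipefail

# 1. Take a backup and capture the management client output
OUTPUT=$(ndb_mgm -e "START BACKUP WAIT COMPLETED")

# 2. Extract the StopGCP value from the output
STOP_GCP=$(echo "$OUTPUT" | grep -o 'StopGCP: [0-9]*' | awk '{print $2}')

# 3. Form the 64-bit backup epoch: StopGCP in the high 32 bits,
#    all ones in the low 32 bits
BACKUP_EPOCH=$(( (STOP_GCP << 32) | 0xffffffff ))

# 4. Find the binlog file up to which everything is contained in the backup
BINLOG_FILE=$(mysql -N -e "
    SELECT SUBSTRING_INDEX(next_file, '/', -1)
    FROM mysql.ndb_binlog_index
    WHERE epoch <= $BACKUP_EPOCH
    ORDER BY epoch DESC LIMIT 1;")

# 5. Purge all binlog files before it
mysql -e "PURGE BINARY LOGS TO '$BINLOG_FILE';"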

Handling data gaps#

Whenever a data gap occurs in the binlog, one should trigger a backup. The following scenarios can cause a data gap in the primary cluster:

  • All binlog servers contain a given GAP event

    This can best be mitigated by placing the binlog servers into different data centers or availability zones.

  • All binlog servers run out of disk space

    This can be mitigated by adding a new binlog server before the other binlog servers run out of disk space. One could set a maximum threshold that the total binlog size may grow to before triggering a backup and scaling down again. Note that as long as we are not replicating to a secondary cluster, we will regularly hit this limit.

  • All binlog servers die

    Generally, it is safer to place the binlogs on block storage and persist them across binlog server restarts. However, if all binlog servers die, this would not help.

Let’s assume we have detected a data gap and triggered a backup. The backup has now finished. Even if we have a secondary cluster running that is dependent on our binlogs, we can now safely purge the binlog files without checking how far the secondary cluster has replicated. This is because the secondary cluster will have to be restarted using the backup in any case. The section Identify purgeable binlog files above explains which binlog files can be purged.

Persisting binlog files#

In a primary cluster with a single binlog server, a binlog server that has failed will very likely have data gaps from the time it was down. Since a replication channel that hits a gap will automatically stop, the binlog server will soon become useless. This outcome is unaffected by whether the binlog files were persisted across the binlog server restart, so persisting the binlog files would not be very effective.

The situation changes however if one has multiple binlog servers in the primary cluster. This is shown in the figure below:

Here, we first have an active replication channel between binlog server 1 and replica applier 1. Then, binlog server 1 fails, so the replication channel switches and starts running between binlog server 2 and replica applier 2.

Importantly, binlog server 2 had itself failed prior to this switch, but had persisted its binlog files across its restart. This was vital because these persisted files contained the data spanning the gap produced by binlog server 1’s failure, and this data had still not been replicated to the replica cluster. By keeping its previous binlog files, binlog server 2 enabled the global replication setup to survive.

Note that when persisting binlog files, one should also persist the ndb_binlog_index table. This table is local to every MySQL server and is by default stored in the MySQL datadir. If the binlog directory is persisted but the datadir is not, the binlog files will not be usable.
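A minimal sketch of a binlog server invocation with both locations on persistent storage; the /persistent paths are assumptions for a block-storage mount that survives restarts:

mysqld \
    --server-id=100 \
    --ndbcluster \
    --datadir=/persistent/mysql-data \
    --log-bin=/persistent/binlogs/mysql-bin \
    --ndb-log-bin=1 \
    --sync-binlog=1 \
    --binlog-format=ROW \
    --binlog-cache-size=1M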