MySQL Replication Internals#

The following chapter will describe the internals of the MySQL replication. This includes the configuration parameters related to MySQL replication and the internal MySQL tables used for replication.

MySQL configuration paramaters#

Basics#

server-id: Each MySQL replication server must have its unique server ID. The responsibility to make it unique and set up the replication configuration lies on the DBA. The server ID is a number that must be unique among all the MySQL replication Servers in a global replication setup. This number is in no way related to any other number and thus, not related to node IDs. These numbers can be selected in any fashion. If not using the MySQL Server for replication, it is not necessary to set server_id, it will default to 0, but won’t be used for anything.
ndbcluster: Use NDB (RonDB) as a storage engine
ndb-extra-logging=99: Can be useful for debugging purposes

Binlog servers#

Starting up the MySQL replication servers in the primary cluster should look approximately as follows:

mysqld \
    --server-id=$id \
    --ndbcluster \
    --log-bin=mysql-bin \
    --ndb-log-bin=1 \
    --sync-binlog=1 \
    --binlog-format=ROW \
    --binlog-cache-size=1M

The following describes what these (and further) configuration parameters do:

log-bin (optional): Set the base filename of the binlog files
binlog-format=ROW: RonDB only supports row-based replication; this will use row-based replication for normal operations
ndb-log-bin=1 (disabled by default): Activates replication from the NDB storage engine; If disabled, only metadata statements will be recorded in the binlog
disable-log-bin: Disable the binlog for a given MySQL server
ndb-log-updated-only=ON (default): Only log the changed columns in the binlog after a write transaction is completed
ndb-log-update-as-write=ON (default): Log updates as writes in the binlog for idempotency; set to OFF if replicating to another storage engine or if using a conflict detection
sync-binlog=1: Ensure that the epochs are written to disk as soon as they are completed
ndb-log-empty-epochs (disabled by default): Log epochs that did not perform any changes. It is not recommended to use this feature.
log-replica-updates=ON (default): Assume having an active-active setup between cluster A and cluster B and our binlog server X is in cluster B. If this parameter is enabled, binlog server X will log all writes coming from cluster A into its binlog even if it is not acting as a replica applier. To avoid circular replication, it is then necessary to use IGNORE_SERVER_IDS in the CHANGE REPLICATION SOURCE TO command of cluster A. If log-replica-updates=OFF, this will not be necessary. In this setup, disabling this parameter is recommended, since it spares I/O resources on the binlog server of cluster B. However, if there is a cluster C that is replicating from cluster B, it can be helpful to enable this parameter. Cluster C will then not need to merge binlogs from cluster A and cluster B.

Replica appliers#

Thus starting up a MySQL replication server in the backup cluster should at least contain the following options:

mysqld \
    --server-id=$id \
    --ndbcluster \
    --skip-replica-start \
    --replica-allow-batching

The following describes what these (and further) configuration parameters do:

RonDB 22.10

MySQL Cluster 8.0.35

RonDB 21.04

MySQL Cluster 8.0.23

Functionality

skip-replica-start skip-slave-start Avoid that the replica replication starts before we are ready

replica-allow-batching slave-allow-batching Allow batching of operations at the replica

If the MySQL server only operates as a replica applier, one does not need to configure the binlog.

MySQL Replication Tables#

Global Replication uses a set of internal tables to provide information about replication status. These tables are created when installing a MySQL Server.

The ndb_binlog_index table is an InnoDB table that is local to each MySQL Server used to replicate cluster tables to another cluster or another storage engine.

The ndb_apply_status, ndb_schema and ndb_replication tables are RonDB tables.

All those tables belong to the mysql database.

ndb_binlog_index#

The ndb_binlog_index table contains an index on what has been written in the local binlog in a MySQL Server handling Global Replication. By default, it will contain one row for each epoch that performed changes and has been written to the binlog. This table is created by the MySQL installation program.

The table looks as follows:

CREATE TABLE ndb_binlog_index
(
  Position bigint unsigned NOT NULL,
  File varchar(255) NOT NULL,
  epoch bigint unsigned NOT NULL,
  inserts int unsigned NOT NULL,
  updates int unsigned NOT NULL,
  deletes int unsigned NOT NULL,
  schemaops int unsigned NOT NULL,
  orig_server_id int unsigned NOT NULL,
  orig_epoch bigint unsigned NOT NULL,
  gci int unsigned NOT NULL,
  next_position bigint unsigned NOT NULL,
  next_file varchar(255) NOT NULL,
  PRIMARY KEY (`epoch`, `orig_server_id`,`orig_epoch`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The following assumes knowledge of the Global Checkpoint Index (GCI). More on the GCI can be found in the chapter on the Global Checkpoint Protocol.

Position: specifies the start position of the epoch in the binlog
File: the filename of the file where the start position is placed
epoch: lists the 64-bit id of the epoch number consisting of the 32-bit GCI and a 32-bit micro-GCI. The GCI is also reported in the column gci.
inserts, updates, deletes and schemaops: count count how many of those operations the epoch consists of
next_file and next_position: point to the position and file of the next epoch after this one
orig_server_id: updated when the MySQL server variable ndb-log-orig is set to 1 (defaults to 0). In this case, it is set to the originating server id of this epoch and orig_epoch contains the epoch number in the originating cluster.

When circular replication is used, each epoch will generate as many entries in this table as there are clusters in the circle. Assume we have cluster A with MySQL server with server_id=1 and cluster B with MySQL server with server_id=2. Further, assume that epoch 13 is originated in cluster A and is now applied to cluster B in epoch 209. This epoch 209 will thereby generate two rows in this table:

(epoch=209, orig_epoch=13, server_id=1)
(epoch=209, orig_epoch=209, server_id=2)

These multiple entries in the ndb_binlog_index table are used by the conflict detection code as we will later discuss.

If one generally wishes to also log epochs that did not perform any changes, one can apply the MySQL server configuration parameter --ndb-log-empty-update. We do not recommend using this feature since it produces a lot of rows in this table. One should rather have a failover solution that has a regularly updated heartbeat table (we will discuss this in some detail in later chapters). This ensures that we minimize the overhead of keeping the replication channel active such that we know when to fail over to the backup replication channel.

ndb_apply_status#

The ndb_apply_status table contains rows that describe epochs that are written into the cluster by a replica applier. Updates to this table are not generated by events in the cluster - they are generated by a MySQL replication server acting as a binlog server. Thus in the primary cluster, this table will be empty. The table will be filled in the backup cluster when applying the epoch in the replica applier. This table has a primary key that is the server id, thus each epoch will overwrite the values in this table. Only the latest applied epoch is represented in this table for each server id.

Each epoch with changes contains one write into this table.

CREATE TABLE ndb_apply_status
(
  server_id int unsigned NOT NULL,
  epoch bigint unsigned NOT NULL,
  log_name varchar(255) binary NOT NULL,
  start_pos bigint unsigned NOT NULL,
  end_pos bigint unsigned NOT NUL
  PRIMARY KEY (server_id) USING HASH
) ENGINE=NDB DEFAULT CHARSET=latin1;

server_id: contains the server id from the MySQL Server generating the binlog entry for the epoch. Thus it is the server id of the MySQL replication server in the primary cluster.
epoch: the epoch number in the primary cluster
log_name: the file name of the start position of the epoch
start_pos, end_pos: the start and end position in the binlog of the epoch.

Given that transactions are applied atomically and this write is part of the epoch transaction, it is possible to deduce if an epoch has been applied in the backup cluster by looking at this table.

Normally the writes to this table are not replicated from the backup cluster. But for globally distributed database implementations using multi-primary it is necessary to also replicate these changes towards the next cluster to ensure that conflict detection can work correctly. This is controlled by the ndb-log-apply-status parameter. If the server id is the same as written it will be ignored to avoid replication loops.

ndb_schema#

Schema changes (CREATE TABLE, RENAME TABLE, DROP TABLE, ALTER TABLE, etc.) are replicated as well. They are replicated through the aid of the ndb_schema table. When a schema change is executed, a row is inserted into the ndb_schema table. This table is never replicated to the replicas.

Each MySQL Server listens to events on this table. Thus each MySQL Server will hear about any schema modifications as part of the schema change process. In addition, each MySQL Server listens to events that provide detailed information about the resulting table(s) from the event.

The schema change is not necessarily executed in the same MySQL Server as where the binlog is written. By writing a row in the ndb_schema table we transport information to the MySQL replication server(s) about the schema change. We also ensure that the schema change is executed in a specific epoch.

CREATE TABLE ndb_schema (
    db varbinary(63) NOT NULL,
    name varbinary(63) NOT NULL,
    slock binary(32) NOT NULL,
    query blob NOT NULL,
    node_id int unsigned NOT NULL,
    epoch bigint unsigned NOT NULL,
    id int unsigned NOT NULL,
    version int unsigned NOT NULL,
    type int unsigned NOT NULL,
    PRIMARY KEY USING HASH (db,name)
) ENGINE=NDB DEFAULT CHARSET=latin1;

db: the database name
name: the table name
slock: a bitmap that is updated by each MySQL Server when they hear about the schema change. Thus through this table, we ensure that all MySQL servers are informed of all metadata changes. If a MySQL Server is down when the schema modifications occur, it can find about this schema change in the recovery phase by looking into this table.
query: the executed query. This might be a bit rewritten for multi-table schema changes since ndb_schema will only contain single-table schema changes.
node_id: contains the node id of the MySQL Server that executed the schema change
id: the table id
version: the table version
type: the table type
epoch: the epoch number of the schema change

ndb_replication#

The ndb_replication table has two major use cases. The first is that it can be used in normal replication setups to control which tables and what data should be replicated. The table doesn’t exist by default, thus the user must create the table and fill it with data for it to be used. If the table doesn’t exist the MySQL server settings through the options control what to replicate and by default, no conflict detection is used.

The table should be placed in the mysql database and below is the CREATE TABLE command used to create it:

CREATE TABLE ndb_replication (
  db varbinary(63),
  table_name varbinary(63),
  server_id int unsigned,
  binlog_type int unsigned,
  conflict_fn varbinary(128),
  PRIMARY KEY USING HASH (db, table_name, server_id)
) ENGINE=NDB PARTITION BY KEY(db,table_name);

db: the database name which can contain % and ? such that the database name becomes a regular expression. In this case, the columns will apply to all tables that have a database that matches the regular expression.
table_name: the table name, which can also contain regular expressions
server_id: if this is set to 0 it applies to all MySQL replication servers in the cluster. If set to a non-zero value it only applies to the server id listed in the table.
binlog_type: how we binlog the specified table(s). The values used here are defined in sql/ndb_share.h. The default settings for all tables are specified using the MySQL configuration parameters ndb-log-update-as-write, ndb-log-updated-only, and ndb-log-update-as-minimal. ndb-log-update-as-minimal=OFF for all options except 8 and 9. The 8 and 9 settings can however only be used in conjunction with the conflict detection functions NDB$EPOCH2 and NDB$EPOCH2_TRANS.
conflict_fn: the conflict function, which is covered in the chapter on the globally distributed database

The following describes the different values for binlog_type in more detail:

0 (NBT_DEFAULT): server defaults rules. This is the default if no row for a table exists or if the entire ndb_replication table is missing.
1 (NBT_NO_LOGGING): the table or set of tables will not be logged in the binary log.
2 (NBT_UPDATED_ONLY): the same thing as when ndb-log-updated-only and ndb-log-update-as-write are set. Thus only changed values are sent in events to MySQL replication servers and thus only changed columns are written into binlog. This is the recommended value for tables that are replicated using RonDB without conflict detection. Using the default settings of the configuration options will use this binlog type.
3 (NBT_FULL): leads to the full row being sent in an event from the data node and written into binlog. We will still only write the after image. Same behavior as when ndb-log-updated-only=OFF and no other changes to default settings.
4 or 6 (NBT_USE_UPDATE or NBT_UPDATED_ONLY_USE_UPDATE): leads to that updates are logged as updates and inserts as writes. This is the setting one gets by setting ndb-log-update-as-write=OFF and ndb-log-updated-only=ON. These settings are the ones to use for replication to another storage engine.
7 (NBT_FULL_USE_UPDATE): leads to the same as setting 6 but with ndb-log-updated-only=OFF. This is the setting when using conflict detection with either of the functions NDB$OLD(column_name) and NDB$MAX_DELETE_WIN(column_name).
8 (NBT_UPDATED_ONLY_MINIMAL): used for replication with conflict detection using either of the conflict detection functions ndb$EPOCH2 or NDB$EPOCH2_TRANS. It writes updates as updates and writes only the changed columns in the after image and only the primary key columns of the before image.
9 (NBT_UPDATED_FULL_MINIMAL): the same as 8 except that all non-primary key columns are written in the after image. This can be used to replicate to older versions of RonDB.