Schema Transaction Protocol#
The Schema Transaction Protocol is a protocol that is used to handle schema changes (i.e. MySQL metadata) in a distributed manner.
The metadata changes RonDB supports are listed here. The current implementation of schema transactions is limited to supporting the subset of Data Definition Language (DDL) operations that MySQL supports:
This also means that the protocol is limited to operations on a single table.
Cluster Limitations during Schema Transactions#
Whilst a schema transaction is running, the following limitations apply:
No other schema transactions can be run
Backups cannot be taken
Data nodes cannot be (re-)started
Thereby, one should take into account when to run schema transactions.
Schema transactions are not fast, but they are very reliable and can handle arbitrarily complex schema changes. We use a lock to ensure that only a single schema transaction can be run at a time.
A schema transaction can be run via the NDB API or via a MySQL Server. Specifically creating tables is however recommended to run via the MySQL Server. The reason is that the MySQL Server creates files that needs to exist for a table to be accessible for MySQL servers.
When using the MySQL server, it will initiate the protocol and communicate with the master data node. The master data node will then communicate with the other data nodes. Between the data nodes, it is the DBDICT block running the protocol.
A SQL statement to change the schema will be broken down into a set of smaller schema operations. For example, creating a table might consist of the operation to add a new table, a new hash map and a new set of indexes (hashed and ordered). It might also add foreign keys to the table.
Each operation is run via a set of 8 different phases. Each of these phases has a corresponding roll-back mechanism. Each transaction will define a commit point amongst its schema operations. If a node fails before the transaction has reached the commit point, every phase of all finished operations will roll-back. If a node (or even the cluster) fails after the commit point, the surviving / restarted nodes will continue with the transaction. This is called a roll forward.
Each data node stores its completed phases / operations in the local schema.log file.
Recovery & Restoring Schema#
Apart from the schema operations, the aforementioned schema.log file will contain information about tables, indexes, partitions, foreign keys and other meta data objects. Thereby, a data node starting up can read from this file in order to:
restore its schema
finish or abort a schema transaction
ignore data in the REDO log and local checkpoints that has been dropped in the schema
If a transaction has passed its commit point, the schema.log file should suffice to finish the transaction locally on a restarting data node and only run the necessary remaining operations.
The most complex schema transaction we support is the operation to reorganize a table to make use of new node groups added since the table was created. This is done via the following statement:
This involves creating a set of new fragments on the new node groups. It also means that we have to copy over the data from the old table partitions to the new table partitions.