Foreign Key handling#
Foreign keys is a concept in SQL that enables the application to create constraints on the data in the tables.
A foreign key is defined on the table that is referred to as the child table. The parent table is the referenced table. The parent table must be created before the child table can create a foreign key referencing it.
The idea with a foreign key is that the child table have a reference in its table that gives a unique reference to a row in the parent table. The reference consists of one or more columns, normally these columns are simply the same columns as the primary key of the parent table.
We will call the columns in the child table the reference columns and the columns in the parent table the referenced columns.
Here is a simple example of a foreign key constraint.
mysql> CREATE TABLE tab_parent (a int primary key) engine NDB; mysql> CREATE TABLE tab_child (a int primary key, b int) mysql> INDEX (b) mysql> FOREIGN KEY (b) REFERENCES tab_parent(a) mysql> ON UPDATE SET NULL mysql> ON DELETE SET NULL mysql> ENGINE NDB;
The first table in this example, the parent table, has only one column, the primary key column a, this is the referenced column. The second table, the child table has a reference to the parent table in column b, this is the referencing column, and it has its own primary key column a.
It is required that the foreign key columns in the child table have an index defined on the referencing columns. This is required to ensure that the foreign key actions can be executed efficiently. Note that there can be multiple rows in the child table referring to the same row in the parent table. Thus the naming parent-child. There is only one parent row, but there can be multiple child rows referring to the parent.
The parent table must have a primary key or a unique key defined on the referenced columns.
In InnoDB it is sufficient to have an index defined on the parent table. But again the reference to the rows in the parent table should be referring to one and only one row, for all practical use cases the parent table should have a primary key index or a unique key index defined on the parent table for the referenced columns (also true for InnoDB).
The foreign key constraint is either checked immediately when the change is done or checked in a special pre-commit phase before the actual commit starts, but after all modifications have been prepared in the transaction. This is done through a read on the parent table, if the read fails the transaction is aborted.
If the reference columns are set to NULL no trigger will be fired, the trigger fires already in the prepare phase, the transaction coordinator either executes it immediately or stores the trigger until the pre-commit phase and when the pre-commit phase starts it will send off all stored pre-commit triggers.
Thus the child table have a trigger defined on changes of the reference columns. The trigger fires on all INSERTs and on all UPDATEs that change the reference columns. A DELETE cannot cause any inconsistency in this case, so no need to trigger at DELETEs.
The updates on the referencing columns and inserts into the child table are checked immediately except when we are using ON UPDATE NO ACTION. In the NO ACTION case the check is deferred until the pre-commit phase. In the checks in the pre-commit phase we check the final value committed and ignore any intermediate states we have passed through during the execution of the transaction.
The handling of NO ACTION differs from InnoDB. The handling in InnoDB for NO ACTION is the same as for RESTRICT. In RonDB we decided to follow many other DBMSs that use NO ACTION to indicate that checks are deferred to commit time.
Now if the row in the parent is deleted or if the referenced columns are updated it will affect the child row as well. Here we have a number of different options of what to do. We can CASCADE the changes to the child table such that they continue to refer to the correct row. We can SET NULL on the referencing columns (only if setting those to NULL is allowed). We can RESTRICT the change (the transaction will be aborted).
We define this when we define the table. The default is RESTRICT. We define ON DELETE to specify what to do when a row in the parent table is deleted. We specify ON UPDATE to specify what happens when we change the referenced columns and when we update the referencing columns in the child table and when we insert into the child table. For the child table the only option that differs is NO ACTION that defers the check to the pre-commit phase. All other options means that changes in the child table are immediately verified.
When CASCADE is used we rely on foreign key handling to remove the rows in the child table when a delete occurs in the parent table.
CASCADE is potentially a very dangerous mode. One delete could delete an entire database since the cascading delete could lead to a new delete which in turn leads to a new cascading delete and so forth. If this option is used it is important that the application developer have very good control over his changes.
ON UPDATE CASCADE is only supported if the referenced columns form a unique key. It will be rejected if the referenced columns form a primary key. The reason is that it isn't possible to update the primary key in RonDB, an update of the primary key is performed by first dropping the row and then reinserting the row.
SET NULL can trigger a fair amount of updates, in this case the change can at least not cascade into even more tables. Setting it to NULL in the foreign key trigger ensures that at least the tables have no incorrect references.
The RESTRICT is used when the application expects to take care of any changes to ensure that they start with a consistent database and ends with a consistent database. However the application developer wants the MySQL Server to ensure that the changes are consistent. This is in my view the proper usage of foreign keys in an application with requirements on predictable latency.
NO ACTION is the same as RESTRICT except that for RESTRICT the check of the foreign key relation is immediate whereas for NO ACTION it is deferred to commit time. This means that using RESTRICT you should always drop or change the child rows before the parent rows. Using NO ACTION it is sufficient to ensure that the entire transaction is consistent, the order of changes do not matter.
In RonDB, foreign keys are checked and verified in the RonDB data nodes. Thus if you define a foreign key on a table, this foreign key relation will be checked and maintained by transactions performed from applications using native RonDB API applications as well as SQL applications.
This means that updating the primary key in a parent table that have a primary key on the referenced columns is only possible using the SET NULL option. For RESTRICT and NO ACTION the DELETE trigger and INSERT trigger will ensure that the update of the primary key is disallowed and ON UPDATE CASCADE on a primary key is not even allowed.
Foreign keys have a fairly high overhead in that all changes requires checks and those checks will extend the time for the transactions. It is seldomly used for very time-critical applications. It is a great aid in quickly developing new applications that need to maintain consistency of the database.
Most applications using foreign keys in InnoDB should transfer easily into RonDB since for foreign keys the differences between InnoDB and RonDB are very small.
Foreign keys can be created when creating the table. Foreign keys can be created using the ALTER TABLE command. They can also be dropped as part of the ALTER TABLE command. Both creation and dropping of a foreign key is an online operation that can happen while the table is updated.
Dropping a table while it is involved in a foreign key relation is not allowed since that would be a breach of the foreign key constraint.
InnoDB supports a special parameter called foreign_key_checks that disables all foreign key checks if set to 0. This will not have any effect on RonDB. Once a foreign key have been defined in RonDB it will perform the foreign key checks. Foreign keys can be created and dropped as online operations in RonDB, this removes the need to use this parameter.
Foreign keys is well integrated with Global Replication. Even if the replica side doesn't have the foreign keys defined on the tables, the triggered actions will still be performed on the replica side since all row operations that are part of a transaction will be sent to the binlog. The binlog stores all row changes for a transaction and this means that the replica side do not need to understand how those row changes came about.
Foreign key relations can only exist between tables using the same storage engine, thus the foreign key relations from an RonDB table must refer to another RonDB table.
Foreign key considerations#
Given that one focus of RonDB is real-time applications it is important to consider how to use foreign keys. Foreign keys is a strong concept to support application constraints. Foreign keys uses triggers to secure that any inserts, updates and deletes leaves the database with its constraints still maintained.
Foreign keys can create problems in three ways. The first is that a delete can trigger a delete of a record that in turn triggers another delete and so forth (called cascading deletes). Thus transaction sizes even for a single row change can grow to become any size. Often implementing application functionality directly using foreign keys is easy, but it requires that one analyses the data usage to ensure that not too big transactions are created that create problems with real-time aspects of RonDB.
The preferred options to avoid these large transactions are RESTRICT and NO ACTION.
The second problem is when creating the foreign key on a live table. If we create the foreign key in a system that are constantly changing data, it means that we cannot be absolutely sure that foreign key constraints are valid when we created the foreign key. We worked hard on ensuring that building a foreign key is done correctly such that if data is consistent, the foreign key constraints should be ok when the foreign key is created.
The third potential problem is when dropping a foreign key. In this case we drop the triggers at some point in time, after this drop we no longer maintain the foreign key relation. Such a drop is not possible to serialise towards all other parallel changes in a distributed database without a global cluster lock which isn't desirable in a real-time database that is supposed to be AlwaysOn.
The recommended manner to use foreign keys is to create them when creating the tables and dropping them when dropping the tables and ensuring that the cascading actions don't cause extremely large transaction sizes.
We do work hard to ensure that both create and drop of foreign keys can be done in a safe way also while done in parallel with application changes of the data.