
Node groups (Shards)

A basic concept in RonDB is the node group: the RonDB data nodes are organized into node groups. When starting the cluster, one defines the configuration variable NoOfReplicas, which specifies the maximum number of nodes per node group. The data nodes of the cluster are defined in the NDBD sections of the config.ini. They are placed into node groups automatically, unless a node group is explicitly configured for them.

For example, if we define 4 nodes with node IDs 1, 2, 3 and 4 and we have NoOfReplicas=2, nodes 1 and 2 will form one node group and nodes 3 and 4 will form another node group.
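As a rough sketch, the relevant parts of a config.ini for this example could look as follows (the host names are placeholders and not part of the original example):

```ini
[ndbd default]
NoOfReplicas=2        # maximum number of nodes per node group
PartitionsPerNode=2   # used for partitioning, see below

# The first NoOfReplicas data node slots form node group 0 ...
[ndbd]
NodeId=1
HostName=data-node-1  # placeholder

[ndbd]
NodeId=2
HostName=data-node-2  # placeholder

# ... and the next two form node group 1. An explicit NodeGroup=<id>
# could be set per node slot to override the automatic placement.
[ndbd]
NodeId=3
HostName=data-node-3  # placeholder

[ndbd]
NodeId=4
HostName=data-node-4  # placeholder
```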

Node groups work similarly to shards: all data within a node group (shard) is fully replicated across the nodes of that node group. In the example above, node 1 and node 2 will hold the same partitions and the same data. Thus every row that exists on node 1 will also be present on node 2. The same holds for node 3 and node 4. However, none of the rows on nodes 3 and 4 will be present on nodes 1 and 2. The exception is tables using the fully replicated feature.

The main difference to standard sharding solutions is that RonDB supports foreign key relationships between node groups as well as distributed queries across all node groups. Due to these features, the cluster will fail if all nodes in one node group fail. Nonetheless, the fact that our data is split up into node groups means that we can generally handle a higher number of node failures. More information on how node groups make the cluster more resilient and how they handle network partitioning can be found in the Resiliency Model.

Partitioning

Independently of node groups, RonDB also partitions its data (partitions are also called fragments). To explain how partitioning works, we first define the following terms for convenience:

  • Number of partitions per table (NOPPT): the number of unique partitions that each table is split up into. It does not take into account any replication.

  • Number of partitions (NOP): the total number of unique partitions in the cluster

  • Active replication factor (ARF): the replication factor that only counts node slots with NodeActive=1 in the config.ini. It is less than or equal to NoOfReplicas. It should be constant across node groups, but this is not a strict requirement.

Now:

  • NOPPT = PartitionsPerNode x #Nodes

  • NOP = NOPPT x #Tables

  • #Nodes = #NodeGroups x NoOfReplicas

whereby both PartitionsPerNode and NoOfReplicas are defined in the config.ini file. Using NoOfReplicas (rather than ARF) means that inactive node slots in the config.ini are also taken into account.
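As a minimal sketch, these relationships can be expressed as follows (the function and variable names are illustrative and not part of RonDB):

```python
def num_nodes(node_groups: int, no_of_replicas: int) -> int:
    # #Nodes = #NodeGroups x NoOfReplicas (counting inactive node slots as well)
    return node_groups * no_of_replicas

def noppt(partitions_per_node: int, node_groups: int, no_of_replicas: int) -> int:
    # NOPPT = PartitionsPerNode x #Nodes
    return partitions_per_node * num_nodes(node_groups, no_of_replicas)

def nop(partitions_per_node: int, node_groups: int, no_of_replicas: int, tables: int) -> int:
    # NOP = NOPPT x #Tables
    return noppt(partitions_per_node, node_groups, no_of_replicas) * tables
```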

The number of times a partition is replicated is defined by the ARF of the node group the partition is in. When replicated, a partition always has a primary replica and 0-2 backup replicas (depending on ARF). A node will contain both primary and backup replicas of different partitions. The primaries will require more CPU resources.

The following diagram illustrates how partitions are distributed across node groups, taking into account both NoOfReplicas and ARF (both are 2 in this case). Note that the numbering of partitions is arbitrary and simply shows that they are unique.

[Diagram: distribution of table partitions and their replicas across two node groups (NoOfReplicas=2, ARF=2)]

Each table is split up into NOPPT partitions, which is 8 in this example (2 x 2 x 2). Since we have two tables, we have 16 unique partitions. Furthermore, since our ARF is 2, we have 32 live partition replicas in total.
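Using the sketch functions from above, the numbers of this example can be reproduced (again, the names are illustrative):

```python
partitions_per_node, node_groups, no_of_replicas, tables, arf = 2, 2, 2, 2, 2

print(noppt(partitions_per_node, node_groups, no_of_replicas))              # 8 partitions per table
print(nop(partitions_per_node, node_groups, no_of_replicas, tables))        # 16 unique partitions
print(nop(partitions_per_node, node_groups, no_of_replicas, tables) * arf)  # 32 live partition replicas
```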

Since our node groups do not share any data, #NodeGroups does not affect how many partitions will be on one node. If we increase #NodeGroups, we will increase NOP, but the new partitions are all placed onto the nodes of the new node group. NoOfReplicas, on the other hand, directly influences how many partitions will be on one node.

So why is NoOfReplicas a factor in NOPPT? The answer is that it allows us to always have a balanced number of primary and backup replicas per node. Let’s take a single node group, PartitionsPerNode=2 and NoOfReplicas=3. Let’s further say that:

  • NOPPT = PartitionsPerNode x #NodeGroups = 2 x 1 = 2

whereby NoOfReplicas is ignored. The following table shows what this would imply:

              Node 1    Node 2    Node 3
Partition 1   Primary   Backup    Backup
Partition 2   Backup    Primary   Backup

As one can see, only nodes 1 and 2 would contain primary replicas; node 3 would only hold backup replicas. Heavy querying of a single table would therefore create a load imbalance in the cluster.

What it looks like instead is this (NOPPT = 2 x 1 x 3 = 6):

              Node 1    Node 2    Node 3
Partition 1   Primary   Backup    Backup
Partition 4   Primary   Backup    Backup
Partition 2   Backup    Primary   Backup
Partition 5   Backup    Primary   Backup
Partition 3   Backup    Backup    Primary
Partition 6   Backup    Backup    Primary

Here, every node holds a balanced number of primary replicas (2) and backup replicas (4).
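To illustrate how such a balanced layout comes about, here is a small sketch that assigns primary replicas round-robin across the nodes of one node group. It reproduces the table above for PartitionsPerNode=2 and NoOfReplicas=3, but it is a simplification and not RonDB's actual placement algorithm:

```python
def balanced_layout(partitions_per_node: int, nodes: int) -> list[tuple[int, int]]:
    """Return (partition, primary node) pairs for a single node group."""
    layout = []
    partition = 1
    for _ in range(partitions_per_node):
        # Each round hands out one primary replica to every node in the group.
        for node in range(1, nodes + 1):
            layout.append((partition, node))
            partition += 1
    return layout

# PartitionsPerNode=2, NoOfReplicas=3 -> 6 partitions, 2 primaries per node
for partition, primary in balanced_layout(2, 3):
    backups = [n for n in range(1, 4) if n != primary]
    print(f"Partition {partition}: primary on node {primary}, backups on nodes {backups}")
```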