Managed RonDB
The documentation at the link above describes how to create a RonDB cluster as part of Hopsworks. In this interface the user can decide on the following parameters:
- Number of Data Nodes
- VM Type for the Data Nodes
- Storage Size for the Data Nodes
- Number of MySQL Servers
- VM Type for the MySQL Servers
- Storage Size for the MySQL Servers
In the Detailed section it is also possible to decide on the following parameters:
- Number of Replicas
- Number of API Nodes
- VM Type for the API Nodes
- Storage Size for the API Nodes
The number of data nodes must be a multiple of the number of replicas.
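A minimal sketch of this constraint, assuming a cluster layout is described only by its data node and replica counts. The function name `node_groups` is hypothetical and not part of any RonDB or Hopsworks API. The quotient of the two counts is the number of node groups; the replica range 1 to 4 follows the guidance below.

```python
def node_groups(data_nodes: int, replicas: int) -> int:
    """Return the number of node groups for a proposed cluster layout.

    Hypothetical helper: rejects layouts where the number of data nodes
    is not a positive multiple of the number of replicas.
    """
    if replicas not in (1, 2, 3, 4):
        raise ValueError("RonDB supports 1 to 4 replicas")
    if data_nodes <= 0 or data_nodes % replicas != 0:
        raise ValueError(
            f"{data_nodes} data nodes is not a positive multiple of {replicas} replicas"
        )
    return data_nodes // replicas


print(node_groups(2, 2))  # 1 node group
print(node_groups(6, 3))  # 2 node groups
```

A layout such as 3 data nodes with 2 replicas raises `ValueError`, since the third node would have no partner to replicate with.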
The first decision is whether to use 1, 2 or 3 replicas (4 replicas is also possible). If high availability isn't required, 1 replica is sufficient. If high availability is required, at least 2 replicas should be chosen.
In most cases the number of data nodes should equal the number of replicas, so that the cluster consists of a single node group. The main reason for having more data nodes than replicas is that the database must be larger than one node group of data nodes can hold.
In a cluster with the same number of data nodes as replicas, the database size is limited by the memory size of the data node VM. The largest standard VM is the r5-metal, which has around 768 GByte of memory. Of this, around 650 GByte is available for in-memory rows and indexes.
RonDB supports storing columns on disk, so if disk columns are used the database can be even larger. There are also larger VM sizes, but these have not been tested much yet.
Thus the size of the database may affect the choice of the number of data nodes.
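The sizing reasoning above can be sketched as a small calculation. This assumes in-memory data only (no disk columns and no growth headroom) and uses the r5-metal figure of around 650 GByte usable memory per data node as a default; the names `data_nodes_for_size` and `USABLE_GB_PER_NODE` are hypothetical.

```python
import math

# Usable in-memory capacity of one data node VM, taken from the r5-metal
# figure above (around 650 GByte of 768 GByte). Adjust for other VM types.
USABLE_GB_PER_NODE = 650


def data_nodes_for_size(db_size_gb: float, replicas: int,
                        usable_gb_per_node: float = USABLE_GB_PER_NODE) -> int:
    """Estimate how many data nodes are needed to hold db_size_gb in memory.

    All nodes in a node group store the same data, so capacity grows with
    the number of node groups, not with the number of replicas.
    """
    groups = max(1, math.ceil(db_size_gb / usable_gb_per_node))
    return groups * replicas


print(data_nodes_for_size(400, replicas=2))   # 2: one node group suffices
print(data_nodes_for_size(1000, replicas=2))  # 4: two node groups needed
```

Adding replicas multiplies the node count without adding capacity, which is why database size, rather than replica count, is what pushes a cluster beyond one node group.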
It is unlikely that processing requirements alone call for more data nodes. If they seem to, first verify that the required throughput is actually larger than what a single node group of data nodes can deliver.
The number of MySQL Servers should be chosen based on the amount of query processing required against RonDB. In Sysbench benchmarks we find that the MySQL Servers need about twice as many CPUs as the data nodes. For high availability there should be at least 2 MySQL Servers.
At the moment, VMs for MySQL Servers should not have more than 32 VCPUs.
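The three MySQL Server guidelines above (about twice the data node CPUs, at least 2 servers for high availability, at most 32 VCPUs per VM) can be combined into a rough estimate. This is a sketch only: the function name is hypothetical, and the 2x factor comes from Sysbench, so other workloads may need a different ratio.

```python
import math

MAX_VCPUS_PER_MYSQL_VM = 32  # current upper limit per MySQL Server VM


def mysql_servers(data_node_vcpus_total: int, vcpus_per_vm: int,
                  high_availability: bool = True) -> int:
    """Estimate the number of MySQL Server VMs.

    Applies the Sysbench-derived rule of thumb of about twice as many
    CPUs on the MySQL Servers as on the data nodes.
    """
    if not 0 < vcpus_per_vm <= MAX_VCPUS_PER_MYSQL_VM:
        raise ValueError(f"use 1..{MAX_VCPUS_PER_MYSQL_VM} VCPUs per MySQL Server VM")
    needed = math.ceil(2 * data_node_vcpus_total / vcpus_per_vm)
    return max(needed, 2 if high_availability else 1)


# Two data nodes with 16 VCPUs each: 64 VCPUs of MySQL Servers are needed.
print(mysql_servers(32, 32))  # 2 VMs with 32 VCPUs
print(mysql_servers(32, 16))  # 4 VMs with 16 VCPUs
```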
API nodes can host applications that use the native RonDB APIs for C++, Java and JavaScript. An API node can also be used to run benchmarks; in this case a single API node with at least 25% of the total number of VCPUs in the MySQL Servers should be sufficient.
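For a benchmark setup, the 25% guideline reduces to a one-line calculation. The helper name below is hypothetical, and the result is a lower bound rounded up to whole VCPUs.

```python
import math


def benchmark_api_node_vcpus(mysql_servers: int, vcpus_per_mysql_vm: int) -> int:
    """Minimum VCPUs for a single benchmark API node: 25% of MySQL Server VCPUs."""
    total_mysql_vcpus = mysql_servers * vcpus_per_mysql_vm
    return math.ceil(total_mysql_vcpus / 4)


print(benchmark_api_node_vcpus(4, 16))  # 16
```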
The storage size for MySQL Servers and API nodes can normally be set to the minimum allowed value.
RonDB data nodes require storage space of around 50% of the memory size of the VM. In addition, they require around 128 GByte of storage space for REDO logs and UNDO logs, and some storage space should be allocated for any disk column storage. Thus a data node requires a minimum of 256 GByte. A very large data node VM with 768 GByte of memory would require at least 2 TByte of storage space, and potentially more if disk columns are heavily used.
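The storage guidance can be sketched as a rough estimate for one data node. The function and constant names are hypothetical; the factor applied to memory is exposed as a parameter so it can be adjusted for a given RonDB version or workload, and the disk column allowance must be supplied by the user.

```python
# Rule-of-thumb figures from the text above; treat them as starting points.
STORAGE_PER_MEMORY = 0.5          # storage as a fraction of VM memory size
LOG_STORAGE_GB = 128              # REDO logs and UNDO logs
MIN_DATA_NODE_STORAGE_GB = 256    # stated minimum for a data node


def data_node_storage_gb(memory_gb: float, disk_columns_gb: float = 0.0,
                         storage_per_memory: float = STORAGE_PER_MEMORY) -> float:
    """Estimate storage for one data node, never below the stated minimum."""
    estimate = storage_per_memory * memory_gb + LOG_STORAGE_GB + disk_columns_gb
    return max(estimate, MIN_DATA_NODE_STORAGE_GB)


print(data_node_storage_gb(64))                        # minimum applies
print(data_node_storage_gb(256, disk_columns_gb=100))  # 128 + 128 + 100
```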