Automatic Memory Configuration in RonDB
Automatic memory configuration allows RonDB's memory usage to scale up and down, which makes RonDB easy to manage in a cloud setting. When the applications require more memory, the RonDB data nodes can simply be moved to VMs with more memory available.
RonDB is based on NDB, which was developed around a model where each module handled its own memory. This memory mostly consisted of arrays of structs, and internal memory references were indexes into those arrays.
Thus, to map a reference to a real memory address, we needed the start of the array and the index. The translation was done by macros that also efficiently checked that every access fell within the array, thereby validating every access to each data structure in NDB.
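The index-based addressing described above can be illustrated with a minimal Python sketch. It is not NDB's actual macro code: the function `translate` and its parameters are hypothetical, and a real implementation would perform the same arithmetic on raw pointers in C++.

```python
def translate(base: int, index: int, record_size: int, num_records: int) -> int:
    """Map an (array start, index) reference to a memory address.

    The bounds check plays the role of the validating macro: every access is
    checked against the size of the array before the address is computed.
    """
    if not 0 <= index < num_records:
        raise IndexError(f"index {index} outside array of {num_records} records")
    return base + index * record_size


# A hypothetical array of 1000 records of 64 bytes each, starting at 0x10000.
addr = translate(base=0x10000, index=3, record_size=64, num_records=1000)
print(hex(addr))  # 0x100c0
```

Because a reference is only an index, the check costs one comparison, which is why validating every access remained cheap.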
This model was very CPU-efficient and gave NDB Cluster very good performance. However, it also meant that hundreds of arrays had to be configured, which made memory inflexible and led to overcommitting many memory resources.
This started to change around 2012, and with RonDB the transition is complete. No memory requires configuration any longer: the remaining arrays that are not handled by the dynamic memory structures and still have a fixed size use hard-coded sizes that are known to work for the use cases of RonDB.
All major memory regions are now handled by a global memory manager. Data structures are still accessed through an index into an array, but the arrays are now dynamic arrays that allow holes, and memory freed from one array can soon be reused by another array.
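A dynamic array with holes, backed by a shared pool of pages, might be sketched as follows. The classes `GlobalPagePool` and `DynArray` and the page-based layout are illustrative assumptions, not RonDB's actual classes; the point is that records are still addressed by index, while pages are only allocated where records exist and are returned to a pool that every array shares.

```python
from typing import Optional


class GlobalPagePool:
    """Hypothetical global memory manager that hands out page numbers."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages = list(range(num_pages))

    def alloc(self) -> Optional[int]:
        return self.free_pages.pop() if self.free_pages else None

    def free(self, page_no: int) -> None:
        self.free_pages.append(page_no)


class DynArray:
    """Index-addressed array whose pages come from the shared pool.

    Logical pages that have not been allocated are holes: they use no memory.
    """

    def __init__(self, pool: GlobalPagePool, records_per_page: int) -> None:
        self.pool = pool
        self.records_per_page = records_per_page
        # logical page number -> (physical page number, record slots)
        self.pages: dict = {}

    def put(self, index: int, value: object) -> bool:
        logical, slot = divmod(index, self.records_per_page)
        if logical not in self.pages:
            page_no = self.pool.alloc()
            if page_no is None:
                return False  # the shared pool is exhausted
            self.pages[logical] = (page_no, [None] * self.records_per_page)
        self.pages[logical][1][slot] = value
        return True

    def get(self, index: int) -> object:
        logical, slot = divmod(index, self.records_per_page)
        if logical not in self.pages:
            raise IndexError(f"index {index} falls in a hole")
        return self.pages[logical][1][slot]

    def release_page(self, logical: int) -> None:
        page_no, _ = self.pages.pop(logical)
        self.pool.free(page_no)  # now usable by any other array


pool = GlobalPagePool(num_pages=2)
a = DynArray(pool, records_per_page=4)
b = DynArray(pool, records_per_page=4)
a.put(0, "row-0")
a.put(1000, "row-1000")   # logical page 250; pages 1..249 are holes
print(b.put(0, "x"))      # False: both pool pages are used by a
a.release_page(250)
print(b.put(0, "x"))      # True: b reuses the page that a just freed
```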
This added flexibility in memory management also means that less memory needs to be allocated overall.
RonDB Memory Regions
The global memory manager consists of 13 regions as shown in the figure below.
Each region has a reserved space, a maximum space and a priority. In some cases, a region will interpret the maximum space as a threshold upon which its priority is lowered. In the following, we will briefly describe the role of each region.
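The per-region bookkeeping can be pictured as a small record holding the reserved space, the maximum space and the priority. This is a sketch under stated assumptions: the field names, the three priority levels and the rule that a region passing its threshold drops exactly one priority level are hypothetical, chosen only to show the difference between a hard maximum and a maximum interpreted as a threshold.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Priority(IntEnum):
    LOW = 0
    HIGH = 1
    HIGHEST = 2


@dataclass
class Region:
    name: str
    reserved: int          # space set aside for this region alone
    maximum: int           # hard cap, or a threshold if max_is_threshold is set
    priority: Priority
    max_is_threshold: bool = False
    used: int = 0

    def priority_for(self, request: int) -> Optional[Priority]:
        """Priority a request of `request` bytes would get, or None if refused."""
        if self.used + request <= self.maximum:
            return self.priority
        if self.max_is_threshold:
            # Past the threshold the region may keep growing, one level lower.
            return Priority(max(int(self.priority) - 1, int(Priority.LOW)))
        return None  # the maximum is a hard cap


soft = Region("example-soft", reserved=100, maximum=500,
              priority=Priority.HIGH, max_is_threshold=True, used=450)
hard = Region("example-hard", reserved=100, maximum=500,
              priority=Priority.HIGH, used=450)
print(soft.priority_for(100).name)  # LOW
print(hard.priority_for(100))       # None
```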
DataMemory is the most important memory region: it is where the actual in-memory rows are stored. It also stores the hash indexes and the ordered indexes.
The size of DataMemory is calculated at startup, and for the most part this memory is not touched by other regions. RonDB works best if DataMemory is homogeneous across all nodes. If one node has a smaller DataMemory than the others, the other nodes cannot use all of their DataMemory, since data is replicated and every replica must store the data. It is, however, fine for DataMemory to be temporarily larger in some nodes while the storage of the cluster is being increased.
Decreasing the size of DataMemory can also work, but a node will not be able to start if it does not have enough memory to store the data recovered from disk. If this happens, the node restart fails and the node needs to be restarted with its original size.
DataMemory is fixed in size to ensure that we can always handle recovery. Since recovery can sometimes increase the data size slightly, we do not allow DataMemory to be filled beyond 95% in normal operation. During recovery the full DataMemory size can be used. This extra 5% is also reserved for critical operations such as growing the cluster with more nodes and reorganising the data inside RonDB.
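The 95% rule amounts to a simple limit check. In this minimal sketch, the helper names and the `critical_operation` flag are hypothetical; we assume that critical operations such as adding nodes may use the remaining 5% in the same way as recovery does.

```python
def data_memory_limit(data_memory: int, in_recovery: bool,
                      critical_operation: bool = False) -> int:
    """Usable DataMemory in bytes: 95% in normal operation, 100% otherwise."""
    if in_recovery or critical_operation:
        return data_memory
    return data_memory * 95 // 100


def can_store(used: int, row_size: int, data_memory: int, in_recovery: bool,
              critical_operation: bool = False) -> bool:
    """True if `row_size` more bytes fit under the limit for this phase."""
    return used + row_size <= data_memory_limit(data_memory, in_recovery,
                                                critical_operation)


GIB = 1 << 30
print(data_memory_limit(20 * GIB, in_recovery=False) // GIB)  # 19
```

With 20 GiB of DataMemory, normal operation stops at 19 GiB, while recovery may fill all 20 GiB.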
DiskPageBufferMemory is the page cache used by disk columns in RonDB. Its size is calculated at startup and normally neither grows nor shrinks. Operations towards the disk are queued using Disk Operation Records.
TransactionMemory contains all active transactions and their operations. It is used for all sorts of records, such as transaction records, scan records, key operation records and many more records used to handle the queries issued towards RonDB.
This memory is highly flexible and can grow and shrink quickly. Its minimum size is calculated at startup, but it can grow beyond this if required.
SchemaMemory stores information about tables, columns, indexes, foreign keys and internal triggers. In RonDB 21.04 it is sized to handle a very large number of tables. In RonDB 22.10 it is flexible and can grow, using the global pool of free memory as required.
This memory holds information used by events in RonDB. Events are changes of rows in RonDB that API nodes can subscribe to. In particular this is used to replicate to other RonDB clusters. It is also used to integrate with OpenSearch in Hopsworks.
This memory contains change events that are in the process of being sent to the API nodes subscribed to them.
RonDB can handle large parallel join queries. QueryMemory stores the data used while executing these queries. It can grow and shrink very quickly and uses no memory when no join queries are being performed.
When sending to another node in the cluster we place the data into a send buffer before sending it. This memory has a minimum size calculated at startup, but can grow beyond this as needed.
This is memory used for internal communication channels. Its size is calculated at startup, but there is some flexibility in using it.
Shared Global Memory#
This is the memory that is free for use by other regions: a pool that all the other regions can borrow from when they need more memory than is currently available in their own region.
The size of this memory is calculated at startup.
Redo and Undo Log Buffers#
These are regions that are fixed in size and allocate memory at startup. There is some functionality to handle overload on those buffers by queueing operations when those buffers are full.
The Undo log is only used for operations on disk pages.
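Queueing under overload can be sketched as a bounded buffer that parks writes which do not fit instead of failing them. The `LogBuffer` class, its FIFO policy and the `flush` simulation are assumptions made for illustration; the text above only states that operations are queued when the buffers are full.

```python
from collections import deque


class LogBuffer:
    """Fixed-size log buffer; writes that do not fit are queued, not failed."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.waiting: deque = deque()  # (op_id, size) pairs waiting for space
        self.written: list = []

    def write(self, op_id: str, size: int) -> None:
        # Queue behind earlier waiters even if this write would fit,
        # so that log records keep their original order.
        if not self.waiting and self.used + size <= self.capacity:
            self._append(op_id, size)
        else:
            self.waiting.append((op_id, size))

    def flush(self, size: int) -> None:
        """Simulate `size` bytes reaching disk, then admit queued writes."""
        self.used -= min(size, self.used)
        while self.waiting and self.used + self.waiting[0][1] <= self.capacity:
            self._append(*self.waiting.popleft())

    def _append(self, op_id: str, size: int) -> None:
        self.used += size
        self.written.append(op_id)


buf = LogBuffer(capacity=100)
buf.write("t1", 60)
buf.write("t2", 60)   # does not fit: queued
buf.write("t3", 10)   # would fit, but waits behind t2
buf.flush(60)         # t1 reaches disk; t2 and t3 are admitted
print(buf.written)    # ['t1', 't2', 't3']
```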
Backup Schema Memory, Disk Operation Records and Schema Transaction Memory also use memory.
In contrast to Schema Memory, Schema Transaction Memory is used while creating, dropping or altering tables and indexes and is usually set to 2 MB, whereas Schema Memory can be gigabytes in size.
Memory Region Classifications#
A memory region is classified by one or more of the following qualities:
regions that are fixed in size
regions that are critical and cannot handle failure to allocate memory
regions that have no natural upper limit and are unlimited in size
regions that are flexible in size and that can work together to achieve the best use of memory
We can furthermore divide regions based on whether the memory is allocated short term or long term.
These qualities are important because they help assign each memory region a priority. However, the priority can also be affected by the amount of memory that the region has allocated. In the following, we describe the qualities in more detail.
Fixed regions have a fixed size. This applies to database objects, the Redo Log Buffer, the Undo Log Buffer, DataMemory and DiskPageBufferMemory (the page cache for disk pages). There is code to ensure that requests are queued when these resources are no longer available.
Fixed regions: Redo Log Buffer, Undo Log Buffer, Data Memory, Disk Page Buffer Memory
Critical regions are regions where a failure to allocate memory would cause a crash. This applies to the job buffer, which is used for internal messages inside a node, and to the send buffers, which are used for messages to other nodes. DataMemory is a critical region during recovery: if we fail to allocate memory for database objects during recovery, we cannot recover the database. Thus DataMemory is critical in the startup phase, but not during normal operation. Disk Operation Records are also a critical resource, since without them we cannot maintain the disk data columns. Finally, we also treat memory used by backups as critical, since being unable to perform a backup would make RonDB very hard to manage.
Critical regions: Job Buffer, Send Buffer, Data Memory, Disk Operation Records, Backup Schema Memory
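The distinction between critical and non-critical regions can be sketched as a failure policy. The phase names `"startup"` and `"normal"`, the `NodeFailure` exception and the function names are hypothetical; in RonDB a failed critical allocation crashes the node rather than raising an exception.

```python
class NodeFailure(Exception):
    """Stands in for the crash that follows a failed critical allocation."""


# Taken from the list of critical regions above (Data Memory handled separately).
ALWAYS_CRITICAL = {"Job Buffer", "Send Buffer", "Disk Operation Records",
                   "Backup Schema Memory"}


def is_critical(region: str, phase: str) -> bool:
    """Data Memory is critical only while the node is starting (recovering)."""
    if region == "Data Memory":
        return phase == "startup"
    return region in ALWAYS_CRITICAL


def on_allocation_failure(region: str, phase: str) -> str:
    if is_critical(region, phase):
        raise NodeFailure(f"{region}: allocation failed during {phase}")
    # Non-critical regions fail gracefully, e.g. by aborting a transaction.
    return "abort request"
```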
Flexible regions can grow indefinitely, but they have to limit their own growth to ensure that other flexible regions are also allowed to grow, so that no single flexible region can use up all the shared memory resources. There are limits to how much memory a region can occupy before its priority is significantly lowered.
Flexible regions: Transaction Memory, Replication Memory, Schema Memory, Query Memory, Schema Transaction Memory, Send Buffers, Backup Memory, Disk Operation Records
Unlimited regions have no natural upper limit: as long as memory is available at the right priority level, the region can continue to grow.
Unlimited regions: Backup Memory, Query Memory and Schema Transaction Memory
Short term versus long term
Finally, we distinguish between short term and long term memory regions. An allocation in a short term region is of smaller significance than one in a long term region. In particular, this relates to Schema Memory, which contains metadata about tables, indexes, columns, triggers, foreign keys and so forth. Once allocated, this memory stays for a very long time. Thus, if we allow it to grow too far into the shared memory, there will be no space left to handle large transactions that require Transaction Memory.
Long term regions: Schema Memory
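Because a region can have several qualities at once, the classifications above are naturally modelled as overlapping sets. The sets below are copied from the lists above (with "Send Buffers" normalised to "Send Buffer"); the dictionary and the `qualities` helper are only an illustrative sketch.

```python
CLASSES = {
    "fixed": {"Redo Log Buffer", "Undo Log Buffer", "Data Memory",
              "Disk Page Buffer Memory"},
    "critical": {"Job Buffer", "Send Buffer", "Data Memory",
                 "Disk Operation Records", "Backup Schema Memory"},
    "flexible": {"Transaction Memory", "Replication Memory", "Schema Memory",
                 "Query Memory", "Schema Transaction Memory", "Send Buffer",
                 "Backup Memory", "Disk Operation Records"},
    "unlimited": {"Backup Memory", "Query Memory", "Schema Transaction Memory"},
    "long_term": {"Schema Memory"},
}


def qualities(region: str) -> list:
    """All classifications a region belongs to, in alphabetical order."""
    return sorted(name for name, members in CLASSES.items() if region in members)


print(qualities("Disk Operation Records"))  # ['critical', 'flexible']
```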
Memory Region Prioritisations#
Memory region prioritisation determines to what extent each region has access to the shared global memory. Most regions have access to it, but only once their reserved memory is used up.
4% of the shared global memory, together with half of the reserved space for job buffers and communication buffers, is accessible only to the highest-priority regions.
10% of the shared global memory is available only to high-priority requesters. The remainder of the shared global memory is accessible to all memory regions that are allowed to allocate from it.
The actual limits might change over time as we learn more about how to adapt the memory allocations.
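The priority levels can be expressed as the share of the shared global memory that must remain free after an allocation. This sketch assumes the 4% and 10% bands are disjoint (so low-priority requesters must leave 14% free) and ignores the job-buffer reserve; the names `Priority`, `KEEP_FREE_PERCENT` and `may_allocate` are hypothetical, and as noted above the real limits may change.

```python
from enum import IntEnum


class Priority(IntEnum):
    LOW = 0
    HIGH = 1
    HIGHEST = 2


# Percent of the shared global memory that must stay free after an allocation
# (assumption: the 4% and 10% bands are disjoint).
KEEP_FREE_PERCENT = {Priority.LOW: 14, Priority.HIGH: 4, Priority.HIGHEST: 0}


def may_allocate(total: int, free: int, request: int, prio: Priority) -> bool:
    """True if a requester of priority `prio` may take `request` bytes."""
    keep_free = total * KEEP_FREE_PERCENT[prio] // 100
    return free - request >= keep_free


print(may_allocate(1000, 150, 50, Priority.LOW),
      may_allocate(1000, 150, 50, Priority.HIGH))  # False True
```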
In the following we will discuss how the flexible memory regions share their access to the shared global memory.
The TransactionMemory region has a reserved space, but beyond that it can grow by up to 50% of the shared global memory. However, it only has access to the lowest-priority part of the shared global memory.
Failure to allocate memory in this region leads to aborted transactions.
SchemaMemory contains many metadata objects representing tables, fragments, fragment replicas, columns and triggers. These are long-term objects. Thus we want this region to be flexible in size, but we do not want it to grow so much that it reduces our ability to execute queries. We therefore calculate a reserved part and allow the region to grow beyond it by at most 20% of the shared global memory. This region cannot access the higher-priority parts of the shared global memory.
Failure to allocate SchemaMemory causes metadata operations to be aborted.
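The hard caps described above for TransactionMemory (50%) and SchemaMemory (20%) can be sketched as a single check. The function `may_grow` and its `low_prio_available` parameter (the amount of shared global memory currently open to the lowest-priority requesters) are hypothetical; only the percentages come from the text.

```python
# Maximum share of the shared global memory each region may borrow.
SHARED_CAP_PERCENT = {"TransactionMemory": 50, "SchemaMemory": 20}


def may_grow(region: str, used: int, request: int, reserved: int,
             shared_total: int, low_prio_available: int) -> bool:
    """Decide whether `region` may allocate `request` more bytes."""
    borrowed_now = max(0, used - reserved)
    borrowed_after = max(0, used + request - reserved)
    extra = borrowed_after - borrowed_now  # bytes that must come from the shared pool
    if extra == 0:
        return True  # fits inside the region's reserved space
    if borrowed_after > shared_total * SHARED_CAP_PERCENT[region] // 100:
        return False  # cap reached: the transaction or metadata operation aborts
    # Both regions may only draw on the lowest-priority part of the pool.
    return extra <= low_prio_available
```

With 1000 bytes of shared global memory, TransactionMemory may borrow up to 500 bytes beyond its reserve but SchemaMemory only 200, which keeps long-lived metadata from crowding out transactions.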
ReplicationMemory holds the memory structures that represent replication towards other clusters, supporting Global Replication. It can also be used to replicate changes from RonDB to other systems such as ElasticSearch. The memory in this region is temporary in nature: it consists of buffers storing the changes that are being replicated. The metadata of the replication is stored in the SchemaMemory region.
This region has a reserved space, but it can also grow to use up to 30% of the shared global memory. After that it will only have access to the lower priority regions of the shared global memory.
Failure to allocate memory in this region leads to failed replication, which then has to be set up again. This is a fairly critical error, but one that can be handled.
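Unlike a hard cap, the 30% limit for ReplicationMemory acts as a threshold: past it, the region is not refused but demoted. In this minimal sketch, the function `replication_tier` and the tier labels are hypothetical; the text only says that beyond 30% the region can draw solely on the lower-priority parts of the shared global memory.

```python
def replication_tier(used_after: int, reserved: int, shared_total: int) -> str:
    """Which memory ReplicationMemory draws on once it holds `used_after` bytes."""
    borrowed = used_after - reserved
    if borrowed <= 0:
        return "reserved"             # still within its own reserved space
    if borrowed <= shared_total * 30 // 100:
        return "shared"               # below the threshold
    return "low-priority-only"        # past 30%: demoted, not refused
```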
QueryMemory has no reserved space, but it can use the lower-priority parts of the shared global memory. This memory is used to handle complex SQL queries.
Failure to allocate memory in this region will lead to complex queries being aborted.
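Since QueryMemory has no reserved space, every byte a query uses is borrowed and should be returned when the query ends, whether it completed or was aborted. The sketch below models this with a Python context manager; the classes `SharedLowPriorityPool`, `QueryMemory` and `QueryAborted` are hypothetical and only illustrate the behaviour described above.

```python
class QueryAborted(Exception):
    """Raised when QueryMemory cannot be allocated; the query is aborted."""


class SharedLowPriorityPool:
    """Hypothetical view of the lower-priority part of the shared global memory."""

    def __init__(self, available: int) -> None:
        self.available = available


class QueryMemory:
    """Per-query allocator with no reserved space: every byte is borrowed."""

    def __init__(self, pool: SharedLowPriorityPool) -> None:
        self.pool = pool
        self.held = 0

    def alloc(self, size: int) -> None:
        if size > self.pool.available:
            raise QueryAborted(f"cannot allocate {size} bytes of QueryMemory")
        self.pool.available -= size
        self.held += size

    def __enter__(self) -> "QueryMemory":
        return self

    def __exit__(self, *exc_info) -> bool:
        # Whether the query finished or was aborted, give everything back.
        self.pool.available += self.held
        self.held = 0
        return False  # do not swallow QueryAborted


pool = SharedLowPriorityPool(available=1000)
with QueryMemory(pool) as qm:
    qm.alloc(600)
    print(pool.available)  # 400 while the join runs
print(pool.available)      # 1000 once the query is done

aborted = False
try:
    with QueryMemory(pool) as qm:
        qm.alloc(800)
        qm.alloc(300)      # only 200 left: the query is aborted
except QueryAborted:
    aborted = True
print(aborted, pool.available)  # True 1000
```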