# Rondis

The setup used for the REST API benchmarks is also prepared to run the Rondis tests, using the same REST API binary.

## Rondis results

The Rondis benchmarks are executed using a simple tool called valkey-benchmark, with a minor modification that makes it possible to use the MGET command as a built-in command, similar to how MSET is already supported.
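To illustrate why MSET/MGET batching matters at the protocol level, here is a minimal sketch (not part of valkey-benchmark itself) that encodes commands in RESP, the wire format used by Valkey and Redis clients. A single MSET carrying N key-value pairs travels as one request, so one network round trip covers N row writes:

```python
def encode_resp(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings,
    the wire format used by Valkey/Redis clients."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)

# One SET = one round trip for a single row (3 array elements).
single = encode_resp("SET", "key:1", "value:1")

# One MSET with 4 key-value pairs = one round trip for 4 rows
# (9 array elements: the command name plus 8 arguments).
pairs = [(f"key:{i}", f"value:{i}") for i in range(4)]
batched = encode_resp("MSET", *[s for kv in pairs for s in kv])

print(single)
print(batched[:40], b"...")
```

The key names and pair count here are arbitrary illustration values; the point is that batch size grows the payload of one request rather than the number of round trips.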

### Result from SET operation

| Benchmark | Throughput (rows/sec) | Median latency | 99% latency | Batch | Threads | CPU |
|-----------|-----------------------|----------------|-------------|-------|---------|-----|
| SET | 8356/s | 0.12 ms | 0.17 ms | 1 | 1 | 15% |
| SET | 13670/s | 0.14 ms | 0.20 ms | 1 | 2 | 35% |
| SET | 24456/s | 0.16 ms | 0.33 ms | 1 | 4 | 68% |
| SET | 47994/s | 0.16 ms | 0.27 ms | 1 | 8 | 127% |
| SET | 87612/s | 0.17 ms | 0.28 ms | 1 | 16 | 196% |
| SET | 104196/s | 0.24 ms | 0.39 ms | 1 | 32 | 235% |
| SET | 108295/s | 0.40 ms | 0.57 ms | 1 | 64 | 256% |

In this benchmark we are limited by the RonDB data nodes, but also by the fact that the SET command issues only one write operation at a time.
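The thread-scaling trend is visible directly in the table above. A small sketch, using the numbers copied from the SET results, shows how per-thread efficiency drops as threads are added:

```python
# (threads, rows/sec) pairs taken from the SET table above.
set_results = [(1, 8356), (2, 13670), (4, 24456), (8, 47994),
               (16, 87612), (32, 104196), (64, 108295)]

base = set_results[0][1]  # single-thread throughput
for threads, rows in set_results:
    speedup = rows / base
    efficiency = speedup / threads  # 1.0 would be perfect linear scaling
    print(f"{threads:2d} threads: {speedup:5.2f}x speedup, "
          f"{efficiency:.0%} scaling efficiency")
```

At 64 threads the speedup is only about 13x over a single thread, i.e. roughly 20% scaling efficiency, which is consistent with the data nodes being the bottleneck.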

### Result from MSET operation

For this benchmark we made some minor changes to valkey-benchmark to make the batch size of the MSET test configurable. As with the REST API server, we start by looking at the impact of increasing batch size.

We report throughput as the number of rows changed per second rather than the number of commands completed, which is what the benchmark tool reports.
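The conversion is straightforward; a small sketch, with an arbitrary example command rate chosen for illustration:

```python
def rows_per_sec(commands_per_sec: float, batch_size: int) -> float:
    """Each MSET command changes `batch_size` rows, so row
    throughput is command throughput times the batch size."""
    return commands_per_sec * batch_size

# Hypothetical example: 2500 MSET commands/sec with 128 pairs each.
print(rows_per_sec(2500, 128))  # 320000.0
```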

| Benchmark | Throughput (rows/sec) | Median latency | 99% latency | Batch | Threads | CPU |
|-----------|-----------------------|----------------|-------------|-------|---------|-----|
| MSET | 8313/s | 0.12 ms | 0.17 ms | 1 | 1 | 16% |
| MSET | 14279/s | 0.13 ms | 0.20 ms | 2 | 1 | 20% |
| MSET | 24808/s | 0.16 ms | 0.21 ms | 4 | 1 | 28% |
| MSET | 45661/s | 0.17 ms | 0.22 ms | 8 | 1 | 34% |
| MSET | 80882/s | 0.20 ms | 0.25 ms | 16 | 1 | 40% |
| MSET | 139300/s | 0.23 ms | 0.30 ms | 32 | 1 | 45% |
| MSET | 212371/s | 0.30 ms | 0.41 ms | 64 | 1 | 55% |
| MSET | 307028/s | 0.41 ms | 0.55 ms | 128 | 1 | 61% |
| MSET | 356050/s | 0.69 ms | 1.03 ms | 256 | 1 | 73% |

From these numbers we can conclude that it takes very little effort for Rondis, using MSET with large batch sizes, to issue enough requests to keep the RonDB data node busy. The latency numbers are also very good and very stable; the difference between the median latency and the 99% latency is fairly small.

So let’s see what combining batching and threading gives in the Rondis case.

| Benchmark | Throughput (rows/sec) | Median latency | 99% latency | Batch | Threads | CPU |
|-----------|-----------------------|----------------|-------------|-------|---------|-----|
| MSET | 88574/s | 0.17 ms | 0.36 ms | 4 | 4 | 90% |
| MSET | 260036/s | 0.23 ms | 0.49 ms | 16 | 4 | 112% |
| MSET | 571327/s | 0.42 ms | 0.77 ms | 64 | 4 | 144% |
| MSET | 782205/s | 0.64 ms | 0.97 ms | 64 | 8 | 205% |
| MSET | 837148/s | 1.20 ms | 1.57 ms | 128 | 8 | 191% |
| MSET | 927760/s | 0.23 ms | 0.30 ms | 128 | 16 | 247% |
| MSET | 883369/s | 0.30 ms | 0.41 ms | 256 | 16 | 285% |
| MSET | 970638/s | 2.44 ms | 3.61 ms | 100 | 24 | 285% |
| MSET | 1005025/s | 2.82 ms | 4.22 ms | 90 | 32 | 338% |

So even with write operations it is possible to get much better throughput and improved latency using batching, even pushing past 1M key writes per second at somewhat higher latency. As we saw previously, batching can improve throughput at similar latency by a factor of 10. Interestingly, this factor was the same 25 years ago: despite all the hardware developments since then, the benefit of asynchronous programming over synchronous programming stays at around 10x.
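The factor-of-10 claim can be checked against the single-thread MSET table above. A quick sketch comparing batch size 1 with batch size 16, where the median latency is still of the same order (0.12 ms vs 0.20 ms):

```python
# Single-thread MSET throughput from the table above.
unbatched = 8313   # batch size 1, 0.12 ms median latency
batched = 80882    # batch size 16, 0.20 ms median latency

ratio = batched / unbatched
print(f"Throughput gain from batching: {ratio:.1f}x")  # ~9.7x
```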

### Result from GET operation

We now look at the GET operation, whose performance is limited since it handles only one operation at a time. As the numbers show, performance increases stepwise as the number of threads increases.

| Benchmark | Throughput (rows/sec) | Median latency | 99% latency | Batch | Threads | CPU |
|-----------|-----------------------|----------------|-------------|-------|---------|-----|
| GET | 32957/s | 0.03 ms | 0.05 ms | 1 | 1 | 40% |
| GET | 50691/s | 0.04 ms | 0.05 ms | 1 | 2 | 79% |
| GET | 83787/s | 0.05 ms | 0.10 ms | 1 | 4 | 151% |
| GET | 119653/s | 0.06 ms | 0.10 ms | 1 | 8 | 247% |
| GET | 117589/s | 0.10 ms | 0.16 ms | 1 | 16 | 270% |
| GET | 162588/s | 0.12 ms | 0.21 ms | 1 | 32 | 356% |
| GET | 153941/s | 0.21 ms | 0.37 ms | 1 | 64 | 340% |

### Result from MGET operation

Now we turn to MGET operations. As usual, we start by looking at the impact of increasing batch size.

| Benchmark | Throughput (rows/sec) | Median latency | 99% latency | Batch | Threads | CPU |
|-----------|-----------------------|----------------|-------------|-------|---------|-----|
| MGET | 34249/s | 0.03 ms | 0.04 ms | 1 | 1 | 40% |
| MGET | 54607/s | 0.04 ms | 0.05 ms | 2 | 1 | 52% |
| MGET | 121902/s | 0.03 ms | 0.05 ms | 4 | 1 | 45% |
| MGET | 180438/s | 0.05 ms | 0.06 ms | 8 | 1 | 58% |
| MGET | 362154/s | 0.05 ms | 0.06 ms | 16 | 1 | 54% |
| MGET | 557055/s | 0.05 ms | 0.08 ms | 32 | 1 | 61% |
| MGET | 728307/s | 0.09 ms | 0.12 ms | 64 | 1 | 67% |
| MGET | 827194/s | 0.14 ms | 0.20 ms | 128 | 1 | 79% |
| MGET | 870185/s | 0.27 ms | 0.37 ms | 256 | 1 | 88% |

We can see that batching is significantly more efficient than threading.
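This efficiency difference can be made concrete by dividing throughput by CPU usage for comparable rows from the two tables above:

```python
# From the GET table: 32 threads, no batching.
get_rows, get_cpu = 162588, 356       # rows/sec, CPU %
# From the MGET table: 1 thread, batch size 32.
mget_rows, mget_cpu = 557055, 61

get_eff = get_rows / get_cpu          # rows/sec per CPU percent
mget_eff = mget_rows / mget_cpu
print(f"GET (threading): {get_eff:6.0f} rows/sec per CPU%")
print(f"MGET (batching): {mget_eff:6.0f} rows/sec per CPU%")
print(f"Batching is about {mget_eff / get_eff:.0f}x more CPU-efficient here")
```

By this measure, batching delivers roughly 20x more rows per unit of CPU than threading at these particular data points.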

Next we look at the combination of batching and threading.

| Benchmark | Throughput (rows/sec) | Median latency | 99% latency | Batch | Threads | CPU |
|-----------|-----------------------|----------------|-------------|-------|---------|-----|
| MGET | 1703713/s | 0.14 ms | 0.29 ms | 64 | 4 | 270% |
| MGET | 3124480/s | 0.14 ms | 0.27 ms | 64 | 8 | 485% |
| MGET | 4524375/s | 0.30 ms | 0.51 ms | 100 | 16 | 700% |
| MGET | 4825091/s | 0.41 ms | 0.65 ms | 100 | 24 | 745% |
| MGET | 4880429/s | 0.51 ms | 0.81 ms | 100 | 32 | 780% |
| MGET | 5086117/s | 0.56 ms | 0.88 ms | 110 | 32 | 815% |

As usual we see a clear increase when threading and batching are combined. Obviously the key lookups are very small here, so this is a very synthetic benchmark, but 5M lookups per second on a single machine is still fairly impressive.
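To put the top number in perspective, a quick sketch decomposing the 5M rows/sec into actual MGET commands per second and per-thread command rate, using the last row of the table above:

```python
# Last row of the combined batching + threading MGET table.
row_throughput = 5_086_117   # rows/sec
batch_size = 110             # keys per MGET
threads = 32

commands_per_sec = row_throughput / batch_size
per_thread = commands_per_sec / threads
print(f"{commands_per_sec:.0f} MGET commands/sec")  # ~46k
print(f"{per_thread:.0f} commands/sec per thread")  # ~1.4k
```

So the 5M figure corresponds to only about 46k actual commands per second on the wire, which is what makes this level of row throughput achievable.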