# Rondis
All the setup used for the REST API also applies to the Rondis tests, which run against the same REST API binary.
## Rondis results
The Rondis benchmarks are executed using a simple tool called `valkey-benchmark`, with a minor modification that makes it possible to use the `MGET` command as a built-in command, similar to how `MSET` is supported.
### Result from SET operation
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
SET | 8356/s | 0.12 ms | 0.17 ms | 1 | 1 | 15% |
SET | 13670/s | 0.14 ms | 0.20 ms | 1 | 2 | 35% |
SET | 24456/s | 0.16 ms | 0.33 ms | 1 | 4 | 68% |
SET | 47994/s | 0.16 ms | 0.27 ms | 1 | 8 | 127% |
SET | 87612/s | 0.17 ms | 0.28 ms | 1 | 16 | 196% |
SET | 104196/s | 0.24 ms | 0.39 ms | 1 | 32 | 235% |
SET | 108295/s | 0.40 ms | 0.57 ms | 1 | 64 | 256% |
In this benchmark we are limited by the RonDB data nodes, but also by the fact that the `SET` command only issues one write operation at a time.
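Since each connection has only one `SET` in flight at a time, per-thread throughput is roughly bounded by the inverse of the round-trip latency. This is a back-of-the-envelope sketch (not part of the original benchmark, numbers copied from the table above) checking that the measured throughput matches that bound:

```python
def max_throughput(threads: int, median_latency_ms: float) -> float:
    """Upper bound on ops/sec when each thread issues one operation at a time."""
    return threads * 1000.0 / median_latency_ms

# 1 thread at 0.12 ms median latency -> at most ~8333 ops/sec,
# close to the measured 8356/s in the SET table.
print(round(max_throughput(1, 0.12)))   # 8333

# 16 threads at 0.17 ms median -> ~94118 ops/sec, vs. 87612/s measured.
print(round(max_throughput(16, 0.17)))  # 94118
```

The measured numbers sit just below this simple bound, which supports the claim that the one-operation-at-a-time behaviour of `SET` is the limiting factor.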
### Result from MSET operation
To run this benchmark we made some minor changes to `valkey-benchmark` to make the batch size in the `MSET` test flexible. As with the REST API server, we start by looking at the impact of increasing the batch size. We report throughput as the number of rows changed per second rather than the number of commands completed as reported by the benchmark tool.
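The conversion is straightforward: one `MSET` command with batch size N writes N rows. A minimal sketch of the conversion used for the tables (the commands/sec figure is derived from the table, not reported by us separately):

```python
def rows_per_sec(commands_per_sec: float, batch: int) -> float:
    """One MSET with `batch` key-value pairs writes `batch` rows."""
    return commands_per_sec * batch

# e.g. about 5055 MSET commands/sec at batch 16 corresponds to the
# ~80880 rows/sec reported in the table below.
print(round(rows_per_sec(5055, 16)))  # 80880
```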
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MSET | 8313/s | 0.12 ms | 0.17 ms | 1 | 1 | 16% |
MSET | 14279/s | 0.13 ms | 0.20 ms | 2 | 1 | 20% |
MSET | 24808/s | 0.16 ms | 0.21 ms | 4 | 1 | 28% |
MSET | 45661/s | 0.17 ms | 0.22 ms | 8 | 1 | 34% |
MSET | 80882/s | 0.20 ms | 0.25 ms | 16 | 1 | 40% |
MSET | 139300/s | 0.23 ms | 0.30 ms | 32 | 1 | 45% |
MSET | 212371/s | 0.30 ms | 0.41 ms | 64 | 1 | 55% |
MSET | 307028/s | 0.41 ms | 0.55 ms | 128 | 1 | 61% |
MSET | 356050/s | 0.69 ms | 1.03 ms | 256 | 1 | 73% |
From these numbers we can conclude that it takes very little effort for Rondis using `MSET` with large batch sizes to issue enough requests to keep the RonDB data node busy. The latency numbers are also very good and very stable; there is only a small difference between the median latency and the 99th percentile latency.
So let’s see what mixing batching and threading gives in the Rondis case.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MSET | 88574/s | 0.17 ms | 0.36 ms | 4 | 4 | 90% |
MSET | 260036/s | 0.23 ms | 0.49 ms | 16 | 4 | 112% |
MSET | 571327/s | 0.42 ms | 0.77 ms | 64 | 4 | 144% |
MSET | 782205/s | 0.64 ms | 0.97 ms | 64 | 8 | 205% |
MSET | 837148/s | 1.20 ms | 1.57 ms | 128 | 8 | 191% |
MSET | 927760/s | 0.23 ms | 0.30 ms | 128 | 16 | 247% |
MSET | 883369/s | 0.30 ms | 0.41 ms | 256 | 16 | 285% |
MSET | 970638/s | 2.44 ms | 3.61 ms | 100 | 24 | 285% |
MSET | 1005025/s | 2.82 ms | 4.22 ms | 90 | 32 | 338% |
So even with write operations it is possible to get much better throughput and improved latency using batching, even pushing past 1M key writes per second at somewhat higher latency. As we saw previously, batching can improve throughput at similar latency by a factor of 10. Interestingly, this factor was the same 25 years ago: despite all the hardware developments since then, the benefit of asynchronous programming over synchronous programming stays at around 10x.
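The roughly 10x factor can be read off the tables above by comparing configurations with similar latency; a quick check using the reported numbers:

```python
# Throughput at comparable median latency, taken from the tables above.
set_rows_per_sec = 8356    # SET, 1 thread, 0.12 ms median
mset_rows_per_sec = 80882  # MSET, batch 16, 1 thread, 0.20 ms median

speedup = mset_rows_per_sec / set_rows_per_sec
print(round(speedup, 1))  # 9.7, i.e. roughly the 10x batching benefit
```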
### Result from GET operation
Next we look at the `GET` operation, which is limited in performance since it can only handle one operation at a time. As can be seen from the numbers, performance increases stepwise as the number of threads increases.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
GET | 32957/s | 0.03 ms | 0.05 ms | 1 | 1 | 40% |
GET | 50691/s | 0.04 ms | 0.05 ms | 1 | 2 | 79% |
GET | 83787/s | 0.05 ms | 0.10 ms | 1 | 4 | 151% |
GET | 119653/s | 0.06 ms | 0.10 ms | 1 | 8 | 247% |
GET | 117589/s | 0.10 ms | 0.16 ms | 1 | 16 | 270% |
GET | 162588/s | 0.12 ms | 0.21 ms | 1 | 32 | 356% |
GET | 153941/s | 0.21 ms | 0.37 ms | 1 | 64 | 340% |
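The sub-linear thread scaling is visible directly in the table; a tiny sketch computing the speedup factors relative to one thread (numbers copied from the `GET` table above):

```python
# Measured GET throughput (rows/sec) per thread count, from the table above.
get_throughput = {1: 32957, 2: 50691, 4: 83787, 8: 119653, 16: 117589, 32: 162588}

# Speedup relative to a single thread: 32 threads give only ~4.9x,
# far below linear scaling.
for threads, rows in get_throughput.items():
    print(threads, round(rows / get_throughput[1], 2))
```

This plateau is part of why batching with `MGET` pays off so much more than adding threads, as the next section shows.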
### Result from MGET operation
Now we turn to `MGET` operations. As usual, we start by looking at the impact of increasing the batch size.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MGET | 34249/s | 0.03 ms | 0.04 ms | 1 | 1 | 40% |
MGET | 54607/s | 0.04 ms | 0.05 ms | 2 | 1 | 52% |
MGET | 121902/s | 0.03 ms | 0.05 ms | 4 | 1 | 45% |
MGET | 180438/s | 0.05 ms | 0.06 ms | 8 | 1 | 58% |
MGET | 362154/s | 0.05 ms | 0.06 ms | 16 | 1 | 54% |
MGET | 557055/s | 0.05 ms | 0.08 ms | 32 | 1 | 61% |
MGET | 728307/s | 0.09 ms | 0.12 ms | 64 | 1 | 67% |
MGET | 827194/s | 0.14 ms | 0.20 ms | 128 | 1 | 79% |
MGET | 870185/s | 0.27 ms | 0.37 ms | 256 | 1 | 88% |
We can see that batching is significantly more efficient than threading.
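One way to quantify this is rows per second per CPU core, comparing the best threaded `GET` result with the best single-threaded batched `MGET` result (a rough estimate, numbers copied from the tables above):

```python
# Rows/sec per 100% CPU (i.e. per core), from the tables above.
mget_batched = 870185 / 0.88   # MGET, batch 256, 1 thread, 88% CPU
get_threaded = 162588 / 3.56   # GET, batch 1, 32 threads, 356% CPU

# Batching delivers roughly 20x more rows per CPU core than threading.
print(round(mget_batched / get_threaded, 1))  # 21.7
```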
Next we look at the combination of batching and threading.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MGET | 1703713/s | 0.14 ms | 0.29 ms | 64 | 4 | 270% |
MGET | 3124480/s | 0.14 ms | 0.27 ms | 64 | 8 | 485% |
MGET | 4524375/s | 0.30 ms | 0.51 ms | 100 | 16 | 700% |
MGET | 4825091/s | 0.41 ms | 0.65 ms | 100 | 24 | 745% |
MGET | 4880429/s | 0.51 ms | 0.81 ms | 100 | 32 | 780% |
MGET | 5086117/s | 0.56 ms | 0.88 ms | 110 | 32 | 815% |
As usual we see a clear increase when threading and batching are combined. Obviously the key lookups are very small here, so this is a very synthetic benchmark, but 5M lookups per second on a single machine is still fairly impressive.