# Rondis
All the setup used for the REST API also applies to the Rondis tests, which run against the same REST API binary.
## Rondis results
The Rondis benchmarks are executed using a simple tool called `valkey-benchmark`, with a minor modification that makes it possible to use the `MGET` command as a built-in command, similar to how `MSET` is supported.
### Result from SET operation
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
SET | 8356/s | 0.12 ms | 0.17 ms | 1 | 1 | 15% |
SET | 13670/s | 0.14 ms | 0.20 ms | 1 | 2 | 35% |
SET | 24456/s | 0.16 ms | 0.33 ms | 1 | 4 | 68% |
SET | 47994/s | 0.16 ms | 0.27 ms | 1 | 8 | 127% |
SET | 87612/s | 0.17 ms | 0.28 ms | 1 | 16 | 196% |
SET | 104196/s | 0.24 ms | 0.39 ms | 1 | 32 | 235% |
SET | 108295/s | 0.40 ms | 0.57 ms | 1 | 64 | 256% |
In this benchmark we are limited by the RonDB data nodes, but also by the fact that the `SET` command only issues one write operation at a time.
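Since each connection has only one `SET` in flight at a time, per-thread throughput is roughly bounded by the inverse of the round-trip latency. This is a back-of-the-envelope sketch (not part of the original benchmark, numbers copied from the table above) checking that the measured throughput matches that bound:

```python
def max_throughput(threads: int, median_latency_ms: float) -> float:
    """Upper bound on ops/sec when each thread issues one operation at a time."""
    return threads * 1000.0 / median_latency_ms

# 1 thread at 0.12 ms median latency -> at most ~8333 ops/sec,
# close to the measured 8356/s in the SET table.
print(round(max_throughput(1, 0.12)))   # 8333

# 16 threads at 0.17 ms median -> ~94118 ops/sec, vs. 87612/s measured.
print(round(max_throughput(16, 0.17)))  # 94118
```

The measured numbers sit just below this simple bound, which supports the claim that the one-operation-at-a-time behaviour of `SET` is the limiting factor.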
### Result from MSET operation
To run this benchmark we made some minor changes to `valkey-benchmark` to make the batch size in the `MSET` test flexible. As with the REST API server, we start by looking at the impact of increasing the batch size. We report throughput as the number of rows changed per second rather than the number of commands completed as reported by the benchmark tool.
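The conversion is straightforward: one `MSET` command with batch size N writes N rows. A minimal sketch of the conversion used for the tables (the commands/sec figure is derived from the table, not reported by us separately):

```python
def rows_per_sec(commands_per_sec: float, batch: int) -> float:
    """One MSET with `batch` key-value pairs writes `batch` rows."""
    return commands_per_sec * batch

# e.g. about 5055 MSET commands/sec at batch 16 corresponds to the
# ~80880 rows/sec reported in the table below.
print(round(rows_per_sec(5055, 16)))  # 80880
```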
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MSET | 8313/s | 0.12 ms | 0.17 ms | 1 | 1 | 16% |
MSET | 14279/s | 0.13 ms | 0.20 ms | 2 | 1 | 20% |
MSET | 24808/s | 0.16 ms | 0.21 ms | 4 | 1 | 28% |
MSET | 45661/s | 0.17 ms | 0.22 ms | 8 | 1 | 34% |
MSET | 80882/s | 0.20 ms | 0.25 ms | 16 | 1 | 40% |
MSET | 139300/s | 0.23 ms | 0.30 ms | 32 | 1 | 45% |
MSET | 212371/s | 0.30 ms | 0.41 ms | 64 | 1 | 55% |
MSET | 307028/s | 0.41 ms | 0.55 ms | 128 | 1 | 61% |
MSET | 356050/s | 0.69 ms | 1.03 ms | 256 | 1 | 73% |
From these numbers we can conclude that it takes very little effort for Rondis using `MSET` with large batch sizes to issue enough requests to keep the RonDB data node busy. The latency numbers are also very good and very stable; there is only a small difference between the median latency and the 99th percentile latency.
So let’s see what mixing batching and threading gives in the Rondis case.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MSET | 88574/s | 0.17 ms | 0.36 ms | 4 | 4 | 90% |
MSET | 260036/s | 0.23 ms | 0.49 ms | 16 | 4 | 112% |
MSET | 571327/s | 0.42 ms | 0.77 ms | 64 | 4 | 144% |
MSET | 782205/s | 0.64 ms | 0.97 ms | 64 | 8 | 205% |
MSET | 837148/s | 1.20 ms | 1.57 ms | 128 | 8 | 191% |
MSET | 927760/s | 0.23 ms | 0.30 ms | 128 | 16 | 247% |
MSET | 883369/s | 0.30 ms | 0.41 ms | 256 | 16 | 285% |
MSET | 970638/s | 2.44 ms | 3.61 ms | 100 | 24 | 285% |
MSET | 1005025/s | 2.82 ms | 4.22 ms | 90 | 32 | 338% |
So even with write operations it is possible to get much better throughput and improved latency using batching, even pushing past 1M key writes per second at somewhat higher latency. As we saw previously, batching can improve throughput at similar latency by a factor of 10. Interestingly, this factor was the same 25 years ago: despite all the hardware developments since then, the benefit of asynchronous programming over synchronous programming stays at around 10x.
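The roughly 10x factor can be read off the tables above by comparing configurations with similar latency; a quick check using the reported numbers:

```python
# Throughput at comparable median latency, taken from the tables above.
set_rows_per_sec = 8356    # SET, 1 thread, 0.12 ms median
mset_rows_per_sec = 80882  # MSET, batch 16, 1 thread, 0.20 ms median

speedup = mset_rows_per_sec / set_rows_per_sec
print(round(speedup, 1))  # 9.7, i.e. roughly the 10x batching benefit
```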
### Result from GET operation
Next we look at the `GET` operation, which is limited in performance since it can only handle one operation at a time. As can be seen from the numbers, performance increases stepwise as the number of threads increases.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
GET | 32957/s | 0.03 ms | 0.05 ms | 1 | 1 | 40% |
GET | 50691/s | 0.04 ms | 0.05 ms | 1 | 2 | 79% |
GET | 83787/s | 0.05 ms | 0.10 ms | 1 | 4 | 151% |
GET | 119653/s | 0.06 ms | 0.10 ms | 1 | 8 | 247% |
GET | 117589/s | 0.10 ms | 0.16 ms | 1 | 16 | 270% |
GET | 162588/s | 0.12 ms | 0.21 ms | 1 | 32 | 356% |
GET | 153941/s | 0.21 ms | 0.37 ms | 1 | 64 | 340% |
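The sub-linear thread scaling is visible directly in the table; a tiny sketch computing the speedup factors relative to one thread (numbers copied from the `GET` table above):

```python
# Measured GET throughput (rows/sec) per thread count, from the table above.
get_throughput = {1: 32957, 2: 50691, 4: 83787, 8: 119653, 16: 117589, 32: 162588}

# Speedup relative to a single thread: 32 threads give only ~4.9x,
# far below linear scaling.
for threads, rows in get_throughput.items():
    print(threads, round(rows / get_throughput[1], 2))
```

This plateau is part of why batching with `MGET` pays off so much more than adding threads, as the next section shows.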
### Result from MGET operation
Now we turn to `MGET` operations. As usual, we start by looking at the impact of increasing the batch size.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MGET | 34249/s | 0.03 ms | 0.04 ms | 1 | 1 | 40% |
MGET | 54607/s | 0.04 ms | 0.05 ms | 2 | 1 | 52% |
MGET | 121902/s | 0.03 ms | 0.05 ms | 4 | 1 | 45% |
MGET | 180438/s | 0.05 ms | 0.06 ms | 8 | 1 | 58% |
MGET | 362154/s | 0.05 ms | 0.06 ms | 16 | 1 | 54% |
MGET | 557055/s | 0.05 ms | 0.08 ms | 32 | 1 | 61% |
MGET | 728307/s | 0.09 ms | 0.12 ms | 64 | 1 | 67% |
MGET | 827194/s | 0.14 ms | 0.20 ms | 128 | 1 | 79% |
MGET | 870185/s | 0.27 ms | 0.37 ms | 256 | 1 | 88% |
We can see that batching is significantly more efficient than threading.
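One way to quantify this is rows per second per CPU core, comparing the best threaded `GET` result with the best single-threaded batched `MGET` result (a rough estimate, numbers copied from the tables above):

```python
# Rows/sec per 100% CPU (i.e. per core), from the tables above.
mget_batched = 870185 / 0.88   # MGET, batch 256, 1 thread, 88% CPU
get_threaded = 162588 / 3.56   # GET, batch 1, 32 threads, 356% CPU

# Batching delivers roughly 20x more rows per CPU core than threading.
print(round(mget_batched / get_threaded, 1))  # 21.7
```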
Next we look at the combination of batching and threading.
Benchmark | Throughput rows/sec | Median latency | 99% latency | Batch | Threads | CPU |
---|---|---|---|---|---|---|
MGET | 1703713/s | 0.14 ms | 0.29 ms | 64 | 4 | 270% |
MGET | 3124480/s | 0.14 ms | 0.27 ms | 64 | 8 | 485% |
MGET | 4524375/s | 0.30 ms | 0.51 ms | 100 | 16 | 700% |
MGET | 4825091/s | 0.41 ms | 0.65 ms | 100 | 24 | 745% |
MGET | 4880429/s | 0.51 ms | 0.81 ms | 100 | 32 | 780% |
MGET | 5086117/s | 0.56 ms | 0.88 ms | 110 | 32 | 815% |
As usual we see a clear increase when threading and batching are combined. Obviously the key lookups are very small here, so this is a very synthetic benchmark, but 5M lookups per second on a single machine is still fairly impressive.