Redis is an open-source data structure server.
This benchmark has five workloads. The Phoronix article only compares the SET and GET workloads, and those are the two I include on the workloads page. However, in the examples below I go through all five workloads.
Here is an overall performance comparison between my Haswell i7-4770S box and my Ryzen 7 1700 box:
| Workload | Intel i7-4770S (requests/sec) | AMD Ryzen 7 1700 (requests/sec) |
|----------|------------------------------:|--------------------------------:|
| LPOP     | 2708162.08                    | 2002810.63                      |
| SADD     | 2028503.79                    | 1632486.67                      |
| LPUSH    | 1605235.71                    | 1162483.04                      |
| GET      | 2523369.17                    | 1897628.58                      |
| SET      | 1768876.54                    | 1367618.97                      |
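For reference, these five workloads can also be driven directly with redis-benchmark; a sketch, assuming a server is already listening on the default port:

```sh
# Run just the five workloads discussed here: -t selects the tests,
# -n sets the total request count, -q prints one summary line per test.
./src/redis-benchmark -t set,get,lpush,lpop,sadd -n 1000000 -q
```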
There is a redis benchmark page that describes redis as a single-threaded in-memory database.
There are several things to note up front about the structure and functionality of redis:
Redis is *not* CPU bound
While more traditional metrics such as IPC and "top-down analysis" make sense for a CPU-bound application, they make less sense for redis, where context-switch times and overall latency play a far larger role. The overall On_CPU metric is around 1%, and a plot of CPU time scheduled for an entire test run shows very little time on the CPU.
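For a scheduling-oriented view of a run, generic tooling works too; a sketch using perf's scheduler subcommands (not the tooling behind the plot above):

```sh
# Record scheduler events (context switches, wakeups) for one benchmark
# run, then summarize per-task scheduling latency from the recording.
perf sched record -- ./src/redis-benchmark -t get -n 1000000 -q
perf sched latency
```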
The benchmark runs a fixed number of backend server processes and then fires front-end requests at them.
Following is the process tree for SET:
```
[27384] sh cpu=3 ipc=1.74 on_cpu=0.01 on_core=0.07 elapsed=10.44 user=0.00 system=0.00
  [27385] redis cpu=1 ipc=1.74 on_cpu=0.01 on_core=0.07 elapsed=10.44 user=0.00 system=0.00
    (27386) redis-server cpu=5 ipc=1.99 on_cpu=0.01 on_core=0.04 elapsed=10.60 user=0.37 system=0.09
      (27388) redis-server cpu=7 ipc=0.27
      (27389) redis-server cpu=2 ipc=0.31
      (27390) redis-server cpu=0 ipc=0.23
    (27387) sleep cpu=6 ipc=0.75
    (27391) redis-benchmark cpu=5 ipc=1.32 on_cpu=0.08 on_core=0.61
    (27392) sed cpu=6 ipc=0.62
```
The process tree for GET is isomorphic:

```
[27431] sh cpu=2 ipc=1.87 on_cpu=0.01 on_core=0.08
  [27432] redis cpu=3 ipc=1.87 on_cpu=0.01 on_core=0.08
    (27433) redis-server cpu=5 ipc=2.13 on_cpu=0.01 on_core=0.06
      (27435) redis-server cpu=4 ipc=0.24
      (27436) redis-server cpu=7 ipc=0.28
    (27434) sleep cpu=0 ipc=0.77
    (27438) redis-benchmark cpu=0 ipc=1.29 on_cpu=0.06 on_core=0.49
    (27439) sed cpu=7 ipc=0.64
```
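The cpu=/ipc=/on_cpu= annotations in these trees come from my own tooling; to see just the tree shape on a live run, stock tools are enough. A sketch, assuming a server is up:

```sh
# Print the process tree rooted at the oldest redis-server process;
# -p adds PIDs, and threads show up in curly braces by default.
pstree -p "$(pgrep -o redis-server)"
```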
These process trees reveal a subtle problem with more automated measurements, including measurement via a wrapper script. Following is the execution script for redis:
```sh
#!/bin/sh
cd ~/redis-4.0.8/
./src/redis-server &
REDIS_SERVER_PID=$!
sleep 10
./src/redis-benchmark $@ > $LOG_FILE
kill $REDIS_SERVER_PID
sed "s/\"/ /g" -i $LOG_FILE
```
The script starts the redis server in the background, sleeps for 10 seconds, and then runs the benchmark program against that server. When the benchmark finishes, the script sends the server a kill signal and post-processes the results. However, it never waits for that asynchronous server to actually stop, so a simple wrapper can finish before the entire workload does. One can see this by running the script under perf: the output from perf arrives *before* the output from the database shutting down. This is also why the traces above have no closing metrics for the server processes.
```
mev@popayan:~/.phoronix-test-suite/installed-tests/pts/redis-1.1.0$ perf stat ./redis.mevrun0.sh -n 1000000 -P 32 -q -c 50 --csv SET
30538:C 22 Apr 10:00:26.409 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
30538:C 22 Apr 10:00:26.409 # Redis version=4.0.8, bits=64, commit=00000000, modified=0, pid=30538, just started
30538:C 22 Apr 10:00:26.409 # Warning: no config file specified, using the default config. In order to specify a config file use ./src/redis-server /path/to/redis.conf
30538:M 22 Apr 10:00:26.409 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
30538:M 22 Apr 10:00:26.410 # Server can't set maximum open files to 10032 because of OS error: Operation not permitted.
30538:M 22 Apr 10:00:26.410 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
        [Redis 4.0.8 ASCII-art banner: standalone mode, port 6379, PID 30538]
30538:M 22 Apr 10:00:26.410 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
30538:M 22 Apr 10:00:26.410 # Server initialized
30538:M 22 Apr 10:00:26.410 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
30538:M 22 Apr 10:00:26.410 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
30538:M 22 Apr 10:00:26.419 * DB loaded from disk: 0.009 seconds
30538:M 22 Apr 10:00:26.419 * Ready to accept connections
./redis.mevrun0.sh: 8: ./redis.mevrun0.sh: cannot create : Directory nonexistent
30538:signal-handler (1524409236) Received SIGTERM scheduling shutdown...
sed: no input files

 Performance counter stats for './redis.mevrun0.sh -n 1000000 -P 32 -q -c 50 --csv SET':

         21.881188      task-clock (msec)         #    0.002 CPUs utilized
               108      context-switches          #    0.005 M/sec
                 1      cpu-migrations            #    0.046 K/sec
             4,271      page-faults               #    0.195 M/sec
        39,246,703      cycles                    #    1.794 GHz
        33,157,147      instructions              #    0.84  insn per cycle
         6,193,791      branches                  #  283.065 M/sec
           151,729      branch-misses             #    2.45% of all branches

      10.004613235 seconds time elapsed

mev@popayan:~/.phoronix-test-suite/installed-tests/pts/redis-1.1.0$ 30538:M 22 Apr 10:00:36.441 # User requested shutdown...
30538:M 22 Apr 10:00:36.441 * Saving the final RDB snapshot before exiting.
30538:M 22 Apr 10:00:36.508 * DB saved on disk
30538:M 22 Apr 10:00:36.508 # Redis is now ready to exit, bye bye...
```
I was able to get around this issue by running some of my tests by hand under perf, using a modified version of the script. (The "cannot create" and "sed: no input files" errors above are presumably because $LOG_FILE is only set when the script runs under the test harness.)
```sh
#!/bin/sh
cd redis-4.0.8/
./src/redis-server &
REDIS_SERVER_PID=$!
sleep 10
./src/redis-benchmark $@ > $LOG_FILE
kill $REDIS_SERVER_PID
sed "s/\"/ /g" -i $LOG_FILE
sleep 10
```
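Instead of a trailing fixed sleep, the script could block until the background server actually exits; a minimal sketch of the tail end, using the shell's wait builtin:

```sh
# Ask the server to shut down, then block until the background job
# really exits rather than hoping ten seconds is enough.
kill $REDIS_SERVER_PID
wait $REDIS_SERVER_PID
```

This would keep perf (or any other wrapper) attached for exactly as long as the shutdown takes.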
Running this results in a higher total number of instructions and also a higher IPC, since the redis server is now taken into account instead of just the benchmark driver process. However, it is also worth noting that the roughly 20-second elapsed time now includes 20 seconds of "sleep", so the amount of processor time actually spent in the benchmark may be extremely small (the test profile invokes the benchmark as "-n 1000000 -P 32 -q -c 50 --csv", so only 1,000,000 requests are sent). This extremely short runtime is also why some of the metrics in the process trees above look suspect.
```
 Performance counter stats for './redis.mevrun.sh -n 1000000 -P 32 -q -c 50 --csv SET':

         47.668260      task-clock (msec)         #    0.002 CPUs utilized
               114      context-switches          #    0.002 M/sec
                 2      cpu-migrations            #    0.042 K/sec
             4,383      page-faults               #    0.092 M/sec
        95,880,320      cycles                    #    2.011 GHz
       206,132,871      instructions              #    2.15  insn per cycle
        37,482,101      branches                  #  786.311 M/sec
           234,095      branch-misses             #    0.62% of all branches

      20.006002954 seconds time elapsed
```
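As a rough sanity check, assuming these counters cover the wrapper's entire process tree: 95,880,320 cycles at the reported 2.011 GHz comes to about 47.7 ms of CPU time (matching the 47.668260 msec task-clock), spread over 20.006 s of wall clock, or roughly 0.2% utilization. That matches the "0.002 CPUs utilized" figure and is consistent with well under 0.1 s being spent on the benchmark work itself.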
Next Steps:
Figure out how interesting these workloads are, given that only a small fraction of a second (<0.1 s) is spent doing the work that actually contributes to the benchmark result, or whether the runs need to be extended. If the workloads are worth pursuing, analyze them from the perspective of the factors influencing latency, i.e. not the IPC and top-down style analysis used for CPU-bound processes.
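A couple of possible starting points for that latency-focused pass, using stock redis-cli modes from the same redis-4.0.8 tree (a sketch, separate from the Phoronix harness):

```sh
# Measure the host's intrinsic latency (scheduling hiccups) for 10 s,
# then the observed round-trip latency of PINGs against a live server.
./src/redis-cli --intrinsic-latency 10
./src/redis-cli --latency
```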