Phoronix posted an article comparing POWER vs x86 on CPU benchmarks. This post looks at some of the workloads and adds comments.
I don’t have a POWER9 system and haven’t run these on a server. However, I’ve run the workloads on AMD (Ryzen) and Intel (Haswell) clients and made some observations. Details in the table below.
Also a few overall comments:
- x264 likely has tuned x86 assembly
- compress-zstd is single-threaded on x86 and much faster on POWER9; is there something else going on?
- single-threaded benchmarks aren’t differentiated as much in the article, e.g. the price/performance of phpbench is a bit strange
|Benchmark||Phoronix observations||My observations||Analysis|
|parboil: LBM||EPYC 7401P and 7601 slightly faster than 18-core POWER9.||LBM: OpenMP program with 95% On_CPU. IPC of 1.15 with more frontend stalls than backend.||Analysis|
|x264||POWER9 system quite a bit slower than corresponding x86 systems.||On_CPU 71%, many voluntary context switches and I/O read input. IPC 1.3||Analysis|
|compress-p7zip||POWER9 system slightly faster than EPYC 7601 and 7401||On_CPU 88% with some I/O to limit scaling. IPC 0.83 with 27% speculation misses (branch prediction).||Analysis|
|stockfish||POWER9 system faster than EPYC 7401P and slower than EPYC 7601||On_CPU 100%, IPC of 1.0 with frontend stalls and bad speculation.||Analysis|
|build-llvm||POWER9 system faster than EPYC 7401P and slower than EPYC 7601||On_CPU 99%, much higher than other build-* benchmarks. High amount of frontend stalls. Some speculative misses due to branch prediction. Overall IPC on the low side at 0.64.||Technically speaking, these are different workloads, since one builds an x86 executable and the other a POWER executable.|
|primesieve||POWER9 system slower than EPYC 7401P and 7601||On_CPU almost 100%. Backend stalls are the largest issue, with an overall IPC of 0.69.||Analysis|
|compress-zstd||POWER9 system fastest overall, EPYC slowest, Intel in between||Single-threaded benchmark (compress-zstd seems to use multiple threads), with a high level of backend stalls.||Analysis|
|encode-flac||POWER9 system slowest overall, much slower than EPYC||Single-threaded with On_CPU 90%. Some I/O, more writes than reads. IPC 2.43 with 27% backend stalls. L2 miss ratio 20% and L3 6%.||Analysis|
|encode-mp3||POWER9 system slowest overall, much slower than EPYC||Single-threaded with On_CPU of 100%. IPC of 1.90. 35% backend stalls with L2 miss ratio 47% and L3 miss ratio 6%.||Analysis|
|openssl||POWER9 system slightly faster than EPYC 7401P and slower than EPYC 7601||On_CPU 100% with IPC 1.66 (Intel) vs 1.12 (AMD); the gap appears related to hand-coded assembly with MULX instructions. High retirement rate of 90%.||Analysis|
|pgbench||POWER9 system faster than EPYC on read-only and read-write, closer on the latter||PostgreSQL database and frontend driver with multiple options. Read-only keeps cores busier than read/write, but usage is light overall, so latency is at least as much of an issue. Frontend stalls are the largest issue, e.g. icache.||Analysis|
|pybench||POWER9 system slightly slower on pybench||Single threaded micro-benchmarks of different python operations. IPC over 2.5 with frontend stalls the largest issue.||Analysis|
|phpbench||POWER9 system slightly faster than EPYC 7601.||Single threaded, micro-benchmarks of php operations. IPC of 2.76 with ~15% of frontend stalls and ~15% of backend stalls.||Analysis|
|scikit-learn||POWER9 system slower than EPYC systems by reasonable amount.||Single threaded, python code. High IPC with some backend memory stalls.||Analysis|
|tinymembench||POWER9 system slightly slower than EPYC||Backend bound, IPC 0.4, testing memory performance and 90% backend stalls.||Analysis|
|blender: classroom, pabellon barcelona||POWER9 system somewhat slower than EPYC||On_CPU close to 100% with ~20% stalls in the backend, ~20% stalls in the frontend, and bad speculation of ~10%. An overall IPC of 0.80.||Analysis|
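
For readers less familiar with the two headline metrics in the table, here is a minimal sketch of how IPC and On_CPU can be derived from raw measurements. The definitions are my assumptions rather than a description of any particular tool: IPC is retired instructions divided by core cycles, and On_CPU is the CPU time summed over all threads divided by wall-clock time times the number of cores. The `Sample` class, its field names, and the numbers in `example` are hypothetical and invented for illustration; they are not measurements from any benchmark above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical container for one benchmark run (field names are assumptions)."""
    instructions: int    # retired instructions, summed over all cores
    cycles: int          # unhalted core cycles, summed over all cores
    cpu_seconds: float   # user + system time, summed over all threads
    wall_seconds: float  # elapsed wall-clock time
    cores: int           # number of logical cores available to the run


def ipc(s: Sample) -> float:
    """Instructions per cycle: higher means fewer pipeline stalls."""
    return s.instructions / s.cycles


def on_cpu_pct(s: Sample) -> float:
    """Share of the available core time that was actually spent running."""
    return 100.0 * s.cpu_seconds / (s.wall_seconds * s.cores)


# Invented numbers, chosen only to exercise the two formulas.
example = Sample(
    instructions=23_000_000_000,
    cycles=20_000_000_000,
    cpu_seconds=7.6,
    wall_seconds=2.0,
    cores=4,
)

if __name__ == "__main__":
    print(f"IPC {ipc(example):.2f}, On_CPU {on_cpu_pct(example):.0f}%")
```

Under these definitions, a single-threaded benchmark can only approach 100% On_CPU if `cores` is set to 1, so whether the single-threaded rows are normalized per core or per machine matters when comparing them with the multi-threaded ones.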