Phoronix article – compiler benchmarks (2018-05-10)
Phoronix posted an article benchmarking GCC 8.1 today. These benchmarks compare GCC 8.1 vs GCC 7.3 on a variety of Ubuntu 18.04 systems.
While I haven’t reproduced the exact comparisons on Ubuntu, in this article I make some notes about the workloads to help characterize what is being measured. It is also useful to note which benchmarks seem to be more sensitive to GCC compiler optimizations.
One comment from a GCC engineer on the changes:
While GCC 8 supports now skylake-512, it will auto-vectorize to avx256 because avx512 downclocks the CPU.
There are a lot of differences in auto-vectorization between GCC 7 and GCC 8 because cost-model has been reworked. This affects all core CPUs and Zen. There is also new generic tuning that should work better on modern architectures and while working on that core tuning was revisited.
Also, an observation from one of the commenters:
I know these benchmarks are not intended for comparing different CPUs against each other, but rather compiler versions on the same CPU, but I am still curious: why are the EPYC results so low for most of the benchmarks? What caused that system to perform so much worse in many benchmarks compared to the others?
In the table below, I note the observations made by Phoronix as well as my own characterization of the benchmarks. Some general observations: Skylake seems to show gains on the largest number of benchmarks, and at least half of the benchmarks below are single-threaded, so the machine comparisons don’t show off machines with many cores.
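To put rough numbers on the single-threaded point, here is a back-of-the-envelope sketch using Amdahl's law (my choice of model, not something Phoronix reports). The core counts and the 90% parallel fraction are hypothetical, not measured from these benchmarks. A fully serial benchmark gets no speedup from extra cores, so on such tests per-core performance dominates, which may be part of why a many-core system like EPYC does not stand out.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup from `cores` cores when only `parallel_fraction`
    of the runtime can run in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical core counts, not the exact configurations of the test systems.
for cores in (4, 8, 32):
    single = amdahl_speedup(0.0, cores)  # single-threaded benchmark
    mostly = amdahl_speedup(0.9, cores)  # hypothetical 90%-parallel workload
    print(f"{cores:>2} cores: single-threaded {single:.2f}x, 90% parallel {mostly:.2f}x")
```

Even the hypothetical 90%-parallel workload tops out well below the core count, and the single-threaded case stays at 1.00x regardless of cores.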
| Benchmark | Phoronix observations | My observations | Analysis |
|---|---|---|---|
| mpcbench | Skylake shows improvement with GCC 8.1; no change for AMD or other Intel platforms. | Unable to download; the test refers to mpfr-4.0.0.tar.bz2, but the website has been updated to mpfr-4.0.1.tar.bz2. | |
| hmmer | Skylake faster, but other systems slightly slower with GCC 8.1. | Multithreaded with On_CPU of ~90% and many voluntary context switches; may not scale with cores due to I/O. Moderately high IPC, slightly higher on AMD than on Intel. | Analysis |
| fhourstones | Improvement on all platforms from GCC 7.3 to GCC 8.1. | Single-threaded; lower IPC with bad speculation and backend stalls. | Analysis |
| scimark2 | No big changes between GCC 7.3 and 8.1. | Single-threaded; five workloads, but overall more backend stalls. | Analysis |
| tscp | Small slowdowns on all systems with GCC 8.1. | Single-threaded, runs for about a second; frontend bound with branch misses. | Analysis |
| graphicsmagick: blur | Skylake improvements; other systems flat. | 31% On_CPU; both backend stalls and bad speculation. May not scale to many cores since it seems limited by I/O. | Analysis |
| graphicsmagick: sharpen | Skylake improvements; other systems flat. | 56% On_CPU; highest IPC of the overall set. May not scale to many cores since it seems limited by I/O. | Analysis |
| graphicsmagick: resize | Skylake and Ryzen 2700 improvements; other systems flat. | 33% On_CPU; very similar to blur. May not scale to many cores since it seems limited by I/O. | Analysis |
| graphicsmagick: hwb color | Skylake improvements; other systems flat. | 25% On_CPU; more backend stalls and bad speculation. May not scale to many cores since it seems limited by I/O. | Analysis |
| himeno | Incremental performance improvements with GCC 8.1. | On_CPU of 100%, single-threaded, limited by backend stalls. | Analysis |
| ebizzy | Mixed results: some platforms slowed down while others sped up. | Tests the kernel memory system. Interesting uops/instruction ratio on Intel. | Analysis |
| build-linux-kernel | Compile times with GCC 8.1 are generally higher than with GCC 7.3. | Runs ~29,000 processes, many short-lived. Limited by frontend stalls. Fairly parallel, with On_CPU of 87.5%. | Analysis |
| timed php compilation | Compile times with GCC 8.1 are generally higher than with GCC 7.3. | Runs ~44,000 processes, many short-lived. Limited by frontend stalls. Fairly parallel, with On_CPU of 82%. | Analysis |
| c-ray | Improvements on Zen and Coffee Lake, with regressions on Skylake. | 100% On_CPU with IPC of 1.4; some backend stalls. | Analysis |
| stockfish | Improvements on most systems, largest on EPYC. | On_CPU of 100%, with frontend stalls and bad speculation as the primary issues. | Analysis |
| aobench | Skylake improvements; others are flat. | Single-threaded with some backend stalls and branch misses. | Analysis |
| bullet: raytests | Skylake and Zen show slight improvements. | Single-threaded, small (7 workloads run in 5 seconds); some backend stalls. | Analysis |
| bullet: 3000 fall | Skylake and Zen show slight improvements. | Single-threaded, small (7 workloads run in 5 seconds); some backend stalls. | Analysis |
| bullet: convex trimesh | Skylake and Zen show slight improvements. | Single-threaded, small (7 workloads run in 5 seconds); some backend stalls. | Analysis |
| encode-flac | Faster on Intel systems, slightly slower on AMD systems. | Single-threaded, some I/O, with On_CPU of 90%. Backend stalls but moderate IPC. | Analysis |
| encode-mp3 | Most systems faster with GCC 8.1 than with 7.3. | Single-threaded with On_CPU of 100%. Backend stalls but moderate IPC. | Analysis |
| redis: get | Slight improvements with GCC 8.1. | Only 1% On_CPU, so this is more about latency than throughput. Less than 1 second of runtime. | Analysis |
| redis: set | Slight improvements with GCC 8.1. | Only 1% On_CPU, so this is more about latency than throughput. Less than 1 second of runtime. | Analysis |
| nginx | Mostly the same between GCC 7.3 and 8.1. | Single-threaded driver and database (the database was not measured since it is not in the process tree). What was measured shows both frontend and backend stalls. | Analysis |
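The "My observations" column uses terms from Intel's top-down microarchitecture analysis (frontend bound, bad speculation, backend bound) along with IPC and the uops/instruction ratio. The sketch below shows how the level-1 breakdown is typically derived from raw counters on 4-wide Skylake-era Intel cores. The event names in the comments follow Intel's published top-down formulas; the `Counters` class, the `summarize` function, and the sample counts are hypothetical illustrations, not data from these runs. AMD Zen exposes different events, so this breakdown does not apply there as written, and On_CPU is not modeled here.

```python
from dataclasses import dataclass

PIPELINE_WIDTH = 4  # issue slots per cycle on Skylake-era Intel cores


@dataclass
class Counters:
    """Raw event counts, e.g. gathered with a tool such as `perf stat`."""
    cycles: int                 # CPU_CLK_UNHALTED.THREAD
    instructions: int           # INST_RETIRED.ANY
    uops_issued: int            # UOPS_ISSUED.ANY
    uops_retired_slots: int     # UOPS_RETIRED.RETIRE_SLOTS
    fe_uops_not_delivered: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    recovery_cycles: int        # INT_MISC.RECOVERY_CYCLES


def summarize(c: Counters) -> dict:
    """Return IPC, uops/instruction and the top-down level-1 fractions."""
    if c.cycles <= 0 or c.instructions <= 0:
        raise ValueError("cycles and instructions must be positive")
    slots = PIPELINE_WIDTH * c.cycles
    frontend = c.fe_uops_not_delivered / slots
    # Uops issued but never retired, plus slots lost while recovering
    # from mispredictions, count as bad speculation.
    bad_spec = (c.uops_issued - c.uops_retired_slots
                + PIPELINE_WIDTH * c.recovery_cycles) / slots
    retiring = c.uops_retired_slots / slots
    backend = 1.0 - frontend - bad_spec - retiring  # the remainder
    return {
        "ipc": c.instructions / c.cycles,
        "uops_per_instruction": c.uops_retired_slots / c.instructions,
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Hypothetical counts for illustration only.
sample = Counters(cycles=1_000_000, instructions=1_200_000,
                  uops_issued=1_700_000, uops_retired_slots=1_600_000,
                  fe_uops_not_delivered=600_000, recovery_cycles=10_000)
for name, value in summarize(sample).items():
    print(f"{name:>22}: {value:.3f}")
```

Backend bound is computed as the remainder, so the four level-1 fractions always sum to one; a workload described above as "backend stall limited" would show that remainder as the largest share.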