This test times how long it takes to build the GNU Compiler Collection (GCC).
The test example below builds the then-latest compiler, gcc 7.2.0. As part of the benchmark setup, it runs a “download_prerequisites” step and builds with the toolchain and environment of the local system. This makes the benchmark portable across different environments, but it also introduces an additional variable: results might differ not only because of different processors, but also because of different execution environments.
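For reference, here is a minimal sketch (in Python, for illustration) of the build steps, assuming a gcc 7.2.0 source tarball and an out-of-tree build directory; the paths, configure flags, and -j level are illustrative rather than the exact commands the Phoronix test profile runs:

    import os
    import subprocess

    # Unpack the sources and fetch GMP/MPFR/MPC/ISL into the source tree so
    # configure builds them in-tree (this is what download_prerequisites does).
    subprocess.run(["tar", "xf", "gcc-7.2.0.tar.xz"], check=True)
    subprocess.run(["./contrib/download_prerequisites"], cwd="gcc-7.2.0", check=True)

    # Configure and build out of tree; -j would normally match the core count.
    os.makedirs("build", exist_ok=True)
    subprocess.run(["../gcc-7.2.0/configure", "--disable-multilib"], cwd="build", check=True)
    subprocess.run(["make", "-j8"], cwd="build", check=True)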
Metrics (Intel) - phoronix/build-gcc
sh - pid 22890 On_CPU 0.722 On_Core 5.775 IPC 0.868 Retire 0.345 (34.5%) FrontEnd 0.361 (36.1%) Spec 0.210 (21.0%) Backend 0.083 (8.3%)
Elapsed 1321.26 Procs 353475 Maxrss 633K Minflt 164920285 Majflt 0 Nvcsw 2584028 (68.0%) Nivcsw 1214887
Utime 7272.709321 Stime 357.995173 Start 157412.07 Finish 158733.33
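The Retire/FrontEnd/Spec/Backend fields appear to be the four categories of top-down microarchitecture analysis: the fraction of pipeline slots that retire useful work, stall waiting on the front end, are wasted on bad speculation, or stall on the back end. As a sanity check the four fractions should sum to roughly one, and here they do: 0.345 + 0.361 + 0.210 + 0.083 = 0.999.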
The overall benchmark takes 22 minutes to build. During that time, parts of the build are parallel (e.g. many compiles) and parts are serial (e.g. a single link), so the “On_CPU” value of 0.722 partly reflects inherent parallelism limits in the build; this fraction might drop further on machines with more cores, with the elapsed time improving only slightly. Having 68% of the context switches be voluntary suggests I/O waits, many short-lived processes, or both.
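The relationships among these fields can be checked directly. A small sketch, assuming On_Core is total CPU time divided by elapsed time and On_CPU is On_Core divided by the number of hardware threads (8 on this Intel system):

    # Derived metrics from the raw counters above (Intel run).
    utime, stime, elapsed = 7272.709321, 357.995173, 1321.26
    nvcsw, nivcsw, procs = 2584028, 1214887, 353475
    hw_threads = 8  # assumed for this Intel system

    on_core = (utime + stime) / elapsed      # ~5.775, matches On_Core
    on_cpu = on_core / hw_threads            # ~0.722, matches On_CPU
    voluntary = nvcsw / (nvcsw + nivcsw)     # ~0.680, the 68% figure
    per_proc = (utime + stime) / procs       # ~0.0216 s user+system per process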
Both the fraction of bad speculation and the fraction of front-end stalls are above average compared with other workloads. The total number of processes is over 350,000, and the benchmark runs three times, making this a reasonable stress test for wspy process-tree generation with more than 1,000,000 total processes. The full process tree is too large to include here, but an inventory of the processes that run follows. Many are short-lived: on average, each process accumulates only 0.022 seconds of user+system time.
136379 bash 46703 sed 28526 rm 19807 cat 15836 basename 15039 as 12145 cc1 11336 xgcc 10630 mv 4918 autoconf
4563 grep 4174 expr 4077 dirname 4053 cc1plus 3655 mkdir 2996 ld 2679 gcc 2667 collect2 2179 rmdir 2175 ln
2084 xg++ 1581 cp 1406 ? 1328 strip 1113 file 1080 which 948 uname 939 g++ 678 cmp 627 config.status
533 print 505 make 495 automake 472 sort 466 tr 463 chmod 452 mawk 350 f951 330 gfortran 285 conftest
232 touch 225 hostname 216 ar 182 ls 169 diff 168 fixincl 150 nm 146 awk 122 ranlib 80 objdump
79 echo 75 install 65 sh 64 mktemp 64 find 61 arch 54 tmpmultilib3 49 a.out 42 uniq 42 getconf
39 makeinfo 38 sleep 30 tmpmultilib4 27 cc 24 tmpmultilib 24 msgmerge 24 cut 18 msgfmt 18 dd 16 true
14 move-if-change 12 xgettext 12 mt 12 genhooks 11 cc1obj 9 genpreds 9 genmodes 8 perl 8 genchecksum
6 tmpmultilib2 6 realpath 6 od 6 ld.gold 6 genmatch 6 gengtype 6 gen-fib 6 gencfn-macros 6 gen-bases
5 pod2man 4 tar 3 tail 3 readelf 3 mkheader.sh 3 head 3 gen-trialdivtab 3 gentarget-def 3 genrecog
3 gen-psqr 3 genpeep 3 genoutput 3 genopinit 3 genmddeps 3 gen-jacobitab 3 gengenrtl 3 genflags 3 gen-fac
3 genextract 3 genenums 3 genemit 3 genconstants 3 genconfig 3 gencondmd 3 genconditions 3 gencodes
3 gencheck 3 genautomata 3 genattrtab 3 genattr-common 3 genattr 3 gcov-iov 3 date 3 c++filt 3 bison
2 pwd 1 build-gcc

Metrics (AMD) - phoronix/build-gcc
sh - pid 31588 On_CPU 0.491 On_Core 7.848 IPC 0.856 FrontCyc 0.238 (23.8%) BackCyc 0.099 (9.9%)
Elapsed 1108.81 Procs 353642 Maxrss 633K Minflt 165383257 Majflt 0 Nvcsw 2534566 (66.8%) Nivcsw 1261566
Utime 8251.230752 Stime 450.955154 Start 466348.86 Finish 467457.67
Comparing the metrics on my AMD system, the On_CPU fraction has dropped from 72% to 49%, reflecting that the workload cannot take full advantage of the extra cores because the serial parts of the build dominate. Otherwise the IPC is similar.
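Assuming the same definition of On_CPU as above, the ratio On_Core / On_CPU implies the hardware thread count of each system, consistent with the On_CPU drop coming largely from a wider machine rather than a slower one:

    # Implied hardware threads, assuming On_CPU = On_Core / hw_threads.
    intel = 5.775 / 0.722   # ~8.0  -> 8 hardware threads
    amd   = 7.848 / 0.491   # ~16.0 -> 16 hardware threads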
Plotting the total user time and system time across all cores shows the periods where the build runs in parallel and the periods where it drops, intermittently, to a single thread.
The same plot as before, but separated out by core, shows similar behavior.
The overall IPC mostly stays in a band slightly lower than 1.
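For anyone wanting to reproduce the IPC number without wspy, here is a minimal sketch using perf stat (not the tooling used here; the parsing assumes perf's -x CSV output, where the fields are value, unit, event name):

    import subprocess

    def measure_ipc(cmd):
        """Run cmd under perf stat and return instructions per cycle."""
        res = subprocess.run(
            ["perf", "stat", "-x", ",", "-e", "instructions,cycles"] + cmd,
            capture_output=True, text=True)
        counts = {}
        for line in res.stderr.splitlines():  # perf stat writes counts to stderr
            fields = line.split(",")
            if len(fields) >= 3 and fields[2] in ("instructions", "cycles"):
                counts[fields[2]] = float(fields[0])
        return counts["instructions"] / counts["cycles"]

    print(measure_ipc(["make", "-j8"]))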
The front end tends to be the largest bottleneck, but there are also moderate amounts of bad speculation.
Next steps: drill down into the front-end stalls to characterize them further.
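As a starting point for that drill-down, the standard top-down formula for the front-end-bound fraction, as a sketch assuming Intel's idq_uops_not_delivered.core event, unhalted core cycles, and four issue slots per cycle:

    # Front-end-bound fraction per the top-down method (Yasin): issue slots
    # lost because the front end delivered no uops, out of all issue slots.
    def frontend_bound(idq_uops_not_delivered_core, core_cycles, slots_per_cycle=4):
        return idq_uops_not_delivered_core / (slots_per_cycle * core_cycles)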