This test times how long it takes to build the GNU Compiler Collection (GCC).
The test example below builds the then-latest compiler, gcc 7.2.0. As part of the benchmark setup, it runs a “download_prerequisites” step and builds with the toolchain and environment of the local system. This makes the benchmark portable across different environments, but it also introduces an additional variable: results might differ not only because of different processors, but also because of different execution environments.
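For reference, here is a minimal sketch (in Python, for illustration) of the build steps, assuming a gcc 7.2.0 source tarball and an out-of-tree build directory; the paths, configure flags, and -j level are illustrative rather than the exact commands the Phoronix test profile runs:

    import os
    import subprocess

    # Unpack the sources and fetch GMP/MPFR/MPC/ISL into the source tree so
    # configure builds them in-tree (this is what download_prerequisites does).
    subprocess.run(["tar", "xf", "gcc-7.2.0.tar.xz"], check=True)
    subprocess.run(["./contrib/download_prerequisites"], cwd="gcc-7.2.0", check=True)

    # Configure and build out of tree; -j would normally match the core count.
    os.makedirs("build", exist_ok=True)
    subprocess.run(["../gcc-7.2.0/configure", "--disable-multilib"], cwd="build", check=True)
    subprocess.run(["make", "-j8"], cwd="build", check=True)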
Metrics (Intel) - phoronix/build-gcc
sh - pid 22890 On_CPU 0.722 On_Core 5.775 IPC 0.868 Retire 0.345 (34.5%) FrontEnd 0.361 (36.1%) Spec 0.210 (21.0%) Backend 0.083 (8.3%)
Elapsed 1321.26 Procs 353475 Maxrss 633K Minflt 164920285 Majflt 0 Nvcsw 2584028 (68.0%) Nivcsw 1214887
Utime 7272.709321 Stime 357.995173 Start 157412.07 Finish 158733.33
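The Retire/FrontEnd/Spec/Backend fields appear to be the four categories of top-down microarchitecture analysis: the fraction of pipeline slots that retire useful work, stall waiting on the front end, are wasted on bad speculation, or stall on the back end. As a sanity check the four fractions should sum to roughly one, and here they do: 0.345 + 0.361 + 0.210 + 0.083 = 0.999.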
The overall benchmark takes 22 minutes to build. During that time, parts of the build are parallel (e.g. many compiles) and parts are serial (e.g. a single link), so the “On_CPU” value of 0.722 partly reflects inherent parallelism limits in the build; this fraction might drop further on machines with more cores, with the elapsed time improving only slightly. Having 68% of the context switches be voluntary suggests I/O waits, many short-lived processes, or both.
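The relationships among these fields can be checked directly. A small sketch, assuming On_Core is total CPU time divided by elapsed time and On_CPU is On_Core divided by the number of hardware threads (8 on this Intel system):

    # Derived metrics from the raw counters above (Intel run).
    utime, stime, elapsed = 7272.709321, 357.995173, 1321.26
    nvcsw, nivcsw, procs = 2584028, 1214887, 353475
    hw_threads = 8  # assumed for this Intel system

    on_core = (utime + stime) / elapsed      # ~5.775, matches On_Core
    on_cpu = on_core / hw_threads            # ~0.722, matches On_CPU
    voluntary = nvcsw / (nvcsw + nivcsw)     # ~0.680, the 68% figure
    per_proc = (utime + stime) / procs       # ~0.0216 s user+system per process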
Both the fraction of bad speculation and the fraction of front-end stalls are above average compared with other workloads. The total number of processes is over 350,000, and the benchmark runs three times, making this a reasonable stress test for wspy process-tree generation with more than 1,000,000 total processes. The full process tree is too large to include here, but an inventory of the processes that run follows. Many are short-lived: on average, each process accumulates only 0.022 seconds of user+system time.
136379 bash 46703 sed 28526 rm 19807 cat 15836 basename 15039 as 12145 cc1 11336 xgcc 10630 mv 4918 autoconf
4563 grep 4174 expr 4077 dirname 4053 cc1plus 3655 mkdir 2996 ld 2679 gcc 2667 collect2 2179 rmdir 2175 ln
2084 xg++ 1581 cp 1406 ? 1328 strip 1113 file 1080 which 948 uname 939 g++ 678 cmp 627 config.status
533 print 505 make 495 automake 472 sort 466 tr 463 chmod 452 mawk 350 f951 330 gfortran 285 conftest
232 touch 225 hostname 216 ar 182 ls 169 diff 168 fixincl 150 nm 146 awk 122 ranlib 80 objdump
79 echo 75 install 65 sh 64 mktemp 64 find 61 arch 54 tmpmultilib3 49 a.out 42 uniq 42 getconf
39 makeinfo 38 sleep 30 tmpmultilib4 27 cc 24 tmpmultilib 24 msgmerge 24 cut 18 msgfmt 18 dd 16 true
14 move-if-change 12 xgettext 12 mt 12 genhooks 11 cc1obj 9 genpreds 9 genmodes 8 perl 8 genchecksum
6 tmpmultilib2 6 realpath 6 od 6 ld.gold 6 genmatch 6 gengtype 6 gen-fib 6 gencfn-macros 6 gen-bases
5 pod2man 4 tar 3 tail 3 readelf 3 mkheader.sh 3 head 3 gen-trialdivtab 3 gentarget-def 3 genrecog
3 gen-psqr 3 genpeep 3 genoutput 3 genopinit 3 genmddeps 3 gen-jacobitab 3 gengenrtl 3 genflags 3 gen-fac
3 genextract 3 genenums 3 genemit 3 genconstants 3 genconfig 3 gencondmd 3 genconditions 3 gencodes
3 gencheck 3 genautomata 3 genattrtab 3 genattr-common 3 genattr 3 gcov-iov 3 date 3 c++filt 3 bison
2 pwd 1 build-gcc

Metrics (AMD) - phoronix/build-gcc
sh - pid 31588 On_CPU 0.491 On_Core 7.848 IPC 0.856 FrontCyc 0.238 (23.8%) BackCyc 0.099 (9.9%)
Elapsed 1108.81 Procs 353642 Maxrss 633K Minflt 165383257 Majflt 0 Nvcsw 2534566 (66.8%) Nivcsw 1261566
Utime 8251.230752 Stime 450.955154 Start 466348.86 Finish 467457.67
Comparing the metrics on my AMD system, the On_CPU fraction has dropped from 72% to 49%, reflecting that the workload cannot take full advantage of the extra cores because the serial parts of the build dominate. Otherwise the IPC is similar.
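Assuming the same definition of On_CPU as above, the ratio On_Core / On_CPU implies the hardware thread count of each system, consistent with the On_CPU drop coming largely from a wider machine rather than a slower one:

    # Implied hardware threads, assuming On_CPU = On_Core / hw_threads.
    intel = 5.775 / 0.722   # ~8.0  -> 8 hardware threads
    amd   = 7.848 / 0.491   # ~16.0 -> 16 hardware threads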
Plotting the total user time and system time across all cores shows the periods where the build runs in parallel and the periods where it drops, intermittently, to a single thread.
The same plot as before, but separated out by core, shows similar behavior.
The overall IPC mostly stays in a band slightly lower than 1.
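For anyone wanting to reproduce the IPC number without wspy, here is a minimal sketch using perf stat (not the tooling used here; the parsing assumes perf's -x CSV output, where the fields are value, unit, event name):

    import subprocess

    def measure_ipc(cmd):
        """Run cmd under perf stat and return instructions per cycle."""
        res = subprocess.run(
            ["perf", "stat", "-x", ",", "-e", "instructions,cycles"] + cmd,
            capture_output=True, text=True)
        counts = {}
        for line in res.stderr.splitlines():  # perf stat writes counts to stderr
            fields = line.split(",")
            if len(fields) >= 3 and fields[2] in ("instructions", "cycles"):
                counts[fields[2]] = float(fields[0])
        return counts["instructions"] / counts["cycles"]

    print(measure_ipc(["make", "-j8"]))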
The front end tends to be the largest bottleneck, but there are also moderate amounts of bad speculation.
Next steps: drill down into the front-end stalls to characterize them further.
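As a starting point for that drill-down, the standard top-down formula for the front-end-bound fraction, as a sketch assuming Intel's idq_uops_not_delivered.core event, unhalted core cycles, and four issue slots per cycle:

    # Front-end-bound fraction per the top-down method (Yasin): issue slots
    # lost because the front end delivered no uops, out of all issue slots.
    def frontend_bound(idq_uops_not_delivered_core, core_cycles, slots_per_cycle=4):
        return idq_uops_not_delivered_core / (slots_per_cycle * core_cycles)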