Description - phoronix/build-php

This test times how long it takes to build PHP 5 with the Zend engine.

An individual benchmark run takes ~100 seconds. During that time, ~44,400 mostly short-lived processes are run. This makes the build about 1/8th as large as the build-gcc workload and 1.5x larger than build-linux-kernel. The On_CPU percentage is ~82%, so while there are some sequential periods, this is mostly a parallel workload that keeps the cores busy.

Metrics (Intel) - phoronix/build-php
sh - pid 27726
	On_CPU   0.819
	On_Core  6.552
	IPC      0.768
	Retire   0.378	(37.8%)
	FrontEnd 0.332	(33.2%)
	Spec     0.122	(12.2%)
	Backend  0.169	(16.9%)
	Elapsed  99.00
	Procs    44411
	Maxrss   450K
	Minflt   13820421
	Majflt   0
	Inblock  0
	Oublock  1415056
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    303470	(56.6%)
	Nivcsw   233106
	Utime    623.132218
	Stime    25.527337
	Start    589839.45
	Finish   589938.45

Overall metrics show an IPC of 0.77, with frontend stalls (33.2%) as the largest stall category and speculation (12.2%) also fairly high. The Inblock count of 0 in the list above looks suspect, but there is clearly some I/O: Oublock is ~1.4 million, and ~57% of the context switches are voluntary.
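The percentage printed next to Nvcsw appears to be the voluntary share of all context switches, Nvcsw / (Nvcsw + Nivcsw). That formula is an assumption, not something the tool output states, but it reproduces both listed values. A minimal sketch, with a hypothetical helper name:

```python
# Context-switch counts copied from the metrics listings on this page.
# ASSUMPTION: the tool reports Nvcsw / (Nvcsw + Nivcsw) as the percentage.
RUNS = {
    "intel": {"nvcsw": 303470, "nivcsw": 233106},
    "amd": {"nvcsw": 301119, "nivcsw": 273760},
}


def voluntary_share(nvcsw: int, nivcsw: int) -> float:
    """Fraction of all context switches that were voluntary."""
    total = nvcsw + nivcsw
    return nvcsw / total if total else 0.0


shares = {name: voluntary_share(**counts) for name, counts in RUNS.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%} voluntary")
```

A voluntary switch means the process gave up the CPU itself (for example, blocking on a pipe, a child process, or disk), which is why a share above 50% is read here as a sign of I/O or waiting.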

Metrics (AMD) - phoronix/build-php
sh - pid 29424
	On_CPU   0.571
	On_Core  9.139
	IPC      0.753
	FrontCyc 0.118	(11.8%)
	BackCyc  0.118	(11.8%)
	Elapsed  80.03
	Procs    44411
	Maxrss   450K
	Minflt   13907559
	Majflt   0
	Inblock  0
	Oublock  1414952
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    301119	(52.4%)
	Nivcsw   273760
	Utime    694.810475
	Stime    36.567826
	Start    599345.94
	Finish   599425.97

AMD metrics show the On_CPU percentage dropping to 57%, so the workload no longer scales fully when the number of cores is doubled. The IPC is fairly close to Intel's (0.75 vs. 0.77).
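The On_Core and On_CPU figures can be cross-checked against the raw times in the two listings. The sketch below assumes that On_Core is (Utime + Stime) / Elapsed and that On_CPU is On_Core divided by the number of cores. Neither formula is stated by the tool, but both reproduce the listed values, and they imply 8 cores on the Intel system and 16 on the AMD system. The class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Times and On_CPU value copied from one metrics listing."""
    utime: float    # user CPU seconds summed over all processes
    stime: float    # system CPU seconds summed over all processes
    elapsed: float  # wall-clock seconds
    on_cpu: float   # On_CPU value as printed by the tool

    @property
    def on_core(self) -> float:
        # ASSUMPTION: On_Core is the average number of busy cores.
        return (self.utime + self.stime) / self.elapsed

    @property
    def implied_cores(self) -> float:
        # ASSUMPTION: On_CPU = On_Core / core count.
        return self.on_core / self.on_cpu


intel = Run(utime=623.132218, stime=25.527337, elapsed=99.00, on_cpu=0.819)
amd = Run(utime=694.810475, stime=36.567826, elapsed=80.03, on_cpu=0.571)

for name, run in (("intel", intel), ("amd", amd)):
    print(f"{name}: on_core={run.on_core:.3f} "
          f"implied_cores={run.implied_cores:.1f}")
```

Read this way, the AMD run keeps about 9.1 cores busy against 6.6 on Intel, but elapsed time only falls from 99 to 80 seconds, which matches the incomplete scaling noted above.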

Process Tree - phoronix/build-php
The process tree is too large to show here, so a summary of how many processes of each type were run is included instead.

  25821 bash
  11978 sed
   1429 rm
   1068 cat
    976 grep
    716 x86_64-linux-gn
    716 cc
    712 cc1
    396 tr
    356 mv
    141 shtool
     40 mkdir
     18 ?
      9 wc
      9 awk
      4 collect2
      3 expr
      3 cp
      2 sort
      2 ranlib
      2 php
      2 chmod
      1 time-compile-ph
      1 sh
      1 make
      1 m4
      1 ln
      1 ldconfig.real
      1 bison
      1 ar

Interestingly, the build spawns a large number of short-lived bash and sed processes relative to the number of “cc” compilations.
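To put a number on that proportion, the summary above can be loaded into a counter and reduced to a couple of ratios. This is only a sketch over the counts listed on this page; the variable names are hypothetical.

```python
from collections import Counter

# Per-command process counts copied from the summary above.
counts = Counter({
    "bash": 25821, "sed": 11978, "rm": 1429, "cat": 1068, "grep": 976,
    "x86_64-linux-gn": 716, "cc": 716, "cc1": 712, "tr": 396, "mv": 356,
    "shtool": 141, "mkdir": 40, "?": 18, "wc": 9, "awk": 9, "collect2": 4,
    "expr": 3, "cp": 3, "sort": 2, "ranlib": 2, "php": 2, "chmod": 2,
    "time-compile-ph": 1, "sh": 1, "make": 1, "m4": 1, "ln": 1,
    "ldconfig.real": 1, "bison": 1, "ar": 1,
})

total = sum(counts.values())
shell_helpers = counts["bash"] + counts["sed"]

# Share of all processes that are bash or sed.
helper_share = shell_helpers / total
# bash/sed processes per "cc" compiler driver invocation.
helpers_per_cc = shell_helpers / counts["cc"]

print(f"total processes:  {total}")
print(f"bash + sed share: {helper_share:.1%}")
print(f"bash + sed per cc: {helpers_per_cc:.1f}")
```

The total matches the Procs value of 44411 in the metrics, and bash plus sed account for roughly 85% of all processes, or about 53 for every “cc” invocation.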


Adding up the time across all cores shows utilization close to 100% for most of the run; only at the end does it drop as parallel activity decreases.

Breaking this down by individual core shows a similar distribution.

Topdown metrics show frontend stalls as the largest limiter, with backend stalls showing up during peak periods (possibly the linking phase).

Overall topdown metrics.

retire                     0.344
    ms_uops                0.012
speculation                0.115
    branch_misses          94.73%
    machine_clears         5.27%
frontend                   0.312
    idq_uops_delivered_0   0.091
    idq_uops_delivered_1   0.133
    idq_uops_delivered_2   0.177
    idq_uops_delivered_3   0.224
backend                    0.228
    resource_stalls.sb     0.035
    stalls_ldm_pending     0.432

The frontend breakdown still needs to be built out beyond counting how often fewer than 4 uops are delivered. The speculation losses are almost entirely branch misses (~95%) rather than machine clears, and the backend stalls are moderate.
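As a quick sanity check on the overall topdown numbers, the four top-level categories should account for essentially all issue slots, and the largest non-retiring category is the main limiter. A minimal sketch using the values listed above (the dictionary and variable names are hypothetical):

```python
# Top-level topdown fractions copied from the overall listing above.
level1 = {
    "retire": 0.344,       # useful work
    "speculation": 0.115,  # work thrown away (mostly branch misses)
    "frontend": 0.312,     # slots with no uops delivered
    "backend": 0.228,      # slots blocked on execution or memory
}

# The four categories partition the slots, so they should sum to ~1.0.
total = sum(level1.values())

# Everything except "retire" is lost capacity; find the biggest loss.
losses = {name: frac for name, frac in level1.items() if name != "retire"}
largest_loss = max(losses, key=losses.get)

print(f"sum of categories: {total:.3f}")
print(f"largest limiter:   {largest_loss} ({losses[largest_loss]:.1%})")
```

The sum comes out within rounding of 1.0, and frontend is the largest loss, consistent with the plan to dig into frontend stalls first.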

Next steps: Dig deeper into the frontend stalls.