PolyBench-C is a C-language polyhedral benchmark suite made at the Ohio State University.
A link to the polybench-c page is here.
Phoronix runs these benchmarks with the LARGE dataset, so the working set does not fit in the L3 cache and the workloads are memory-bound overall. The code is single-threaded, and the tests below were pinned to one core. Three workloads are run in order:
- covariance
- correlation
- matrix multiplication (3mm)
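To make the first workload concrete, here is a minimal sketch of a sample covariance computation. It is a generic textbook formulation written for illustration, not the PolyBench source; the function name `covariance` and the loop order are assumptions. The innermost loop walks down two columns of the data while `k` steps through the rows, which in a row-major C array would be a strided access. That pattern is consistent with the memory-bound behaviour described above once the data set no longer fits in cache.

```python
def covariance(data):
    """Return the M x M sample covariance of an N x M data set (rows are observations)."""
    n = len(data)
    m = len(data[0])

    # Column means.
    mean = [sum(row[j] for row in data) / n for j in range(m)]

    # Subtract the mean of each column from every entry in that column.
    centered = [[row[j] - mean[j] for j in range(m)] for row in data]

    cov = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            acc = 0.0
            # k indexes rows, so columns i and j are read top to bottom.
            for k in range(n):
                acc += centered[k][i] * centered[k][j]
            cov[i][j] = acc / (n - 1)
            cov[j][i] = cov[i][j]  # the matrix is symmetric
    return cov


if __name__ == "__main__":
    sample = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
    for row in covariance(sample):
        print(row)
```

The sample input is hypothetical and only exercises the function; it says nothing about the data sizes used by the benchmark.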
Metrics (Intel) - phoronix/polybench-c
Metrics for the workloads:
sh - pid 14672 // covariance
    On_CPU 0.125  On_Core 0.999  IPC 0.149
    Retire 0.030 (3.0%)  FrontEnd 0.006 (0.6%)  Spec 0.008 (0.8%)  Backend 0.956 (95.6%)
    Elapsed 10.66  Procs 3  Maxrss 46K  Minflt 14527  Majflt 0
    Inblock 0  Oublock 16  Msgsnd 0  Msgrcv 0  Nsignals 0
    Nvcsw 18 (45.0%)  Nivcsw 22
    Utime 10.637998  Stime 0.012624
    Start 85156.99  Finish 85167.65
sh - pid 14683 // correlation
    On_CPU 0.125  On_Core 1.000  IPC 0.149
    Retire 0.030 (3.0%)  FrontEnd 0.006 (0.6%)  Spec 0.008 (0.8%)  Backend 0.956 (95.6%)
    Elapsed 10.66  Procs 3  Maxrss 46K  Minflt 14538  Majflt 0
    Inblock 0  Oublock 16  Msgsnd 0  Msgrcv 0  Nsignals 0
    Nvcsw 18 (50.0%)  Nivcsw 18
    Utime 10.647670  Stime 0.015996
    Start 85199.00  Finish 85209.66
sh - pid 14696 // matrix multiply
    On_CPU 0.125  On_Core 1.000  IPC 0.473
    Retire 0.081 (8.1%)  FrontEnd 0.002 (0.2%)  Spec 0.021 (2.1%)  Backend 0.895 (89.5%)
    Elapsed 10.38  Procs 3  Maxrss 64K  Minflt 21502  Majflt 0
    Inblock 0  Oublock 16  Msgsnd 0  Msgrcv 0  Nsignals 0
    Nvcsw 18 (52.9%)  Nivcsw 16
    Utime 10.350641  Stime 0.031998
    Start 85241.09  Finish 85251.47
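Overall, On_Core is 100% with low IPC, and the workloads are heavily backend (memory) bound: 95.6% for covariance and correlation, and 89.5% for matrix multiply. Matrix multiply is the least memory bound of the three, with an IPC of 0.473 against 0.149 for the other two.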
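The On_Core figures can be cross-checked against the other fields in the listings. The sketch below assumes that On_Core is user plus system CPU time divided by elapsed wall-clock time; that definition is inferred from the numbers and is not stated anywhere in this page. The helper name `on_core` is hypothetical.

```python
# (Utime, Stime, Elapsed, reported On_Core) copied from the Intel and AMD listings.
RUNS = {
    "intel covariance":      (10.637998, 0.012624, 10.66, 0.999),
    "intel correlation":     (10.647670, 0.015996, 10.66, 1.000),
    "intel matrix multiply": (10.350641, 0.031998, 10.38, 1.000),
    "amd covariance":        (3.847715, 0.012120, 3.86, 1.000),
    "amd correlation":       (3.864962, 0.020848, 3.89, 0.999),
    "amd matrix multiply":   (4.219678, 0.021351, 4.25, 0.998),
}


def on_core(utime, stime, elapsed):
    """Assumed definition: CPU seconds consumed per second of wall-clock time."""
    return (utime + stime) / elapsed


for name, (utime, stime, elapsed, reported) in RUNS.items():
    computed = on_core(utime, stime, elapsed)
    print(f"{name:22s} computed {computed:.3f}  reported {reported:.3f}")
```

Under that assumption the computed values agree with the reported ones to within the rounding of Elapsed, which supports reading On_Core as the fraction of the run spent executing on a core.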
Metrics (AMD) - phoronix/polybench
sh - pid 4917 // covariance
    On_CPU 0.062  On_Core 1.000  IPC 0.426
    FrontCyc 0.001 (0.1%)  BackCyc 0.071 (7.1%)
    Elapsed 3.86  Procs 3  Maxrss 45K  Minflt 14530  Majflt 0
    Inblock 0  Oublock 16  Msgsnd 0  Msgrcv 0  Nsignals 0
    Nvcsw 18 (4.5%)  Nivcsw 385
    Utime 3.847715  Stime 0.012120
    Start 154067.80  Finish 154071.66
sh - pid 4928 // correlation
    On_CPU 0.062  On_Core 0.999  IPC 0.424
    FrontCyc 0.001 (0.1%)  BackCyc 0.070 (7.0%)
    Elapsed 3.89  Procs 3  Maxrss 46K  Minflt 14541  Majflt 0
    Inblock 0  Oublock 16  Msgsnd 0  Msgrcv 0  Nsignals 0
    Nvcsw 18 (4.4%)  Nivcsw 387
    Utime 3.864962  Stime 0.020848
    Start 154089.34  Finish 154093.23
sh - pid 4939 // matrix multiply
    On_CPU 0.062  On_Core 0.998  IPC 1.202
    FrontCyc 0.001 (0.1%)  BackCyc 0.171 (17.1%)
    Elapsed 4.25  Procs 3  Maxrss 64K  Minflt 21506  Majflt 0
    Inblock 0  Oublock 16  Msgsnd 0  Msgrcv 0  Nsignals 0
    Nvcsw 18 (4.0%)  Nivcsw 431
    Utime 4.219678  Stime 0.021351
    Start 154110.93  Finish 154115.18
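IPC on AMD is considerably higher: roughly 2.5 to 2.9 times the Intel figure (for example 0.426 against 0.149 for covariance), and elapsed time is shorter by a similar factor. Perhaps different instructions are being used.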
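The size of the gap between the two machines can be read straight off the listings. The sketch below uses only the IPC and Elapsed values reported above; the helper name `compare` is hypothetical.

```python
# (IPC, Elapsed seconds) per workload, copied from the two listings.
INTEL = {
    "covariance": (0.149, 10.66),
    "correlation": (0.149, 10.66),
    "matrix multiply": (0.473, 10.38),
}
AMD = {
    "covariance": (0.426, 3.86),
    "correlation": (0.424, 3.89),
    "matrix multiply": (1.202, 4.25),
}


def compare(intel, amd):
    """Return {workload: (AMD/Intel IPC ratio, Intel/AMD elapsed-time ratio)}."""
    ratios = {}
    for name, (intel_ipc, intel_elapsed) in intel.items():
        amd_ipc, amd_elapsed = amd[name]
        ratios[name] = (amd_ipc / intel_ipc, intel_elapsed / amd_elapsed)
    return ratios


for name, (ipc_ratio, time_ratio) in compare(INTEL, AMD).items():
    print(f"{name:16s} IPC x{ipc_ratio:.2f}  speedup x{time_ratio:.2f}")
```

The IPC ratio and the elapsed-time ratio come out close to each other for every workload. That would be expected if both machines retire a similar number of instructions at comparable clock speeds, but the page states neither, so this is only a rough consistency check.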
Process Tree - phoronix/polybench-c
The process trees are simple:
14672) sh
    14673) polybench-c
        14674) covariance_benc
14683) sh
    14684) polybench-c
        14685) correlation_ben
14696) sh
    14697) polybench-c
        14698) 3mm_bench
Overall 100% On_Core.
IPC is very low.
Backend stalls are the key issue.
retire 0.061
    ms_uops 0.003
speculation 0.002
    branch_misses 16.32%
    machine_clears 83.68%
frontend 0.011
    idq_uops_delivered_0 0.003
    icache_stall 0.001
    itlb_misses 0.000
    idq_uops_delivered_1 0.005
    idq_uops_delivered_2 0.006
    idq_uops_delivered_3 0.008
    dsb_ops 5.26%
backend 0.926
    resource_stalls.sb 0.002
    stalls_ldm_pending 0.921
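Overall, the stalls are related to memory reads: stalls_ldm_pending (0.921) accounts for nearly all of the backend fraction (0.926), while store-buffer stalls (resource_stalls.sb, 0.002) are negligible.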
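As a quick check on the breakdown above, the four top-level categories should account for the whole pipeline, and the pending-load stalls can be expressed as a share of the backend fraction. The sketch below uses the figures from the listing; the variable names are hypothetical, and treating retire, speculation, frontend and backend as the four top-level categories is an assumption based on their summing to one.

```python
# Top-level fractions from the breakdown (assumed to partition the pipeline slots).
TOPDOWN = {
    "retire": 0.061,
    "speculation": 0.002,
    "frontend": 0.011,
    "backend": 0.926,
}

# Backend detail counters from the same listing.
STALLS_LDM_PENDING = 0.921   # stalls with a load pending
RESOURCE_STALLS_SB = 0.002   # stalls on a full store buffer

total = sum(TOPDOWN.values())
load_share = STALLS_LDM_PENDING / TOPDOWN["backend"]
store_share = RESOURCE_STALLS_SB / TOPDOWN["backend"]

print(f"top-level total      {total:.3f}")
print(f"load share of backend  {load_share:.1%}")
print(f"store share of backend {store_share:.1%}")
```

With these inputs the load share is above 99% of the backend fraction, which is why the stalls are attributed to reads rather than writes.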
Next steps: None