Metrics (Intel) - phoronix/tensorflow
This is a benchmark of the TensorFlow deep learning framework using the CIFAR-10 data set.
sh - pid 20011
On_CPU 0.873 On_Core 6.987
IPC 1.062 Retire 0.559 (55.9%) FrontEnd 0.101 (10.1%) Spec 0.025 (2.5%) Backend 0.315 (31.5%)
Elapsed 91.25 Procs 50 Maxrss 623K Minflt 20401674 Majflt 0
Inblock 16 Oublock 397768 Msgsnd 0 Msgrcv 0 Nsignals 0
Nvcsw 1043029 (83.0%) Nivcsw 213430
Utime 591.999773 Stime 45.550762 Start 753175.79 Finish 753267.04
The program has a steady amount of kernel time as well as user time and some output I/O, leaving an On_CPU of 87%. The largest limiter is backend stalls (31.5%).
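As a rough cross-check, the On_Core and On_CPU figures can be rebuilt from the Utime, Stime and Elapsed values in the Intel metrics above. This is a minimal sketch, not the tool's own calculation. The count of 8 logical CPUs is an assumption inferred from the ratio On_Core / On_CPU; the report does not state it.

```python
# Values copied from the Intel metrics above.
utime = 591.999773    # user CPU seconds, summed over all processes
stime = 45.550762     # kernel CPU seconds, summed over all processes
elapsed = 91.25       # wall-clock seconds
nvcsw = 1043029       # voluntary context switches
nivcsw = 213430       # involuntary context switches

# ASSUMPTION: 8 logical CPUs (inferred from On_Core / On_CPU, not stated in the report).
ncpus = 8

on_core = (utime + stime) / elapsed          # average number of busy cores
on_cpu = on_core / ncpus                     # fraction of total CPU capacity in use
voluntary_share = nvcsw / (nvcsw + nivcsw)   # share of context switches that were voluntary

print(f"On_Core {on_core:.3f}")
print(f"On_CPU  {on_cpu:.3f}")
print(f"Nvcsw   {voluntary_share:.1%}")
```

Under that assumption the results agree with the reported On_Core, On_CPU and Nvcsw percentage to the printed precision.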
Metrics (AMD) - phoronix/tensorflow
sh - pid 16175
On_CPU 0.722 On_Core 11.546
IPC 0.891 FrontCyc 0.017 (1.7%) BackCyc 0.050 (5.0%)
Elapsed 80.41 Procs 214 Maxrss 842K Minflt 20562855 Majflt 0
Inblock 8 Oublock 397680 Msgsnd 0 Msgrcv 0 Nsignals 0
Nvcsw 1631884 (85.8%) Nivcsw 269298
Utime 851.494691 Stime 76.885744 Start 2520.93 Finish 2601.34
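IPC for AMD (0.891) is ~16% lower than Intel (1.062), although the AMD run finishes sooner (80.41 versus 91.25 elapsed) while keeping more cores busy (On_Core 11.546 versus 6.987).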
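The relative differences between the two runs follow directly from the reported IPC and Elapsed values. A small sketch of that arithmetic, using only numbers from the two metrics dumps (the variable names are illustrative):

```python
# Reported values from the Intel and AMD metrics.
ipc_intel, ipc_amd = 1.062, 0.891
elapsed_intel, elapsed_amd = 91.25, 80.41

# Relative gaps, both expressed against the Intel run.
ipc_gap = (ipc_intel - ipc_amd) / ipc_intel
elapsed_gap = (elapsed_intel - elapsed_amd) / elapsed_intel

print(f"AMD IPC is {ipc_gap:.1%} lower than Intel")
print(f"AMD elapsed time is {elapsed_gap:.1%} shorter than Intel")
```

Lower IPC alongside a shorter elapsed time is consistent with the AMD run spreading the work over more cores rather than doing more per cycle.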
Process Tree - phoronix/tensorflow
Multiple python processes started on each core
20011) sh
20012) tensorflow
20013) python3
20021) ldconfig.real
20022) sh
20023) uname
20014) python3
20015) python3
20016) python3
20017) python3
20018) python3
20019) python3
20020) python3
20024) python3
20025) python3
20026) python3
20027) python3
20028) python3
20029) python3
20030) python3
20031) python3
20032) python3
20033) python3
20034) python3
20035) python3
20036) python3
20037) python3
20038) python3
20039) python3
20040) python3
20041) python3
20042) python3
20043) python3
20044) python3
20045) python3
20046) python3
20047) python3
20048) python3
20049) python3
20050) python3
20051) python3
20052) python3
20053) python3
20054) python3
20055) python3
20056) python3
20057) python3
20058) python3
20059) python3
20060) python3

Adding times for all the CPUs shows a small amount of system time.
Overall, the workload is scheduled on all cores.

IPC is consistently around 1.
Shows how backend stalls are the largest issue.
retire 0.526
  ms_uops 0.016
speculation 0.024
  branch_misses 16.70%
  machine_clears 83.30%
frontend 0.099
  idq_uops_delivered_0 0.035
  idq_uops_delivered_1 0.042
  idq_uops_delivered_2 0.052
  idq_uops_delivered_3 0.067
backend 0.351
  resource_stalls.sb 0.109
  stalls_ldm_pending 0.319
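Backend stalls dominate the non-retiring slots (0.351), and most of that time has loads pending on the memory subsystem (stalls_ldm_pending 0.319).
Next steps: Dig deeper into backend stalls.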
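The top-down breakdown can also be checked mechanically. The sketch below parses the name/value pairs from the breakdown and picks the largest non-retiring category. The `parse_metrics` helper is hypothetical, written for this illustration and not part of the measurement tool.

```python
# Top-down breakdown as reported, flattened into name/value pairs.
line = ("retire 0.526 ms_uops 0.016 speculation 0.024 branch_misses 16.70% "
        "machine_clears 83.30% frontend 0.099 idq_uops_delivered_0 0.035 "
        "idq_uops_delivered_1 0.042 idq_uops_delivered_2 0.052 "
        "idq_uops_delivered_3 0.067 backend 0.351 resource_stalls.sb 0.109 "
        "stalls_ldm_pending 0.319")

def parse_metrics(text):
    """Hypothetical helper: turn 'name value name value ...' into a dict of floats."""
    tokens = text.split()
    metrics = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if value.endswith("%"):
            metrics[name] = float(value[:-1]) / 100.0   # percentages become fractions
        else:
            metrics[name] = float(value)
    return metrics

metrics = parse_metrics(line)

# The four top-level categories should account for all pipeline slots.
top_level = ["retire", "speculation", "frontend", "backend"]
total = sum(metrics[name] for name in top_level)

# Largest category other than useful (retired) work.
stalls = {name: metrics[name] for name in top_level if name != "retire"}
largest = max(stalls, key=stalls.get)

print(f"top-level total: {total:.3f}")
print(f"largest limiter: {largest} ({stalls[largest]:.3f})")
```

The four top-level values sum to 1.000, and backend is the largest of the three stall categories.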
