Metrics (Intel) - phoronix/primesieve

Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance.
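Primesieve itself is a heavily optimized, segmented C++ implementation; as a rough illustration of the underlying algorithm (not the benchmark's actual code), here is a minimal unsegmented sieve sketch:

```python
def primes_up_to(n):
    """Classic sieve of Eratosthenes: return all primes <= n."""
    # One byte per candidate; 1 means "possibly prime".
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Cross off multiples starting at p*p; smaller multiples
            # were already crossed off by smaller primes.
            count = len(range(p * p, n + 1, p))
            is_prime[p * p::p] = bytearray(count)
    return [i for i in range(n + 1) if is_prime[i]]
```

The repeated strided writes over the byte array are why this kind of workload stresses the cache hierarchy: once n outgrows L1/L2, each crossing-off pass streams through memory.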
sh - pid 13547 On_CPU 0.998 On_Core 7.982 IPC 0.688 Retire 0.373 (37.3%) FrontEnd 0.137 (13.7%) Spec 0.173 (17.3%) Backend 0.317 (31.7%) Elapsed 83.16 Procs 11 Maxrss 45K Minflt 20280 Majflt 0 Inblock 0 Oublock 16 Msgsnd 0 Msgrcv 0 Nsignals 0 Nvcsw 192 (5.0%) Nivcsw 3671 Utime 663.706994 Stime 0.058072 Start 85394.72 Finish 85477.88
The code is On_CPU almost 100% of the time. There are a fair number of backend stalls, consistent with the "test of cache performance" note in the description, along with a moderate amount of bad speculation.
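As a sanity check on those numbers: the four top-down level-1 categories (retiring, frontend bound, bad speculation, backend bound) partition all pipeline slots, so they should sum to roughly 1.0. Recomputing from the Intel values above:

```python
# Top-down level-1 slot fractions copied from the Intel run above.
retire, frontend, bad_spec, backend = 0.373, 0.137, 0.173, 0.317

# The four categories partition all issue slots, so the sum should be ~1.0.
total = retire + frontend + bad_spec + backend  # ~1.000
```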
Metrics (AMD) - phoronix/primesieve

sh - pid 28397 On_CPU 0.994 On_Core 15.911 IPC 0.788 FrontCyc 0.000 (0.0%) BackCyc 0.000 (0.0%) Elapsed 37.81 Procs 19 Maxrss 24K Minflt 9960 Majflt 0 Inblock 0 Oublock 16 Msgsnd 0 Msgrcv 0 Nsignals 0 Nvcsw 657 (1.1%) Nivcsw 60232 Utime 601.568254 Stime 0.043967 Start 956973.83 Finish 957011.64
IPC on AMD (0.788) is just slightly higher than on Intel (0.688).
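For reference, On_Core is just total CPU time divided by wall-clock time, and On_CPU normalizes that by the number of hardware threads. A quick recomputation from the AMD run's raw fields (the thread count of 16 is my assumption, inferred from On_Core ≈ 15.9; it is not stated in the output):

```python
# Raw fields from the AMD run above.
utime, stime, elapsed = 601.568254, 0.043967, 37.81

# Average number of busy hardware threads over the run.
on_core = (utime + stime) / elapsed   # ~15.91, matching On_Core 15.911

# Assumed 16 hardware threads (hypothetical; not in the output).
on_cpu = on_core / 16                 # ~0.994, matching On_CPU 0.994
```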
About this graph
CPU cores are kept scheduled at 100%.
Process Tree - phoronix/primesieve
The process tree is simple.
13547) sh
13548) primesieve-test
13549) primesieve
13550) primesieve
13551) primesieve
13552) primesieve
13553) primesieve
13554) primesieve
13555) primesieve
13556) primesieve
13557) primesieve
IPC is mostly steady, but rises slowly over the course of the test.
About this graph
The rise in IPC also corresponds to fewer backend stalls.
on_cpu 0.978 elapsed 254.425 utime 1991.035 stime 0.359 nvcsw 1537 (11.42%) nivcsw 11920 (88.58%) inblock 0 onblock 728
retire 0.465 ms_uops 0.001
speculation 0.080 branch_misses 97.61% machine_clears 2.39%
frontend 0.135 idq_uops_delivered_0 0.036 icache_stall 0.000 itlb_misses 0.000 idq_uops_delivered_1 0.046 idq_uops_delivered_2 0.049 idq_uops_delivered_3 0.139 dsb_ops 70.41%
backend 0.320 resource_stalls.sb 0.170 stalls_ldm_pending 0.323 l2_refs 0.093 l2_misses 0.013 l2_miss_ratio 14.18% l3_refs 0.012 l3_misses 0.000 l3_miss_ratio 1.21%
The L2 miss ratio of 14% and L3 miss ratio of 1.2% likely drive the backend stalls. The frontend stalls appear to come mostly from uop-delivery gaps rather than from icache or iTLB misses, which are near zero. Around 70% of uops are delivered from the uop cache (DSB). Bad speculation is almost entirely branch misses (97.6%).
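The miss ratio can be recomputed from the per-instruction rates in the output; since those rates are printed rounded to three decimals, the result only approximates the reported 14.18%:

```python
# Per-instruction L2 rates copied from the output above (rounded to 3 decimals).
l2_refs, l2_misses = 0.093, 0.013

# Miss ratio = misses / references; rounding in the inputs makes this ~14%,
# close to the tool's reported 14.18%.
l2_miss_ratio = l2_misses / l2_refs
```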