I have updated the topdown wrapper script to report additional cache statistics as part of the level 3 backend information. I have also added a -x option to display some rusage information.
As an illustration, the following command collects level 3 information for the scimark2 benchmark:
./wspy/topdown -l 3 -x -o topdown.txt phoronix-test-suite batch-run scimark2
The output in topdown.txt is as follows:
on_cpu 0.112
elapsed 86.737
utime 77.504
stime 77.504
nvcsw 413 (79.73%)
nivcsw 105 (20.27%)
inblock 8
inblock 952
retire 0.510
ms_uops 0.002
speculation 0.058
branch_misses 72.66%
machine_clears 27.34%
frontend 0.027
idq_uops_delivered_0 0.005
icache_stall 0.001
itlb_misses 0.000
idq_uops_delivered_1 0.011
idq_uops_delivered_2 0.018
idq_uops_delivered_3 0.020
dsb_ops 56.38%
backend 0.405
resource_stalls.sb 0.001
stalls_ldm_pending 0.555
l2_refs 0.013
l2_misses 0.007
l2_miss_ratio 55.80%
l3_refs 0.001
l3_misses 0.001
l3_miss_ratio 39.04%
A brief explanation of this output:
- The on_cpu ratio is derived from the elapsed time together with the user and system time. It falls a little short of the 12.5% one would expect for the single-threaded scimark2 because the elapsed time also includes periods where the Phoronix Test Suite is idle.
- The l2 and l3 statistics are the number of references or misses relative to the number of cycles. This understates the time involved: if an l2 reference takes 11 cycles, for example, then an l2_refs value of 0.013 really corresponds to 13*11 = 143 cycles of l2 reference time per 1000 cycles. Counting events per cycle, however, keeps these statistics in the same units as the other metrics.
- The l2 and l3 miss ratios are the ratio of misses to references at each level. Overall, the output shows that a moderate level of l2 and l3 misses still contributes to the backend-stall nature of this benchmark.
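The arithmetic behind these bullets can be sketched in a few lines of Python. This is only an illustration, not the code the topdown script uses: the function names are hypothetical, the 8 logical CPUs are an assumption suggested by the 12.5% figure, the 11-cycle l2 latency is the example value from above, and the on_cpu inputs are made-up round numbers rather than values from the run.

```python
def on_cpu(utime, stime, elapsed, ncpus):
    """CPU seconds consumed, as a fraction of the machine's total capacity."""
    return (utime + stime) / (elapsed * ncpus)


def busy_cycles_per_1000(events_per_cycle, latency_cycles):
    """Convert an events-per-cycle rate into cycles spent per 1000 cycles."""
    events_per_1000 = round(events_per_cycle * 1000)
    return events_per_1000 * latency_cycles


def miss_ratio(misses, refs):
    """Misses as a fraction of references at one cache level."""
    return misses / refs


# Hypothetical inputs: one thread fully busy for 80 s on an assumed 8-CPU machine.
ideal_single_thread = on_cpu(utime=80.0, stime=0.0, elapsed=80.0, ncpus=8)

# l2_refs from the output above, with the assumed 11-cycle reference latency.
l2_busy = busy_cycles_per_1000(0.013, 11)

# Recomputed from the rounded l2_misses and l2_refs values shown in the output.
l2_ratio = miss_ratio(0.007, 0.013)

print(f"ideal on_cpu: {ideal_single_thread:.3f}")
print(f"l2 reference cycles per 1000 cycles: {l2_busy}")
print(f"l2 miss ratio from rounded values: {l2_ratio:.1%}")
```

The miss ratio recomputed this way differs slightly from the 55.80% in the output, presumably because the tool works from raw counter values while this sketch starts from figures already rounded to three decimal places.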
Overall, this combination of metrics gives a quick and dirty overview of a workload run, which one can then dig into for specifics.