I have added a basic tool “topdown” that is a wrapper to collect topdown metrics.
Here is the usage:
prompt% ./wspy/topdown -?
warning: unknown option: ?
fatal error: usage: ./wspy/topdown -[abcfrs][-l <1|2|3|4>][-o ] ...
    -l - expand out levels (default 1)
    -c - show cores as separate
    -o - send output to
    -a - expand all areas
    -b - expand backend stalls area
    -f - expand frontend stalls area
    -r - expand retiring area
    -s - expand speculation area
Here is an example command line:
prompt% ./wspy/topdown -l 2 -o topdown.txt phoronix-test-suite batch-run c-ray
This creates the following output:
retire 0.738
    ms_uops 0.000
speculation 0.008
    branch_misses 98.67%
    machine_clears 1.33%
frontend 0.105
    idq_uops_delivered_0 0.047
    idq_uops_delivered_1 0.048
    idq_uops_delivered_2 0.051
    idq_uops_delivered_3 0.064
backend 0.150
    resource_stalls.sb 0.000
    stalls_ldm_pending 0.162
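For post-processing, the report can be read back as name/value pairs. The following is a minimal Python sketch and not part of wspy. It assumes the output consists only of whitespace-separated metric names and values, as in the c-ray run above, and `parse_topdown` is a hypothetical helper name.

```python
def parse_topdown(text):
    """Parse whitespace-separated 'name value' pairs into a dict.

    Values ending in '%' are kept as percentages (98.67% -> 98.67);
    all other values are returned as plain floats.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected name/value pairs")
    metrics = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        metrics[name] = float(value.rstrip("%"))
    return metrics


# Values copied from the c-ray output above.
sample = """
retire 0.738 ms_uops 0.000
speculation 0.008 branch_misses 98.67% machine_clears 1.33%
frontend 0.105
idq_uops_delivered_0 0.047 idq_uops_delivered_1 0.048
idq_uops_delivered_2 0.051 idq_uops_delivered_3 0.064
backend 0.150 resource_stalls.sb 0.000 stalls_ldm_pending 0.162
"""

metrics = parse_topdown(sample)

# The four level 1 areas should account for (roughly) all issue slots.
level1 = sum(metrics[k] for k in ("retire", "speculation", "frontend", "backend"))
print(f"level 1 total: {level1:.3f}")
```

For this run the four level 1 values add up to 1.001, which is 1.0 within rounding of the three printed decimals.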
A quick summary follows:
- The retire, speculation, frontend, and backend values are the usual overall topdown metrics, summed across all cores.
- ms_uops is the fraction of uops delivered from the microcode sequencer. In this case there are extremely few. If the count were exactly 0, this line would not be printed.
- branch_misses and machine_clears are percentages of the overall raw counts. The total amount of bad speculation in c-ray is small (0.008), and almost all of it comes from branch misses.
- The idq_uops_delivered_N values under frontend are the fractions of time that 0, <=1, <=2, or <=3 uops were delivered instead of the full complement of 4. The 0 case is a complete stall, such as an iTLB or iCache miss. These values still need a sanity check.
- resource_stalls.sb and stalls_ldm_pending count all cycles in which the backend has an outstanding store or an outstanding memory request. Note that these are not normalized to subtract out frontend stalls or speculation, so they can be higher than the total backend value (here 0.162 versus 0.150).
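To make the arithmetic behind these numbers concrete, here is a hedged Python sketch of how the four level 1 values are commonly derived from raw counters. The formulas wspy actually uses are not shown in this post. The sketch assumes the widely published Intel level 1 formulas for a 4-wide core, and the function names and input counts are made up for illustration. The parameter comments name the Intel events each argument is assumed to come from.

```python
def topdown_level1(cycles, uops_not_delivered, uops_issued,
                   uops_retired_slots, recovery_cycles, width=4):
    """Level 1 topdown fractions from raw counts (assumed formulas).

    cycles              ~ cpu_clk_unhalted.thread
    uops_not_delivered  ~ idq_uops_not_delivered.core
    uops_issued         ~ uops_issued.any
    uops_retired_slots  ~ uops_retired.retire_slots
    recovery_cycles     ~ int_misc.recovery_cycles
    """
    slots = width * cycles  # issue slots available over the run
    frontend = uops_not_delivered / slots
    speculation = (uops_issued - uops_retired_slots
                   + width * recovery_cycles) / slots
    retire = uops_retired_slots / slots
    # Backend is whatever the other three areas do not account for.
    backend = 1.0 - (frontend + speculation + retire)
    return {"retire": retire, "speculation": speculation,
            "frontend": frontend, "backend": backend}


def speculation_split(branch_misses, machine_clears):
    """Express each raw count as a percentage of their sum."""
    total = branch_misses + machine_clears
    if total == 0:
        return 0.0, 0.0
    return 100.0 * branch_misses / total, 100.0 * machine_clears / total


# Illustrative counts only; these are not measurements from c-ray.
level1 = topdown_level1(cycles=1000, uops_not_delivered=400,
                        uops_issued=3000, uops_retired_slots=2960,
                        recovery_cycles=0)
for name, value in level1.items():
    print(f"{name:12s} {value:.3f}")

miss_pct, clear_pct = speculation_split(branch_misses=74, machine_clears=1)
print(f"branch_misses {miss_pct:.2f}% machine_clears {clear_pct:.2f}%")
```

Because backend is computed as the remainder, the four areas always sum to 1.0 in this sketch, while an unnormalized cycle ratio such as stalls_ldm_pending is free to exceed the backend fraction.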
These additional counters give a quick first look before drilling deeper into particular areas.