This page describes the metrics used and the methods used to measure them.

On_CPU
The percentage of the theoretically available time, across all cores of the CPU, that this workload is scheduled to run.

The Linux kernel schedules processes on cores and records this time as user time (utime) or system time (stime), depending on whether the process is running in user space or kernel space. Scheduled time includes time the process may spend waiting on memory or a similar bottleneck.
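As a quick illustration of where these numbers come from (an illustration only, not part of wspy), a process can read its own accumulated utime and stime with times(2):

    #include <stdio.h>
    #include <sys/times.h>
    #include <unistd.h>

    int main(void) {
        for (volatile long i = 0; i < 50000000; i++)
            ;                          /* burn some user-space CPU time */
        struct tms t;
        times(&t);                     /* utime/stime in clock ticks */
        long hz = sysconf(_SC_CLK_TCK);
        printf("utime = %.2fs stime = %.2fs\n",
               (double)t.tms_utime / hz, (double)t.tms_stime / hz);
        return 0;
    }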

There can be a variety of reasons a process is not scheduled to run. It might be waiting on an external resource such as disk or network. It might be waiting on another process. Or other processes might be occupying the available cores.

The On_CPU metric is thus a high-level ratio: the total time the workload has run (utime + stime) divided by the total available time. It measures how “CPU-bound” a process is versus “disk-bound” or “network-bound”. It also measures how much the workload uses many cores in parallel versus running serially on just one (or a few) cores.
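Expressed as a formula:

    On_CPU = (utime + stime) / (elapsed time × number of cores)

For example, a workload that accumulates 400 seconds of utime + stime over a 100-second run on an 8-core machine has an On_CPU of 400 / (100 × 8) = 50%.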

The On_CPU metric is measured in the wspy program by catching exit(2) events with ptrace(2), reading the /proc/<pid>/stat file for each completed process, adding up the totals across the process tree, and dividing by the total available time.
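A minimal sketch of the per-process accounting step, assuming the utime and stime fields of /proc/<pid>/stat (this illustrates the idea rather than reproducing wspy's actual code):

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Return utime + stime in seconds for pid, or -1.0 on error. */
    double cpu_seconds(pid_t pid) {
        char path[64], buf[1024];
        snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
        FILE *f = fopen(path, "r");
        if (!f)
            return -1.0;
        size_t n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
        buf[n] = '\0';
        char *p = strrchr(buf, ')');  /* comm can contain spaces; skip past it */
        if (!p)
            return -1.0;
        unsigned long utime, stime;   /* fields 14 and 15 of /proc/<pid>/stat */
        if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %lu %lu",
                   &utime, &stime) != 2)
            return -1.0;
        return (double)(utime + stime) / sysconf(_SC_CLK_TCK);
    }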

See also my blog post describing this metric.

On_Core
The percentage of the theoretically available time for a single core that this workload is scheduled to run. Note: this can be greater than 100% when multiple cores run in parallel. It is the On_CPU metric multiplied by the number of cores.

This metric becomes useful when comparing systems with different numbers of cores, or situations where the number of cores has been constrained, e.g. hyperthreading on or off. It also makes it easier to spot and evaluate single-threaded applications, since they will have an On_Core metric of at most 100% while their On_CPU metric will be inversely proportional to the number of cores.
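Expressed as a formula:

    On_Core = (utime + stime) / elapsed time = On_CPU × number of cores

For example, a single-threaded process that runs continuously shows an On_Core of 100%; on an 8-core machine its On_CPU would be only 12.5%.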

See also my blog post describing this metric.

IPC
The average number of instructions executed each clock cycle.

The IPC metric is measured in the wspy program by starting performance counters at fork(2) events caught using ptrace(2) and reading them again at the corresponding ptrace(2) exit(2) events. The IPC metric can thus be aggregated for each leaf and node of the corresponding process tree. The value reported for the workload is the sum of instructions across all processes in the workload divided by the sum of all CPU cycles.
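A minimal sketch of the underlying counting mechanism, using perf_event_open(2) to put instructions and cycles in one event group for the current process (wspy instead attaches equivalent counters to each child process at the ptrace(2) fork and exit events; this sketch is an illustration, not wspy's code):

    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Open one hardware counter for this process; group_fd = -1 makes
       it a (disabled) group leader. */
    static int open_counter(uint64_t config, int group_fd) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = config;
        attr.disabled = (group_fd == -1);
        return syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0);
    }

    int main(void) {
        int cycles = open_counter(PERF_COUNT_HW_CPU_CYCLES, -1);
        int insns = open_counter(PERF_COUNT_HW_INSTRUCTIONS, cycles);
        if (cycles < 0 || insns < 0) {
            perror("perf_event_open");
            return 1;
        }
        ioctl(cycles, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
        for (volatile long i = 0; i < 100000000; i++)
            ;                          /* stand-in for the measured work */
        ioctl(cycles, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

        uint64_t c = 0, n = 0;
        read(cycles, &c, sizeof(c));
        read(insns, &n, sizeof(n));
        printf("IPC = %.2f (%llu instructions / %llu cycles)\n",
               (double)n / (double)c, (unsigned long long)n,
               (unsigned long long)c);
        return 0;
    }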

This metric could be measured more simply at the top level of the entire workload run. However, the wspy method lets me tease apart differences between the individual processes run during the workload.

See also my blog post describing this metric.

TD_Retire
TD is an abbreviation for “top down” as described in this Intel slideset. In particular, TD Retire is the first-level topdown metric for the percentage of available slots retiring instructions. In general, having a workload spend a high proportion of slots retiring instructions is a good thing, because it is not stalled in the front end, stalled in the back end, or speculating instructions that are never retired. However, in “topdown” fashion one can then look at the instruction mix being retired for further improvements.

This metric is measured using the “topdown-*” architectural counters, either directly in wspy or using the perf(1) --topdown option.
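For example, the same first-level breakdown can be collected at the top of a workload with something like the following; exact options and output vary by CPU generation and perf version, and older kernels require system-wide (-a) collection:

    perf stat --topdown -a -- ./workload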

TD_Front_End
TD is an abbreviation for “top down” as described in this Intel slideset. In particular, TD Front End is the first-level topdown metric for the percentage of available slots spent delayed in the microprocessor front end. The front end includes the instruction cache, the instruction TLB, and the overall fetch and decode steps. When an application spends a proportionally higher amount of time stalled in the front end, this guides the next level of analysis in “topdown” fashion.

This metric is measured using the “topdown-*” architectural counters, either directly in wspy or using the perf(1) --topdown option.

TD_Spec
TD is an abbreviation for “top down” as described in this Intel slideset. In particular, TD Spec is the first-level topdown metric for the percentage of available slots spent speculating instructions that are never used. Speculated instructions that are never retired can arise for several reasons, including branch misprediction or machine clears. When an application spends a proportionally higher amount of time speculating in slots that are never used, this guides the next level of analysis in “topdown” fashion.

This metric is measured using the “topdown-*” architectural counters, either directly in wspy or using the perf(1) --topdown option.

TD_Back_End
TD is an abbreviation for “top down” as described in this Intel slideset. In particular, TD Back End is the first-level topdown metric for the percentage of available slots delayed in the microprocessor back end. The back end includes the cache hierarchy and memory as well as the execution units. When an application spends a proportionally higher amount of time stalled in the back end, this guides the next level of analysis in “topdown” fashion.

This metric is measured using the “topdown-*” architectural counters, either directly in wspy or using the perf(1) --topdown option.