Finding performance counters for memory traffic on Haswell
I have found the counters necessary for wspy to get memory reads/writes.
It wasn’t completely straightforward, so this documents the steps I took.
My 4.13 Linux kernel exposes a set of events under /sys/devices that appears to be specific to each CPU: it differs between AMD and Intel, and also on ARM. The ones in /sys/devices/cpu correspond to many of the familiar raw performance counters.
On my Intel i7-4770:
mev@popayan:~$ find /sys/devices/ -type d -name events
/sys/devices/uncore_cbox_3/events
/sys/devices/cstate_pkg/events
/sys/devices/uncore_cbox_1/events
/sys/devices/power/events
/sys/devices/cpu/events
/sys/devices/uncore_cbox_2/events
/sys/devices/uncore_imc/events
/sys/devices/uncore_cbox_0/events
/sys/devices/cstate_core/events
/sys/devices/msr/events
mev@popayan:~$ ls /sys/devices/cpu/events
branch-instructions  el-abort      ref-cycles                      tx-abort
branch-misses        el-capacity   topdown-fetch-bubbles           tx-capacity
bus-cycles           el-commit     topdown-recovery-bubbles        tx-commit
cache-misses         el-conflict   topdown-recovery-bubbles.scale  tx-conflict
cache-references     el-start      topdown-slots-issued            tx-start
cpu-cycles           instructions  topdown-slots-retired
cycles-ct            mem-loads     topdown-total-slots
cycles-t             mem-stores    topdown-total-slots.scale
mev@popayan:~$ more /sys/devices/cpu/events/instructions
event=0xc0
mev@popayan:~$
On my AMD A10-7850K:
mev@cuenca:~$ find /sys/devices -type d -name events
/sys/devices/cpu/events
/sys/devices/amd_iommu_0/events
/sys/devices/msr/events
mev@cuenca:~$ ls /sys/devices/cpu/events
branch-instructions  cache-misses      cpu-cycles    stalled-cycles-backend
branch-misses        cache-references  instructions  stalled-cycles-frontend
mev@cuenca:~$ more /sys/devices/cpu/events/instructions
event=0xc0
mev@cuenca:~$
I also noticed that these same names appear when I run “perf list” to show the list of performance counters. The event directory for memory seems to be /sys/devices/uncore_imc, for the integrated memory controller on Intel. On AMD the only candidate is /sys/devices/amd_iommu_0, though that is the I/O memory management unit rather than a DRAM controller.
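The find and more commands above can be combined into one pass. What follows is only a minimal sketch: the function name list_pmu_events is made up for illustration, and it assumes each events directory sits directly under a directory named after its PMU, as in the listings above.

```python
import os


def list_pmu_events(root="/sys/devices"):
    """Return {pmu_name: {event_file: contents}} for each 'events' directory.

    Hypothetical helper: it mirrors `find <root> -type d -name events`
    followed by reading every file in each directory found.
    """
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) != "events":
            continue
        dirnames[:] = []  # nothing of interest below an events directory
        pmu = os.path.basename(os.path.dirname(dirpath))
        events = {}
        for name in sorted(filenames):
            try:
                with open(os.path.join(dirpath, name)) as f:
                    events[name] = f.read().strip()
            except OSError:
                continue  # skip files that cannot be read
        result[pmu] = events
    return result
```

Calling list_pmu_events() with no arguments walks the real /sys/devices tree. On a machine like the i7-4770 above I would expect the result to include a "uncore_imc" key, but that depends on the kernel and CPU.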
It appears that perf can report memory reads and writes with two different events. The “*.scale” files hold a multiplier that is applied outside the kernel, which keeps the kernel from doing floating point arithmetic. The scale here is 6.103515625e-5, whose reciprocal is 16384, and the unit is MiB. If 16384 counts make 1 MiB, then each individual count is 64 bytes, the size of one cache line.
mev@popayan:~$ more /sys/devices/uncore_imc/events/*
::::::::::::::
/sys/devices/uncore_imc/events/data_reads
::::::::::::::
event=0x01
::::::::::::::
/sys/devices/uncore_imc/events/data_reads.scale
::::::::::::::
6.103515625e-5
::::::::::::::
/sys/devices/uncore_imc/events/data_reads.unit
::::::::::::::
MiB
::::::::::::::
/sys/devices/uncore_imc/events/data_writes
::::::::::::::
event=0x02
::::::::::::::
/sys/devices/uncore_imc/events/data_writes.scale
::::::::::::::
6.103515625e-5
::::::::::::::
/sys/devices/uncore_imc/events/data_writes.unit
::::::::::::::
MiB
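The scale arithmetic can be checked with a few lines. This is a sketch only: the function names are made up for illustration, and the default scale is the value read from data_reads.scale above.

```python
MIB = 1024 * 1024  # bytes in one MiB, the unit given in data_reads.unit


def counts_to_mib(raw_count, scale=6.103515625e-5):
    """Convert a raw uncore_imc count to MiB using the sysfs scale value."""
    return raw_count * scale


def bytes_per_count(scale=6.103515625e-5):
    """Number of bytes that one raw count represents."""
    return scale * MIB


if __name__ == "__main__":
    print(1 / 6.103515625e-5)    # reciprocal of the scale
    print(bytes_per_count())     # bytes represented by a single count
    print(counts_to_mib(16384))  # MiB represented by 16384 counts
```

The scale is exactly 2^-14, so these conversions are exact in floating point.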
However, passing 0x01 or 0x02 to perf as a raw performance counter returned a value of 0 rather than a MiB value:
mev@popayan:~$ perf stat -e data_reads pwd
/home/mev
 Performance counter stats for 'system wide':
              0.69 MiB data_reads
       0.000516034 seconds time elapsed
mev@popayan:~$ perf stat -e r01 pwd
/home/mev
 Performance counter stats for 'pwd':
                 0      r01
       0.000723624 seconds time elapsed
Using strace on the perf executable, I discovered that the perf_event_open(2) system calls used a different event type, 0x7, rather than PERF_TYPE_RAW. strace could not name this type, and it isn’t defined in my header file (/usr/include/linux/perf_event.h). My google search pointed at the recently added PERF_EVENT_UPROBE, but a value outside the fixed PERF_TYPE_* list is more likely a dynamically assigned PMU type, which the kernel exposes in a file such as /sys/devices/uncore_imc/type. Either way, with this knowledge I should be able to add similar performance counter access to wspy. Since the memory controller is shared by all CPUs on my i7-4770, I only need to measure it on one core and will add it to the generic set of counters collected.
mev@popayan:~$ strace -v -e perf_event_open perf stat -e data_writes pwd
...
perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0x2, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER, read_format=PERF_FORMAT_TOTAL_TIME_ENABLED|PERF_FORMAT_TOTAL_TIME_RUNNING, disabled=1, inherit=1, pinned=0, exclusive=0, exclusive_user=0, exclude_kernel=0, exclude_hv=0, exclude_idle=0, mmap=0, comm=0, freq=0, inherit_stat=0, enable_on_exec=0, task=0, watermark=0, precise_ip=0 /* arbitrary skid */, mmap_data=0, sample_id_all=0, exclude_host=0, exclude_guest=0, exclude_callchain_kernel=0, exclude_callchain_user=0, mmap2=0, comm_exec=0, use_clockid=0, context_switch=0, write_backward=0, wakeup_events=0, config1=0, config2=0, sample_regs_user=0, sample_regs_intr=0, aux_watermark=0, sample_max_stack=0}, -1, 0, -1, 0) = 3
/home/mev
...
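The type=0x7 and config=0x2 values in the strace output do not have to be hard-coded. The sketch below looks them up, on the assumption that the kernel exposes the PMU's type number in /sys/devices/<pmu>/type and that the event field occupies the low bits of config (consistent with event=0x02 giving config=0x2 above). The names parse_event_spec and resolve_event are made up for illustration. A fuller version would consult the PMU's format directory instead of assuming the bit layout.

```python
import os


def parse_event_spec(spec):
    """Parse a sysfs event string such as 'event=0x02' into {'event': 2}."""
    fields = {}
    for term in spec.strip().split(","):
        name, _, value = term.partition("=")
        # A bare term with no '=value' part is treated as a flag set to 1.
        fields[name] = int(value, 0) if value else 1
    return fields


def resolve_event(pmu, event, root="/sys/devices"):
    """Return (type, config) for use in a perf_event_attr structure.

    Assumption: the 'event' field maps directly onto the low bits of
    config, which matches the strace output for uncore_imc/data_writes.
    """
    pmu_dir = os.path.join(root, pmu)
    with open(os.path.join(pmu_dir, "type")) as f:
        pmu_type = int(f.read().strip())
    with open(os.path.join(pmu_dir, "events", event)) as f:
        fields = parse_event_spec(f.read())
    return pmu_type, fields["event"]
```

The returned pair would go into the type and config fields of the attribute structure passed to perf_event_open(2), with pid=-1 and a single cpu as in the strace output.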
What remains open is a refactoring to let wspy take arguments that configure which performance counters are collected, including this one.
After rebooting my system, the /sys/devices/uncore_imc directory was no longer available. I was initially confused about what might be causing this. Investigating further, I figured out that the system had booted into a Xen hypervisor virtual machine. Interestingly enough, some of the event directories such as /sys/devices/cpu were available, while others were not.
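A startup check along these lines could catch the situation before any counters are opened. This is a sketch only: the function name uncore_imc_status is made up, and it assumes a Xen system reports itself through /sys/hypervisor/type, which may not hold for other hypervisors.

```python
import os


def uncore_imc_status(root="/sys"):
    """Return (imc_available, hypervisor_type) from a sysfs-style tree.

    hypervisor_type is None when <root>/hypervisor/type is missing or
    empty, which is what I would expect on a bare metal boot.
    """
    imc_events = os.path.join(root, "devices", "uncore_imc", "events")
    have_imc = os.path.isdir(imc_events)
    hv_type = None
    try:
        with open(os.path.join(root, "hypervisor", "type")) as f:
            hv_type = f.read().strip() or None
    except OSError:
        pass  # no hypervisor entry in sysfs
    return have_imc, hv_type


if __name__ == "__main__":
    have_imc, hv_type = uncore_imc_status()
    if not have_imc:
        print("uncore_imc events not available; hypervisor type:", hv_type)
```

A tool like wspy could use the first value to skip the memory counters cleanly rather than fail when opening them.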
I still plan to explore virtualization and performance counters later, but for now I went back to a bare metal OS.
This is also a useful reminder, when rebooting remotely where I won’t see the console during bringup, to make sure the system comes up in the expected configuration…