As April finishes, it is useful to take stock of where the analysis stands as well as the next steps. During May I may also take a few weeks' break to finish publishing my book and go on a cycle trip – so it is also useful to capture the loose ends.
First, the following are done:
- First-level analysis of ~20 workloads, mostly Phoronix, but also a quick pass through Geekbench; added pages in the “Workloads” tab
- The wspy tool supports three modes:
- Periodic sampling of /proc/stat and /proc/diskstats
- Process trees built with three “engines”: ftrace, ptrace and ptrace2
- Performance counters in per-process, per-core, or whole-application modes
- Implemented Intel’s first-level “topdown” analysis, plus some expanded measurements of the “backend” category for memory/core and bandwidth/frequency distinctions; the memory measurements need some revisions/fixes.
- Basic measurements on Intel Haswell and AMD Ryzen
- Basic survey of related capabilities coming in perf(1) and likwid-perfctr(1)
This leaves further development open on multiple fronts. Some of the more prominent loose ends include:
- Workloads – go beyond breadth and pick up some larger, typical cases
- Keep adding incrementally from Phoronix, both from the outstanding list and as additional ones appear in Phoronix articles.
- Add a larger “real world” application; the top candidate is “gromacs”, both as computational chemistry and as relatively well done from a build perspective. Other similar apps such as OpenFOAM or WRF to follow.
- Add some workloads from standard benchmark suites, e.g. the SPEC wpc benchmarks, which seem to be free for non-commercial use.
- Tool breadth and capabilities for wspy
- Complete --memstats and --netstats
- Finish adding a third mode alongside “process” and “core” that measures the entire workload. While this overlaps to some extent with what perf or likwid do for an entire program, it can also be a hook for more general metrics, e.g. memory metrics that are not just plotted but also quantified
- Robustness, e.g. handling the NMI timer and better handling running out of file descriptors
- Architecture breadth and depth in analysis
- Expand the “topdown” analysis to the AMD platform by finding equivalent counters for a higher-level “flow”-type picture, e.g. do the current “stall” metrics help or are there others?
- Add an ARM test platform, either a TX1 or a Raspberry Pi.
- Go further in analysis depth: expand the front-end analysis, find the speculation cases, add cache-hierarchy levels
- Start drilling down on the “next steps” added to many of the workload pages
- Web page loose ends
- Finish the “about this graph” notes in snippets
There is no shortage of areas to pursue further, and I expect some of them to be intertwined, e.g. adding workloads will drive enhancements to the tools in more specific areas.
If nothing else, this list will be useful to revisit later to see what has been addressed and what otherwise has changed.