This post describes a few high-level takeaways I have from the analysis:
- OpenFOAM is sensitive to the number of threads it runs with: performance improves consistently until the thread count equals the number of physical cores. Beyond that, setting the thread count equal to the number of hyperthreads is slightly slower on my Intel system and slightly faster on my AMD system. In this range the percentage of system time jumps dramatically, and the top routines appear to be related to memory management and scheduling.
- OpenFOAM appears to be limited by backend stalls and memory, with the overall composite application run showing a 40% L2 miss ratio and a 50% L3 miss ratio. The iTLB miss ratio is also surprisingly high at 0.011. I'm not sure whether this is related to the recent kernel Spectre/Meltdown changes or something else.
- The overall IPC and backend stalls vary somewhat as the application runs, but IPC is roughly slightly less than 1 on Intel and slightly higher on AMD. There doesn’t seem to be a particular bias between my AMD and Intel systems.
- Overall, I get the sense that when running the application, keeping track of memory management and threads, including affinity, is particularly important.
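
For reference, the ratios quoted in the list above are all simple quotients of raw counters. Here is a minimal sketch of how they can be derived, assuming the counts have already been collected into a dictionary. The key names (`instructions`, `l2_misses`, and so on) and the sample numbers are hypothetical placeholders chosen to land near the figures above, not measurements from my runs or the event names of any particular tool.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(counters):
    """Turn raw counter totals into the ratios discussed in this post.

    The dictionary keys are hypothetical names picked for this sketch.
    """
    total_time = counters["user_seconds"] + counters["system_seconds"]
    return {
        # Instructions retired per CPU cycle.
        "ipc": ratio(counters["instructions"], counters["cycles"]),
        # Fraction of cache accesses that miss at each level.
        "l2_miss_ratio": ratio(counters["l2_misses"], counters["l2_accesses"]),
        "l3_miss_ratio": ratio(counters["l3_misses"], counters["l3_accesses"]),
        # Share of CPU time spent in the kernel rather than in user code.
        "system_time_fraction": ratio(counters["system_seconds"], total_time),
    }


# Made-up values for illustration only.
sample = {
    "instructions": 900,
    "cycles": 1000,
    "l2_misses": 40,
    "l2_accesses": 100,
    "l3_misses": 50,
    "l3_accesses": 100,
    "user_seconds": 8.0,
    "system_seconds": 2.0,
}

metrics = derived_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Tracking `system_time_fraction` alongside IPC across thread counts is one way to spot the point where extra threads start costing more in kernel time than they gain in throughput.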