FFTE is a package by Daisuke Takahashi to compute Discrete Fourier Transforms of 1-, 2- and 3-dimensional sequences of length (2^p)*(3^q)*(5^r).
The test runs quickly in ~5 seconds. While it may start up processes on multiple cores, it otherwise behaves single-threaded. All tests were run pinned to core 1.
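The length constraint (2^p)*(3^q)*(5^r) can be illustrated with a short arithmetic check. This is only a sketch of the constraint itself, not code from FFTE; the function name `is_ffte_length` is hypothetical, and whether FFTE accepts edge cases such as length 1 is not verified here.

```python
def is_ffte_length(n: int) -> bool:
    """Return True if n == 2**p * 3**q * 5**r for non-negative integers p, q, r.

    Hypothetical helper for illustration; it is not part of FFTE.
    """
    if n < 1:
        return False
    # Divide out every factor of 2, 3 and 5; only 1 may remain.
    for prime in (2, 3, 5):
        while n % prime == 0:
            n //= prime
    return n == 1


if __name__ == "__main__":
    for n in (1024, 1000, 360, 14, 77):
        print(n, is_ffte_length(n))
```

Lengths such as 1024 (2^10), 1000 (2^3 * 5^3) and 360 (2^3 * 3^2 * 5) satisfy the constraint, while 14 and 77 do not because they contain a factor of 7.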
Metrics (Intel) - phoronix/ffte
sh - pid 19980
    On_CPU   0.125
    On_Core  1.000
    IPC      2.833
    Retire   0.710 (71.0%)
    FrontEnd 0.013 (1.3%)
    Spec     0.001 (0.1%)
    Backend  0.276 (27.6%)
    Elapsed  3.52
    Procs    4
    Maxrss   10K
    Minflt   328
    Majflt   0
    Inblock  0
    Oublock  16
    Msgsnd   0
    Msgrcv   0
    Nsignals 0
    Nvcsw    25 (69.4%)
    Nivcsw   11
    Utime    3.519370
    Stime    0.000646
    Start    657764.29
    Finish   657767.81
Elapsed time is 3.5 seconds with On_Core at 100%. IPC is high (2.83), and backend stalls (27.6%) are the largest issue. There are a number of voluntary context switches (25), presumably I/O related.
Metrics (AMD) - phoronix/ffte
phoronix-test-s - pid 27370
    On_CPU   0.042
    On_Core  0.677
    IPC      3.586
    FrontCyc 0.026 (2.6%)
    BackCyc  0.041 (4.1%)
    Elapsed  13.13
    Procs    288
    Maxrss   36K
    Minflt   59927
    Majflt   0
    Inblock  0
    Oublock  640
    Msgsnd   0
    Msgrcv   0
    Nsignals 0
    Nvcsw    3033 (59.5%)
    Nivcsw   2065
    Utime    8.650602
    Stime    0.243819
    Start    667869.77
    Finish   667882.90
The AMD system shows an even higher IPC (3.586 versus 2.833 on Intel).
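The On_Core and voluntary context switch percentages in both metric dumps are consistent with simple ratios of the raw counters. The sketch below reproduces them under the assumption that On_Core is (Utime + Stime) / Elapsed and that the Nvcsw percentage is Nvcsw / (Nvcsw + Nivcsw). These formulas are inferred from the numbers above, not taken from the measurement tool's source, and the function names are hypothetical.

```python
def on_core(utime: float, stime: float, elapsed: float) -> float:
    """Assumed definition: CPU time (user + system) divided by wall-clock time."""
    return (utime + stime) / elapsed


def voluntary_share(nvcsw: int, nivcsw: int) -> float:
    """Assumed definition: voluntary switches as a fraction of all context switches."""
    return nvcsw / (nvcsw + nivcsw)


# Raw counters copied from the Intel and AMD metric dumps above.
runs = {
    "Intel": dict(utime=3.519370, stime=0.000646, elapsed=3.52, nvcsw=25, nivcsw=11),
    "AMD": dict(utime=8.650602, stime=0.243819, elapsed=13.13, nvcsw=3033, nivcsw=2065),
}

if __name__ == "__main__":
    for name, r in runs.items():
        core = on_core(r["utime"], r["stime"], r["elapsed"])
        vol = voluntary_share(r["nvcsw"], r["nivcsw"])
        print(f"{name}: On_Core {core:.3f}, voluntary switches {vol:.1%}")
```

Under these assumptions the script gives On_Core of 1.000 and 69.4% for Intel, and 0.677 and 59.5% for AMD, matching the reported values.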
Process Tree - phoronix/ffte
The process tree is simple.
19980) sh
19981) ffte
19982) ffte
19983) speed1d
On_CPU goes to 100%; most of the noise is due to the very short run time of this benchmark.
The IPC is consistently high.
There is a high retire rate and some backend stalls.
retire         0.688
    ms_uops                0.004
speculation    0.006
    branch_misses          42.02%
    machine_clears         57.98%
frontend       0.027
    idq_uops_delivered_0   0.007
    idq_uops_delivered_1   0.012
    idq_uops_delivered_2   0.015
    idq_uops_delivered_3   0.023
backend        0.280
    resource_stalls.sb     0.003
    stalls_ldm_pending     0.506
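Backend stalls seem to be memory related: stalls_ldm_pending (0.506) dominates, while store-buffer stalls (resource_stalls.sb, 0.003) are negligible.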
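As a sanity check on the top-down breakdown, the four top-level categories should sum to roughly 1.0, and the largest non-retiring category identifies the main bottleneck. The sketch below uses the values listed above; the function names are hypothetical, and the tolerance of 0.01 is an arbitrary allowance for rounding in the reported figures.

```python
# Top-level top-down fractions copied from the breakdown above.
topdown = {
    "retire": 0.688,
    "speculation": 0.006,
    "frontend": 0.027,
    "backend": 0.280,
}


def level1_total(breakdown: dict) -> float:
    """Sum of the top-level categories; expected to be close to 1.0."""
    return sum(breakdown.values())


def largest_stall(breakdown: dict) -> str:
    """Name of the largest category other than useful work ('retire')."""
    stalls = {k: v for k, v in breakdown.items() if k != "retire"}
    return max(stalls, key=stalls.get)


if __name__ == "__main__":
    total = level1_total(topdown)
    print(f"sum of categories: {total:.3f}")
    print(f"within rounding of 1.0: {abs(total - 1.0) < 0.01}")
    print(f"largest stall category: {largest_stall(topdown)}")
```

With these numbers the categories sum to 1.001 and the largest stall category is backend, which agrees with the observation that backend stalls are the main limiter.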
Overall, a tiny toy benchmark.
Next steps: None