Metrics (Intel) - phoronix/ffmpeg
This test uses FFmpeg for testing the system’s audio/video encoding performance.
On_CPU 0.678
On_Core 5.428
IPC 1.271
Retire 0.265 (26.5%)
FrontEnd 0.102 (10.2%)
Spec 0.153 (15.3%)
Backend 0.481 (48.1%)
Elapsed 7.20
Procs 34
Maxrss 152K
Minflt 41862
Majflt 0
Inblock 0
Oublock 8
Msgsnd 0
Msgrcv 0
Nsignals 0
Nvcsw 20501 (97.5%)
Nivcsw 515
Utime 38.723219
Stime 0.356720
Start 688861.48
Finish 688868.68
Overall, On_CPU is only 68% and almost all context switches are voluntary (97.5%), so the processes spend a good share of their time waiting and wake-up latency matters. IPC is moderately high at 1.27, and backend stalls (48.1%) are the primary limiter. The run is short, at 7.2 seconds elapsed.
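As a quick sanity check on the "backend stalls are the primary limiter" reading, the four top-down values can be compared directly. This is a minimal sketch that assumes the Retire, FrontEnd, Spec and Backend columns are fractions of pipeline slots in the usual top-down sense; the variable names are mine, not the tool's.

```python
# Top-down fractions copied from the Intel run above.
topdown = {"Retire": 0.265, "FrontEnd": 0.102, "Spec": 0.153, "Backend": 0.481}

# The four categories should account for (almost) all slots; rounding leaves a small residue.
total = sum(topdown.values())

# Retire is useful work, so the limiter is the largest of the remaining categories.
stall_categories = [name for name in topdown if name != "Retire"]
limiter = max(stall_categories, key=topdown.get)

print(f"sum of categories: {total:.3f}")
print(f"largest loss: {limiter} at {topdown[limiter]:.1%}")
```

Backend alone accounts for more lost slots than front-end stalls and speculation combined, which is consistent with the summary above.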
Metrics (AMD) - phoronix/ffmpeg
sh - pid 1228
On_CPU 0.276
On_Core 4.408
IPC 1.637
FrontCyc 0.000 (0.0%)
BackCyc 0.000 (0.0%)
Elapsed 9.51
Procs 65
Maxrss 227K
Minflt 62022
Majflt 0
Inblock 0
Oublock 8
Msgsnd 0
Msgrcv 0
Nsignals 0
Nvcsw 34648 (90.7%)
Nivcsw 3544
Utime 41.518247
Stime 0.403098
Start 572298.45
Finish 572307.96
IPC on AMD is a fair amount higher than on Intel (1.64 versus 1.27), yet the elapsed time is longer (9.51 versus 7.20 seconds). On_CPU is also less than half that of Intel (0.276 versus 0.678). About twice as many processes run (65 versus 34), and fewer cores are kept busy on average (On_Core 4.4 versus 5.4).
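The On_Core, On_CPU and context-switch percentages in both runs can be reproduced from the raw counters. The sketch below is my reading of how the numbers relate, not the tool's documented definition: On_Core as (Utime + Stime) / Elapsed, and On_CPU as On_Core divided by the core count. The core counts of 8 (Intel) and 16 (AMD) are assumptions inferred from the ratio of the two reported values; they are not stated in the output.

```python
def derived_metrics(utime, stime, elapsed, cores, nvcsw, nivcsw):
    """Recompute On_Core, On_CPU and the voluntary context-switch share."""
    on_core = (utime + stime) / elapsed    # average number of busy cores
    on_cpu = on_core / cores               # fraction of the whole machine in use
    vol_share = nvcsw / (nvcsw + nivcsw)   # voluntary share of all context switches
    return on_core, on_cpu, vol_share

# Core counts (8 and 16) are assumptions inferred from On_Core / On_CPU.
intel = derived_metrics(38.723219, 0.356720, 7.20, 8, 20501, 515)
amd = derived_metrics(41.518247, 0.403098, 9.51, 16, 34648, 3544)

for name, (on_core, on_cpu, vol) in (("Intel", intel), ("AMD", amd)):
    print(f"{name}: On_Core {on_core:.3f}  On_CPU {on_cpu:.3f}  Nvcsw share {vol:.1%}")
```

Under these assumptions the recomputed values match the reported ones to three decimals, which suggests the low AMD On_CPU mostly reflects a similar amount of work spread over a machine with more cores.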
I ran this by hand, passing in the “-threads” option, which is normally set to the number of cores. The following times were reported by time(1) for various thread counts on my AMD system:
1 thread - 12.746 seconds real time
2 threads - 9.453 seconds real time
4 threads - 9.472 seconds real time
8 threads - 9.470 seconds real time
16 threads - 9.327 seconds real time
32 threads - 9.491 seconds real time
64 threads - 9.563 seconds real time
So while 2 threads is better than 1, the particular workload selected for this ffmpeg run cannot take advantage of more threads than that: times stay flat at roughly 9.3 to 9.5 seconds through 32 threads, then creep up slightly at 64.
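To put numbers on how flat the scaling is, the measured times can be turned into speedup and parallel efficiency relative to the single-thread run. This is a small sketch over the time(1) figures listed above; the variable names are mine.

```python
# Wall-clock seconds reported by time(1) on the AMD system, keyed by the -threads value.
real_seconds = {1: 12.746, 2: 9.453, 4: 9.472, 8: 9.470, 16: 9.327, 32: 9.491, 64: 9.563}

baseline = real_seconds[1]
for threads, seconds in real_seconds.items():
    speedup = baseline / seconds      # relative to the single-thread run
    efficiency = speedup / threads    # 1.0 would be perfect linear scaling
    print(f"{threads:>2} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")

# Thread count with the lowest measured time.
best_threads = min(real_seconds, key=real_seconds.get)
print(f"fastest run: {best_threads} threads")
```

Speedup never exceeds about 1.37x, and the gap between the fastest run (16 threads) and the 2-thread run is small enough that it may well be run-to-run noise.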
Process Tree - phoronix/ffmpeg
There are four processes per core.
28646) sh
28647) ffmpeg
28648) ffmpeg
28649) ffmpeg
28650) ffmpeg
28651) ffmpeg
28652) ffmpeg
28653) ffmpeg
28654) ffmpeg
28655) ffmpeg
28656) ffmpeg
28657) ffmpeg
28658) ffmpeg
28659) ffmpeg
28660) ffmpeg
28661) ffmpeg
28662) ffmpeg
28663) ffmpeg
28664) ffmpeg
28665) ffmpeg
28666) ffmpeg
28667) ffmpeg
28668) ffmpeg
28669) ffmpeg
28670) ffmpeg
28671) ffmpeg
28672) ffmpeg
28673) ffmpeg
28674) ffmpeg
28675) ffmpeg
28676) ffmpeg
28677) ffmpeg
28678) ffmpeg
28679) ffmpeg

Total On_CPU time is about 2/3 of the available CPU time; the rest is marked as idle.
Individual cores are scheduled out.

Per-core IPC shows some noise, mostly because the benchmark runs for such a short time.
Backend stalls are the largest issue.
A Phoronix benchmark article shows Clear Linux with a considerable lead on this benchmark. Is that likely due to a customized version or very targeted optimizations?
Next steps: What does Clear Linux do to gain an advantage on this benchmark? Why does AMD have higher IPC but lower scores?
