octave-benchmark – Performance analysis, tools and experiments

Description - phoronix/octave-benchmark

This test profile measures how long it takes to complete several reference GNU Octave files via octave-benchmark. GNU Octave is used for numerical computations and is an open-source alternative to MATLAB.

The benchmark runs six workloads in ~10 seconds. These tests are single-threaded and all testing was done pinned to core 1.

Metrics (Intel) - phoronix/octave-benchmark

sh - pid 17169
	On_CPU   0.125
	On_Core  0.999
	IPC      1.555
	Retire   0.323	(32.3%)
	FrontEnd 0.075	(7.5%)
	Spec     0.115	(11.5%)
	Backend  0.486	(48.6%)
	Elapsed  10.56
	Procs    10
	Maxrss   593K
	Minflt   1370931
	Majflt   0
	Inblock  0
	Oublock  144
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    104	(70.3%)
	Nivcsw   44
	Utime    9.525428
	Stime    1.029286
	Start    65087.73
	Finish   65098.29

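The test keeps its core busy almost 100% of the time (On_Core 0.999). On_CPU is lower (0.125), which is consistent with that figure being averaged over all logical CPUs. IPC is moderately high at 1.555, and backend stalls are the largest topdown category at 48.6%.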
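The On_Core and On_CPU figures can be reproduced from the Utime, Stime and Elapsed rows. The sketch below shows the arithmetic; both formulas and the count of 8 logical CPUs are assumptions inferred from the numbers in the table, not taken from the measurement tool's documentation.

```python
# Values copied from the metrics table above.
utime = 9.525428    # user CPU seconds
stime = 1.029286    # system CPU seconds
elapsed = 10.56     # wall-clock seconds

# Assumption: 8 logical CPUs, inferred from On_Core / On_CPU = 0.999 / 0.125.
ASSUMED_LOGICAL_CPUS = 8


def on_core(utime, stime, elapsed):
    """Fraction of wall-clock time spent executing (assumed formula)."""
    return (utime + stime) / elapsed


def on_cpu(utime, stime, elapsed, ncpus):
    """On-core fraction averaged over all logical CPUs (assumed formula)."""
    return on_core(utime, stime, elapsed) / ncpus


core = on_core(utime, stime, elapsed)
cpu = on_cpu(utime, stime, elapsed, ASSUMED_LOGICAL_CPUS)
print(f"On_Core ~ {core:.3f}")
print(f"On_CPU  ~ {cpu:.3f}")
```

Under these assumptions both values land within rounding of the reported 0.999 and 0.125, which supports reading On_Core as the utilization of the single pinned core.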

Metrics (AMD) - phoronix/octave-benchmark

sh - pid 2530
	On_CPU   0.062
	On_Core  0.999
	IPC      1.476
	FrontCyc 0.076	(7.6%)
	BackCyc  0.162	(16.2%)
	Elapsed   9.89
	Procs    10
	Maxrss   594K
	Minflt   1370690
	Majflt   0
	Inblock  0
	Oublock  104
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    66	(6.3%)
	Nivcsw   982
	Utime    8.835452
	Stime    1.040407
	Start    172659.53
	Finish   172669.42

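IPC on AMD is slightly lower (1.476 versus 1.555 in the previous run), while elapsed time is slightly shorter (9.89 versus 10.56 seconds).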
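To put "slightly lower" into numbers, the following sketch compares the two runs using only values from the tables above. It also recomputes the percentage printed next to Nvcsw, on the assumption that it is the voluntary share of all context switches. The source does not give clock frequencies, so the IPC and elapsed-time differences are not normalized for clock speed.

```python
# Values copied from the two metrics tables above.
first_run = {"ipc": 1.555, "elapsed": 10.56, "nvcsw": 104, "nivcsw": 44}
amd_run = {"ipc": 1.476, "elapsed": 9.89, "nvcsw": 66, "nivcsw": 982}


def relative_change(new, old):
    """Signed change of new relative to old, as a fraction."""
    return (new - old) / old


def voluntary_share(run):
    """Voluntary context switches as a fraction of all context switches."""
    return run["nvcsw"] / (run["nvcsw"] + run["nivcsw"])


ipc_delta = relative_change(amd_run["ipc"], first_run["ipc"])
elapsed_delta = relative_change(amd_run["elapsed"], first_run["elapsed"])

print(f"IPC change:      {ipc_delta:+.1%}")
print(f"Elapsed change:  {elapsed_delta:+.1%}")
print(f"Voluntary share: {voluntary_share(first_run):.1%} vs {voluntary_share(amd_run):.1%}")
```

The recomputed voluntary shares match the 70.3% and 6.3% shown in the tables, so the AMD run was preempted far more often than it yielded.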

Process Tree - phoronix/octave-benchmark

    17169) sh
      17170) octave-benchmar
        17171) octave-cli
        17174) octave-cli
        17175) octave-cli
        17176) octave-cli
        17177) octave-cli
        17178) octave-cli
        17179) head
        17180) cut

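The process tree is simple: a shell starts the octave-benchmark script, which launches six octave-cli processes (presumably one per workload) plus head and cut. That is 10 processes in total, matching the Procs count above.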
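As a cross-check against the Procs row, the tree can be rebuilt from parent/child pairs. This is a minimal sketch: the pids and names are copied from the listing above, while the tuple layout and helper names are illustrative and not part of any tool.

```python
from collections import defaultdict

# (pid, parent pid, command) taken from the tree above.
# The parent of pid 17169 is not shown in the source, so it is None here.
PROCS = [
    (17169, None, "sh"),
    (17170, 17169, "octave-benchmar"),
    (17171, 17170, "octave-cli"),
    (17174, 17170, "octave-cli"),
    (17175, 17170, "octave-cli"),
    (17176, 17170, "octave-cli"),
    (17177, 17170, "octave-cli"),
    (17178, 17170, "octave-cli"),
    (17179, 17170, "head"),
    (17180, 17170, "cut"),
]


def render_tree(procs):
    """Return the tree as indented lines, children sorted by pid."""
    names = {pid: name for pid, _, name in procs}
    children = defaultdict(list)
    for pid, ppid, _ in procs:
        children[ppid].append(pid)

    lines = []

    def walk(pid, depth):
        lines.append("  " * depth + f"{pid}) {names[pid]}")
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    for root in sorted(children[None]):
        walk(root, 0)
    return lines


tree_lines = render_tree(PROCS)
octave_runs = sum(1 for _, _, name in PROCS if name == "octave-cli")
print("\n".join(tree_lines))
print(f"processes: {len(PROCS)}, octave-cli runs: {octave_runs}")
```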

The workloads vary somewhat: the first ones are predominantly user time, while the last ones also spend some time in the system.

IPC also varies by workload. Measurements are taken at one-second granularity, which is not quite fine enough to separate the six workloads from each other.

Topdown metrics show similar variation, with occasional spikes in backend memory stalls and periods of branch misspeculation.

Topdown (Intel)

on_cpu         0.111
elapsed        37.064
utime          29.467
stime          3.397
nvcsw          1086 (89.98%)
nivcsw         121 (10.02%)
inblock        0
onblock        1192
retire         0.396
ms_uops                0.026
speculation    0.042
branch_misses          65.55%
machine_clears         34.45%
frontend       0.084
idq_uops_delivered_0   0.022
icache_stall               0.004
itlb_misses                0.002
idq_uops_delivered_1   0.034
idq_uops_delivered_2   0.047
idq_uops_delivered_3   0.062
dsb_ops                    65.39%
backend        0.479
resource_stalls.sb     0.069
stalls_ldm_pending     0.644
l2_refs                    0.032
l2_misses                  0.022
l2_miss_ratio              69.97%
l3_refs                    0.009
l3_misses                  0.003
l3_miss_ratio              37.13%

The L2 and L3 cache miss ratios are surprisingly high (69.97% and 37.13%). Approximately two thirds of the uops (65.39%) come from the uop cache (DSB). Overall speculation is low at 4.2%.
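Two quick sanity checks on the listing above can be sketched as follows: the four level-1 topdown categories should sum to roughly 1, and each miss ratio should be misses divided by references. All numbers are copied from the listing. The units of the refs and misses rows are not stated in the source, but they cancel in the ratio. Because the inputs are rounded to three decimals, the recomputed ratios only approximate the reported 69.97% and 37.13%.

```python
# Level-1 topdown fractions copied from the listing above.
topdown = {
    "retire": 0.396,
    "speculation": 0.042,
    "frontend": 0.084,
    "backend": 0.479,
}

# Cache reference and miss rates as printed (rounded to three decimals).
l2_refs, l2_misses = 0.032, 0.022
l3_refs, l3_misses = 0.009, 0.003


def miss_ratio(misses, refs):
    """Fraction of references that miss."""
    return misses / refs


total = sum(topdown.values())
largest = max(topdown, key=topdown.get)

print(f"level-1 total:    {total:.3f}")
print(f"largest category: {largest}")
print(f"L2 miss ratio:    {miss_ratio(l2_misses, l2_refs):.1%}")
print(f"L3 miss ratio:    {miss_ratio(l3_misses, l3_refs):.1%}")
```

The recomputed ratios come out a few points below the reported ones, which is what rounding of such small inputs would be expected to produce.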