Scikit-learn is a Python module for machine learning.
The benchmark runs a Python program with the following sample output:
    python bench_random_projections.py
    Dataset statics
    ===========================
    n_samples = 500
    n_features = 10000
    n_components = 298 (auto)
    n_elements = 5000000
    n_nonzeros = 10 per feature
    ratio_nonzeros = 0.001

    Benchmarks
    ===========================
    Generate dataset benchmarks... done
    Perform benchmarks for GaussianRandomProjection...
    iter 0...done
    iter 1...done
    iter 2...done
    iter 3...done
    iter 4...done
    Perform benchmarks for SparseRandomProjection...
    iter 0...done
    iter 1...done
    iter 2...done
    iter 3...done
    iter 4...done

    Script arguments
    ===========================
    Arguments                | Value
    -------------------------|--------------
    random_seed              | 13
    density                  | 0.333333333333
    eps                      | 0.5
    ratio_nonzeros           | 0.001
    n_components             | auto
    n_samples                | 500
    dense                    | False
    n_times                  | 5
    selected_transformers    | GaussianRandomProjection,SparseRandomProjection
    n_features               | 10000

    Transformer performance:
    ===========================
    Results are averaged over 5 repetition(s).

    Transformer                    | fit          | transform
    -------------------------------|--------------|--------------
    GaussianRandomProjection       | 1.8445s      | 3.5983s
    SparseRandomProjection         | 0.0700s      | 0.4334s
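The `n_components = 298 (auto)` value in the output can be checked by hand. The following is a minimal sketch, assuming the automatic choice follows the Johnson-Lindenstrauss bound 4 ln(n_samples) / (eps^2/2 - eps^3/3), truncated to an integer, as documented for scikit-learn's `johnson_lindenstrauss_min_dim`. The helper name `jl_min_dim` is hypothetical and exists only for this illustration; the inputs come from the script arguments above.

```python
import math


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Johnson-Lindenstrauss lower bound on the number of components.

    Assumption: this mirrors the bound scikit-learn uses when
    n_components is 'auto'; it is not the benchmark's own code.
    """
    denominator = (eps ** 2) / 2 - (eps ** 3) / 3
    return int(4 * math.log(n_samples) / denominator)


# n_samples and eps are taken from the benchmark output above.
n_components = jl_min_dim(n_samples=500, eps=0.5)
print(n_components)  # 298
```

Note that the bound depends only on the number of samples and on eps, not on n_features, which is why 10000 input features can be projected down to 298 components.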
The benchmark is single-threaded and all testing was done pinned to core 1.
Metrics (Intel) - phoronix/scikit-learn
sh - pid 24271
    On_CPU   0.125
    On_Core  1.000
    IPC      2.724
    Retire   0.685 (68.5%)
    FrontEnd 0.018 (1.8%)
    Spec     0.036 (3.6%)
    Backend  0.260 (26.0%)
    Elapsed  29.85
    Procs    3
    Maxrss   470K
    Minflt   601865
    Majflt   0
    Inblock  0
    Oublock  8
    Msgsnd   0
    Msgrcv   0
    Nsignals 0
    Nvcsw    18 (32.1%)
    Nivcsw   38
    Utime    29.356082
    Stime    0.483580
    Start    758582.50
    Finish   758612.35
The tests run On_Core at 100%. Little time is spent on frontend stalls (1.8%) or bad speculation (3.6%). The overall IPC is high at 2.724, with some backend stalls (26.0%).
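The On_Core figure can be cross-checked against the other fields in the metrics. This is a minimal sketch, assuming On_Core is defined as (Utime + Stime) / Elapsed; that definition is an assumption that happens to match the reported numbers, not something the tool output states. The function name `on_core` is hypothetical.

```python
def on_core(utime: float, stime: float, elapsed: float) -> float:
    """Fraction of elapsed time spent running on a core.

    Assumption: On_Core = (user time + system time) / elapsed time.
    """
    return (utime + stime) / elapsed


# Values copied from the Intel metrics above.
utime, stime = 29.356082, 0.483580
elapsed = 758612.35 - 758582.50  # Finish - Start, about 29.85

print(f"On_Core      {on_core(utime, stime, elapsed):.3f}")  # On_Core      1.000
print(f"system share {stime / (utime + stime):.1%}")         # system share 1.6%
```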
Metrics (AMD) - phoronix/scikit-learn
sh - pid 19157
    On_CPU   0.062
    On_Core  0.999
    IPC      1.908
    FrontCyc 0.005 (0.5%)
    BackCyc  0.302 (30.2%)
    Elapsed  124.25
    Procs    3
    Maxrss   464K
    Minflt   601672
    Majflt   0
    Inblock  0
    Oublock  8
    Msgsnd   0
    Msgrcv   0
    Nsignals 0
    Nvcsw    18 (0.1%)
    Nivcsw   12219
    Utime    123.690029
    Stime    0.434131
    Start    9171.95
    Finish   9296.20
The IPC on AMD is noticeably lower than on Intel (1.908 vs. 2.724), and the run takes about four times as long (124.25 s vs. 29.85 s elapsed).
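To size the gap, the two metric blocks can be compared directly. The sketch below uses only the Elapsed and IPC values reported above. It relies on the standard identity time = instructions / (IPC x clock frequency); the interpretation of the leftover factor is an assumption, since neither clock frequency nor instruction counts appear in the measurements.

```python
# Values copied from the Intel and AMD metrics above.
intel = {"elapsed": 29.85, "ipc": 2.724}
amd = {"elapsed": 124.25, "ipc": 1.908}

elapsed_ratio = amd["elapsed"] / intel["elapsed"]  # how much longer AMD takes
ipc_ratio = amd["ipc"] / intel["ipc"]              # AMD IPC relative to Intel

# Since time = instructions / (IPC * frequency), whatever part of the
# elapsed ratio is not explained by IPC must come from clock frequency
# and/or a different number of executed instructions.
residual = elapsed_ratio * ipc_ratio

print(f"elapsed ratio {elapsed_ratio:.2f}x")  # elapsed ratio 4.16x
print(f"IPC ratio     {ipc_ratio:.2f}")       # IPC ratio     0.70
print(f"unexplained   {residual:.2f}x")       # unexplained   2.92x
```

Under these assumptions the IPC difference alone accounts for only about a 1.4x slowdown, so most of the gap would have to come from elsewhere.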
Process Tree - phoronix/scikit-learn
The process tree is simple.
    24271) sh
      24272) scikit-learn
        24273) python
On_Core is 100%, with a small amount of system time (about 0.48 s of roughly 29.8 s of CPU time on Intel). This likely correlates with I/O.
The IPC shows five iterations on each of three runs.
The retirement rate is high, with backend stalls and, to a lesser extent, bad speculation filling the gaps.
    retire         0.678
      ms_uops                0.004
    speculation    0.041
      branch_misses          79.01%
      machine_clears         20.99%
    frontend       0.023
      idq_uops_delivered_0   0.007
      idq_uops_delivered_1   0.009
      idq_uops_delivered_2   0.013
      idq_uops_delivered_3   0.018
    backend        0.258
      resource_stalls.sb     0.007
      stalls_ldm_pending     0.536
Backend stalls seem to be mostly memory: stalls_ldm_pending is 0.536, while resource_stalls.sb (store buffer) is only 0.007.
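As a sanity check on the breakdown above, the four top-level categories should account for essentially all of the pipeline, and the larger backend sub-metric indicates where the stalls come from. This is a minimal sketch over the reported numbers only. How the tool normalizes the sub-metrics is not stated in the output, so they are compared with each other rather than against the 0.258 backend fraction.

```python
# Top-level top-down categories, copied from the breakdown above.
topdown = {
    "retire": 0.678,
    "speculation": 0.041,
    "frontend": 0.023,
    "backend": 0.258,
}

# Backend sub-metrics; their normalization is an unknown here, so only
# their relative size is used.
backend_detail = {
    "resource_stalls.sb": 0.007,
    "stalls_ldm_pending": 0.536,
}

total = sum(topdown.values())
dominant = max(backend_detail, key=backend_detail.get)

print(f"top-level total  {total:.3f}")  # top-level total  1.000
print(f"dominant backend {dominant}")   # dominant backend stalls_ldm_pending
```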
Next steps: Understand the gap between AMD and Intel.