Description - phoronix/scikit-learn

Scikit-learn is a Python module for machine learning.

The benchmark runs a Python program with the following sample output:

python bench_random_projections.py 
Dataset statics
===========================
n_samples 	= 500
n_features 	= 10000
n_components 	= 298 (auto)
n_elements 	= 5000000
n_nonzeros 	= 10 per feature
ratio_nonzeros 	= 0.001

Benchmarks
===========================
Generate dataset benchmarks... done
Perform benchmarks for GaussianRandomProjection...
	iter 0...done
	iter 1...done
	iter 2...done
	iter 3...done
	iter 4...done
Perform benchmarks for SparseRandomProjection...
	iter 0...done
	iter 1...done
	iter 2...done
	iter 3...done
	iter 4...done

Script arguments
===========================
Arguments        	 |    Value     
-------------------------|--------------
random_seed      	 |      13      
density          	 | 0.333333333333 
eps              	 |     0.5      
ratio_nonzeros   	 |    0.001     
n_components     	 |     auto     
n_samples        	 |     500      
dense            	 |    False     
n_times          	 |      5       
selected_transformers 	 | GaussianRandomProjection,SparseRandomProjection 
n_features       	 |    10000     

Transformer performance:
===========================
Results are averaged over 5 repetition(s).

Transformer                    |     fit      |  transform  
-------------------------------|--------------|--------------
GaussianRandomProjection       |   1.8445s    |   3.5983s   
SparseRandomProjection         |   0.0700s    |   0.4334s   

The benchmark is single-threaded and all testing was done pinned to core 1.
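
The `n_components = 298 (auto)` line in the sample output can be reproduced from `n_samples = 500` and `eps = 0.5`. This is a minimal sketch that assumes the script derives the value from the standard Johnson-Lindenstrauss bound, as scikit-learn's `johnson_lindenstrauss_min_dim` does. The helper name `jl_min_dim` is hypothetical and is not part of the benchmark.

```python
import math


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Johnson-Lindenstrauss lower bound on the embedding dimension.

    Assumed formula: 4 * ln(n_samples) / (eps^2 / 2 - eps^3 / 3),
    truncated to an integer.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in (0, 1)")
    denominator = (eps ** 2) / 2 - (eps ** 3) / 3
    return int(4 * math.log(n_samples) / denominator)


# Values taken from the "Script arguments" table above.
print(jl_min_dim(n_samples=500, eps=0.5))  # 298
```

Note that the bound depends only on the number of samples and the distortion `eps`, not on `n_features`, so the 10000 input features do not affect it.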

Metrics (Intel) - phoronix/scikit-learn
sh - pid 24271
	On_CPU   0.125
	On_Core  1.000
	IPC      2.724
	Retire   0.685	(68.5%)
	FrontEnd 0.018	(1.8%)
	Spec     0.036	(3.6%)
	Backend  0.260	(26.0%)
	Elapsed  29.85
	Procs    3
	Maxrss   470K
	Minflt   601865
	Majflt   0
	Inblock  0
	Oublock  8
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    18	(32.1%)
	Nivcsw   38
	Utime    29.356082
	Stime    0.483580
	Start    758582.50
	Finish   758612.35

The tests run On_Core at 100%. Little time is spent on frontend stalls or bad speculation. Overall IPC is high, with some backend stalls.
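
As a sanity check, the summary figures in the Intel listing can be recomputed from its raw counters. The sketch below rests on assumed definitions that are not confirmed by the tool's documentation: Elapsed as Finish minus Start, On_Core as (Utime + Stime) / Elapsed, and the percentage next to Nvcsw as the voluntary share of all context switches. These assumptions do reproduce the reported values.

```python
# Raw counters copied from the Intel metrics listing above.
intel = {
    "start": 758582.50,
    "finish": 758612.35,
    "utime": 29.356082,
    "stime": 0.483580,
    "nvcsw": 18,
    "nivcsw": 38,
}


def derive(m):
    """Recompute summary figures from raw counters (assumed definitions)."""
    elapsed = m["finish"] - m["start"]
    cpu_time = m["utime"] + m["stime"]
    return {
        "elapsed": elapsed,                    # wall-clock seconds
        "on_core": cpu_time / elapsed,         # fraction of time on a core
        "system_share": m["stime"] / cpu_time, # kernel share of CPU time
        "voluntary_share": m["nvcsw"] / (m["nvcsw"] + m["nivcsw"]),
    }


derived = derive(intel)
for name, value in derived.items():
    print(f"{name:16s} {value:.3f}")
```

Under these definitions, system time is under 2% of CPU time, which is consistent with a compute-bound, single-threaded run.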

Metrics (AMD) - phoronix/scikit-learn
sh - pid 19157
	On_CPU   0.062
	On_Core  0.999
	IPC      1.908
	FrontCyc 0.005	(0.5%)
	BackCyc  0.302	(30.2%)
	Elapsed  124.25
	Procs    3
	Maxrss   464K
	Minflt   601672
	Majflt   0
	Inblock  0
	Oublock  8
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    18	(0.1%)
	Nivcsw   12219
	Utime    123.690029
	Stime    0.434131
	Start    9171.95
	Finish   9296.20

The IPC on AMD (1.908) is about 30% lower than on Intel (2.724), while elapsed time is about 4.2 times longer (124.25 s versus 29.85 s).
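
To see how much of the slowdown the IPC difference could explain, the elapsed-time ratio can be split into an IPC factor and a residual. This is only a sketch: it uses the identity time = instructions / (IPC x frequency), and the residual lumps together differences in instruction count and clock frequency, neither of which is reported here.

```python
# Values copied from the Intel and AMD metrics listings.
intel = {"ipc": 2.724, "elapsed": 29.85}
amd = {"ipc": 1.908, "elapsed": 124.25}

# How much longer the AMD run took overall.
slowdown = amd["elapsed"] / intel["elapsed"]

# Slowdown expected from IPC alone, if instruction count and
# clock frequency were identical on both machines (an assumption).
ipc_factor = intel["ipc"] / amd["ipc"]

# Whatever is left must come from instruction count and/or frequency.
residual = slowdown / ipc_factor

print(f"slowdown   {slowdown:.2f}x")
print(f"ipc_factor {ipc_factor:.2f}x")
print(f"residual   {residual:.2f}x")
```

The IPC factor is roughly 1.4x against an overall slowdown of roughly 4.2x, so IPC alone leaves a residual of nearly 3x unexplained.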

Process Tree - phoronix/scikit-learn
The process tree is simple.

   24271) sh
      24272) scikit-learn
        24273) python


On_Core is 100%, with a small amount of system time that likely correlates with I/O.


The IPC shows five iterations on each of three runs.


The retirement rate is high, with backend stalls and, to a lesser extent, speculation filling the gaps.

Topdown (Intel)
retire                   0.678
	ms_uops                  0.004
speculation              0.041
	branch_misses            79.01%
	machine_clears           20.99%
frontend                 0.023
	idq_uops_delivered_0     0.007
	idq_uops_delivered_1     0.009
	idq_uops_delivered_2     0.013
	idq_uops_delivered_3     0.018
backend                  0.258
	resource_stalls.sb       0.007
	stalls_ldm_pending       0.536

Backend stalls appear to be mostly memory-related: stalls_ldm_pending is 0.536, against 0.007 for resource_stalls.sb.
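
The four top-level topdown categories should account for all pipeline slots, which makes it easy to check the listing and to see where the non-retiring slots go. A minimal sketch using only the top-level values above:

```python
# Top-level topdown fractions copied from the Intel listing.
topdown = {
    "retire": 0.678,
    "speculation": 0.041,
    "frontend": 0.023,
    "backend": 0.258,
}

# The categories are expected to sum to 1.0 (all issue slots).
total = sum(topdown.values())

# Slots that did not retire useful work, and the backend's share of them.
lost = {name: frac for name, frac in topdown.items() if name != "retire"}
lost_total = sum(lost.values())
backend_share = lost["backend"] / lost_total

print(f"total         {total:.3f}")
print(f"lost slots    {lost_total:.3f}")
print(f"backend share {backend_share:.1%}")
```

On these numbers the backend accounts for about 80% of the slots that do not retire.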

Next steps: Understand the gap between AMD and Intel.