java-scimark2 – Performance analysis, tools and experiments

Description - phoronix/java-scimark2

This test runs the Java version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This benchmark is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks.

scimark2 and java-scimark2 were developed around 1999 by NIST (see the NIST Java SciMark 2.0 page).

java-scimark2 is single-threaded and was designed at a time when caches were smaller, though it is run with the -large option for larger data sets. All tests below were run pinned to core 1. The test runs all five workloads below in a single process and reports both individual scores and a composite. As the graphs show, these workloads have somewhat different characteristics and can be identified separately, so overall metrics such as IPC are also composites. The overall runtime is shorter than that of the C version, and the data suggest that a smaller memory model is used.

FFT
SOR
MonteCarlo
Sparse matmul
LU

The workloads are described in more detail here.

Metrics (Intel) - phoronix/java-scimark2

sh - pid 5679
	On_CPU   0.125
	On_Core  0.998
	IPC      2.770
	Retire   0.499	(49.9%)
	FrontEnd 0.040	(4.0%)
	Spec     0.160	(16.0%)
	Backend  0.301	(30.1%)
	Elapsed  24.97
	Procs    12
	Minflt   3518
	Majflt   0
	Utime    24.92   	(100.0%)
	Stime    0.01    	(0.0%)
	Start    5675.37
	Finish   5700.34

The metrics above were adjusted to account for the process being single-threaded: a single thread has twice as many slots available, whereas my tool assumed the slots were split across two processes. The front-end share is very small, so the code fits in the instruction cache and TLB. Speculation is somewhat high and the backend share is also elevated; both break out by workload below.
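A quick consistency check on the numbers above can be sketched in Python. This is only an illustration: the dictionary is typed in by hand from the Intel block, and the relation between On_Core and Utime, Stime and Elapsed is my assumption about how the metric is derived, not something the tool documents here.

```python
# Values copied by hand from the Intel metrics block above.
metrics = {
    "On_Core": 0.998, "Retire": 0.499, "FrontEnd": 0.040, "Spec": 0.160,
    "Backend": 0.301, "Elapsed": 24.97, "Utime": 24.92, "Stime": 0.01,
    "Start": 5675.37, "Finish": 5700.34,
}

def check_metrics(m, tol=0.005):
    """Return a dict of named consistency checks (True means consistent)."""
    # The four topdown categories should account for all issue slots.
    topdown_sum = m["Retire"] + m["FrontEnd"] + m["Spec"] + m["Backend"]
    # Elapsed should equal the difference of the two timestamps.
    elapsed_from_stamps = m["Finish"] - m["Start"]
    # Assumption: On_Core is (user + system CPU time) / elapsed time.
    busy_fraction = (m["Utime"] + m["Stime"]) / m["Elapsed"]
    return {
        "topdown_sums_to_one": abs(topdown_sum - 1.0) <= tol,
        "elapsed_matches_timestamps": abs(elapsed_from_stamps - m["Elapsed"]) <= tol,
        "on_core_matches_cpu_time": abs(busy_fraction - m["On_Core"]) <= tol,
    }

for name, ok in check_metrics(metrics).items():
    print(f"{name}: {ok}")
```

With the values above, all three checks pass within the rounding of the printed figures.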

Metrics (AMD) - phoronix/java-scimark2

sh - pid 28778
	On_CPU   0.062
	On_Core  0.999
	IPC      3.666
	FrontCyc 0.046	(4.6%)
	BackCyc  0.530	(53.0%)
	Elapsed  27.99
	Procs    12
	Minflt   3422
	Majflt   0
	Utime    27.94   	(100.0%)
	Stime    0.01    	(0.0%)
	Start    312069.64
	Finish   312097.63

The IPC measurement seems high and may warrant a sanity check.
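One rough way to frame that sanity check is sketched below. It assumes, and this is my reading rather than anything confirmed above, that FrontCyc and BackCyc are non-overlapping fractions of cycles in which no instructions complete. Under that assumption the remaining cycles would have to carry the whole IPC. The function name is hypothetical.

```python
def implied_ipc_when_not_stalled(ipc, front_stall, back_stall):
    """IPC needed in unstalled cycles if stalled cycles retire nothing.

    Assumes the two stall fractions do not overlap; if they do, or if
    instructions still complete during "stalled" cycles, this overestimates.
    """
    unstalled = 1.0 - front_stall - back_stall
    if unstalled <= 0:
        raise ValueError("stall fractions leave no unstalled cycles")
    return ipc / unstalled

# Values from the AMD metrics block above.
needed = implied_ipc_when_not_stalled(3.666, 0.046, 0.530)
print(f"IPC required in unstalled cycles: {needed:.2f}")
```

A figure well above the core's sustainable width would suggest either that the stall counters overlap with useful work or that the IPC number needs rechecking. This is a bound under stated assumptions, not a measurement.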

Process Tree - phoronix/java-scimark2

   5679) sh elapsed=24.97 start=2.56 finish=27.53
      5680) java-scimark2 elapsed=24.97 start=2.56 finish=27.53
        5681) java elapsed=24.97 start=2.56 finish=27.53
        5682) java elapsed=24.96 start=2.57 finish=27.53
        5683) java elapsed=24.96 start=2.57 finish=27.53
        5684) java elapsed=24.95 start=2.58 finish=27.53
        5685) java elapsed=24.95 start=2.58 finish=27.53
        5686) java elapsed=24.94 start=2.59 finish=27.53
        5687) java elapsed=24.94 start=2.59 finish=27.53
        5688) java elapsed=24.94 start=2.59 finish=27.53
        5689) java elapsed=24.94 start=2.59 finish=27.53
        5690) java elapsed=24.94 start=2.59 finish=27.53

Multiple java threads are spawned, but since the workload itself is single-threaded, I kept them all pinned to a single core.
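For reference, pinning a process and everything it spawns to one core can be sketched as follows on Linux. This is not the mechanism used for the runs above (which is not stated); it is a minimal illustration using Python's os.sched_setaffinity, and run_pinned is a hypothetical helper name. Threads and child processes inherit the affinity mask, which is why all the java threads end up on the same core.

```python
import os
import subprocess
import sys

def run_pinned(cmd, core):
    """Run cmd with its CPU affinity restricted to a single core (Linux only)."""
    def pin():
        # Runs in the child before exec; pid 0 means "the calling process".
        os.sched_setaffinity(0, {core})
    return subprocess.run(cmd, preexec_fn=pin, capture_output=True,
                          text=True, check=True)

# Use core 1 if this process is allowed to run there, else fall back
# to the lowest-numbered allowed core.
allowed = os.sched_getaffinity(0)
core = 1 if 1 in allowed else min(allowed)

# Stand-in command: the child reports the cores it is allowed to use.
probe = [sys.executable, "-c",
         "import os; print(sorted(os.sched_getaffinity(0)))"]
print(run_pinned(probe, core).stdout.strip())
```

In practice the stand-in command would be replaced by the benchmark launcher.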

Processor core 1 is kept scheduled almost 100% of the time.

The IPCs for the five workloads can be seen separately: FFT (3.5), SOR (0.8), MonteCarlo (~3), Sparse Matmul (~4) and LU (~2.5), which together produce the composite IPC of 2.77. These are all higher than for the C version of scimark2, which uses the -large model, so I expect the models are not the same.
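How the per-workload IPCs combine into the composite can be sketched as a cycle-weighted average, since the composite IPC is total instructions divided by total cycles. The equal-cycles default below is my assumption (the per-phase durations are not given above), and the IPC values are the approximate ones read off the graph.

```python
# Approximate per-workload IPCs as quoted above.
ipc = {"FFT": 3.5, "SOR": 0.8, "MonteCarlo": 3.0, "SparseMatmul": 4.0, "LU": 2.5}

def composite_ipc(ipc_by_phase, cycles_by_phase=None):
    """Cycle-weighted mean IPC: total instructions / total cycles."""
    if cycles_by_phase is None:
        # Assumption: every phase runs for the same number of cycles.
        cycles_by_phase = {name: 1.0 for name in ipc_by_phase}
    total_cycles = sum(cycles_by_phase[name] for name in ipc_by_phase)
    total_instructions = sum(ipc_by_phase[name] * cycles_by_phase[name]
                             for name in ipc_by_phase)
    return total_instructions / total_cycles

print(round(composite_ipc(ipc), 2))  # 2.76 with equal weights
```

With equal weights this gives 2.76, close to the measured 2.77, which is at least consistent with the five phases taking similar amounts of time.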

The topdown metrics also vary between the five workloads, and the “retiring” fractions are a little lower than I would expect for the IPC:

FFT – retires over 80% of slots and is much less backend bound than the C version
SOR – has a backend issue
MonteCarlo – retires 65% of slots
SparseMatmul – has the highest retirement rate overall
LU – has higher speculation misses
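The relationship between IPC and the retiring fraction can be made concrete with a small sketch. It assumes each retired instruction occupies roughly one issue slot, which is a simplification of mine (instruction fusion and multi-uop instructions both break it), so the result is only an implied slots-per-cycle figure. FFT uses 0.80 as a stand-in for "over 80%".

```python
def implied_slots_per_cycle(ipc, retiring_fraction):
    """Slots per cycle implied by IPC and retiring share.

    Assumes about one slot per retired instruction, so that
    retiring_fraction ~= ipc / slots_per_cycle.
    """
    if not 0.0 < retiring_fraction <= 1.0:
        raise ValueError("retiring_fraction must be in (0, 1]")
    return ipc / retiring_fraction

# (IPC, retiring fraction) pairs taken from the figures quoted above.
observations = {
    "composite": (2.77, 0.499),
    "FFT": (3.5, 0.80),
    "MonteCarlo": (3.0, 0.65),
}
for name, (ipc_value, retiring) in observations.items():
    print(f"{name}: {implied_slots_per_cycle(ipc_value, retiring):.2f}")
```

If the implied figure differs noticeably between phases, or exceeds the slot width the topdown tool assumes, that points at the slot accounting rather than the workload.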

Overall, a next level of analysis could tease these workloads apart and characterize each one separately.

Next steps: Separate out the workloads. Investigate IPCs that seem high compared to the retirement rate. Look at speculation misses.