Description - phoronix/java-scimark2

This test runs the Java version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This benchmark is made up of Fast Foruier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks.

scimark2 and java-scimark2 were developed around 1999 by NIST: NIST java SciMark 2.0 page. They are single-threaded and designed in a time when caches were smaller. All tests below were run pinned to core 1.

java-scimark2 is single-threaded and designed in a time when caches were smaller, though is run with the -large option for larger data sets. All tests below were run pinned to core 1. The test runs all five workloads below in a single process and reports both individual scores and a composite. As you’ll see in the graphs, these workloads have somewhat different characteristics and can be spotted separately on the graphs and overall metrics like IPC will also be a composite. The overall runtime is shorter than the C version and also appears from data that a smaller memory model is used.

  • FFT
  • SOR
  • MonteCarlo
  • Sparse matmul
  • LU
  • described in more detail here.

    Metrics (Intel) - phoronix/java-scimark2
    sh - pid 5679
    	On_CPU   0.125
    	On_Core  0.998
    	IPC      2.770
    	Retire   0.499	(49.9%)
    	FrontEnd 0.040	(4.0%)
    	Spec     0.160	(16.0%)
    	Backend  0.301	(30.1%)
    	Elapsed  24.97
    	Procs    12
    	Minflt   3518
    	Majflt   0
    	Utime    24.92   	(100.0%)
    	Stime    0.01    	(0.0%)
    	Start    5675.37
    	Finish   5700.34

    Metrics above were adjusted to account for the process being single-threaded, i.e. twice as many slots available and my tool only assumed these across two processes. The front-end time is very small, so fits in iCache and TLB. Speculation somewhat high and backend also higher, these will break out from some of the workloads below.

    Metrics (AMD) - phoronix/java-scimark2
    sh - pid 28778
    	On_CPU   0.062
    	On_Core  0.999
    	IPC      3.666
    	FrontCyc 0.046	(4.6%)
    	BackCyc  0.530	(53.0%)
    	Elapsed  27.99
    	Procs    12
    	Minflt   3422
    	Majflt   0
    	Utime    27.94   	(100.0%)
    	Stime    0.01    	(0.0%)
    	Start    312069.64
    	Finish   312097.63

    The IPC measurement seems high, perhaps sanity check here.

    Process Tree - phoronix/java-scimark2
    Process Tree
       5679) sh elapsed=24.97 start=2.56 finish=27.53
          5680) java-scimark2 elapsed=24.97 start=2.56 finish=27.53
            5681) java elapsed=24.97 start=2.56 finish=27.53
            5682) java elapsed=24.96 start=2.57 finish=27.53
            5683) java elapsed=24.96 start=2.57 finish=27.53
            5684) java elapsed=24.95 start=2.58 finish=27.53
            5685) java elapsed=24.95 start=2.58 finish=27.53
            5686) java elapsed=24.94 start=2.59 finish=27.53
            5687) java elapsed=24.94 start=2.59 finish=27.53
            5688) java elapsed=24.94 start=2.59 finish=27.53
            5689) java elapsed=24.94 start=2.59 finish=27.53
            5690) java elapsed=24.94 start=2.59 finish=27.53

    Multiple java threads are spawned, though since the workload itself was single-threaded, I kept these pinned to a single core.

    Processor core 1 is kept scheduled almost 100% of the time.

    IPCs for the five workloads can be seen with FFT (3.5), SOR (0.8), MonteCarlo (~3), Sparse Matmul (~4) and LU (~2.5) creating the composite IPC of 2.77. These are all higher than the C version of scimark2 which uses the -large model, so expect the models are not the same.

    The topdown metrics also show variations between the five workloads where the “retiring” are a little lower than I would expect for the IPC:

    • FFT – retires over 80% of slots and is much less backend bound than the C version
    • SOR – has a backend issue
    • MonteCarlo – is retiring 65%
    • SparseMatmul – is highest retirement rate overall
    • LU – has higher speculation misses

    Overall a next level of analysis could tease these apart to characterize them separately.

    Next steps: Separate out workloads. Investigate IPC that seem higher as compared to the retirement rate. Look at speculative misses.