Description - phoronix/y-cruncher

y-cruncher is a multi-threaded benchmark that computes the digits of pi.

This benchmark is also described here. It claims the record for computing the most digits of pi.

Metrics (Intel) - phoronix/y-cruncher
sh - pid 18152
	On_CPU   0.897
	On_Core  7.173
	IPC      1.178
	Retire   0.480	(48.0%)
	FrontEnd 0.138	(13.8%)
	Spec     0.122	(12.2%)
	Backend  0.260	(26.0%)
	Elapsed  65.10
	Procs    21
	Maxrss   2565K
	Minflt   661484
	Majflt   0
	Inblock  0
	Oublock  976608
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    31375	(94.5%)
	Nivcsw   1820
	Utime    465.602228
	Stime    1.364009
	Start    90263.07
	Finish   90328.17

About 94.5% of context switches are voluntary, and the program writes out blocks (Oublock 976608), so there is some I/O. Beyond that, the main limiter is backend stalls (26.0%), which result in an IPC slightly over 1 and a low retirement rate (48.0%).
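The summary rows can be cross-checked against the raw fields. This is a minimal sketch, assuming On_Core = (Utime + Stime) / Elapsed, On_CPU = On_Core divided by the number of logical CPUs, and that the Nvcsw percentage is voluntary switches over all context switches. The logical CPU count (8 for the Intel machine) is an assumption inferred from the numbers, not stated in the data.

```python
# Fields copied from the Intel metrics block above.
intel = {
    "Elapsed": 65.10,
    "Utime": 465.602228,
    "Stime": 1.364009,
    "Nvcsw": 31375,
    "Nivcsw": 1820,
}

# Assumption: 8 logical CPUs on the Intel machine (inferred, not stated).
LOGICAL_CPUS = 8


def derived_metrics(m, logical_cpus):
    """Recompute On_Core, On_CPU and the voluntary context-switch percentage."""
    on_core = (m["Utime"] + m["Stime"]) / m["Elapsed"]  # average number of busy cores
    on_cpu = on_core / logical_cpus                      # fraction of the whole machine
    vol_share = m["Nvcsw"] / (m["Nvcsw"] + m["Nivcsw"])  # voluntary / all switches
    return round(on_core, 3), round(on_cpu, 3), round(100 * vol_share, 1)


print(derived_metrics(intel, LOGICAL_CPUS))
```

With the AMD fields and an assumed 16 logical CPUs, the same arithmetic reproduces that run's 12.168, 0.761 and 96.9% as well, which suggests the two machines differ mainly in thread count.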

Metrics (AMD) - phoronix/y-cruncher
sh - pid 6382
	On_CPU   0.761
	On_Core  12.168
	IPC      0.842
	FrontCyc 0.008	(0.8%)
	BackCyc  0.014	(1.4%)
	Elapsed  64.46
	Procs    37
	Maxrss   2563K
	Minflt   661224
	Majflt   0
	Inblock  32
	Oublock  976608
	Msgsnd   0
	Msgrcv   0
	Nsignals 0
	Nvcsw    68925	(96.9%)
	Nivcsw   2198
	Utime    781.762266
	Stime    2.612581
	Start    159119.22
	Finish   159183.68

AMD IPC is lower than on Intel: 0.842 versus 1.178.
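A small sketch quantifying the difference between the two runs, using values copied from the Intel and AMD metrics blocks above. The helper `relative_change` is hypothetical, introduced only for this illustration.

```python
# Values copied from the Intel and AMD metrics blocks above.
runs = {
    "Intel": {"IPC": 1.178, "On_Core": 7.173, "Elapsed": 65.10},
    "AMD": {"IPC": 0.842, "On_Core": 12.168, "Elapsed": 64.46},
}


def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return 100.0 * (new - old) / old


ipc_delta = relative_change(runs["AMD"]["IPC"], runs["Intel"]["IPC"])
core_delta = relative_change(runs["AMD"]["On_Core"], runs["Intel"]["On_Core"])
print(f"AMD IPC vs Intel:         {ipc_delta:+.1f}%")
print(f"AMD busy cores vs Intel:  {core_delta:+.1f}%")
```

This gives roughly 28.5% lower IPC on AMD, while the AMD run keeps about 70% more cores busy for a similar elapsed time.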

Process Tree - phoronix/y-cruncher
The program runs two worker processes (13-HSW ~ Airi) per virtual core.

    18152) sh
      18153) y-cruncher
        18154) y-cruncher
          18156) sh
            18157) 13-HSW ~ Airi
            18158) 13-HSW ~ Airi
            18159) 13-HSW ~ Airi
            18160) 13-HSW ~ Airi
            18161) 13-HSW ~ Airi
            18162) 13-HSW ~ Airi
            18163) 13-HSW ~ Airi
            18164) 13-HSW ~ Airi
            18165) 13-HSW ~ Airi
            18166) 13-HSW ~ Airi
            18167) 13-HSW ~ Airi
            18168) 13-HSW ~ Airi
            18169) 13-HSW ~ Airi
            18170) 13-HSW ~ Airi
            18171) 13-HSW ~ Airi
            18172) 13-HSW ~ Airi
        18155) sed
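The listing can be checked against the Procs value of 21 in the Intel metrics. This minimal sketch parses the tree text (copied from above, with the 16 identical worker lines generated in a loop) and counts entries by name. The figure of 8 logical CPUs is an assumption inferred from On_Core / On_CPU (7.173 / 0.897), not something stated in the data.

```python
from collections import Counter

# Process tree from the Intel run above. The 16 identical worker lines
# (pids 18157-18172) are generated in a loop to keep the listing short.
TREE = (
    "18152) sh\n"
    "  18153) y-cruncher\n"
    "    18154) y-cruncher\n"
    "      18156) sh\n"
    + "".join(f"        {pid}) 13-HSW ~ Airi\n" for pid in range(18157, 18173))
    + "    18155) sed\n"
)

# Assumption: 8 logical CPUs on the Intel machine (On_Core / On_CPU is about 8).
LOGICAL_CPUS = 8


def count_names(tree_text):
    """Count processes by name in a 'pid) name' listing, ignoring indentation."""
    names = Counter()
    for line in tree_text.splitlines():
        line = line.strip()
        if not line:
            continue
        _pid, name = line.split(") ", 1)
        names[name] += 1
    return names


counts = count_names(TREE)
workers = counts["13-HSW ~ Airi"]
total = sum(counts.values())
print(f"{total} processes, {workers} workers, {workers / LOGICAL_CPUS:.1f} per logical CPU")
```

The 16 workers plus five helpers (two sh, two y-cruncher, one sed) account for all 21 processes. If the same pattern holds on the AMD system with 16 logical CPUs, 32 workers plus the same five helpers would match its Procs count of 37.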

About this graph
There is some noise because these processes are scheduled on all cores.

The overall IPC is consistent and slightly over 1.

About this graph
Backend stalls are the largest limiter.
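The Retire/FrontEnd/Spec/Backend split follows Intel's top-down method, but the document does not say which tool or events produced it. This is a minimal sketch, assuming the commonly published Level 1 formulas for a 4-wide core (non-SMT variants). The counter totals in the example call are made up for illustration and were not measured from this benchmark.

```python
def topdown_level1(clk, uops_issued, uops_retired_slots, recovery_cycles,
                   idq_not_delivered, width=4):
    """Top-down Level 1 breakdown from raw counter totals.

    Parameter names map to Intel events as follows (assumed mapping):
      clk                 CPU_CLK_UNHALTED.THREAD
      uops_issued         UOPS_ISSUED.ANY
      uops_retired_slots  UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     INT_MISC.RECOVERY_CYCLES
      idq_not_delivered   IDQ_UOPS_NOT_DELIVERED.CORE
    """
    slots = width * clk  # issue slots available over the run
    frontend = idq_not_delivered / slots
    bad_spec = (uops_issued - uops_retired_slots + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    backend = 1.0 - frontend - bad_spec - retiring  # whatever is left over
    return {"frontend": frontend, "bad_speculation": bad_spec,
            "retiring": retiring, "backend": backend}


# Hypothetical counter totals, NOT measured from y-cruncher.
result = topdown_level1(clk=1_000_000, uops_issued=2_100_000,
                        uops_retired_slots=2_000_000, recovery_cycles=10_000,
                        idq_not_delivered=600_000)
for name, value in result.items():
    print(f"{name:16s} {value:.3f}")
```

Because backend is computed as the remainder, the four categories always sum to 1, as they do in the metrics block above.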

Topdown (Intel)
retire         0.588
ms_uops                0.001
speculation    0.004
branch_misses          5.28%
machine_clears         94.72%
frontend       0.137
idq_uops_delivered_0   0.054
icache_stall               0.009
itlb_misses                0.000
idq_uops_delivered_1   0.060
idq_uops_delivered_2   0.071
idq_uops_delivered_3   0.088
dsb_ops                    55.17%
backend        0.271     0.033
stalls_ldm_pending     0.214

The overall retirement rate (0.588) is higher than the 0.480 reported in the Intel metrics above, and this seems more consistent with the IPC. The breakdown shows a few frontend stalls (possibly branch resteers) and some memory stalls (stalls_ldm_pending 0.214).
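A small sketch comparing the two Intel Level 1 breakdowns, with values copied from the metrics block (Retire, FrontEnd, Spec, Backend) and from the topdown table above. The helper `level1_diff` is hypothetical and exists only for this comparison.

```python
# Level 1 values copied from the Intel metrics block and the Topdown (Intel) table.
metrics_block = {"retire": 0.480, "frontend": 0.138, "speculation": 0.122, "backend": 0.260}
topdown_table = {"retire": 0.588, "frontend": 0.137, "speculation": 0.004, "backend": 0.271}


def level1_diff(before, after):
    """Per-category change between two Level 1 breakdowns."""
    return {name: after[name] - before[name] for name in before}


# Each breakdown should account for all pipeline slots.
for label, breakdown in (("metrics", metrics_block), ("topdown", topdown_table)):
    print(f"{label:8s} total = {sum(breakdown.values()):.3f}")

# Change per category, topdown table relative to the metrics block.
for name, delta in level1_diff(metrics_block, topdown_table).items():
    print(f"{name:12s} {delta:+.3f}")
```

Both breakdowns sum to 1.000. The gain in retiring (+0.108) roughly matches the drop in speculation (-0.118), while frontend and backend barely move.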

Next steps: None