Phoronix article – hyperthreading (2018-06-20)
Phoronix posted an article comparing hyperthreading on/off on an Intel i7. This post reviews some of the workloads and adds comments.
The Phoronix article is interesting because whether to enable hyper-threading is an important choice. Apparently BSD has chosen to default to hyper-threading off. My uninformed hypotheses are that (a) hyper-threading should help where limited natural parallelism is restricting use of the processor and (b) hyper-threading should hurt where it exacerbates a natural bottleneck such as backend memory stalls. Otherwise hyper-threading should be roughly neutral, neither helping nor hurting, including for single-threaded programs.
So I’ll first look at my existing metrics and comments for the workloads Phoronix tested, and then see if I can take measurements of any more interesting workloads that come out of that. The following table summarizes the benchmarks measured and my past analysis of some of them. Overall Phoronix saw improvements on almost all of the benchmarks, which disputes the original hyper-threading hypothesis for BSD.
Benchmark | Phoronix observations | My observations | Analysis |
---|---|---|---|
blender | Hyperthreading improved ~30% over non-hyperthreading | On_CPU close to 100% with ~20% stalls in backend and ~20% stalls in front end and bad speculation of ~10% due to branch misses. An overall IPC of 0.80. Blender runs more than one thread at a time, so interesting that this doesn't result in gains. | Analysis |
parboil: cutcp | Hyperthreading improves ~15% | On_CPU of 92%, IPC of 0.84. Some backend stalls as largest issue. | Analysis |
parboil: stencil | Hyperthreading improves ~10% | On_CPU of 91%, IPC of 0.28 and considerably lower on AMD. Backend stalls account for 78% with many memory misses. | Analysis |
rodinia: cfd solver | Hyperthreading improves ~8% | CFD solver: parallel, On_CPU 97%. IPC of 0.64 with a large number of memory stalls. L2 miss rate 62% and L3 miss rate 69% so cache/memory plays big part in overall performance. | Analysis |
rodinia: streamcluster | Hyperthreading improves ~8% | streamcluster: parallel, On_CPU 97%. IPC of 0.91 with a moderate number of memory stalls. L2 miss rate 62% and L3 miss rate 69% so cache/memory plays big part in overall performance. | Analysis |
hmmer | Hyperthreading improves ~10% | On_CPU of 90% with an IPC of 1.30. Some I/O with 90% voluntary context switches. Overall some backend stalls but high retirement rate. | Analysis |
ttsiod-renderer | Hyperthreading improves 30% | On_CPU 98%. IPC 0.85. Frontend/backend stalls similar at 21%/22%. L2 miss rate 67% and L3 miss rate of 10% so cache sizes likely play a factor. Frontend stalls seem to come more from inefficiencies in allocating (uop cache 44%) than from icache or iTLB misses. | Analysis |
vpxenc | Hyperthreading decreases ~2% | On_CPU 30%, with only 4 threads run. IPC is over 2 and 99.5% voluntary context switches. Some backend memory stalls. | Analysis |
x264 | Hyperthreading improves 15% | On_CPU 71%, many voluntary context switches and I/O read input. IPC 1.3 | Analysis |
graphics-magick: blur, sharpen, resize | Hyperthreading makes no change | Benchmark has five operations; overall On_CPU 30%. A pool of backend processing threads but not always busy. Backend stalls are largest issue. | Analysis |
compress-p7zip | Hyperthreading improves ~30% | On_CPU 88% with some I/O to limit scaling. IPC 0.83 with 27% speculation misses (branch prediction). | Analysis |
stockfish | Hyperthreading improves ~30% | On_CPU 100%, IPC of 1.0 with frontend stalls and bad speculation. | Analysis |
asmfish | Hyperthreading improves ~30% | On_CPU 100% with IPC 0.96. Frontend stalls of 27% and speculation of 19% (all branch misses) | Analysis |
build-linux-kernel | Hyperthreading improves ~20% | On_CPU 88%, mostly parallel compiles with a sequential period at end. High frontend stalls. Fewer processes in subsequent runs, so the benchmark might not do a thorough "clean". | Analysis |
build-php | Hyperthreading improves ~15% | On_CPU 82% so less parallel time than build-linux-kernel. Frontend stalls high. Many small short-lived processes. | Analysis |
c-ray | Hyperthreading improves ~7% | On_CPU almost 100% with moderately high IPC of 1.44. Frontend stalls of 10% and backend of 15%. | Analysis |
povray | Hyperthreading improves 20% | On_CPU almost 100% with IPC of 1.30. More frontend stalls than backend stalls. | Analysis |
n-queens | Hyperthreading improves 20% | OpenMP with On_CPU almost 100% and high number of speculative branch misses. | Analysis |
The lack of improvement for graphics-magick and vpxenc makes sense given that their On_CPU percentages are 30% and 42% respectively. These workloads cannot keep even the 8 hyper-threaded logical processors on my machine fully scheduled, so the jump in the Phoronix tests from 6 physical cores to 12 virtual cores likely doesn’t help either.
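To make that reasoning concrete, here is a rough back-of-the-envelope sketch. It assumes On_CPU is the fraction of the total available logical-CPU time that the workload actually used, which is my reading of the metric rather than a formal definition. The helper name `busy_cpus` is made up for illustration.

```python
def busy_cpus(on_cpu_fraction: float, logical_cpus: int) -> float:
    """Average number of logical CPUs kept busy.

    Assumption: On_CPU is the fraction of all available logical-CPU
    time that the workload actually consumed during the run.
    """
    return on_cpu_fraction * logical_cpus


MY_LOGICAL_CPUS = 8          # my measurement machine (hyper-threading on)
PHORONIX_PHYSICAL_CORES = 6  # Phoronix machine with hyper-threading off

# On_CPU percentages quoted in the paragraph above.
for name, on_cpu in [("graphics-magick", 0.30), ("vpxenc", 0.42)]:
    busy = busy_cpus(on_cpu, MY_LOGICAL_CPUS)
    fits = busy <= PHORONIX_PHYSICAL_CORES
    print(f"{name}: ~{busy:.1f} busy CPUs, fits in 6 physical cores: {fits}")
```

Under that assumption both workloads keep fewer than four CPUs busy on average, so six physical cores already have spare capacity and the extra six virtual cores have little left to absorb.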
What is interesting is that every other benchmark shows gains, sometimes up to 30%. This suggests that a single thread per core has natural limits to its parallelism and is not always able to take advantage of all the parallel units, even in a throughput-type application.
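The split between the two groups can be checked quickly against the table. The sketch below copies the On_CPU values and approximate Phoronix gains from the table, entering "close to" or "almost" 100% as 100. The 50% cut-off is an arbitrary threshold I picked for illustration, not something derived from the data.

```python
# (benchmark, On_CPU % from my measurements, approx. hyper-threading gain % from Phoronix)
RESULTS = [
    ("blender", 100, 30),
    ("parboil: cutcp", 92, 15),
    ("parboil: stencil", 91, 10),
    ("rodinia: cfd solver", 97, 8),
    ("rodinia: streamcluster", 97, 8),
    ("hmmer", 90, 10),
    ("ttsiod-renderer", 98, 30),
    ("vpxenc", 30, -2),
    ("x264", 71, 15),
    ("graphics-magick", 30, 0),
    ("compress-p7zip", 88, 30),
    ("stockfish", 100, 30),
    ("asmfish", 100, 30),
    ("build-linux-kernel", 88, 20),
    ("build-php", 82, 15),
    ("c-ray", 100, 7),
    ("povray", 100, 20),
    ("n-queens", 100, 20),
]

ON_CPU_THRESHOLD = 50  # hypothetical cut-off between "mostly idle" and "mostly busy"

low = [(name, gain) for name, on_cpu, gain in RESULTS if on_cpu < ON_CPU_THRESHOLD]
high = [(name, gain) for name, on_cpu, gain in RESULTS if on_cpu >= ON_CPU_THRESHOLD]

print("low On_CPU:", low)
high_gains = [gain for _, gain in high]
print("high On_CPU gains range from", min(high_gains), "to", max(high_gains), "percent")
```

With these table values, the only benchmarks below the cut-off are the two that showed no gain, and every benchmark above it gained something. That is a grouping of 18 data points, not a statistical result.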