wspy run for phoronix cpu benchmarks
As one of the first steps in priming the pump for Phoronix benchmarks, I ran the wspy program on 21 candidate CPU benchmarks. My goal was to start with a rough characterization, e.g. single-threaded vs. multi-threaded, or CPU-bound vs. not.
    #!/bin/bash
    # Run each benchmark listed in cpu-benchlist.txt under wspy, one at a time.
    while read -r benchmark
    do
        /home/mev/wspy/wspy -o "cpu-wspy-${benchmark}.txt" phoronix-test-suite batch-run "${benchmark}" > "cpu-${benchmark}.output.txt" 2>&1
    done < cpu-benchlist.txt
Results are listed below. A few more general comments:
- Examining the wspy profiles, I break the tests into several groups:
- graphics programs: padman, etqw-demo; multiple threads, one or two very busy, the others less so.
- multi-core programs: john-the-ripper, ttsiod-renderer, compress-pbzip2, compress-7zip, ffmpeg, openssl, c-ray, povray, smallpt, tachyon, stream; all CPUs, close to 100% user time, symmetric operations. How many are small in-cache toys and how many are bigger?
- multi-core programs, not 100% user time: x264, apache, mafft; what is taking up the rest of the time?
- single-threaded programs: encode-mp3, encode-flac, himeno, crafty, tscp
These classifications also give me some additional things to look for when looking further at performance counters.
- Benchmark scores are listed as a sanity check that the experiments were similar to my previous run. Since the primary goal is rough characterization, I haven't done a lot to control the runs, but one can make a few observations about wspy test overhead and likely noise: (a) the DES benchmark stands out as having a 15% better score when run under instrumentation; (b) overall, 14 scores are better in this run vs. 11 worse, and the range is +15% (des) to -4% (apache), with all but 2 scores (des, himeno) being within 5%. As a result, it doesn't look like wspy is particularly onerous in perturbing or adding overhead to this run.
- Two of the benchmarks (padman, etqw-demo) report only a single run as part of the cpu suite, but run nine combinations of different graphics resolutions when run by themselves. The scores are still comparable, but there are more processes in the wspy output.
Test | Original Score | New Score | Better when | Test Output | wspy Output | Behavior of processes/CPUs | Notes |
---|---|---|---|---|---|---|---|
pts/padman | 198.97 | 198.03 | higher | padman | padman | Graphics program calculating frames per second. One CPU is very busy while the test runs, approaching 100% utilization; short bursts on other CPUs, particularly at the point tests start. Possibly initial processing time followed by the frames-per-second measurement? | Nine tests are run; original score is only the last. |
pts/etqw-demo | 41.80 | 41.70 | higher | etqw-demo | etqw-demo | Graphics program calculating frames per second. Each test starts one process per core (8 total); two are different (name="threaded-ml"), so it would be interesting to see whether placement of these threads matters relative to the others. CPUs generally run in bursts of activity >50% separated by less loaded times. | Nine tests are run; original score is only the last. |
pts/john-the-ripper | 5937 blowfish 20593667 des 203603 MD5 | 6078 blowfish 23699000 des 208588 MD5 | higher | john-the-ripper | john-the-ripper | All CPUs close to 100% user time. Short tests ~20 seconds per test case. | |
pts/ttsiod-renderer | 192.87 | 191.88 | higher | ttsiod-renderer | ttsiod-renderer | All CPUs close to 100% user time. Short tests ~30 seconds per test case. | |
pts/compress-pbzip2 | 9.67 | 9.74 | lower | compress-pbzip2 | compress-pbzip2 | All CPUs close to 100% user time. Short tests ~10 seconds per test case. | |
pts/compress-7zip | 20486 | 20389 | higher | compress-7zip | compress-7zip | Repeated short tests on all CPUs, close to 100% user time. Total of ~40 seconds. | |
pts/encode-mp3 | 32.77 | 32.73 | lower | encode-mp3 | encode-mp3 | Single threaded, close to 100% CPU. | |
pts/encode-flac | 11.70 | 11.16 | lower | encode-flac | encode-flac | Single threaded, close to 100%. Very short runtimes. | |
pts/x264 | 36.23 | 35.98 | higher | x264 | x264 | All CPUs, busy but not always 100%. I/O or memory? | |
pts/ffmpeg | 7.19 | 7.36 | lower | ffmpeg | ffmpeg | All CPUs, busy but not 100%. Short runs of ~9 seconds each. | |
pts/openssl | 636.17 | 636.37 | higher | openssl | openssl | All CPUs, close to 100% user time. Tests ~20 seconds. | |
pts/himeno | 1916.86 | 2045.81 | higher | himeno | himeno | Single threaded. Close to 100% CPU. | |
pts/apache | 27272.28 | 26249.09 | higher | apache | apache | All CPUs, busy but not 100%. Proportionally high system time. | |
pts/c-ray | 26.36 | 26.36 | lower | c-ray | c-ray | All CPUs, many simultaneous threads, close to 100% | |
pts/povray | 131.24 | 131.17 | lower | povray | povray | All CPUs, close to 100% | |
pts/smallpt | 80 | 78 | lower | smallpt | smallpt | All CPUs, close to 100%. | |
pts/tachyon | 13.83 | 13.72 | lower | tachyon | tachyon | All CPUs, close to 100%. ~15 seconds runtime. | |
pts/crafty | 7320247 | 7314067 | higher | crafty | crafty | Single threaded, close to 100%. | |
pts/tscp | 1306401 | 1307021 | higher | tscp | tscp | Single threaded, close to 100%. Very short runtimes. | |
pts/mafft | 4.59 | 4.66 | lower | mafft | mafft | All CPUs, many small process creations, close to 100%. Very short runtime total ~5 seconds per run. | |
pts/stream | copy 19452.82 scale 14243.04 triad 16108.24 add 16135.70 | copy 19441.60 scale 14247.74 triad 16116.24 add 16154.56 | higher | stream | stream | All CPUs, close to 100% user time. | |
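
The better/worse comparison between the two score columns comes down to a signed percent change, flipped for tests where lower is better. A minimal sketch of that arithmetic follows; `pct_better` is a hypothetical helper name of my own, not part of wspy or phoronix-test-suite, and it assumes `awk` is available.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of wspy or phoronix-test-suite).
# Usage: pct_better ORIGINAL NEW higher|lower
# Prints the percent change from ORIGINAL to NEW, signed so that a
# positive number always means the new run scored better.
pct_better() {
    awk -v orig="$1" -v new="$2" -v dir="$3" 'BEGIN {
        if (dir == "lower")
            change = (orig - new) / orig * 100   # smaller is better
        else
            change = (new - orig) / orig * 100   # larger is better
        printf "%+.2f\n", change
    }'
}

# Scores taken from the table above.
pct_better 20593667 23699000 higher   # john-the-ripper des
pct_better 27272.28 26249.09 higher   # apache
pct_better 9.67 9.74 lower            # compress-pbzip2
```

Run on the des and apache rows, this gives roughly +15% and -4%, matching the range quoted earlier.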