
Performance analysis, tools and experiments

An eclectic collection

Ryzen 1950x vs Ryzen 3950x

Posted on 2020-03-05 by mev (updated 2020-03-29)

This post compares my Ryzen Threadripper 1950x and my Ryzen 3950x desktop CPU, with my Ryzen 1700x included as a third reference point.

The table below combines specifications from WikiChip with direct measurements I've made using lmbench, STREAM and the Phoronix Test Suite. Given the specs, I'm surprised the benchmarks show such a large change. I'm wondering whether my Ryzen 1950x is properly configured or whether there is another reason.

| Item | Ryzen Threadripper 1950x | Ryzen 3950x | Ryzen 1700x | Notes |
|---|---|---|---|---|
| Cores | 16 | 16 | 8 | |
| Threads | 32 | 32 | 16 | |
| Base / boost clock | 3.4 GHz / 4.0 GHz | 3.5 GHz / 4.7 GHz | 3.4 GHz / 3.8 GHz | Faster boost; expect higher single-threaded performance. |
| TDP | 180 W | 105 W | 95 W | |
| Memory | 2667 MHz DDR4, 4 memory channels, 79.47 GiB/s | 2400 MHz DDR4, 2 memory channels, 47.68 GiB/s | 2400 MHz DDR4, 2 memory channels, 39.74 GiB/s | Faster memory; expect memory-bound latency to be slightly faster. Fewer memory controllers and less memory bandwidth. Check STREAM performance. |
| Core | Zen | Zen 2 | Zen | |
| Cache | 16 x 64 KiB L1I, 4-way; 16 x 32 KiB L1D, 8-way; 16 x 512 KiB L2, 8-way; 4 x 8 MiB L3 | 16 x 32 KiB L1I, 8-way; 16 x 32 KiB L1D, 8-way; 16 x 512 KiB L2, 8-way; 4 x 16 MiB L3 | 8 x 64 KiB L1I, 4-way; 8 x 32 KiB L1D, 8-way; 2 x 8 MiB L3 | Less L1I and more L3; compare across benchmarks. |
| lmbench latency (cycles) | L1 4, L2 10, L3 16, memory 150 | L1 4, L2 10, L3 17, memory 113 | L1 4, L2 11, L3 17, memory 100 | |
| pts/rodinia OpenMP LavaMD (seconds) | 47.27 | 38.67 | 102.676 | |
| pts/rodinia OpenMP CFD solver (seconds) | 15.082 | 12.004 | 32.502 | |
| pts/namd (days/ns) | 1.41998 | 1.13749 | 2.87945 | |
| pts/x264 (frames/second) | 124.37 | 149.44 | 60.71 | |
| pts/x265 (frames/second) | 34.70 | 54.76 | 7.06 | |
| pts/compress-7zip (MIPS) | 64379 | 98677 | 31465 | |
| pts/stockfish (nodes/second) | 37267071 | 51242651 | 18967192 | |
| pts/asmfish (nodes/second) | 34856084 | 51295444 | 19168181 | |
| pts/gcc compile (seconds) | 978.643 | 692.402 | 1294.157 | |
| pts/linux kernel compile (seconds) | 50.663 | 34.973 | 90.695 | |
| pts/povray (seconds) | 32.672 | 24.128 | 64.075 | |
| pts/radiance Serial (seconds) | 813.706 | 588.438 | 878.193 | |
| pts/radiance SMP parallel (seconds) | 260.705 | 186.469 | 319.56 | |
| pts/openssl (signs/second) | 3065.2 | 4740.1 | 1368.8 | |
| pts/ctx-clock (clocks) | 170 | 175 | 150 | |
| pts/sysbench (events/second) | 30970.0407 | 34976.4773 | 13378.7779 | |
| pts/blender barbershop (seconds) | 767.63 | 532.7 | 1472.21 | |
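
The arithmetic behind two parts of the table can be checked with a short sketch. It assumes that theoretical peak DRAM bandwidth is the transfer rate times 8 bytes per 64-bit channel times the number of channels, treating the table's DDR4 "MHz" figures as transfer rates in MT/s. It also orients each speedup so that a value above 1.0 means the 3950x is faster, whether the metric is a time (lower is better) or a rate (higher is better). The function names are illustrative, and only four rows of the table are included.

```python
"""Sanity checks for the Ryzen comparison table (illustrative sketch)."""

GIB = 2**30  # bytes per GiB


def peak_bandwidth_gib_s(mega_transfers_per_s: float, channels: int,
                         bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GiB/s, assuming 64-bit channels.

    This is an upper bound from the memory configuration, not a STREAM result.
    """
    bytes_per_s = mega_transfers_per_s * 1e6 * bytes_per_transfer * channels
    return bytes_per_s / GIB


def speedup(old: float, new: float, lower_is_better: bool) -> float:
    """Return a ratio where values above 1.0 mean `new` beats `old`."""
    return old / new if lower_is_better else new / old


# (benchmark, 1950x result, 3950x result, lower_is_better), copied from the table.
RESULTS = [
    ("rodinia OpenMP LavaMD (seconds)", 47.27, 38.67, True),
    ("gcc compile (seconds)", 978.643, 692.402, True),
    ("x264 (frames/second)", 124.37, 149.44, False),
    ("compress-7zip (MIPS)", 64379, 98677, False),
]

if __name__ == "__main__":
    for mts, channels in [(2666.67, 4), (2400, 2), (3200, 2)]:
        gib_s = peak_bandwidth_gib_s(mts, channels)
        print(f"{mts:g} MT/s x {channels} channels: {gib_s:.2f} GiB/s")
    for name, r1950x, r3950x, lower in RESULTS:
        print(f"{name}: 3950x vs 1950x = {speedup(r1950x, r3950x, lower):.2f}x")
```

Under that formula, 2666.67 MT/s on four channels gives 79.47 GiB/s, matching the 1950x entry, while 2400 MT/s on two channels gives only 35.76 GiB/s. The 47.68 GiB/s and 39.74 GiB/s entries for the 3950x and 1700x correspond to DDR4-3200 and DDR4-2667 on two channels, so those bandwidth figures appear to be maximum supported speeds rather than the 2400 MHz configuration listed; the STREAM check noted in the table would give the measured numbers.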
Posted in hardware

haswell system freezes

Posted on 2018-08-04 by mev

Not sure what is causing it, but my Haswell system has started to freeze up when running “wspy --config topdown.config”. This started happening after I updated the system and resumed running benchmarks following a month on the road.

Some additional diagnosis and items I’ve tried:

  1. Observed that the hangs also happened when single-stepping in a debugger, so I investigated how many single-steps it took before the system hung. That may have been a false lead, as single-stepping in the middle of fopen(3C) suddenly jumps ahead. Along the way, however, I replaced my strtok() calls with strtok_r(). strtok() keeps its position in hidden static state, so a nested call can clobber an outer one, and I wanted to make sure nothing strange was happening with the recursive open_config_file()/parse_command_line() calls. These were set up to be tail-recursive, so it shouldn't matter, but I cleaned it up anyway.
  2. Next, observed that the failure seemed to happen while setting up the performance counters. I created a small test program that set up the same performance counters, and it didn't hang.
  3. Looked through the logs in /var/log and didn't see any smoking guns. The kernel locks up completely, even for processes of other logged-in users, so even if the program is faulty, it exposes a way to lock up the kernel.
  4. Next, looked at using grub to boot an older kernel. In the process, I discovered that I was running under the Xen hypervisor, and booting on bare metal fixed the problem. I've added a check for this to my 123.sh script that calls wspy. I'm not sure why this hung the system; I don't have virtual counters enabled, so the counters shouldn't work under Xen in the first place. TODO: look at more robust error detection in wspy to avoid stumbling into this again.

Conclusion: running on bare metal fixed the issue.
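
The check added to 123.sh is not shown in the post. As a rough illustration only, a guard of this kind could read /sys/hypervisor/type, which Linux is assumed here to populate with "xen" when running under Xen, and refuse to start counter collection in that case. The function names and the choice of file are assumptions for this sketch, not part of wspy or 123.sh.

```python
from pathlib import Path
from typing import Optional

# Assumption: under Xen, Linux exposes the hypervisor name in this sysfs file.
HYPERVISOR_TYPE_PATH = "/sys/hypervisor/type"


def hypervisor_type(path: str = HYPERVISOR_TYPE_PATH) -> Optional[str]:
    """Return the reported hypervisor name (e.g. "xen"), or None if absent."""
    try:
        value = Path(path).read_text().strip()
    except OSError:
        return None  # missing or unreadable file: no hypervisor reported here
    return value or None


def safe_to_collect_counters(path: str = HYPERVISOR_TYPE_PATH) -> bool:
    """Hypothetical guard: skip hardware-counter runs when running under Xen."""
    return hypervisor_type(path) != "xen"


if __name__ == "__main__":
    if safe_to_collect_counters():
        print("no Xen hypervisor reported; OK to collect counters")
    else:
        print("running under Xen; boot bare metal before running wspy")
```

A wrapper script would call such a check first and stop before launching wspy, instead of discovering the problem through a hung kernel.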

Posted in analysis | Tagged system hang, virtualization

Phoronix article – benchmarks of high-end Intel/AMD desktops

Posted on 2018-08-03 by mev

Phoronix posted an article comparing Intel and AMD desktops on the Linux 4.18 kernel. The article says 100+ benchmarks were measured, though only half a dozen are displayed as part of the article.

I haven’t run these benchmarks on 4.18, but I can use my existing analysis to see what is being measured. This post summarizes the Phoronix conclusions along with my own observations of each benchmark. It also looks like an opportunity to try a few new benchmarks. These are described in the table below.

| Benchmark | Phoronix observations | My observations | Analysis |
|---|---|---|---|
| indigobench | Ryzen and Threadripper faster than i7 and slower than i9 platforms. | On_CPU of 97% with an IPC of 0.65. Many backend stalls and L2/L3 cache misses. | Analysis |
| hpcc | Threadripper fastest, i9 next, followed by Ryzen 7 2700 and Core i7. | Requires specific variables during install; still need to figure these out. | |
| compress-p7zip | i9 fastest, followed by Threadripper. Ryzen 7 2700 similar to i7. | On_CPU 88%, with some I/O limiting scaling. IPC 0.83 with 27% speculation misses (branch prediction). | Analysis |
| build-linux-kernel | i9 fastest, Threadripper close, i7 slowest. | On_CPU 88%, mostly parallel compiles with a sequential period at the end. High frontend stalls. Fewer processes in subsequent runs, so the benchmark might not do a thorough "clean". | Analysis |
| c-ray | Threadripper fastest, i9 next and i7 slowest. | On_CPU almost 100% with a moderately high IPC of 1.44. Frontend stalls of 10% and backend stalls of 15%. | Analysis |
| octave-benchmark | i7 fastest, Ryzen 7 next and i9 after that. | Single-threaded with On_Core of 100%. Six workloads that vary slightly but include backend memory stalls. | Analysis |
| v-ray | i9 fastest, Threadripper/Ryzen next and i7 slowest. | Installation instructions point to a site to register and download the benchmark, which is then placed in the download cache. Even following these steps, I had difficulty getting it installed. | |
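
The observations column relies on IPC and top-down categories (frontend stalls, backend stalls, speculation misses). As a sketch of how such numbers are typically derived, the level-1 formulas from Intel's top-down method for a 4-wide core such as Haswell are shown below. The event names in the comments are Intel's; whether wspy's topdown.config uses exactly these formulas is an assumption, and AMD Zen cores do not provide the same events. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CoreCounts:
    """Raw counts for one run (Intel event names in the comments)."""
    cycles: int            # CPU_CLK_UNHALTED.THREAD
    instructions: int      # INST_RETIRED.ANY
    uops_issued: int       # UOPS_ISSUED.ANY
    retire_slots: int      # UOPS_RETIRED.RETIRE_SLOTS
    recovery_cycles: int   # INT_MISC.RECOVERY_CYCLES
    fe_not_delivered: int  # IDQ_UOPS_NOT_DELIVERED.CORE


def topdown_level1(c: CoreCounts, width: int = 4) -> Dict[str, float]:
    """IPC plus the four level-1 top-down fractions of issue slots."""
    slots = width * c.cycles  # total issue slots available during the run
    frontend = c.fe_not_delivered / slots
    bad_speculation = (c.uops_issued - c.retire_slots
                       + width * c.recovery_cycles) / slots
    retiring = c.retire_slots / slots
    return {
        "ipc": c.instructions / c.cycles,
        "frontend_bound": frontend,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        # Backend-bound is whatever remains after the other three categories.
        "backend_bound": 1.0 - frontend - bad_speculation - retiring,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    sample = CoreCounts(cycles=1_000, instructions=1_440, uops_issued=2_000,
                        retire_slots=1_800, recovery_cycles=25,
                        fe_not_delivered=400)
    for name, value in topdown_level1(sample).items():
        print(f"{name:16s} {value:.3f}")
```

Because backend-bound is computed as the remainder, the four fractions always sum to 1.0; a benchmark with many L2/L3 misses, such as indigobench above, would be expected to show up mainly in that remainder.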

Posted in analysis | Tagged phoronix benchmark article

TODO list at end of June

Posted on 2018-06-30 by mev (updated 2018-06-25)

As June comes to a close, it's useful to take stock of what has been completed and what still remains.

During June, the following were done:

  1. Phoronix benchmark list: I finished going through ~120 Phoronix benchmarks, doing at least a “topdown” run on each. Approximately 60 have further “analysis” pages. Most of this was done by the end of May, and I finished the last ones at the start of the month. As a result, when benchmark articles are posted, I've already looked at most of the benchmarks and it is quicker to update the analysis.
  2. Phoronix articles: looked at articles on OS comparisons, CPU comparisons and hyper-threading, and wrote posts on them drawing on previous analysis. Hyper-threading was the most interesting, showing that these smaller benchmarks all benefited unless there was an obvious cause otherwise, e.g. limited thread scaling. I skipped some OS-specific articles, since I've already looked at those benchmarks and am not sure I have much more to add.
  3. Installed and analyzed both the gromacs and OpenFOAM applications. It was nice to see that tools developed on smaller benchmarks also work here.
  4. Added support to wspy for --memstats. This periodically samples /proc/meminfo and creates metrics. Useful for OpenFOAM.
  5. Looked further at OpenSSL differences between AMD and Intel and suggested that MULX instructions may be related.
  6. Looked at topdown metrics for AMD, but didn't get much traction here.
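
The --memstats item above samples /proc/meminfo and turns it into metrics. A minimal sketch of that idea parses the file's "Name: value kB" lines and samples at a fixed interval. It assumes "used memory" means MemTotal minus MemAvailable; wspy's actual metrics may differ, and the function names are hypothetical.

```python
import os
import time
from typing import Dict, Iterator, Tuple

MEMINFO_PATH = "/proc/meminfo"


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each field to its numeric value (kB for most fields)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def used_kb(fields: Dict[str, int]) -> int:
    """Assumed metric: memory not available for new allocations, in kB."""
    return fields["MemTotal"] - fields["MemAvailable"]


def sample_memory(interval_s: float = 1.0, count: int = 5,
                  path: str = MEMINFO_PATH) -> Iterator[Tuple[float, int]]:
    """Yield (monotonic timestamp, used kB) every interval_s seconds."""
    for i in range(count):
        with open(path) as f:
            fields = parse_meminfo(f.read())
        yield time.monotonic(), used_kb(fields)
        if i + 1 < count:
            time.sleep(interval_s)


if __name__ == "__main__" and os.path.exists(MEMINFO_PATH):
    for stamp, used in sample_memory(interval_s=0.2, count=3):
        print(f"t={stamp:.1f}s used={used} kB")
```

Sampling with a monotonic clock keeps the intervals meaningful even if the wall clock is adjusted during a long run such as OpenFOAM.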

This leaves several areas for further emphasis in the future(*):

  1. Add additional “real world” codes. Top candidates are wrf and namd.
  2. Keep up with incremental Phoronix articles as they are published.
  3. Look at Ryzen to create a better quick “topdown” tool, e.g. by adding cache miss rates. It might become more of an overall tool than a topdown one.
  4. Add ARMv8 architecture examples.
  5. Cleanups: take care of the NMI timer, add “about this graph”, review test next steps.
  6. Implement --netstats, the one remaining “stats” feature. However, I don't have a motivating case yet.
  7. Look at tools/techniques beyond current measurements, e.g. microbenchmark measurements similar to Agner’s scripts?

I have some extended cycle touring scheduled in July, so it may be a slower month overall. However, the tools and analysis have also reached a general level of maturity where the remaining work is more about rounding out the edges.

Posted in tools | Tagged progress report

Phoronix article – POWER9, Xeon and AMD comparison (2018-06-25)

Posted on 2018-06-25 by mev

Phoronix posted an article comparing POWER vs x86 on CPU benchmarks. This post looks at some of the workloads and adds comments.

Posted in analysis | Tagged phoronix benchmark article

openssl – AMD vs Intel

Posted on 2018-06-24 by mev

The openssl Phoronix benchmark is interesting because the IPC on my Intel Haswell system (1.66) is considerably higher than on my AMD Ryzen system (1.12). In this post, I’ll explore possible causes.

Posted in analysis, featured | Tagged analysis technique

Phoronix article – hyperthreading (2018-06-20)

Posted on 2018-06-21 by mev

Phoronix posted an article comparing hyperthreading on/off on an Intel i7. This post reviews some of the workloads and adds comments.

Posted in analysis | Tagged hyperthreading, phoronix benchmark article

wspy – added support for --memstats

Posted on 2018-06-19 by mev

I have added support to wspy for the --memstats option.

Posted in tools | Tagged memory, wspy

OpenFOAM summary

Posted on 2018-06-15 by mev

I’ve analyzed the OpenFOAM CFD application using the motorbike tutorial, created an analysis page for the results, and added it to the overall workload summary.

This post describes a few high-level takeaways I have from the analysis:

  1. OpenFOAM is sensitive to the number of threads it runs with, improving consistently until the number of threads equals the number of physical cores. Beyond that, running with as many threads as hardware threads (hyperthreads) is slightly slower on my Intel system and slightly faster on my AMD system. In this range the percentage of system time jumps dramatically, with the top routines appearing to be related to memory management and scheduling.
  2. OpenFOAM appears to be limited by backend stalls and memory, with the overall composite application run showing a 40% L2 miss ratio and a 50% L3 miss ratio. The iTLB miss ratio is also surprisingly high at 0.011. I'm not sure whether this is related to the recent kernel Spectre/Meltdown changes or something else.
  3. The overall IPC and backend stalls vary somewhat as the application runs, but IPC is slightly less than 1 on Intel and slightly higher on AMD. There doesn’t seem to be a particular bias between my AMD and Intel systems.
  4. Overall, I get the sense that when running the application, keeping track of memory management and threads (including affinity) is particularly important.
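
Point 1 is a thread-scaling observation. One common way to tabulate such a sweep is speedup relative to one thread and parallel efficiency (speedup divided by thread count); efficiency would be expected to fall off once the thread count passes the number of physical cores. The sketch below uses hypothetical timings, not the measured OpenFOAM results, and the function names are illustrative.

```python
from typing import Dict, List, Tuple


def scaling_table(seconds_by_threads: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (threads, speedup vs. 1 thread, parallel efficiency) rows."""
    if 1 not in seconds_by_threads:
        raise ValueError("need a single-thread baseline")
    base = seconds_by_threads[1]
    rows = []
    for threads in sorted(seconds_by_threads):
        speedup = base / seconds_by_threads[threads]
        rows.append((threads, speedup, speedup / threads))
    return rows


def fastest_thread_count(seconds_by_threads: Dict[int, float]) -> int:
    """Thread count with the lowest run time."""
    return min(seconds_by_threads, key=seconds_by_threads.get)


if __name__ == "__main__":
    # Hypothetical run times (seconds) on a machine with 8 cores / 16 threads.
    timings = {1: 800.0, 2: 410.0, 4: 215.0, 8: 120.0, 16: 124.0}
    for threads, speedup, eff in scaling_table(timings):
        print(f"{threads:3d} threads: speedup {speedup:5.2f}, efficiency {eff:4.2f}")
    print("fastest:", fastest_thread_count(timings))
```

Reporting efficiency alongside raw time makes the point where extra hyperthreads stop paying off easier to see than the run times alone.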

Posted in analysis, featured | Tagged OpenFOAM

OpenFOAM notes

Posted on 2018-06-14 by mev (updated 2018-06-15)

OpenFOAM is free CFD software. While one can build it from source, there are also prebuilt packages in Ubuntu repositories, which I used in my testing.

Posted in workloads | Tagged OpenFOAM

©2023 - Performance analysis, tools and experiments - Weaver Xtreme Theme