I kicked off a quick run of more than 100 Phoronix tests to get a rough profile and overall assessment; the results are in the table below. A few observations:
- Some of the tests didn’t run, most likely because they didn’t install completely or were missing dependencies that only showed up at runtime. Over time, I can clean these up.
- osbench created a situation where the process tree in wspy contained a loop, which made wspy hang. This needs further debugging to make the tool more robust.
- The hint benchmark hung in the user code of the INT program. This needs diagnosis.
- A first-level diagnosis based on process counts and overall CPU time gave a good indication of which tests are single-threaded and which are multi-threaded, and therefore how to bind them further. Some of the multi-threaded tests spread their work very symmetrically across cores, while others ran more haphazardly.
- Some of the tests have extremely short runtimes.
- I used “batch-run” instead of “default-run” to avoid being prompted for a test name. However, this means every possible option combination was run, and in a few cases (fio, pgbench) the combinatorics can stretch for days.
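The osbench hang above suggests it is worth recording parent/child links defensively. Three safeguards help: ignore failed forks (a return value of -1), refuse any edge that would close a loop, and walk the tree iteratively with a visited set so the walk always terminates. The sketch below shows these ideas under stated assumptions. The `(parent_pid, child_pid)` event format and the `ProcessTree` class are hypothetical and are not wspy's actual data structures.

```python
from collections import defaultdict


class ProcessTree:
    """Hypothetical process tree built from (parent_pid, child_pid) fork events."""

    def __init__(self, root_pid):
        self.root = root_pid
        self.parent = {root_pid: None}      # pid -> parent pid (root has None)
        self.children = defaultdict(list)   # pid -> list of child pids

    def _is_ancestor(self, candidate, pid):
        """True if candidate is pid itself or one of pid's ancestors."""
        seen = set()
        while pid is not None and pid not in seen:
            if pid == candidate:
                return True
            seen.add(pid)
            pid = self.parent.get(pid)
        return False

    def add_fork(self, parent_pid, child_pid):
        """Record a fork event. Returns True only if the edge was added."""
        if child_pid <= 0:
            return False  # fork() failed (e.g. -1 with errno EAGAIN): no child exists
        if parent_pid not in self.parent:
            return False  # parent is not part of the traced tree
        if self._is_ancestor(child_pid, parent_pid):
            return False  # edge would create a loop (e.g. a reused PID)
        old_parent = self.parent.get(child_pid)
        if old_parent is not None:
            # Assumed PID reuse: the earlier process with this PID has exited.
            self.children[old_parent].remove(child_pid)
        self.parent[child_pid] = parent_pid
        self.children[parent_pid].append(child_pid)
        return True

    def walk(self):
        """Iterative pre-order traversal; the visited set guarantees termination."""
        order, stack, visited = [], [self.root], set()
        while stack:
            pid = stack.pop()
            if pid in visited:
                continue
            visited.add(pid)
            order.append(pid)
            stack.extend(reversed(self.children[pid]))
        return order
```

Because every accepted edge attaches a node under a parent already in the tree, and never under one of that node's own descendants, the structure stays acyclic. A real tracer would probably also track a per-PID generation counter, not just re-parent on reuse.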
Overall, this is a rough-cut filter. It is still useful both as a first screen of the tests and as a test of the wspy tool itself. The table below is also linked from the “workloads” menu item and will be updated as I learn more about the tests.
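The first-level screening described in the list above (process counts plus total CPU time) can be written as a simple heuristic. Divide total busy CPU time (user + system) by wall-clock time to get the average number of busy cores. Then compare per-CPU busy times to separate symmetric multi-threaded runs from lopsided ones. This is a sketch only. The function name, the per-CPU input format, and the thresholds (1.5 busy cores, a 0.5 min/max balance ratio) are illustrative assumptions, not values taken from wspy.

```python
def classify(wall_s, per_cpu_busy_s, multi_threshold=1.5, balance_threshold=0.5):
    """Hypothetical heuristic: label a run from wall time and per-CPU busy time.

    wall_s          -- elapsed wall-clock seconds for the run
    per_cpu_busy_s  -- busy (user + system) seconds observed on each CPU
    Returns (label, average number of busy cores).
    """
    if wall_s <= 0:
        raise ValueError("wall time must be positive")
    avg_busy_cores = sum(per_cpu_busy_s) / wall_s
    if avg_busy_cores < multi_threshold:
        return "single", avg_busy_cores
    # Only compare CPUs that did any work; idle CPUs are ignored here.
    active = [t for t in per_cpu_busy_s if t > 0]
    balance = min(active) / max(active)
    label = "multi (symmetric)" if balance >= balance_threshold else "multi (uneven)"
    return label, avg_busy_cores
```

For example, a run with 10 s of wall time and per-CPU busy times of [9.5, 9.6, 9.4, 9.7] averages about 3.8 busy cores and is labeled symmetric. A run with [9.9, 6.0, 2.0, 1.0] is multi-threaded but uneven, similar to the "workloads uneven across CPUs" notes in the table.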
| Test | Phoronix Summary | Diagnosis | Single vs. Multi-Threaded | Runtime | # Processes | Notes | Root Process |
| --- | --- | --- | --- | --- | --- | --- | --- |
| aobench | AOBench is a lightweight ambient occlusion renderer, written in C. The test profile uses a size of 2048 x 2048. | | single | 42s x 7 | 2 | | ./aobench |
| apache | This is a test of ab, which is the Apache benchmark program. This test profile measures how many requests per second a given system can sustain when carrying out 1,000,000 requests with 100 requests being carried out concurrently. | | multi | 40s x 3 | 118 | Heavier use of system time than user time. | httpd |
| asmfish | This is a test of asmFish, an advanced chess benchmark written in Assembly. | | multi | 240s x 3 | 11 | | ./asmfish |
| blake2 | This is a benchmark of BLAKE2 using the blake2s binary. BLAKE2 is a high-performance crypto alternative to MD5 and SHA-2/3. | | single | 2s x 3 | 2 | | ./blake2 |
| blender | Blender is an open-source 3D creation software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing via OpenCL or CUDA is supported. | | multiple cores, but not symmetric and perhaps not all | 6 hours | 27 | | /usr/lib/php/sessionclean |
| blogbench | BlogBench is designed to replicate the load of a real-world busy file server by stressing the file-system with multiple threads of random reads, writes, and rewrites. The behavior mimics that of a blog by creating blogs with content and pictures, modifying blog posts, adding comments to these blogs, and then reading the content of the blogs. All of the generated blogs are created locally with fake content and pictures. | | multi | 300s x 3 | 114 | 90% system time, 10% user time. | ./blogbench |
| bork | Bork is a small, cross-platform file encryption utility. It is written in Java and designed to be included along with the files it encrypts for long-term storage. This test measures the amount of time it takes to encrypt a sample file. | | single | 10s x 6 | 20 | Runs on more than one core, but overall utilization is dominated by a single core. | /usr/bin/java |
| botan | Botan is a cross-platform open-source C++ crypto library that supports almost all publicly known cryptographic algorithms. | | single | 25s x 3 | 2 | | ./botan |
| build-apache | This test times how long it takes to build the Apache HTTP Server. | | multi | 30s x 3 | 12052 | Large numbers of very small processes. | /bin/bash |
| build-boost-interprocess | This test times how long it takes to build Boost Interprocess examples. | | | | | Error: "-std=c++11 not found". Possibly need to pass in the $CXX environment variable? Needs investigation. | |
| build-eigen | This test times how long it takes to build all Eigen examples. | | | | | Build error, possibly a missing $CXX variable. Needs investigation. | |
| build-firefox | This test times how long it takes to build the Firefox Web Browser. | | | | | Non-zero exit status; Firefox directory not present. Needs investigation. | |
| build-gcc | This test times how long it takes to build the GNU Compiler Collection (GCC). | Diagnosis | multi | 22m x 3 | 1840 | | /bin/bash |
| build-imagemagick | This test times how long it takes to build ImageMagick. | | multi | 70s x 3 | 9479 | | /bin/bash |
| build-linux-kernel | This test times how long it takes to build the Linux kernel. | Diagnosis | multi | 180s x 3 | 2585 | | /bin/bash |
| build-llvm | This test times how long it takes to build the LLVM compiler stack. | | multi | 15m x 3 | 1491 | | /bin/bash |
| build-mplayer | This test times how long it takes to build the MPlayer media player program. | | | | | Error during build; needs investigation. | |
| build-php | This test times how long it takes to build PHP 5 with the Zend engine. | | multi | 90s x 3 | 9106 | | /bin/bash |
| build-webkitfltk | This test times how long it takes to build the WebKitFLTK web library. | | | | | Error during build; needs investigation. | |
| bullet | This is a benchmark of the Bullet Physics Engine. | | single | <5s x 7 | 2 | | ./bullet |
| byte | This is a test of BYTE. | | single | 17m | various, up to 98 | | ./byte |
| c-ray | This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. | Analysis | multi | 26s x 3 | 130 | | ./c-ray |
| cachebench | This is a performance test of CacheBench, which is part of LLCbench. CacheBench is designed to test the memory and cache bandwidth performance. | | single | 125s x 3 | 3 | | ./cachebench |
| clomp | CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. This particular test profile configuration is currently set to look at the OpenMP static schedule speed-up across all available CPU cores using the recommended test configuration. | | multi | 6s x 5 | 0 | | ./clomp |
| compress-7zip | This is a test of 7-Zip using p7zip with its integrated benchmark feature or upstream 7-Zip for the Windows x64 build. | Diagnosis | multi | 40s x 3 | 82 | | ./compress-7zip |
| compress-gzip | This test measures the time needed to archive/compress two copies of the Linux 4.13 kernel source tree using Gzip compression. | | single | 40s x 3 | 5 | Runs on only some of the cores. | ./compress-gzip |
| compress-lzma | This test measures the time needed to compress a file using LZMA compression. | | single | 280s x 3 | 2 | | ./compress-lzma |
| compress-pbzip2 | This test measures the time needed to compress a file (a .tar package of the Linux kernel source code) using BZIP2 compression. | | multi | 10s x 6 | 13 | | ./compress-pbzip2 |
| cpuminer-opt | Cpuminer benchmark. | | multi | 30s x 3 | 12 | | ./cpuminer |
| crafty | This is a performance test of Crafty, an advanced open-source chess engine. | | single | 30s x 3 | 3 | | ./crafty-benchmark |
| cyclictest | Cyclictest is a high-resolution test program for measuring the Linux kernel latencies. | | single | 50s x 3 | 3 | Not CPU-bound. | ./cyclictest |
| cython-bench | Stress benchmark tests to measure time consumed by Cython code. | | single | 30s x 3 | 2 | | ./cython-bench |
| dcraw | This test times how long it takes to convert several high-resolution RAW NEF image files to PPM image format using dcraw. | | single | 50s x 3 | 2 | | ./dcraw |
| dolfyn | Dolfyn is a Computational Fluid Dynamics (CFD) code of modern numerical simulation techniques. The Dolfyn test profile measures the execution time of the computational fluid dynamics demos bundled with Dolfyn. | | | | | No result; needs further investigation. | |
| ebizzy | This is a test of ebizzy, a program to generate workloads resembling web server workloads. | | multi | 20s x 6 | 18 | | ./ebizzy |
| encode-flac | This test times how long it takes to encode a sample WAV file to FLAC format five times. | Diagnosis | single | 12s x 5 | 6 | | ./encode-flac |
| encode-mp3 | LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. | Diagnosis | single | 35s x 3 | 2 | | ./lame |
| encode-ogg | This test times how long it takes to encode a sample WAV file to Ogg format using vorbis-tools, libvorbis, and libogg. | | single | 7s x 3 | 2 | | ./encode-ogg |
| encode-opus | Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus and then to decode the generated Opus file. | | single | 9s x 5 | 4 | | ./encode-opus |
| encode-wavpack | This test times how long it takes to encode a sample WAV file to WavPack format. | | single | 8s x 5 | 2 | | ./encode-wavpack |
| espeak | This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. | | single | 40s x 6 | 3 | | ./espeak |
| etqw-demo | This test calculates the average frame-rate within the Enemy Territory: Quake Wars demo. | | multi (heavy on one CPU) | 300s x 9 | 11 | Initial burst of computation, then a longer run across threads. Heavy on one CPU. | ./etqw |
| fahbench | FAHBench is a Folding@Home benchmark on the GPU. | | | | | No result; needs further investigation. | |
| ffmpeg | This test uses FFmpeg for testing the system's audio/video encoding performance. | | multi | 10s x 4 | 33 | | ./ffmpeg |
| ffte | FFTE is a package by Daisuke Takahashi to compute Discrete Fourier Transforms of 1-, 2- and 3-dimensional sequences of length (2^p)\*(3^q)\*(5^r). | | single\* | 5s x 6 | 10 | Processes started on all CPUs, but all but one are idle. | ./ffte |
| fftw | FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. | | single | 26m, varying with size | 2 | 32 possible options. Interesting: performance drops dramatically at particular sizes. Cache effects? | /bin/sh |
| fhourstones | This integer benchmark solves positions in the game of Connect-4, as played on a vertical 7x6 board. By default, it uses a 64Mb transposition table with the twobig replacement strategy. Positions are represented as 64-bit bitboards, and the hash function is computed using a single 64-bit modulo operation, giving 64-bit machines a slight edge. The alpha-beta searcher sorts moves dynamically based on the history heuristic. | | single | 15s x 3 | 2 | | ./fhourstones-benchmark |
| fio | Fio is an advanced disk benchmark that depends upon the kernel's AIO access library. | | single, several threads | Very long due to 2048 combinations | 12 | 2048 combinations; batch-run tries them all. Mostly system time. | ./fio-run |
| gcrypt | This is a benchmark of libgcrypt's integrated benchmark with the CAMELLIA256-ECB cipher and 100 repetitions. | | | | 3 | Compilation errors during installation. | |
| git | This test measures the time needed to carry out some sample Git operations on an example, static repository that happens to be a copy of the GNOME GTK tool-kit repository. | | multi | 6s x 3 | 58 | | ./git |
| glibc-bench | The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems, as well as many other systems that use Linux as the kernel. These libraries provide critical APIs including ISO C11, POSIX.1-2008, BSD, OS-specific APIs and more. | | single | 3s x 15 | 2 | Warnings that the test ended quickly. | ./glibc-bench |
| gnupg | This test times how long it takes to encrypt a file using GnuPG. | | single | 12s x 3 | 2 | | ./gnupg |
| go-benchmark | Benchmark for monitoring real time performance of the Go implementation for HTTP, JSON and garbage testing per iteration. | | multi | 12s x 3 (three workloads) | 66 | Three workloads with varying profiles. | ./go-benchmark |
| gpu-residency | This test measures the GPU residency of a given state for a 60 second interval. | | | | | Test quit with non-zero exit status; needs investigation. | |
| graphics-magick | This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests to stress the system's CPU. | | multi | 60s x 3 | 9 | Workloads uneven across CPUs. | ./graphics-magick |
| hackbench | This is a benchmark of Hackbench, a test of the Linux kernel scheduler. | | multi | 30m | up to 1008 | 12 options, combinations of threads and processes; 90% system time. | |
| himeno | The Himeno benchmark is a linear solver of pressure Poisson using a point-Jacobi method. | | single | 60s x 3 | 2 | | ./himeno |
| hint | This test runs the U.S. Department of Energy's Ames Laboratory Hierarchical INTegration (HINT) benchmark. | | single | 25m | 2 | Third test hung; problem in the tools or the test? Needs investigation. | ./hint |
| hmmer | This test searches through the Pfam database of profile hidden Markov models. The search finds the domain structure of Drosophila Sevenless protein. | | multi | 10s x 3 | 11 | | ./hmmer |
| hpcg | HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Labs focused on super-computer testing with modern real-world workloads compared to HPCC. | | multi | 55s x 3 | 12 | All ~30% busy; investigate idle times. | ./hpcg |
| interbench | Interbench is an interactivity benchmark written by Con Kolivas. Interbench is primarily intended to test out the system kernel and its CPU scheduler while running a simulated test with a given simulated load in the background. Each benchmark / load is run for 60 seconds per test. | | multi | 4h | 4 | 81 combinations; many with no result. | ./interbench |
| java-jmh | This test runs the stock benchmark of the Java JMH benchmark via Maven. | | multi | 7m | 355 | Almost 100% CPU. | ./java-jmh |
| java-scimark2 | This test runs the Java version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This benchmark is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. | | single | 2m | 21 | Six tests. | ./java-scimark2 |
| john-the-ripper | This is a benchmark of John The Ripper, which is a password cracker. | | multi | (20s + 40s + 20s) x 3 | 9 | | ./john-the-ripper |
| lammps | LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. | | | | | Test quit with non-zero exit status; needs investigation. | |
| llvm-test-suite | This test times how long it takes to run the LLVM Test Suite. | | single | 220s x 3 | 1561 | | ./llvm-test-suite |
| luajit | This test profile is a collection of Lua scripts/benchmarks run against a locally-built copy of LuaJIT upstream. | | single | 100s | 2 | Six tests. | ./luajit |
| luxmark | LuxMark is a multi-platform OpenGL benchmark using LuxRender. LuxMark supports targeting different OpenCL devices and has multiple scenes available for rendering. LuxMark is a fully open-source OpenCL program with real-world rendering examples. | | | | | Test quit with non-zero exit status; needs investigation. | |
| lzbench | lzbench is an in-memory benchmark of various compressors. The file used for compression is a Linux kernel source tree tarball. | | single | 6m | 2 | | ./lzbench |
| mafft | This test performs an alignment of 100 pyruvate decarboxylase sequences. | | multi | 7s x 6 | 143 | Many short-lived processes. | ./mafft |
| mencoder | This test uses mplayer's mencoder utility and the libavcodec family for testing the system's audio/video encoding performance. | | single | 20s x 3 | 2 | | ./mencoder |
| minion | Minion is an open-source constraint solver that is designed to be very scalable. This test profile uses Minion's integrated benchmarking problems to solve. | | single | 15m | 2 | Three tests. | ./minion |
| mrbayes | This test performs a Bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. | | | | | Test quit with non-zero exit status; needs investigation. | |
| multichase | This is a benchmark of Google's multichase pointer chaser program. | | single & multi | 100s | 3 | Five tests. | ./multichase |
| n-queens | This is a test of the OpenMP version of a test that solves the N-queens problem. The board problem size is 18. | | multi | 35s x 3 | 9 | Almost 100% busy. | ./n-queens |
| nero2d | This is a test of Nero2D, which is a two-dimensional TM/TE solver for Open FMM. Open FMM is a free collection of electromagnetic software for scattering at very large objects. This test profile times how long it takes to solve one of the included 2D examples. | | | | | Test quit with non-zero exit status; needs investigation. | |
| network-loopback | This test measures the loopback network adapter performance using a micro-benchmark to measure the TCP performance. | | | | | Test quit with non-zero exit status; needs investigation. | |
| nginx | This is a test of ab, which is the Apache Benchmark program running against nginx. This test profile measures how many requests per second a given system can sustain when carrying out 2,000,000 requests with 500 requests being carried out concurrently. | | single | 60s x 3 | 2 | Heavier on system time than user time. | ./nginx |
| noise-level | This test measures background activity. | | single | 60s | 14 | Runs sleep. | ./noise-level |
| numpy | This is a test to obtain the general Numpy performance. | | single | 45m | 38 | | ./numpy |
| openssl | OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. | Analysis | multi | 20s x 3 | 9 | | ./openssl |
| opm-git | This is a test of a DUNE (Distributed and Unified Numerics Environment) module called OPM Benchmarks from the Open Porous Media project. Open Porous Media is a set of open-source tools concerning simulation of flow and transport of fluids in porous media. This test profile builds OPM and its dependencies from upstream Git. | | | | | Test quit with non-zero exit status; needs investigation. | |
| osbench | OSBench is a collection of micro-benchmarks for measuring operating system primitives like time to create threads/processes, launching programs, creating files, and memory allocation. | Diagnosis | | | | wspy hangs because an incorrect process tree is built. Further debugging shows fork() failing with errno EAGAIN. This also makes the test fail when it is not run under wspy. Two fixes are required: (1) examine the conditions described in the fork(2) man page to avoid the failure, and (2) fix wspy to properly handle fork calls that fail. | |
| padman | World of Padman is an open-source game using the ioquake3 engine. What makes this game different from other first-person shooters is that it's a cartoon-style action game. | | multi (heavy on one CPU) | 120s x 9 | 7 | Game. | ./padman |
| parboil | The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. | Diagnosis | multi | 25m | 13 | Ten tests; six didn't run correctly (missing OpenCL). | ./parboil |
| perl-benchmark | Perl benchmark suite that can be used to compare the relative speed of different versions of Perl. | | multi | 80s, 67s, 70s, 28s, 66s, 66s, 70s | 22, 21264, 21407, 8639, 21492, 21521, 21834 | More than 100,000 processes created; system time exceeds user time. | ./perl-benchmark |
| pgbench | This is a simple benchmark of PostgreSQL using pgbench. | | | | | Test must be run as non-root; extremely long runtime. | |
| phpbench | PHPBench is a benchmark suite for PHP. It performs a large number of simple tests in order to bench various aspects of the PHP interpreter. PHPBench can be used to compare hardware, operating systems, PHP versions, PHP accelerators and caches, compiler options, etc. The number of iterations used is 1,000,000. | Diagnosis | single | 20s x 3 | 2 | | ./phpbench |
| polybench-c | PolyBench-C is a C-language polyhedral benchmark suite made at the Ohio State University. | | single | 30s | 2 | Three workloads; the last runs longer than the first two. | ./polybench |
| postmark | This is a test of NetApp's PostMark benchmark designed to simulate small-file testing similar to the tasks endured by web and mail servers. This test profile will set PostMark to perform 25,000 transactions with 500 files simultaneously with the file sizes ranging between 5 and 512 kilobytes. | | single | 40s x 3 | 2 | Mostly system time. | ./postmark |
| povray | This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. | | multi | 135s x 3 | 29 | | ./povray |
| primesieve | Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. | | multi | 85s x 3 | 9 | Almost 100% user time. | ./primesieve |
| psstop | Shows the total number of processes running and the memory they consume. | | single | <1s | 5 | Extremely short duration. | ./psstop |
| pybench | This test profile reports the total time of the different average timed test results from PyBench. PyBench reports average test times for different functions such as BuiltinFunctionCalls and NestedForLoops, with this total result providing a rough estimate as to Python's average performance on a given system. This test profile runs PyBench each time for 20 rounds. | Diagnosis | single | 30s x 3 | 5 | | ./pybench |
| ramspeed | This benchmark tests the system memory (RAM) performance. | | double | 120s x 10 | 3 | Naming suggests variations of a double-threaded stream. | ./ramspeed |
| rbenchmark | This test is a quick-running survey of general R performance. | | single | 0.5s x 3 | 11 | | ./rbenchmark |
| redis | Redis is an open-source data structure server. | | single\* (multi-core, but most computation on a single core) | 11s x 15 | 4 | Short bursts of activity; mostly idle. | ./redis |
| rodinia | Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes the OpenCL and OpenMP test binaries at the moment. | | multiple | 18m | 9 | Only three of nine benchmarks ran out of the box. | ./rodinia |
| sample-program | A simple C++ program that calculates Pi to 8,765,4321 digits using the Leibniz formula. This test can be used for showcasing how to write a basic test profile. | | single | 3s x 5 | 2 | | ./sample-program |
| schbench | This is a benchmark of Schbench, a Linux kernel scheduler benchmark developed by Facebook. | | multiple | 90m | 13 | 42 different subtests. | ./schbench |
| scimark2 | This test runs the ANSI C version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This test is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. | | single | 25s x 3 | 2 | | ./scimark2 |
| serial-loopback | This test will do a simple write/read test on all detected serial interfaces. For this test to work, the relevant serial ports should have a serial loopback plug or have otherwise wired the appropriate pins. | | | | | Test quit with non-zero exit status; needs investigation. | |
| smallpt | Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. | | multi | 80s x 3 | 9 | | ./smallpt |
| stockfish | This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores. | Diagnosis | single\* (multi-core, but most computation on a single core) | 4s x 3 | 4 | | ./stockfish |
| stream | This benchmark tests the system memory (RAM) performance. | Diagnosis | multi | 50s x 5 | 9 | | ./stream |
| sudokut | This is a test of Sudokut, which is a Sudoku puzzle solver written in Tcl. This test measures how long it takes to solve 100 Sudoku puzzles. | | single | 12s x 3 | 101 | Runs the same process 100 times. | ./sudokut |
| sunflow | This test runs benchmarks of the Sunflow Rendering System. The Sunflow Rendering System is an open-source render engine for photo-realistic image synthesis with a ray-tracing core. | | multi | 30s x 3 | 182 | | ./sunflow-benchmark |
| system-decompress-bzip2 | This test measures the time to decompress a Linux kernel tarball using BZIP2. | | single | 10s x 3 | 2 | | ./system-decompress-bzip2 |
| system-decompress-xz | This test measures the time to decompress a Linux kernel tarball using XZ. | | single | 4s x 3 | 2 | | ./system-decompress-xz |
| system-libxml2 | This test measures the time to parse a random XML file with libxml2 via xmllint using the streaming API. | | | | | Test quit with non-zero exit status; needs investigation. | |
| systemd-boot-kernel | This test uses systemd-analyze to report the kernel boot time. | | | | | Test quit with non-zero exit status; needs investigation. | |
| systemd-boot-total | This test uses systemd-analyze to report the entire boot time. | | | | | Test quit with non-zero exit status; needs investigation. | |
| systemd-boot-userspace | This test uses systemd-analyze to report the userspace boot time. | | | | | Test quit with non-zero exit status; needs investigation. | |
| systester | Times how long it takes to calculate pi to varying lengths. | | | | | Test quit with non-zero exit status; needs investigation. | |
| t-test1 | This is a test of t-test1 for basic memory allocator benchmarks. Note this test profile is currently very basic and the overall time does include the warmup time of the custom t-test1 compilation. Improvements welcome. | | single | 30s | 4008 | Two workloads. Many processes, but they appear to run mostly sequentially. | |
| tachyon | This is a test of the threaded Tachyon, a parallel ray-tracing system. | | multi | 15s x 3 | 9 | | ./tachyon-benchmark |
| tensorflow | This is a benchmark of the Tensorflow deep learning framework using the CIFAR10 data set. | | multi | 90s x 3 | 50 | Python test. | ./tensorflow |
| tjbench | tjbench is a JPEG decompression/compression benchmark that is part of libjpeg-turbo. | | single | 8s x 3 | 15 | | ./tjbench |
| tscp | This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. | | single | 2s x 5 | 2 | | ./tscp |
| ttsiod-renderer | A portable GPL 3D software renderer that supports OpenMP and Intel Threading Building Blocks with many different rendering modes. This version does not use OpenGL but is entirely CPU/software based. | | multi | 30s x 3 | 9 | | ./ttsiod-renderer |
| vpxenc | This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP8/WebM format. | | four | 70s x 6 | 5 | | ./vpxenc |
| x264 | This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file. | Diagnosis | multi | 20s x 5 | 11 | | ./x264 |
| xsbench | XSBench is a mini-app representing a key computational kernel of the Monte Carlo neutronics application OpenMC. | | multi | 15s x 3 | 9 | | ./xsbench |
| y-cruncher | Y-Cruncher is a multi-threaded Pi benchmark. | | multi | 60s x 3 | 20 | | ./y-cruncher |