Phoronix test suite, quick run through many tests

I kicked off a quick run through more than 100 Phoronix tests to get a rough profile and overall assessment of each; the results are in the table below. A few items noted:

Some of the tests didn’t run, most likely because they didn’t install completely or were missing dependencies that weren’t discovered until runtime. I can clean these up over time.
osbench created a situation where the process tree in wspy had a loop, and hence wspy hung. This needs further debugging to make the tool more robust.
The hint benchmark hung in the user code of the INT program; this needs diagnosis.
A first-level diagnosis of the number of processes and overall CPU time gave a good idea of which tests are single- vs. multi-threaded, and hence how to bind them further. In addition, some of the multi-threaded tests were very symmetric while others ran more haphazardly across multiple cores.
Some of the tests have extremely short runtimes.
I used “batch-run” to avoid being prompted for a test name, unlike “default-run”. However, this means every possible combination of options was run, and in a few cases (fio, pgbench) the combinatorics can stretch for days.
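To get a feel for why the combinatorics stretch for days, here is a minimal back-of-the-envelope sketch. The 2048 combinations come from the fio row in the table; the 40 seconds per run and the three runs per combination are assumptions I picked only for illustration, and the function names are hypothetical.

```python
# Rough estimate of how long a batch-run takes when every option
# combination is exercised. All timing inputs are assumptions.

def batch_run_seconds(combinations, seconds_per_run, runs_per_combination=3):
    """Total wall-clock seconds if each combination is run back to back."""
    return combinations * seconds_per_run * runs_per_combination

def as_days(seconds):
    """Convert seconds to days."""
    return seconds / 86400.0

# 2048 is the fio combination count from the table; 40 s per run is a guess.
total = batch_run_seconds(combinations=2048, seconds_per_run=40)
print(f"{total} s = {as_days(total):.1f} days")  # 245760 s = 2.8 days
```

Even with a modest per-run time, a full sweep is measured in days, so for tests like these it is probably worth selecting a handful of combinations by hand.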

Otherwise this is a rough-cut filter, but it is useful as a first screen of the tests as well as a test of the wspy tool. The table below is also linked in the “workloads” menu item and can be updated as I learn more about the tests.
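The single vs. multi-threaded screen can be sketched as a ratio of CPU time to elapsed time. This is only an illustration of the idea, not the code used to fill in the table: the function names, the 1.2 cutoff, and the sample numbers are all assumptions.

```python
def average_busy_cores(user_s, system_s, elapsed_s):
    """CPU seconds consumed per second of wall-clock time."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (user_s + system_s) / elapsed_s

def classify(user_s, system_s, elapsed_s, single_cutoff=1.2):
    """Label a run 'single' or 'multi'; the cutoff is an arbitrary assumption."""
    busy = average_busy_cores(user_s, system_s, elapsed_s)
    return "single" if busy <= single_cutoff else "multi"

# Made-up numbers for illustration only
print(classify(user_s=41.0, system_s=0.5, elapsed_s=42.0))   # single
print(classify(user_s=190.0, system_s=8.0, elapsed_s=26.0))  # multi
```

A ratio like this says nothing about how evenly the work is spread, so a test that starts processes on every CPU but keeps most of them idle would still land in the "single" bin and needs a per-core look.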

Phoronix Overview

Test	Phoronix Summary	Diagnosis	Single vs. Multi-Threaded	Runtime	# processes	Notes	Root
aobench	AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048.		single	42s x 7	2		./aobench
apache	This is a test of ab, which is the Apache benchmark program. This test profile measures how many requests per second a given system can sustain when carrying out 1,000,000 requests with 100 requests being carried out concurrently.		multi	40s x 3	118	Heavier use of system time than user time.	httpd
asmfish	This is a test of asmFish, an advanced chess benchmark written in Assembly.		multi	240s x 3	11		./asmfish
blake2	This is a benchmark of BLAKE2 using the blake2s binary. BLAKE2 is a high-performance crypto alternative to MD5 and SHA-2/3.		single	2s x 3	2		./blake2
blender	Blender is an open-source 3D creation software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing via OpenCL or CUDA is supported.		multiple cores, but not symmetric and perhaps not all	6 hours	27		/usr/lib/php/sessionclean
blogbench	BlogBench is designed to replicate the load of a real-world busy file server by stressing the file-system with multiple threads of random reads, writes, and rewrites. The behavior is mimicked of that of a blog by creating blogs with content and pictures, modifying blog posts, adding comments to these blogs, and then reading the content of the blogs. All of these blogs generated are created locally with fake content and pictures.		multi	300s x 3	114	90% time is system, 10% user time.	./blogbench
bork	Bork is a small, cross-platform file encryption utility. It is written in Java and designed to be included along with the files it encrypts for long-term storage. This test measures the amount of time it takes to encrypt a sample file.		single	10s x 6	20	runs on more than one core, but overall utilization dominated by single cores	/usr/bin/java
botan	Botan is a cross-platform open-source C++ crypto library that supports most all publicly known cryptographic algorithms.		single	25s x 3	2		./botan
build-apache	This test times how long it takes to build the Apache HTTP Server.		multi	30s x 3	12052	large #s of very small processes	/bin/bash
build-boost-interprocess	This test times how long it takes to build Boost Interprocess examples.					Error "-std=c++11 not found". Potentially need to pass in $CXX environment variable? Needs investigation.
build-eigen	This test times how long it takes to build all Eigen examples.					Build error, potentially missing $CXX variable. Needs investigation
build-firefox	This test times how long it takes to build the Firefox Web Browser.					Exited with non-zero exit status. Firefox directory not present. Needs investigation.
build-gcc	This test times how long it takes to build the GNU Compiler Collection (GCC).	Diagnosis	multi	22m x 3	1840		/bin/bash
build-imagemagick	This test times how long it takes to build ImageMagick.		multi	70s x 3	9479		/bin/bash
build-linux-kernel	This test times how long it takes to build the Linux kernel.	Diagnosis	multi	180s x 3	2585		/bin/bash
build-llvm	This test times how long it takes to build the LLVM compiler stack.		multi	15m x 3	1491		/bin/bash
build-mplayer	This test times how long it takes to build the MPlayer media player program.					Error during build needs investigation.
build-php	This test times how long it takes to build PHP 5 with the Zend engine.		multi	90s x 3	9106		/bin/bash
build-webkitfltk	This test times how long it takes to build the WebKitFLTK web library.					Error during build needs investigation.
bullet	This is a benchmark of the Bullet Physics Engine.		single	<5s x 7	2		./bullet
byte	This is a test of BYTE.		single	17m	various up to 98		./byte
c-ray	This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image.	Analysis	multi	26s x 3	130		./c-ray
cachebench	This is a performance test of CacheBench, which is part of LLCbench. CacheBench is designed to test the memory and cache bandwidth performance		single	125s x 3	3		./cachebench
clomp	CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. This particular test profile configuration is currently set to look at the OpenMP static schedule speed-up across all available CPU cores using the recommended test configuration.		multi	6s x 5	0		./clomp
compress-7zip	This is a test of 7-Zip using p7zip with its integrated benchmark feature or upstream 7-Zip for the Windows x64 build.	Diagnosis	multi	40s x 3	82		./compress-7zip
compress-gzip	This test measures the time needed to archive/compress two copies of the Linux 4.13 kernel source tree using Gzip compression.		single	40s x 3	5	runs on selective cores	./compress-gzip
compress-lzma	This test measures the time needed to compress a file using LZMA compression.		single	280s x 3	2		./compress-lzma
compress-pbzip2	This test measures the time needed to compress a file (a .tar package of the Linux kernel source code) using BZIP2 compression.		multi	10s x 6	13		./compress-pbzip2
cpuminer-opt	Cpuminer benchmark.		multi	30s x 3	12		./cpuminer
crafty	This is a performance test of Crafty, an advanced open-source chess engine.		single	30s x 3	3		./crafty-benchmark
cyclictest	Cyclictest is a high-resolution test program for measuring the Linux kernel latencies.		single	50s x 3	3	not cpu-bound	./cyclictest
cython-bench	Stress benchmark tests to measure time consumed by cython code.		single	30s x 3	2		./cython-bench
dcraw	This test times how long it takes to convert several high-resolution RAW NEF image files to PPM image format using dcraw.		single	50s x 3	2		./dcraw
dolfyn	Dolfyn is a Computational Fluid Dynamics (CFD) code of modern numerical simulation techniques. The Dolfyn test profile measures the execution time of the bundled computational fluid dynamics demos that are bundled with Dolfyn.					No result, needs further investigation
ebizzy	This is a test of ebizzy, a program to generate workloads resembling web server workloads.		multi	20s x 6	18		./ebizzy
encode-flac	This test times how long it takes to encode a sample WAV file to FLAC format five times.	Diagnosis	single	12s x 5	6		./encode-flac
encode-mp3	LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format.	Diagnosis	single	35s x 3	2		./lame
encode-ogg	This test times how long it takes to encode a sample WAV file to Ogg format using vorbis-tools, libvorbis, and libogg.		single	7s x 3	2		./encode-ogg
encode-opus	Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus and then to decode the generated Opus file.		single	9s x 5	4		./encode-opus
encode-wavpack	This test times how long it takes to encode a sample WAV file to WavPack format.		single	8s x 5	2		./encode-wavpack
espeak	This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file.		single	40s x 6	3		./espeak
etqw-demo	This test calculates the average frame-rate within the demo for the game Enemy Territory: Quake Wars demo game.		multi (heavy on one CPU)	300s x 9	11	Initial burst of computation; longer run across threads. Heavy on one CPU	./etqw
fahbench	FAHBench is a Folding@Home benchmark on the GPU.					No result, needs further investigation
ffmpeg	This test uses FFmpeg for testing the system's audio/video encoding performance.		multi	10s x 4	33		./ffmpeg
ffte	FFTE is a package by Daisuke Takahashi to compute Discrete Fourier Transforms of 1-, 2- and 3- dimensional sequences of length (2^p)(3^q)(5^r).		single*	5s x 6	10	Processes started on all CPUs, but all but one are idle.	./ffte
fftw	FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions.		single	26m, varying time depending on size	2	32 possible options. Interesting to see as performance drops dramatically at particular size. Cache effects?	/bin/sh
fhourstones	This integer benchmark solves positions in the game of Connect-4, as played on a vertical 7x6 board. By default, it uses a 64Mb transposition table with the twobig replacement strategy. Positions are represented as 64-bit bitboards, and the hash function is computed using a single 64-bit modulo operation, giving 64-bit machines a slight edge. The alpha-beta searcher sorts moves dynamically based on the history heuristic.		single	15s x 3	2		./fhourstones-benchmark
fio	Fio is an advanced disk benchmark that depends upon the kernel's AIO access library.		single, several threads	Large time due to 2048 combinations	12	2048 combinations, batch run tries them all. Mostly system time.	./fio-run
gcrypt	This is a benchmark of libgcrypt's integrated benchmark with the CAMELLIA256-ECB cipher and 100 repetitions.				3	Compilation errors during installation.
git	This test measures the time needed to carry out some sample Git operations on an example, static repository that happens to be a copy of the GNOME GTK tool-kit repository.		multi	6s x 3	58		./git
glibc-bench	The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems, as well as many other systems that use Linux as the kernel. These libraries provide critical APIs including ISO C11, POSIX.1-2008, BSD, OS-specific APIs and more.		single	3s x 15	2	warnings that test ended quickly.	./glibc-bench
gnupg	This test times how long it takes to encrypt a file using GnuPG.		single	12s x 3	2		./gnupg
go-benchmark	Benchmark for monitoring real time performance of the Go implementation for HTTP, JSON and garbage testing per iteration.		multi	12s x 3 - three workloads	66	Three workloads with varying profiles.	./go-benchmark
gpu-residency	This test measures the GPU residency of a given state for a 60 second interval.					Test quit with non-zero status, needs investigation
graphics-magick	This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests to stress the system's CPU.		multi	60s x 3	9	Workloads uneven across CPUs	./graphics-magick
hackbench	This is a benchmark of Hackbench, a test of the Linux kernel scheduler.		multi	30m	up to 1008	12 options, combinations of threads and processes; 90% system time.
himeno	The Himeno benchmark is a linear solver of pressure Poisson using a point-Jacobi method.		single	60s x 3	2		./himeno
hint	This test runs the U.S. Department of Energy's Ames Laboratory Hierarchical INTegration (HINT) benchmark.		single	25m	2	Third test hung; problem in tools or test? Needs investigation.	./hint
hmmer	This test searches through the Pfam database of profile hidden markov models. The search finds the domain structure of Drosophila Sevenless protein.		multi	10s x 3	11		./hmmer
hpcg	HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Labs focused for super-computer testing with modern real-world workloads compared to HPCC.		multi	55s x 3	12	All ~30% busy; investigate idle times.	./hpcg
interbench	Interbench is an interactivity benchmark written by Con Kolivas. Interbench is primarily intended to test out the system kernel and its CPU scheduler while running a simulated test with a given simulated load in the background. Each benchmark / load is run for 60 seconds per test.		multi	4h	4	81 combinations; many with no result	./interbench
java-jmh	This test runs the stock benchmark of the Java JMH benchmark via Maven.		multi	7m	355	Almost 100% CPU	./java-jmh
java-scimark2	This test runs the Java version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This benchmark is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks.		single	2m	21	6 tests	./java-scimark2
john-the-ripper	This is a benchmark of John The Ripper, which is a password cracker.		multi	(20s + 40s + 20s) x 3	9		./john-the-ripper
lammps	LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator.					Test quit with non-zero exit status, needs investigation
llvm-test-suite	This test times how long it takes to run the LLVM Test Suite.		single	220s x 3	1561		./llvm-test-suite
luajit	This test profile is a collection of Lua scripts/benchmarks run against a locally-built copy of LuaJIT upstream.		single	100s	2	Six tests	./luajit
luxmark	LuxMark is a multi-platform OpenGL benchmark using LuxRender. LuxMark supports targeting different OpenCL devices and has multiple scenes available for rendering. LuxMark is a fully open-source OpenCL program with real-world rendering examples.					Test quit with non-zero exit status, needs investigation
lzbench	lzbench is an in-memory benchmark of various compressors. The file used for compression is a Linux kernel source tree tarball.		single	6m	2		./lzbench
mafft	This test performs an alignment of 100 pyruvate decarboxylase sequences.		multi	7s x 6	143	Many short little processes.	./mafft
mencoder	This test uses mplayer's mencoder utility and the libavcodec family for testing the system's audio/video encoding performance.		single	20s x 3	2		./mencoder
minion	Minion is an open-source constraint solver that is designed to be very scalable. This test profile uses Minion's integrated benchmarking problems to solve.		single	15m	2	Three tests	./minion
mrbayes	This test performs a bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny.					Test quit with non-zero exit status, needs investigation
multichase	This is a benchmark of Google's multichase pointer chaser program.		single & multi	100s	3	Five tests	./multichase
n-queens	This is a test of the OpenMP version of a test that solves the N-queens problem. The board problem size is 18		multi	35s x 3	9	Almost 100% busy	./n-queens
nero2d	This is a test of Nero2D, which is a two-dimensional TM/TE solver for Open FMM. Open FMM is a free collection of electromagnetic software for scattering at very large objects. This test profile times how long it takes to solve one of the included 2D examples.					Test quit with non-zero exit status, needs investigation
network-loopback	This test measures the loopback network adapter performance using a micro-benchmark to measure the TCP performance.					Test quit with non-zero exit status, needs investigation
nginx	This is a test of ab, which is the Apache Benchmark program running against nginx. This test profile measures how many requests per second a given system can sustain when carrying out 2,000,000 requests with 500 requests being carried out concurrently.		single	60s x 3	2	Heavier on system time than user time.	./nginx
noise-level	This test measures background activity.		single	60s	14	Runs sleep	./noise-level
numpy	This is a test to obtain the general Numpy performance.		single	45m	38		./numpy
openssl	OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL.	Analysis	multi	20s x 3	9		./openssl
opm-git	This is a test of a DUNE (Distributed and Unified Numerics Environment) module called OPM Benchmarks from the Open Porous Media project. Open Porous Media is a set of open-source tools concerning simulation of flow and transport of fluids in porous media. This test profile builds OPM and its dependencies from upstream Git.					Test quit with non-zero exit status, needs investigation
osbench	OSBench is a collection of micro-benchmarks for measuring operating system primitives like time to create threads/processes, launching programs, creating files, and memory allocation.	Diagnosis				wspy hangs because an incorrect process tree has been built. Further debugging shows "fork()" is failing with errno EAGAIN. This also causes the test to fail when not run under wspy. Two fixes are required: (1) look at the conditions described in the fork(2) man page to avoid the failure, and (2) fix wspy to properly handle fork calls that might fail.
padman	World of Padman is an open-source game using the ioquake3 engine. What makes this game different from other first-person shooters is that it's a cartoon-style action game.		multi (heavy on one CPU)	120s x 9	7	Game	./padman
parboil	The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads.	Diagnosis	multi	25m	13	Ten tests, six didn't run correctly. Missing OpenCL	./parboil
perl-benchmark	Perl benchmark suite that can be used to compare the relative speed of different versions of perl.		multi	80s, 67s, 70s, 28s, 66s, 66s, 70s	22, 21264, 21407, 8639, 21492, 21521, 21834	More than 100,000 processes created; system time exceeds user time.	./perl-benchmark
pgbench	This is a simple benchmark of PostgreSQL using pgbench.					Test must be run as non-root; extremely long runtime.
phpbench	PHPBench is a benchmark suite for PHP. It performs a large number of simple tests in order to bench various aspects of the PHP interpreter. PHPBench can be used to compare hardware, operating systems, PHP versions, PHP accelerators and caches, compiler options, etc. The number of iterations used is 1,000,000.	Diagnosis	single	20s x 3	2		./phpbench
polybench-c	PolyBench-C is a C-language polyhedral benchmark suite made at the Ohio State University.		single	30s	2	Three workloads, last longer than first two	./polybench
postmark	This is a test of NetApp's PostMark benchmark designed to simulate small-file testing similar to the tasks endured by web and mail servers. This test profile will set PostMark to perform 25,000 transactions with 500 files simultaneously with the file sizes ranging between 5 and 512 kilobytes.		single	40s x 3	2	Mostly system time.	./postmark
povray	This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing.		multi	135s x 3	29		./povray
primesieve	Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance.		multi	85s x 3	9	Almost 100% user	./primesieve
psstop	Shows the total number of processes running and the memory they consume.		single	<1s	5	Extremely short duration	./psstop
pybench	This test profile reports the total time of the different average timed test results from PyBench. PyBench reports average test times for different functions such as BuiltinFunctionCalls and NestedForLoops, with this total result providing a rough estimate as to Python's average performance on a given system. This test profile runs PyBench each time for 20 rounds.	Diagnosis	single	30s x 3	5		./pybench
ramspeed	This benchmark tests the system memory (RAM) performance.		double	120s x 10	3	Naming suggests variations of double-threaded stream	./ramspeed
rbenchmark	This test is a quick-running survey of general R performance		single	0.5s x 3	11		./rbenchmark
redis	Redis is an open-source data structure server.		single* (multi-core but most computation on single core)	11s x 15	4	short bursts of activity, mostly idle	./redis
rodinia	Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes the OpenCL and OpenMP test binaries at the moment.		multiple	18m	9	Only three of nine benchmarks ran out of the box	./rodinia
sample-program	A simple C++ program that calculates Pi to 8,765,4321 digits using the Leibniz formula. This test can be used for showcasing how to write a basic test profile.		single	3s x 5	2		./sample-program
schbench	This is a benchmark of Schbench, a Linux kernel scheduler benchmark developed by Facebook.		multiple	90m	13	42 different subtests	./schbench
scimark2	This test runs the ANSI C version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This test is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks.		single	25s x 3	2		./scimark2
serial-loopback	This test will do a simple write/read test on all detected serial interfaces. For this test to work, the relevant serial ports should have a serial loopback plug or have otherwise wired the appropriate pins.					Test quit with non-zero exit status, needs investigation
smallpt	Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library.		multi	80s x 3	9		./smallpt
stockfish	This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores.	Diagnosis	single* (multi-core but most computation on single core)	4s x 3	4		./stockfish
stream	This benchmark tests the system memory (RAM) performance.	Diagnosis	multi	50s x 5	9		./stream
sudokut	This is a test of Sudokut, which is a Sudoku puzzle solver written in Tcl. This test measures how long it takes to solve 100 Sudoku puzzles.		single	12s x 3	101	Runs same process 100 times	./sudokut
sunflow	This test runs benchmarks of the Sunflow Rendering System. The Sunflow Rendering System is an open-source render engine for photo-realistic image synthesis with a ray-tracing core.		multi	30s x 3	182		./sunflow-benchmark
system-decompress-bzip2	This test measures the time to decompress a Linux kernel tarball using BZIP2.		single	10s x 3	2		./system-decompress-bzip2
system-decompress-xz	This test measures the time to decompress a Linux kernel tarball using XZ.		single	4s x 3	2		./system-decompress-xz
system-libxml2	This test measures the time to parse a random XML file with libxml2 via xmllint using the streaming API.					Test quit with non-zero exit status, needs investigation
systemd-boot-kernel	This test uses systemd-analyze to report the kernel boot time.					Test quit with non-zero exit status, needs investigation
systemd-boot-total	This test uses systemd-analyze to report the entire boot time.					Test quit with non-zero exit status, needs investigation
systemd-boot-userspace	This test uses systemd-analyze to report the userspace boot time.					Test quit with non-zero exit status, needs investigation
systester	Time how long it takes to calculate pi to varying lengths.					Test quit with non-zero exit status, needs investigation
t-test1	This is a test of t-test1 for basic memory allocator benchmarks. Note this test profile is currently very basic and the overall time does include the warmup time of the custom t-test1 compilation. Improvements welcome.		single	30s	4008	Two workloads. Many processes, but they seem to run mostly sequentially.
tachyon	This is a test of the threaded Tachyon, a parallel ray-tracing system.		multi	15s x 3	9		./tachyon-benchmark
tensorflow	This is a benchmark of the Tensorflow deep learning framework using the CIFAR10 data set.		multi	90s x 3	50	Python test	./tensorflow
tjbench	tjbench is a JPEG decompression/compression benchmark part of libjpeg-turbo.		single	8s x 3	15		./tjbench
tscp	This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark.		single	2s x 5	2		./tscp
ttsiod-renderer	A portable GPL 3D software renderer that supports OpenMP and Intel Threading Building Blocks with many different rendering modes. This version does not use OpenGL but is entirely CPU/software based.		multi	30s x 3	9		./ttsiod-renderer
vpxenc	This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP8/WebM format.		four	70s x 6	5		./vpxenc
x264	This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file.	Diagnosis	multi	20s x 5	11		./x264
xsbench	XSBench is a mini-app representing a key computational kernel of the Monte Carlo neutronics application OpenMC.		multi	15s x 3	9		./xsbench
y-cruncher	Y-Cruncher is a multi-threaded Pi benchmark.		multi	60s x 3	20		./y-cruncher
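The osbench row notes two problems: fork calls that fail, and a process tree with a loop that made wspy hang. Below is a minimal sketch of how a tree builder could tolerate both. This is not wspy's code; the event format (parent pid paired with the value fork returned to the parent) and the function names are hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict

def build_tree(fork_events):
    """Map parent pid -> list of child pids from (parent, fork_result) pairs.

    fork_result mimics the return value seen by the parent: a positive
    child pid on success, -1 on failure (e.g. EAGAIN). Failed forks are
    skipped so they never become nodes in the tree.
    """
    children = defaultdict(list)
    for parent, result in fork_events:
        if result <= 0 or result == parent:
            continue  # failed fork, or a bogus self-reference
        children[parent].append(result)
    return children

def count_descendants(children, root):
    """Count descendants of root iteratively; the visited set guarantees
    termination even if the recorded tree contains a loop."""
    seen = {root}
    stack = [root]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return len(seen) - 1

# Hypothetical events: one failed fork, and a last edge that forms a loop
events = [(100, 101), (100, -1), (101, 102), (102, 100)]
tree = build_tree(events)
print(count_descendants(tree, 100))  # 2
```

The visited set only keeps the walk from hanging; a loop in the recorded tree still means the bookkeeping went wrong somewhere and is worth logging.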
