Performance analysis, tools and experiments

An eclectic collection

Finding performance counters for memory traffic on Haswell

Posted on 2018-04-05 by mev (updated 2018-04-13)

I have found the counters necessary for wspy to get memory reads/writes.

It wasn’t completely straightforward, so this documents the steps I took.

Posted in experiments | Tagged memory, performance counters, wspy | 1 Reply

Phoronix analysis – Benchmarks used in April 4th article

Posted on 2018-04-05 by mev (updated 2018-04-18)

An article posted on Phoronix compared several Linux servers, including POWER9, Intel and AMD EPYC systems.

The analysis in that article was minimal, so in this post I dig a bit more into the characteristics of the benchmarks that were actually being compared. I also ran the tests on my i7-4770 system to get a reference point. I went through each benchmark to do the next level of diagnosis and characterization; the table below links to those pages.

NOTE: The diagnosis pages are slowly being updated to reflect newer tools and methods.

Benchmark          | Better | i7-4770                | EPYC 7601 | Notes
parboil LBM        | lower  | 194.95                 | 37.39     | Diagnosis
parboil CUTCP      | lower  | 14.40                  | 2.61      | Diagnosis
parboil Stencil    | lower  | 26.81                  | 14.26     | Diagnosis
x264               | higher | 36.15                  | 128.17    | Diagnosis
7-zip              | higher | 20248                  | 79708     | Diagnosis
Timed GCC build    | lower  | 1281.00                | 707.34    | Diagnosis
Timed kernel build | lower  | 156.20                 | 35.66     | Diagnosis
Stockfish          | lower  | 3474                   | 4474      | Diagnosis
Encode-flac        | lower  | 10.93                  | 11.79     | Diagnosis
Encode-mp3         | lower  | 32.74                  | 43.57     | Diagnosis
OpenSSL            | higher | 635.57                 | 4598.47   | Analysis
Pybench            | lower  | 1462                   | 2216      | Diagnosis
PHPbench           | higher | 537847                 | 393659    | Diagnosis
OSbench: threads   | lower  | 9.50                   | 30.71     | Analysis
OSbench: process   | lower  | ERROR fork()->EAGAIN   | 59.61     | N/A
OSbench: memory    | lower  | 82.46                  | 95.14     | Analysis

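
To put the two systems on a common scale, each row can be reduced to a speedup of the EPYC 7601 over the i7-4770, taking the "Better" direction into account. The following is a minimal sketch using three rows copied from the table; the function name `speedup` is only illustrative.

```python
def speedup(baseline, candidate, better):
    """Return how many times faster `candidate` is than `baseline`.

    `better` is "lower" for time-like metrics and "higher" for
    throughput-like metrics, matching the Better column of the table.
    """
    if better == "lower":
        return baseline / candidate
    if better == "higher":
        return candidate / baseline
    raise ValueError(f"unknown direction: {better!r}")


# (benchmark, better, i7-4770, EPYC 7601), copied from the table
rows = [
    ("parboil LBM", "lower", 194.95, 37.39),
    ("x264", "higher", 36.15, 128.17),
    ("Encode-flac", "lower", 10.93, 11.79),
]

for name, better, i7, epyc in rows:
    print(f"{name:12s} EPYC vs i7: {speedup(i7, epyc, better):.2f}x")
```

A value above 1 means the EPYC 7601 was faster on that test; a value below 1, as for Encode-flac, means the i7-4770 was faster.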
Posted in analysis, workloads | Tagged parboil, phoronix

Phoronix test suite, quick run through many tests

Posted on 2018-04-04 by mev (updated 2018-04-13)

I kicked off a quick run through more than 100 Phoronix tests to get a first profile and overall assessment; the results are in the table below. A few items of note:

  • Some of the tests didn’t run, most likely because they didn’t install completely or were missing dependencies that were not discovered until runtime. I can clean these up over time.
  • osbench created a situation where the process tree in wspy had a loop, and hence wspy hung. This needs further debugging to make the tool more robust.
  • The hint benchmark hung in the user code of the INT program; this needs diagnosis.
  • A first-level diagnosis of the number of processes and overall CPU time gave a good idea of which tests are single- vs. multi-threaded and hence how to bind them further. In addition, some of the multi-threaded tests were very symmetric, while others ran more haphazardly across multiple cores.
  • Some of the tests have extremely short runtimes.
  • I used “batch-run” to avoid being prompted for a test name, unlike the default-run. However, this means every possible combination of options was run, and in a few cases (fio, pgbench) the combinatorics can stretch for days.
  • Otherwise this is a rough-cut filter, but it is useful as a first screen of the tests as well as for testing the wspy tool. The table below is also linked from the “Workloads” menu item and can be updated as I learn more about the tests.

    Phoronix Overview

    Test | Phoronix Summary | Diagnosis | Single vs. Multi-Threaded | Runtime | # processes | Notes | Root
    aobench | AOBench is a lightweight ambient occlusion renderer, written in C. The test profile is using a size of 2048 x 2048. | - | single | 42s x 7 | 2 | - | ./aobench
    apache | This is a test of ab, which is the Apache benchmark program. This test profile measures how many requests per second a given system can sustain when carrying out 1,000,000 requests with 100 requests being carried out concurrently. | - | multi | 40s x 3 | 118 | Heavier use of system time than user time. | httpd
    asmfish | This is a test of asmFish, an advanced chess benchmark written in Assembly. | - | multi | 240s x 3 | 11 | - | ./asmfish
    blake2 | This is a benchmark of BLAKE2 using the blake2s binary. BLAKE2 is a high-performance crypto alternative to MD5 and SHA-2/3. | - | single | 2s x 3 | 2 | - | ./blake2
    blender | Blender is an open-source 3D creation software project. This test is of Blender's Cycles benchmark with various sample files. GPU computing via OpenCL or CUDA is supported. | - | multiple cores, but not symmetric and perhaps not all | 6 hours | 27 | - | /usr/lib/php/sessionclean
    blogbench | BlogBench is designed to replicate the load of a real-world busy file server by stressing the file-system with multiple threads of random reads, writes, and rewrites. The behavior is mimicked of that of a blog by creating blogs with content and pictures, modifying blog posts, adding comments to these blogs, and then reading the content of the blogs. All of these blogs generated are created locally with fake content and pictures. | - | multi | 300s x 3 | 114 | 90% time is system, 10% user time. | ./blogbench
    bork | Bork is a small, cross-platform file encryption utility. It is written in Java and designed to be included along with the files it encrypts for long-term storage. This test measures the amount of time it takes to encrypt a sample file. | - | single | 10s x 6 | 20 | Runs on more than one core, but overall utilization dominated by single cores. | /usr/bin/java
    botan | Botan is a cross-platform open-source C++ crypto library that supports most all publicly known cryptographic algorithms. | - | single | 25s x 3 | 2 | - | ./botan
    build-apache | This test times how long it takes to build the Apache HTTP Server. | - | multi | 30s x 3 | 12052 | Large numbers of very small processes. | /bin/bash
    build-boost-interprocess | This test times how long it takes to build Boost Interprocess examples. | - | - | - | - | Error "-std=c++11 not found". Potentially need to pass in $CXX environment variable? Needs investigation. | -
    build-eigen | This test times how long it takes to build all Eigen examples. | - | - | - | - | Build error, potentially missing $CXX variable. Needs investigation. | -
    build-firefox | This test times how long it takes to build the Firefox Web Browser. | - | - | - | - | Exit with non-zero exit status. Firefox directory not present. Needs investigation. | -
    build-gcc | This test times how long it takes to build the GNU Compiler Collection (GCC). | Diagnosis | multi | 22m x 3 | 1840 | - | /bin/bash
    build-imagemagick | This test times how long it takes to build ImageMagick. | - | multi | 70s x 3 | 9479 | - | /bin/bash
    build-linux-kernel | This test times how long it takes to build the Linux kernel. | Diagnosis | multi | 180s x 3 | 2585 | - | /bin/bash
    build-llvm | This test times how long it takes to build the LLVM compiler stack. | - | multi | 15m x 3 | 1491 | - | /bin/bash
    build-mplayer | This test times how long it takes to build the MPlayer media player program. | - | - | - | - | Error during build, needs investigation. | -
    build-php | This test times how long it takes to build PHP 5 with the Zend engine. | - | multi | 90s x 3 | 9106 | - | /bin/bash
    build-webkitfltk | This test times how long it takes to build the WebKitFLTK web library. | - | - | - | - | Error during build, needs investigation. | -
    bullet | This is a benchmark of the Bullet Physics Engine. | - | single | <5s x 7 | 2 | - | ./bullet
    byte | This is a test of BYTE. | - | single | 17m | various up to 98 | - | ./byte
    c-ray | This is a test of C-Ray, a simple raytracer designed to test the floating-point CPU performance. This test is multi-threaded (16 threads per core), will shoot 8 rays per pixel for anti-aliasing, and will generate a 1600 x 1200 image. | Analysis | multi | 26s x 3 | 130 | - | ./c-ray
    cachebench | This is a performance test of CacheBench, which is part of LLCbench. CacheBench is designed to test the memory and cache bandwidth performance. | - | single | 125s x 3 | 3 | - | ./cachebench
    clomp | CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. This particular test profile configuration is currently set to look at the OpenMP static schedule speed-up across all available CPU cores using the recommended test configuration. | - | multi | 6s x 50 | - | - | ./clomp
    compress-7zip | This is a test of 7-Zip using p7zip with its integrated benchmark feature or upstream 7-Zip for the Windows x64 build. | Diagnosis | multi | 40s x 3 | 82 | - | ./compress-7zip
    compress-gzip | This test measures the time needed to archive/compress two copies of the Linux 4.13 kernel source tree using Gzip compression. | - | single | 40s x 3 | 5 | Runs on selective cores. | ./compress-gzip
    compress-lzma | This test measures the time needed to compress a file using LZMA compression. | - | single | 280s x 3 | 2 | - | ./compress-lzma
    compress-pbzip2 | This test measures the time needed to compress a file (a .tar package of the Linux kernel source code) using BZIP2 compression. | - | multi | 10s x 6 | 13 | - | ./compress-pbzip2
    cpuminer-opt | Cpuminer benchmark. | - | multi | 30s x 3 | 12 | - | ./cpuminer
    crafty | This is a performance test of Crafty, an advanced open-source chess engine. | - | single | 30s x 3 | 3 | - | ./crafty-benchmark
    cyclictest | Cyclictest is a high-resolution test program for measuring the Linux kernel latencies. | - | single | 50s x 3 | 3 | Not CPU-bound. | ./cyclictest
    cython-bench | Stress benchmark tests to measure time consumed by cython code. | - | single | 30s x 3 | 2 | - | ./cython-bench
    dcraw | This test times how long it takes to convert several high-resolution RAW NEF image files to PPM image format using dcraw. | - | single | 50s x 3 | 2 | - | ./dcraw
    dolfyn | Dolfyn is a Computational Fluid Dynamics (CFD) code of modern numerical simulation techniques. The Dolfyn test profile measures the execution time of the bundled computational fluid dynamics demos that are bundled with Dolfyn. | - | - | - | - | No result, needs further investigation. | -
    ebizzy | This is a test of ebizzy, a program to generate workloads resembling web server workloads. | - | multi | 20s x 6 | 18 | - | ./ebizzy
    encode-flac | This test times how long it takes to encode a sample WAV file to FLAC format five times. | Diagnosis | single | 12s x 5 | 6 | - | ./encode-flac
    encode-mp3 | LAME is an MP3 encoder licensed under the LGPL. This test measures the time required to encode a WAV file to MP3 format. | Diagnosis | single | 35s x 3 | 2 | - | ./lame
    encode-ogg | This test times how long it takes to encode a sample WAV file to Ogg format using vorbis-tools, libvorbis, and libogg. | - | single | 7s x 3 | 2 | - | ./encode-ogg
    encode-opus | Opus is an open audio codec. Opus is a lossy audio compression format designed primarily for interactive real-time applications over the Internet. This test uses Opus-Tools and measures the time required to encode a WAV file to Opus and then to decode the generated Opus file. | - | single | 9s x 5 | 4 | - | ./encode-opus
    encode-wavpack | This test times how long it takes to encode a sample WAV file to WavPack format. | - | single | 8s x 5 | 2 | - | ./encode-wavpack
    espeak | This test times how long it takes the eSpeak speech synthesizer to read Project Gutenberg's The Outline of Science and output to a WAV file. | - | single | 40s x 6 | 3 | - | ./espeak
    etqw-demo | This test calculates the average frame-rate within the demo for the game Enemy Territory: Quake Wars demo game. | - | multi (heavy on one CPU) | 300s x 9 | 11 | Initial burst of computation; longer run across threads. Heavy on one CPU. | ./etqw
    fahbench | FAHBench is a Folding@Home benchmark on the GPU. | - | - | - | - | No result, needs further investigation. | -
    ffmpeg | This test uses FFmpeg for testing the system's audio/video encoding performance. | - | multi | 10s x 4 | 33 | - | ./ffmpeg
    ffte | FFTE is a package by Daisuke Takahashi to compute Discrete Fourier Transforms of 1-, 2- and 3- dimensional sequences of length (2^p)*(3^q)*(5^r). | - | single* | 5s x 6 | 10 | Processes started on all CPUs, but all but one are idle. | ./ffte
    fftw | FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions. | - | single | 26m, varying time depending on size | 2 | 32 possible options. Interesting to see, as performance drops dramatically at a particular size. Cache effects? | /bin/sh
    fhourstones | This integer benchmark solves positions in the game of Connect-4, as played on a vertical 7x6 board. By default, it uses a 64Mb transposition table with the twobig replacement strategy. Positions are represented as 64-bit bitboards, and the hash function is computed using a single 64-bit modulo operation, giving 64-bit machines a slight edge. The alpha-beta searcher sorts moves dynamically based on the history heuristic. | - | single | 15s x 3 | 2 | - | ./fhourstones-benchmark
    fio | Fio is an advanced disk benchmark that depends upon the kernel's AIO access library. | - | single, several threads | Large time due to 2048 combinations | 12 | 2048 combinations, batch run tries them all. Mostly system time. | ./fio-run
    gcrypt | This is a benchmark of libgcrypt's integrated benchmark with the CAMELLIA256-ECB cipher and 100 repetitions. | - | - | - | 3 | Compilation errors during installation. | -
    git | This test measures the time needed to carry out some sample Git operations on an example, static repository that happens to be a copy of the GNOME GTK tool-kit repository. | - | multi | 6s x 3 | 58 | - | ./git
    glibc-bench | The GNU C Library project provides the core libraries for the GNU system and GNU/Linux systems, as well as many other systems that use Linux as the kernel. These libraries provide critical APIs including ISO C11, POSIX.1-2008, BSD, OS-specific APIs and more. | - | single | 3s x 15 | 2 | Warnings that test ended quickly. | ./glibc-bench
    gnupg | This test times how long it takes to encrypt a file using GnuPG. | - | single | 12s x 3 | 2 | - | ./gnupg
    go-benchmark | Benchmark for monitoring real time performance of the Go implementation for HTTP, JSON and garbage testing per iteration. | - | multi | 12s x 3 - three workloads | 66 | Three workloads with varying profiles. | ./go-benchmark
    gpu-residency | This test measures the GPU residency of a given state for a 60 second interval. | - | - | - | - | Test quit with non-zero status, needs investigation. | -
    graphics-magick | This is a test of GraphicsMagick with its OpenMP implementation that performs various imaging tests to stress the system's CPU. | - | multi | 60s x 3 | 9 | Workloads uneven across CPUs. | ./graphics-magick
    hackbench | This is a benchmark of Hackbench, a test of the Linux kernel scheduler. | - | multi | 30m | up to 1008 | 12 options, combinations of threads and processes; 90% system time. | -
    himeno | The Himeno benchmark is a linear solver of pressure Poisson using a point-Jacobi method. | - | single | 60s x 3 | 2 | - | ./himeno
    hint | This test runs the U.S. Department of Energy's Ames Laboratory Hierarchical INTegration (HINT) benchmark. | - | single | 25m | 2 | Third test hung; problem in tools or test? Needs investigation. | ./hint
    hmmer | This test searches through the Pfam database of profile hidden markov models. The search finds the domain structure of Drosophila Sevenless protein. | - | multi | 10s x 3 | 11 | - | ./hmmer
    hpcg | HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Labs focused for super-computer testing with modern real-world workloads compared to HPCC. | - | multi | 55s x 3 | 12 | All ~30% busy; investigate idle times. | ./hpcg
    interbench | Interbench is an interactivity benchmark written by Con Kolivas. Interbench is primarily intended to test out the system kernel and its CPU scheduler while running a simulated test with a given simulated load in the background. Each benchmark / load is run for 60 seconds per test. | - | multi | 4h | 4 | 81 combinations; many with no result. | ./interbench
    java-jmh | This test runs the stock benchmark of the Java JMH benchmark via Maven. | - | multi | 7m | 355 | Almost 100% CPU. | ./java-jmh
    java-scimark2 | This test runs the Java version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This benchmark is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. | - | single | 2m | 21 | 6 tests. | ./java-scimark2
    john-the-ripper | This is a benchmark of John The Ripper, which is a password cracker. | - | multi | (20s+40s+20s) x 3 | 9 | - | ./john-the-ripper
    lammps | LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    llvm-test-suite | This test times how long it takes to run the LLVM Test Suite. | - | single | 220s x 3 | 1561 | - | ./llvm-test-suite
    luajit | This test profile is a collection of Lua scripts/benchmarks run against a locally-built copy of LuaJIT upstream. | - | single | 100s | 2 | Six tests. | ./luajit
    luxmark | LuxMark is a multi-platform OpenGL benchmark using LuxRender. LuxMark supports targeting different OpenCL devices and has multiple scenes available for rendering. LuxMark is a fully open-source OpenCL program with real-world rendering examples. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    lzbench | lzbench is an in-memory benchmark of various compressors. The file used for compression is a Linux kernel source tree tarball. | - | single | 6m | 2 | - | ./lzbench
    mafft | This test performs an alignment of 100 pyruvate decarboxylase sequences. | - | multi | 7s x 6 | 143 | Many short little processes. | ./mafft
    mencoder | This test uses mplayer's mencoder utility and the libavcodec family for testing the system's audio/video encoding performance. | - | single | 20s x 3 | 2 | - | ./mencoder
    minion | Minion is an open-source constraint solver that is designed to be very scalable. This test profile uses Minion's integrated benchmarking problems to solve. | - | single | 15m | 2 | Three tests. | ./minion
    mrbayes | This test performs a bayesian analysis of a set of primate genome sequences in order to estimate their phylogeny. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    multichase | This is a benchmark of Google's multichase pointer chaser program. | - | single & multi | 100s | 3 | Five tests. | ./multichase
    n-queens | This is a test of the OpenMP version of a test that solves the N-queens problem. The board problem size is 18. | - | multi | 35s x 3 | 9 | Almost 100% busy. | ./n-queens
    nero2d | This is a test of Nero2D, which is a two-dimensional TM/TE solver for Open FMM. Open FMM is a free collection of electromagnetic software for scattering at very large objects. This test profile times how long it takes to solve one of the included 2D examples. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    network-loopback | This test measures the loopback network adapter performance using a micro-benchmark to measure the TCP performance. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    nginx | This is a test of ab, which is the Apache Benchmark program running against nginx. This test profile measures how many requests per second a given system can sustain when carrying out 2,000,000 requests with 500 requests being carried out concurrently. | - | single | 60s x 3 | 2 | Heavier on system time than user time. | ./nginx
    noise-level | This test measures background activity. | - | single | 60s | 14 | Runs sleep. | ./noise-level
    numpy | This is a test to obtain the general Numpy performance. | - | single | 45m | 38 | - | ./numpy
    openssl | OpenSSL is an open-source toolkit that implements SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. This test measures the RSA 4096-bit performance of OpenSSL. | Analysis | multi | 20s x 3 | 9 | - | ./openssl
    opm-git | This is a test of a DUNE (Distributed and Unified Numerics Environment) module called OPM Benchmarks from the Open Porous Media project. Open Porous Media is a set of open-source tools concerning simulation of flow and transport of fluids in porous media. This test profile builds OPM and its dependencies from upstream Git. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    osbench | OSBench is a collection of micro-benchmarks for measuring operating system primitives like time to create threads/processes, launching programs, creating files, and memory allocation. | Diagnosis | - | - | - | wspy hangs because an incorrect tree has been built. Further debugging shows "fork()" is failing with EAGAIN errno. This also causes the test to fail when not run under wspy; two fixes required - (1) look at conditions described in fork(2) system call to avoid the failure and (2) fix wspy to properly handle fork calls that might fail. | -
    padman | World of Padman is an open-source game using the ioquake3 engine. What makes this game different from other first-person shooters is that it's a cartoon-style action game. | - | multi (heavy on one CPU) | 120s x 9 | 7 | Game. | ./padman
    parboil | The Parboil Benchmarks from the IMPACT Research Group at University of Illinois are a set of throughput computing applications for looking at computing architecture and compilers. Parboil test-cases support OpenMP, OpenCL, and CUDA multi-processing environments. However, at this time the test profile is just making use of the OpenMP and OpenCL test workloads. | Diagnosis | multi | 25m | 13 | Ten tests, six didn't run correctly. Missing OpenCL. | ./parboil
    perl-benchmark | Perl benchmark suite that can be used to compare the relative speed of different versions of perl. | - | multi | 80s, 67s, 70s, 28s, 66s, 66s, 70s | 22, 21264, 21407, 8639, 21492, 21521, 21834 | More than 100,000 processes created; system time exceeds user time. | ./perl-benchmark
    pgbench | This is a simple benchmark of PostgreSQL using pgbench. | - | - | - | - | Test must be run as non-root; extremely long runtime. | -
    phpbench | PHPBench is a benchmark suite for PHP. It performs a large number of simple tests in order to bench various aspects of the PHP interpreter. PHPBench can be used to compare hardware, operating systems, PHP versions, PHP accelerators and caches, compiler options, etc. The number of iterations used is 1,000,000. | Diagnosis | single | 20s x 3 | 2 | - | ./phpbench
    polybench-c | PolyBench-C is a C-language polyhedral benchmark suite made at the Ohio State University. | - | single | 30s | 2 | Three workloads, last longer than first two. | ./polybench
    postmark | This is a test of NetApp's PostMark benchmark designed to simulate small-file testing similar to the tasks endured by web and mail servers. This test profile will set PostMark to perform 25,000 transactions with 500 files simultaneously with the file sizes ranging between 5 and 512 kilobytes. | - | single | 40s x 3 | 2 | Mostly system time. | ./postmark
    povray | This is a test of POV-Ray, the Persistence of Vision Raytracer. POV-Ray is used to create 3D graphics using ray-tracing. | - | multi | 135s x 3 | 29 | - | ./povray
    primesieve | Primesieve generates prime numbers using a highly optimized sieve of Eratosthenes implementation. Primesieve benchmarks the CPU's L1/L2 cache performance. | - | multi | 85s x 3 | 9 | Almost 100% user. | ./primesieve
    psstop | Shows the total number of processes running and the memory they consume. | - | single | <1s | 5 | Extremely short duration. | ./psstop
    pybench | This test profile reports the total time of the different average timed test results from PyBench. PyBench reports average test times for different functions such as BuiltinFunctionCalls and NestedForLoops, with this total result providing a rough estimate as to Python's average performance on a given system. This test profile runs PyBench each time for 20 rounds. | Diagnosis | single | 30s x 3 | 5 | - | ./pybench
    ramspeed | This benchmark tests the system memory (RAM) performance. | - | double | 120s x 10 | 3 | Naming suggests variations of double-threaded stream. | ./ramspeed
    rbenchmark | This test is a quick-running survey of general R performance. | - | single | 0.5s x 3 | 11 | - | ./rbenchmark
    redis | Redis is an open-source data structure server. | - | single* (multi-core but most computation on single core) | 11s x 15 | 4 | Short bursts of activity, mostly idle. | ./redis
    rodinia | Rodinia is a suite focused upon accelerating compute-intensive applications with accelerators. CUDA, OpenMP, and OpenCL parallel models are supported by the included applications. This profile utilizes the OpenCL and OpenMP test binaries at the moment. | - | multiple | 18m | 9 | Only three of nine benchmarks ran out of the box. | ./rodinia
    sample-program | A simple C++ program that calculates Pi to 8,765,4321 digits using the Leibniz formula. This test can be used for showcasing how to write a basic test profile. | - | single | 3s x 5 | 2 | - | ./sample-program
    schbench | This is a benchmark of Schbench, a Linux kernel scheduler benchmark developed by Facebook. | - | multiple | 90m | 13 | 42 different subtests. | ./schbench
    scimark2 | This test runs the ANSI C version of SciMark 2.0, which is a benchmark for scientific and numerical computing developed by programmers at the National Institute of Standards and Technology. This test is made up of Fast Fourier Transform, Jacobi Successive Over-relaxation, Monte Carlo, Sparse Matrix Multiply, and dense LU matrix factorization benchmarks. | - | single | 25s x 3 | 2 | - | ./scimark2
    serial-loopback | This test will do a simple write/read test on all detected serial interfaces. For this test to work, the relevant serial ports should have a serial loopback plug or have otherwise wired the appropriate pins. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    smallpt | Smallpt is a C++ global illumination renderer written in less than 100 lines of code. Global illumination is done via unbiased Monte Carlo path tracing and there is multi-threading support via the OpenMP library. | - | multi | 80s x 3 | 9 | - | ./smallpt
    stockfish | This is a test of Stockfish, an advanced C++11 chess benchmark that can scale up to 128 CPU cores. | Diagnosis | single* (multi-core but most computation on single core) | 4s x 3 | 4 | - | ./stockfish
    stream | This benchmark tests the system memory (RAM) performance. | Diagnosis | multi | 50s x 5 | 9 | - | ./stream
    sudokut | This is a test of Sudokut, which is a Sudoku puzzle solver written in Tcl. This test measures how long it takes to solve 100 Sudoku puzzles. | - | single | 12s x 3 | 101 | Runs same process 100 times. | ./sudokut
    sunflow | This test runs benchmarks of the Sunflow Rendering System. The Sunflow Rendering System is an open-source render engine for photo-realistic image synthesis with a ray-tracing core. | - | multi | 30s x 3 | 182 | - | ./sunflow-benchmark
    system-decompress-bzip2 | This test measures the time to decompress a Linux kernel tarball using BZIP2. | - | single | 10s x 3 | 2 | - | ./system-decompress-bzip2
    system-decompress-xz | This test measures the time to decompress a Linux kernel tarball using XZ. | - | single | 4s x 3 | 2 | - | ./system-decompress-xz
    system-libxml2 | This test measures the time to parse a random XML file with libxml2 via xmllint using the streaming API. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    systemd-boot-kernel | This test uses systemd-analyze to report the kernel boot time. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    systemd-boot-total | This test uses systemd-analyze to report the entire boot time. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    systemd-boot-userspace | This test uses systemd-analyze to report the userspace boot time. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    systester | Time how long it takes to calculate pi to varying lengths. | - | - | - | - | Test quit with non-zero exit status, needs investigation. | -
    t-test1 | This is a test of t-test1 for basic memory allocator benchmarks. Note this test profile is currently very basic and the overall time does include the warmup time of the custom t-test1 compilation. Improvements welcome. | - | single | 30s | 4008 | Two workloads. Many processes, but seems to be mostly limited sequentially. | -
    tachyon | This is a test of the threaded Tachyon, a parallel ray-tracing system. | - | multi | 15s x 3 | 9 | - | ./tachyon-benchmark
    tensorflow | This is a benchmark of the Tensorflow deep learning framework using the CIFAR10 data set. | - | multi | 90s x 3 | 50 | Python test. | ./tensorflow
    tjbench | tjbench is a JPEG decompression/compression benchmark part of libjpeg-turbo. | - | single | 8s x 3 | 15 | - | ./tjbench
    tscp | This is a performance test of TSCP, Tom Kerrigan's Simple Chess Program, which has a built-in performance benchmark. | - | single | 2s x 5 | 2 | - | ./tscp
    ttsiod-renderer | A portable GPL 3D software renderer that supports OpenMP and Intel Threading Building Blocks with many different rendering modes. This version does not use OpenGL but is entirely CPU/software based. | - | multi | 30s x 3 | 9 | - | ./ttsiod-renderer
    vpxenc | This is a standard video encoding performance test of Google's libvpx library and the vpxenc command for the VP8/WebM format. | - | four | 70s x 6 | 5 | - | ./vpxenc
    x264 | This is a simple test of the x264 encoder run on the CPU (OpenCL support disabled) with a sample video file. | Diagnosis | multi | 20s x 5 | 11 | - | ./x264
    xsbench | XSBench is a mini-app representing a key computational kernel of the Monte Carlo neutronics application OpenMC. | - | multi | 15s x 3 | 9 | - | ./xsbench
    y-cruncher | Y-Cruncher is a multi-threaded Pi benchmark. | - | multi | 60s x 3 | 20 | - | ./y-cruncher
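
The osbench hang noted above came from a process tree that contained a loop. One way to make a tree walk robust against that kind of corrupt input is to track the PIDs already visited. This is a minimal sketch and not wspy's actual code; the `children` mapping, the function name `walk_tree` and the example PIDs are all hypothetical.

```python
def walk_tree(children, root):
    """Depth-first walk of a {pid: [child pids]} mapping.

    Returns (pids in visit order, edges that revisit a pid).
    The visited set guarantees termination even if the mapping
    contains a loop.
    """
    order, loops = [], []
    visited = set()
    stack = [(None, root)]
    while stack:
        parent, pid = stack.pop()
        if pid in visited:
            # In a real tree every pid has one parent, so a revisit
            # means the input is corrupt; record it and move on.
            loops.append((parent, pid))
            continue
        visited.add(pid)
        order.append(pid)
        # reversed() keeps children in their listed order on the stack
        for child in reversed(children.get(pid, [])):
            stack.append((pid, child))
    return order, loops


# Hypothetical corrupt input: pid 300 lists its own ancestor 100 as a child.
children = {100: [200, 300], 300: [400, 100]}
order, loops = walk_tree(children, 100)
print(order)   # pids reached from the root
print(loops)   # edges that would have caused an endless walk
```

Reporting the offending edge instead of silently dropping it makes it easier to trace the corruption back to the event that produced it, such as a fork that did not succeed.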

Posted in experiments, workloads | Tagged phoronix

wspy – diskstats, set-cpumask

Posted on 2018-04-03 by mev

I have now enhanced wspy with a --diskstats option. This option samples the /sys/block/*/stat files to save disk read and write statistics. The same information is also reported in /proc/diskstats.
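
Each /sys/block/<dev>/stat file is a single line of whitespace-separated counters. The sketch below shows one way to sample them from Python. It is not wspy's implementation; the function names and the sample line are made up for illustration, and the field order follows the kernel's block-layer stat documentation.

```python
import glob
import os

# First eight fields of /sys/block/<dev>/stat; newer kernels append
# more fields (in-flight, io_ticks, discards, ...), ignored here.
FIELDS = ("read_ios", "read_merges", "read_sectors", "read_ticks",
          "write_ios", "write_merges", "write_sectors", "write_ticks")


def parse_block_stat(text):
    """Parse the contents of one stat file into a dict of ints."""
    values = [int(v) for v in text.split()]
    return dict(zip(FIELDS, values))


def sample_all(pattern="/sys/block/*/stat"):
    """Return {device name: parsed stats} for every block device found."""
    result = {}
    for path in glob.glob(pattern):
        dev = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[dev] = parse_block_stat(f.read())
    return result


# Made-up sample line, so the example also runs on non-Linux systems.
sample = "  4507  1202  385046  2924  3718  5403  212808  5608  0  4828  8532"
stats = parse_block_stat(sample)
# Sector counts are in 512-byte units regardless of the device sector size.
print(stats["read_sectors"] * 512, "bytes read")
print(stats["write_sectors"] * 512, "bytes written")
```

Calling `sample_all()` once per second and subtracting successive samples gives per-interval read and write rates.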

Another option added at the same time is --set-cpumask, which sets the mask of processors on which the application can run; essentially it is a “pin” option that restricts the workload to a particular core or set of cores. I’ve used syntax similar to the taskset --cpu-list option.
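
For readers who want to try the same kind of pinning outside wspy, here is a minimal sketch that parses a taskset-style CPU list and applies it with `os.sched_setaffinity` (Linux only). The helper names are hypothetical, and the stride form that taskset also accepts (such as `0-10:2`) is not handled.

```python
import os


def parse_cpulist(spec):
    """Parse a CPU list such as "0-3,6" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_current_process(spec):
    """Restrict the calling process to the listed CPUs.

    Children created afterwards inherit the affinity mask, so calling
    this before launching a workload pins the workload as well.
    """
    os.sched_setaffinity(0, parse_cpulist(spec))  # pid 0 = this process


print(sorted(parse_cpulist("0-3,6")))
```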

Added but not yet implemented are --memstats (to pull information from /proc/meminfo) and --netstats (to pull information from /proc/net/dev). The general idea behind all four of these options is first-cut sampling of an application's CPU, disk, network and memory profile for rough triage.
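
Since these two options are not implemented yet, the following is only a sketch of what reading the two files could look like, not wspy code. The function names and sample text are invented for the example; the column positions follow the layout of /proc/net/dev, where eight receive columns precede the transmit columns.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {name: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def parse_net_dev(text):
    """Parse /proc/net/dev text into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        iface, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 9:
            # column 0 is received bytes, column 8 is transmitted bytes
            counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


# Made-up samples in the format of the two files.
meminfo_text = "MemTotal:       16318412 kB\nMemFree:         1234567 kB\nHugePages_Total:       0\n"
netdev_text = (
    "Inter-|   Receive                                                |  Transmit\n"
    " face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed\n"
    "    lo: 1000 10 0 0 0 0 0 0 2000 20 0 0 0 0 0 0\n"
)

mem = parse_meminfo(meminfo_text)
net = parse_net_dev(netdev_text)
print(mem["MemFree"], "kB free")
print(net["lo"])
```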

Posted in tools | Tagged wspy

wspy – performance counters

Posted on 2018-03-27 by mev

I have enhanced wspy to read performance counters. It now has three different instrumentation methods:

  • Reading ftrace logfiles from the kernel. Currently wspy reads the scheduler events for fork/exec/exit to construct process trees.
  • Reading /proc/stat on a periodic basis (once per second) to monitor CPU user, system and idle times. A similar method could be extended to read disk (/proc/diskstats) and network (/proc/net/dev) statistics.
  • Reading performance counters. Each core has its own counters, and for now wspy reads a slightly different set of counters on each of the first four cores and then repeats those sets round-robin for higher core numbers.
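
The /proc/stat method in the second item boils down to subtracting two samples of the same `cpu` line and dividing by the total number of ticks in the interval. A minimal sketch, assuming the column order documented in proc(5); the function names and sample lines are hypothetical and this is not wspy's code.

```python
def parse_cpu_line(line):
    """Split a /proc/stat "cpu..." line into (label, list of tick counts)."""
    fields = line.split()
    return fields[0], [int(v) for v in fields[1:]]


def cpu_percentages(before, after):
    """Return (user%, system%, idle%) between two samples of one cpu line.

    Columns are user, nice, system, idle, iowait, irq, softirq, steal,
    guest, guest_nice. Only the first eight are summed because guest
    time is already included in user and nice.
    """
    delta = [b - a for a, b in zip(before, after)][:8]
    total = sum(delta)
    if total == 0:
        return 0.0, 0.0, 0.0
    user = delta[0] + delta[1]      # user + nice
    system = delta[2]
    idle = delta[3]
    return (100.0 * user / total,
            100.0 * system / total,
            100.0 * idle / total)


# Two made-up samples of the same CPU, taken one second apart.
_, t0 = parse_cpu_line("cpu0 1000 0 500 8000 0 0 0 0 0 0")
_, t1 = parse_cpu_line("cpu0 1060 0 520 8020 0 0 0 0 0 0")
print(cpu_percentages(t0, t1))
```

Working with deltas rather than raw values matters because the counters in /proc/stat are cumulative since boot.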

Some future directions are both to extend these subsystems to additional measurements and to make it all more configurable, perhaps with a set of config files and command options.

Below are examples of gnuplot graphs coming from wspy:

User and system time for all CPUs (total)

User and system time for all CPUs (separate)

User and system time for one CPU

Instructions per cycle for all CPUs

Instructions per cycle for one CPU

Branch prediction miss rate on one CPU

Branch rates on one CPU

Last level cache misses from one CPU

L1D cache miss ratio from one CPU

L1D cache activity

Posted in tools | Tagged performance counters, wspy

wspy – gnuplot and zip archive

Posted on 2018-03-23 by mev

Two improvements have been added to the wspy program:

  • Added a -z option that creates a zip archive with data files related to the wspy run
  • Added a script that calls gnuplot to plot CPU usage over time.

Below is a short listing showing contents of the zip archive as well as running of the gnuplot script.

mev@popayan:~/wspy-exp$ unzip -l c-ray
Archive:  c-ray.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     3417  2018-03-23 19:25   allcpu.csv
     2399  2018-03-23 19:25   cpu-gnuplot.sh
     3417  2018-03-23 19:25   cpu0.csv
     3417  2018-03-23 19:25   cpu1.csv
     3417  2018-03-23 19:25   cpu2.csv
     3417  2018-03-23 19:25   cpu3.csv
     3417  2018-03-23 19:25   cpu4.csv
     3417  2018-03-23 19:25   cpu5.csv
     3417  2018-03-23 19:25   cpu6.csv
     3417  2018-03-23 19:25   cpu7.csv
    25725  2018-03-23 19:25   processtree.txt
---------                     -------
    58877                     11 files
mev@popayan:~/wspy-exp$ unzip c-ray.zip
Archive:  c-ray.zip
  inflating: allcpu.csv              
  inflating: cpu-gnuplot.sh          
  inflating: cpu0.csv                
  inflating: cpu1.csv                
  inflating: cpu2.csv                
  inflating: cpu3.csv                
  inflating: cpu4.csv                
  inflating: cpu5.csv                
  inflating: cpu6.csv                
  inflating: cpu7.csv                
  inflating: processtree.txt         
mev@popayan:~/wspy-exp$ ./cpu-gnuplot.sh 
mev@popayan:~/wspy-exp$ ls
allcpu.csv  cpu1.csv  cpu3.csv  cpu5.csv  cpu7.csv        c-ray.zip
allcpu.png  cpu1.png  cpu3.png  cpu5.png  cpu7.png        processtree.txt
cpu0.csv    cpu2.csv  cpu4.csv  cpu6.csv  cpu-gnuplot.sh
cpu0.png    cpu2.png  cpu4.png  cpu6.png  cpulist.png

One CSV file is created for each CPU showing the /proc/stat line each second. A text file is created for the process tree as well.
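The exact column layout of these CSV files is not shown here, so the following is only a sketch of the idea: turn the per-CPU lines of a /proc/stat snapshot into comma-separated rows tagged with a sample number. The function name per_cpu_csv and the column order are assumptions.

```shell
#!/bin/bash
# Read /proc/stat-format text on stdin and print one CSV row per cpuN line:
#   <sample>,<cpu label>,<user>,<nice>,<system>,<idle>,...
# Hypothetical layout; the real wspy CSV columns may differ.
per_cpu_csv() {
    awk -v ts="$1" '/^cpu[0-9]+ / {
        row = ts "," $1
        for (i = 2; i <= NF; i++) row = row "," $i
        print row
    }'
}
```

For example, per_cpu_csv 1 < /proc/stat prints one row for each CPU and skips the aggregate "cpu" line.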

Below are some examples of the PNG files showing the plots for this benchmark.


Posted in tools | Tagged wspy

likwid-perfctr run for phoronix cpu benchmarks

Posted on 2018-03-22 by mev

Kicked off a run with the Phoronix CPU suite using likwid-perfctr. Benchmarks that were single-threaded were pinned to a single CPU, others to all CPUs.

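#!/bin/bash
# Run every benchmark in cpu-benchlist.txt once per likwid performance group.
# tail -n +3 starts at line 3, skipping the first two lines of the group listing.
likwid-perfctr -a | tail -n +3 | awk '{ print $1 }' | while read group
do
    # Make sure the per-group output directory exists before redirecting into it.
    mkdir -p "perfctr-03-15/${group}"
    while read benchmark
    do
        likwid-perfctr -f -c 0-7 -g ${group} \
            --output perfctr-03-15/${group}/cpu-${benchmark}-perfctr.txt \
            phoronix-test-suite batch-run ${benchmark} \
            > perfctr-03-15/${group}/cpu-${benchmark}-output.txt 2>&1
    done < cpu-benchlist.txt
done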

There is still a lot of data to look through, so after the benchmark list I've placed a table with the results files.

Test                  Cores   CPI
pts/padman            multi   1.03
pts/etqw-demo         multi   0.74
pts/john-the-ripper   multi   0.67
pts/ttsiod-renderer   multi   1.16
pts/compress-pbzip2   multi   1.09
pts/compress-7zip     multi   1.20
pts/encode-mp3        single  2.11
pts/encode-flac       single  2.12
pts/x264              multi   0.77
pts/ffmpeg            multi   0.79
pts/openssl           multi   0.60
pts/himeno            single  2.16
pts/apache            multi   2.16
pts/c-ray             multi   0.70
pts/povray            multi   0.77
pts/smallpt           multi   0.84
pts/tachyon           multi   0.97
pts/crafty            single  2.14
pts/tscp              single  2.15
pts/mafft             multi   0.76
pts/stream            multi   18.79
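For reference, CPI is cycles divided by retired instructions, and its reciprocal is instructions per cycle (IPC). A minimal sketch of that arithmetic follows; the function name cpi is hypothetical and the counts are made-up numbers, not measurements from these runs.

```shell
#!/bin/bash
# Cycles per instruction (CPI) and its reciprocal, instructions per cycle (IPC),
# from raw cycle and retired-instruction counts.
cpi() {
    awk -v cycles="$1" -v instructions="$2" 'BEGIN {
        if (cycles == 0 || instructions == 0) { print "counts must be non-zero"; exit 1 }
        printf "CPI=%.2f IPC=%.2f\n", cycles / instructions, instructions / cycles
    }'
}

# Made-up counts chosen for illustration only.
cpi 2160000 1000000    # prints CPI=2.16 IPC=0.46
```

Read this way, a CPI of 18.79 for stream corresponds to an IPC of roughly 0.05.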
BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
padman BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
etqw-demo BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
john-the-ripper BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
ttsiod-renderer BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
compress-pbzip2 BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
compress-7zip BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
encode-mp3 BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
encode-flac BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
x264 BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
ffmpeg BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
openssl BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
himeno BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
apache BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
c-ray BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
povray BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
smallpt BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
tachyon BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
crafty BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
tscp BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
mafft BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
stream BRANCH CACHES CLOCK CYCLE_ACTIVITY DATA ENERGY FALSE_SHARE FLOPS_AVX ICACHE L2 L2CACHE L3 L3CACHE RECOVERY TLB_DATA TLB_INSTR UOPS UOPS_EXEC UOPS_ISSUE UOPS_RETIRE
Posted in experiments | Tagged likwid-perfctr, phoronix, phoronix cpu suite

wspy run for phoronix cpu benchmarks

Posted on 2018-03-15 by mev

As one of the first steps in priming the pump for Phoronix benchmarks, I ran the wspy program on 21 candidate CPU benchmarks. My goal was to start with a rough characterization, e.g. single-threaded vs. multi-threaded or CPU-bound vs. not.

#!/bin/bash
# Run each benchmark listed in cpu-benchlist.txt under wspy.
while read benchmark
do
    /home/mev/wspy/wspy -o cpu-wspy-${benchmark}.txt \
        phoronix-test-suite batch-run ${benchmark} \
        > cpu-${benchmark}.output.txt 2>&1
done < cpu-benchlist.txt

Results are listed below. A few more general comments:

  • Examining the wspy profiles, I break the tests into several groups:
    • graphics programs: padman, etqw-demo; multiple threads, one or two very busy others less so.
    • multi-core programs: john-the-ripper, ttsiod-renderer, compress-pbzip2, compress-7zip, ffmpeg, openssl, c-ray, povray, smallpt, tachyon, stream; all CPUs, close to 100% user time, symmetric operations. How many are small in-cache toys and how many are bigger?
    • multi-core programs, not 100% user time: x264, apache, mafft; what is taking up the other time?
    • single-threaded programs: encode-mp3, encode-flac, himeno, crafty, tscp

    These classifications also give me some additional things to look for when looking further at performance counters.

  • Benchmark scores are listed as a sanity check that the experiments were similar to my previous run. Since the primary goal is rough characterization I haven't done a lot to control the runs, but one can make a few observations about wspy overhead and likely noise: (a) the DES benchmark stands out with a 15% better score when run under instrumentation; (b) overall, 14 scores are better in this run vs. 11 worse, and the range is +15% (des) to -4% (apache), with all but 2 scores (des, himeno) within 5%. As a result, it doesn't look like wspy perturbs these runs much or adds significant overhead.
  • Two of the benchmarks (padman, etqw-demo) report only a single run as part of the cpu suite, but run nine combinations of different graphics resolutions when run by themselves. Still comparable in scores, but also more processes in the wspy output.
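The percentages quoted above can be reproduced from the scores in the table with a small helper. This is a sketch; the function name score_change is made up, and the third argument says whether higher or lower scores are better for that test.

```shell
#!/bin/bash
# Signed improvement of a new score over the original, in percent.
# direction is "higher" or "lower", i.e. which way is better for that test.
score_change() {
    awk -v old="$1" -v new="$2" -v direction="$3" 'BEGIN {
        change = (new - old) / old * 100
        if (direction == "lower") change = -change   # a drop is an improvement
        printf "%+.1f%%\n", change
    }'
}

score_change 20593667 23699000 higher   # john-the-ripper des: +15.1%
score_change 27272.28 26249.09 higher   # apache: -3.8%
score_change 1916.86 2045.81 higher     # himeno: +6.7%
```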
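pts/padman
  Original score: 198.97 | New score: 198.03 | Better: higher
  Test output: padman | wspy output: padman
  Behavior of processes/CPUs: Graphics program calculating frames per second. One CPU very busy while the test runs, approaching 100% utilization; short bursts on other CPUs, particularly at the point tests start. Game that calculates # of frames per second, initial processing time? followed by running # of frames per second?
  Notes: Nine tests are run; original score is only the last.

pts/etqw-demo
  Original score: 41.80 | New score: 41.70 | Better: higher
  Test output: etqw-demo | wspy output: etqw-demo
  Behavior of processes/CPUs: Graphics program calculating frames per second. Each test starts one process per core (8 total), two are different (name="threaded-ml") so interesting if placement of these threads matters relative to others. CPUs generally run in bursts of activity >50% separated by less loaded times.
  Notes: Nine tests are run; original score is only the last.

pts/john-the-ripper
  Original score: 5937 blowfish, 20593667 des, 203603 MD5 | New score: 6078 blowfish, 23699000 des, 208588 MD5 | Better: higher
  Test output: john-the-ripper | wspy output: john-the-ripper
  Behavior of processes/CPUs: All CPUs close to 100% user time. Short tests ~20 seconds per test case.

pts/ttsiod-renderer
  Original score: 192.87 | New score: 191.88 | Better: higher
  Test output: ttsiod-renderer | wspy output: ttsiod-renderer
  Behavior of processes/CPUs: All CPUs close to 100% user time. Short tests ~30 seconds per test case.

pts/compress-pbzip2
  Original score: 9.67 | New score: 9.74 | Better: lower
  Test output: compress-pbzip2 | wspy output: compress-pbzip2
  Behavior of processes/CPUs: All CPUs close to 100% user time. Short tests ~10 seconds per test case.

pts/compress-7zip
  Original score: 20486 | New score: 20389 | Better: higher
  Test output: compress-7zip | wspy output: compress-7zip
  Behavior of processes/CPUs: Repeated short tests on all CPUs, close to 100% user time. Total of ~40 seconds.

pts/encode-mp3
  Original score: 32.77 | New score: 32.73 | Better: lower
  Test output: encode-mp3 | wspy output: encode-mp3
  Behavior of processes/CPUs: Single threaded, close to 100% CPU.

pts/encode-flac
  Original score: 11.70 | New score: 11.16 | Better: lower
  Test output: encode-flac | wspy output: encode-flac
  Behavior of processes/CPUs: Single threaded, close to 100%. Very short runtimes.

pts/x264
  Original score: 36.23 | New score: 35.98 | Better: higher
  Test output: x264 | wspy output: x264
  Behavior of processes/CPUs: All CPUs, busy but not always 100%. I/O or memory?

pts/ffmpeg
  Original score: 7.19 | New score: 7.36 | Better: lower
  Test output: ffmpeg | wspy output: ffmpeg
  Behavior of processes/CPUs: All CPUs, busy but not 100%. Short runs of ~9 seconds each.

pts/openssl
  Original score: 636.17 | New score: 636.37 | Better: higher
  Test output: openssl | wspy output: openssl
  Behavior of processes/CPUs: All CPUs, close to 100% user time. Tests ~20 seconds.

pts/himeno
  Original score: 1916.86 | New score: 2045.81 | Better: higher
  Test output: himeno | wspy output: himeno
  Behavior of processes/CPUs: Single threaded. Close to 100% CPU.

pts/apache
  Original score: 27272.28 | New score: 26249.09 | Better: higher
  Test output: apache | wspy output: apache
  Behavior of processes/CPUs: All CPUs, busy but not 100%. Proportionally high system time.

pts/c-ray
  Original score: 26.36 | New score: 26.36 | Better: lower
  Test output: c-ray | wspy output: c-ray
  Behavior of processes/CPUs: All CPUs, many simultaneous threads, close to 100%.

pts/povray
  Original score: 131.24 | New score: 131.17 | Better: lower
  Test output: povray | wspy output: povray
  Behavior of processes/CPUs: All CPUs, close to 100%.

pts/smallpt
  Original score: 80 | New score: 78 | Better: lower
  Test output: smallpt | wspy output: smallpt
  Behavior of processes/CPUs: All CPUs, close to 100%.

pts/tachyon
  Original score: 13.83 | New score: 13.72 | Better: lower
  Test output: tachyon | wspy output: tachyon
  Behavior of processes/CPUs: All CPUs, close to 100%. ~15 seconds runtime.

pts/crafty
  Original score: 7320247 | New score: 7314067 | Better: higher
  Test output: crafty | wspy output: crafty
  Behavior of processes/CPUs: Single threaded, close to 100%.

pts/tscp
  Original score: 1306401 | New score: 1307021 | Better: higher
  Test output: tscp | wspy output: tscp
  Behavior of processes/CPUs: Single threaded, close to 100%. Very short runtimes.

pts/mafft
  Original score: 4.59 | New score: 4.66 | Better: lower
  Test output: mafft | wspy output: mafft
  Behavior of processes/CPUs: All CPUs, many small process creations, close to 100%. Very short runtime, total ~5 seconds per run.

pts/stream
  Original score: copy 19452.82, scale 14243.04, triad 16108.24, add 16135.70 | New score: copy 19441.60, scale 14247.74, triad 16116.24, add 16154.56 | Better: higher
  Test output: stream | wspy output: stream
  Behavior of processes/CPUs: All CPUs, close to 100% user time.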
Posted in experiments | Tagged phoronix, phoronix cpu suite, wspy

likwid-topology

Posted on 2018-03-14 by mev, updated 2018-04-11

Another useful tool from the likwid performance monitoring and benchmarking suite is likwid-topology. It reports thread (hyperthread), cache (L1, L2, L3) and NUMA (memory) topology together.

Output below shows my i7-4770S CPU is a 4-core hyperthreaded processor with a 32 KB L1D cache and a 256 KB L2 cache per core, and a shared 8 MB L3 cache.

mev@popayan:~$ likwid-topology
--------------------------------------------------------------------------------
CPU name:	Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz
CPU type:	Intel Core Haswell processor
CPU stepping:	3
********************************************************************************
Hardware Thread Topology
********************************************************************************
Sockets:		1
Cores per socket:	4
Threads per core:	2
--------------------------------------------------------------------------------
HWThread	Thread		Core		Socket		Available
0		0		0		0		*
1		0		1		0		*
2		0		2		0		*
3		0		3		0		*
4		1		0		0		*
5		1		1		0		*
6		1		2		0		*
7		1		3		0		*
--------------------------------------------------------------------------------
Socket 0:		( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
Cache Topology
********************************************************************************
Level:			1
Size:			32 kB
Cache groups:		( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:			2
Size:			256 kB
Cache groups:		( 0 4 ) ( 1 5 ) ( 2 6 ) ( 3 7 )
--------------------------------------------------------------------------------
Level:			3
Size:			8 MB
Cache groups:		( 0 4 1 5 2 6 3 7 )
--------------------------------------------------------------------------------
********************************************************************************
NUMA Topology
********************************************************************************
NUMA domains:		1
--------------------------------------------------------------------------------
Domain:			0
Processors:		( 0 4 1 5 2 6 3 7 )
Distances:		10
Free memory:		4592.59 MB
Total memory:		15726.3 MB
--------------------------------------------------------------------------------
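When pinning workloads it helps to know which hardware threads share a physical core. The sketch below pulls that out of the HWThread table in the likwid-topology output; the function name siblings_by_core is hypothetical, and it assumes the table layout shown in the listing above.

```shell
#!/bin/bash
# Read likwid-topology output on stdin and list the hardware threads that
# share each physical core, using the HWThread/Thread/Core/Socket table.
siblings_by_core() {
    awk '
        $1 == "HWThread" { in_table = 1; next }     # table header row
        /^-+$/           { in_table = 0 }           # a dashed rule ends the table
        in_table && NF >= 4 {
            if ($3 in threads) {
                threads[$3] = threads[$3] " " $1
            } else {
                order[++n] = $3                     # remember first-seen core order
                threads[$3] = $1
            }
        }
        END {
            for (i = 1; i <= n; i++)
                printf "core %s: %s\n", order[i], threads[order[i]]
        }'
}
```

Run as likwid-topology | siblings_by_core. For the listing above this pairs hardware threads 0 and 4 on core 0, 1 and 5 on core 1, and so on, matching the L1 and L2 cache groups.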
Posted in tools | Tagged likwid, likwid-topology

Phoronix test suite CPU tests, priming the pump

Posted on 2018-03-14 by mev, updated 2018-03-15

Below is a table that summarizes installation and run status of the Phoronix Test Suite CPU suite.

Some background context of how this fits and where I hope to head from here…

Where I’d like to head is comparing micro-architectural features (e.g. caches, branch predictors, …), NUMA properties, OS features/policies (e.g. interrupt pinning) and other aspects across a set of benchmarks. There is, however, a bit of a circular dependency here. To make good comparisons, one needs good benchmarks, and to get good benchmarks one needs either a representative sample or at least a good understanding of the range.

I don’t necessarily have these benchmarks up front, so my initial idea is to “prime the pump” by comparing a set of semi-random benchmarks on their properties and iterate with both the analysis and eventually adding other workloads.

The Phoronix Test Suite is a reasonable starting point for an initial set of codes to compare, for a few reasons:

  1. It is free, with ~200 total tests including ~90 listed as “processor”
  2. There is a test harness to run tests, report results, etc

On the downside, the tests are of varying quality, e.g. a number won’t build or install for a variety of reasons and others have missing download files. Also, many of the tests seem small in terms of run time, execution size, etc.

However, that is part of the idea of priming the pump. To make this start I wanted a number of tests – not so many that it becomes unwieldy and not so few that it becomes hard to generalize. Hence, I picked the “cpu” suite, a set of 25 benchmarks banded together.

Results of the initial install + run exercise are listed below. Of the 25, I had problems getting four benchmarks to quickly run out of the box. So I’ll skip these for now. A quick diagnosis of what prevents these four from running:

  • graphics-magick fails to download with missing tarball
  • gcrypt code doesn’t compile on Ubuntu 17.10; diagnosis suggests some code cleanup is required
  • pgbench has security checks to prevent running as root; I might work around this by running the test as a regular user, but some of my tools require a root run, so skip for now
  • NAS parallel benchmarks require correct install-time variables for MPI. Can be done, but skip for now.

Note that some of these microbenchmarks also seem to have known compiler optimizations (e.g. john-the-ripper) or tuning (e.g. ffmpeg, stream) needed to get the absolute highest scores. The values below are for each benchmark as it comes “out of the box”, with some optimizations still pending.

Test | Suites | Install | Simple run | Score | Notes
pts/padman | cpu | yes | yes | 198.97
pts/etqw-demo | cpu | yes | yes | 41.80 | required 32-bit libraries
pts/graphics-magick | cpu | no | no | -- | downloads fail
pts/john-the-ripper | cpu | yes | yes | 5937 blowfish, 20593667 des, 203603 MD5 | Seems to run when I run it by itself, but not when run as part of the cpu suite.
pts/ttsiod-renderer | cpu | yes | yes | 192.87
pts/compress-pbzip2 | cpu | yes | yes | 9.67
pts/compress-7zip | cpu | yes | yes | 20486
pts/encode-mp3 | cpu | yes | yes | 32.77
pts/encode-flac | cpu | yes | yes | 11.70
pts/x264 | cpu | yes | yes | 36.23
pts/ffmpeg | cpu | yes | yes | 7.19
pts/openssl | cpu | yes | yes | 636.17
pts/gcrypt | cpu | yes | no | -- | gcrypt libgcrypt-1.4.4/tests benchmark not found; compile errors with source
pts/himeno | cpu | yes | yes | 1916.86
pts/pgbench | cpu | yes | no | -- | The test run did not produce a result; doesn't run as root
pts/apache | cpu | yes | yes | 27272.28
pts/c-ray | cpu | yes | yes | 26.36
pts/povray | cpu | yes | yes | 131.24
pts/smallpt | cpu | yes | yes | 80
pts/tachyon | cpu | yes | yes | 13.83
pts/crafty | cpu | yes | yes | 7320247
pts/tscp | cpu | yes | yes | 1306401
pts/mafft | cpu | yes | yes | 4.59
pts/npb | cpu | yes | no | -- | Test quit with non-zero exit status; most likely MPI environment variables not configured for the run
pts/stream | cpu | yes | yes | copy 19452.82, scale 14243.04, triad 16108.24, add 16135.70
Posted in workloads | Tagged phoronix, phoronix cpu suite
