Description - phoronix/pgbench

This is a simple benchmark of PostgreSQL using pgbench.

The test consists of a backend PostgreSQL database and a frontend driver. The database server appears to run as its own process tree, separate from the driver.

Phoronix allows multiple options as shown by the following menus:

PostgreSQL pgbench 10.3:
    pts/pgbench-1.8.4
    System Test Configuration
        1: Buffer Test
        2: Mostly RAM
        3: On-Disk
        4: Test All Options
        Scaling: 1


        1: Single Thread
        2: Normal Load
        3: Heavy Contention
        4: Test All Options
        Test: 2


        1: Read Write
        2: Read Only
        3: Test All Options
        Mode: 3

In the examples below, we test the “Buffer Test” for “Normal Load” with “Read Only” followed by “Read Write”.
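As a rough illustration of what the menu choices control, the sketch below builds pgbench command lines in Python. The flags (`-i`, `-s`, `-c`, `-j`, `-T`, `-S`) are standard pgbench options, but the mapping from menu entries to flags and every numeric value here are assumptions for illustration, not the settings used by pts/pgbench-1.8.4.

```python
# Hypothetical mapping from the menu choices to pgbench arguments.
# The numeric values are placeholders, NOT the values the test profile uses.

def pgbench_init_args(scale, dbname="pgbench"):
    """Arguments to initialise the pgbench tables at a given scale factor."""
    return ["pgbench", "-i", "-s", str(scale), dbname]

def pgbench_run_args(read_only, clients, threads, seconds, dbname="pgbench"):
    """Arguments for one timed run; -S selects the built-in select-only script."""
    args = ["pgbench", "-c", str(clients), "-j", str(threads), "-T", str(seconds)]
    if read_only:
        args.append("-S")
    args.append(dbname)
    return args

# "Read Only" followed by "Read Write", with made-up client counts.
print(" ".join(pgbench_init_args(scale=10)))
print(" ".join(pgbench_run_args(read_only=True, clients=8, threads=8, seconds=60)))
print(" ".join(pgbench_run_args(read_only=False, clients=8, threads=8, seconds=60)))
```

The functions only build argument lists, so they can be inspected without a running PostgreSQL server.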

Metrics (Intel) - phoronix/pgbench

Metrics for the driver for “Read Only”

sh - pid 468
	On_CPU   0.109
	On_Core  0.871
	IPC      0.503
	Retire   0.243	(24.3%)
	FrontEnd 0.536	(53.6%)
	Spec     0.100	(10.0%)
	Backend  0.121	(12.1%)
	Elapsed  82.47
	Procs    21
	Minflt   1692
	Majflt   0
	Utime    19.60   	(27.3%)
	Stime    52.27   	(72.7%)
	Start    113224.50
	Finish   113306.97

The driver keeps roughly 0.87 of one core busy (On_Core 0.871), mostly in system time (72.7%).

The metrics for the “Read Write” driver:

sh - pid 702
	On_CPU   0.005
	On_Core  0.036
	IPC      1.314
	Retire   0.268	(26.8%)
	FrontEnd 0.244	(24.4%)
	Spec     0.113	(11.3%)
	Backend  0.375	(37.5%)
	Elapsed  90.85
	Procs    21
	Minflt   1682
	Majflt   0
	Utime    1.35    	(41.2%)
	Stime    1.93    	(58.8%)
	Start    113481.61
	Finish   113572.46

With an On_CPU of 0.005, it barely uses CPU at all.

The metrics for the backend processes (same time window as the “Read Only” driver):

postgres - pid 474
	On_CPU   0.600
	On_Core  4.803
	IPC      0.511
	Retire   0.217	(21.7%)
	FrontEnd 0.595	(59.5%)
	Spec     0.080	(8.0%)
	Backend  0.108	(10.8%)
	Elapsed  82.43
	Procs    43
	Minflt   554085
	Majflt   0
	Utime    293.49  	(74.1%)
	Stime    102.46  	(25.9%)
	Start    113224.51
	Finish   113306.94

This is where most of the On_CPU time goes: the backend keeps about 4.8 cores busy (On_CPU 0.600). It also shows a lot of frontend stalls (59.5%).

Metrics (AMD) - phoronix/pgbench

The metrics for the backend processes:

postgres - pid 19027
	On_CPU   0.581
	On_Core  9.288
	IPC      0.486
	FrontCyc 0.096	(9.6%)
	BackCyc  0.062	(6.2%)
	Elapsed  85.04
	Procs    75
	Minflt   930062
	Majflt   0
	Utime    622.57  	(78.8%)
	Stime    167.32  	(21.2%)
	Start    181992.09
	Finish   182077.13

The backend on AMD shows a similarly low IPC (0.486, versus 0.511 on Intel).
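The On_Core and On_CPU values in the listings above appear to follow from the time fields: On_Core matches (Utime + Stime) / Elapsed, and On_CPU matches On_Core divided by the number of logical CPUs. The sketch below checks this arithmetic. The CPU counts (8 for the Intel system, 16 for the AMD system) are not stated in the report; they are inferred from the On_Core / On_CPU ratio and should be treated as assumptions.

```python
def utilization(utime, stime, elapsed, ncpus):
    """Return (on_core, on_cpu) derived from CPU time and wall-clock time."""
    on_core = (utime + stime) / elapsed   # average number of busy cores
    on_cpu = on_core / ncpus              # fraction of the whole machine
    return on_core, on_cpu

# (label, Utime, Stime, Elapsed, assumed logical CPU count)
rows = [
    ("sh 468 (Intel, read only)",   19.60,  52.27, 82.47, 8),
    ("sh 702 (Intel, read write)",   1.35,   1.93, 90.85, 8),
    ("postgres 474 (Intel)",       293.49, 102.46, 82.43, 8),
    ("postgres 19027 (AMD)",       622.57, 167.32, 85.04, 16),
]

for label, utime, stime, elapsed, ncpus in rows:
    on_core, on_cpu = utilization(utime, stime, elapsed, ncpus)
    print(f"{label:28s} On_Core {on_core:.3f}  On_CPU {on_cpu:.3f}")
```

Under these assumptions the computed values reproduce the reported On_Core and On_CPU figures to three decimal places.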

Process Tree - phoronix/pgbench
The process tree for the read driver:

    468) sh
      469) pgbench
        470) pg_ctl
          471) sh
            472) postgres
        481) sleep
        482) createdb
        484) pgbench
          485) pgbench
          486) bc
        487) pgbench
        489) pgbench
        491) pgbench
        492) pgbench
        493) pgbench
        495) pgbench
        497) pgbench
        498) pgbench
        500) pgbench
        560) dropdb
        565) pg_ctl

The process tree for the “Read Write” driver is essentially the same.

The backend tree is also simple, a single postgres parent with 42 children:

474) postgres
  475) postgres
  476) postgres
  477) postgres
  478) postgres
  479) postgres
  480) postgres
  483) postgres
  488) postgres
  490) postgres
  494) postgres
  496) postgres
  499) postgres
  501) postgres
  502) postgres
  503) postgres
  504) postgres
  505) postgres
  507) postgres
  508) postgres
  509) postgres
  510) postgres
  511) postgres
  512) postgres
  513) postgres
  514) postgres
  515) postgres
  516) postgres
  517) postgres
  518) postgres
  519) postgres
  520) postgres
  521) postgres
  522) postgres
  523) postgres
  524) postgres
  525) postgres
  526) postgres
  527) postgres
  528) postgres
  529) postgres
  530) postgres
  561) postgres
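The trees above can be cross-checked against the Procs counts in the metrics. The following is a minimal sketch that parses an indented tree in the `pid) name` form shown here, assuming two spaces of indentation per level; the parser is illustrative and is not part of the tool that produced the listing.

```python
from collections import Counter

# The read driver tree from above, with the leading margin removed.
TREE = """\
468) sh
  469) pgbench
    470) pg_ctl
      471) sh
        472) postgres
    481) sleep
    482) createdb
    484) pgbench
      485) pgbench
      486) bc
    487) pgbench
    489) pgbench
    491) pgbench
    492) pgbench
    493) pgbench
    495) pgbench
    497) pgbench
    498) pgbench
    500) pgbench
    560) dropdb
    565) pg_ctl
"""

def parse_tree(text, indent=2):
    """Return a list of (pid, parent_pid, name) from an indented listing."""
    nodes = []
    stack = []  # stack[d] holds the pid of the current ancestor at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        pid_str, name = line.strip().split(") ", 1)
        pid = int(pid_str)
        parent = stack[depth - 1] if depth > 0 else None
        del stack[depth:]       # drop ancestors that are no longer on the path
        stack.append(pid)
        nodes.append((pid, parent, name))
    return nodes

nodes = parse_tree(TREE)
print(len(nodes))                              # 21, matching Procs for pid 468
print(Counter(name for _, _, name in nodes))   # 12 of the 21 are pgbench
```

The count of 21 agrees with the Procs value reported for the read driver; the same check on the backend tree gives 43.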

Read tests keep the cores busier than read/write.


The IPC is somewhat noisy, particularly for read/write.

Topdown metrics show that frontend stalls dominate, with a relatively low retirement rate.

Topdown (Intel)
retire         0.261
ms_uops                0.027
speculation    0.022
branch_misses          82.48%
machine_clears         17.52%
frontend       0.601
idq_uops_delivered_0   0.238
icache_stall               0.156
itlb_misses                0.061
idq_uops_delivered_1   0.278
idq_uops_delivered_2   0.323
idq_uops_delivered_3   0.365
dsb_ops                    14.37%
backend        0.116
resource_stalls.sb     0.029
stalls_ldm_pending     0.351

Many icache and iTLB misses overall, consistent with the large frontend share (0.601). Backend stalls are comparatively small (0.116), though stalls_ldm_pending is 0.351.
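The four top-level topdown categories are fractions of the same total, so they should sum to roughly 1. The sketch below checks that for the values reported above and picks out the dominant category. The function name and the tolerance are illustrative choices, not part of the measurement tool.

```python
def summarize_topdown(retire, speculation, frontend, backend, tol=0.005):
    """Check that the level-1 fractions sum to ~1 and name the largest one."""
    parts = {
        "retire": retire,
        "speculation": speculation,
        "frontend": frontend,
        "backend": backend,
    }
    total = sum(parts.values())
    dominant = max(parts, key=parts.get)
    return abs(total - 1.0) <= tol, dominant

# Level-1 values from the Topdown (Intel) listing.
ok, dominant = summarize_topdown(retire=0.261, speculation=0.022,
                                 frontend=0.601, backend=0.116)
print(ok, dominant)  # True frontend
```

The same check applies to the Retire, FrontEnd, Spec and Backend rows of the per-process listings, which also sum to 1.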

There were also some logistical issues that made this benchmark harder to run: (1) the code will only run as a normal user, not as root, and (2) something seemed to confuse the ptrace2 process tree builder, so the metrics above came from the ptrace driver.

Next steps: Understand why the ptrace2 driver has trouble with this benchmark.