[Postgres-xl-developers] pgbench -k (distribute by branch) behaving a bit unexpectedly

Tomas Vondra tomas.vondra at 2ndquadrant.com
Sat Jun 24 15:09:45 PDT 2017


While fixing some breakage in pgbench, caused by the PostgreSQL 9.6 
merge, I've realized that with "-k" it behaves a bit unexpectedly.

When used during initialization, the '-k' makes the tables distributed 
by branch (bid), instead of by first column (primary key) of each table. 
That kinda makes sense, as it shards data by branch.

During the benchmarking phase, we however do this with '-k':

     \set aid random(1, naccounts * :scale)
     \set bid random(1, nbranches * :scale)
     \set tid random(1, ntellers * :scale)
     \set delta random(-5000, 5000)


     UPDATE pgbench_accounts SET abalance = abalance + :delta
      WHERE aid = :aid AND bid = :bid;

     SELECT abalance FROM pgbench_accounts
      WHERE aid = :aid AND bid = :bid;

     UPDATE pgbench_tellers SET tbalance = tbalance + :delta
      WHERE tid = :tid AND bid = :bid;

     UPDATE pgbench_branches SET bbalance = bbalance + :delta
      WHERE bid = :bid;

     INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
     VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);


The issue is that we pick randomly values for all three keys - aid, bid 
and tid, and then use combinations of them in the UPDATE commands.

That makes it highly unlikely we'll match an existing record, so most of 
the UPDATE commands is actually skipped (e.g. no WAL writes). That seems 
like a significant discrepancy compared to regular pgbench.

The question is, what to do about this. The only option I come up with 
is to change how we generate the values, to always generate a valid 
combination of values.

Any other ideas?


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

More information about the Postgres-xl-developers mailing list