[Postgres-xl-bugs] pg XL insert slowness issue

Frederic Vander Elst fve at vander-elst.org
Thu May 28 05:01:44 PDT 2015


Hello PGXL

Firstly, may I send a huge thank you to the authors of postgres-xl… Good 
work.

I’m finding that QUERY performance scales very well with the number of nodes, but INSERT 
performance is very poor. I'm hoping I'm doing something wrong, 
because it's 5-10x slower than stock Postgres.

I’ve written the quick test inserter below:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import psycopg2
import sys
import socket
import datetime

HOST = socket.gethostname()
if len(sys.argv) > 1:
    # fake the host
    HOST = sys.argv[1]

con = None


def test_insert():
    if HOST == "pg9-0":
        con = psycopg2.connect(database='test', user='postgres', port=20020,
                               host='localhost')
    elif HOST == "pg9-single":
        con = psycopg2.connect(database='test', user='postgres', port=5432,
                               host='localhost')
    print con

    cur = con.cursor()
    mk_tables = (
        'drop table foo'
        ,'drop table foor'
        ,'CREATE TABLE foo  (id numeric(6,0),    name character varying(30)) DISTRIBUTE BY HASH (id)'
        ,'CREATE TABLE foor (id numeric(6,0),    name character varying(30)) DISTRIBUTE BY replication'
        ,'CREATE TABLE foo  (id numeric(6,0),    name character varying(30))'
    )
    for q in mk_tables:
        try:
            cur.execute(q)
        except Exception, e:
            print 'ignoring exc', str(e)
    con.commit()

    con.autocommit = False
    con.set_session(autocommit=False)
    cur = con.cursor()

    # now insert, a LOT, into 'foo'
    now = datetime.datetime.now()
    for x in xrange(500000):
        cur.execute("insert into foo (id,name) values (%s, %s)", (x, x))
        if x % 1000 == 0:
            sys.stdout.write('.')
            sys.stdout.flush()
        if x % 10000 == 0:
            delta = (datetime.datetime.now() - now).seconds
            now = datetime.datetime.now()
            if delta > 0:
                print x, delta, 's', '%5.2f recs/s' % (10000 / delta)
    con.commit()


test_insert()

Note I’ve done all I could to disable autocommit.
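One thing I may try for comparison is a batched variant of the same loop. This is only an untested sketch (batched_insert and BATCH are names I made up here, and it did not produce the numbers below): it packs 1000 rows into a single multi-row INSERT via cursor.mogrify, so each round trip to the coordinator carries a whole batch instead of one row. I haven't measured whether that actually helps on XL.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Untested sketch of a batched loader: build one multi-row INSERT per
# BATCH rows instead of one statement per row.
import psycopg2

BATCH = 1000

def batched_insert(con, total=500000):
    cur = con.cursor()
    rows = []
    for x in xrange(total):
        rows.append((x, x))
        if len(rows) == BATCH:
            # mogrify each tuple, then send one INSERT for the whole batch
            values = ','.join(cur.mogrify("(%s,%s)", r) for r in rows)
            cur.execute("insert into foo (id,name) values " + values)
            rows = []
    if rows:
        values = ','.join(cur.mogrify("(%s,%s)", r) for r in rows)
        cur.execute("insert into foo (id,name) values " + values)
    con.commit()

# e.g. batched_insert(psycopg2.connect(database='test', user='postgres',
#                                      port=20020, host='localhost'))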

Running this on Postgres-XL 9.2.0 (based on PG 9.2.4), with one GTM, a GTM 
standby, 3 coordinators and 8 data nodes (16 GB RAM each) - biggish VMs 
(Python 2.7.8):


$ python ins.py

<connection object at 0x7fb2cf475910; dsn: 'dbname=test user=postgres 
host=localhost port=20020', closed: 0>

ignoring exc relation "foo" already exists

...............................30000 51 s 588.00 recs/s

..............................60000 48 s 625.00 recs/s

..............................90000 49 s 612.00 recs/s

..............................120000 49 s 612.00 recs/s

..............................150000 48 s 625.00 recs/s

..............................180000 51 s 588.00 recs/s

..............................210000 51 s 588.00 recs/s

..............................240000 50 s 600.00 recs/s

..............................270000 50 s 600.00 recs/s

..............................300000 49 s 612.00 recs/s

..............................330000 47 s 638.00 recs/s

..............................360000 48 s 625.00 recs/s

..............................390000 51 s 588.00 recs/s

..............................420000 52 s 576.00 recs/s

..............................450000 50 s 600.00 recs/s

..............................480000 50 s 600.00 recs/s


Running this on stock PostgreSQL 9.4.2 on a separate VM of the same spec, I 
get the following (ignore the exception noise):

$ python ins.py

<connection object at 0x7f9f32520910; dsn: 'dbname=test user=postgres 
host=localhost port=5432', closed: 0>

ignoring exc table "foor" does not exist

ignoring exc syntax error at or near "DISTRIBUTE"

LINE 1: ... (id numeric(6,0),    name character varying(30)) DISTRIBUTE...

^

ignoring exc syntax error at or near "DISTRIBUTE"

LINE 1: ... (id numeric(6,0),    name character varying(30)) DISTRIBUTE...

^

ignoring exc current transaction is aborted, commands ignored until end 
of transaction block

...............................30000 5 s 6000.00 recs/s

..............................60000 5 s 6000.00 recs/s

..............................90000 5 s 6000.00 recs/s

..............................120000 5 s 6000.00 recs/s

..............................150000 5 s 6000.00 recs/s

..............................180000 5 s 6000.00 recs/s

..............................210000 5 s 6000.00 recs/s

..............................240000 5 s 6000.00 recs/s

..............................270000 5 s 6000.00 recs/s

..............................300000 5 s 6000.00 recs/s

..............................330000 6 s 5000.00 recs/s

..............................360000 5 s 6000.00 recs/s

..............................390000 5 s 6000.00 recs/s

..............................420000 5 s 6000.00 recs/s

..............................450000 5 s 6000.00 recs/s

..............................480000 5 s 6000.00 recs/s

I observe that Postgres-XL is much slower at this simple insert, by a 
factor of about 10: roughly 600 rows/s (about 1.7 ms per INSERT) versus 
roughly 6000 rows/s (about 0.17 ms per INSERT).


It seems to be something specific to the INSERTs.  I know Postgres-XL CAN do a

   CREATE TABLE .. from DBLINK ( some other postgres db)

and ingest millions of records very quickly - so there's no issue with 
the underlying storage, or with fanning the data out to the shards; it's just 
the INSERTs that are way slow. A rough sketch of that dblink workaround follows below.
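This is only a sketch of what that workaround looks like: it assumes the dblink extension is installed in the XL database, that the stock box is reachable as pg9-single, and the foo_imported name is just a placeholder (it is not the exact statement I ran, and a DISTRIBUTE BY clause could presumably be added to the CREATE TABLE AS as well):

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Sketch of the dblink-based bulk import; connection details and the
# foo_imported name are placeholders.
import psycopg2

con = psycopg2.connect(database='test', user='postgres', port=20020,
                       host='localhost')
cur = con.cursor()
# pull everything from the stock PG 9.4 box in one statement;
# Postgres-XL then distributes the rows across the data nodes itself
cur.execute("""
    CREATE TABLE foo_imported AS
    SELECT *
    FROM dblink('host=pg9-single dbname=test user=postgres',
                'SELECT id, name FROM foo')
         AS t(id numeric(6,0), name character varying(30))
""")
con.commit()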

I’ve had to create a stock PG 9.4 database and populate it, just so I can 
later re-import all the data via dblink - I'm happy with the 
extra step for testing, but in production that won't work.

Am I doing something wrong here?  Again, the SELECT and 
create-from-dblink performance suggests my VMs, disks and PGXL *can* do it!


Thanks for any help or suggestions you may offer.

Best regards

Freddie Vander Elst

Frederic Vander Elst
Head of Special Projects
Experian pH
Royalty Studios | 105-109 Lancaster Road | London | W11 1QF
T: 442075980310 | M: +447939582838
www.experian.co.uk | www.phgroup.com



