[Postgres-xl-bugs] pg XL insert slowness issue
Frederic Vander Elst
fve at vander-elst.org
Thu May 28 05:01:44 PDT 2015
Hello PGXL
Firstly, may I send a huge thank you to the authors of postgres-xl… Good
work.
I’m finding that QUERY performance scales very well with nodes, but INSERT
performance is very poor. I'm hoping I'm doing something wrong,
because it's 5-10x slower than stock postgres.
I’ve written the quick test inserter below:
#!/usr/bin/python
# -*- coding: utf-8 -*-

import psycopg2
import sys
import socket
import datetime

HOST = socket.gethostname()
if len(sys.argv) > 1:
    # fake the host
    HOST = sys.argv[1]

con = None

def test_insert():
    if HOST == "pg9-0":
        con = psycopg2.connect(database='test', user='postgres', port=20020,
                               host='localhost')
    elif HOST == "pg9-single":
        con = psycopg2.connect(database='test', user='postgres', port=5432,
                               host='localhost')
    print con
    cur = con.cursor()
    mk_tables = (
        'drop table foo'
        ,'drop table foor'
        ,'CREATE TABLE foo (id numeric(6,0), name character varying(30)) DISTRIBUTE BY HASH (id)'
        ,'CREATE TABLE foor (id numeric(6,0), name character varying(30)) DISTRIBUTE BY replication'
        ,'CREATE TABLE foo (id numeric(6,0), name character varying(30))'
    )
    for q in mk_tables:
        try:
            cur.execute(q)
        except Exception, e:
            print 'ignoring exc', str(e)
    con.commit()
    con.autocommit = False
    con.set_session(autocommit=False)
    cur = con.cursor()
    # now insert, a LOT, into 'foo'
    now = datetime.datetime.now()
    for x in xrange(500000):
        cur.execute("insert into foo (id,name) values (%s, %s)", (x, x))
        if x % 1000 == 0:
            sys.stdout.write('.')
            sys.stdout.flush()
        if x % 10000 == 0:
            delta = (datetime.datetime.now() - now).seconds
            now = datetime.datetime.now()
            if delta > 0:
                print x, delta, 's', '%5.2f recs/s' % (10000 / delta)
    con.commit()

test_insert()
Note I’ve done all I could to disable autocommit.
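One thing worth trying (this is an editor's sketch, not something from the thread): with one execute() per row, every INSERT is a separate round trip through the coordinator even inside a single transaction, so packing many rows into one multi-row INSERT statement may amortise that per-statement cost. The helper name `multirow_insert` below is illustrative; it only builds the SQL text and parameter list for psycopg2, and the actual execute() call is left commented out.

```python
# Sketch of a possible workaround: send many rows per statement so each
# coordinator round trip carries more work.  Helper name is hypothetical.

def multirow_insert(table, cols, rows):
    """Build one parameterised multi-row INSERT for psycopg2."""
    placeholders = '(' + ', '.join(['%s'] * len(cols)) + ')'
    sql = 'insert into %s (%s) values %s' % (
        table, ', '.join(cols), ', '.join([placeholders] * len(rows)))
    params = [v for row in rows for v in row]  # flatten rows into one flat list
    return sql, params

sql, params = multirow_insert('foo', ('id', 'name'), [(1, '1'), (2, '2')])
# cur.execute(sql, params)  # one round trip for the whole batch
```

Whether this actually closes the gap on XL depends on where the per-INSERT overhead sits (coordinator parsing vs. GTM traffic), but it is cheap to test.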
Running this on *pg-xl* (Postgres-XL 9.2.0, based on PG 9.2.4) with one
gtm, a gtm-standby, 3 coordinators, and 8 data nodes (16 GB RAM
each) - biggish VMs (python 2.7.8):
$ python ins.py
<connection object at 0x7fb2cf475910; dsn: 'dbname=test user=postgres
host=localhost port=20020', closed: 0>
ignoring exc relation "foo" already exists
...............................30000 51 s 588.00 recs/s
..............................60000 48 s 625.00 recs/s
..............................90000 49 s 612.00 recs/s
..............................120000 49 s 612.00 recs/s
..............................150000 48 s 625.00 recs/s
..............................180000 51 s 588.00 recs/s
..............................210000 51 s 588.00 recs/s
..............................240000 50 s 600.00 recs/s
..............................270000 50 s 600.00 recs/s
..............................300000 49 s 612.00 recs/s
..............................330000 47 s 638.00 recs/s
..............................360000 48 s 625.00 recs/s
..............................390000 51 s 588.00 recs/s
..............................420000 52 s 576.00 recs/s
..............................450000 50 s 600.00 recs/s
..............................480000 50 s 600.00 recs/s
Running this on a stock postgres 9.4.2 on a separate VM (same spec), I
get (ignore the exception noise):
$ python ins.py
<connection object at 0x7f9f32520910; dsn: 'dbname=test user=postgres
host=localhost port=5432', closed: 0>
ignoring exc table "foor" does not exist
ignoring exc syntax error at or near "DISTRIBUTE"
LINE 1: ... (id numeric(6,0), name character varying(30)) DISTRIBUTE...
^
ignoring exc syntax error at or near "DISTRIBUTE"
LINE 1: ... (id numeric(6,0), name character varying(30)) DISTRIBUTE...
^
ignoring exc current transaction is aborted, commands ignored until end
of transaction block
...............................30000 5 s 6000.00 recs/s
..............................60000 5 s 6000.00 recs/s
..............................90000 5 s 6000.00 recs/s
..............................120000 5 s 6000.00 recs/s
..............................150000 5 s 6000.00 recs/s
..............................180000 5 s 6000.00 recs/s
..............................210000 5 s 6000.00 recs/s
..............................240000 5 s 6000.00 recs/s
..............................270000 5 s 6000.00 recs/s
..............................300000 5 s 6000.00 recs/s
..............................330000 6 s 5000.00 recs/s
..............................360000 5 s 6000.00 recs/s
..............................390000 5 s 6000.00 recs/s
..............................420000 5 s 6000.00 recs/s
..............................450000 5 s 6000.00 recs/s
..............................480000 5 s 6000.00 recs/s
I observe that PG XL is **much** slower at doing this simple insert, _by
a factor of 10_.
It’s something to do with the INSERTs. I know pgxl CAN do a
CREATE TABLE .. from a DBLINK (to some other postgres db)
and ingest millions of records very quickly - so there’s no issue with
the underlying storage, or with fanning the data out to the shards; it’s
just INSERTs that are way slow.
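Given that bulk ingest is fast, another avenue (again an editor's sketch, not from the thread) is to stream the rows from the client with COPY instead of per-row INSERTs, via psycopg2's standard copy_from(). The helper name `rows_to_copy_buffer` is made up for illustration; the table and column names mirror the test script, and the copy_from() call itself is left commented out since it needs a live connection.

```python
# Sketch: serialise rows into the tab-separated text format COPY expects,
# then hand the buffer to psycopg2's copy_from().  Helper name is hypothetical.
from io import BytesIO  # StringIO.StringIO also works on python 2

def rows_to_copy_buffer(rows):
    """Write rows as tab-separated lines into a file-like buffer for COPY."""
    buf = BytesIO()
    for row in rows:
        buf.write(('\t'.join(str(v) for v in row) + '\n').encode('utf-8'))
    buf.seek(0)  # rewind so copy_from() reads from the start
    return buf

buf = rows_to_copy_buffer([(0, 0), (1, 1)])
# cur.copy_from(buf, 'foo', columns=('id', 'name'))
# con.commit()
```

Since the dblink bulk path is fast, COPY from the client may exercise the same fast path and sidestep the per-statement cost entirely.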
I’ve had to create a stock pg 9.4 database and populate it just so I can
then **later** re-import all the data via dblinks - happy with the
extra step for testing, but in production, that won't work.
Am I doing something wrong here? Again, the SELECT and
create-from-dblink performance seems to say my VM/Disk/PGXL *can* do it!
Thanks for any help or suggestions you may offer.
Best regards
Freddie Vander Elst
Frederic Vander Elst
Head of Special Projects
Experian pH
Royalty Studios | 105-109 Lancaster Road | London | W11 1QF
T: 442075980310 | M: +447939582838
www.experian.co.uk | www.phgroup.com