[Postgres-xl-general] Best XL Configuration For Large Volume INSERT Statements
koichi.dbms at gmail.com
Tue Sep 30 08:43:08 PDT 2014
I'm afraid there are almost nothing for the tuning of this type of
query. Instead, I think XL needs some improvement in the core so
that intermediate result can be cached in a better fashion and
transferred to the target datanodes using COPY protocol.
Further, if myschema.projects is distributed by smb_project_id and
myschema.customers is distributed by id, the whole statment, including
insert, can be completely pushable to datanodes. Target datanode for
each tuple is local.
I'm afraid current XL planner may not produce such a plan.
Any further inputs?
2014-10-01 0:03 GMT+09:00 Silva, Richard <rsilva at javelin.mg>:
> Postgres-XL Community,
> Our XL environment currently has 10 data-nodes, 1 coordinator, 1 gtm proxy,
> 1 gtm, 1 gtm standby all running on 8 core machines with 16GB of RAM and 1TB
> SSD drives. Up to this point, everything has been considerably faster than
> our single server production environment with the exception of INSERT
> We conduct a simple join of two tables and write the results out to a new
> CREATE TABLE myschema.mytable AS
> SELECT m.id
> FROM (SELECT smb_project_id
> FROM myschema.projects) AS p
> INNER JOIN (SELECT id
> FROM myschema.customers) AS m
> ON m.project_id = p.project_id;
> The resulting SELECT statement returns close to 100 million records which
> then get inserted into a table for further aggregation (among other things).
> The current run time averages about 45 minutes for both physical and
> temporary tables. I understand there are lots of different ways to
> accomplish inserts of this nature such as COPY; however, thats not really
> what I’m looking for.
> What are the recommended configuration settings for XL when it comes to
> large volume inserts?
> Here are the postgresql.conf settings I’ve changed on the datanodes, which
> have had little impact.
> wal_level = minimal
> wal_sync_method = fdatasync
> synchronous_commit = off
> default_statistics_target = 50
> maintenance_work_mem = 208MB
> constraint_exclusion = on
> checkpoint_completion_target = 0.9
> effective_cache_size = 2560MB
> work_mem = 20MB
> wal_buffers = -1
> checkpoint_segments = 16
> shared_buffers = 832MB
> max_prepared_transactions = 100
> Any suggestions or possible links to suggestions?
> Richard Silva ETL Architect
> Javelin • Making the brand promise personal, profitable and provable.
> Access Manager: This email is intended only for the person or entity to
> which it is addressed and may contain information that is privileged,
> confidential or otherwise protected from disclosure. Dissemination,
> distribution or copying of this e-mail or the information herein by anyone
> other than the intended recipient, or an employee or agent responsible for
> delivering the message to the intended recipient, is prohibited. If you have
> received this e-mail in error, please immediately notify us by calling the
> Help Desk at (855) 237-8324.
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
> Postgres-xl-general mailing list
> Postgres-xl-general at lists.sourceforge.net
More information about the postgres-xl-general