[Postgres-xl-general] Best XL Configuration For Large Volume INSERT Statements
msharp at translattice.com
Tue Sep 30 09:27:29 PDT 2014
On Tue, Sep 30, 2014 at 11:03 AM, Silva, Richard <rsilva at javelin.mg> wrote:
> Postgres-XL Community,
> Our XL environment currently has 10 data-nodes, 1 coordinator, 1 gtm
> proxy, 1 gtm, 1 gtm standby all running on 8 core machines with 16GB of RAM
> and 1TB SSD drives. Up to this point, everything has been considerably
> faster than our single server production environment with the exception of
> INSERT statements.
Great to hear that it is faster in most cases.
> We conduct a simple join of two tables and write the results out to a new table:
> CREATE TABLE myschema.mytable AS
> SELECT m.id
> FROM (SELECT smb_project_id AS project_id
>       FROM myschema.projects) AS p
> INNER JOIN (SELECT id, project_id
>             FROM myschema.customers) AS m
> ON m.project_id = p.project_id;
> The resulting SELECT statement returns close to 100 million records
> which then get inserted into a table for further aggregation (among other
> things). The current run time averages about 45 minutes for both physical
> and temporary tables. I understand there are lots of different ways to
> accomplish inserts of this nature such as COPY; however, that's not really
> what I’m looking for.
> What are the recommended configuration settings for XL when it comes to
> large volume inserts?
> Here are the postgresql.conf settings I’ve changed on the datanodes,
> which have had little impact.
> wal_level = minimal
> wal_sync_method = fdatasync
> synchronous_commit = off
> default_statistics_target = 50
> maintenance_work_mem = 208MB
> constraint_exclusion = on
> checkpoint_completion_target = 0.9
> effective_cache_size = 2560MB
> work_mem = 20MB
> wal_buffers = -1
> checkpoint_segments = 16
> shared_buffers = 832MB
> max_prepared_transactions = 100
> Any suggestions or possible links to suggestions?
As Koichi Suzuki mentioned, joining the two tables on their respective
distribution keys would help, since the join could then run locally on each
datanode. While not ideal, a bit of denormalization could also help, say,
putting the customer id in the projects table and including it in the join
condition.
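As a sketch of the distribution-key approach, assuming project_id exists in
both tables (the column lists below are illustrative, not your actual
schema), distributing both tables by the join key lets each datanode join
its local slice without shipping rows between nodes:

```sql
-- Distribute both tables on the join key so the INNER JOIN can be
-- pushed down and executed locally on every datanode.
-- Column lists are illustrative only.
CREATE TABLE myschema.projects (
    project_id bigint,
    name       text
) DISTRIBUTE BY HASH (project_id);

CREATE TABLE myschema.customers (
    id         bigint,
    project_id bigint
) DISTRIBUTE BY HASH (project_id);
```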
You could try increasing work_mem some in the current session using SET. I
suspect you won't get too much of a bump in performance, however.
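For example (the 256MB figure is purely illustrative; pick a value that
fits your 16GB machines):

```sql
-- Raise work_mem for this session only; the setting reverts when
-- the session ends. Run the CTAS in the same session afterwards.
SET work_mem = '256MB';
-- then run the CREATE TABLE ... AS SELECT ... from above
```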
Another parameter to try increasing is shared_queue_size. A larger value
consumes more shared memory, so increase it cautiously. Here, too, I would
not expect a large gain.
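shared_queue_size is set in postgresql.conf and takes effect after a
restart. A rough sketch (the values are illustrative, not a recommendation;
total shared memory used also scales with shared_queues):

```
# postgresql.conf on coordinator and datanodes -- illustrative values
shared_queue_size = 256kB   # per-queue buffer for inter-node row traffic
shared_queues = 128         # number of such queues; memory is roughly
                            # shared_queues * shared_queue_size
```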
Are you CPU bound while this runs? I was recently looking at the code and
saw something in the row-shipping execution path that I think can be made
more efficient. I will post a patch when I can.