[Postgres-xl-general] Best XL Configuration For Large Volume INSERT Statements

Andrei Martsinchyk andrei.martsinchyk at gmail.com
Tue Sep 30 09:30:04 PDT 2014


Hi Richard,

If your tables are properly distributed that query can run much faster.
Both your source tables should be distributed by project_id (join key) and
result table should also be distributed by that field.
That may already be the case that your source tables are distributed
by project_id, so you will just need to have myschema.mytable is
distributed properly.
You can either add DISTRIBUTE BY clause to the CREATE TABLE statement:

CREATE TABLE myschema.mytable DISTRIBUTE BY hash(project_id) AS ...

or simply specify m.project_id first in the SELECT list.
Basically, by default XL chooses first hasheable column as a distribution
key.
The XL planner will do the job properly and launch 10 parallel queries, one
on each datanode, each operating on local data.
If distribution differs from suggested, the data shipping will occur, I
guess that's why the query is slower than you expect.



2014-09-30 18:43 GMT+03:00 Koichi Suzuki <koichi.dbms at gmail.com>:

> I'm afraid there are almost nothing for the tuning of this type of
> query.    Instead, I think XL needs some improvement in the core so
> that intermediate result can be cached in a better fashion and
> transferred to the target datanodes using COPY protocol.
>
> Further, if myschema.projects is distributed by smb_project_id and
> myschema.customers is distributed by id, the whole statment, including
> insert, can be completely pushable to datanodes.   Target datanode for
> each tuple is local.
>
> I'm afraid current XL planner may not produce such a plan.
>
> Any further inputs?
>
> Regards;
> ---
> Koichi Suzuki
>
>
> 2014-10-01 0:03 GMT+09:00 Silva, Richard <rsilva at javelin.mg>:
> > Postgres-XL Community,
> >
> >
> > Our XL environment currently has 10 data-nodes, 1 coordinator, 1 gtm
> proxy,
> > 1 gtm, 1 gtm standby all running on 8 core machines with 16GB of RAM and
> 1TB
> > SSD drives. Up to this point, everything has been considerably faster
> than
> > our single server production environment with the exception of INSERT
> > statements.
> >
> > Senario:
> > We conduct a simple join of two tables and write the results out to a new
> > table.
> >
> > CREATE TABLE myschema.mytable AS
> > SELECT m.id
> >       ,m.project_id
> >       ,m.account_id
> >       ,m.company_name
> >       ,p.start_date
> >       ,p.end_date
> > FROM (SELECT smb_project_id
> >             ,start_date
> >             ,end_date
> >       FROM myschema.projects) AS p
> >   INNER JOIN (SELECT id
> >                    ,project_id
> >                    ,account_id
> >                    ,company_name
> >              FROM myschema.customers) AS m
> >     ON m.project_id = p.project_id;
> >
> > The resulting SELECT statement returns close to 100 million records which
> > then get inserted into a table for further aggregation (among other
> things).
> > The current run time averages about 45 minutes for both physical and
> > temporary tables. I understand there are lots of different ways to
> > accomplish inserts of this nature such as COPY; however, thats not really
> > what I’m looking for.
> >
> > Question:
> > What are the recommended configuration settings for XL when it comes to
> > large volume inserts?
> >
> > Here are the postgresql.conf settings I’ve changed on the datanodes,
> which
> > have had little impact.
> >
> > wal_level = minimal
> > wal_sync_method = fdatasync
> > synchronous_commit = off
> > default_statistics_target = 50
> > maintenance_work_mem = 208MB
> > constraint_exclusion = on
> > checkpoint_completion_target = 0.9
> > effective_cache_size = 2560MB
> > work_mem = 20MB
> > wal_buffers = -1
> > checkpoint_segments = 16
> > shared_buffers = 832MB
> > max_prepared_transactions = 100
> >
> > Any suggestions or possible links to suggestions?
> >
> > searching...
> > Richard Silva ETL Architect
> > Javelin • Making the brand promise personal, profitable and provable.
> >
> >
> >
> >
> >
> > Access Manager: This email is intended only for the person or entity to
> > which it is addressed and may contain information that is privileged,
> > confidential or otherwise protected from disclosure. Dissemination,
> > distribution or copying of this e-mail or the information herein by
> anyone
> > other than the intended recipient, or an employee or agent responsible
> for
> > delivering the message to the intended recipient, is prohibited. If you
> have
> > received this e-mail in error, please immediately notify us by calling
> the
> > Help Desk at (855) 237-8324.
> >
> >
> >
> ------------------------------------------------------------------------------
> > Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> > Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> > Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> > Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
> > _______________________________________________
> > Postgres-xl-general mailing list
> > Postgres-xl-general at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/postgres-xl-general
> >
>
>
> ------------------------------------------------------------------------------
> Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
> Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
> Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
> Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
>
> http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk
> _______________________________________________
> Postgres-xl-general mailing list
> Postgres-xl-general at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/postgres-xl-general
>



-- 
Andrei Martsinchyk

StormDB - http://www.stormdb.com
The Database Cloud
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-general-postgres-xl.org/attachments/20140930/f0039f4c/attachment.htm>


More information about the postgres-xl-general mailing list