[Postgres-xl-general] Failed to close remote subplan

Aaron Jackson ajackson at revionics.com
Tue Sep 23 10:47:43 PDT 2014


We have 6GB shared_buffers and 1GB work_mem per datanode.

I'm going to pull from git and rebuild.

Aaron
________________________________
From: Mason Sharp [msharp at translattice.com]
Sent: Tuesday, September 23, 2014 12:30 PM
To: Aaron Jackson
Cc: postgres-xl-general at lists.sourceforge.net
Subject: Re: [Postgres-xl-general] Failed to close remote subplan



On Tue, Sep 23, 2014 at 11:41 AM, Aaron Jackson <ajackson at revionics.com> wrote:
The table in question is structured something like this:

create table client.sales(
    storeid        int,
    productid      int,
    revenue        numeric
) with( OIDS = false ) distribute by hash(storeid)

So, I decided to try aggregating revenue by store rather than by product, with a limit.  This works just fine, so I believe the failure may be related to performing a limited aggregation grouped on a column other than the distribution column.
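For reference, the store-level variant that succeeds is shaped roughly like this (a sketch of the query, not my verbatim statement):

SELECT s.storeid, sum(s.revenue) AS revenue
FROM client.sales s
GROUP BY s.storeid
LIMIT 1000;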

I am having trouble reproducing this.

test1=# create schema client;
CREATE SCHEMA

test1=# create table client.sales(
    storeid        int,
    productid      int,
    revenue        numeric
) with( OIDS = false ) distribute by hash(storeid);
CREATE TABLE

test1=# insert into client.sales select ceiling(2000 * random()), ceiling(5000 * random()), generate_series(1,100000);
INSERT 0 100000

Then I could successfully run both of these queries:

SELECT s.productid, count(s.revenue) AS revenue FROM client.sales s group by s.productid LIMIT 1000;

SELECT s.productid, sum(s.revenue) AS revenue FROM client.sales s group by s.productid LIMIT 1000;

My explain plans look a little different though.

test1=#          explain  SELECT s.productid, count(s.revenue) AS revenue FROM client.sales s group by s.productid LIMIT 1000;
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Limit  (cost=2042.00..2052.00 rows=1000 width=10)
   ->  HashAggregate  (cost=2042.00..2066.97 rows=2497 width=10)
         ->  Remote Subquery Scan on all (dn1,dn2)  (cost=0.00..1542.00 rows=100000 width=10)
               ->  HashAggregate  (cost=0.00..1542.00 rows=2497 width=10)
                     ->  Seq Scan on sales s  (cost=0.00..1542.00 rows=100000 width=10)
(5 rows)

I am just running two datanodes for this though, not 6 like you.

Does the above work OK for you?

Is there anything in the datanode log files? Like hitting some resource maximum?

Also, I don't think it would impact you, but have you increased shared_buffers and work_mem from the defaults in the datanodes?
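If it helps to double-check, connecting to each datanode directly with psql and running something like the following should show the values actually in effect there:

SHOW shared_buffers;
SHOW work_mem;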

Thanks,

Mason





Aaron
________________________________
From: Aaron Jackson [ajackson at revionics.com]
Sent: Tuesday, September 23, 2014 10:15 AM
To: postgres-xl-general at lists.sourceforge.net
Subject: Re: [Postgres-xl-general] Failed to close remote subplan

I've made a smallish replica table client.sales with a productid and revenue.  The idea is to summarize revenue by product...

EXPLAIN SELECT s.productid, sum(s.revenue) AS revenue FROM client.sales s group by s.productid LIMIT 1000;

 Limit  (cost=13820090.77..13833634.87 rows=1000 width=10)
   ->  GroupAggregate  (cost=13820090.77..14297425.44 rows=35243 width=10)
         ->  Remote Subquery Scan on all (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6)  (cost=13820090.77..13979084.85 rows=63597632 width=10)
               ->  GroupAggregate  (cost=13820090.77..13979084.85 rows=35243 width=10)
                     ->  Sort  (cost=13820090.77..13979084.85 rows=63597632 width=10)
                           Sort Key: productid
                           ->  Seq Scan on sales s  (cost=0.00..1229548.32 rows=63597632 width=10)

Wanted to compare that to a similar query using count() rather than sum(), which works.

EXPLAIN SELECT s.productid, count(s.revenue) AS revenue FROM client.sales s group by s.productid LIMIT 1000;

 Limit  (cost=13820090.77..13833634.87 rows=1000 width=10)
   ->  GroupAggregate  (cost=13820090.77..14297425.44 rows=35243 width=10)
         ->  Remote Subquery Scan on all (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6)  (cost=13820090.77..13979084.85 rows=63597632 width=10)
               ->  GroupAggregate  (cost=13820090.77..13979084.85 rows=35243 width=10)
                     ->  Sort  (cost=13820090.77..13979084.85 rows=63597632 width=10)
                           Sort Key: productid
                           ->  Seq Scan on sales s  (cost=0.00..1229548.32 rows=63597632 width=10)

And to verify that this fails on other aggregates as well, I tried an average.

SELECT s.productid, avg(s.revenue) AS revenue FROM client.sales s group by s.productid LIMIT 10;
ERROR:  Failed to close remote subplan

Note that I also adjusted the limit down to see if the limit size itself was the issue, and then I eliminated the limit entirely.

SELECT s.productid, sum(s.revenue) AS revenue FROM client.sales s group by s.productid;
SELECT s.productid, avg(s.revenue) AS revenue FROM client.sales s group by s.productid;

Both queries without the limit clause succeeded.
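One thing I may try as a workaround (untested, just a guess that it changes the plan shape) is pushing the aggregate into a subquery and applying the limit on the outside:

SELECT productid, revenue
FROM (
    SELECT s.productid, sum(s.revenue) AS revenue
    FROM client.sales s
    GROUP BY s.productid
) t
LIMIT 1000;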

Aaron
________________________________
From: Aaron Jackson [ajackson at revionics.com]
Sent: Monday, September 22, 2014 3:21 PM
To: postgres-xl-general at lists.sourceforge.net
Subject: [Postgres-xl-general] Failed to close remote subplan

I promise to stop throwing out questions... soon!

I have a case where most queries work just fine; however, the following query fails every time with the message "ERROR: Failed to close remote subplan".

SELECT s.productid, sum(s.value) AS revenue FROM ns.summary s group by s.productid LIMIT 1000

If I modify the query ever so slightly, it's fine...

SELECT s.productid, count(s.value) AS revenue FROM ns.summary s group by s.productid LIMIT 1000

The coordinator (DEBUG1) logs "timeout while waiting for response", which corresponds to the post-timeout path in pgxc_node_receive.  Unfortunately, the datanodes don't seem to have much information.  Does anyone have a specific suggestion about where to look to figure out why the remote datanode did not respond?
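I have DEBUG1 on the coordinator; I'm assuming that setting log_min_messages = debug1 in each datanode's postgresql.conf and reloading would be the way to capture more from their side:

-- after adding log_min_messages = debug1 to each datanode's postgresql.conf
SELECT pg_reload_conf();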

Aaron





--
Mason Sharp

TransLattice - http://www.translattice.com
Clustered and Distributed Database Solutions

