[postgres-xl-general] something wrong with my Postgres-XL

Alexander Voytsekhovskyy av at mobile-ua.com
Fri Oct 27 13:24:21 PDT 2017


Do you need the data as well, or just the table structure?

If you need the data as well, I can send you PostgreSQL access in a private message (it’s a test server, so you can do whatever you want there).
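If only the structure is needed, a schema-only dump keeps the data out of it. A sketch (the flags are standard pg_dump options; the table names are taken from the plans below, and connection options such as -h/-U will need adjusting):

```shell
# Dump only the DDL (no rows) for the tables discussed in this thread.
pg_dump --schema-only \
        --table=delivery_data \
        --table=delivery_data_2017_q1 \
        --table=delivery_data_2017_q2 \
        ddc_splat > delivery_data_ddl.sql
```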

> Can you share the DDL to create the tables?
> 
> On 10/27/2017 06:57 PM, Alexander Voytsekhovskyy wrote:
>> It’s become a bit better, but it’s still about 10x slower than stock PostgreSQL
>> 
>> 
>> Postgres XL:
>> 
>> ddc_splat=# explain analyze SELECT sum(amount)
>> ddc_splat-# FROM delivery_data
>> ddc_splat-# WHERE delivery_data.delivery_date between '2017-01-01' and '2017-06-30' 
>> ddc_splat-# group by id_product;
>>                                                                            QUERY PLAN                                                                            
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> HashAggregate  (cost=5348565.74..5348568.96 rows=258 width=36) (actual time=120302.639..120302.694 rows=268 loops=1)
>>   Group Key: delivery_data_1.id_product
>>   ->  Result  (cost=100.00..4872914.45 rows=95130258 width=10) (actual time=5.199..108002.749 rows=95131345 loops=1)
>>         ->  Append  (cost=100.00..3921611.87 rows=95130258 width=10) (actual time=5.186..86639.201 rows=95131345 loops=1)
>>               ->  Remote Subquery Scan on all (datanode1,datanode2)  (cost=100.00..100.02 rows=1 width=10) (actual time=3.924..3.924 rows=0 loops=1)
>>               ->  Remote Subquery Scan on all (datanode2)  (cost=100.00..1715101.85 rows=41565557 width=10) (actual time=1.261..9182.957 rows=41566663 loops=1)
>>               ->  Remote Subquery Scan on all (datanode1)  (cost=100.00..2206410.00 rows=53564700 width=10) (actual time=2.393..73932.986 rows=53564682 loops=1)
>> Planning time: 3.807 ms
>> Execution time: 120320.977 ms
>> (9 rows)
>> 
>> 
>> PostgreSQL 10:
>> 
>> Time: 0.880 ms
>> ddc_splat=# explain analyze SELECT sum(amount)
>> ddc_splat-# FROM delivery_data
>> ddc_splat-# WHERE delivery_data.delivery_date between '2017-01-01' and '2017-06-30' 
>> ddc_splat-# group by id_product;
>>                                                                                    QUERY PLAN                                                                                    
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Finalize GroupAggregate  (cost=1639431.29..1639443.79 rows=200 width=36) (actual time=10002.406..10003.308 rows=268 loops=1)
>>   Group Key: delivery_data.id_product
>>   ->  Sort  (cost=1639431.29..1639433.79 rows=1000 width=36) (actual time=10002.396..10002.474 rows=1414 loops=1)
>>         Sort Key: delivery_data.id_product
>>         Sort Method: quicksort  Memory: 185kB
>>         ->  Gather  (cost=1639278.96..1639381.46 rows=1000 width=36) (actual time=10001.554..10002.021 rows=1414 loops=1)
>>               Workers Planned: 5
>>               Workers Launched: 5
>>               ->  Partial HashAggregate  (cost=1638278.96..1638281.46 rows=200 width=36) (actual time=9994.173..9994.316 rows=236 loops=6)
>>                     Group Key: delivery_data.id_product
>>                     ->  Result  (cost=0.00..1543145.13 rows=19026766 width=10) (actual time=0.048..6148.430 rows=15855637 loops=6)
>>                           ->  Append  (cost=0.00..1352877.47 rows=19026766 width=10) (actual time=0.047..4866.553 rows=15855637 loops=6)
>>                                 ->  Parallel Seq Scan on delivery_data  (cost=0.00..0.00 rows=1 width=22) (actual time=0.000..0.000 rows=0 loops=6)
>>                                       Filter: ((delivery_date >= '2017-01-01'::date) AND (delivery_date <= '2017-06-30'::date))
>>                                 ->  Parallel Seq Scan on delivery_data_2017_q1  (cost=0.00..592759.76 rows=8313584 width=10) (actual time=0.047..1823.432 rows=6927986 loops=6)
>>                                       Filter: ((delivery_date >= '2017-01-01'::date) AND (delivery_date <= '2017-06-30'::date))
>>                                 ->  Parallel Seq Scan on delivery_data_2017_q2  (cost=0.00..760117.71 rows=10713181 width=10) (actual time=0.574..2156.324 rows=8927650 loops=6)
>>                                       Filter: ((delivery_date >= '2017-01-01'::date) AND (delivery_date <= '2017-06-30'::date))
>> Planning time: 29.080 ms
>> Execution time: 10005.189 ms
>> 
>> 
>> ——
>> 
>> Why isn’t it grouping the data on both nodes?
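As a quick check of the gap between the two EXPLAIN ANALYZE runs above (using the execution times quoted in the plans), the Postgres-XL run is roughly 12x slower, not just 10x:

```python
# Execution times taken from the two EXPLAIN ANALYZE outputs above (ms).
xl_ms = 120320.977   # Postgres-XL
pg10_ms = 10005.189  # PostgreSQL 10

# Ratio of the two runtimes on this query.
ratio = xl_ms / pg10_ms
print(f"{ratio:.1f}x")  # -> 12.0x
```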
>> 
>> 
>>> 
>>> Please use "reply to all" so that the messages are sent to the mailing
>>> list and not just directly to me ...
>>> 
>>> On 10/27/2017 04:37 PM, Alexander Voytsekhovskyy wrote:
>>>> Hi,
>>>> 
>>>> thanks for the fast reply
>>>> 
>>> 
>>> So, did that resolve the issue? What plan/performance do you get after
>>> the ANALYZE?
>>> 
>>>> Here is the statistics issue, even after ANALYZE, on a simpler example:
>>>> 
>>> 
>>> And what is the "issue"?
>>> 
>>>> 
>>>> ddc_splat=# show  default_statistics_target ;
>>>> default_statistics_target 
>>>> ---------------------------
>>>> 10000
>>>> (1 row)
>>>> 
>>>> Time: 0.249 ms
>>>> 
>>>> ddc_splat=# ANALYZE delivery_data;
>>>> ANALYZE
>>>> Time: 53596.951 ms (00:53.597)
>>>> 
>>>> ddc_splat=# ANALYZE delivery_data_2017_q1;
>>>> ANALYZE
>>>> Time: 16868.040 ms (00:16.868)
>>>> 
>>>> ddc_splat=# explain analyze SELECT sum(amount)
>>>> FROM delivery_data_2017_q1
>>>> WHERE delivery_data_2017_q1.delivery_date between '2017-01-01' and '2017-06-30' 
>>>> group by id_product;;
>>>>                                                  QUERY PLAN                                                    
>>>> -----------------------------------------------------------------------------------------------------------------
>>>> Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3663.180..3663.203 rows=249 loops=1)
>>>>  Node/s: datanode2
>>>> Planning time: 0.065 ms
>>>> Execution time: 3663.269 ms
>>>> (4 rows)
>>>> 
>>>> Time: 3665.608 ms (00:03.666)
>>>> Time: 0.057 ms
>>>> 
>>>> 
>>>> any suggestions?
>>>> 
>>> 
>>> Well, you haven't really explained what the issue is, so I'm not sure
>>> what the question is. But if you're wondering about the "rows=0"
>>> estimate, that's normal for "Remote Fast Query Execution".
>>> 
>>> In these cases the query is only parsed and analyzed on the
>>> coordinator, then shipped directly to the datanode. Planning (including
>>> estimation) happens on the datanode, so those numbers are not
>>> available on the coordinator.
>>> 
>>> regards
>>> 
>>> -- 
>>> Tomas Vondra                  http://www.2ndQuadrant.com
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> 
> 
> -- 
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



