[postgres-xl-general] something wrong with my Postgres-XL

Tomas Vondra tomas.vondra at 2ndquadrant.com
Fri Oct 27 14:37:54 PDT 2017


I think the schema is sufficient for now. I can generate some synthetic
data on my laptop and investigate locally, which is more convenient than
remote access. Thanks for the offer, though!
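The synthetic data mentioned above could be generated along these lines. The schema DDL is not in the thread, so the column types, the ~268 products, and the value ranges below are assumptions inferred from the queries and plans, not the actual table definition:

```python
import csv
import random
from datetime import date, timedelta

def generate_rows(n, start=date(2017, 1, 1), days=181, products=268):
    """Yield (delivery_date, id_product, amount) tuples with assumed ranges.

    days=181 covers 2017-01-01 through 2017-06-30, the range in the queries;
    ~268 distinct products matches the row counts in the aggregate plans.
    """
    for _ in range(n):
        yield (
            (start + timedelta(days=random.randrange(days))).isoformat(),
            random.randrange(1, products + 1),
            round(random.uniform(0.01, 1000.0), 2),  # assumed amount range
        )

def write_copy_file(path, n):
    """Write a CSV usable with: COPY delivery_data FROM ... WITH (FORMAT csv, HEADER)."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["delivery_date", "id_product", "amount"])
        w.writerows(generate_rows(n))

write_copy_file("delivery_data.csv", 100_000)
```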

regards

On 10/27/2017 10:24 PM, Alexander Voytsekhovskyy wrote:
> Do you need the data as well, or just the table structure?
> 
> If you need the data too, I can send you PostgreSQL access in a private message (they are test servers, you can do whatever you want there)
> 
>> Can you share the DDL to create the tables?
>>
>> On 10/27/2017 06:57 PM, Alexander Voytsekhovskyy wrote:
>>> It’s become a bit better, but it is still about 10x slower than plain PostgreSQL
>>>
>>>
>>> Postgres XL:
>>>
>>> ddc_splat=# explain analyze SELECT sum(amount)
>>> ddc_splat-# FROM delivery_data
>>> ddc_splat-# WHERE delivery_data.delivery_date between '2017-01-01' and '2017-06-30' 
>>> ddc_splat-# group by id_product;
>>>                                                                            QUERY PLAN                                                                            
>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> HashAggregate  (cost=5348565.74..5348568.96 rows=258 width=36) (actual time=120302.639..120302.694 rows=268 loops=1)
>>>   Group Key: delivery_data_1.id_product
>>>   ->  Result  (cost=100.00..4872914.45 rows=95130258 width=10) (actual time=5.199..108002.749 rows=95131345 loops=1)
>>>         ->  Append  (cost=100.00..3921611.87 rows=95130258 width=10) (actual time=5.186..86639.201 rows=95131345 loops=1)
>>>               ->  Remote Subquery Scan on all (datanode1,datanode2)  (cost=100.00..100.02 rows=1 width=10) (actual time=3.924..3.924 rows=0 loops=1)
>>>               ->  Remote Subquery Scan on all (datanode2)  (cost=100.00..1715101.85 rows=41565557 width=10) (actual time=1.261..9182.957 rows=41566663 loops=1)
>>>               ->  Remote Subquery Scan on all (datanode1)  (cost=100.00..2206410.00 rows=53564700 width=10) (actual time=2.393..73932.986 rows=53564682 loops=1)
>>> Planning time: 3.807 ms
>>> Execution time: 120320.977 ms
>>> (9 rows)
>>>
>>>
>>> PostgreSQL 10:
>>>
>>> Time: 0.880 ms
>>> ddc_splat=# explain analyze SELECT sum(amount)
>>> ddc_splat-# FROM delivery_data
>>> ddc_splat-# WHERE delivery_data.delivery_date between '2017-01-01' and '2017-06-30' 
>>> ddc_splat-# group by id_product;
>>>                                                                                    QUERY PLAN                                                                                    
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Finalize GroupAggregate  (cost=1639431.29..1639443.79 rows=200 width=36) (actual time=10002.406..10003.308 rows=268 loops=1)
>>>   Group Key: delivery_data.id_product
>>>   ->  Sort  (cost=1639431.29..1639433.79 rows=1000 width=36) (actual time=10002.396..10002.474 rows=1414 loops=1)
>>>         Sort Key: delivery_data.id_product
>>>         Sort Method: quicksort  Memory: 185kB
>>>         ->  Gather  (cost=1639278.96..1639381.46 rows=1000 width=36) (actual time=10001.554..10002.021 rows=1414 loops=1)
>>>               Workers Planned: 5
>>>               Workers Launched: 5
>>>               ->  Partial HashAggregate  (cost=1638278.96..1638281.46 rows=200 width=36) (actual time=9994.173..9994.316 rows=236 loops=6)
>>>                     Group Key: delivery_data.id_product
>>>                     ->  Result  (cost=0.00..1543145.13 rows=19026766 width=10) (actual time=0.048..6148.430 rows=15855637 loops=6)
>>>                           ->  Append  (cost=0.00..1352877.47 rows=19026766 width=10) (actual time=0.047..4866.553 rows=15855637 loops=6)
>>>                                 ->  Parallel Seq Scan on delivery_data  (cost=0.00..0.00 rows=1 width=22) (actual time=0.000..0.000 rows=0 loops=6)
>>>                                       Filter: ((delivery_date >= '2017-01-01'::date) AND (delivery_date <= '2017-06-30'::date))
>>>                                 ->  Parallel Seq Scan on delivery_data_2017_q1  (cost=0.00..592759.76 rows=8313584 width=10) (actual time=0.047..1823.432 rows=6927986 loops=6)
>>>                                       Filter: ((delivery_date >= '2017-01-01'::date) AND (delivery_date <= '2017-06-30'::date))
>>>                                 ->  Parallel Seq Scan on delivery_data_2017_q2  (cost=0.00..760117.71 rows=10713181 width=10) (actual time=0.574..2156.324 rows=8927650 loops=6)
>>>                                       Filter: ((delivery_date >= '2017-01-01'::date) AND (delivery_date <= '2017-06-30'::date))
>>> Planning time: 29.080 ms
>>> Execution time: 10005.189 ms
>>>
>>>
>>> ——
>>>
>>> Why isn’t it grouping the data on both nodes?
>>>
>>>
>>>>
>>>> Please use "reply to all" so that the messages are sent to the mailing
>>>> list and not just directly to me ...
>>>>
>>>> On 10/27/2017 04:37 PM, Alexander Voytsekhovskyy wrote:
>>>>> Hi,
>>>>>
>>>>> thanks for the fast reply
>>>>>
>>>>
>>>> So, did that resolve the issue? What plan/performance do you get after
>>>> the ANALYZE?
>>>>
>>>>> here is the statistics issue, even after ANALYZE, on a simpler example
>>>>>
>>>>
>>>> And what is the "issue"?
>>>>
>>>>>
>>>>> ddc_splat=# show  default_statistics_target ;
>>>>> default_statistics_target 
>>>>> ---------------------------
>>>>> 10000
>>>>> (1 row)
>>>>>
>>>>> Time: 0.249 ms
>>>>>
>>>>> ddc_splat=# ANALYZE delivery_data;
>>>>> ANALYZE
>>>>> Time: 53596.951 ms (00:53.597)
>>>>>
>>>>> ddc_splat=# ANALYZE delivery_data_2017_q1;
>>>>> ANALYZE
>>>>> Time: 16868.040 ms (00:16.868)
>>>>>
>>>>> ddc_splat=# explain analyze SELECT sum(amount)
>>>>> FROM delivery_data_2017_q1
>>>>> WHERE delivery_data_2017_q1.delivery_date between '2017-01-01' and '2017-06-30' 
>>>>> group by id_product;;
>>>>>                                                  QUERY PLAN                                                    
>>>>> -----------------------------------------------------------------------------------------------------------------
>>>>> Remote Fast Query Execution  (cost=0.00..0.00 rows=0 width=0) (actual time=3663.180..3663.203 rows=249 loops=1)
>>>>>  Node/s: datanode2
>>>>> Planning time: 0.065 ms
>>>>> Execution time: 3663.269 ms
>>>>> (4 rows)
>>>>>
>>>>> Time: 3665.608 ms (00:03.666)
>>>>> Time: 0.057 ms
>>>>>
>>>>>
>>>>> any suggestions?
>>>>>
>>>>
>>>> Well, you haven't really explained what the issue is, so I'm not sure
>>>> what the question is. But if you're wondering about the "rows=0"
>>>> estimate, then that's normal for "Remote Fast Query Execution".
>>>>
>>>> In these cases the query is only parsed and analyzed on the coordinator,
>>>> and then shipped directly to the datanode. The planning (including row
>>>> estimation) happens on the datanode, so those numbers are not available
>>>> on the coordinator.
>>>>
>>>> regards
>>>>
>>>> -- 
>>>> Tomas Vondra                  http://www.2ndQuadrant.com
>>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>>
>>
>> -- 
>> Tomas Vondra                  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
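The difference the two plans quoted above show: PostgreSQL 10 runs a Partial HashAggregate in each worker and only merges a few hundred partial groups in the Finalize step, while the Postgres-XL plan ships all ~95M matching rows to the coordinator before aggregating. That two-phase idea can be sketched in plain Python; the node split below is purely illustrative, not Postgres-XL's actual protocol:

```python
from collections import defaultdict

def partial_aggregate(rows):
    """Phase 1 (per datanode/worker): sum amounts per product locally."""
    partial = defaultdict(float)
    for id_product, amount in rows:
        partial[id_product] += amount
    return partial  # at most one entry per distinct id_product

def final_aggregate(partials):
    """Phase 2 (coordinator): merge the small per-node partial sums."""
    total = defaultdict(float)
    for partial in partials:
        for id_product, s in partial.items():
            total[id_product] += s
    return dict(total)

# Two hypothetical "datanodes", each holding a shard of (id_product, amount).
node1 = [(1, 10.0), (2, 5.0), (1, 2.5)]
node2 = [(2, 1.0), (3, 4.0)]

result = final_aggregate([partial_aggregate(node1), partial_aggregate(node2)])
# Only the per-node partial groups cross the "network", not every base row.
```

With partial aggregation pushed down, the data moved between nodes is bounded by the number of distinct groups per node (hundreds here) instead of the number of matching rows (tens of millions).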


