[Postgres-xl-developers] Re: regression bug fix(Internet mail)

Tomas Vondra tomas.vondra at 2ndquadrant.com
Sun Oct 15 13:39:51 PDT 2017


Hi senhu,

On 10/12/2017 03:32 AM, senhu(胡森) wrote:
> Hi,tomas:
>     In function ' split_pathtarget_at_srfs ', we invoke ' add_new_columns_to_pathtarget ' to
> add expressions to pathtarget, but do not set sortgrouprefs of pathtarget. I think it may be better to
> set sortgrouprefs here.
> 

But those two functions (split_pathtarget_at_srfs and
add_new_columns_to_pathtarget) are both from upstream code, right? So
why doesn't PostgreSQL have the same issue? I suspect XL is doing
something different with the targetlists, but what?

For example, the failing query in the tsrf regression test

    SELECT dataa, generate_series(1,1) FROM few GROUP BY 1;

with your patch uses this plan:

                               QUERY PLAN
    ----------------------------------------------------------------
     ProjectSet
       Output: dataa, generate_series(1, 1)
       ->  Finalize HashAggregate
             Output: dataa
             Group Key: few.dataa
             ->  Remote Subquery Scan on all (datanode_1,datanode_2)
                   Output: dataa
                   ->  Partial HashAggregate
                         Output: dataa
                         Group Key: few.dataa
                         ->  Seq Scan on public.few
                               Output: id, dataa, datab
    (12 rows)

That is, it does 2-phase aggregation. So it's likely something about the
extra Remote Subquery node that breaks things. For example I've noticed
make_remotesubplan simply reuses lefttree->targetlist - could that be
the problem?

I'm asking because perhaps we can fix the root cause, instead of doing
some additional post-processing on the plan. If there really is a bug,
this may be just one place where we observed the impact - fixing the
original issue (if there's one) would fix it for good.

BTW setting max_parallel_workers_per_gather=0 disables building partial
paths, and that (somewhat unexpectedly) also disables distributed
aggregation. In that case the query is actually planned just fine
(even without your patch), with this plan:

                               QUERY PLAN
    -----------------------------------------------------------------
     ProjectSet
       Output: dataa, generate_series(1, 1)
       ->  HashAggregate
             Output: dataa
             Group Key: few.dataa
             ->  Remote Subquery Scan on all (datanode_1,datanode_2)
                   Output: dataa
                   ->  Seq Scan on public.few
                         Output: dataa
    (9 rows)

That probably means Remote Subquery discards some important bit of the
subpath targetlist, no?
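
To compare the two plans side by side, something along these lines
should work in a psql session. This is only a sketch: it assumes the
"few" table from the tsrf regression test exists in the current
database, and the EXPLAIN options (VERBOSE, COSTS OFF) are my guess
based on the shape of the output above.

```sql
-- Default settings: partial paths are built, so the planner may pick
-- the 2-phase (Partial/Finalize) aggregate through Remote Subquery Scan.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT dataa, generate_series(1,1) FROM few GROUP BY 1;

-- Disable partial paths for this session only; as described above,
-- this also appears to disable distributed aggregation.
SET max_parallel_workers_per_gather = 0;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT dataa, generate_series(1,1) FROM few GROUP BY 1;

-- Restore the previous setting.
RESET max_parallel_workers_per_gather;
```

Comparing the Output lines below the Remote Subquery Scan node in the
two plans should show where they start to differ.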

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

