[Postgres-xl-developers] Postgres-XL 9.6 merge / resolving execute on any -> all plan changes

Tomas Vondra tomas.vondra at 2ndquadrant.com
Tue May 23 15:07:03 PDT 2017


Hi XL hackers,

You are probably aware that we've pushed Postgres-XL 9.6 code (that is, 
Postgres-XL merging PostgreSQL 9.6 improvements) into the public repo a 
few days ago.

The announcement and some additional details is available here:

     https://blog.2ndquadrant.com/whats-new-in-postgres-xl-9-6/

The current code resolves all merge conflicts and most regression 
failures, and is fairly stable, there's a bunch of remaining issues 
(e.g. regression failures) that need fixing before XL 9.6 can be 
considered "stable".

Let me discuss one of the main causes of regression failures - I do 
believe my analysis and solution is correct, but I'm not entirely sure 
about that, and so would appreciate feedback from other developers.

The failures are caused by plans switching from

     -> Remote Subquery Scan on all (datanode_1)

to

     -> Remote Subquery Scan on any (datanode_1,datanode_2)

After a bit of investigation I've realized the plans change thanks to 
this block in adjust_subplan_distribution:

     if (subd && !equal(subd, pathd))
     {
         ... if "on any" then pick one node and switch to "on all" ...
     }

On XL 9.5 the condition evaluates as true and so the restriction 
switches from "any" to "all" (with just a single node). On XL 9.6 the 
condition is however evaluated as "false" because the two distributions 
are equal, and so the plans keep the "any" restriction.

I'm not entirely sure why XL 9.6 ends up with

     !equal(subd, pathd)

being true, while XL 9.5 evaluates this as false. I do believe it is 
related to the upper-planner pathification (upstream commit 3fc6e2d7f), 
which caused quite a few conflicts with the XL planner code.

An alternative possibility might be that we do set / propagate the 
distributions for paths differently, but so far I haven't found any 
issue in that.

I have however found out that simply removing the equal() part of the 
condition (see the trivial patch attached) fixes pretty much all these 
plan changes, and the regression.diff gets about 30% smaller.

While fixing the various merge conflicts, one of the principles I tried 
to follow as much as possible was not introducing unnecessary behavior 
or plan changes. This fix is consistent with this principle.

The question is whether that's the right / correct / best solution, and 
I'd appreciate your opinions on this.


FWIW I think we'll soon get rid of adjust_subplan_distribution() 
altogether. It used to be necessary because of the uppler-planner 
working with plans and not paths, but that is no longer the case. So 
ultimately most changes in createplan.c should disappear and move to the 
path construction. But that's something we shouldn't do until the code 
is stabilized.

I also wonder why exactly are we picking the node while constructing the 
plan, and not sooner (while constructing the path) or later (during 
execution). I can see arguments for both options, but createplan.c seems 
a bit weird. Can someone explain why it's done here?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-------------- next part --------------
A non-text attachment was scrubbed...
Name: subplan-distribution-fix.patch
Type: text/x-patch
Size: 531 bytes
Desc: not available
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-developers-postgres-xl.org/attachments/20170524/3a48807a/attachment.bin>


More information about the Postgres-xl-developers mailing list