[Postgres-xl-developers] join restriction

Tomas Vondra tomas.vondra at 2ndquadrant.com
Wed Jun 21 13:11:49 PDT 2017

Hello senhu,

On 6/21/17 9:36 AM, senhu(胡森) wrote:
> Hello, all:
> When I run some join queries on PG-XL 9.6, I find that if the data
> types of two join args are different, the
> data redistribution is forbidded, and I find this restriction on
> source code. In function ‘set_joinpath_distribution’,

Can you point us to the query triggering this?

> there is piece of code:
>    /*
>     * Check if both parts are of the same data type and choose
>     * distribution type to redistribute.
>     * XXX We may want more sophisticated algorithm to choose
>     * the best condition to redistribute parts along.
>     * For now use simple but reliable approach.
>     */
>     if (leftType != rightType)
>         continue;
> If we can remove this restriction to make join more efficient with 
> different join data types, such as int join bigint.

I guess the reason is that even if the two values end up being equal 
(after casing one of the values to the other type), that may not apply 
to the hash value. And if the hash values are different, the values may 
end up on different data nodes, breaking the whole distribution logic.

It may work for hashint4() and hashint8(), but it probably does not work 
for hashint4() and hashtext(). So the question is if we can identify 
which combinations of hash functions are safe, and which are not.

That should be fairly stable, because hash partitioning relies on that 
too, I believe.

> On the other hand, we can add some other conditions to make the 
> decision. For example, we can use the row counts of both
> Left and right rels of join.
> If both rels have small row counts, data redistributed is not needed, 
> gather the tuples of two rels, and join on one node.
> Otherwise, we try to redistribute the smaller rel as possible.

I think this should be doable by simply constructing two paths with 
different costs and distributions. But I'm not sure it's worth it, as it 
will move all the work on a single node.


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

More information about the Postgres-xl-developers mailing list