[Postgres-xl-developers] Why the first 2 backend processes cause another 2 backend processes to handle the insert statement?

Rui Hai Jiang ruihaijiang at msn.com
Mon Sep 14 23:41:30 PDT 2015


Hello,

After days of investigation, I found that the reason is caused by the REMOTESUBPLAN in RemoteStatement #1. Coordinator composes RemoteStatement #1 and sends it to both datanode1[backend1] and datanode2[backend2]. The Result sub-plan is included in a REMOTESUBPLAN.

When datanode1[ backend1 ] executes REMOTESUBPLAN, it sends RemoteStatement #2 to  datanode1[ backend3 ], and then datanode1[ backend3 ] sends back the result.   Similarly, when datanode2[ backend2 ] executes REMOTESUBPLAN, it sends RemoteStatement #2 to  datanode1[ backend4 ], and then datanode1[ backend4 ], sends back the result.

RemoteStatement #1:
REMOTESTMT {
        planTree {
                plans=REMOTESUBPLAN {
                        lefttree=RESULT 
                }
        }                         
}

coordinator  ---( RemoteStatement #1)---->  datanode1[ backend1 ] --- (RemoteStatement #2)--> datanode1[ backend3 ]
   |                                                                                             ^                                                                                           |
   |                                                                                              | -----------<-- (Result) --------------------------<-------------+
   V
   |
   +-----------( RemoteStatement #1)---->  datanode2[ backend2 ] ----( RemoteStatement #2)--> datanode1[ backend4 ]
                                                                                              ^                                                                                               |
                                                                                               |----------------------<------(Result)--<---------------------------+


I think both datanode1[ backend1 ] and datanode2[ backend2 ] can get the result locally and there is no need to get the result by the REMOTESUBPLAN.

Do you think it’s necessary for the Coordinator to include the Result sub-plan in a REMOTESUBPLAN? 

Thanks,
Rui Hai




From: Rui Hai Jiang
Sent: 2015年8月31日 11:20
To: postgres-xl-developers at lists.sourceforge.net
Subject: [Postgres-xl-developers] Why the first 2 backend processes cause another 2 backend processes to handle the insert statement?


Hello,

I set up a Postgres-XL cluster and did some testing. I have one Coordinator and 2 Datanodes as following,

     node_name | node_type | node_port |    node_host    | nodeis_primary | nodeis_preferred |  node_id   
    -----------+-----------+-----------+-----------------+----------------+------------------+------------
     coord1    | C         |      5432 | 192.168.187.131 | f              | f                | 1885696643
     datanode1 | D         |     15432 | 192.168.187.132 | t              | f                |  888802358
     datanode2 | D         |     15433 | 192.168.187.133 | f              | f                | -905831925
    


I created the table,
        create table contact( id int, name text, phone varchar(30)) DISTRIBUTE BY REPLICATION;


and then run the insert,
        insert into contact values ( 1,'tom','1212121');

I found on datanode1, there are 3 backend processes handling the statement, and on datanode2 there is one backend process. Totally, 4 backend processes are involved.
    backend processes on datanode1:
    backend#1:postgres 18666 13460  0 11:02 ?        00:00:00 postgres: postgres test 192.168.187.131(44100) idle
    backend#3:postgres 18667 13460  0 11:02 ?        00:00:00 postgres: postgres test 192.168.187.132(36483) idle
    backend#4:postgres 18668 13460  0 11:02 ?        00:00:00 postgres: postgres test 192.168.187.133(50279) idle

    backend processes on datanode2:
    backend#2:postgres 15700 10475  0 11:02 ?        00:00:00 postgres: postgres test 192.168.187.131(47195) idle


I did some debuging and found out what happend. 
    
    When coord1 receives the insert query, it sends RemoteStatement #1 to both datanode1 and datanode2. 
    On datanode1, the backend process #1 is forked to handle RemoteStatement #1. 
    On datanode2, backend #2 is forked to handle RemoteStatement #1. 
    Next, both  backend #1 and backend #2 send RemoteStatement #2 to datanode1 so that two other processes, backend #3 and backend #4 are forked to handle the RemoteStatements.
    So totally, 4 backend processes are involved.
    


My questions are,
    1-Why is RemoteStatement #2 used here?  Why backend #3 and backend #4 are forked and what do they do?
    2-Are there any design document on message exchanging between all kinds of nodes?
                    
                   
RemoteStatement #1:
{REMOTESTMT :commandType 3 :hasReturning false :planTree {MODIFYTABLE :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0 :targetlist <> :qual <> :lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b) :operation 3 :canSetTag true :resultRelations (i 1) :resultRelIndex 0 :plans ({REMOTESUBPLAN :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0 :targetlist ({TARGETENTRY :expr {CONST :consttype pg_catalog int4 :consttypmod -1 :constcollid <> <> -1 :constlen 4 :constbyval true :constisnull false :location 29 :constvalue \1} :resno 1 :resname id :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false} {TARGETENTRY :expr {CONST :consttype pg_catalog text :consttypmod -1 :constcollid pg_catalog default -1 :constlen -1 :constbyval false :constisnull false :location 31 :constvalue tom} :resno 2 :resname name :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false} {TARGETENTRY :expr {CONST :consttype pg_catalog varchar :consttypmod 34 :constcollid pg_catalog default -1 :constlen -1 :constbyval false :constisnull false :location -1 :constvalue \1212121} :resno 3 :resname phone :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false}) :qual <> :lefttree {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0 :targetlist ({TARGETENTRY :expr {CONST :consttype pg_catalog int4 :consttypmod -1 :constcollid <> <> -1 :constlen 4 :constbyval true :constisnull false :location 29 :constvalue \1} :resno 1 :resname id :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false} {TARGETENTRY :expr {CONST :consttype pg_catalog text :consttypmod -1 :constcollid pg_catalog default -1 :constlen -1 :constbyval false :constisnull false :location 31 :constvalue tom} :resno 2 :resname name :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false} {TARGETENTRY :expr {CONST :consttype pg_catalog varchar :consttypmod 34 :constcollid pg_catalog default -1 :constlen -1 :constbyval false :constisnull false :location -1 :constvalue \1212121} :resno 3 :resname phone :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false}) :qual <> :lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b) :resconstantqual <>} :righttree <> :initPlan <> :extParam (b) :allParam (b) :scanrelid 0 :distributionType R :distributionKey 0 :distributionNodes (i 0 1) :distributionRestrict (i 0 1) :nodeList (i 0) :execOnAll true :sort <> :cursor p_1_1619_1 :unique 0}) :returningLists <> :rowMarks <> :epqParam 0} :rtable ({RTE :alias <> :eref {ALIAS :aliasname contact :colnames ("id" "name" "phone")} :rtekind 0 :relid public contact :relkind r :inh false :inFromCl false :requiredPerms 1 :checkAsUser 0 :selectedCols (b) :modifiedCols (b 10 11 12)}) :resultRelations (i 1) :subplans <> :nParamExec 1 :nParamRemote 0 :rowMarks <> :distributionType O :distributionKey 0 :distributionNodes <> :distributionRestrict <>}                   


RemoteStatement #2:
{REMOTESTMT :commandType 1 :hasReturning false :planTree {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0 :targetlist ({TARGETENTRY :expr {CONST :consttype pg_catalog int4 :consttypmod -1 :constcollid <> <> -1 :constlen 4 :constbyval true :constisnull false :location -1 :constvalue \1} :resno 1 :resname id :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false} {TARGETENTRY :expr {CONST :consttype pg_catalog text :consttypmod -1 :constcollid pg_catalog default -1 :constlen -1 :constbyval false :constisnull false :location -1 :constvalue tom} :resno 2 :resname name :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false} {TARGETENTRY :expr {CONST :consttype pg_catalog varchar :consttypmod 34 :constcollid pg_catalog default -1 :constlen -1 :constbyval false :constisnull false :location -1 :constvalue \1212121} :resno 3 :resname phone :ressortgroupref 0 :resorigtbl <> <> :resorigcol 0 :resjunk false}) :qual <> :lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b) :resconstantqual <>} :rtable ({RTE :alias <> :eref {ALIAS :aliasname contact :colnames ("id" "name" "phone")} :rtekind 0 :relid public contact :relkind r :inh false :inFromCl false :requiredPerms 0 :checkAsUser 0 :selectedCols (b) :modifiedCols (b 10 11 12)}) :resultRelations <> :subplans <> :nParamExec 1 :nParamRemote 0 :rowMarks <> :distributionType R :distributionKey 0 :distributionNodes (i 0 1) :distributionRestrict (i 0 1)}


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-developers-postgres-xl.org/attachments/20150915/0eeabe76/attachment.htm>


More information about the Postgres-xl-developers mailing list