[Postgres-xl-developers] Bug report:TPCH Q7 network dead lock issue and patch.

jasonysli(李跃森) jasonysli at tencent.com
Tue Aug 15 01:43:51 PDT 2017


Hi all:
When running TPCH Q7 on a 20-node cluster, we hit a network deadlock. The query hung, and after careful investigation we found a loop of blocked network system calls inside the cluster.
The attached picture shows the execution plan tree of the query and the deadlock loop across nodes.
The left side is the whole execution plan of the query, with the cursor names marked on the tree. The right side shows what the waiting loop looks like. The CN sends the query to dn12; dn12 runs cursor 3395_d as a producer on dn1 and dn10; dn1 and dn10 both run cursor 3395_c on dn11 as consumers. This gives the waiting loop dn12->dn1->dn11->dn10->dn12. So tricky!
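To make the loop concrete, here is a minimal sketch (not taken from the patch) that follows the wait-for edges described above and reports the cycle. The function name find_wait_cycle and the one-edge-per-node dictionary are assumptions made for illustration only; a real cluster can have a node waiting on several peers at once.

```python
def find_wait_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle if a node repeats.

    Simplifying assumption: every node waits on at most one other node.
    """
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended, so nobody is waiting in a loop
    # Slice from the first visit of the repeated node and close the loop.
    return path[path.index(node):] + [node]


# Edges copied from the waiting loop in this report.
waits_for = {"dn12": "dn1", "dn1": "dn11", "dn11": "dn10", "dn10": "dn12"}
cycle = find_wait_cycle(waits_for, "dn12")
print(" -> ".join(cycle))
```

Every node in the returned list is blocked on the next one, so none of them can make progress until one of the waits is interrupted.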
We have a proposal for fixing the issue. In FetchTuple, when fetching tuples from remote nodes, we use a timeout rather than waiting indefinitely. If the wait on one connection times out, we simply switch to the next connection to receive data. This breaks the waiting chain, so the deadlock is resolved.
The patch is attached to this mail.
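The idea of the timeout can be sketched as follows. This is only an illustration in Python, not the code from deadlock.patch (which is C inside Postgres-XL); the function name fetch_tuple, the timeout and max_rounds parameters, and the use of socket pairs to stand in for datanode connections are all assumptions.

```python
import select
import socket


def fetch_tuple(conns, timeout=0.05, max_rounds=100):
    """Return (index, data) from the first connection that has data ready.

    Each connection is waited on for at most `timeout` seconds. On timeout
    we move on to the next connection instead of blocking on one forever.
    Returns None if nothing arrives within `max_rounds` passes.
    """
    for _round in range(max_rounds):
        for index, conn in enumerate(conns):
            readable, _, _ = select.select([conn], [], [], timeout)
            if readable:
                return index, conn.recv(4096)
    return None


# Two stand-in "datanode" connections: the first stays silent, the second
# already has a row waiting. A blocking read on the first would hang.
a_local, a_remote = socket.socketpair()
b_local, b_remote = socket.socketpair()
b_remote.sendall(b"row-from-b")
print(fetch_tuple([a_local, b_local]))
```

With a plain blocking receive on the first connection, the caller would never get to the second one; with the timeout it keeps draining whichever peer has data, which is what lets the other nodes in the loop make progress.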

Best regards!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: parse_tree_and_deadlock_analysis.JPG
Type: image/jpeg
Size: 1241409 bytes
Desc: parse_tree_and_deadlock_analysis.JPG
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-developers-postgres-xl.org/attachments/20170815/ffd381a7/attachment-0001.jpeg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: deadlock.patch
Type: application/octet-stream
Size: 15544 bytes
Desc: deadlock.patch
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-developers-postgres-xl.org/attachments/20170815/ffd381a7/attachment-0001.obj>

