[Postgres-xl-developers] PG-XL is in deadlock w/ one backend PID holding two Transactions
Kelvin
kelvin8077 at gmail.com
Wed Jun 13 19:09:03 PDT 2018
Hi,
After setting up an instance of PG-XL 9.5r1.6 which is from official
website, I run sysbench's read/write workload with 16 threads, and found
that the 16 threads hang with waiting for locks. This issue can be
reproduced with both Postgres-XL 9.5r1.6 and PG-XL 10. Could anyone please
let me know if this is a known issue and is there any workaround?
I looked into the lock table and found that one PID 11707 holds two
transaction ids.
[pgxl at sandbox ~]$ /usr/local/pgxl/bin/psql -U ads -d adsdb -p 15432
adsdb=> select * from pg_locks where granted ='f' order by pid;
locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid |
mode | granted | fastpath
---------------+----------+----------+------+-------+-------
-----+---------------+---------+-------+----------+---------
-----------+-------+---------------------+---------+----------
transactionid | | | | | |
176247 | | | | 6/3269 | 11707 |
ShareLock | f | f
tuple | 16388 | 16391 | 1546 | 26 | |
| | | | 8/3288 | 11715 |
ExclusiveLock | f | f
tuple | 16388 | 16391 | 1546 | 26 | |
| | | | 9/3174 | 11719 |
AccessExclusiveLock | f | f
tuple | 16388 | 16391 | 1546 | 26 | |
| | | | 12/990 | 11727 |
ExclusiveLock | f | f
tuple | 16388 | 16391 | 1546 | 26 | |
| | | | 19/229 | 11755 |
ExclusiveLock | f | f
(5 rows)
adsdb=> select * from pg_locks where pid = 11707;
locktype | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction | pid |
mode | granted | fastpath
---------------+----------+----------+------+-------+-------
-----+---------------+---------+-------+----------+---------
-----------+-------+---------------------+---------+----------
relation | 16388 | 16400 | | | |
| | | | 6/3269 | 11707 |
AccessShareLock | t | t
relation | 16388 | 16398 | | | |
| | | | 6/3269 | 11707 |
AccessShareLock | t | t
relation | 16388 | 16391 | | | |
| | | | 6/3269 | 11707 |
AccessShareLock | t | t
relation | 16388 | 16391 | | | |
| | | | 6/3269 | 11707 |
RowExclusiveLock | t | t
virtualxid | | | | | 6/3269 |
| | | | 6/3269 | 11707 |
ExclusiveLock | t | t
transactionid | | | | | |
176245 | | | | 6/3269 | 11707 |
ExclusiveLock | t | f
tuple | 16388 | 16391 | 1546 | 26 | |
| | | | 6/3269 | 11707 |
AccessExclusiveLock | t | f
transactionid | | | | | |
176247 | | | | 6/3269 | 11707 |
ShareLock | f | f
(8 rows)
adsdb=> select * from pg_stat_activity where pid =11707;
datid | datname | pid | usesysid | usename | application_name |
client_addr | client_hostname | client_port | backend_start
| xact_start | query_start
| state_change | waiting | state | backend_xid |
backend_xmin | query
-------+---------+-------+----------+---------+-------------
-----+-------------+-----------------+-------------+--------
-----------------------+------------------------------+-----
----------------------
----+-------------------------------+---------+--------+----
---------+--------------+-------------------------------------
16388 | adsdb | 11707 | 16387 | ads | pgxc |
127.0.0.1 | | 53912 | 2018-06-08 23:57:32.830016+00
| 2018-06-08 23:57:38.50713+00 | 2018-06-08 23:57:38.602988
+00 | 2018-06-08 23:57:38.606531+00 | t | active | 176245 |
176242 | DELETE FROM sbtest1 WHERE (id = $1)
[pgxl at sandbox ~]$ ps -ef | grep pgxl | grep 127.0.0.1 | grep waiting | grep
8402
pgxl 11707 8402 0 Jun08 ? 00:00:00 postgres: ads adsdb
127.0.0.1(53912) DELETE waiting
pgxl 11715 8402 0 Jun08 ? 00:00:00 postgres: ads adsdb
127.0.0.1(53928) UPDATE waiting
pgxl 11719 8402 0 Jun08 ? 00:00:00 postgres: ads adsdb
127.0.0.1(53936) DELETE waiting
pgxl 11727 8402 0 Jun08 ? 00:00:00 postgres: ads adsdb
127.0.0.1(53952) UPDATE waiting
pgxl 11755 8402 0 Jun08 ? 00:00:00 postgres: ads adsdb
127.0.0.1(54008) UPDATE waiting
adsdb=> select count(1) from pg_stat_activity;
count
-------
17
(1 row)
adsdb=> show max_connections;
max_connections
-----------------
100
(1 row)
dsdb=> select version();
version
------------------------------------------------------------
--------------------------------------------------------------------
PostgreSQL 9.5.8 (Postgres-XL 9.5r1.6) on x86_64-pc-linux-gnu, compiled by
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)
[root at sandbox postgres-xl-9.5r1.6]# gdb -q --batch --ex "bt" -p 11707
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f9295685348 in poll () from /lib64/libc.so.6
#0 0x00007f9295685348 in poll () from /lib64/libc.so.6
#1 0x00000000007c5baa in WaitLatchOrSocket (latch=0x7f928b3887dc,
wakeEvents=1, sock=-1, timeout=0) at pg_latch.c:333
#2 0x00000000007c59b6 in WaitLatch (latch=0x7f928b3887dc, wakeEvents=1,
timeout=0) at pg_latch.c:197
#3 0x0000000000848b0c in ProcSleep (locallock=0x1236f60,
lockMethodTable=0xbb9f20 <default_lockmethod>) at proc.c:1143
#4 0x0000000000842a82 in WaitOnLock (locallock=0x1236f60, owner=0x1334778)
at lock.c:1742
#5 0x000000000084176b in LockAcquireExtendedXC (locktag=0x7ffdb79a2ff0,
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
'\001', only_increment=0 '\000') at lock.c:1031
#6 0x0000000000840d47 in LockAcquireExtended (locktag=0x7ffdb79a2ff0,
lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
'\001') at lock.c:718
#7 0x0000000000840cb0 in LockAcquire (locktag=0x7ffdb79a2ff0, lockmode=5,
sessionLock=0 '\000', dontWait=0 '\000') at lock.c:677
#8 0x000000000083fbf9 in XactLockTableWait (xid=176247,
rel=0x7f92963d5260, ctid=0x7ffdb79a3094, oper=XLTW_Delete) at lmgr.c:566
#9 0x00000000004c93b1 in heap_delete (relation=0x7f92963d5260,
tid=0x7ffdb79a3240, cid=2, crosscheck=0x0, wait=1 '\001',
hufd=0x7ffdb79a3180) at heapam.c:2929
#10 0x00000000006b5c78 in ExecDelete (tupleid=0x7ffdb79a3240, oldtuple=0x0,
planSlot=0x134a158, epqstate=0x1349358, estate=0x1348ff0, canSetTag=1
'\001') at nodeModifyTable.c:627
,,,
#25 0x0000000000709e3e in main (argc=8, argv=0x120f370) at main.c:228
[root at sandbox postgres-xl-9.5r1.6]# gdb -q --batch --ex "bt" -p 11715
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f9295685348 in poll () from /lib64/libc.so.6
#0 0x00007f9295685348 in poll () from /lib64/libc.so.6
#1 0x00000000007c5baa in WaitLatchOrSocket (latch=0x7f928b3890ac,
wakeEvents=1, sock=-1, timeout=0) at pg_latch.c:333
#2 0x00000000007c59b6 in WaitLatch (latch=0x7f928b3890ac, wakeEvents=1,
timeout=0) at pg_latch.c:197
#3 0x0000000000848b0c in ProcSleep (locallock=0x1236de0,
lockMethodTable=0xbb9f20 <default_lockmethod>) at proc.c:1143
#4 0x0000000000842a82 in WaitOnLock (locallock=0x1236de0, owner=0x12c6558)
at lock.c:1742
#5 0x000000000084176b in LockAcquireExtendedXC (locktag=0x7ffdb79a2f60,
lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
'\001', only_increment=0 '\000') at lock.c:1031
#6 0x0000000000840d47 in LockAcquireExtended (locktag=0x7ffdb79a2f60,
lockmode=7, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1
'\001') at lock.c:718
#7 0x0000000000840cb0 in LockAcquire (locktag=0x7ffdb79a2f60, lockmode=7,
sessionLock=0 '\000', dontWait=0 '\000') at lock.c:677
#8 0x000000000083f7f5 in LockTuple (relation=0x7f92963d5260,
tid=0x7ffdb79a3064, lockmode=7) at lmgr.c:440
#9 0x00000000004cd984 in heap_acquire_tuplock (relation=0x7f92963d5260,
tid=0x7ffdb79a3064, mode=LockTupleNoKeyExclusive,
wait_policy=LockWaitBlock, have_tuple_lock=0x7ffdb79a3053 "") at
heapam.c:4978
#10 0x00000000004ca51b in heap_update (relation=0x7f92963d5260,
otid=0x7ffdb79a3240, newtup=0x1348908, cid=0, crosscheck=0x0, wait=1
'\001', hufd=0x7ffdb79a3190, lockmode=0x7ffdb79a318c) at heapam.c:3547
#11 0x00000000006b6307 in ExecUpdate (tupleid=0x7ffdb79a3240, oldtuple=0x0,
slot=0x1348168, planSlot=0x1347240, epqstate=0x1339528, estate=0x13391c0,
canSetTag=1 '\001') at nodeModifyTable.c:923
,,,
#26 0x0000000000709e3e in main (argc=8, argv=0x120f370) at main.c:228
The reproduction steps is pretty simple.
step 1: download the Postgres-XL 9.5r1.6 release and compile it with the
command "./configure --prefix=/usr/local/pgxl --without-readline
--enable-debug --enable-cassert CFLAGS='-O0 -g'", "make&make install"
step 2: Initialize the database, and deploy 1 coordinator and 2 data nodes
in one box.
su - pgxl
mkdir /usr/local/pgxl/data_coord1
mkdir /usr/local/pgxl/data_datanode_1
mkdir /usr/local/pgxl/data_datanode_2
mkdir /usr/local/pgxl/data_gtm
chown pgxl /usr/local/pgxl/data_coord1
chown pgxl /usr/local/pgxl/data_datanode_1
chown pgxl /usr/local/pgxl/data_datanode_2
chown pgxl /usr/local/pgxl/data_gtm
/usr/local/pgxl/bin/initdb -D /usr/local/pgxl/data_coord1 \
--nodename coord1
/usr/local/pgxl/bin/initdb -D /usr/local/pgxl/data_datanode_1 \
--nodename datanode_1
/usr/local/pgxl/bin/initdb -D /usr/local/pgxl/data_datanode_2 \
--nodename datanode_2
/usr/local/pgxl/bin/initgtm -D /usr/local/pgxl/data_gtm -Z gtm
/usr/local/pgxl/bin/gtm -D /usr/local/pgxl/data_gtm >logfile 2>&1 &
/usr/local/pgxl/bin/postgres --datanode -p 15432 -c pooler_port=40101 \
-D /usr/local/pgxl/data_datanode_1 >logfile 2>&1 &
/usr/local/pgxl/bin/postgres --datanode -p 15433 -c pooler_port=40102 \
-D /usr/local/pgxl/data_datanode_2 >logfile 2>&1 &
/usr/local/pgxl/bin/postgres --coordinator -c pooler_port=40100 \
-D /usr/local/pgxl/data_coord1 >logfile 2>&1 &
/usr/local/pgxl/bin/psql -c "ALTER NODE coord1 \
WITH (TYPE = 'coordinator', PORT = 5432)" postgres
/usr/local/pgxl/bin/psql -c "CREATE NODE datanode_1 \
WITH (TYPE = 'datanode', PORT = 15432)" postgres
/usr/local/pgxl/bin/psql -c "CREATE NODE datanode_2 \
WITH (TYPE = 'datanode', PORT = 15433)" postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_1) \
'ALTER NODE datanode_1 WITH (TYPE = ''datanode'', PORT = 15432)'" postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_1) \
'CREATE NODE datanode_2 WITH (TYPE = ''datanode'', PORT = 15433)'"
postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_1) \
'CREATE NODE coord1 WITH (TYPE = ''coordinator'', PORT = 5432)'" postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_2) \
'ALTER NODE datanode_2 WITH (TYPE = ''datanode'', PORT = 15433)'" postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_2) \
'CREATE NODE datanode_1 WITH (TYPE = ''datanode'', PORT = 15432)'"
postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_2) \
'CREATE NODE coord1 WITH (TYPE = ''coordinator'', PORT = 5432)'" postgres
/usr/local/pgxl/bin/psql -c "SELECT pgxc_pool_reload()" postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_1) \
'SELECT pgxc_pool_reload()'" postgres
/usr/local/pgxl/bin/psql -c "EXECUTE DIRECT ON (datanode_2) \
'SELECT pgxc_pool_reload()'" postgres
step 3: issue the following sysbench commands.
sysbench oltp_read_write --db-driver=pgsql --pgsql-host=127.0.0.1
--pgsql-port=5432 --pgsql-user=ads --pgsql-password=ads --pgsql-db=adsdb
--tables=10 --table-size=100000 prepare
sysbench oltp_read_write --db-driver=pgsql --pgsql-host=127.0.0.1
--pgsql-port=5432 --pgsql-user=ads --pgsql-password=ads --pgsql-db=adsdb
--threads=16 --time=9 run
Thanks,
-Yuanfei
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/attachments/20180614/88ae2bd2/attachment-0001.html>
More information about the Postgres-xl-developers
mailing list