[Postgres-xl-bugs] A issue about duplicate pg_class records when truncate table!

Mason Sharp msharp at translattice.com
Wed Dec 3 03:49:28 PST 2014


On Wed, Dec 3, 2014 at 5:14 AM, jasonysli(李跃森) <jasonysli at tencent.com>
wrote:

>  Hi all:
> We have got 4 coord, 16 datanodes, when running truncate table, we got a
> very strange issue, 2 duplicate pg_class record! As follow:
>  gdt_insight_new=# select xmin, xmax, cmax, * from pg_class where
> relname = 'gdt_overview_top_advertiser_wide_hour_p_2014120304';
>     xmin    |    xmax    | cmax |
> relname                       | relnamespace | reltype | reloftype |
> relowner | relam | relfilenode | reltablespace | relpages | reltu
> ples | relallvisible | reltoastrelid | reltoastidxid | relhasindex |
> relisshared | relpersistence | relkind | relnatts | relchecks | relhasoids
> | relhaspkey | relhasrules | relhastriggers |
> relhassubclass | relfrozenxid | relacl | reloptions
>
> ------------+------------+------+----------------------------------------------------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+------
>
> -----+---------------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+
> ----------------+--------------+--------+------------
> 2165981664 |          0 |    0 | gdt_overview_top_advertiser_wide_hour_p_
> 2014120304 |        16421 |  448070 |         0 |    16420 |     0 |
> 452993 |             0 |        0 |
>    0 |             0 |        448071 |             0 | f           |
> f           | p              | r       |       40 |         1 | f
> | f          | f           | f              |
> f              |   2162712515 |        |
> 2077831780 | 2165981664 |    0 | gdt_overview_top_advertiser_wide_hour_p_
> 2014120304 |        16421 |  448070 |         0 |    16420 |     0 |
> 448068 |             0 |        0 |
>    0 |             0 |        448071 |             0 | f           |
> f           | p              | r       |       40 |         1 | f
> | f          | f           | f              |
> f              |   2077831627 |        |
> (2 rows)
>
>  It seems there is something wrong with the UPDATE transaction, I try to
> get status of the transaciton, but every node return NULL, which means it
> did't even arrive the nodes, obviously it is not true! Because I added
> print log when unlink files, and I got the print log on these nodes when
> commit 2PC of '2165981664'!
>
>  This is the print LOG:
>  2014-12-03 06:33:22.103 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,10,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,XX000,"mdunlinkfork unlink file
> ""base/16419/448073""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.103 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,11,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448073_fsm""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.103 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,12,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448073_vm""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.103 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,13,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448073_init""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.104 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,14,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448071""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.104 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,15,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448071_fsm""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.104 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,16,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448071_vm""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.104 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,17,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448071_init""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.105 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,18,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448068""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.105 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,19,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448068_fsm""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.105 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,20,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448068_vm""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
> 2014-12-03 06:33:22.105 CST,"gdt_new","gdt_insight_new",20848,"
> 10.196.129.79:55817",547e3d23.5170,21,"COMMIT PREPARED",2014-12-03
> 06:28:51 CST,38/44134,2165981670,LOG,58P01,"mdunlinkfork unlink file
> ""base/16419/448068_init""",,,,,,"COMMIT PREPARED
> '_$XC$2165981664'",,"mdunlinkfork, md.c:399","pgxc"
>
>  This is the transaction query SQL and result,every node returned the
> same thing:
>  gdt_insight_new=# select pgxc_is_committed('2165981664'::xid);
> pgxc_is_committed
> -------------------
>  (1 row)
>
>  Then I ran the same SQL on pg_class in other nodes, I got single tuple
> on all other nodes:
>  gdt_insight_new=#  select xmin, xmax, cmax, * from pg_class where
> relname = 'gdt_overview_top_advertiser_wide_hour_p_2014120304';
>     xmin    | xmax | cmax |
> relname                       | relnamespace | reltype | reloftype |
> relowner | relam | relfilenode | reltablespace | relpages | reltuples |
> relallvisible | reltoastrelid | reltoastidxid | relhasindex | relisshared
> | relpersistence | relkind | relnatts | relchecks | relhasoids | relhaspkey
> | relhasrules | relhastriggers | relha
> ssubclass | relfrozenxid | relacl | reloptions
>
> ------------+------+------+----------------------------------------------------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+
>
> ---------------+---------------+---------------+-------------+-------------+----------------+---------+----------+-----------+------------+------------+-------------+----------------+------
> ----------+--------------+--------+------------
> 2165981664 |    0 |    0 |
> gdt_overview_top_advertiser_wide_hour_p_2014120304 |        16421 |  930903
> |         0 |    16420 |     0 |      936380 |             0 |       83
> |      3657 |
>              0 |        930904 |             0 | f           | f
> | p              | r       |       40 |         1 | f          | f
> | f           | f              | f
>           |   2162712515 |        |
> (1 row)
>
> Then I gdb postgres and verified the t_infomask of the pg_class tuples,
> this is the info of the two duplicate tuples:
>  $3 = {t_choice = {t_heap = {t_xmin = 2165981664, t_xmax = 0, t_field3 =
> {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = -2128985632, datum_typmod
> = 0, datum_typeid = 0}}, t_ctid = {
>     ip_blkid = {bi_hi = 0, bi_lo = 233}, ip_posid = 33}, t_infomask2 = 27,
> t_infomask = 10505, t_hoff = 32 ' ', t_bits = "\377"}
>  $6 = {t_choice = {t_heap = {t_xmin = 2077831780, t_xmax = 2165981664,
> t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 2077831780,
> datum_typmod = -2128985632,
>       datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 233},
> ip_posid = 33}, t_infomask2 = 32795, t_infomask = 10505, t_hoff = 32 ' ',
> t_bits = "\377"}
>  t_infomask  shows the 2 tuples were both set HEAP_UPDATED |
> HEAP_XMIN_COMMITTED | HEAP_XMAX_INVALID | HEAP_HASNULL | HEAP_HASOID .
>
> The following is from the other nodes of the specifed tuple:
>  $5 = {t_choice = {t_heap = {t_xmin = 2165981664, t_xmax = 0, t_field3 =
> {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = -2128985632, datum_typmod
> = 0, datum_typeid = 0}}, t_ctid = {
>     ip_blkid = {bi_hi = 0, bi_lo = 132}, ip_posid = 57}, t_infomask2 =
> 32795, t_infomask = 10505, t_hoff = 32 ' ', t_bits = "\377"}, t_infomask
> is the same as the duplicate 2 tuples.
>
> Theses nodes should have the same tuple as the former one, maybe
> heap_page_prune remove the older tuple because the
> HeapTupleSatisfiesVacuum told it was dead! But the
> error node tuples were first touched by HeapTupleSatisfiesNow,
> HeapTupleSatisfiesNow  told both tuples were alive and set
> HEAP_XMAX_INVALID hintbit ,it made the older tuple can be read in the other
> situations!
>
> But now the problem is still not solved, how could the transaction status
> be lost on all nodes? Is it possible that the CLOG was wrongly zeroed? Or
> any other possibility can cause  the loss of transaction status?
>
> Regards,
>  ------------------------------
> jasonysli(李跃森)
>
> Just passing along some initial thoughts on this issue, if you investigate
further.

TRUNCATE is not MVCC safe in vanilla PostgreSQL, perhaps there is an issue
with making it work with XL, although I have not seen anything like this
before.

Was this part of a larger transaction? There is ongoing work to address
issues with multiple statement transactions and internal command ids.
Something was committed recently, but there is still an issue.

We refactored XID/snapshot/GTM handling a month or so ago as part of some
work to address an issue where it was falling back to local XIDs at
connection-time which occasionally could affect the catalog. As part of
that bug fix, it appears it regressed other cases.  We are working on
addressing this one.

Thanks,

Mason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.postgres-xl.org/private.cgi/postgres-xl-bugs-postgres-xl.org/attachments/20141203/7e324234/attachment.htm>


More information about the postgres-xl-bugs mailing list