[Postgres-xl-developers] GTM_ERRCODE_TOO_OLD_XMIN: reported_xmin going backwards (-1) after restarting the cluster

Tomas Vondra tomas.vondra at 2ndquadrant.com
Sun Nov 5 15:20:26 PST 2017


Hi,

While working on the GTM patches (posted a few minutes ago), I've noticed
a somewhat strange behavior, where the "reported_xmin" values in cluster
monitor (i.e. on the nodes) go backward after restarting the cluster.

With the patches, it triggers this message in GTM log:

  GTM_ERRCODE_TOO_OLD_XMIN - node_name %s, reported_xmin %d, previously
  reported_xmin %d, GTM_GlobalXmin

but I see the same strange "move xmin backwards" on master too, and I
believe the only reason why it does not trigger the same log messages is
that master always adds CONTROL_INTERVAL to the next_gxid read from the
gtm.control file (with the patches this only happens after unclean
shutdown).
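
To make the difference concrete, here is a toy model of the two restore
rules as I describe them above. This is only a sketch, not the actual GTM
code: the function name and the `patched` flag are made up for
illustration, and the 50000 value is the CONTROL_INTERVAL (50k) mentioned
later in this mail.

```python
# Toy model only; restored_next_gxid() and its flags are hypothetical.
CONTROL_INTERVAL = 50000  # assumed value (the "50k" step)


def restored_next_gxid(saved_next_gxid, clean_shutdown, patched):
    """Return the next_gxid used after reading gtm.control."""
    if patched and clean_shutdown:
        # With the patches: no safety margin after a clean shutdown.
        return saved_next_gxid
    # master (always), or patched after an unclean shutdown.
    return saved_next_gxid + CONTROL_INTERVAL


# 52039 is just an illustrative saved value.
print(restored_next_gxid(52039, clean_shutdown=True, patched=False))
print(restored_next_gxid(52039, clean_shutdown=True, patched=True))
```

So after a clean shutdown the two variants restore different next_gxid
values, which would fit my guess about why only the patched build logs
the message.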

This is a sample log, generated by a datanode in a fresh cluster while
running `pgxc_ctl stop all` and `pgxc_ctl start all` multiple times.

    00:07:14.590 LOG:  latestCompletedXid=52038
    00:07:14.590 LOG:  reporting gxid 52039
    00:07:19.595 LOG:  latestCompletedXid=52038
    00:07:19.595 LOG:  reporting gxid 52039
    ... cluster restart ...
    00:07:30.299 LOG:  latestCompletedXid=52037
    00:07:30.299 LOG:  reporting gxid 52038
    00:07:35.304 LOG:  latestCompletedXid=102038
    00:07:35.305 LOG:  reporting gxid 102039
    ... cluster restart ...
    00:08:51.369 LOG:  latestCompletedXid=52037
    00:08:51.369 LOG:  reporting gxid 52038
    00:08:56.375 LOG:  latestCompletedXid=152038
    00:08:56.375 LOG:  reporting gxid 152039
    00:09:01.378 LOG:  latestCompletedXid=152038
    00:09:01.378 LOG:  reporting gxid 152039
    ... cluster restart ...
    00:09:24.273 LOG:  latestCompletedXid=52037
    00:09:24.273 LOG:  reporting gxid 52038
    00:09:29.279 LOG:  latestCompletedXid=202038
    00:09:29.279 LOG:  reporting gxid 202039

What is really interesting here is that latestCompletedXid goes
backwards (from 52038 to 52037). Eventually the xmin only moves forward
thanks to adding CONTROL_INTERVAL (50k) in GTM_RestoreTxnInfo after each
restart.
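
For anyone who wants to check their own logs for the same pattern, a small
script along these lines should do. It is only a sketch: it assumes the
"reporting gxid N" line format from my debug patch (the sample lines are
copied from the log above), and the helper names are made up.

```python
import re

# "reporting gxid" lines copied from the sample log above.
SAMPLE = """\
00:07:14.590 LOG:  reporting gxid 52039
00:07:19.595 LOG:  reporting gxid 52039
00:07:30.299 LOG:  reporting gxid 52038
00:07:35.305 LOG:  reporting gxid 102039
00:08:51.369 LOG:  reporting gxid 52038
00:08:56.375 LOG:  reporting gxid 152039
00:09:01.378 LOG:  reporting gxid 152039
00:09:24.273 LOG:  reporting gxid 52038
00:09:29.279 LOG:  reporting gxid 202039
"""


def reported_gxids(log_text):
    """Extract the reported gxid values in log order."""
    return [int(m.group(1))
            for m in re.finditer(r"reporting gxid (\d+)", log_text)]


def backward_steps(values):
    """Return (previous, current) pairs where the value decreased."""
    return [(prev, cur) for prev, cur in zip(values, values[1:])
            if cur < prev]


for prev, cur in backward_steps(reported_gxids(SAMPLE)):
    print(f"reported gxid went backwards: {prev} -> {cur}")
```

On the sample it flags one backward step per restart, and each time the
first value reported after the restart is 52038.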

But the comment suggests this is only there because of possibly unclean
shutdowns, so I'm puzzled why skipping it for clean shutdowns should
trigger the error messages.

Seems a bit suspicious, I guess ...


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-------------- next part --------------
A non-text attachment was scrubbed...
Name: xl-xmin-backwards-debug.patch
Type: text/x-patch
Size: 916 bytes
Desc: not available
URL: <http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/attachments/20171106/eaacdee0/attachment.bin>

