[postgres-xl-bugs] Fwd: Database Rollback After GTM OOM + restart wipes data puts cluster in inconsistent state

Matthew Tamayo-Rios matthew.t.rios at gmail.com
Mon Dec 18 17:39:59 PST 2017


Forwarding to bug list.

---------- Forwarded message ----------
From: Matthew Tamayo-Rios <matthew.t.rios at gmail.com>
Date: Mon, Dec 18, 2017 at 4:57 PM
Subject: Database Rollback After GTM OOM + restart wipes data puts cluster
in inconsistent state
To: postgres-xl-general at lists.postgres-xl.org


We've been struggling with the GTM OOMing, and on today's restart the
database suddenly rolled back about a month. fsync is on and has never
been touched. As far as we can tell, things were working fine as of
Monday 5 PM ET, and the GTM died overnight.

Apart from the GTM going down, the only errors we could find in the logs
are a cache lookup failure and "Snapshot too old":

ERROR:  cache lookup failed for node 16384


ERROR:  Snapshot too old - RecentGlobalXmin (16740128) has already advanced past the snapshot xmin (10000)
ERROR:  Snapshot too old - RecentGlobalXmin (16740128) has already advanced past the snapshot xmin (10000)



postgres=# \dt
               List of relations
 Schema |      Name      | Type  |    Owner
--------+----------------+-------+-------------
 public | data           | table | postgres
 public | edges          | table | postgres
 public | property_types | table | postgres
(3 rows)
postgres=# \dt+
ERROR:  relation "public.property_types" does not exist


My best guess right now is that transaction IDs somehow rolled over and
that, when the GTM restarted, it messed with committed transactions. Given
that we have restarted the database many times since the point it rolled
back to, this seems unlikely.
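
For reference, here is a minimal sketch of the modular comparison that stock
PostgreSQL applies to 32-bit transaction IDs. It assumes Postgres-XL keeps
the same rule, which I have not verified. The helper name xid_precedes is
hypothetical and is not a Postgres-XL API. The two values at the bottom are
the ones from the error message above.

```python
# Sketch only: mirrors the signed 32-bit difference rule used by stock
# PostgreSQL for normal XIDs. Assumes Postgres-XL behaves the same way.
FIRST_NORMAL_XID = 3  # XIDs 0-2 are reserved in stock PostgreSQL


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b (modulo 2**32)."""
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        # Reserved XIDs are compared as plain unsigned integers.
        return a < b
    diff = (a - b) & 0xFFFFFFFF
    # Equivalent to (int32)(a - b) < 0 in C.
    return diff >= 0x80000000


snapshot_xmin = 10000            # from the "Snapshot too old" error
recent_global_xmin = 16740128    # from the "Snapshot too old" error

print(xid_precedes(snapshot_xmin, recent_global_xmin))
# A counter close to 2**32 counts as older than a small XID, which is
# what a rollover would look like under this rule.
print(xid_precedes(4294967000, 10000))
```

Under this rule the two values in the error are only about 16.7 million
apart, far less than 2**31, so the snapshot xmin counts as older without any
wraparound being involved. That says nothing about why the xmin was 10000 in
the first place.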

It would be good to understand how this happened so we can avoid it in the
future. At this point I've given up on repair and will have to wipe
everything and restore from backup, but it's very scary if the database can
just roll itself back to an inconsistent state :(

Regards,
Matthew
