[Postgres-xl-developers] Recovering GTM after failure

Mason Sharp msharp at translattice.com
Mon Jan 26 14:30:59 PST 2015

On Sun, Jan 25, 2015 at 11:01 AM, Adrian Nicoara <anicoara at uwaterloo.ca>
wrote:

> On Sun, Jan 25, 2015 at 9:12 AM, Mason Sharp <msharp at translattice.com>
> wrote:
> >
> >
> > On Thu, Jan 22, 2015 at 12:38 PM, Adrian Nicoara <anicoara at uwaterloo.ca>
> > wrote:
> >>
> >> Hello,
> >>
> >> I tried to find a description of what happens when the GTM fails. So
> >> far, I could only find:
> >>
> >>
> >> http://www.pgcon.org/2012/schedule/attachments/224_Postgres-XC_tutorial.pdf
> >>
> >> slide 95, that describes the requirement of a standby GTM for
> >> failover, with synchronous changes done among the GTM processes.
> >> In the absence of such a process, does the entire cluster have to be
> >> rebooted? I tried to reason out how the running transaction set, and
> >> new GXID is recovered from the data nodes/coordinators, but couldn't
> >> puzzle it out.
> >>
> >
> > You should not have to reboot the cluster. GTM periodically writes to a
> > state file, every 1000 transactions. When it restarts, it reads that file
> > and jumps ahead 1000 transactions, meaning there may be a gap of missing
> > internal transaction ids, but at least transaction ids will not be
> > reused.
>
> What about establishing a valid snapshot for new transactions?
> Without the set of running transactions, a new transaction would
> assume that everything with a smaller ID is committed / aborted? Then,
> couldn't you have a race condition where:
> 1. A transaction started after GTM recovery (ID X+1000) attempts to
> read two items a and b, that are present at two data nodes A and B.
> 2. An old transaction started before GTM failure (ID X) is writing
> data items a and b.
> 3. The read from (ID X+1000) at A determines that the write for a by
> (ID X) isn't committed yet, from the hint bits and committed log.
> 4. The read from (ID X+1000) at B determines that the write for b by
> (ID X) is committed, from the hint bits and committed log.
If it committed at one node but not the other, it must have been a two-node
transaction, and two-phase commit would have been used. The transaction
must have been fully prepared on both nodes for it to have been committed
on at least one. If the transaction committed on one node, we must commit
on all to be consistent. On node B it committed. On node A, the transaction
is in a prepared but not committed state. A read would return the new data
from B but not from A until the prepared transaction is committed. There is
a utility, pgxc_clean, that tries to clean up such cases.
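
The rule above (committed on any node means the prepared copies elsewhere
must be committed too) can be sketched in Python. This is only an
illustration, not how pgxc_clean is implemented: the function name
resolve_2pc and the state labels are hypothetical, and the rollback branch
is the usual two-phase commit convention rather than something stated in
this thread. COMMIT PREPARED and ROLLBACK PREPARED are the standard
PostgreSQL statements for finishing a prepared transaction.

```python
from typing import Dict


def resolve_2pc(states: Dict[str, str]) -> Dict[str, str]:
    """Decide what to run on each node still holding a prepared transaction.

    `states` maps a node name to "prepared", "committed" or "aborted"
    (hypothetical labels chosen for this sketch).
    """
    outcomes = set(states.values())
    if "committed" in outcomes and "aborted" in outcomes:
        # 2PC should never allow this; refuse to guess.
        raise ValueError("nodes disagree on the outcome")
    if "committed" in outcomes:
        # Committed somewhere, so it was prepared everywhere: finish the commit.
        action = "COMMIT PREPARED"
    elif "aborted" in outcomes:
        # Assumption: the usual 2PC convention, not stated in the thread.
        action = "ROLLBACK PREPARED"
    else:
        # Prepared everywhere: node state alone cannot decide the outcome.
        return {}
    return {node: action for node, state in states.items() if state == "prepared"}


# Node B committed, node A is still only prepared.
print(resolve_2pc({"A": "prepared", "B": "committed"}))
```

For the A/B case in the question, the only pending action is to commit the
prepared transaction on node A.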

So, there is a theoretical window: a node goes down just after it has
prepared, but not committed, a 2PC transaction that other nodes have already
committed. If that node is made available to users before the prepared
transaction is committed manually or the pgxc_clean utility is run, one
could get a result like the one you describe.
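
On the restart behaviour quoted earlier in this thread (state written every
1000 transactions, jump ahead 1000 on restart), here is a toy Python model
of why transaction ids are not reused. It is a sketch under assumptions, not
GTM source code: the class ToyGtm and its fields are hypothetical, and the
"state file" is just a value kept in memory.

```python
SAVE_INTERVAL = 1000  # from the thread: state is written every 1000 transactions


class ToyGtm:
    """Toy GXID allocator; `saved_gxid` stands in for the on-disk state file."""

    def __init__(self, saved_gxid: int = 0, after_restart: bool = False):
        # On restart, skip past every id that may have been handed out
        # since the last save.
        self.next_gxid = saved_gxid + SAVE_INTERVAL if after_restart else saved_gxid
        # Record the new starting point right away (assumption of this sketch).
        self.saved_gxid = self.next_gxid

    def allocate(self) -> int:
        gxid = self.next_gxid
        self.next_gxid += 1
        # Persist only once per SAVE_INTERVAL allocations.
        if self.next_gxid - self.saved_gxid >= SAVE_INTERVAL:
            self.saved_gxid = self.next_gxid
        return gxid


gtm = ToyGtm()
issued = [gtm.allocate() for _ in range(2500)]       # crash after 2500 ids
restarted = ToyGtm(gtm.saved_gxid, after_restart=True)
first_new = restarted.allocate()
print(max(issued), gtm.saved_gxid, first_new)
```

At most SAVE_INTERVAL - 1 ids are issued after the last save, so starting at
the saved value plus SAVE_INTERVAL leaves a gap but never repeats an id.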

