[Postgres-xl-developers] WIP: making the GTM more reliable

Tomas Vondra tomas.vondra at 2ndquadrant.com
Sun Nov 5 15:28:34 PST 2017


Hi,

While investigating some issues on XL 10, I've noticed the GTM could be
made a bit more reliable, for example by adding fsync calls to a few
places (at this point there are pretty much none, likely due to the
assumption that we can fail-over to GTM standby).

So attached are three WIP patches, each making a separate improvement.
The patches are not that large and include a more detailed description
in the header, but let me briefly explain what's going on.


1) generating register.node in a reliable way (fsync + rename)

We've been generating the "node registry" (the list of nodes registered
in GTM), but throwing it away on GTM restart. That's harmless when the
whole cluster restarts (pgxc_ctl stop all + start all), because all the
nodes will re-register.

But when only the GTM restarts (e.g. due to a crash or to apply a config
change), we end up with nodes that are connected to the GTM but not
registered, so they are ignored when computing oldest_xmin, trigger
GTM_ERRCODE_NODE_NOT_REGISTERED log messages, and so on.

So this commit enables the node registry again (and actually reads it on
GTM start), and writes it in a reliable way - by generating a new file
which is then renamed, followed by fsync on both the file and directory.
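
The write-then-rename sequence described above can be sketched roughly as
follows. This is only an illustration of the general POSIX pattern, not the
code from the patch: durable_write_file and fsync_path are hypothetical
names, the ".tmp" suffix is an assumption, and the extra fsync of the
temporary file before the rename is my addition (it is commonly done so the
rename never exposes a file with missing contents).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: fsync a file or directory given its path. */
static int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: durably replace dir/name with len bytes of data.
 * Returns 0 on success, -1 on failure (the old file, if any, is kept).
 */
int durable_write_file(const char *dir, const char *name,
                       const void *data, size_t len)
{
    char final_path[1024], tmp_path[1024];
    const char *p = data;
    size_t left = len;
    int n, fd;

    assert(dir != NULL && name != NULL && data != NULL);

    n = snprintf(final_path, sizeof final_path, "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof final_path)
        return -1;
    n = snprintf(tmp_path, sizeof tmp_path, "%s/%s.tmp", dir, name);
    if (n < 0 || (size_t) n >= sizeof tmp_path)
        return -1;

    /* 1. write the new contents into a temporary file */
    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    while (left > 0)
    {
        ssize_t w = write(fd, p, left);

        if (w < 0)
        {
            if (errno == EINTR)
                continue;
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        p += w;
        left -= (size_t) w;
    }

    /* 2. flush the temporary file before it becomes visible */
    if (fsync(fd) != 0)
    {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0)
    {
        unlink(tmp_path);
        return -1;
    }

    /* 3. atomically replace the old file with the new one */
    if (rename(tmp_path, final_path) != 0)
    {
        unlink(tmp_path);
        return -1;
    }

    /* 4. fsync the renamed file and the directory holding its entry */
    if (fsync_path(final_path) != 0)
        return -1;
    return fsync_path(dir);
}
```

With this ordering a crash at any point should leave either the complete
old file or the complete new file in place, never a truncated one.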


2) generating gtm.control in a reliable way (fsync + rename)

There were no fsync calls on the gtm.control file (which is kinda
critical for the GTM to work correctly), so it was possible to end up
without the file after a system crash.

This commit improves that by doing the rename+fsync just like (1).


3) tracking if gtm.control was written during clean shutdown

This patch adds a "shutdown" flag to the control file, set to 't' if the
file was written during a clean shutdown and 'f' otherwise (the file is
updated regularly while the GTM is running). This allows us to tweak
behavior after a crash (e.g. add CONTROL_INTERVAL to next_gxid).
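
To illustrate how such a flag could be used on startup, here is a rough
sketch. Everything in it is an assumption made for the example: the
one-line text layout of the file, the ControlData struct, the function
names, and the numeric value of CONTROL_INTERVAL. XID wraparound and
reserved XID values are ignored. It is not the format or logic used by the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CONTROL_INTERVAL 50000  /* assumed value, for illustration only */

/* Hypothetical in-memory view of the control file contents. */
typedef struct ControlData
{
    uint32_t next_gxid;
    uint32_t global_xmin;
    bool     clean_shutdown;    /* 't' in the file if true, 'f' otherwise */
} ControlData;

/* Serialize as "<next_gxid> <global_xmin> <t|f>\n" (assumed layout). */
int control_format(const ControlData *c, char *buf, size_t buflen)
{
    int n;

    assert(c != NULL && buf != NULL);
    n = snprintf(buf, buflen, "%u %u %c\n",
                 (unsigned) c->next_gxid, (unsigned) c->global_xmin,
                 c->clean_shutdown ? 't' : 'f');
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Parse the layout produced by control_format. */
int control_parse(const char *buf, ControlData *c)
{
    unsigned next, xmin;
    char flag;

    assert(buf != NULL && c != NULL);
    if (sscanf(buf, "%u %u %c", &next, &xmin, &flag) != 3)
        return -1;
    if (flag != 't' && flag != 'f')
        return -1;
    c->next_gxid = next;
    c->global_xmin = xmin;
    c->clean_shutdown = (flag == 't');
    return 0;
}

/*
 * After a clean shutdown the stored next_gxid can be trusted. After a
 * crash, XIDs may have been handed out since the last write, so skip
 * ahead by CONTROL_INTERVAL to stay clear of them.
 */
uint32_t restart_next_gxid(const ControlData *c)
{
    assert(c != NULL);
    return c->clean_shutdown ? c->next_gxid
                             : c->next_gxid + CONTROL_INTERVAL;
}
```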


One aspect of gtm.control that I find confusing is that there are two
different functions generating the file, but each filling it with
somewhat different values.

1) GTM_SaveTxnInfo simply saves next_gxid and global_xmin

2) GTM_WriteRestorePointXid saves gt_backedUpXid (into both fields)

So you don't really know which of the values is currently there, and
gt_backedUpXid is somewhat "ahead" of next_gxid, because it's computed
like this:

    gt_backedUpXid = GTMTransactions.gt_nextXid + RestoreDuration

So what happens is that initially the file is written by GTM_SaveTxnInfo
but then it gets overwritten by GTM_WriteRestorePointXid (and the values
suddenly jump ahead), and then (e.g. at shutdown) it gets overwritten by
GTM_SaveTxnInfo and the values jump back again.

I wonder if that can cause confusion in other parts of the code ...


I'm not going to pretend this makes GTM 100% resilient to crashes
(that's why we have GTM standby), but it hopefully improves the
situation a bit.

The patches, however, badly need testing and a second pair of eyes, so
I'd be grateful for any comments.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Serialize-and-deserialize-node-registry.patch
Type: text/x-patch
Size: 24444 bytes
Desc: not available
URL: <http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/attachments/20171106/40e472a3/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Make-sure-gtm.control-is-written-in-durable-way.patch
Type: text/x-patch
Size: 13396 bytes
Desc: not available
URL: <http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/attachments/20171106/40e472a3/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Track-if-gtm.control-was-written-during-shutdown.patch
Type: text/x-patch
Size: 8834 bytes
Desc: not available
URL: <http://lists.postgres-xl.org/pipermail/postgres-xl-developers-postgres-xl.org/attachments/20171106/40e472a3/attachment-0005.bin>

