[Postgres-xl-general] Tables growing huge and ominous GTM errors

Michael Misiewicz mmisiewicz at gmail.com
Thu Feb 16 15:10:01 PST 2017


Hi all,

I'm seeing some pretty scary-sounding errors in the GTM logs. In
particular, lots of errors like:

3:448997120:2017-02-16 23:03:16.003 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode16, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
1:438507264:2017-02-16 23:03:16.003 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode8, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
1:335542016:2017-02-16 23:03:16.014 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode6, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
1:438507264:2017-02-16 23:03:16.050 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode3, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
1:448997120:2017-02-16 23:03:16.050 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode5, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
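
To get a per-node picture of entries like the ones above, a small script can
tally them. This is only a rough sketch: the function name
`summarize_too_old_xmin` and the regular expression are made up for
illustration and are based purely on the log lines pasted here, not on any
documented GTM log format. It also assumes the lag can be taken as a plain
subtraction, which ignores XID wraparound.

```python
import re
from collections import defaultdict

# Pattern inferred from the pasted log lines (an assumption, not an
# official format). Whitespace is collapsed first, so records that were
# wrapped across several lines still match.
PATTERN = re.compile(
    r"GTM_ERRCODE_TOO_OLD_XMIN - node_name (?P<node>[^,\s]+), "
    r"reported_xmin (?P<reported>\d+), "
    r"previously reported_xmin (?P<previous>\d+), "
    r"GTM_GlobalXmin (?P<global_xmin>\d+)"
)


def summarize_too_old_xmin(log_text):
    """Return {node: {"count": n, "max_lag": xids}} for TOO_OLD_XMIN entries."""
    flat = " ".join(log_text.split())  # undo line wrapping
    summary = defaultdict(lambda: {"count": 0, "max_lag": 0})
    for match in PATTERN.finditer(flat):
        # How far the node's reported xmin is behind GTM_GlobalXmin.
        # Plain subtraction: assumes no XID wraparound in between.
        lag = int(match["global_xmin"]) - int(match["reported"])
        entry = summary[match["node"]]
        entry["count"] += 1
        entry["max_lag"] = max(entry["max_lag"], lag)
    return dict(summary)


SAMPLE = """\
3:448997120:2017-02-16 23:03:16.003 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode16, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
1:438507264:2017-02-16 23:03:16.003 UTC -LOG:  GTM_ERRCODE_TOO_OLD_XMIN -
node_name datanode8, reported_xmin 100146451, previously reported_xmin
107313381, GTM_GlobalXmin 107313381
LOCATION:  GTM_HandleGlobalXmin, register_common.c:1038
"""

if __name__ == "__main__":
    for node, info in sorted(summarize_too_old_xmin(SAMPLE).items()):
        print(f"{node}: {info['count']} entries, max lag {info['max_lag']} XIDs")
```

Run against the full GTM log, this would show whether every datanode is
reporting the same stale xmin or only some of them.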

I have no idea what might be causing this. In addition, I've been noticing
some pretty bad problems overall. A particular table with about 31 million
rows in it grows to more than 250 GB. However, if I truncate the table and
restore it from a backup, it only occupies about 2 GB. I assume that
there's some sort of XID issue going on here due to these errors in the
GTM? Has anyone ever encountered anything like this?
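
To put the size difference in perspective, here is the back-of-the-envelope
arithmetic as a short sketch. The helper name `bloat_summary` is made up for
illustration, the inputs are the approximate figures quoted above, and 1 GB
is assumed to mean 1024^3 bytes.

```python
GB = 1024 ** 3  # assumption: sizes above are in binary gigabytes


def bloat_summary(rows, bloated_bytes, restored_bytes):
    """Compare on-disk size per row before and after a truncate-and-restore."""
    return {
        "ratio": bloated_bytes / restored_bytes,
        "bloated_bytes_per_row": bloated_bytes / rows,
        "restored_bytes_per_row": restored_bytes / rows,
    }


if __name__ == "__main__":
    # Approximate figures from this message: ~31 million rows,
    # ~250 GB when bloated, ~2 GB after restoring from backup.
    info = bloat_summary(31_000_000, 250 * GB, 2 * GB)
    print(f"size ratio: {info['ratio']:.0f}x")
    print(f"bloated:  {info['bloated_bytes_per_row']:.0f} bytes/row")
    print(f"restored: {info['restored_bytes_per_row']:.0f} bytes/row")
```

So the table is taking roughly 125 times the space its live rows need.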

Some relevant details:
- 18 datanode processes on 3 machines, 1 coord on each machine, 1 machine
has GTM and GTM Proxy running, all DNs connect to GTM proxy which has 4
threads
- 64 GB of RAM per machine, 5 disks.
- Despite setting the datanode work_mem to 16 MB and shared_mem to 2048 MB,
if I try to run `vacuum full` on the table that's growing out of control,
the kernel keeps killing my DN processes for OOM. I have no idea why this
is.

Any clues? What sort of stuff would be good to look into? Truncating and
restoring tables seems to help the issue but that seems wrong...