[postgres-xl-general] Building out a Postgres-XL cluster and have some questions

Lee Roder lroder at sbcglobal.net
Fri Mar 9 15:36:06 PST 2018


Gonna try to keep this short.  After successfully compiling postgres-xl
on a server and using the 'quick start' tutorial to create a functioning
cluster (https://www.postgres-xl.org/documentation/install-short.html),
we migrated our data over from a standalone Postgres configuration and
successfully distributed our tables: some via replication (the read-heavy
tables), others via hash (couldn't get modulo working for some reason).
Our app was successfully interacting with our postgres-xl cluster.  So
far so good.

Next, we decided upon an architecture as follows:

GTM master

GTM Slave

4 data node servers, each running its own GTM Proxy and Coordinator
(master), along with a datanode (master) of course.

We successfully built this out and had it up and running (not without a
bit of pain, mind you.  Although there is a great deal of documentation
out there on https://www.postgres-xl.org/documentation, finding out how
to do what we needed involved quite a bit of trial and error).

At this point, things took a bad turn somehow.  We wanted to add
standby/replication nodes for each of the data nodes.  After some
deliberation and quite a bit of reading (couldn't seem to find good
'howtos' for this), we kinda sorta got one up and running, but were not
able to get the others going.  When I say that, in fairness, the nodes themselves
do in fact seem to be up based on log file entries - a sample follows:

2018-03-09 22:11:03.214 UTC [11239] LOG:  entering standby mode
2018-03-09 22:11:03.231 UTC [11239] LOG:  redo starts at 0/D000028
2018-03-09 22:11:03.238 UTC [11239] LOG:  consistent recovery state reached at 0/D000130
2018-03-09 22:11:03.239 UTC [11238] LOG:  database system is ready to accept read only connections
2018-03-09 22:11:03.256 UTC [11244] LOG:  started streaming WAL from primary at 0/E000000 on timeline 1

However, GTM master shows the node as not running.

Now, things seem to have gotten even stranger.  I brought everything
down, and slowly began bringing things online.  Here's my current state:

PGXC monitor all
Running: gtm master
Running: gtm slave
Running: gtm proxy chat_prod_gprox01
Not running: gtm proxy chat_prod_gprox02
Not running: gtm proxy chat_prod_gprox03
Not running: gtm proxy chat_prod_gprox04
Running: coordinator master chat_prod_coord01
Not running: coordinator master chat_prod_coord02
Not running: coordinator master chat_prod_coord03
Not running: coordinator master chat_prod_coord04
Running: datanode master chat_prod_data01
Not running: datanode slave chat_prod_data01
Not running: datanode master chat_prod_data02
Not running: datanode slave chat_prod_data02
Not running: datanode master chat_prod_data03
Not running: datanode slave chat_prod_data03
Not running: datanode master chat_prod_data04
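
For anyone who wants to track this state while bringing components up one at a time, the monitor output above can be summarized with a few lines of Python. This is only a minimal sketch: it assumes the output keeps the "Running:" / "Not running:" prefixes exactly as pasted here, and the function name parse_monitor is made up for illustration.

```python
# Status lines copied verbatim from the monitor output in this post.
MONITOR_OUTPUT = """\
Running: gtm master
Running: gtm slave
Running: gtm proxy chat_prod_gprox01
Not running: gtm proxy chat_prod_gprox02
Not running: gtm proxy chat_prod_gprox03
Not running: gtm proxy chat_prod_gprox04
Running: coordinator master chat_prod_coord01
Not running: coordinator master chat_prod_coord02
Not running: coordinator master chat_prod_coord03
Not running: coordinator master chat_prod_coord04
Running: datanode master chat_prod_data01
Not running: datanode slave chat_prod_data01
Not running: datanode master chat_prod_data02
Not running: datanode slave chat_prod_data02
Not running: datanode master chat_prod_data03
Not running: datanode slave chat_prod_data03
Not running: datanode master chat_prod_data04
"""


def parse_monitor(text):
    """Split monitor status lines into (running, not_running) lists.

    Hypothetical helper: assumes every status line starts with either
    'Running:' or 'Not running:'; anything else is ignored.
    """
    running, not_running = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Not running:"):
            not_running.append(line[len("Not running:"):].strip())
        elif line.startswith("Running:"):
            running.append(line[len("Running:"):].strip())
    return running, not_running


running, not_running = parse_monitor(MONITOR_OUTPUT)
print(f"{len(running)} running, {len(not_running)} not running")
for component in not_running:
    print("still down:", component)
```

Feeding it fresh monitor output after each start attempt gives a quick list of what is still down.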

My next step is to attempt to start my second gtm proxy node.  It
fails.  I have no idea why.  I know that the LAN IP/port address for GTM
master is open to this box.  It was working previously.  I also launched
a telnet at it from my data02 node and it responded.
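
The telnet test can be repeated from each node with a small script, which is handy when the same check has to be run from several boxes. A rough sketch only: the host name below is a placeholder, and port 6666 is an assumption that should be replaced with whatever port the GTM master is actually configured to listen on. A successful connect only shows the TCP port is reachable; it does not show that GTM will accept the proxy's registration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, roughly equivalent to 'telnet host port'
    succeeding. It only tests reachability of the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the real GTM master address and port.
    gtm_host = "gtm-master.example"
    gtm_port = 6666
    state = "reachable" if can_connect(gtm_host, gtm_port) else "NOT reachable"
    print(f"{gtm_host}:{gtm_port} is {state}")
```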

Any advice would be greatly appreciated.

Sincerely,

Lee Roder





