[postgres-xl-general] Building out a Postgres-XL cluster and have some questions
Lee Roder
lroder at sbcglobal.net
Fri Mar 9 15:36:06 PST 2018
Gonna try to keep this short. After successfully compiling postgres-xl
on a server and using the 'quick start' tutorial to create a functioning
cluster (https://www.postgres-xl.org/documentation/install-short.html),
we migrated our data over from a standalone Postgres configuration and
successfully distributed our tables: some via replication (the read-heavy
tables), others via hash (couldn't get modulo working for some reason).
Our app was successfully interacting with our postgres-xl cluster. So
far so good.
Next, we decided upon an architecture as follows:
GTM master
GTM Slave
4 data node servers, each running its own GTM Proxy and Coordinator
(master), along with a datanode of course.
We successfully built this out and had it up and running, though not
without a bit of pain, mind you. Although there is a great deal of
documentation out there on https://www.postgres-xl.org/documentation,
finding out how to do what we needed involved quite a bit of trial and
error.
At this point, things took a bad turn somehow. We wanted to add
standby/replication nodes for each of the data nodes. After some
deliberation and quite a bit of reading (couldn't seem to find good
'howtos' for this), we kinda sorta got one up and running, but were not
able to get others. When I say that, in fairness, the nodes themselves
do in fact seem to be up based on log file entries - a sample follows:
2018-03-09 22:11:03.214 UTC [11239] LOG: entering standby mode
2018-03-09 22:11:03.231 UTC [11239] LOG: redo starts at 0/D000028
2018-03-09 22:11:03.238 UTC [11239] LOG: consistent recovery state reached at 0/D000130
2018-03-09 22:11:03.239 UTC [11238] LOG: database system is ready to accept read only connections
2018-03-09 22:11:03.256 UTC [11244] LOG: started streaming WAL from primary at 0/E000000 on timeline 1
However, GTM master shows the node as not running.
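For anyone wanting to repeat this check across several standbys, here is a minimal Python sketch that pulls the same facts out of a standby's log text. The function name standby_status and the returned keys are made up for illustration; the only log phrases it looks for are the ones in the sample above. It only reports what the standby says about itself, so it cannot explain why the cluster monitor disagrees.

```python
import re

# Matches entries such as:
#   LOG: started streaming WAL from primary at 0/E000000 on timeline 1
STREAMING_RE = re.compile(
    r"started streaming WAL from primary at "
    r"(?P<lsn>[0-9A-F]+/[0-9A-F]+) on timeline (?P<tli>\d+)"
)


def standby_status(log_text):
    """Summarize what a standby's log says about its own state."""
    # Mail clients and terminals may wrap long log entries, so collapse
    # all whitespace runs to single spaces before searching.
    flat = " ".join(log_text.split())
    match = STREAMING_RE.search(flat)
    return {
        "standby_mode": "entering standby mode" in flat,
        "accepting_read_only": "ready to accept read only connections" in flat,
        "streaming_from": match.group("lsn") if match else None,
        "timeline": int(match.group("tli")) if match else None,
    }
```

If streaming_from comes back as None, the standby never reported connecting to its primary, which would be a different problem from the one described here.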
Now, things seem to have gotten even stranger. I brought everything
down, and slowly began bringing things online. Here's my current state:
PGXC monitor all
Running: gtm master
Running: gtm slave
Running: gtm proxy chat_prod_gprox01
Not running: gtm proxy chat_prod_gprox02
Not running: gtm proxy chat_prod_gprox03
Not running: gtm proxy chat_prod_gprox04
Running: coordinator master chat_prod_coord01
Not running: coordinator master chat_prod_coord02
Not running: coordinator master chat_prod_coord03
Not running: coordinator master chat_prod_coord04
Running: datanode master chat_prod_data01
Not running: datanode slave chat_prod_data01
Not running: datanode master chat_prod_data02
Not running: datanode slave chat_prod_data02
Not running: datanode master chat_prod_data03
Not running: datanode slave chat_prod_data03
Not running: datanode master chat_prod_data04
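To keep track of what is still down while bringing components up one at a time, a small Python sketch like the following can split the listing above into running and not-running components. The function name parse_monitor is hypothetical, and it assumes only the "Running:" / "Not running:" prefixes visible in the output above.

```python
def parse_monitor(output):
    """Split `monitor all` output into (running, stopped) component lists."""
    running, stopped = [], []
    for line in output.splitlines():
        line = line.strip()
        # Check the longer prefix first; other lines (such as the prompt
        # line) match neither prefix and are ignored.
        if line.startswith("Not running:"):
            stopped.append(line[len("Not running:"):].strip())
        elif line.startswith("Running:"):
            running.append(line[len("Running:"):].strip())
    return running, stopped
```

Calling it on the saved output after each start attempt makes it easy to see whether the stopped list actually shrank.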
My next step is to attempt to start my second gtm proxy node. It
fails. I have no idea why. I know that the LAN IP/port address for GTM
master is open to this box. It was working previously. I also launched
a telnet at it from my data02 node and it responded.
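The telnet test can also be scripted so it is easy to repeat from each box. This is a minimal Python sketch using only the standard library; the function name can_connect is made up, and the host and port you pass in are whatever your GTM master actually listens on (none are given here).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result only shows that the port is reachable at the TCP level, same as telnet. It says nothing about whether the proxy's own configuration points at that address.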
Any advice would be greatly appreciated.
Sincerely,
Lee Roder