[Postgres-xl-general] Corrupted shared memory in 9.5

Felix Ng felix.ng at sopwr.com
Sun May 8 15:55:09 PDT 2016


It still happens, but much less often than before.  I wish I knew how to
collect more information.

Here are the logs.

(Crash #1)

2016-05-08 21:53:23 HKT   LOG:  failed to connect to Datanode
2016-05-08 21:53:23 HKT root 10.0.0.1 LOG:  Remote node "datanode11",
running with pid 12558 returned an error: Failed to get pooled
connections
2016-05-08 21:53:23 HKT root 10.0.0.1 STATEMENT:  Remote Subplan
2016-05-08 21:53:23 HKT   LOG:  failed to connect to Datanode
2016-05-08 21:53:23 HKT   WARNING:  can not connect to node 16388
2016-05-08 21:53:23 HKT   WARNING:  Health map updated to reflect DOWN
node (16388)
2016-05-08 21:53:23 HKT   LOG:  Pooler could not open a connection to node 16388
2016-05-08 21:53:23 HKT root 10.0.0.8 LOG:  failed to acquire connections
2016-05-08 21:53:23 HKT root 10.0.0.8 STATEMENT:  Remote Subplan
2016-05-08 21:53:23 HKT root 10.0.0.8 ERROR:  Failed to get pooled connections
2016-05-08 21:53:23 HKT root 10.0.0.8 HINT:  This may happen because
one or more nodes are currently unreachable, either because of node or
network failure.
         Its also possible that the target node may have hit the
connection limit or the pooler is configured with low connections.
         Please check if all nodes are running fine and also review
max_connections and max_pool_size configuration parameters


(Crash #2)

2016-05-09 06:30:14 HKT   LOG:  failed to connect to Datanode
2016-05-09 06:30:14 HKT   LOG:  failed to connect to Datanode
2016-05-09 06:30:14 HKT   WARNING:  can not connect to node 16393
2016-05-09 06:30:14 HKT   WARNING:  Health map updated to reflect DOWN
node (16393)
2016-05-09 06:30:14 HKT   LOG:  Pooler could not open a connection to node 16393
2016-05-09 06:30:14 HKT root 10.0.0.6 LOG:  failed to acquire connections
2016-05-09 06:30:14 HKT root 10.0.0.6 STATEMENT:  Remote Subplan
2016-05-09 06:30:14 HKT root 10.0.0.6 ERROR:  Failed to get pooled connections
2016-05-09 06:30:14 HKT root 10.0.0.6 HINT:  This may happen because
one or more nodes are currently unreachable, either because of node or
network failure.
         Its also possible that the target node may have hit the
connection limit or the pooler is configured with low connections.
         Please check if all nodes are running fine and also review
max_connections and max_pool_size configuration parameters


(Crash #3)

2016-05-09 06:43:03 HKT   LOG:  failed to connect to Datanode
2016-05-09 06:43:03 HKT   WARNING:  can not connect to node 16387
2016-05-09 06:43:03 HKT   WARNING:  Health map updated to reflect DOWN
node (16387)
2016-05-09 06:43:03 HKT   LOG:  Pooler could not open a connection to node 16387
2016-05-09 06:43:03 HKT root 10.0.0.1 LOG:  failed to acquire connections
2016-05-09 06:43:03 HKT root 10.0.0.1 STATEMENT:  Remote Subplan
2016-05-09 06:43:03 HKT root 10.0.0.1 ERROR:  Failed to get pooled connections
2016-05-09 06:43:03 HKT root 10.0.0.1 HINT:  This may happen because
one or more nodes are currently unreachable, either because of node or
network failure.
         Its also possible that the target node may have hit the
connection limit or the pooler is configured with low connections.
         Please check if all nodes are running fine and also review
max_connections and max_pool_size configuration parameters
2016-05-09 06:43:03 HKT root 10.0.0.1 STATEMENT:  Remote Subplan
2016-05-09 06:43:03 HKT root 10.0.0.1 WARNING:  can not connect to
GTM: Connection reset by peer
2016-05-09 06:43:03 HKT root 10.0.0.1 WARNING:  can not connect to
GTM: Connection reset by peer
2016-05-09 06:43:03 HKT root 10.0.0.1 WARNING:  can not connect to
GTM: Connection reset by peer
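
One way to get a little more out of logs like these is to tally which
node was marked DOWN at each crash.  The following is only a minimal
sketch in Python, not part of Postgres-XL: the function names
(join_wrapped, down_events) and the sample lines are illustrative
assumptions, and it assumes the log prefix starts with a timestamp as in
the excerpts above.  It re-joins lines that were wrapped by the mail
client before matching.

```python
import re
from collections import defaultdict

# A log record is assumed to start with a timestamp such as
# "2016-05-08 21:53:23 HKT" (this depends on log_line_prefix).
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+")
DOWN_NODE = re.compile(r"Health map updated to reflect DOWN node \((\d+)\)")


def join_wrapped(lines):
    """Re-join records that were wrapped across several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous record.
            records[-1] += " " + line
    return records


def down_events(lines):
    """Map node number -> list of timestamps at which it was marked DOWN."""
    events = defaultdict(list)
    for record in join_wrapped(lines):
        match = DOWN_NODE.search(record)
        if match:
            timestamp = record[:19]  # "YYYY-MM-DD HH:MM:SS"
            events[match.group(1)].append(timestamp)
    return dict(events)


# Hypothetical input: a few lines copied from the excerpts above.
sample = """\
2016-05-08 21:53:23 HKT   WARNING:  can not connect to node 16388
2016-05-08 21:53:23 HKT   WARNING:  Health map updated to reflect DOWN
node (16388)
2016-05-09 06:30:14 HKT   WARNING:  Health map updated to reflect DOWN
node (16393)
""".splitlines()

if __name__ == "__main__":
    for node, stamps in down_events(sample).items():
        print(node, stamps)
```

Run against the full coordinator log, this would show whether the same
node goes DOWN every time or a different one each time (16388, 16393 and
16387 in the three crashes above).  Assuming those numbers are OIDs from
the pgxc_node catalog, they could then be matched to node names there.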


After restarting the whole cluster, things work again, but eventually it
crashes again.  Some SQL statements trigger the problem much more easily...

PS: I built another test cluster (same servers / test data) that runs only
simple SQL (inserts / updates, i.e. no complex SQL), and it has survived
without any error so far.




Felix Ng Chief Technology Officer
::::::: Social Power Limited :::::::
M +852 9229 2422 | T +852 3483 1338 | F +852 3010 8228
web www.sopwr.com | facebook socialpowerintelligence



On Sun, May 8, 2016 at 9:15 PM, Pavan Deolasee <pavan.deolasee at gmail.com> wrote:
>
>
> On Sun, May 8, 2016 at 6:41 PM, Felix Ng <felix.ng at sopwr.com> wrote:
>>
>> Applied the HEAD of XL9_5_STABLE, so far so good.
>>
>> Previously, I could trigger the problem after "just a few" complicated
>> SQL statements.  Right now, it is working just fine even after 10 mins...
>>
>
> Great! Let me know how it goes for the next few hours.
>
> Thanks,
> Pavan
>



