[Postgres-xl-developers] postgres-xl crashes for large'ish datasets

Pavan Deolasee pavan.deolasee at gmail.com
Sun May 15 23:36:34 PDT 2016


On Fri, May 13, 2016 at 6:16 PM, pierre de fermat <
fermatslittletheorem at gmail.com> wrote:

> Hello
> I have a few large'ish datasets (5 sets of 200+ million rows) which I need
> to analyse. The XL setup has 18 nodes (16 cores, 16GB RAM, 500GB SAS on
> RAID). It has 4 coordinators, 1 GTM and 1 GTM proxy.
> One of the machines runs 1 coordinator and the two GTMs. Half the nodes
> use the GTM and the other half use the proxy.
>
>
I wonder why? A better setup is to run a GTM proxy on each node and let the
coordinators and datanodes running on that node connect to that GTM
proxy. To be honest, GTM itself is not tested to handle a large number of
connections, and ideally it should receive only a limited number of
connections from each GTM proxy. Having said that, we must not see any
crashes, so let's investigate those.
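
A minimal sketch of the per-node layout described above, assuming the
standard Postgres-XL parameter names (gtm_host and gtm_port in
postgresql.conf; nodename, listen_addresses, port, gtm_host and gtm_port in
gtm_proxy.conf). The host name, port numbers and proxy name are hypothetical
placeholders, not values from this cluster. It only renders the settings as
text so they can be reviewed before being applied:

```python
def render_gtm_settings(gtm_host, gtm_port, proxy_port, proxy_name):
    """Return (gtm_proxy.conf text, postgresql.conf text) for one machine.

    The local GTM proxy connects to the real GTM, and every coordinator
    and datanode on the same machine connects only to the local proxy.
    """
    # Settings for the GTM proxy running on this machine.
    proxy_conf = "\n".join([
        f"nodename = '{proxy_name}'",
        "listen_addresses = 'localhost'",
        f"port = {proxy_port}",
        f"gtm_host = '{gtm_host}'",  # the real GTM (hypothetical host)
        f"gtm_port = {gtm_port}",
    ])
    # Settings for each coordinator/datanode on this machine:
    # point them at the local proxy, not at the GTM itself.
    node_conf = "\n".join([
        "gtm_host = 'localhost'",
        f"gtm_port = {proxy_port}",
    ])
    return proxy_conf, node_conf


if __name__ == "__main__":
    proxy_conf, node_conf = render_gtm_settings(
        "gtm.example", 6666, 6667, "gtm_proxy_1"
    )
    print("# gtm_proxy.conf\n" + proxy_conf)
    print("\n# postgresql.conf (coordinator/datanode)\n" + node_conf)
```

With this layout the GTM sees one set of connections per proxy rather than
one per backend, which is the point of the recommendation.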


> The data structure lends itself to good indexing. The data is distributed
> with a key (sensor ID) which ensures that any aggregation/processing that
> we do is localized to a node. We have postgres functions that do such
> aggregations/processing. We run multi-threaded Java or Python code in
> which each thread is given a set of sensor IDs. The workloads are grouped
> by coordinator. This helps us engage all the data nodes to process data.
> Such processing works with limited load, but the coordinator which shares
> the machine with the GTMs consistently crashes under heavy loads (40
> connections or above per coordinator, with max_connections set to 1000).
> The only error message we get is:
>
> "terminating connection because of crash of another server process","The
> postmaster has commanded this server process to roll back the current
> transaction and exit, because another server process exited abnormally and
> possibly corrupted shared memory.","In a moment you should be able to
> reconnect to the database and repeat your command."
>
>
If core dumps are enabled on the machine, you should see a core file
being generated, either in $PGDATA or wherever core_pattern points.
Can you get a backtrace from the core for further investigation? You'll
need to recompile with debug flags to get a readable stack trace.
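
To check whether core dumps are enabled, something like the following
sketch may help. It assumes a Linux machine (it reads
/proc/sys/kernel/core_pattern) and it reports the limit of the process that
runs it, so it should be run from the same user and environment that starts
the postmaster. The function name is only illustrative:

```python
import resource
from pathlib import Path


def core_dump_settings(pattern_path="/proc/sys/kernel/core_pattern"):
    """Report the core file size limit and the kernel core_pattern."""
    # Soft/hard limits on core file size for the current process.
    # A soft limit of 0 means no core file will be written.
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    path = Path(pattern_path)
    # core_pattern decides where the kernel writes (or pipes) the core.
    pattern = path.read_text().strip() if path.exists() else None
    return {
        "soft_limit": soft,
        "hard_limit": hard,
        "enabled": soft != 0,
        "core_pattern": pattern,
    }


if __name__ == "__main__":
    for key, value in core_dump_settings().items():
        print(f"{key}: {value}")
```

If "enabled" is False, the limit has to be raised (for example with
"ulimit -c unlimited" in the shell that starts the server) before the next
crash will leave a core file.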

Please also send us the GTM logs and the logs of the failed node from when
the crash happens. Also check whether there are any other interesting
messages in the system log.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services