[Postgres-xl-general] "ERROR: Failed to get pooled connections" with cross-node join

Krzysztof Nienartowicz krzysztof.nienartowicz at unige.ch
Wed Oct 7 02:43:44 PDT 2015


Bowing to your perseverance!

On Wed, Oct 7, 2015 at 10:24 AM, Tobias Oberstein <
tobias.oberstein at gmail.com> wrote:

> >     GTM proxy + worker_threads = 16 did the trick!
> >
> >
> > Cool. We need to be aware of this current limit of 1024 connections to
> > proxy/GTM. I would actually look at increasing that further or even see
> > if we can deal with it at run time.
>
> My best guess as to why the above did the trick is:
>
> With the default number of GTM proxy worker_threads (which is 2), the
> incoming connections can't be handled quickly enough. Maybe
> worker_threads = 8 or 4 would also work, I don't know.
>
> That's a guess of course. I haven't tracked down what's really going on.
>


You might have been right that using epoll could fix this issue, since your
network latency (everything on one box) is extremely small compared with
what you get over a wired network, unless hundreds of clients flood the
server. I was loading data with up to 200 clients (cores) through a single
coordinator and the system was stable after some tweaking. The incoming data
went over 1Gb Ethernet, though, and was capped by its limits (roughly
110-120 MB/s in practice). We are planning to upgrade the client connection
to QDR InfiniBand soon. The XL interconnect is already InfiniBand, but the
internode throughput (while loading) is not that high, since all of the load
arrives over the 1Gb link. I should be able to provide some numbers for
quite complex joins on billions of tuples in a week or so.
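For the archives, this is roughly what the relevant gtm_proxy.conf settings
look like; the node name, hosts and ports below are placeholders, and
worker_threads = 16 is simply the value that did the trick in your setup,
not a general recommendation:

    # gtm_proxy.conf (sketch; names, hosts and ports are placeholders)
    nodename = 'gtm_proxy1'       # unique name of this GTM proxy
    listen_addresses = '*'
    port = 6666                   # port coordinators/datanodes connect to
    gtm_host = 'gtm-master'       # host running the actual GTM
    gtm_port = 6668
    worker_threads = 16           # default is 2; too few threads could not
                                  # keep up with ~1000 backend connections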



>
> But yes, in any case, all of these hints should be documented at least.
>

Agreed. Since some values are tightly coupled (max_connections vs. the
shared_queue* values, etc.), inconsistent settings should at least produce
warnings when the GUCs are set.
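As a sketch of the coupling I mean (the numbers are purely illustrative, and
I am assuming the shared_queues / shared_queue_size and pooler GUC names as
in the XL docs): the datanodes have to be able to serve whatever the
coordinators and their pools can throw at them, otherwise you end up with
exactly the pooled-connection errors this thread started with.

    # postgresql.conf on each datanode (numbers illustrative)
    max_connections   = 400   # must cover sessions coming from *all*
                              # coordinators, not only direct clients
    shared_queues     = 400   # queues used for cross-node data exchange; if
                              # much smaller than the number of concurrent
                              # distributed queries, queries stall waiting
    shared_queue_size = 65536 # size of each shared queue

    # postgresql.conf on each coordinator (numbers illustrative)
    max_connections = 100     # client sessions on this coordinator
    max_pool_size   = 400     # pooled connections this coordinator may open
                              # towards the datanodes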



>
> It would also help to have a collection of configs/setups that work. And
> an exposition of "user battle/success stories"
>

> >     Now that I can see the light, I am willing to stay the course (I am
> >     expecting more pain).
> >
> >
> > So what's next? Are you planning to increase the number of datanodes
> > even further, or run more queries and different workloads?
>
> Both.
>
> We will test 48 nodes (because the box has 48 physical/non-HT cores) and
> 64 nodes (because I set up the storage with 64 partitions on the 8 NVMes).
>
> We will also test more workloads (CTEs, Window functions, ..) and
> features (PL/pgSQL, PL/R, Views, Partitioning, ..).
>

Tested so far: PL/R works, and we ported PL/Java and it works without
problems. Views depend on the underlying queries, and window functions
worked, but it all depends on how you mix replicated and distributed tables
within a query. For partitioning, a small patch is needed to enable the
insert trigger (it is disabled by default, but Pavan and Mason provided
one). COPY into a partitioned table does not work, however (check the list,
I reported the error a few weeks ago along with another small patch that
reduces the chance of hitting the error and the resulting DB corruption,
but does not solve the underlying problem, which seems to sit quite deep in
the protocol), so we ended up changing the application logic to ingest
directly into the partitions; see the sketch below. Enabling FDWs would be
interesting to match XL/X2 with cstore_fdw, especially with the capability
to have inherited tables as FDWs. Mason mentioned a year or so ago that it
should not be that hard; perhaps Pavan could shed some light on whether
anything is in the works for this, i.e. after the stability goals are
reached?
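To illustrate what I mean by ingesting directly into the partitions (table
and column names below are made up, just a minimal sketch of the pattern):

    -- Parent table, hash-distributed; children are inheritance partitions.
    CREATE TABLE measurements (
        source_id  bigint,
        ts         timestamptz,
        value      double precision
    ) DISTRIBUTE BY HASH (source_id);

    CREATE TABLE measurements_2015_10 (
        CHECK (ts >= '2015-10-01' AND ts < '2015-11-01')
    ) INHERITS (measurements) DISTRIBUTE BY HASH (source_id);

    -- Instead of loading through the parent (which fails for us), the
    -- application works out the target partition and writes to it directly,
    -- whether via COPY from the client or plain INSERTs:
    COPY measurements_2015_10 (source_id, ts, value) FROM STDIN;
    INSERT INTO measurements_2015_10 (source_id, ts, value)
    VALUES (42, '2015-10-07 12:00:00+00', 1.5);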

We have some aging Nvidia Teslas lying around, and I would like to do some
tests with PgStrom on XL with them early next year. That would be my
preferred path for scaling up the servers for (hopefully mostly local)
joins and analytics.


>
> We might also get new hardware: probably a 2nd 64 haswell cores box plus
> 100GbE interconnect.
>

I don't think I responded to your question about the fabric we use, so here
it is:
4 x Dell R720 nodes with 2 x E5-2680 2.8GHz each (10 HT cores per socket,
so 40 HT cores per box), 256GB RAM, 1.5TB of local SSDs, SAS HDDs for WAL,
and 2 x 180TB (60 x 3TB drives each, split into two VDs) in Dell MD3260 SAS
enclosures as external 'base' storage, which we are not going to use in
production as I am not very happy with its performance. We are looking at
better-performing alternatives now.
Each node has a two-port InfiniBand card to separate DB traffic from the
compute cluster. Tuned IP over InfiniBand gives at most ~30 Gb/s of real
throughput for the DB interconnect.
The setup will be doubled next year and tripled in three years; base
storage (neither indexes nor WAL included) will be around 800TB-1PB of
usable space in 4 years. Our workload is a mix of all types of operations:
bulk load, export, large INSERT ... SELECT transformations, small batch
ingestions, some updates, and typical OLAP on steroids using custom
indexing, PL/R, PL/Java and PL/v8. All of this with composite types used
extensively, and many basic queries are ORM-generated, so they are hard to
change.


>
> We also have business contacts to a provider of x86 cluster solutions.
> Maybe we can arrange something .. a testbed using more common hardware
> (like 2HE boxes with 2 NVMes, but more of these). Our current hardware
> is a massively scale-up box .. not your usual server.
>

> >     Guys, PG-XL rocks.
> >
> >
> > Thanks Tobias. We are trying our best to make it better and better. And
> > as more and more users/developers like you join the fold, we can get
> > there even faster. Thanks for your patience!
>
> I perfectly understand the necessity of more resources and proper user
> feedback! I'm running a couple of OSS projects myself
> (http://crossbar.io/, http://autobahn.ws/, http://wamp.ws/), and
> contribute to others (https://twistedmatrix.com/, http://pypy.org/, ..).
>
> I am here to invest / contribute back.
>
> Cheers,
> /Tobias
>
>