[Postgres-xl-general] Dependency on select()

Krzysztof Nienartowicz krzysztof.nienartowicz at unige.ch
Mon Oct 19 03:52:40 PDT 2015


Hello Pavan,

With an increasing number of open sockets, pooler performance (based on
select()) was degrading badly, and on reaching 1k connections the select()
in the pooler breaks (the pooler gets stuck, so the cluster was silently
stuck too); this patch was meant to address that. I think going to
epoll/libu etc could indeed improve stability/perf, as poll() still has
kernel overhead that epoll does not. If I have some time (unlikely) I would
make a full switch to epoll-based networking for all server parts. Also, I
noticed quite a few limitations in the network buffers used (usually 8k).
Since we are on Infiniband with a 64K MTU, 64KB would be more appropriate
to get higher throughput for the volumes we handle. Maybe a GUC for this
would be good too.
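
As a rough illustration of the 1k limit (a sketch only, not code from the
patch): select() tracks descriptors in an fd_set, which can only hold
descriptors numbered below FD_SETSIZE (1024 on Linux/glibc), while poll()
takes an array of descriptors and has no such cap. The helper names
wait_readable() and move_fd_above_fd_setsize() below are hypothetical and do
not exist in XL; the second one is only there so the demonstration can try
to obtain a descriptor number that select() could not handle.

```c
#include <errno.h>
#include <poll.h>
#include <sys/resource.h>
#include <sys/select.h>   /* only for the FD_SETSIZE constant */
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable, for up to timeout_ms.
 * Returns 1 if readable (or hung up), 0 on timeout, -1 on error.
 * Unlike FD_SET()/select(), this is valid for fd >= FD_SETSIZE. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted */

    if (rc <= 0)
        return rc;                        /* 0 = timeout, -1 = error */
    if (pfd.revents & POLLNVAL)
        return -1;                        /* fd was not open */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}

/* Hypothetical demo helper: try to re-number fd to a value above
 * FD_SETSIZE, raising the soft RLIMIT_NOFILE if the hard limit allows.
 * Returns the new descriptor, or the original fd if that is not possible. */
static int move_fd_above_fd_setsize(int fd)
{
    int target = FD_SETSIZE + 10;
    struct rlimit rl;
    int high;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return fd;
    if (rl.rlim_cur <= (rlim_t) target) {
        if (rl.rlim_max <= (rlim_t) target)
            return fd;                    /* hard limit too low */
        rl.rlim_cur = (rlim_t) target + 1;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            return fd;
    }
    high = dup2(fd, target);
    if (high < 0)
        return fd;
    close(fd);
    return high;
}
```

Calling wait_readable() on the read end of a pipe moved this way returns 0
while the pipe is empty and 1 once a byte has been written, regardless of
whether the descriptor number is above FD_SETSIZE.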

I am still experiencing network issues with lost packets, but these are
quite obscure and I am not sure yet whether to blame the network fabric or
XL.

When testing with a higher number of datanodes and clients, one can
immediately spot a number of issues that might affect performance and
stability in comparison with a 'standard' setup, so I would advise
including such tests in your validation scheme.

Best,
Krzysztof



On Mon, Oct 19, 2015 at 7:53 AM, Pavan Deolasee <pavan.deolasee at gmail.com>
wrote:

> Thanks Krzysztof. I skimmed through the patch and it looks good. Sure,
> there are many unrelated changes, but I can take care of that while doing
> the commit.
>
> One issue that I myself noticed last week while running pgbench read-only
> (-S) tests is that repeated runs of pgbench -S show decreasing tps. I
> tracked it down to pooler performance, which gets worse as more and more
> connections are requested. Note that with persistent_connections turned
> OFF, which is the default, every transaction acquires connections from the
> pooler and returns them to it. And that's where performance starts getting
> worse.
>
> I'll test with your changes, but I wonder if you think there is a reason
> that this patch may also address that issue.
>
> Thanks,
> Pavan
>
> On Sat, Oct 17, 2015 at 6:13 PM, Krzysztof Nienartowicz <
> krzysztof.nienartowicz at unige.ch> wrote:
>
>> FYI:
>> Poll replacing select in pool manager (poolmgr.c) merged from X2 and new
>> implementation for execRemote.c are here:
>>
>>
>> https://github.com/yazun/postgres-xl/commit/15d812542030d1e1b26db3cac3e0e002b3893189
>> X2 could also profit from execRemote.c switch to poll.
>>
>> The linked patch is slightly bigger and could certainly be cleaned up.
>>
>> The number of open TCP sockets for 4 coords, 8 datanodes and ~140 clients
>> is reaching  ~1200 (lsof -iTCP  | wc -l) on datanodes and ~4600 on
>> coordinator/datanode. Even more with unix sockets.
>> After deploying the cluster with these changes I see much better
>> robustness (connection/pool errors are gone and performance increased a
>> few percent as well); it's the first time I can say XL stability reaches
>> a satisfactory level for bulk loading (for us, of course).
>>
>> Krzysztof
>>
>> On Fri, Oct 16, 2015 at 12:22 PM, Krzysztof Nienartowicz <
>> krzysztof.nienartowicz at unige.ch> wrote:
>>
>>> I would also suggest that the select() in pgxc_node_receive() in
>>> src/backend/pgxc/pool/pgxcnode.c be poll() based, just for the sake of
>>> performance when there are many sockets open, as in our case. If I find
>>> some time I will create a patch for this.
>>>
>>>
>>>
>>> On Fri, Oct 16, 2015 at 11:36 AM, Krzysztof Nienartowicz <
>>> krzysztof.nienartowicz at unige.ch> wrote:
>>>
>>>> Hello,
>>>> I was hit by the hanging pooler problem with ~100-120 clients on a 4
>>>> coord, 8 datanode setup after a few minutes of running. After some time
>>>> all processes are stalled and the pooler is running at 100% CPU with no
>>>> clear symptoms.
>>>> With this number of nodes the number of intra connections is > 1K and
>>>> the select() in the pooler loop breaks. I replaced select() with poll()
>>>> in poolmgr.c but then noticed X2 already had a fix for this, so I
>>>> decided to merge it into our local tree so as not to stray even more
>>>> from X2. My merge was quick and removed the initialisation of multiple
>>>> af_unix socket directories (which I think was a bit spurious anyway:
>>>> the socket of the last directory on the list was the one select'ed on).
>>>> I can provide you with the patch, or maybe you should do the merge
>>>> yourself?
>>>> This could also affect any setup with many datanodes, similar to what
>>>> Tobias was aiming at, and the fix goes in the direction he suggested.
>>>>
>>>>
>>>> https://github.com/postgres-x2/postgres-x2/blob/master/src/backend/pgxc/pool/poolmgr.c#L2325
>>>>
>>>> getting rid of select:
>>>> https://github.com/postgres-x2/postgres-x2/issues/143
>>>> other issue with pooler:
>>>> https://github.com/postgres-x2/postgres-x2/issues/163
>>>>
>>>> I will check other places for select(); it's clear this is one of the
>>>> things that should be fixed ASAP to support more than a minimal
>>>> setup/workload.
>>>>
>>>> This also raises the question of the X2/XL split. Clearly X2 has
>>>> advanced quite a lot and XL now seems to lag in some aspects. Given the
>>>> resources available, an X2/XL merge would be very welcome, also to keep
>>>> both projects alive. We would hope they could converge.
>>>>
>>>> Best regards,
>>>> Krzysztof
>>>>
>>>>
>>>>
>>>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> _______________________________________________
>> Postgres-xl-general mailing list
>> Postgres-xl-general at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/postgres-xl-general
>>
>>
>
>
> --
>  Pavan Deolasee                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>