[Postgres-xl-developers] Why isn't inter-node ssl connection a thing

Tomas Vondra tomas.vondra at 2ndquadrant.com
Fri Nov 10 16:28:22 PST 2017


Hi Benjamin,

On 11/09/2017 10:16 AM, Benjamin DEROCHE wrote:
> Hello,
> 
> I was searching for how to enable ssl for inter-node connection, so 
> datanodes located on different machine don't send our data in clear
> text.
> 
> The only answer I found is this :
> https://bbs.archlinux.org/viewtopic.php?id=225930
> 
> So, why isn't inter-node ssl connection a thing yet ?
> 

Good question.

My understanding is that when the Postgres-XL project (or rather one of
the ancestor projects - StormDB/Postgres-XC) started, it was designed
for on-premises deployments. In those cases the cluster can be
reasonably isolated, so plaintext inter-node communication was not a
major concern.

But the shift to cloud/virtualized deployments changed this assumption
quite a bit, of course. A bunch of people asked about this limitation on
the XL mailing lists recently, actually.

The only solution at this point seems to be using some form of network
tunneling/VPN/... but I agree it's not ideal. Firstly, every node may
need to talk to every other node (full mesh topology), so setting up all
the tunnels is somewhat tedious. And secondly, solutions using a central
VPN server add yet another bottleneck/point of failure to the cluster.

So it would be great to remove this limitation.

> Will it be added by the future ?

Well, someone will have to implement that, and I'm not aware of anyone
working on this bit. I personally am focused on fixing the remaining
issues caused by the Postgres-XL 10 merge, and have some other stuff on
my TODO, so I don't expect to be working on this bit anytime soon.

But I've done a quick experiment to see how much work it would be to get
SSL working between datanodes. I wasn't really expecting to fix it by
tweaking a few lines, but I wanted to get a better idea of what it would
take to get it working.


Obviously, step #1 is changing the connection string (PGXCNodeConnStr)
to use sslmode=require (or one of the verify-* modes). But that is not
enough, because (even with SSL configured correctly) the connections
to other nodes start failing like this:

    WARNING:  unexpected EOF on datanode oid connection: 16385

and on the node we get this in the log right before the failure:

    LOG:  SSL error: wrong version number

So clearly, we're not sending correctly encrypted data :-/
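
To make that error a bit more concrete, here is a minimal sketch of why
plaintext bytes would plausibly be reported as "wrong version number".
A TLS record starts with a 5-byte header (content type, two version
bytes, two length bytes), and the major version byte is 3 for SSL 3.0
and all TLS 1.x records. A plaintext PostgreSQL query message starts
with 'Q' followed by a 4-byte length, so the bytes that land in the
version field are zeros. The helper name below is hypothetical (it is
not an OpenSSL or XL function), and I am assuming the failing check is
essentially this one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: returns 1 if buf starts
 * with a 5-byte record header whose major version byte is 3, else 0.
 * A real TLS stack validates much more (content type, length, state).
 */
int has_tls_major_version(const uint8_t *buf, size_t len)
{
    /* Header layout: type (1 byte), version (2 bytes), length (2 bytes). */
    if (buf == NULL || len < 5)
        return 0;

    /* buf[1] is the major version; SSL 3.0 and TLS 1.x all use 3. */
    return buf[1] == 3;
}
```

Feeding it the start of a plaintext simple-query message (the bytes
'Q', 0, 0, 0, 13, ...) returns 0, while the header of a TLS 1.2
application-data record (23, 3, 3, 0, 32) returns 1.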

The reason why this is happening is the built-in connection pooling. The
way it is currently implemented is that backends do not open connections
themselves, but instead delegate this to a "pool manager" (which then
reuses the connections for other backends, to reduce the overhead).

But the pool manager only hands over file descriptors representing the
network sockets, and not the full PGconn data structure. That is enough
for send/recv on the socket (see send_some/pgxc_node_read_data), but not
enough for encryption/decryption.
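
The following is a minimal sketch of that kind of hand-over, assuming
the descriptors travel as SCM_RIGHTS ancillary data over a Unix-domain
socket (the usual POSIX way to pass descriptors between processes). The
function names are hypothetical and are not the ones used in XL. The
point is that only the kernel-level descriptor crosses over; anything
the opener kept in user space (such as negotiated TLS state) stays
behind.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Buffer for one descriptor's worth of ancillary data, suitably aligned. */
union fd_ctl {
    struct cmsghdr hdr;
    char           buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over the Unix-domain socket chan. Returns 0 or -1. */
int send_fd(int chan, int fd)
{
    char          byte = 'F';   /* at least one byte of real data is needed */
    struct iovec  iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl  ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from chan. Returns the new descriptor or -1. */
int recv_fd(int chan)
{
    char          byte;
    struct iovec  iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl  ctl;
    struct msghdr msg;
    int           fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(chan, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/*
 * Demo: pass the write end of a pipe through a socketpair and use the
 * received descriptor with plain write(). Returns 0 on success.
 * Error paths skip cleanup to keep the sketch short.
 */
int demo_fd_passing(void)
{
    int  chan[2], pipefd[2], got;
    char buf[8] = {0};

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) != 0 || pipe(pipefd) != 0)
        return -1;
    if (send_fd(chan[0], pipefd[1]) != 0 || (got = recv_fd(chan[1])) < 0)
        return -1;

    /* Raw I/O works on the received descriptor, and that is all we got. */
    if (write(got, "ping", 4) != 4 || read(pipefd[0], buf, 4) != 4)
        return -1;

    close(got); close(pipefd[0]); close(pipefd[1]);
    close(chan[0]); close(chan[1]);
    return strcmp(buf, "ping") == 0 ? 0 : -1;
}
```

The receiver ends up with a perfectly usable descriptor for raw
reads and writes, which matches what the backends do today, but there
is nothing in this mechanism that could carry an SSL object along.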

Ideally, we would just call secure_write/secure_read (or rather
pqsecure_write/pqsecure_read [1] which is the front-end version). But
that requires more information, particularly these PGconn fields [2]
(PGconn is just a typedef of struct pg_conn):

    bool    ssl_in_use;
    SSL    *ssl;         /* SSL status, if have SSL connection */
    X509   *peer;        /* X509 cert of server */

Copying the ssl_in_use flag is trivial, of course. For the SSL part, I
have no idea how that can be done (and I don't think I've ever seen a
more complicated struct than ssl_st [3]).

Stack Overflow suggests it may be done by exporting the SSL_SESSION using
callbacks [4], but my experience with OpenSSL is rather limited.

But I guess there must be some way to do it - connection pooling is not
such a unique feature, I believe.


As I said before, this is not very high on my current TODO list. But it
seems like a fairly well isolated issue (pretty much just the connection
pooling code in poolmgr.c and pgxcnode.c), requiring basic C networking
knowledge and a bit of OpenSSL.

So if anyone wants to get their hands dirty and help with improving XL,
this seems like an ideal opportunity. I'm happy to help by reviewing
patches and discussing the approach, of course.


regards


[1]
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-secure.c#L205

[2]
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/libpq-int.h#L460

[3] https://github.com/openssl/openssl/blob/master/ssl/ssl_locl.h#L1011

[4]
https://stackoverflow.com/questions/12426246/openssl-accept-tls-connection-and-then-transfer-to-another-process

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

