Let’s distinguish 2 terms here:
- Shard Awareness: the ability of a driver to route a request to the shard that owns the data. This requires the driver to have an open connection to that shard, which is why the driver tries to open a connection to each shard of each node. When you connect to the normal port (9042), you get assigned a “random” shard (I think it’s not really random, but that’s not important here), so opening a connection to every shard can take a lot of failed attempts and a lot of time.
- Advanced Shard Awareness: the ability to open a connection to a specific shard, so that building a full connection pool is faster. This is where port 19042 is used. When you make a connection to it, Scylla looks at the source port of this connection and assigns it to shard number `source_port % amount_of_shards`. That lets the driver avoid any failed attempts when building the connection pool.
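To make the modulo rule concrete, here is a minimal Python sketch of how a client could pick a local source port that lands on a given shard. The function names and the ephemeral port range (49152-65535) are assumptions for illustration only; this is not code from any Scylla driver.

```python
def shard_for_source_port(source_port: int, shard_count: int) -> int:
    """Shard that port 19042 would assign, per the rule described above."""
    return source_port % shard_count


def pick_source_port(target_shard: int, shard_count: int,
                     low: int = 49152, high: int = 65535) -> int:
    """Return the smallest port in [low, high] that maps to target_shard.

    The port range is a hypothetical default, not something Scylla mandates.
    """
    if not 0 <= target_shard < shard_count:
        raise ValueError("target_shard must be in [0, shard_count)")
    # Offset from `low` to the next port whose remainder equals target_shard.
    port = low + (target_shard - low) % shard_count
    if port > high:
        raise ValueError("no port in range maps to the requested shard")
    return port


if __name__ == "__main__":
    shard_count = 8  # example value; the real count comes from the node
    for shard in range(shard_count):
        port = pick_source_port(shard, shard_count)
        print(f"shard {shard}: bind local port {port}")
```

A real driver would also have to handle the chosen port already being in use, for example by trying the next port with the same remainder (`port + shard_count`).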
When there is NAT between Scylla and the driver, the source port seen by the server will differ from the one the client actually uses, so Advanced Shard Awareness won’t work. Drivers that support Advanced Shard Awareness will detect this situation, fall back to port 9042, and build the connection pool the old way. Shard Awareness will still work in this situation; fully creating a session will just take longer.
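The NAT problem can be illustrated with a small Python sketch. It assumes the driver can somehow compare the shard it intended to reach with the shard it actually got; how real drivers detect this is not covered here, and the function names are hypothetical.

```python
NORMAL_PORT = 9042        # regular CQL port, shard picked by the server
SHARD_AWARE_PORT = 19042  # shard picked from the connection's source port


def landed_on_intended_shard(local_port: int, port_seen_by_server: int,
                             shard_count: int) -> bool:
    """True if the shard derived from the server-visible port matches the intent."""
    intended = local_port % shard_count
    actual = port_seen_by_server % shard_count
    return intended == actual


def choose_port(local_port: int, port_seen_by_server: int,
                shard_count: int) -> int:
    """Keep using the shard-aware port only while the mapping holds."""
    if landed_on_intended_shard(local_port, port_seen_by_server, shard_count):
        return SHARD_AWARE_PORT
    return NORMAL_PORT  # fall back and build the pool the old way


if __name__ == "__main__":
    # No NAT: the server sees the same source port the client bound.
    print(choose_port(49155, 49155, 8))
    # NAT rewrote the source port, so the modulo no longer matches.
    print(choose_port(49155, 61002, 8))
```

Note that a NAT-rewritten port can still match by coincidence for a single connection, so a robust check would look at more than one attempt.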
If I remember correctly, Java Driver 4.x supports Shard Awareness but not Advanced Shard Awareness - @piotr or @Bouncheck please correct me if I’m wrong.
So to answer your questions:
1. Scylla Java Driver 4.x is Shard Aware, but does not have Advanced Shard Awareness (for now - we plan to implement it).
2. Shard Awareness works behind NAT. Advanced Shard Awareness does not, but 4.x does not have it for now anyway.
3. Yes.
4. No, Shard Awareness does not depend on the port. Only Advanced Shard Awareness does.
5. Correct.