How Does ScyllaDB Find the Node Containing the Data I Want?

The driver can connect to any Scylla node to perform a query. That node becomes the coordinator node for the query. The coordinator can be one of the replica nodes (the nodes holding the data), but it does not have to be.

In ScyllaDB (and Apache Cassandra), each node in the cluster is responsible for a set of token ranges.

The coordinator node hashes the partition key with the partition hash function (Murmur3 by default) to obtain a token, and uses that token to determine which nodes are responsible for the data.
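A minimal sketch of the token lookup, assuming a toy ring where each node owns the range ending at its token. `partition_token` and `TokenRing` are hypothetical names, and the hash is an MD5-based stand-in that only mimics the signed 64-bit token range; it is not ScyllaDB's Murmur3 partitioner, and real rings use many virtual nodes per server.

```python
from __future__ import annotations

import bisect
import hashlib


def partition_token(partition_key: bytes) -> int:
    """Hypothetical stand-in hash: first 8 bytes of MD5 as a signed 64-bit int.
    ScyllaDB uses Murmur3; this only mimics the token range."""
    digest = hashlib.md5(partition_key).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """Toy ring: each node owns the range (previous token, its token]."""

    def __init__(self, token_owners: dict[int, str]):
        self.tokens = sorted(token_owners)
        self.owners = token_owners

    def primary_owner(self, token: int) -> str:
        # First ring token >= the key's token; wrap around past the last one.
        i = bisect.bisect_left(self.tokens, token)
        return self.owners[self.tokens[i % len(self.tokens)]]


ring = TokenRing({
    -6_000_000_000_000_000_000: "node-a",
    0: "node-b",
    6_000_000_000_000_000_000: "node-c",
})
print(ring.primary_owner(partition_token(b"user42")))
```

Because every node that knows the ring runs the same lookup, any node can act as coordinator and still find the owner of a given partition.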

Because the partition hash function is known to the client, token-aware drivers can compute the token themselves and choose one of the replica nodes as the coordinator.

This reduces the number of network hops and the internal load on the cluster.
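The token-aware choice can be sketched as follows, assuming a hypothetical four-node ring and a SimpleStrategy-like placement where the replicas are the token's owner plus the next distinct nodes clockwise. Production clusters usually use NetworkTopologyStrategy with racks and datacenters, and `replicas_for` and `choose_coordinator` are illustrative names, not a driver API.

```python
from __future__ import annotations

import bisect

# Hypothetical 4-node ring: token -> node that owns the range ending at it.
OWNERS = {
    -4_000_000_000_000_000_000: "node-a",
    0: "node-b",
    4_000_000_000_000_000_000: "node-c",
    8_000_000_000_000_000_000: "node-d",
}
TOKENS = sorted(OWNERS)


def replicas_for(token: int, rf: int) -> list[str]:
    """Walk clockwise from the token's owner, collecting rf distinct nodes.
    Assumes SimpleStrategy-like placement (no racks or datacenters)."""
    start = bisect.bisect_left(TOKENS, token)
    replicas: list[str] = []
    for step in range(len(TOKENS)):
        node = OWNERS[TOKENS[(start + step) % len(TOKENS)]]
        if node not in replicas:
            replicas.append(node)
            if len(replicas) == rf:
                break
    return replicas


def choose_coordinator(token: int, rf: int, up_nodes: set[str]) -> str:
    """Token-aware choice: prefer a live replica so no extra hop is needed."""
    for node in replicas_for(token, rf):
        if node in up_nodes:
            return node
    if not up_nodes:
        raise RuntimeError("no live nodes")
    # Fallback: any live node can coordinate, at the cost of an extra hop.
    return sorted(up_nodes)[0]
```

Real drivers typically also spread requests across the live replicas (for example by shuffling them) instead of always picking the first one.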

Scylla shard-aware drivers further increase performance by routing the query not only to the right replica node but also to the right shard (or CPU core) within that node.
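Shard selection can be sketched as below, modeled on the mapping Scylla describes for shard-aware drivers: bias the signed token to an unsigned value, drop the top `ignore_msb` bits, then scale into the shard count. The node advertises its shard count and ignore-MSB setting to the driver during connection setup; the default of 12 here and the helper name `shard_for_token` are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def shard_for_token(token: int, shard_count: int, ignore_msb: int = 12) -> int:
    """Hypothetical helper: map a signed 64-bit token to a shard (CPU core).
    ignore_msb=12 is an assumed default; real values come from the node."""
    biased = (token + (1 << 63)) & MASK64     # [-2^63, 2^63) -> [0, 2^64)
    biased = (biased << ignore_msb) & MASK64  # discard the top ignore_msb bits
    return (biased * shard_count) >> 64       # high 64 bits -> [0, shard_count)


print(shard_for_token(123_456_789, shard_count=8))
```

Combining this with token-aware routing lets the driver send the request over a connection attached to the shard that owns the data, avoiding a cross-core hop inside the node.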

Additional Resources:

Scylla Architecture - Fault Tolerance on ScyllaDB Docs

*The question was originally asked on the user Slack channel.