Scylla is rack-aware, not AZ-aware, but if each AZ is mapped to a rack correctly via snitch config, it achieves similar behavior.
It doesn’t explicitly “detect AZ failure”, but it reacts to node-level failures via Gossip, which can reflect partial AZ failure.
Node selection for read/write depends on replica placement, consistency level, and node liveness.
- How Scylla Selects Nodes for Read/Write
When you issue a read/write at a given consistency level, Scylla:
- Uses the token ring to determine which replicas are responsible for the partition key.
- Selects nodes based on snitch-configured rack (AZ) distribution.
- Skips unreachable nodes and may fail or retry based on the consistency level.
Example:
- Replication Factor = 6
- 3 AZs → 2 replicas per AZ (assuming 1 AZ = 1 rack)
- Write with QUORUM (needs 4/6 replica acks):
- If 3 nodes are down (2 in AZ-1, 1 in AZ-2), only 3 are left.
- Write fails, because quorum (4) isn’t reachable.
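The quorum arithmetic from this example can be sketched in a few lines of Python; the RF and live-replica counts below are the illustrative values from above, not anything queried from a real cluster:

```python
# Minimal sketch of the arithmetic the coordinator applies for QUORUM.
def quorum(rf: int) -> int:
    """Quorum is a majority of the replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

rf = 6
live_replicas = 3          # 2 down in AZ-1 + 1 down in AZ-2 leaves 3 of 6 alive
required = quorum(rf)      # 6 // 2 + 1 = 4

print(f"QUORUM needs {required} of {rf} replicas")
print("write succeeds" if live_replicas >= required else "write fails (Unavailable)")
```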
- Is Scylla AZ-aware?
No, not directly. Scylla is rack-aware, and you can map AZs to racks via snitch.
You use GossipingPropertyFileSnitch and, on each node, set its AZ in cassandra-rackdc.properties:
dc=us-east-1
rack=az-1
Scylla will then try to spread replicas across racks (AZs) and avoid confining reads and writes to a single rack where possible.
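The replica spreading itself comes from NetworkTopologyStrategy at the keyspace level. A minimal sketch using the Python cassandra-driver (the scylla-driver fork exposes the same API); the contact point and the keyspace name demo_ks are placeholders:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])   # placeholder contact point
session = cluster.connect()

# NetworkTopologyStrategy places 6 replicas in DC "us-east-1"; with three
# racks (AZs) defined by the snitch, they end up roughly 2 per rack.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': 6}
""")
```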
- Handling Partial AZ Failure
Scylla handles this at the node level, not AZ-level:
Scenario:
- 6 replicas total (2 per AZ), and the following failures:
- AZ-1: 2 nodes down
- AZ-2: 1 node down
- AZ-3: fully up
Scylla reacts like this:
- QUORUM consistency:
- Needs 4 replicas.
- Only 3 reachable ⇒ write/read fails.
- LOCAL_QUORUM:
- Depends on how you're writing (e.g. LOCAL_QUORUM in DC1).
- Only contacts replicas in the local DC, so cross-AZ latency is avoided and availability may be better.
- Scylla does not retry on "other" nodes that are not in the replica list; it only retries within the replica set.
- If a write fails to get enough replica acks, it is not redirected to other nodes; it simply fails.
- Read repair and hinted handoff (if enabled) may help after recovery, not during the outage.
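From the client's point of view, an unreachable quorum surfaces as an Unavailable error rather than a silent reroute. A hedged sketch with the Python driver; the events table and the values are hypothetical:

```python
from cassandra import ConsistencyLevel, Unavailable
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])        # placeholder contact point
session = cluster.connect("demo_ks")   # keyspace from the earlier sketch

write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (%s, %s)",  # hypothetical table
    consistency_level=ConsistencyLevel.QUORUM,
)

try:
    session.execute(write, (42, "hello"))
except Unavailable as exc:
    # The coordinator knew up front that only exc.alive_replicas of the
    # exc.required_replicas were reachable, so the write was rejected.
    print(f"quorum not reachable: {exc.alive_replicas}/{exc.required_replicas}")
```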
- Does Scylla Retry or Reroute?
- No, not automatically to non-replica nodes.
- Write path is deterministic: if you need 4 replicas and only 3 are alive, the coordinator node fails the write.
- For reads:
- Scylla tries live replicas first.
- If one is down, it may speculatively retry against other live replicas for the same token range.
- But it always stays within the replica set.
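That fixed replica set can be inspected from the client, since the Python driver learns the token-to-replica mapping from the cluster. A sketch reusing the hypothetical demo_ks keyspace and events table from above:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])        # placeholder contact point
session = cluster.connect("demo_ks")

# A bound statement carries a routing key, which maps to a token and thus to
# a fixed set of replica hosts; the coordinator never goes outside this set.
prepared = session.prepare("SELECT payload FROM events WHERE id = ?")
bound = prepared.bind((42,))

for host in cluster.metadata.get_replicas("demo_ks", bound.routing_key):
    print(host.address, host.datacenter, host.rack, "up" if host.is_up else "down")
```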
- Best Practices
- Always map AZs to racks using GossipingPropertyFileSnitch.
- Prefer LOCAL_QUORUM for latency and availability.
- Avoid QUORUM across DCs unless consistency is more important than latency.
- Keep RF ≤ 3 per DC unless you're confident in your high-availability setup.
- Use Scylla Manager repair and monitoring to detect tablet/replica issues early.
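A hedged sketch of making LOCAL_QUORUM the default through the Python driver's execution profiles; the contact point is a placeholder and the DC name matches the snitch example above:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Route requests to replicas in the local DC first and default every
# statement to LOCAL_QUORUM, per the best practices above.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="us-east-1")
    ),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)

cluster = Cluster(
    ["10.0.0.1"],                                    # placeholder contact point
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("demo_ks")
```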
To summarize:
Q: If a node is down, how does Scylla select read/write nodes?
A: Scylla selects replicas responsible for the token range of the partition. If a replica is down, it skips it and fails the request if not enough live replicas are available to satisfy the consistency level.
Q: If two nodes are down in AZ-1 and one in AZ-2, would Scylla be AZ-aware?
A: Scylla isn't AZ-aware per se, but if each AZ is mapped to a separate rack via the snitch config, it behaves as if it were AZ-aware. It doesn't explicitly "know" an AZ is down, but gossip tracks node liveness. If the remaining live replicas aren't enough to satisfy the consistency level, the request fails.