Scylla is rack-aware, not AZ-aware, but if each AZ is mapped to a rack correctly via snitch config, it achieves similar behavior.
It doesn’t explicitly “detect AZ failure”, but it reacts to node-level failures via Gossip, which can reflect partial AZ failure.
Node selection for read/write depends on replica placement, consistency level, and node liveness.
- How Scylla Selects Nodes for Read/Write
When you issue a read/write at a given consistency level, Scylla:
- Uses the token ring to determine which replicas are responsible for the partition key.
- Selects nodes based on snitch-configured rack (AZ) distribution.
- Skips unreachable nodes and may fail or retry based on the consistency level.
Example:
- Replication Factor = 6
- 3 AZs → 2 replicas per AZ (assuming 1 AZ = 1 rack)
- Write with QUORUM (needs 4/6 replica acks):
- If 3 nodes are down (2 in AZ-1, 1 in AZ-2), only 3 are left.
- Write fails, because quorum (4) isn’t reachable.
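The quorum arithmetic from this example can be sketched in a few lines of Python; the RF and live-replica counts below are the illustrative values from above, not anything queried from a real cluster:

```python
# Minimal sketch of the arithmetic the coordinator applies for QUORUM.
def quorum(rf: int) -> int:
    """Quorum is a majority of the replication factor: floor(rf / 2) + 1."""
    return rf // 2 + 1

rf = 6
live_replicas = 3          # 2 down in AZ-1 + 1 down in AZ-2 leaves 3 of 6 alive
required = quorum(rf)      # 6 // 2 + 1 = 4

print(f"QUORUM needs {required} of {rf} replicas")
print("write succeeds" if live_replicas >= required else "write fails (Unavailable)")
```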
- Is Scylla AZ-aware?
No, not directly. Scylla is rack-aware, and you can map AZs to racks via snitch.
You use GossipingPropertyFileSnitch and, on each node, set its AZ in cassandra-rackdc.properties:
dc=us-east-1
rack=az-1
Scylla will then try to spread replicas across racks (AZs) and avoid confining reads and writes to a single rack where possible.
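The replica spreading itself comes from NetworkTopologyStrategy at the keyspace level. A minimal sketch using the Python cassandra-driver (the scylla-driver fork exposes the same API); the contact point and the keyspace name demo_ks are placeholders:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])   # placeholder contact point
session = cluster.connect()

# NetworkTopologyStrategy places 6 replicas in DC "us-east-1"; with three
# racks (AZs) defined by the snitch, they end up roughly 2 per rack.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': 6}
""")
```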
- Handling Partial AZ Failure
Scylla handles this at the node level, not AZ-level:
Scenario:
- 6 replicas total (2 per AZ), and the following failures:
- AZ-1: 2 nodes down
- AZ-2: 1 node down
- AZ-3: fully up
Scylla reacts like this:
- QUORUM consistency:
- Needs 4 replicas.
- Only 3 reachable ⇒ write/read fails.
- LOCAL_QUORUM:
- Depends on how you're writing (e.g. LOCAL_QUORUM in DC1).
- Only contacts replicas in the local DC, so cross-AZ latency is avoided and availability may be better.
- Scylla does not retry on "other" nodes that are not in the replica list; it only retries within the replica set.
- If a write fails to get enough replica acks, it is not redirected to other nodes; it simply fails.
- Read repair and hinted handoff (if enabled) may help after recovery, not during the outage.
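From the client's point of view, an unreachable quorum surfaces as an Unavailable error rather than a silent reroute. A hedged sketch with the Python driver; the events table and the values are hypothetical:

```python
from cassandra import ConsistencyLevel, Unavailable
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])        # placeholder contact point
session = cluster.connect("demo_ks")   # keyspace from the earlier sketch

write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (%s, %s)",  # hypothetical table
    consistency_level=ConsistencyLevel.QUORUM,
)

try:
    session.execute(write, (42, "hello"))
except Unavailable as exc:
    # The coordinator knew up front that only exc.alive_replicas of the
    # exc.required_replicas were reachable, so the write was rejected.
    print(f"quorum not reachable: {exc.alive_replicas}/{exc.required_replicas}")
```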
- Does Scylla Retry or Reroute?
- No, not automatically to non-replica nodes.
- Write path is deterministic: if you need 4 replicas and only 3 are alive, the coordinator node fails the write.
- For reads:
- Scylla tries live replicas first.
- If one is down, it may speculatively retry against other live replicas for the same token range.
- But it always stays within the replica set.
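That fixed replica set can be inspected from the client, since the Python driver learns the token-to-replica mapping from the cluster. A sketch reusing the hypothetical demo_ks keyspace and events table from above:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1"])        # placeholder contact point
session = cluster.connect("demo_ks")

# A bound statement carries a routing key, which maps to a token and thus to
# a fixed set of replica hosts; the coordinator never goes outside this set.
prepared = session.prepare("SELECT payload FROM events WHERE id = ?")
bound = prepared.bind((42,))

for host in cluster.metadata.get_replicas("demo_ks", bound.routing_key):
    print(host.address, host.datacenter, host.rack, "up" if host.is_up else "down")
```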
- Best Practices
- Always map AZs to racks using GossipingPropertyFileSnitch.
- Prefer LOCAL_QUORUM for latency and availability.
- Avoid QUORUM across DCs unless consistency is more important than latency.
- Keep RF ≤ 3 per DC unless you're confident in your high-availability setup.
- Use Scylla Manager repair and monitoring to detect tablet/replica issues early.
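A hedged sketch of making LOCAL_QUORUM the default through the Python driver's execution profiles; the contact point is a placeholder and the DC name matches the snitch example above:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Route requests to replicas in the local DC first and default every
# statement to LOCAL_QUORUM, per the best practices above.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="us-east-1")
    ),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)

cluster = Cluster(
    ["10.0.0.1"],                                    # placeholder contact point
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("demo_ks")
```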
To summarize:
Q: If a node is down, how does Scylla select read/write nodes?
A: Scylla selects replicas responsible for the token range of the partition. If a replica is down, it skips it and fails the request if not enough live replicas are available to satisfy the consistency level.
Q: If two nodes are down in AZ-1 and one in AZ-2, would Scylla be AZ-aware?
A: Scylla isn't AZ-aware per se, but if each AZ is mapped to a separate rack via the snitch config, it behaves as if it were AZ-aware. It doesn't explicitly "know" an AZ is down, but gossip tracks node liveness. If the remaining live replicas aren't enough to satisfy the consistency level, the request fails.