Inter-DC Replication Latency with Strong Consistency

Installation details: small cluster, 2 DCs, 3 nodes in each DC, RF 2
#ScyllaDB version: 2025.2
#Cluster size: 2 x 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 22.04 LTS

Hi,

Assume there are 2 DCs with 100 ms latency between them and 3 nodes per DC. All nodes are created equal (same hardware, connectivity, load).

The latency inside each DC is 0.5 ms.

Let's say, for simplicity, that the given link latencies are constant and all nodes are healthy. What would be the expected replication times across the whole cluster, especially between the DCs? The cases of interest are EACH_QUORUM and ALL.

I'd be glad for any answer, hint, or even wild speculation.

Thanks,

Adi

Well, there's a reason for asking...

A while back, on a similar project, Mongo was giving the team serious headaches: replication backlogs grew from a few seconds to sometimes even minutes within the same datacenter, and even longer to the remote one. (The setup was very similar: 2 DCs with 3 nodes each in a replica set (1 master, 2 replicas), moderate load, healthy nodes.) Never again!

I just want to make sure we don't get into similar hot water with Scylla.

Thanks

Adi

When you say RF=2, do you mean one replica in each DC or two in each DC?

Either way, if you are using QUORUM or EACH_QUORUM, Scylla will write to both DCs before acknowledging to the application. So it isn't replication lag; the write itself will be slower.
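To make that concrete: the consistency level is set per statement (or as a session default) in the driver, so it directly determines when the coordinator acknowledges your write. Here is a minimal sketch with the Python driver; the contact point, keyspace, and table are made up, and the only point is that an EACH_QUORUM write cannot return before the remote DC has answered, so the WAN latency becomes part of the write call itself.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Contact point, keyspace and table are made up for illustration.
cluster = Cluster(["10.0.0.1"])
session = cluster.connect("app")

local_write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,  # 2 of 3 acks, local DC only
)
global_write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.EACH_QUORUM,   # a quorum in *every* DC
)

# Both calls send the mutation to all replicas; they differ only in
# how many acks the coordinator waits for before returning.
session.execute(local_write, (1, "returns after local acks"))
session.execute(global_write, (2, "also waits for the remote DC"))
```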

Also note - Scylla does not have a binary log.

It attempts every write to all replicas at the time of execution. If some replicas can't be written to but aren't required to meet the consistency level, and hints are enabled, a hint will be generated (up to a point).

ScyllaDB uses repair and read repair as anti-entropy mechanisms to re-establish consistency, rather than replaying a replication backlog as you're imagining.

I recommend taking a look at Scylla U for consistency levels and anti-entropy.

Hi Patrick,

thanks a lot for your response !

RF=2 was a typo on my side. We go with RF=3 so a quorum can be reached. Sorry for the confusion.

So it's 2 DCs, 3 nodes each, with the replication factor set to 3 in each datacenter, so we can reach LOCAL_QUORUM quite fast (which is what we need most of the time).
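In other words, the keyspace replication is meant to look roughly like this (just a sketch; the keyspace name and the DC names are placeholders for whatever the cluster actually reports):

```python
from cassandra.cluster import Cluster

# Made-up contact point; 'dc1'/'dc2' must match the datacenter names
# the cluster actually reports (e.g. in nodetool status).
cluster = Cluster(["10.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc1': 3,
        'dc2': 3
    }
""")
```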

So, presuming everything runs OK, a write will be sent immediately to all nodes holding a replica (in our case all 6), and the write attempts in the remote DC will start right after arrival there, as if the write had been issued there?

Best Regards,

Adi

RF=3 in both DCs, great!

With LOCAL_QUORUM, 2 out of 3 replicas in the local DC need to ACK the write for it to be considered successful. Writes are sent to all replicas immediately.

For the remote DC, there is an optimization: the write is sent once over the WAN to a coordinator in the remote DC, which then replicates it to the other two replicas there.
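As a very rough back-of-envelope using your numbers (treating them as constant one-way network delays, ignoring disk, CPU and queueing, and assuming remote replicas reply straight back to the original coordinator), the floor on write latency looks something like this:

```python
# Rough back-of-envelope, not a benchmark. Assumes the quoted latencies
# are constant one-way delays and ignores everything except the network.
WAN_ONE_WAY = 0.100   # seconds, DC1 <-> DC2
LAN_ONE_WAY = 0.0005  # seconds, inside a DC

# LOCAL_QUORUM: coordinator -> 2 local replicas -> coordinator.
local_quorum = 2 * LAN_ONE_WAY

# EACH_QUORUM / ALL: the slowest required ack comes from the remote DC:
# coordinator -> remote coordinator (WAN) -> remote replica (LAN),
# then that replica's ack travels back over the WAN.
remote_ack = WAN_ONE_WAY + LAN_ONE_WAY + WAN_ONE_WAY
each_quorum = max(local_quorum, remote_ack)

print(f"LOCAL_QUORUM      ~ {local_quorum * 1000:.1f} ms")
print(f"EACH_QUORUM / ALL ~ {each_quorum * 1000:.1f} ms")
```

If the 100 ms you quoted is a round trip rather than one-way, halve the WAN terms; either way, with EACH_QUORUM or ALL the WAN dominates the write latency, while LOCAL_QUORUM stays in sub-millisecond territory.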

In the local DC, say replica 3 was down and missed a write. If a read later comes along at LOCAL_QUORUM and replica 3 is one of the two replicas returning results, the other replicas, 1 and 2, have the current data. The difference will be discovered during the LOCAL_QUORUM read, and read repair will be triggered to fix up replica 3. Alternatively, regular scheduled repair would catch it up.
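From the application side nothing special is needed for that; a LOCAL_QUORUM read such as the sketch below (made-up contact point and table again) just returns the reconciled, newest value, and fixing up the stale replica happens behind the scenes:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Made-up contact point, keyspace and table, just to show the read side.
cluster = Cluster(["10.0.0.1"])
session = cluster.connect("app")

read = SimpleStatement(
    "SELECT payload FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,  # 2 of 3 local replicas
)
row = session.execute(read, (1,)).one()
# If the contacted replicas disagree, the coordinator reconciles the result
# and triggers read repair on the stale replica; the application simply
# sees the newest value.
print(row.payload if row else "not found")
```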

If writes to some replicas failed, those replicas would be caught up via regular repairs.

Thanks for the clarification!

I get the optimization with the coordinator in the remote DC. Data travels only once over the WAN.

Just one small question: is this coordinator a specific node, or is it chosen randomly? And what happens if the coordinator does not respond?

There can be write failures. If you have hints enabled, the coordinator can generate hints (there are limits to this), and they can be replayed later. The backstop that restores full consistency is regular repairs.