Scylla 5.2 Load and Stream

Hello,

I am trying to understand this feature: Nodetool refresh | ScyllaDB Docs

What I don’t understand is: if you are going from 6 nodes (RF=3) to 4 nodes (RF=2), do you need to load data from all 6 nodes even though the replication factor in the 6-node cluster is 3? And if we do need to load data from all 6 nodes into the 4-node cluster, is there any risk of running out of space in the new cluster?

Excellent question. I covered Load and Stream specifics in https://www.scylladb.com/2023/09/18/5-more-intriguing-scylladb-capabilities-you-might-have-overlooked/ , so you may also want to check that out.

If you are going from 6 nodes (RF=3) to 4 nodes (RF=2), do you need to load data from all 6 nodes even though the replication factor in the 6-node cluster is 3?

Copying data from all 6 nodes does indeed seem like overkill, but the real answer is: it depends.

Are you dual-writing to both clusters? Do you expect all data present in the source cluster to match its target? Also, do you use NetworkTopologyStrategy and spread the data across 3 AZs?

If yes, then you can start dual-writing, run a repair job, and once that repair finishes, snapshot your data from a single AZ and copy it over. With RF=3 spread across 3 AZs, each AZ holds a full replica of the data, so snapshots from one AZ are enough. Both clusters should be in sync afterwards.
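
For illustration, here is a minimal sketch of that flow. The keyspace ks, table t, snapshot tag, host name, and table UUID placeholder are all hypothetical; adjust them to your own layout:

```sh
# On each source node in the chosen AZ: repair first, then snapshot
nodetool repair ks
nodetool snapshot -t migration ks

# Copy the snapshot SSTables into the target table's upload directory
# (path shown is the default Scylla data layout)
scp /var/lib/scylla/data/ks/t-<table-uuid>/snapshots/migration/* \
    target-node:/var/lib/scylla/data/ks/t-<table-uuid>/upload/

# On the target node: load the SSTables and stream each partition
# to its owning replicas, regardless of the old cluster's token layout
nodetool refresh ks t --load-and-stream
```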

If we do need to load data from all 6 nodes into the 4-node cluster, is there any risk of running out of space in the new cluster?

All SSTable data is going to get streamed to its replicas, so you may want to let compaction catch up as you go through each Load and Stream step rather than loading everything at once.
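
One way to pace this, again assuming the hypothetical ks and t names from above:

```sh
# Load one node's data, then let the target cluster settle before the next batch
nodetool refresh ks t --load-and-stream

# Watch pending compactions drain before moving on
nodetool compactionstats

# Keep an eye on disk headroom on the target nodes
df -h /var/lib/scylla
```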

@felipemendes good input. Should we add it to the docs?

Do I need to disable compaction during load and stream, and then re-enable it after each load and stream completes?

Not really. You may want to prevent tombstones from being garbage-collected though, in case their gc_grace_seconds happen to expire during the migration. You can do so by setting tombstone_gc to repair mode. See Preventing Data Resurrection with Repair Based Tombstone Garbage Collection - ScyllaDB
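
The setting is applied per table; for example, with the same hypothetical ks.t table:

```sh
# Only purge tombstones that are older than the table's last full repair
cqlsh -e "ALTER TABLE ks.t WITH tombstone_gc = {'mode': 'repair'};"
```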

@tzach, that’s definitely a good idea. 🙂 Ping me if anything comes up.