Originally from the User Slack
@Terence_Liu: Have been reading this doc https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/add-node-to-cluster.html. By my understanding, if I've just ingested data into a big single-node cluster and backed up the relevant keyspace/tables to the cloud, I should restore that backup to one node in the prod cluster, add two more nodes to make it RF=3, and wait for streaming from the first node to the other two.
Can I do this instead - restore the same backup to all three nodes, and boot them up together to avoid the streaming process? Assuming all three nodes have identical data, this should be possible?
I understand it’s harder when the original backup spans more than one host, because it’s much harder to know how the key ranges are distributed. I assume nodetool refresh will help in this case?
A ScyllaDB node will ignore any partitions in the SSTables that are not assigned to it, for example when the SSTables were copied from a different node.
@Pete_Aven: Hi Terence, you should be able to boot up an empty 3 node cluster, then follow the restore process for a table. It consists of:
• Create table schema on empty cluster
• Copy all the table data into the table's upload directory - usually /var/lib/scylla/data/keyspacename/tablename-uuid/upload/ (copy the table data from the single node to each node in the new cluster - a 1:3 copy)
• Then run nodetool refresh -- keyspacename tablename
https://opensource.docs.scylladb.com/stable/operating-scylla/nodetool-commands/refresh.html
This ingests the backup into a running cluster, which can even be serving traffic (especially writes) while the refresh runs.
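The restore steps above can be sketched as a shell script. All names here (keyspace, table, backup path, schema.cql, the table directory's uuid) are hypothetical placeholders, not taken from the thread; by default the script only records and prints the commands, so nothing runs against a live node unless you set DRY_RUN=0.

```shell
#!/usr/bin/env bash
# Sketch of the per-node restore steps above. All names are placeholders --
# substitute your own keyspace, table, and backup location. With DRY_RUN=1
# (the default) commands are only recorded and printed, not executed.
set -euo pipefail
DRY_RUN="${DRY_RUN:-1}"
CMDS=""
run() { CMDS+="+ $*"$'\n'; [ "${DRY_RUN}" = "1" ] || "$@"; }

KEYSPACE="mykeyspace"                               # placeholder
TABLE="mytable"                                     # placeholder
BACKUP_DIR="/mnt/backup/${KEYSPACE}/${TABLE}"       # wherever the backup lives
UPLOAD_DIR="/var/lib/scylla/data/${KEYSPACE}/${TABLE}-uuid/upload"  # uuid is per-table

# 1. Recreate the table schema on the empty cluster (schema.cql is assumed
#    to hold the CREATE KEYSPACE / CREATE TABLE statements from the backup).
run cqlsh -f schema.cql

# 2. Copy the backed-up SSTable files into the upload directory
#    (repeat this copy on every node in the new cluster -- the 1:3 copy).
run cp "${BACKUP_DIR}"/* "${UPLOAD_DIR}/"

# 3. Load everything in upload/ into the running node; this is node-local.
run nodetool refresh -- "${KEYSPACE}" "${TABLE}"

printf '%s' "${CMDS}"
```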
@Terence_Liu: Thank you Pete. Do I need to execute nodetool refresh once on each of the nodes, or will running it once on any node cause every node to load the SSTables from its upload folder?
Hi @Pete_Aven. If I do a 1:3 copy (RF=3), will this speed up ingestion by bypassing the load-and-stream process? Or will it actually create more load because each node needs to duplicate the streams to other nodes over essentially the same data?
@Pete_Aven: @Terence_Liu Following the suggestion above, there should be no streaming. You copy the table data over to each node, and you run nodetool refresh on each node; all refresh operations are node-local. Then you run repair when all that's done, and the cluster will be in sync.
@Terence_Liu: thank you!