Nodetool Refresh to Migrate Data

GarrettPoore · May 6, 2024, 6:04pm

Hi, I’m looking into options for migrating Scylla data in a few different ways. We have multiple clusters with customer data split across them. A few customers were initially split (for downstream “ease” of setup), but we’re now looking to merge them. I think nodetool refresh is the way we want to go with this, but there are a few scenarios we will run into.

Scenario 1:

Keyspace1 & Keyspace2, both on Cluster1
- Copy the SSTable files from Keyspace1 to Keyspace2 on each node, run refresh, drop Keyspace1

This seems fairly cut and dried. Should we still run load and stream, or is it redundant in this case? If we don’t, will we need to run nodetool cleanup afterwards?

Scenario 2:

Keyspace1 on Cluster1 (3 nodes) & Keyspace2 on Cluster2 (6 nodes)
- Copy the SSTable files across the Clusters, run refresh with load and stream, drop Keyspace1

Load and stream should handle the cluster change as far as I can tell, but should we just copy the SSTables Node1 > Node1, or would it make more sense to split them (Node1 > 50% to Node1 / 50% to Node2) to distribute the load on the cluster and speed things up? Or would there be issues with splitting the SSTables like this?

Scenario 3:

Keyspace1 on Cluster1 (just rename to new customer name)
- Create new Keyspace2 & tables, copy SSTable files, run refresh, drop Keyspace1

I assume we would still need a refresh if we are just renaming a keyspace (but not the tables), or is there some better way to handle that?

avikivity · May 7, 2024, 3:54pm

load-and-stream is preferable as it doesn’t require you to copy the data everywhere, and doesn’t require nodetool cleanup afterwards.
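
As a rough sketch of what this looks like in practice, assuming the SSTable files have already been copied into the target table's `upload` directory on the node, the snippet below only builds the `nodetool refresh` command line with the `--load-and-stream` flag; it does not run anything. The helper name `refresh_command` and the keyspace and table names are hypothetical placeholders, not something from this thread.

```python
import shlex


def refresh_command(keyspace: str, table: str, load_and_stream: bool = True) -> list[str]:
    """Build the argv for `nodetool refresh` on one table (hypothetical helper)."""
    cmd = ["nodetool", "refresh", keyspace, table]
    if load_and_stream:
        # With this flag the node streams the uploaded data to the replicas
        # that own it, rather than only loading the files locally.
        cmd.append("--load-and-stream")
    return cmd


if __name__ == "__main__":
    # "keyspace2" and "events" are placeholder names for illustration only.
    print(shlex.join(refresh_command("keyspace2", "events")))
```

The resulting list can be passed to `subprocess.run` on each node that holds uploaded files. Refresh works per table, so a keyspace with several tables needs one call per table.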

GarrettPoore · May 8, 2024, 1:27pm

Thanks for the reply, Avi. I still have a couple of usage questions, though:

- Would it be better to split the files evenly across nodes for a load-and-stream refresh, or to put them all on one node? Or does it make no difference?
- For renaming a keyspace, is there something less heavy than a full clone and refresh of the current keyspace?

avikivity · May 26, 2024, 3:30pm

It’s better to run work on all nodes.

There’s no fast alternative to rename a keyspace.
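
One way to spread the work across all nodes is to hand out whole SSTables round-robin before running refresh on each node. This is a minimal sketch under stated assumptions: the function names are hypothetical, the file names are made up, and it assumes the component name (`Data.db`, `Index.db`, and so on) is the last `-`-separated field of each file name, so that all components of one SSTable share a prefix and stay together. It balances by SSTable count, not by size.

```python
from collections import defaultdict
from itertools import cycle


def group_sstables(filenames):
    """Group component files (Data, Index, TOC, ...) that belong to one SSTable."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        # Assumption: "me-1-big-Data.db" -> prefix "me-1-big".
        prefix = name.rsplit("-", 1)[0]
        groups[prefix].append(name)
    return groups


def assign_round_robin(filenames, nodes):
    """Spread whole SSTables across nodes without splitting any SSTable's components."""
    if not nodes:
        raise ValueError("at least one target node is required")
    plan = {node: [] for node in nodes}
    node_cycle = cycle(nodes)
    for _prefix, files in sorted(group_sstables(filenames).items()):
        plan[next(node_cycle)].extend(files)
    return plan


if __name__ == "__main__":
    files = [
        "me-1-big-Data.db", "me-1-big-Index.db",
        "me-2-big-Data.db", "me-2-big-Index.db",
        "me-3-big-Data.db",
    ]
    for node, assigned in assign_round_robin(files, ["node1", "node2"]).items():
        print(node, assigned)
```

Each node's list would then be copied into that node's `upload` directory for the table before running refresh with load-and-stream there.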
