Hi, I’m looking into options for migrating Scylla data in a few different ways. We have multiple clusters with customer data split across them. A few customers were initially split (for downstream “ease” of setup), but we’re now looking to merge them. I think nodetool refresh is the way we want to go with this, but there are a few scenarios we will run into.
Scenario 1:
- Keyspace1 & Keyspace2, both on Cluster1
- Copy the SSTable files from Keyspace1 to Keyspace2 on each node, run refresh, drop Keyspace1
This seems fairly cut and dried. Should we still run refresh with load and stream, or is that redundant in this case? If not, will we need to run nodetool cleanup?
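For concreteness, here is roughly what I have in mind per node for Scenario 1, as a minimal Python sketch rather than a tested procedure. The helper names (stage_sstables, refresh_command) are my own, the "-big-" filename filter is my assumption about how SSTable component files are named, and I am assuming the source files come from a snapshot directory and go into the target table's upload directory.

```python
import shutil
from pathlib import Path


def stage_sstables(src_dir, dst_table_dir):
    """Copy SSTable component files from src_dir into dst_table_dir/upload.

    Assumption: component files contain "-big-" in their name, so other
    files that may sit in a snapshot directory (e.g. manifest.json) are skipped.
    """
    upload_dir = Path(dst_table_dir) / "upload"
    upload_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(Path(src_dir).iterdir()):
        if f.is_file() and "-big-" in f.name:
            target = upload_dir / f.name
            shutil.copy2(f, target)
            copied.append(target)
    return copied


def refresh_command(keyspace, table, load_and_stream=False):
    """Build the nodetool refresh command line as an argument list."""
    cmd = ["nodetool", "refresh", keyspace, table]
    if load_and_stream:
        # Only added when the question above is answered with "yes".
        cmd.append("--load-and-stream")
    return cmd
```

The returned list can be passed to subprocess.run on each node once the files are staged; whether load_and_stream should be True here is exactly what I am asking.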
Scenario 2:
- Keyspace1 on Cluster1 (3 nodes) & Keyspace2 on Cluster2 (6 nodes)
- Copy the SSTable files across the clusters, run refresh with load and stream, drop Keyspace1
Load and stream should handle the cluster change as far as I can tell, but should we just copy the SSTables Node1 > Node1, or would it make more sense to split them (Node1 > 50% to Node1 / 50% to Node2) to distribute the load across the cluster and speed things up? Or would splitting the SSTables like this cause issues?
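To make the split idea concrete, this is a rough Python sketch of how I would divide one source node's files between target nodes. It keeps all component files of a single SSTable together and hands out whole SSTables round-robin. The function names and node labels are hypothetical, and grouping on everything before the last "-" in the filename is my assumption about the naming scheme.

```python
from collections import defaultdict


def group_components(filenames):
    """Group component files by SSTable, keyed on the name up to the last '-'.

    Assumption: "me-1-big-Data.db" and "me-1-big-Index.db" belong to the
    same SSTable because they share the prefix "me-1-big".
    """
    groups = defaultdict(list)
    for name in filenames:
        prefix = name.rsplit("-", 1)[0]
        groups[prefix].append(name)
    return dict(groups)


def assign_round_robin(filenames, target_nodes):
    """Assign whole SSTables to target nodes in round-robin order."""
    groups = group_components(filenames)
    plan = {node: [] for node in target_nodes}
    for i, prefix in enumerate(sorted(groups)):
        node = target_nodes[i % len(target_nodes)]
        plan[node].extend(sorted(groups[prefix]))
    return plan
```

For example, assign_round_robin(files_from_node1, ["node1", "node2"]) would give each target node roughly half of the SSTables, without ever separating one SSTable's component files.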
Scenario 3:
- Keyspace1 on Cluster1 (just rename to new customer name)
- Create new Keyspace2 & tables, copy SSTable files, run refresh, drop Keyspace1
I assume we would still need a refresh if we are just renaming a keyspace (but not the tables), or is there some better way to handle that?