Nodetool Refresh to Migrate Data

Hi, I’m looking into options to migrate Scylla data in a few different ways. We have multiple clusters with customer data split across them. A few customers were initially split (for downstream “ease” of setup), but we’re looking to merge them now. I think nodetool refresh is the way we want to go with this, but there are a few scenarios we will run into.

Scenario 1:

  • Keyspace1 & Keyspace2, both on Cluster1
    • Copy the SSTable files from Keyspace1 to Keyspace2 on each node, run refresh, drop Keyspace1

This seems fairly cut and dried. Should we still run refresh with load and stream, or is it redundant in this case? If we don’t, will we need to run nodetool cleanup afterwards?
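For the copy step in Scenario 1, the files have to land in the target table's upload directory before refresh will pick them up. The sketch below is one way to locate that directory; it assumes the default data root `/var/lib/scylla/data` and the usual `<table>-<id>/upload` layout, and the helper name `find_upload_dir` is made up for illustration. Note that the id suffix of a table directory in Keyspace2 will not match the one in Keyspace1, so the path cannot simply be copied over.

```python
from pathlib import Path

# Assumption: default Scylla data root; adjust if your install differs.
DEFAULT_DATA_ROOT = "/var/lib/scylla/data"


def find_upload_dir(data_root, keyspace, table):
    """Return the upload/ directory of keyspace.table under data_root.

    Hypothetical helper. Table directories are named "<table>-<id>", and a
    table that was dropped and recreated can leave more than one behind, so
    anything other than exactly one match is treated as an error.
    """
    ks_dir = Path(data_root) / keyspace
    candidates = sorted(
        p for p in ks_dir.glob(f"{table}-*") if (p / "upload").is_dir()
    )
    if len(candidates) != 1:
        raise RuntimeError(
            f"expected one directory for {keyspace}.{table}, "
            f"found {len(candidates)}"
        )
    return candidates[0] / "upload"
```

If more than one directory matches, check which id the live table uses before copying anything.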

Scenario 2:

  • Keyspace1 on Cluster1 (3 nodes) & Keyspace2 on Cluster2 (6 nodes)
    • Copy the SSTable files across the Clusters, run refresh with load and stream, drop Keyspace1

Load and stream should handle the cluster change as far as I can tell, but should we just copy the SSTables Node1 > Node1, or would it make more sense to split them (Node1 > 50% to Node1 / 50% to Node2) to distribute the load on the cluster and speed things up? Or would there be issues with splitting the SSTables like this?
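Whether splitting actually speeds things up is the open question here, and the sketch below does not answer it. It only illustrates one constraint that applies if the files are split: an SSTable is a set of component files (Data, Index, TOC and so on) sharing a common name prefix, and those components need to stay together on the same node. The function names and the round-robin policy are assumptions made for illustration.

```python
from collections import defaultdict


def group_sstables(filenames):
    """Group component files by SSTable, keyed on the shared name prefix.

    Assumes names like "me-1-big-Data.db", where the part after the last
    "-" is the component and everything before it identifies the SSTable.
    """
    groups = defaultdict(list)
    for name in filenames:
        prefix, _, _component = name.rpartition("-")
        groups[prefix].append(name)
    return dict(groups)


def assign_round_robin(filenames, nodes):
    """Spread whole SSTables (never single components) across nodes."""
    groups = group_sstables(filenames)
    plan = {node: [] for node in nodes}
    for i, prefix in enumerate(sorted(groups)):
        plan[nodes[i % len(nodes)]].extend(sorted(groups[prefix]))
    return plan


files = [
    "me-1-big-Data.db", "me-1-big-Index.db",
    "me-2-big-Data.db", "me-2-big-Index.db",
]
plan = assign_round_robin(files, ["node1", "node2"])
# node1 receives both me-1 components, node2 both me-2 components.
```

Round-robin by SSTable count ignores file size; balancing by bytes would be a reasonable variation if the SSTables differ a lot in size.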

Scenario 3:

  • Keyspace1 on Cluster1 (just rename to new customer name)
    • Create new Keyspace2 & tables, copy SSTable files, run refresh, drop Keyspace1

I assume we would still need a refresh if we are just renaming a keyspace (but not the tables), or is there some better way to handle that?

load-and-stream is preferable as it doesn’t require you to copy the data everywhere, and doesn’t require nodetool cleanup afterwards.
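As a rough illustration of the difference, the sketch below builds the nodetool invocation with and without the `--load-and-stream` flag. The keyspace and table names are placeholders, and `refresh_command` is a hypothetical helper; it only assembles the argument list and does not run anything.

```python
def refresh_command(keyspace, table, load_and_stream=True):
    """Build the argument list for `nodetool refresh` (nothing is executed)."""
    cmd = ["nodetool", "refresh", keyspace, table]
    if load_and_stream:
        # With this flag the node streams each partition to its owning
        # replicas instead of keeping all of the uploaded data locally.
        cmd.append("--load-and-stream")
    return cmd


# Placeholder names; substitute the real keyspace and table.
print(" ".join(refresh_command("keyspace2", "events")))
# prints: nodetool refresh keyspace2 events --load-and-stream
```

The list form can be passed to `subprocess.run` on the node that holds the files in its upload directory.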

Thanks for the reply, Avi. I still have a few usage questions, though:

  • Would it be better to split files evenly across nodes for a load-and-stream refresh? Or put them all on one node? Or does it make no difference?
  • For renaming a keyspace, is there something less heavy than a full clone and refresh of the current keyspace?