Replication Strategy Change from Simple to Network

Hi, I have several Scylla clusters that have been running in production for multiple years. Last year I realized that we still use the default SimpleStrategy for replication on all of our data. This is obviously a bad choice for production, and I have already switched over the keyspaces I could using the recommended procedure.
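
For context, the procedure I'm referring to is essentially an `ALTER KEYSPACE` to NetworkTopologyStrategy followed by a full repair. A sketch, where the keyspace name, datacenter name, and RF of 3 are all placeholders for our actual values:

```sql
-- Switch the keyspace to NetworkTopologyStrategy, keeping the same
-- replication factor (keyspace name, DC name, and RF are placeholders;
-- the DC name must match what the snitch reports).
ALTER KEYSPACE my_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};
```

followed by a full repair (e.g. `nodetool repair -pr` on every node) to make the data consistent under the new strategy. That repair is the part that blows up my downtime window.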

The problem, however, is that our largest keyspaces are hundreds of TBs, and some exploratory repairs give me an estimate of ~40 hours for the whole procedure. The same kind of estimate for our smaller keyspaces ended up running over by almost 2×, so this could take up to 80 hours. I cannot get a downtime window that long, because it would require a full company outage. So I'm exploring other options.

The main alternatives I've been able to come up with are not great, so I'm hoping there is some Scylla feature I'm missing that would make this easier, or some other method to accomplish it:

  • Create a whole new cluster and move the data over
    • Doubles cluster cost until completed
    • Requires mirroring the data, likely with a custom app, since in any normal setup multiple datacenters would handle this
    • Still requires a downtime at the end for a final sync, but likely smaller window
    • All apps can be switched over at once, fairly easily
  • Create new keyspaces on the current cluster, copy data over
    • Requires 2× the disk space
    • Process would be something like this:
      • Create new keyspace with correct settings
      • Snapshot old keyspaces
      • Copy snapshots into new keyspace
      • Run repairs
    • My assumption is that I would still need the repairs from the recommended approach, but I could run them while the normal keyspace is still in use
    • If that is true, it still requires a large final repair to finish, so it may run over on time as well
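
For the first step of that process, the new keyspace would be created up front with the correct strategy, something like this (again, all names and the RF are placeholders):

```sql
-- New keyspace created with the desired strategy from the start,
-- so the copied data never lives under SimpleStrategy.
CREATE KEYSPACE my_keyspace_nts WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'dc1': 3
};
```

How best to get the snapshotted sstables into it (e.g. `nodetool refresh` vs. load-and-stream) is part of what I'm unsure about.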

In addition to my ideas here, I got the following from Felipe on Slack:

  • Update writes to CL=ALL, run the initial repair, accept the reduced performance until the strategy switch, then set reads to CL=ALL until all data has been repaired again
    • My thought here is that this could cut the downtime roughly in half: the initial repair runs live and keeps the data consistent, leaving only the strategy switch and final repair for the downtime window
  • Migrate to new cluster with scylla-migrator (or load-and-stream with 5.1), write to both clusters, switch whenever possible
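
On Felipe's first suggestion: since consistency level is set per session or per request rather than in the schema, for us this would be a driver-side change in each app. For ad-hoc testing it can be illustrated in cqlsh:

```
cqlsh> CONSISTENCY ALL;
```

which applies CL=ALL to subsequent statements in that cqlsh session only.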