Installation details
#ScyllaDB version: 5.2.19 (the update is planned after OS update)
#Cluster size: 9 nodes
os (RHEL/CentOS/Ubuntu/AWS AMI): CentOS 8 (mostly)
So I added a TTL to a table (TWCS) and want to manually delete old data that was written without any TTL. I've seen recommendations to write all unexpired data into a new table, but I really don't want to do that: my TTL is long and most of the data has not expired yet, so I would end up using 2x the disk space during the migration (I think?).
I was thinking about using sstablemetadata to locate sstables whose data is all older than the TTL (but was written without one) and deleting them on disk. Is this possible? Given the way TWCS is described, it should not be a problem.
Data older than the TTL is rarely read, and I don't care if a few read requests in old time windows fail, but I plan to disable read repair during this cleanup. I don't know whether I should remove a node from the cluster during the cleanup anyway?
I was reminded that read repair cannot be completely disabled (read_repair_chance only controls probabilistic repairs), so I guess that if there are any reads with CL != ONE, a deleted sstable can be partially revived by read repair.
So that is a problem
Maybe this really is not very safe )
If anyone knows of any other problems, please comment.
Deleting sstables on the disk will work, but as you noted, if any of this data is read while this is in progress, read repair will be a nuisance and indeed there is no way to disable it.
TWCS organizes data into time windows, so each sstable will cover a narrow timestamp range, which, as you noted, can be identified with sstablemetadata or scylla sstable dump-statistics.
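To make the selection rule concrete, here is a minimal Python sketch of it. It assumes you have already copied each sstable's maximum write timestamp out of the tool output by hand (it does not parse that output), that the timestamps are microseconds since the Unix epoch, and that they reflect wall-clock write time (no custom USING TIMESTAMP writes). The names to_micros, fully_expired and the sstable labels are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_micros(moment):
    """Convert an aware datetime to integer microseconds since the epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)

def fully_expired(max_timestamp_by_sstable, ttl, now=None):
    """Return sstables whose newest write is already older than now - ttl.

    max_timestamp_by_sstable maps a label you choose for each sstable to its
    maximum write timestamp in microseconds (assumed unit).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = to_micros(now - ttl)
    return sorted(
        name
        for name, max_ts in max_timestamp_by_sstable.items()
        if max_ts < cutoff
    )

if __name__ == "__main__":
    # Hypothetical values: one old sstable, one recent one, TTL of one year.
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    stats = {
        "me-1-big": to_micros(datetime(2022, 6, 1, tzinfo=timezone.utc)),
        "me-2-big": to_micros(datetime(2023, 6, 1, tzinfo=timezone.utc)),
    }
    print(fully_expired(stats, timedelta(days=365), now=now))
```

Only the maximum timestamp matters here: if the newest write in an sstable is past the TTL, everything else in it is too.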
Note that you have to take the node offline (stop it) while deleting the sstables. You can identify the sstables to delete while the node is running. Then you can do a quick rolling restart, deleting the sstables on each node while it is stopped.
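An sstable is several files on disk (Data, Index, Statistics and so on), and all of them have to go together. Below is a rough Python sketch of the delete step, to be run only while the node is stopped. It assumes the component files of one sstable share everything before the last "-" in the file name (for example me-1-big-Data.db and me-1-big-Index.db); please check that against your own data directory first. The function names and the prefix format are my own illustration, and dry_run defaults to True so nothing is removed until you have reviewed the list.

```python
from pathlib import Path

def component_files(table_dir, sstable_prefixes):
    """List every file in table_dir belonging to one of the given sstables.

    A file belongs to an sstable if its name, minus the final "-<Component>"
    part, equals one of the prefixes (assumed naming scheme).
    """
    wanted = set(sstable_prefixes)
    return sorted(
        path
        for path in Path(table_dir).iterdir()
        if path.is_file() and path.name.rsplit("-", 1)[0] in wanted
    )

def delete_sstables(table_dir, sstable_prefixes, dry_run=True):
    """Delete all components of the given sstables; return the affected files.

    With dry_run=True (the default) nothing is deleted, the list is only
    returned so it can be reviewed.
    """
    files = component_files(table_dir, sstable_prefixes)
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Running it once with dry_run=True and comparing the returned list with what you expected from the identification step is a cheap sanity check before the real deletion.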
To work around the read repair problem, you can also delete the expired data with a DELETE. I don’t know how feasible this is, i.e. how easy it is to identify the keys that are expired.
Make sure you create a backup before you do this, in case something goes wrong.