Deleting old sstables after ttl change

Installation details
#ScyllaDB version: 5.2.19 (an upgrade is planned after the OS update)
#Cluster size: 9 nodes
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS 8 (mostly)

So I added a TTL to a table (TWCS) and want to manually delete old data that was written without any TTL. I’ve seen recommendations to copy all unexpired data into a new table, but I really don’t want to do that: my TTL is long and most of the data has not expired yet, so I would end up using about 2x the disk space during the migration (I think?).

I was thinking about using sstablemetadata to locate sstables whose data is all older than the TTL (but was written without a TTL) and deleting them on disk. Is this possible? Given the way TWCS is described, it should not be a problem.
Data older than the TTL is rarely read, and I don’t care if a few read requests in old time windows fail, but I plan to disable read repair during this cleanup. I don’t know whether I should remove a node from the cluster during the cleanup anyway?

I was reminded that read repair cannot be completely disabled (read_repair_chance only controls probabilistic repairs), so I guess that if there are any reads with CL != ONE, a deleted sstable can be partially revived by read repair.

So that is a problem

Maybe this really is not very safe )

If someone knows of any other problems, please comment.

Deleting sstables on the disk will work, but as you noted, if any of this data is read while this is in progress, read repair will be a nuisance and indeed there is no way to disable it.

TWCS organizes data into windows, so each sstable will cover a narrow timestamp range, which, as you noted, can be identified with sstablemetadata or scylla sstable dump-statistics.
Note that you have to take the node offline (stop it) while deleting the sstables. You can identify the sstables to delete while the node is still running. Then do a quick rolling round across the cluster: stop a node, delete its sstables, start it again, and move on to the next node.
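The selection step can be sketched in Python. This is only a rough illustration, not a tested procedure: it assumes you have already read the maximum write timestamp of each sstable from the sstablemetadata or scylla sstable dump-statistics output by hand, and that write timestamps are microseconds since the Unix epoch (the usual default, but not guaranteed if clients set their own timestamps). The names `to_micros` and `fully_expired` and the sstable labels are made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(moment):
    """Convert an aware datetime to microseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(microseconds=1)


def fully_expired(max_ts_by_sstable, ttl, now=None):
    """Return the sstables whose newest write is older than now - ttl.

    max_ts_by_sstable maps an sstable label to its maximum write
    timestamp in microseconds (assumption: copied manually from the
    metadata tool output).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = to_micros(now - ttl)
    # Only sstables whose *newest* cell is past the cutoff are safe candidates.
    return sorted(
        name for name, max_ts in max_ts_by_sstable.items() if max_ts < cutoff
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    stats = {
        "sstable-old": to_micros(datetime(2023, 11, 1, tzinfo=timezone.utc)),
        "sstable-new": to_micros(datetime(2023, 12, 15, tzinfo=timezone.utc)),
    }
    print(fully_expired(stats, timedelta(days=30), now))
```

Comparing against the maximum timestamp (not the minimum) matters: an sstable is only a candidate if every write in it is older than the cutoff.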

To work around the read repair problem, you can also delete the expired data with a DELETE. I don’t know how feasible this is, i.e. how easy it is to identify the keys that are expired.
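If the expired keys can be enumerated, the DELETE route could look roughly like the sketch below. It only builds the CQL text for a statement with `?` bind markers; the keyspace, table and column names are hypothetical, and the function `delete_statement` is made up for this example. Executing it (for instance as a prepared statement bound once per expired key) is left to whatever driver you already use.

```python
def delete_statement(keyspace, table, key_columns):
    """Build a parameterized CQL DELETE restricted by the given key columns.

    All identifiers here are placeholders; substitute your own schema.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    where = " AND ".join(f"{column} = ?" for column in key_columns)
    return f"DELETE FROM {keyspace}.{table} WHERE {where}"


if __name__ == "__main__":
    # Hypothetical schema: partition key "id", clustering key "day".
    print(delete_statement("ks", "events", ["id", "day"]))
```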

Make sure you create a backup before you do this, in case something goes wrong.
