Hello! I wanted to ask for some suggestions on what compaction strategy to use for my use case.
I’m using ScyllaDB to store data generated by multiple scripts running in parallel. The scripts run for different lengths of time and finish independently of one another.
Because of that, I’ve set a default TTL of 1 day, so that when a script runs for multiple days, any of its data older than 1 day is deleted automatically.
At the same time, once a script finishes running, I have some logic that deletes everything related to that specific run with something like DELETE FROM table WHERE run_id = 'specific_run'.
My question is: which compaction strategy do you think would best fit this use case? Up until now I’ve been using TWCS with these settings:
WITH default_time_to_live = 86400
AND compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_size': 6,
'compaction_window_unit': 'HOURS',
'tombstone_compaction_interval': 3600
}
AND gc_grace_seconds = 3600;
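
For context, here is a minimal sketch of how these options fit into a full table definition. The table and column names (run_data, run_id, ts, payload) are placeholders for illustration only, and the sketch assumes run_id is the partition key, so the end-of-run cleanup is a single partition-level delete:

```cql
-- Hypothetical schema: run_data, run_id, ts and payload are placeholder names.
CREATE TABLE IF NOT EXISTS run_data (
    run_id  text,
    ts      timestamp,
    payload text,
    PRIMARY KEY ((run_id), ts)
) WITH default_time_to_live = 86400   -- 1 day, in seconds
  AND compaction = {
      'class': 'TimeWindowCompactionStrategy',
      'compaction_window_size': 6,
      'compaction_window_unit': 'HOURS',
      'tombstone_compaction_interval': 3600
  }
  AND gc_grace_seconds = 3600;

-- Cleanup issued when a run finishes (assumes run_id is the partition key).
DELETE FROM run_data WHERE run_id = 'specific_run';
```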
In the beginning I started with a 1-day time window, but I felt that compaction was not triggered frequently enough to get rid of all the tombstones created by the DELETE at the end of each script. I have since read that TWCS is apparently not well suited to manual deletes, and I’m no longer sure what to use. Can you please help me out with a suggestion?