Compaction strategy best for deletion of a large number of records

Hello! I wanted to ask for some suggestions on what compaction strategy to use for my use case.
I’m using ScyllaDB to store data generated by multiple scripts running in parallel. The scripts finish independently and run for different lengths of time.
Because of that, I’ve set a default TTL of 1 day, so any data older than 1 day (e.g. from a script that keeps running for multiple days) is automatically deleted.
At the same time, once a script finishes running, I have some cleanup logic that deletes everything related to that specific run, with something like DELETE FROM table WHERE run_id = 'specific_run'.
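For reference, the schema looks roughly like this (names simplified) — run_id is the partition key, so that DELETE writes a single partition tombstone rather than many row tombstones:

```sql
-- Simplified sketch of the table; run_id as partition key
-- means the per-run cleanup is one partition-level delete.
CREATE TABLE results (
    run_id  text,
    ts      timestamp,
    payload blob,
    PRIMARY KEY (run_id, ts)
);

DELETE FROM results WHERE run_id = 'specific_run';
```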

My question is what kind of compaction strategy do you think would best fit this use case? Up until now I’ve been using TWCS along with these settings:

WITH default_time_to_live = 86400
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_size': 6,
    'compaction_window_unit': 'HOURS',
    'tombstone_compaction_interval': 3600
  }
  AND gc_grace_seconds = 3600;

In the beginning I used a 1-day time window, but compaction wasn’t triggered frequently enough to get rid of all the tombstones created by the DELETE at the end of each script. I’ve since read that TWCS apparently doesn’t handle manual deletes well, and I’m no longer sure what to use. Can you please help me out with a suggestion? :smiley:

Tombstones are challenging with every compaction strategy; there is no single “best” one for delete-heavy workloads.
That said, I would recommend the generalist STCS: TWCS handles explicit deletions very poorly, and LCS is only adequate for mostly-read workloads.
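As a starting point, you could switch the table to STCS while keeping your tombstone-oriented settings, something like the sketch below (keyspace/table names are placeholders, and the values shown are your current ones plus the default 0.2 tombstone ratio — tune them for your workload; keep gc_grace_seconds longer than your repair interval):

```sql
-- Switch to STCS; 'tombstone_threshold' lets a single SSTable be
-- compacted on its own once ~20% of its data is droppable tombstones,
-- and 'tombstone_compaction_interval' rate-limits those compactions.
ALTER TABLE my_keyspace.my_table
  WITH compaction = {
    'class': 'SizeTieredCompactionStrategy',
    'tombstone_threshold': 0.2,
    'tombstone_compaction_interval': 3600
  }
  AND gc_grace_seconds = 3600;
```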
