Originally from the User Slack
@Daria_Fedorova: Hi, I have a question about the Time Window Compaction Strategy (TWCS).
I need to add a TTL, but I can’t use a single TTL value for all records (probably 2 values, plus a small number of records without a TTL). I understand that it is not ideal to leave them all in one table, but shouldn’t tombstone compaction solve the problem of freeing up space?
This will probably add some strain on the disk, but are there any other problems with multiple TTLs in one table? My only alternative is to join the time series from 3 tables on the client side (thankfully, that is not too difficult).
@Felipe_Cardeneti_Mendes: You need to remember that a bucket (window) will only be deleted after all of its records expire. If you are using 2 different TTLs, then you should probably select a window size that matches your largest TTL.
For records with no TTL, you shouldn’t be using TWCS; use another strategy instead.
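To make the "a bucket is only deleted after all of its records expire" point concrete, here is a minimal sketch under a simplified model that ignores gc_grace_seconds, overlapping SSTables and shard splitting. Each write is a hypothetical (write_time, ttl) pair, and the helper names window_start and droppable_at are made up for illustration. A window can be dropped only once its longest-lived record expires, and a single record without a TTL pins the window forever.

```python
from datetime import datetime, timedelta

def window_start(ts: datetime, window: timedelta, epoch: datetime) -> datetime:
    """Align a write timestamp to the start of its (hypothetical) TWCS window."""
    return epoch + ((ts - epoch) // window) * window

def droppable_at(writes, window: timedelta, epoch: datetime):
    """Map each window start to the earliest time all of its data has expired.

    `writes` is a list of (write_time, ttl) pairs; ttl=None means "no TTL".
    A value of None in the result means the window can never be dropped.
    """
    result = {}
    for ts, ttl in writes:
        start = window_start(ts, window, epoch)
        expiry = None if ttl is None else ts + ttl
        if start not in result:
            result[start] = expiry
        elif result[start] is not None:
            # The window lives as long as its longest-lived record.
            result[start] = None if expiry is None else max(result[start], expiry)
    return result

# Usage: two TTLs mixed in one table, plus one record without a TTL.
epoch = datetime(2024, 1, 1)
window = timedelta(days=7)
writes = [
    (datetime(2024, 1, 2), timedelta(days=90)),
    (datetime(2024, 1, 3), timedelta(days=3 * 365)),
    (datetime(2024, 1, 10), timedelta(days=90)),
    (datetime(2024, 1, 11), None),
]
drops = droppable_at(writes, window, epoch)
for start, drop in sorted(drops.items()):
    print(start.date(), "->", "never" if drop is None else drop.date())
# 2024-01-01 -> 2027-01-02
# 2024-01-08 -> never
```

The first window holds mostly 90-day data, yet it stays on disk for 3 years because of one long-TTL record; the second window is never dropped at all.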
@Daria_Fedorova: I did not find anything on how to choose the window size for TWCS. I think the one we have is a compromise for fast reads over a couple of days; it may also have been chosen back when we had different disks (nobody remembers anymore). Maybe 1 week is too small a window.
The Cassandra docs say something about it, but they don’t explain the reasoning:
Ideally, operators should select a compaction_window_unit and compaction_window_size pair that produces approximately 20-30 windows - if writing with a 90 day TTL, for example, a 3 Day window would be a reasonable choice ('compaction_window_unit':'DAYS','compaction_window_size':3).
The larger TTL will be 3 or 5 years, so SSTables will grow very big. Isn’t that a problem?
If I follow the Cassandra docs, the window size for a 3-year TTL is around 36 days.
Can you explain what factors to consider when choosing a window?
I think reads are faster if they only touch 1 SSTable, and for that, a 1-week window seems reasonable for us.
What is the motivation to make it bigger?
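The "20-30 windows" guideline quoted from the Cassandra docs is simple arithmetic: divide the TTL by the target number of windows. The sketch below assumes a target of 30 windows, whole-day windows (i.e. 'compaction_window_unit':'DAYS') and 365-day years; the helper names windows_for and window_for_target are hypothetical.

```python
import math

def windows_for(ttl_days: float, window_days: float) -> int:
    """How many TWCS windows hold live data for a given TTL (rounded up)."""
    return math.ceil(ttl_days / window_days)

def window_for_target(ttl_days: float, target_windows: int = 30) -> int:
    """Smallest whole-day window size that keeps the window count at or below the target."""
    return math.ceil(ttl_days / target_windows)

for label, ttl in [("90 days", 90), ("3 years", 3 * 365), ("5 years", 5 * 365)]:
    size = window_for_target(ttl)
    print(f"TTL {label}: window of {size} days -> {windows_for(ttl, size)} windows")
```

For a 90-day TTL this yields the docs' 3-day window; for a 3-year TTL it yields 37 days (the ~36 days mentioned above, rounded up so the count stays at 30), and for 5 years, 61 days.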
@Felipe_Cardeneti_Mendes: You need to remember that SSTables will only be dropped after all data within them expires (which is why you shouldn’t use TWCS if you have records with no TTL).
If you have a 1-week window but a 3-year TTL, that will result in more than 150 windows, which may create heavy memory pressure and slow down your reads/repairs over time.
You want to strike a balance that suits your reads while keeping the number of windows low, to avoid having too many open files to satisfy a read.
For example, consider a 1-hour window and a 30-day TTL: reading all records would require scanning through 720 windows.
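The window counts above can be checked by counting how many windows a time-range read spans. This is a rough sketch that assumes every window contains data, that a read only visits windows overlapping its time range (whether SSTables can actually be skipped depends on the data model), and that windows are aligned to an arbitrary epoch; per-shard splitting is ignored. The function name windows_touched is hypothetical.

```python
from datetime import datetime, timedelta

def windows_touched(start: datetime, end: datetime, window: timedelta, epoch: datetime) -> int:
    """Count the windows a read over [start, end) spans (at least one SSTable each)."""
    first = (start - epoch) // window
    last = (end - epoch - timedelta(microseconds=1)) // window
    return last - first + 1

epoch = datetime(2024, 1, 1)
print(windows_touched(epoch, epoch + timedelta(days=30), timedelta(hours=1), epoch))      # 720
print(windows_touched(epoch, epoch + timedelta(days=3 * 365), timedelta(weeks=1), epoch))  # 157
print(windows_touched(epoch, epoch + timedelta(days=30), timedelta(weeks=1), epoch))       # 5
```

A full scan of a 3-year TTL with 1-week windows touches 157 windows, while a one-month read with the same window size touches only about 5, which is why short reads make a small window more tolerable.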
@Daria_Fedorova: OK, I see :)
That is not a big concern for us.
Most read requests cover a couple of days or a month,
and we split very long reads on the client (API) side anyway.
@Felipe_Cardeneti_Mendes: One last thing: you generally shouldn’t worry much about large SSTables. TWCS compacts everything together on a per-window basis to optimize for reads. Plus, SSTables are further split by shard.
@Daria_Fedorova: I asked a colleague, and he said they did not want a bigger window because the last window uses Size-Tiered Compaction, so with a bigger window, reading a range of a user’s most recent records would iterate over many SSTables?
Another concern was longer compactions (they affect latency a little now; I don’t know if it will get worse).
@Felipe_Cardeneti_Mendes: All windows use STCS.
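To illustrate the colleague's concern about reading recent records, here is a simplified sketch of how size-tiered compaction groups SSTables inside a window. It is modeled on the STCS bucketing rule with the commonly documented defaults bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 (MB) and min_threshold=4; real implementations add more (for example max_threshold and SSTable hotness), and the function names are hypothetical. SSTables of similar size land in the same bucket, and only buckets with at least min_threshold members get compacted, so several SSTables can coexist in a window between compactions.

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50):
    """Group SSTable sizes (in MB) into size-tiered buckets (simplified STCS rule)."""
    buckets = []  # list of (average_size, members)
    for size in sorted(sizes):
        for i, (avg, members) in enumerate(buckets):
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                members.append(size)
                buckets[i] = (sum(members) / len(members), members)
                break
        else:
            # No existing bucket fits: start a new one.
            buckets.append((size, [size]))
    return [members for _, members in buckets]

def compaction_candidates(buckets, min_threshold=4):
    """Buckets with enough similar-sized SSTables to be compacted together."""
    return [members for members in buckets if len(members) >= min_threshold]

buckets = stcs_buckets([10, 12, 11, 9, 200, 210, 800])
print(buckets)                         # [[9, 10, 11, 12], [200, 210], [800]]
print(compaction_candidates(buckets))  # [[9, 10, 11, 12]]
```

In this example only the four small SSTables qualify for compaction, so even afterwards a read over this window would still touch three SSTables.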
@Daria_Fedorova: thx
@Felipe_Cardeneti_Mendes: By the way, we are going to have a compaction talk at our upcoming ScyllaDB Summit: scylladb.com/summit
@Daria_Fedorova: Yes, I am planning to tune in.