When should one use ICS vs STCS?
What is Incremental Compaction Strategy (ICS), how does it work, and what are its benefits?
ICS combines the best of Leveled Compaction Strategy (LCS) and Size-tiered Compaction Strategy (STCS), drawing on the advantages of each to get the best of both worlds.
STCS needs a lot of temporary space during compaction; a rule of thumb is to keep about 50% of the disk free. This is true for ScyllaDB as well as for other databases, such as DataStax Enterprise. LCS solves that problem, but it introduces another one: high write amplification.
ICS splits each large SSTable into an SSTable run made of fragments. Each fragment is a roughly fixed-size SSTable holding a distinct range of keys, a portion of the whole SSTable run. So instead of writing one big SSTable, ICS writes one big SSTable run composed of many small, equally sized SSTables (the fragments).
Instead of releasing space only after entire large SSTables are compacted, which leads to a worst case of 50% space amplification, ICS can release space after each fragment is compacted. This significantly reduces space amplification; a rule of thumb is to leave about 30% free disk space.
How does it work?
Let’s look at an example of compacting two SSTable runs of 7GB each, each made of 7 x 1GB SSTables: instead of writing up to 14GB into a single output SSTable file, we break the output into a run of up to 14 x 1GB fragments (the default fragment size is 1GB).
This new compaction approach takes runs as input and produces a new run as output, composed of one or more fragments. The compaction procedure is also modified to release an input fragment as soon as all of its data is safely stored in a new output fragment.
For example, when compacting two SSTables of 100GB each, the worst-case temporary space requirement with STCS would be 200GB.
ICS, on the other hand, has a worst-case requirement of roughly 2GB (with the default fragment size of 1GB) for precisely the same scenario. That’s because, with the incremental compaction approach, those two SSTables are actually two SSTable runs of 100 fragments each, making it possible to release roughly one 1GB input fragment for each new 1GB output fragment written.
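The difference can be illustrated with a toy Python model (not ScyllaDB code). It makes two simplifying assumptions: the output volume equals the input volume (no overwrites or tombstones to merge away), and the two input runs are drained evenly, with each input fragment deleted as soon as it is fully consumed.

```python
FRAGMENT_GB = 1   # default ICS fragment size
INPUT_GB = 100    # size of each of the two inputs being compacted

def stcs_peak_temp_space():
    # STCS-style: both 100GB inputs must stay on disk until the
    # single 200GB output is complete, so the temporary overhead
    # equals the total input size.
    return 2 * INPUT_GB

def ics_peak_temp_space(num_runs=2):
    # ICS-style: write one output fragment at a time, releasing an
    # input fragment as soon as all of its data has been consumed.
    consumed = [0.0] * num_runs  # GB drained from each input run
    released = [0] * num_runs    # whole fragments deleted per run
    extra = 0.0                  # temporary overhead currently on disk
    peak = 0.0
    total_output_fragments = num_runs * INPUT_GB // FRAGMENT_GB
    for _ in range(total_output_fragments):
        extra += FRAGMENT_GB                       # output fragment written
        peak = max(peak, extra)
        for r in range(num_runs):
            consumed[r] += FRAGMENT_GB / num_runs  # inputs drained evenly
            while released[r] < int(consumed[r]):  # fragment fully drained?
                released[r] += 1
                extra -= FRAGMENT_GB               # input fragment deleted
    return peak

print(stcs_peak_temp_space())  # → 200
print(ics_peak_temp_space())   # → 2.0
```

The model matches the figures above: STCS peaks at 200GB of temporary space, while ICS peaks at about two fragments' worth (one output fragment in flight plus one partially drained input fragment per run).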
How much space is actually saved?
To calculate the worst-case space requirement for ICS, multiply the maximum number of ongoing compactions by the space overhead of a single compaction job. The maximum number of ongoing compactions can be estimated by multiplying the number of shards by log4 of the disk size per shard.
As an example, on a setup with ten shards and a 1TB (1000GB) disk, the maximum number of ongoing compactions is about 33 (10 * log4(1000/10)), which results in a worst-case space requirement of 66GB (33 * 2GB). This means that 93% of the disk space can be used, given that compaction would temporarily increase usage by at most 6.6%.
Keep in mind that in practice, additional disk space should be reserved for system usage, like commitlog, for example.
Because the space overhead is a logarithmic function of the disk size, if you increase the disk size from the previous example by, say, a factor of 10 to 10TB, compaction will temporarily increase space usage by only about 1% at most (roughly 50 ongoing compactions * 2GB = about 100GB on a 10TB disk).
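The rule of thumb above can be checked with a short Python calculation. The 2GB per-compaction overhead is an assumption carried over from the earlier example (twice the default 1GB fragment size); it is not a fixed constant of ICS.

```python
import math

PER_COMPACTION_GB = 2  # assumption: ~2x the default 1GB fragment size

def worst_case_overhead_gb(shards, disk_gb):
    # max ongoing compactions ~= shards * log4(disk size per shard)
    per_shard_gb = disk_gb / shards
    max_compactions = shards * math.log(per_shard_gb, 4)
    # worst-case space overhead = compactions * per-compaction overhead
    return max_compactions * PER_COMPACTION_GB

for disk_gb in (1_000, 10_000):  # 1TB and 10TB, ten shards each
    overhead = worst_case_overhead_gb(shards=10, disk_gb=disk_gb)
    print(f"{disk_gb} GB disk: {overhead:.0f} GB ({overhead / disk_gb:.1%})")
# → 1000 GB disk: 66 GB (6.6%)
# → 10000 GB disk: 100 GB (1.0%)
```

Note how the relative overhead shrinks as the disk grows, since the compaction count grows only logarithmically with the per-shard disk size.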
To summarize, using ICS saves you money because you don’t have to set aside 50% of the total disk space for compaction. It provides the same low write amplification as STCS, and the same read amplification, even though the number of SSTables per table is higher than with STCS. So with ICS, you get the benefits of STCS without the cost of 50% of your disk space.
- Incremental Compaction Strategy in ScyllaDB Documentation
- Incremental Compaction Strategy (ICS) Deep Dive lesson on ScyllaDB University
- Maximizing Disk Utilization with Incremental Compaction blog post
- Incremental Compaction 2.0: A Revolutionary Space and Write Optimized Compaction Strategy blog post
Thanks @bhalevy .
I’ll emphasize that the 30% free disk space above is a rule of thumb.
Theoretically, systems can be pushed to 80% utilization (or higher) with certain parameters, but I’d still recommend keeping utilization in the 70s, especially taking into account other consumers of space such as snapshots and growing commitlogs (when they are kept on the same file system as the data).
Also, keep in mind that the above recommendation is for systems that have replaced all tables from STCS to ICS.
If some tables are using STCS and some are using ICS (say the compaction strategy is being changed), the calculation becomes more complex.
Also, in terms of performance, it’s best not to let disk usage get close to the limit.