When should one use ICS vs STCS?
What is Incremental Compaction Strategy (ICS), how does it work, and what are its benefits?
ICS combines the best of Leveled Compaction Strategy (LCS) and Size-tiered Compaction Strategy (STCS), drawing on the advantages of each to get the best of both worlds.
STCS needs a lot of temporary space during compaction; a rule of thumb is to keep about 50% of the disk free. This is true for ScyllaDB as well as for other databases, such as DataStax Enterprise. LCS solves that problem, but it introduces another one: high write amplification.
ICS splits each large SSTable into an SSTable run made of fragments. Each fragment is a roughly fixed-size SSTable holding a distinct range of keys, a portion of the whole SSTable run. So instead of writing one big SSTable, ICS writes one big SSTable run composed of many small, equally sized SSTables (the fragments).
Instead of releasing space only after entire large SSTables are compacted, which leads to a worst case of 50% space amplification, ICS can release space after each fragment is compacted. This significantly reduces space amplification; a rule of thumb is to leave about 30% free disk space.
How does it work?
Let’s look at an example of compacting two SSTable runs of 7GB each, each made of 7 x 1GB SSTables: instead of writing up to 14GB into a single output SSTable file, we break the output into a run of up to 14 x 1GB fragments (the default fragment size is 1GB).
This new compaction approach takes runs as input and produces a new run as output, composed of one or more fragments. The compaction procedure is also modified to release an input fragment as soon as all of its data is safely stored in a new output fragment.
For example, when compacting two SSTables of 100GB each, the worst-case temporary space requirement with STCS would be 200GB.
ICS, on the other hand, has a worst-case requirement of roughly 2GB (with the default fragment size of 1GB) for precisely the same scenario. That’s because, with the incremental compaction approach, those two SSTables are actually two SSTable runs of 100 fragments each, making it possible to release roughly one 1GB input fragment for each new 1GB output fragment written.
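The difference can be illustrated with a toy Python model (not ScyllaDB code). It makes two simplifying assumptions: the output volume equals the input volume (no overwrites or tombstones to merge away), and the two input runs are drained evenly, with each input fragment deleted as soon as it is fully consumed.

```python
FRAGMENT_GB = 1   # default ICS fragment size
INPUT_GB = 100    # size of each of the two inputs being compacted

def stcs_peak_temp_space():
    # STCS-style: both 100GB inputs must stay on disk until the
    # single 200GB output is complete, so the temporary overhead
    # equals the total input size.
    return 2 * INPUT_GB

def ics_peak_temp_space(num_runs=2):
    # ICS-style: write one output fragment at a time, releasing an
    # input fragment as soon as all of its data has been consumed.
    consumed = [0.0] * num_runs  # GB drained from each input run
    released = [0] * num_runs    # whole fragments deleted per run
    extra = 0.0                  # temporary overhead currently on disk
    peak = 0.0
    total_output_fragments = num_runs * INPUT_GB // FRAGMENT_GB
    for _ in range(total_output_fragments):
        extra += FRAGMENT_GB                       # output fragment written
        peak = max(peak, extra)
        for r in range(num_runs):
            consumed[r] += FRAGMENT_GB / num_runs  # inputs drained evenly
            while released[r] < int(consumed[r]):  # fragment fully drained?
                released[r] += 1
                extra -= FRAGMENT_GB               # input fragment deleted
    return peak

print(stcs_peak_temp_space())  # → 200
print(ics_peak_temp_space())   # → 2.0
```

The model matches the figures above: STCS peaks at 200GB of temporary space, while ICS peaks at about two fragments' worth (one output fragment in flight plus one partially drained input fragment per run).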
How much space is actually saved?
To calculate the worst-case space requirement for ICS, multiply the maximum number of ongoing compactions by the space overhead of a single compaction job. The maximum number of ongoing compactions can be estimated by multiplying the number of shards by log4 of the disk size per shard.
As an example, on a setup with ten shards and a 1TB (1000GB) disk, the maximum number of ongoing compactions is about 33 (10 * log4(1000/10)), which results in a worst-case space requirement of 66GB (33 * 2GB). This means that 93% of the disk space can be used, given that compaction would temporarily increase usage by at most 6.6%.
Keep in mind that in practice, additional disk space should be reserved for system usage, like commitlog, for example.
Because the space overhead is a logarithmic function of the disk size, if you increase the disk size from the previous example by, say, a factor of 10 to 10TB, compaction will temporarily increase space usage by only about 1% at most (roughly 50 ongoing compactions * 2GB = about 100GB on a 10TB disk).
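The rule of thumb above can be checked with a short Python calculation. The 2GB per-compaction overhead is an assumption carried over from the earlier example (twice the default 1GB fragment size); it is not a fixed constant of ICS.

```python
import math

PER_COMPACTION_GB = 2  # assumption: ~2x the default 1GB fragment size

def worst_case_overhead_gb(shards, disk_gb):
    # max ongoing compactions ~= shards * log4(disk size per shard)
    per_shard_gb = disk_gb / shards
    max_compactions = shards * math.log(per_shard_gb, 4)
    # worst-case space overhead = compactions * per-compaction overhead
    return max_compactions * PER_COMPACTION_GB

for disk_gb in (1_000, 10_000):  # 1TB and 10TB, ten shards each
    overhead = worst_case_overhead_gb(shards=10, disk_gb=disk_gb)
    print(f"{disk_gb} GB disk: {overhead:.0f} GB ({overhead / disk_gb:.1%})")
# → 1000 GB disk: 66 GB (6.6%)
# → 10000 GB disk: 100 GB (1.0%)
```

Note how the relative overhead shrinks as the disk grows, since the compaction count grows only logarithmically with the per-shard disk size.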
To summarize, using ICS saves you money because you don’t have to set aside 50% of the total disk space for compaction. It provides the same low write amplification as STCS, and the same read amplification, even though the number of SSTables per table is higher than with STCS. So with ICS, you get the benefits of STCS without the cost of 50% of your disk space.
- Incremental Compaction Strategy in ScyllaDB Documentation
- Incremental Compaction Strategy (ICS) Deep Dive lesson on ScyllaDB University
- Maximizing Disk Utilization with Incremental Compaction blog post
- Incremental Compaction 2.0: A Revolutionary Space and Write Optimized Compaction Strategy blog post
Thanks @bhalevy .
I’ll emphasize that the 30% free disk space above is a rule of thumb.
Theoretically, systems can be pushed to 80% utilization (or higher) with certain parameters, but I’d still recommend keeping utilization in the 70s, especially taking into account other consumers of space such as snapshots and growing commitlogs (when they are kept on the same file system as the data).
Also, keep in mind that the above recommendation is for systems that have replaced all tables from STCS to ICS.
If some tables are using STCS and some are using ICS (say the compaction strategy is being changed), the calculation becomes more complex.
Also, in terms of performance, it’s best not to let disk usage get close to the limit.