How does the number of levels affect Leveled Compaction Strategy (LCS)?

I understand how LCS works in databases like ScyllaDB, RocksDB, Cassandra, and so on.
I also know that different algorithms use different numbers of levels for compaction.
How does the number of levels affect the compaction?
Is it possible to use just two levels?

*Based on a question originally asked on Stack Overflow by Bishnu*

LCS solves the space amplification problem of Size Tiered Compaction Strategy (STCS). It also reduces read amplification (the average number of disk reads required per read request).

LCS divides the small SSTables (“fragments”) into levels:

- Level 0 (L0) includes the new SSTables recently flushed from the Memtables. As their number grows (and reads become slower), the goal is to move the SSTables from this level to the next levels (L1, L2, L3, and so on).
- Each of the other levels is a single run of exponentially increasing size: L1 is a run of 10 SSTables, L2 is a run of 100 SSTables, L3 is a run of 1000 SSTables, and so on.

A factor of 10 is the default setting in both Scylla and Apache Cassandra.
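To make the exponential growth concrete, here is a small illustrative sketch (the function name and fixed-size-SSTable assumption are mine, not ScyllaDB or Cassandra internals) of how many SSTables each level can hold with the default fan-out of 10:

```python
# Illustrative sketch: SSTable counts per level with a fan-out of 10.
# Assumes all SSTables have the same fixed size; L0 is not size-bounded this way.
FANOUT = 10

def sstables_in_level(level: int) -> int:
    """Maximum number of SSTables in level L1, L2, ..., Ln."""
    return FANOUT ** level

for level in range(1, 4):
    print(f"L{level}: up to {sstables_in_level(level)} SSTables")
# L1: up to 10 SSTables
# L2: up to 100 SSTables
# L3: up to 1000 SSTables
```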

While the space amplification problem is solved, or at least significantly improved, LCS exacerbates another problem, write amplification.

Write amplification is the number of bytes we had to write to disk for each byte of newly flushed SSTable data. Write amplification is always greater than 1.0 because we write each data item to the commit log and then write it again to an SSTable. Every time compaction involves this data item and copies it to a new SSTable, that is another write.
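A rough back-of-envelope estimate (my own simplified model, not a measured figure) shows how write amplification grows with the number of levels: each byte is written once to the commit log, once when flushed to L0, and then, each time it is promoted from one level to the next, it can be rewritten roughly fan-out times, because the promoted data overlaps that many SSTables in the next level:

```python
# Simplified worst-case estimate of LCS write amplification.
# Assumptions: fixed fan-out, and every promotion from L(k) to L(k+1)
# rewrites ~fanout overlapping SSTables in the destination level.
def lcs_write_amplification(levels: int, fanout: int = 10) -> int:
    commit_log = 1              # first write: commit log
    flush = 1                   # second write: flush Memtable to an L0 SSTable
    compaction = levels * fanout  # ~fanout rewrites per level traversed
    return commit_log + flush + compaction

print(lcs_write_amplification(levels=3))  # 32: amplification for 3 levels, fan-out 10
```

So adding levels lets the database grow while keeping each individual compaction small, but every extra level adds roughly another fan-out's worth of rewrites to each byte's lifetime.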

You can learn more about it here:

*Based on an answer on Stack Overflow by TomerSan*

In ScyllaDB, Leveled Compaction Strategy (LCS) works very similarly to how it works in Cassandra and RocksDB (with some minor differences).

If you want a short overview of how leveled compaction works in Scylla and why, I suggest you read my Write Amplification in Leveled Compaction blog post.

Your specific question - why two levels (L0 of recently flushed SSTables, Ln of disjoint-range SSTables) are not enough - is an excellent one:

The main problem is that a single flushed MemTable (SSTable in L0), containing a random collection of writes, will often intersect all of the SSTables in Ln. This means rewriting the entire database every time there’s a new MemTable flushed, and the result is a super-huge amount of write amplification, which is completely unacceptable.

One way to reduce this write amplification significantly (but perhaps not enough) is to introduce a cascade of intermediate levels, L0, L1, …, Ln. The end result is that we have L(n-1), which is 1/10th (say) the size of Ln, and we merge L(n-1) - not a single SSTable - into Ln. This is the approach that leveled compaction strategy (LCS) uses in all systems you mentioned.
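The difference between the two approaches can be put in rough numbers (an illustrative model of my own, with made-up sizes, not a benchmark). With only two levels, merging one flushed Memtable into Ln rewrites essentially all of Ln; with a cascade of levels, each promotion rewrites only about a fan-out's worth of SSTables:

```python
# Illustrative write cost of absorbing one flushed Memtable (size 1 unit)
# into a database whose big level Ln holds db_size units.
def two_level_cost(db_size: float, memtable_size: float = 1.0) -> float:
    # The Memtable's random keys intersect all of Ln, so the whole run is rewritten.
    return db_size / memtable_size

def cascaded_cost(levels: int, fanout: int = 10) -> float:
    # Each promotion from one level to the next rewrites ~fanout SSTables.
    return float(levels * fanout)

print(two_level_cost(db_size=10_000))  # 10000.0 units rewritten per flush
print(cascaded_cost(levels=4))         # 40.0 units rewritten per flush
```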

A completely different approach could be not to merge a single SSTable into Ln, but rather to collect a large amount of data first and only then merge it into Ln. We can’t just collect 1,000 SSTables in L0 because this would make reads very slow. Rather, to collect this large amount of data, one could use size-tiered compaction (STCS) inside L0. In other words, this approach is a “mix” of STCS and LCS with two “levels”: L0 uses STCS on new SSTables, and Ln contains a run of SSTables (SSTables with disjoint ranges). When L0 reaches 1/10th (say) the size of Ln, L0 is compacted into Ln. Such a mixed approach could have lower write amplification than LCS, but because most of the data is in a run in Ln, it would have the same low space and read amplification as LCS.
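The triggering rule of this mixed approach can be sketched in a few lines (the names, the 1/10 ratio as a constant, and the byte-count inputs are all illustrative assumptions, not the actual ICS implementation):

```python
# Hypothetical sketch of the "mixed" STCS+LCS trigger described above:
# keep compacting L0 among itself with STCS, and only merge L0 into Ln
# once L0 has grown to a fraction (here 1/10) of Ln's size.
MERGE_RATIO = 0.1

def should_merge_into_ln(l0_bytes: int, ln_bytes: int) -> bool:
    """True when the accumulated L0 data is big enough to be worth one large merge."""
    return l0_bytes >= MERGE_RATIO * ln_bytes

print(should_merge_into_ln(l0_bytes=90, ln_bytes=1000))   # False: keep STCS in L0
print(should_merge_into_ln(l0_bytes=100, ln_bytes=1000))  # True: merge L0 into Ln
```

Deferring the merge like this pays the large Ln rewrite once per big batch of new data instead of once per flushed Memtable, which is where the write-amplification saving comes from.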

ScyllaDB implements this idea with Incremental Compaction Strategy (ICS). AFAIK none of the other mentioned databases (Cassandra or RocksDB) support such “mixed” compaction.

You can read more about ICS in the Maximizing Disk Utilization with Incremental Compaction blog post and in the Incremental Compaction 2.0: A Revolutionary Space and Write Optimized Compaction Strategy blog post.

*Based on an answer on Stack Overflow by Nadav Har’El*