Data size gets larger after a compaction - Is this normal?

I found some compaction log records like this:

compaction … 1GB to 1GB (~102% of original) in 140469ms = 9MB/s. ~10368 total partitions merged to 2459.

I thought it was a rounding issue, because compacted data should be smaller than the original, so I checked nodetool compactionhistory. It shows that the data size does get larger after compaction. In the worst case, it can get 20% larger.
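For anyone who wants to run the same check, here is a minimal sketch of the arithmetic, assuming you copy the bytes_in and bytes_out columns from nodetool compactionhistory by hand. The function name and the sample numbers are made up for illustration and are not from my cluster.

```python
def compaction_ratio(bytes_in: int, bytes_out: int) -> float:
    """Return the output size as a percentage of the input size."""
    if bytes_in <= 0:
        raise ValueError("bytes_in must be positive")
    return 100.0 * bytes_out / bytes_in


# Hypothetical (bytes_in, bytes_out) pairs, for illustration only.
samples = [(1000, 1020), (1000, 1200), (1000, 800)]

for bytes_in, bytes_out in samples:
    ratio = compaction_ratio(bytes_in, bytes_out)
    # Anything above 100% means the compaction output is larger than its input.
    status = "grew" if ratio > 100.0 else "did not grow"
    print(f"{bytes_in} -> {bytes_out}: {ratio:.1f}% of original ({status})")
```

A value above 100% matches the "~102% of original" figure in the log line above.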

The table in question is actually a materialized view, and the workload is very overwrite heavy. It uses the default STCS and zstd compression with a 128k chunk size. I tried switching the compaction class to LCS and running a major compaction, but the results were the same.

The only way I can see this happening is if the output data of the compaction compresses worse than the input data did.
There is no way compaction produces more data than its input.
