What is the impact of enabling the SSTable data integrity check on the cluster?

I see that the enable_sstable_data_integrity_check parameter controls whether integrity checks are enabled on SSTables. Its default value is false, and turning it on affects performance.

I saw its implementation in the commit “sstables: introduce file interposer for integrity check”. The check runs before writing the SSTable file, after writing it, and after reading it.

From this I infer that it affects write performance. Does it have any other impact? Thanks!

It has a noticeable impact on performance.
It’s not recommended to enable it by default on a production cluster or node.
In case of a severe issue, a ScyllaDB developer may suggest enabling it temporarily in order to run some validations on the sstables.

If you want to check sstable integrity once in a while, you can use the “nodetool scrub” command with its options.


The best way to get integrity checking for your data on disk is to enable SSTable compression when creating your tables (it can also be enabled later via ALTER TABLE). SSTable compression stores checksums next to the data in the SSTable data file, and these checksums are verified every time the data is read.

Tables have compression on by default, so unless you disabled compression for your tables when creating them, you already have integrity checking.
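
To illustrate the idea of checksums stored next to compressed data and verified on every read, here is a minimal Python sketch. It is not ScyllaDB's actual on-disk format: the chunk size, the use of zlib and CRC32, and the function names are all assumptions made for the example.

```python
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for this sketch


def write_chunks(data: bytes) -> list[tuple[bytes, int]]:
    """Compress data chunk by chunk and keep a checksum next to each chunk."""
    chunks = []
    for start in range(0, len(data), CHUNK_SIZE):
        compressed = zlib.compress(data[start:start + CHUNK_SIZE])
        # The checksum covers the compressed bytes as they would sit on disk.
        chunks.append((compressed, zlib.crc32(compressed)))
    return chunks


def read_chunks(chunks: list[tuple[bytes, int]]) -> bytes:
    """Verify every chunk's checksum before decompressing it."""
    out = bytearray()
    for index, (compressed, stored_checksum) in enumerate(chunks):
        if zlib.crc32(compressed) != stored_checksum:
            raise ValueError(f"checksum mismatch in chunk {index}")
        out += zlib.decompress(compressed)
    return bytes(out)
```

Because verification happens inside the read path, a flipped byte in any chunk surfaces as an error the first time that chunk is read, with no separate integrity pass needed.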

If you want to check sstable integrity once in a while, you can use the “nodetool scrub” command with its options.

Currently, this only checks checksums on compressed sstables. We have plans to change this in the near future, so that nodetool scrub --mode=VALIDATE can be used to force a checksum check on all sstables, compressed or not.
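
If you want to script a validate-only scrub, something like the following Python sketch could work. It assumes nodetool is on the PATH of the node you run it on, that the keyspace and table are passed as positional arguments, and the function names are made up for the example.

```python
import subprocess


def scrub_validate_command(keyspace: str, table: str) -> list[str]:
    """Build the argv for a validate-only scrub of one table."""
    return ["nodetool", "scrub", "--mode=VALIDATE", keyspace, table]


def run_scrub_validate(keyspace: str, table: str) -> int:
    """Run the command on the local node and return nodetool's exit code."""
    # Assumption: nodetool is installed and reachable via PATH.
    return subprocess.run(scrub_validate_command(keyspace, table)).returncode
```

For example, run_scrub_validate("my_keyspace", "my_table") would invoke nodetool for a placeholder keyspace and table.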


Hi!
I saw that scylla sstable scrub was added in 5.4. What is the difference between it and nodetool scrub?

Currently, this only checks checksums on compressed sstables

And what is the meaning of compressed sstables?
Thanks!

I saw that scylla sstable scrub was added in 5.4. What is the difference between it and nodetool scrub?

scylla-sstable scrub is just an offline version of nodetool scrub, meaning that you don’t need a running ScyllaDB process to do the scrub. The use case it was developed for is when sstables in a backup are found to be corrupt and the backup cannot be restored because ScyllaDB refuses the sstables. In this case, the sstables can be fixed before loading them into ScyllaDB.

And what is the meaning of compressed sstables?

Simply put, sstables that belong to tables with sstable compression enabled. You can tell whether an sstable is compressed by checking whether its CompressionInfo.db component file exists.
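
A quick way to automate that check is sketched below in Python. It assumes the component files of one sstable sit in the same directory and share a file name prefix that ends just before "Data.db"; the function name is made up for the example.

```python
from pathlib import Path


def is_compressed_sstable(data_file: str) -> bool:
    """Return True if a CompressionInfo.db component sits next to the Data.db file."""
    path = Path(data_file)
    suffix = "Data.db"
    if not path.name.endswith(suffix):
        raise ValueError(f"expected a path ending in {suffix}: {data_file}")
    # Assumption: sibling components share the same file name prefix.
    prefix = path.name[: -len(suffix)]
    return (path.parent / f"{prefix}CompressionInfo.db").exists()
```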
