Installation details
#ScyllaDB version: 5.2.18
#Cluster size: 35
os (RHEL/CentOS/Ubuntu/AWS AMI): CentOS
I have a 35-node, 5-replica cluster, and the traffic pattern involves a lot of writes and deletes through Alternator. When a large number of tombstones build up in the cluster, regular compaction cannot clean them up in time, so I ran nodetool compact on each node, one by one, to clear them. Repair was still running during this time. After the major compaction, disk usage on the node dropped to 15%, but 5 days later it had climbed back to about 45%.
After dumping some SSTables, I found that a large number of tombstones that had already been cleared were present again. I suspect that after node A ran its major compaction, the other nodes had not yet run theirs, and repair then copied the tombstones back to node A from those nodes. The cycle repeats, so the tombstones never get cleared.
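To make the suspicion above concrete, here is a toy Python sketch of the cycle. It is not ScyllaDB code: the node names, the tombstone keys, and the idea that repair simply leaves every replica with the union of all replicas' data are simplifying assumptions for illustration only.

```python
# Toy model (assumption, not ScyllaDB internals): each replica holds a set of tombstone keys.

def major_compact(replicas, node):
    """Purge every tombstone on one node, as if all were already eligible for GC."""
    replicas[node] = set()

def repair(replicas):
    """Simplified repair: every replica ends up with the union of all replicas' data."""
    merged = set().union(*replicas.values())
    for node in replicas:
        replicas[node] = set(merged)

# Three hypothetical replicas that all hold the same three tombstones.
replicas = {name: {"t1", "t2", "t3"} for name in ("A", "B", "C")}

major_compact(replicas, "A")          # node A is compacted first
after_compaction = len(replicas["A"])

repair(replicas)                      # repair runs before B and C are compacted
after_repair = len(replicas["A"])

print(after_compaction, after_repair)
```

In this model node A has no tombstones right after compaction, but gets all of them back once repair runs, because B and C still hold them.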
Do you have any suggestions? Should I stop repair first and then run the major compaction on each node again?