Why flush_hints when gc_mode is repair?

bo_li · May 31, 2024, 4:38am

I saw in repair.cc that flush_hints is required when tombstone_gc_mode is repair.
When a write fails, the request data is recorded in the hints directory. Does flush_hints flush these records to disk?

Botond_Denes · June 3, 2024, 5:41am

Flushing hints and the batchlog when repairing with tombstone_gc is required to prevent data in either from causing data resurrection. With tombstone_gc, any tombstone that was written before the last repair can be garbage-collected. It is therefore important to include any data that is in the batchlog or in hints in the repair, so that any tombstone that might shadow that data can take effect.

bo_li · June 25, 2024, 9:45am

Specifically, in which scenario does this prevent data resurrection? I found that not performing the flush_hints operation has no effect, as shown in the following figure. In order to clear the tombstone, all replicas must participate in the repair, so even without flush_hints, data that failed to be written will still be repaired, including tombstones.

Botond_Denes · June 25, 2024, 10:01am

Hints can contain data which is deleted by a tombstone. Repair runs, the tombstone is GC’d, then hints are replayed and data is resurrected. This is why hints have to be replayed before repair.

bo_li · July 25, 2024, 8:09am

If coordinator (A) fails to write a=1 to node B, then A records a hint. After a period of time, a request arrives to delete a=1. If the delete also fails to reach B, it is recorded as a hint too, and then flushing the hints together has no effect. If the delete succeeds, it means B is alive, and the hints will also be sent over. This won’t have any impact, right? So why do we have to call flush_hints?

Botond_Denes · July 25, 2024, 8:17am

Flushing hints is a safety mechanism, in case some hints linger on the nodes. Normally, there wouldn’t be any, this is to cover corner cases.

bo_li · July 25, 2024, 10:00am

I can’t figure it out no matter what. Can you give me a simple example? Thank you.
In other words, what are the consequences of not executing flush_hints? How is data resurrected?

Botond_Denes · August 4, 2024, 10:10am

Taking your example from above:

Node (A) may miss the delete a. It may not notice (B) becoming online. Sending the hints may fail. These are unlikely but it is very hard to prove they cannot happen. So we usually program defensively and want to make sure there are no hints left on any node after a repair, so data resurrection is off the table.
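
The sequence can be sketched with a toy model. This is only an illustration, not ScyllaDB code: it assumes a single replica B, last-write-wins cells, and a coordinator hint that lingers until after the repair. The names `Replica`, `gc_tombstones` and `run` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A cell is (write_timestamp, value); a value of None marks a tombstone.
Cell = Tuple[int, Optional[int]]
Hint = Tuple[str, int, Optional[int]]


@dataclass
class Replica:
    """Hypothetical stand-in for node B."""
    cells: Dict[str, Cell] = field(default_factory=dict)

    def apply(self, key: str, ts: int, value: Optional[int]) -> None:
        # Last-write-wins: keep whichever cell has the newer timestamp.
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)

    def read(self, key: str) -> Optional[int]:
        cell = self.cells.get(key)
        return None if cell is None else cell[1]

    def gc_tombstones(self, repaired_at: int) -> None:
        # Assumed model of mode = repair: tombstones written before
        # the last repair may be dropped.
        self.cells = {
            k: c for k, c in self.cells.items()
            if not (c[1] is None and c[0] < repaired_at)
        }


def run(flush_hints_before_repair: bool) -> Optional[int]:
    b = Replica()
    hints: List[Hint] = []

    def replay() -> None:
        for key, ts, value in hints:
            b.apply(key, ts, value)
        hints.clear()

    # ts=1: the write a=1 fails to reach B, so coordinator A keeps a hint.
    hints.append(("a", 1, 1))
    # ts=2: the delete of a does reach B.
    b.apply("a", 2, None)

    if flush_hints_before_repair:
        replay()  # the tombstone (ts=2) shadows the hinted write (ts=1)

    # ts=3: repair completes, after which older tombstones can be purged.
    b.gc_tombstones(repaired_at=3)

    # Any hint still lingering is replayed only now.
    replay()
    return b.read("a")


print("without flush:", run(flush_hints_before_repair=False))
print("with flush:   ", run(flush_hints_before_repair=True))
```

Without the flush, the late hint lands after the tombstone is gone and `a` reads as 1 again. With the flush, the hinted write is applied while the tombstone still exists, loses on timestamp, and `a` stays deleted.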
