New(?) Data Resurrection Without Cleanups

GarrettPoore · September 19, 2023, 6:34pm

The 5.2 documentation for Adding a New Node Into an Existing ScyllaDB Cluster (Out Scale) was changed to say that there is a chance of data resurrection if cleanups are not run in a timely manner.

Is this related to the data resurrection caused by not repairing often enough, or are they unrelated? My understanding in the past was that cleanups were just to recover disk space, so I'm trying to understand what, if anything, changed with it, and the full risks of not doing it between node additions and removals.

Botond_Denes · September 20, 2023, 1:01pm

The primary goal of cleanup is avoiding data resurrection. Consider a write W1 which is written to a node, N1. After a new node Nx is added, N1 no longer owns W1. If no cleanup is run, this data will stay on N1. Some time down the line W1 is deleted and the tombstone is garbage collected. The delete only reaches the current owners, so the stale copy on N1 is untouched. Then at some point Nx is removed from the cluster and the ownership of W1 comes back to N1. Remember that this write was deleted earlier, but the tombstone was garbage collected. Since there is no longer any newer entry for W1, the old value that N1 still has becomes the latest value, and the data is therefore resurrected.
To avoid this we run cleanup, which ensures that no such stale data lingers on nodes after token movement. Freeing up disk space is a secondary, albeit also important aspect.
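
The sequence above can be sketched as a toy simulation. This is only an illustration under heavy assumptions: one replica per key, a single cell per key, and hypothetical `Node`, `Cell`, and `scenario` names. It does not reflect how ScyllaDB is implemented internally.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Cell:
    value: Optional[str]  # None marks a tombstone
    timestamp: int


class Node:
    """Hypothetical stand-in for a node's local storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, Cell] = {}

    def write(self, key: str, value: Optional[str], ts: int) -> None:
        # Last-write-wins: keep the cell with the highest timestamp.
        cur = self.data.get(key)
        if cur is None or ts > cur.timestamp:
            self.data[key] = Cell(value, ts)

    def cleanup(self, owned_keys: Set[str]) -> None:
        # Drop everything this node no longer owns.
        self.data = {k: c for k, c in self.data.items() if k in owned_keys}

    def gc_tombstones(self) -> None:
        # Purge tombstones (in this toy model they have already
        # replaced the data they shadow).
        self.data = {k: c for k, c in self.data.items() if c.value is not None}

    def read(self, key: str) -> Optional[str]:
        cell = self.data.get(key)
        return None if cell is None else cell.value


def scenario(run_cleanup: bool) -> Optional[str]:
    n1, nx = Node("N1"), Node("Nx")
    n1.write("W1", "old", ts=1)        # W1 is written to N1
    nx.write("W1", "old", ts=1)        # Nx is added and takes over W1
    if run_cleanup:
        n1.cleanup(owned_keys=set())   # N1 no longer owns W1
    nx.write("W1", None, ts=2)         # W1 is deleted on its owner, Nx
    nx.gc_tombstones()                 # the tombstone is garbage collected
    for key, cell in nx.data.items():  # Nx is removed; it has nothing left
        n1.write(key, cell.value, cell.timestamp)
    return n1.read("W1")               # ownership of W1 is back on N1


print("without cleanup:", scenario(run_cleanup=False))
print("with cleanup:   ", scenario(run_cleanup=True))
```

Without cleanup, N1 still holds the old value and nothing newer exists to shadow it, so the read returns the deleted data. With cleanup, the stale copy is gone before ownership returns to N1.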

GarrettPoore · September 20, 2023, 1:53pm

Thanks for the info Botond. I assume this has always been a possibility then, and not something new?

This has some impact on our process around AWS node decommissions that we’ll have to figure out. AWS gives around 2 weeks' notice that a node will be decommissioned, so we bootstrap a new node into the cluster, decommission the old one, then worry about cleanups. With this process we would need to insert the cleanups in the middle and get them done within the 2 week window, which could be cutting it close on some clusters (our largest has 39 nodes).

Do you have any recommendation on how you would handle that situation? Decommissioning the node first would fix it, though then we have to undersize the cluster for a short window instead of oversizing it.

Botond_Denes · September 20, 2023, 2:11pm

Doing bootstrap + decommission in quick succession, then doing cleanup after should be fine. Just don’t delay running cleanup too long.

Also, maybe look into the replace operation. With replace, no cleanup is needed, although it has a drawback: the cluster temporarily loses a replica while the replace is going on, so QUORUM reads are more susceptible to failing.
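
To put rough numbers on the QUORUM point, here is a small arithmetic sketch. The replication factors are example values, not taken from this thread, and the helper names are hypothetical. The only rule used is that QUORUM needs a majority of replicas, `RF // 2 + 1`.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond for a QUORUM operation: a majority of RF."""
    return rf // 2 + 1


def extra_failures_tolerated(rf: int, replicas_down: int) -> int:
    """How many more replicas can fail before QUORUM becomes unreachable.

    A negative result means QUORUM is already unreachable.
    """
    live = rf - replicas_down
    return live - quorum(rf)


for rf in (3, 5):  # example replication factors (assumption)
    normal = extra_failures_tolerated(rf, replicas_down=0)
    during_replace = extra_failures_tolerated(rf, replicas_down=1)
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"headroom normally={normal}, during replace={during_replace}")
```

With RF=3, QUORUM needs 2 of 3 replicas, so while one replica is being replaced there is no headroom left for the affected token ranges: any further unavailable replica makes QUORUM reads fail.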

GarrettPoore · September 20, 2023, 2:27pm

Interesting, I’m fairly sure that is new as well. We used to use the dead node replacement procedure, but ran into issues with data loss (on 4.X, before RBNO was enabled for replace actions). I can’t find the old docs, but I have notes saying the bootstrap/decommission was recommended at the time, so we switched to that.

If replace is considered a valid alternative then we may switch back. Thank you.

Botond_Denes · September 21, 2023, 7:10am

Yes, replace was made safe by using RBNO for it. This change was made in 4.6, where we enabled RBNO by default for replace. See ScyllaDB Open Source 4.6 - ScyllaDB.
