How tombstones work in Scylla

Hello everyone,

I’m confused about how tombstones work in Scylla and would be really grateful if someone could clear up my doubts.

  1. I’m using Scylla version 6.1.2
  2. Tombstone “validity”
    As far as I understand, to prevent data resurrection Scylla uses several mechanisms that control when tombstones may be removed (tombstone_gc).
    a) timeout. Uses the “gc_grace_seconds” table option.
    Q1: Is gc_grace_seconds used only for tombstone_gc = timeout?
    Q2: Is gc_grace_seconds used for anything else?
    b) repair. A tombstone can be removed only after a repair has been done. In other words, Scylla remembers the last repair time, and only tombstones older than the last repair can be removed. I found the following in the documentation: “For tables which use tablets repair mode is set by default.” Tablets are now the default, so I expected this to be the default mode.
    Q3: But when I describe the table I see “AND tombstone_gc = {‘mode’: ‘timeout’, ‘propagation_delay_in_seconds’: ‘3600’}”. Why is mode = timeout?
    Q4: What is propagation_delay_in_seconds? I can’t find any documentation about it.
    c) disabled - not interested
    d) immediate
    Q5: Why is it safe to use for TWCS?
    Q6: How does it differ from mode = timeout with gc_grace_seconds = 0?
  3. When can a tombstone and the underlying data be removed? In other words: in which process are tombstones and the underlying data removed?
    a) During table compaction
    Q7: A tombstone and the underlying data can be removed if they are compacted together into a new SSTable, right?
    b) During tombstone compaction, when a single SSTable is compacted by itself to remove data.
    Q8: Does Scylla have tombstone compaction? If so, what is the strategy, or in other words, when does it occur?
    Q9: Does Scylla support nodetool garbagecollect?
    Q10: From the TWCS documentation: “Tombstone compaction can be enabled to remove data from partially expired SSTables, but this creates additional WA (write amplification).” How can it be enabled?
    Q11: Is tombstone compaction enabled with the tombstone_threshold, tombstone_compaction_interval and unchecked_tombstone_compaction options? I would also like to understand more about these options.
    Q12: If I delete data with CL=ALL, will a tombstone be created? And how can I avoid tombstone creation and force Scylla to delete data immediately?
    Q13: Using TWCS, how can I delete data immediately to avoid tombstones? I have a use case where, in rare cases, I need to remove all data by partition key, or by partition key and clustering range. So you end up with a tombstone in one windowed SSTable but the actual data in another windowed SSTable. According to the strategy, SSTables from different windows are never compacted together. That means I need tombstone compaction. Currently this leads to bad performance, and it looks like Scylla does not have “tombstone compaction” at all, or I did something wrong.
  4. I found that everything in the documentation refers to gc_grace_seconds, but there is no documentation about when tombstone compaction will occur, if at all.
  1. opensource.docs.scylladb.com/stable/kb/ttl-facts.html
  2. A recent Discord video about “Application tombstone” (youtube.com/watch?v=60GnK5sHgDY) also uses gc_grace_seconds = 0. That’s kind of strange, because it does not give control over when the data will actually be removed, and by that time “repair” mode had already been released (and may already have been the default in their version). Why did they not use it, and rely on the “gc_grace_seconds” option instead?

Thanks to Felipe Cardeneti Mendes

scylladb-users.slack.com/archives/C2NLNBXLN/p1731423261483859?thread_ts=1731372933.542199&cid=C2NLNBXLN

Q1: Is gc_grace_seconds used only for tombstone_gc = timeout?

Yes.

Q2: Is gc_grace_seconds used for anything else?

No.

Q3: But when I describe the table I see “AND tombstone_gc = {‘mode’: ‘timeout’, ‘propagation_delay_in_seconds’: ‘3600’}”. Why is mode = timeout?

This might be an already-fixed bug; it should be repair in this case. Did you enable tablets in the corresponding keyspace? Either way, 6.2 should already have this changed; otherwise, file an issue.

Q4: What is propagation_delay_in_seconds? I can’t find any documentation about it.

It is how long after a repair completes before compaction is free to evict tombstones. We can’t evict them immediately because of out-of-order writes.
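
To make the role of propagation_delay_in_seconds more concrete, here is a minimal Python sketch of the purge cutoff under the timeout and repair modes. It is a simplified model, not Scylla's actual code: the names `gc_before` and `is_purgeable` are hypothetical, and the rule that repair mode moves the cutoff back from the last repair time by the propagation delay is an assumption based on the answer above.

```python
from typing import Optional


def gc_before(mode: str, now: int, gc_grace_seconds: int = 864000,
              last_repair_time: Optional[int] = None,
              propagation_delay_in_seconds: int = 3600) -> Optional[int]:
    """Return the cutoff in epoch seconds; older tombstones may be purged.

    Hypothetical helper. None means no tombstone is purgeable yet.
    """
    if mode == "timeout":
        # Only the age of the tombstone matters.
        return now - gc_grace_seconds
    if mode == "repair":
        if last_repair_time is None:
            # Assumption: without a completed repair nothing can be purged.
            return None
        # Assumption: keep a safety margin for writes still in flight.
        return last_repair_time - propagation_delay_in_seconds
    raise ValueError(f"mode not modeled in this sketch: {mode}")


def is_purgeable(deletion_time: int, cutoff: Optional[int]) -> bool:
    """A tombstone is purgeable only if it is older than the cutoff."""
    return cutoff is not None and deletion_time < cutoff


cutoff = gc_before("repair", now=100_000, last_repair_time=50_000)
print(is_purgeable(40_000, cutoff), is_purgeable(47_000, cutoff))
```

In this model a tombstone written shortly before the repair finished stays protected until a later repair, which is the margin that guards against out-of-order writes.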

Q5: Why is it safe to use for TWCS?

TWCS assumes an append-only workload with no explicit deletes. Thus, as soon as the TTL expires, compaction is free to purge tombstones. If that’s not the case, you shouldn’t be using TWCS in the first place.

Q6: How does it differ from mode = timeout with gc_grace_seconds = 0?

There’s a good thelastpickle article that touches on some of the problems with gc_grace=0, in particular those related to hint replay.

Q7: A tombstone and the underlying data can be removed if they are compacted together into a new SSTable, right?

Right.

Q8: Does Scylla have tombstone compaction? If so, what is the strategy, or in other words, when does it occur?

We do. See opensource.docs.scylladb.com/stable/cql/compaction.html#common-options for the rest of the answer.

Q9: Does Scylla support nodetool garbagecollect?

No. You can run a major compaction, though.

Q10: From the TWCS documentation: “Tombstone compaction can be enabled to remove data from partially expired SSTables, but this creates additional WA (write amplification).” How can it be enabled?

See the link from Q8.

Q11: Is tombstone compaction enabled with the tombstone_threshold, tombstone_compaction_interval and unchecked_tombstone_compaction options? I would also like to understand more about these options.

The logic is: with tombstone_compaction_interval you configure how often compaction checks for SSTables eligible for tombstone compaction, and with tombstone_threshold you set the tombstone ratio above which a single SSTable is garbage-collected. unchecked_tombstone_compaction disables that check altogether.
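
As a rough illustration of that logic, the sketch below models the eligibility check for a single SSTable. It is an assumption-laden simplification rather than Scylla's implementation: `SSTableStats` and `eligible_for_tombstone_compaction` are hypothetical names, the defaults (0.2 and 86400 seconds) are the documented defaults for tombstone_threshold and tombstone_compaction_interval, and unchecked_tombstone_compaction is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    """Hypothetical summary of one SSTable."""
    created_at: int                   # epoch seconds when it was written
    droppable_tombstone_ratio: float  # estimated share of droppable tombstones


def eligible_for_tombstone_compaction(sst: SSTableStats, now: int,
                                      tombstone_threshold: float = 0.2,
                                      tombstone_compaction_interval: int = 86400) -> bool:
    """Return True if the SSTable alone is worth compacting to drop tombstones."""
    # The SSTable must be at least one interval old.
    old_enough = now - sst.created_at >= tombstone_compaction_interval
    # Enough of its content must be droppable tombstones.
    dirty_enough = sst.droppable_tombstone_ratio >= tombstone_threshold
    return old_enough and dirty_enough


now = 1_000_000
fresh = SSTableStats(created_at=now - 3_600, droppable_tombstone_ratio=0.5)
stale = SSTableStats(created_at=now - 200_000, droppable_tombstone_ratio=0.5)
print(eligible_for_tombstone_compaction(fresh, now),
      eligible_for_tombstone_compaction(stale, now))
```

Note that a tombstone only counts as droppable once the table's tombstone_gc mode allows it to be purged, so lowering the threshold alone does not make tombstones disappear sooner.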

Q12: If I delete data with CL=ALL, will a tombstone be created? And how can I avoid tombstone creation and force Scylla to delete data immediately?

A tombstone is created irrespective of your CL. Use immediate mode and tombstones will be evicted on flush if you use CL=ALL.

Q13: Using TWCS, how can I delete data immediately to avoid tombstones? I have a use case where, in rare cases, I need to remove all data by partition key, or by partition key and clustering range. So you end up with a tombstone in one windowed SSTable but the actual data in another windowed SSTable. According to the strategy, SSTables from different windows are never compacted together. That means I need tombstone compaction. Currently this leads to bad performance, and it looks like Scylla does not have “tombstone compaction” at all, or I did something wrong.

When you deleted, you applied a tombstone to a different compaction window. Compaction windows are never compacted together. You must run a major compaction and re-assess the need for TWCS.
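
The window mismatch can be sketched with a few lines of Python. This is a simplified model under stated assumptions: `window_start` is a hypothetical helper, the window is assumed to be one day, and real TWCS buckets whole SSTables (by their newest timestamp) rather than individual writes.

```python
DAY = 86_400  # assumed window size: one day, in seconds


def window_start(timestamp_seconds: int, window_size_seconds: int) -> int:
    """Return the start of the time window that a timestamp falls into."""
    return timestamp_seconds - (timestamp_seconds % window_size_seconds)


# Data written on day 10, deleted three days later.
write_ts = 10 * DAY + 3_600
delete_ts = 13 * DAY + 120

same_window = window_start(write_ts, DAY) == window_start(delete_ts, DAY)
print("tombstone shares a window with the data:", same_window)
```

Because the tombstone and the data it shadows fall into different windows, regular TWCS compaction never brings them together, which is why a major compaction is needed to actually drop the data.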

Issue about the default tombstone_gc mode for tables with tablets enabled: “default tombstone_gc mode is wrong for tables with tablets enabled” (Issue #21623, scylladb/scylladb on GitHub)
