Originally from the User Slack
@Jake_LaFountain**:** I was running a repair on 6.2.1-0.20241106.a3a0ffbcd015
[shard 0:comp] compaction_manager - Compaction task 0x600012ccae00 for table system.topology_requests compaction_group=0 [0x6000054c6520]: failed: sstables::malformed_sstable_exception (Failed to read partition from SSTable /1/scylla/data/system/topology_requests-b3068525d0f239c28250584f9b1bf460/me-3guo_1qh2_4fgqo2i03wgee5oms8-big-Data.db due to Negative ttl: -1098978). Will retry in 300 seconds
and we got this. I’ve never seen this before and can’t find any results on Google either. Is there any way to resolve this?
@Felipe_Cardeneti_Mendes**:** hmmm interesting - I think this is the closest I could find (and if it's that, we only really saw it on failing tests - not manifesting out in the wild) https://github.com/scylladb/scylladb/commit/1d0c6aa26f105056aab001022a0f9487850af16b
malformed_sstable_exception is effectively corruption (and sadly we had other issues in the past).
If the disks are fine, since this affects the system keyspace (a node-local one), perhaps replacing the faulty node could help. In particular, there should be nothing to repair due to LocalStrategy
@Jake_LaFountain**:** Interesting! Some additional context: we had moved from a cluster of 5 (RF = 2) to a cluster of 4 to upgrade some drives. Adding the node back in has proved to be quite a struggle, but during this last week we were able to successfully clean up and repair the keyspace. We wanted to go up to an RF of 3, so we increased it and repaired again, and ran into this error on the new node.
I’ll check into the actual disks but I doubt it’s the issue here unfortunately
Replacing the node is possible I suppose but we did just do that a few days ago. Will follow up here.
@Felipe_Cardeneti_Mendes**:** yeah, sorry about that. Given that the same keyspace and table are involved in the above commit and in your particular problem, I suppose it might be related, as the source pull request (see #21558) mentions both topology changes and TTL expiration as triggers.
From what I can see the fix made it to 6.2.3 in 933ec7c so you may want to try that out - after you fix the already malformed sstable situation
and before the wanderer maintainers call me out I shall diligently write down a reminder that 6.2 is no longer supported, and ask for an upgrade :))