Two nodes in my ScyllaDB cluster (four nodes in total) went down because of corrupted storage. One of them was the seed node, so I did a rolling restart and changed the seed to one of the two remaining live nodes.
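For context, this is roughly the seed change I rolled out in scylla.yaml on each surviving node before restarting it (172.31.75.191 is simply one of the two live nodes; the particular choice shouldn't matter):

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "172.31.75.191"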
But now I’m unable to remove the dead nodes from the cluster: the nodetool removenode command gets stuck indefinitely.
This is my nodetool status:
Datacenter: ap-south
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns  Host ID                               Rack
UN  172.31.75.191  103.87 GB  256     ?     26a15884-35ee-4f05-a08b-5fb52f524de6  1a
UN  172.31.75.205  92.92 GB   256     ?     d12af5c3-1cb4-43ae-b605-e0da88d6abcd  1a
DN  172.31.75.244  ?          256     ?     ab5438e7-7729-4d14-9e9e-d84459525543  1a
DN  172.31.75.145  ?          256     ?     eb4e084e-f61b-4ace-9250-c2c52aec1b13  1a
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
I ran this command:
nodetool removenode --ignore-dead-nodes ab5438e7-7729-4d14-9e9e-d84459525543 eb4e084e-f61b-4ace-9250-c2c52aec1b13
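My understanding from the Scylla docs is that --ignore-dead-nodes takes a comma-separated list of dead host IDs and the final argument is the single node to remove, so the per-node invocations I'm aiming for look like this (same host IDs as in the status output above, each node removed while ignoring the other):

nodetool removenode --ignore-dead-nodes eb4e084e-f61b-4ace-9250-c2c52aec1b13 ab5438e7-7729-4d14-9e9e-d84459525543
nodetool removenode --ignore-dead-nodes ab5438e7-7729-4d14-9e9e-d84459525543 eb4e084e-f61b-4ace-9250-c2c52aec1b13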
The command never completes, and the Scylla logs are filled with these errors:
Feb 05 16:17:42 ip-172-31-75-205 scylla[9011]: [shard 0] cdc - Could not update CDC description table with generation (2023/08/30 16:58:04, 6b722d16-8663-48d2-abc9-6ee7a7b7fc29): exceptions::unavailable_exception (Cannot achieve consistency level for cl QUORUM. Requires 1, alive 0). Will try again.
Feb 05 16:18:40 ip-172-31-75-205 scylla[9011]: [shard 0] service_level_controller - update_from_distributed_data: failed to update configuration for more than 2160 seconds : exceptions::unavailable_exception (Cannot achieve consistency level for cl ONE. Requires 1, alive 0)
I’m also unable to add any new nodes to the cluster, because gossip refuses to add a new node while the status of any existing node is UNKNOWN.
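In case it helps with diagnosis, I assume the gossip state that the live nodes still hold for the dead endpoints can be inspected with something along these lines (the grep is just illustrative):

nodetool gossipinfo | grep -A 12 '/172.31.75.244'
nodetool gossipinfo | grep -A 12 '/172.31.75.145'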
Is there a way I can forcefully remove these dead nodes from my cluster?