Installation details
#ScyllaDB version: 6.1.3
#Cluster size: 7 nodes (6 nodes + 1 node in different DCs)
OS (RHEL/CentOS/Ubuntu/AWS AMI): Ubuntu 22
Today I finally started enabling Raft by hitting the API endpoint with curl:
curl -X POST "http://curhost:10000/storage_service/raft_topology/upgrade"
As nothing seemed to happen, I did a rolling restart, which eventually resulted in:
Nov 27 10:32:24 o-2 scylla[632625]: [shard 0:strm] raft_topology - waiting for all nodes to finish upgrade to raft schema
The last node, however, wasn’t able to finish and instead is spitting out
Nov 27 13:18:25 o-backup scylla[3576536]: [shard 0:strm] raft_group0_upgrade - future<> service::raft_group0::wait_for_all_nodes_to_finish_upgrade(abort_source &): failed to resolve IP addresses of some of the cluster members ([e16a9c96-d8a0-47fe-8044-37be077f45b9, 8a627941-2f40-47ad-8e5d-6f6e891ab85d, d728fc9d-81ca-4f34-ab5b-3b0858144c61])
Nov 27 13:18:25 o-backup scylla[3576536]: [shard 0:strm] raft_group0_upgrade - future<> service::raft_group0::wait_for_all_nodes_to_finish_upgrade(abort_source &): sleeping for 16s seconds before retrying..
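For reference, here is a minimal sketch of how the unresolved host IDs can be pulled out of that log line, so they can be compared against the IDs passed to other commands. The function name extract_host_ids is purely illustrative, and the journalctl unit name in the comment is an assumption about the local setup, not something taken from the logs above.

```shell
#!/usr/bin/env bash
# Print the unique host IDs (UUIDs) found in log lines read from stdin.
# extract_host_ids is a hypothetical helper name, not a Scylla tool.
extract_host_ids() {
    grep -oE '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | sort -u
}

# Sample input: the relevant part of the "failed to resolve IP addresses" line.
sample='raft_group0_upgrade - failed to resolve IP addresses of some of the cluster members ([e16a9c96-d8a0-47fe-8044-37be077f45b9, 8a627941-2f40-47ad-8e5d-6f6e891ab85d, d728fc9d-81ca-4f34-ab5b-3b0858144c61])'

printf '%s\n' "$sample" | extract_host_ids

# Against a live journal it might be used like this (assumes the systemd
# unit is called scylla-server, which may differ on other installations):
#   journalctl -u scylla-server | grep 'failed to resolve IP addresses' | extract_host_ids
```

The output is one host ID per line, deduplicated, so repeated retries in the log do not inflate the list.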
To get rid of those unavailable, non-replaceable ghost nodes, I tried using
removenode --ignore-dead-nodes 8a627941-2f40-47ad-8e5d-6f6e891ab85d,d728fc9d-81ca-4f34-ab5b-3b0858144c61,e16a9c96-d8a0-47fe-8044-37be077f45b9 e16a9c96-d8a0-47fe-8044-37be077f45b9
but that only results in
error executing POST request to http://localhost:10000/storage_service/remove_node with parameters {"ignore_nodes": "8a627941-2f40-47ad-8e5d-6f6e891ab85d,d728fc9d-81ca-4f34-ab5b-3b0858144c61,e16a9c96-d8a0-47fe-8044-37be077f45b9", "host_id": "e16a9c96-d8a0-47fe-8044-37be077f45b9"}: remote replied with status code 500 Internal Server Error:
std::runtime_error (removenode is not allowed at this time - the node is still in the process of upgrading to raft topology)
So, the question is: How do I get rid of those ghost nodes?