Why bootstrap on new server with existing data folder?

Hi,
I replaced the hardware of a scylla server yesterday.

The complete data folder was backed up and put on the new server.

Nevertheless scylla was complaining about failed bootstrap because the IP already exists.

Does Scylla read out some kind of hardware ID to detect such situations? I fail to understand why Scylla did not just start up, as in a regular restart. How did it know it's a new server?

cheers,
Christian

If you haven’t replaced the node with another node using the same IP, and you backed up and restored all keyspaces, including the system keyspace, the node should indeed have been able to rejoin the cluster, because the node’s host_id is kept in system.local. We’ll probably need to look at the logs to figure out the reason. It would be best if you opened an issue on GitHub so we can collect the logs there.

The only thing I can imagine is that the old node was started back up after the snapshot was taken.

Additionally, the new node was started with Scylla 5.2, while the old one ran 5.1. I would expect that to make no difference.

May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - Scylla version 5.2.18-0.20240419.dae9bef75f66 with build-id 368c0eb734d341c0b42dd3cab4af6690875b45b1 starting ...
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting prometheus API server
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - creating snitch
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting tokens manager
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting effective_replication_map factory
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting migration manager notifier
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting lifecycle notifier
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - creating tracing
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting API server
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - Scylla API server listening on 10.2.26.7:10000 ...
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] service_level_controller - update_from_distributed_data: starting configuration polling loop
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting system keyspace
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting gossiper
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - seeds={10.1.0.134, 10.1.0.136, 10.2.26.4, 10.2.26.5}, listen_address=10.2.26.7, broadcast_address=10.2.26.7
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting Raft address map
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting direct failure detector pinger service
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting direct failure detector service
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - initializing storage service
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] storage_service - Started node_ops_abort_thread
May 30 11:15:06 ad4 scylla[102116]:  [shard 1] storage_service - Started node_ops_abort_thread
May 30 11:15:06 ad4 scylla[102116]:  [shard 2] storage_service - Started node_ops_abort_thread
May 30 11:15:06 ad4 scylla[102116]:  [shard 3] storage_service - Started node_ops_abort_thread
May 30 11:15:06 ad4 scylla[102116]:  [shard 5] storage_service - Started node_ops_abort_thread
May 30 11:15:06 ad4 scylla[102116]:  [shard 4] storage_service - Started node_ops_abort_thread
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting per-shard database core
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - creating and verifying directories
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting compaction_manager
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - starting database
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] compaction_manager - Set unlimited compaction bandwidth
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] init - loading system sstables
May 30 11:15:06 ad4 scylla[102116]:  [shard 0] database - Populating Keyspace system

...

May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting schema commit log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-16733952.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-36028797035697921.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-90071992564143873.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-18014398526215937.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-54043195545179905.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-72057594054661889.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-36028797035697922.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-16733953.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-72057594054661890.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-54043195545179906.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-90071992564143874.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-36028797035697923.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] schema_tables - Schema version changed to 59adb24e-f3cd-3e02-97f0-5b395827453f
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - loading non-system sstables
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] database - Skipping undefined keyspace: system_traces
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] database - Skipping undefined keyspace: system_distributed
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] database - Skipping undefined keyspace: system_distributed_everywhere
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] database - Skipping undefined keyspace: system_auth
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] database - Skipping undefined keyspace: pc
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting view update generator
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting commit log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: SchemaLog-2-16733952.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] commitlog - Cannot parse the version of the file: SchemaLog-2-16733953.log
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - initializing migration manager RPC verbs

...
May 30 11:15:11 ad4 scylla[102116]:  [shard 3] repair - Loading repair history for keyspace=system, table=compaction_history, table_uuid=b4dbb7b4-dc49-3fb5-b3bf-ce6e434832ca
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting CDC Generation Management service
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting CDC log service
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting storage service
May 30 11:15:11 ad4 systemd[1]: Started Scylla Server.
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - starting sstables loader
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] messaging_service - Starting Messaging Service on port 7000
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] storage_service - entering STARTING mode
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] storage_service - Loading persisted ring state
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] storage_service - initial_contact_nodes={10.2.26.5, 10.2.26.4, 10.1.0.136, 10.1.0.134}, loaded_endpoints={}, loaded_peer_features=0
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] storage_service - Checking remote features with gossip
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] gossip - Gossip shadow round started with nodes={10.2.26.5, 10.2.26.4, 10.1.0.136, 10.1.0.134}
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] gossip - Gossip shadow round finished with nodes_talked={10.1.0.136, 10.1.0.134, 10.2.26.4, 10.2.26.5}
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] gossip - Feature check passed. Local node 10.2.26.7 features = {AGGREGATE_STORAGE_OPTIONS, ALTERNATOR_TTL, CDC, CDC_GENERATIONS_V2, COLLECTION_INDEXING, COMPUTED_COLUMNS, CORRECT_CO>
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - Shutting down group 0 service

...
lots of shutdown messages
...
May 30 11:15:11 ad4 scylla[102116]:  [shard 0] init - Startup failed: std::runtime_error (A node with address 10.2.26.7 already exists, cancelling join. Use replace_address if you want to replace this node.)

There is not enough information in the log excerpt about the node’s host_id.
We’d also need the nodetool gossipinfo output and/or logs from the other nodes to see what they think this endpoint’s HOST_ID is.
@kbr anything else?
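
The comparison described above (the host_id the restored node has in system.local versus the HOST_ID the other nodes report for 10.2.26.7) could be sketched roughly like this. It is only an illustration: the helper names are made up, and it assumes nodetool gossipinfo prints one endpoint line containing "/" followed by indented NAME:value lines, which may differ between versions. The local value would come from something like SELECT host_id FROM system.local on the restored node.

```python
def parse_gossipinfo(text):
    """Map endpoint address -> {STATE_NAME: raw value} from gossipinfo-style text.

    Assumed layout (not verified against every version): an unindented
    endpoint line such as "/10.2.26.7", then indented "NAME:value" lines.
    """
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace() and "/" in line:
            # Endpoint header; keep only the address after the last "/".
            current = line.rsplit("/", 1)[-1].strip()
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, _, value = line.strip().partition(":")
            nodes[current][key] = value
    return nodes


def gossip_host_id(text, address):
    """Return the HOST_ID that gossip reports for `address`, or None."""
    value = parse_gossipinfo(text).get(address, {}).get("HOST_ID")
    if not value:
        return None
    # Some outputs carry a version prefix ("HOST_ID:2:<uuid>"); a UUID has
    # no colons, so the last colon-separated field is the id either way.
    return value.rsplit(":", 1)[-1]


def same_node(local_host_id, text, address):
    """True if the cluster's HOST_ID for `address` matches the local one."""
    remote = gossip_host_id(text, address)
    return remote is not None and remote.lower() == local_host_id.lower()
```

If same_node(...) returns False for 10.2.26.7, the cluster still associates that address with a different host_id than the one the new server started with, which would be consistent with the "already exists" error.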

Maybe it tried to use a different data directory due to a different scylla.yaml configuration file. Did you copy the config file as well?

Also, there are multiple data directories and a commitlog directory. Make sure you copied all of them.
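
A quick sanity check of the restored directories might look like the sketch below. It assumes the configuration has already been loaded into a dict (for example with PyYAML's yaml.safe_load) and uses the scylla.yaml keys data_file_directories and commitlog_directory; the function name and the messages are made up for illustration.

```python
from pathlib import Path


def check_restored_dirs(config):
    """Return a list of problems with the directories named in `config`.

    `config` is a dict shaped like a parsed scylla.yaml. Only two keys are
    looked at: data_file_directories (a list) and commitlog_directory.
    """
    problems = []
    data_dirs = list(config.get("data_file_directories") or [])

    existing = []
    for d in data_dirs:
        if Path(d).is_dir():
            existing.append(d)
        else:
            problems.append(f"{d}: missing")

    # The node's host_id lives in system.local, so at least one data
    # directory should contain the "system" keyspace directory.
    if existing and not any((Path(d) / "system").is_dir() for d in existing):
        problems.append("no 'system' keyspace directory found in any data directory")

    commitlog = config.get("commitlog_directory")
    # An empty commitlog directory can be legitimate after a clean
    # shutdown, so only a missing directory is reported.
    if commitlog and not Path(commitlog).is_dir():
        problems.append(f"{commitlog}: missing")

    return problems
```

An empty result only means the directories are where the configuration says they are. It does not prove that the contents match the old server.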

The fact that you upgraded the Scylla version at the same time might also be problematic. Do one thing at a time: upgrade either the version or the hardware, not both at once. If you want to upgrade from 5.1 to 5.2, follow the documented rolling upgrade procedure and keep the hardware unchanged.

All of that said, I haven’t found what you’re doing as a documented procedure in our docs. That means it is not generally supported by Scylla and we don’t test it, so don’t be surprised if it doesn’t work.