I attempted to set up a ScyllaDB cluster using three nodes, but it's not working as expected

I am trying to set up a Scylla cluster on three separate servers, following this course:

courses/s905-scylla-university-live-2024-essentials-track/lessons/scylla-essentials/topic/consistency-level-demo-part-1

Server 1 (10.0.0.81)

$ sudo docker run --name Node_X -d scylladb/scylla:latest --overprovisioned 1 --smp 1

Server 2 (10.0.0.147)

$ sudo docker run --name Node_Y -d scylladb/scylla:latest --seeds=10.0.0.81 --overprovisioned 1 --smp 1

Server 3 (10.0.0.18)

$ sudo docker run --name Node_Z -d scylladb/scylla:latest --seeds=10.0.0.81 --overprovisioned 1 --smp 1

However, I got the following results. Here are the status output and the logs for each node; could you help me identify the problem?

Server 1

$ sudo docker exec -it Node_X nodetool status  
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address    Load      Tokens Owns Host ID                              Rack 
UN 172.17.0.2 191.92 KB 256    ?    c1b91e90-3db0-4749-b1d8-e92707d6db49 rack1
$ sudo docker logs <containerID> 
...
INFO  2024-06-19 05:30:43,127 [shard 0:strm] load_balancer - Prepared 0 migration plans, out of which there were 0 tablet migration(s) and 0 resize decision(s)
INFO  2024-06-19 05:31:43,126 [shard 0:strm] raft_topology - raft topology: Refreshing table load stats for DC datacenter1 that has 1 endpoints
INFO  2024-06-19 05:31:43,127 [shard 0:strm] load_balancer - Examining DC datacenter1
INFO  2024-06-19 05:31:43,127 [shard 0:strm] load_balancer - Node c1b91e90-3db0-4749-b1d8-e92707d6db49: rack=rack1 avg_load=0, tablets=0, shards=1, state=normal
INFO  2024-06-19 05:31:43,127 [shard 0:strm] load_balancer - Prepared 0 migrations in DC datacenter1
INFO  2024-06-19 05:31:43,127 [shard 0:strm] load_balancer - Prepared 0 migration plans, out of which there were 0 tablet migration(s) and 0 resize decision(s)
INFO  2024-06-19 05:32:43,127 [shard 0:strm] raft_topology - raft topology: Refreshing table load stats for DC datacenter1 that has 1 endpoints
INFO  2024-06-19 05:32:43,127 [shard 0:strm] load_balancer - Examining DC datacenter1
INFO  2024-06-19 05:32:43,127 [shard 0:strm] load_balancer - Node c1b91e90-3db0-4749-b1d8-e92707d6db49: rack=rack1 avg_load=0, tablets=0, shards=1, state=normal
INFO  2024-06-19 05:32:43,127 [shard 0:strm] load_balancer - Prepared 0 migrations in DC datacenter1
INFO  2024-06-19 05:32:43,127 [shard 0:strm] load_balancer - Prepared 0 migration plans, out of which there were 0 tablet migration(s) and 0 resize decision(s)
INFO  2024-06-19 05:33:43,127 [shard 0:strm] raft_topology - raft topology: Refreshing table load stats for DC datacenter1 that has 1 endpoints
INFO  2024-06-19 05:33:43,127 [shard 0:strm] load_balancer - Examining DC datacenter1
INFO  2024-06-19 05:33:43,127 [shard 0:strm] load_balancer - Node c1b91e90-3db0-4749-b1d8-e92707d6db49: rack=rack1 avg_load=0, tablets=0, shards=1, state=normal
INFO  2024-06-19 05:33:43,127 [shard 0:strm] load_balancer - Prepared 0 migrations in DC datacenter1
INFO  2024-06-19 05:33:43,127 [shard 0:strm] load_balancer - Prepared 0 migration plans, out of which there were 0 tablet migration(s) and 0 resize decision(s)
INFO  2024-06-19 05:34:43,128 [shard 0:strm] raft_topology - raft topology: Refreshing table load stats for DC datacenter1 that has 1 endpoints
INFO  2024-06-19 05:34:43,138 [shard 0:strm] load_balancer - Examining DC datacenter1
INFO  2024-06-19 05:34:43,138 [shard 0:strm] load_balancer - Node c1b91e90-3db0-4749-b1d8-e92707d6db49: rack=rack1 avg_load=0, tablets=0, shards=1, state=normal
INFO  2024-06-19 05:34:43,138 [shard 0:strm] load_balancer - Prepared 0 migrations in DC datacenter1
INFO  2024-06-19 05:34:43,138 [shard 0:strm] load_balancer - Prepared 0 migration plans, out of which there were 0 tablet migration(s) and 0 resize decision(s)
INFO  2024-06-19 05:35:43,132 [shard 0:strm] raft_topology - raft topology: Refreshing table load stats for DC datacenter1 that has 1 endpoints

Server 2

$ sudo docker logs <containerID> 
...
INFO  2024-06-19 05:31:04,322 [shard 0:main] repair - Loading repair history for keyspace=system, table=scylla_table_schema_history, table_uuid=0191a53e-40f0-31d4-9171-b0d19ffb17b4
INFO  2024-06-19 05:31:04,322 [shard 0:main] repair - Loading repair history for keyspace=system, table=tablets, table_uuid=fd4f7a46-96bd-3e73-91bf-99eb77e82a5c
INFO  2024-06-19 05:31:04,324 [shard 0:main] raft_group0 - Disabling migration_manager schema pulls because Raft is enabled and we're bootstrapping.
INFO  2024-06-19 05:31:04,325 [shard 0:strm] messaging_service - Starting Messaging Service on address 172.17.0.2 port 7000
INFO  2024-06-19 05:31:04,326 [shard 0:strm] storage_service - entering STARTING mode
INFO  2024-06-19 05:31:04,326 [shard 0:strm] storage_service - Loading persisted ring state
INFO  2024-06-19 05:31:04,327 [shard 0:strm] storage_service - initial_contact_nodes={10.0.0.81}, loaded_endpoints=[], loaded_peer_features=0

Server 3

$ sudo docker logs <containerID> 
...
INFO  2024-06-19 05:31:30,603 [shard 0:main] repair - Loading repair history for keyspace=system, table=tablets, table_uuid=fd4f7a46-96bd-3e73-91bf-99eb77e82a5c
INFO  2024-06-19 05:31:30,606 [shard 0:main] raft_group0 - Disabling migration_manager schema pulls because Raft is enabled and we're bootstrapping.
INFO  2024-06-19 05:31:30,607 [shard 0:strm] messaging_service - Starting Messaging Service on address 172.17.0.2 port 7000
INFO  2024-06-19 05:31:30,608 [shard 0:strm] storage_service - entering STARTING mode
INFO  2024-06-19 05:31:30,608 [shard 0:strm] storage_service - Loading persisted ring state
INFO  2024-06-19 05:31:30,610 [shard 0:strm] storage_service - initial_contact_nodes={10.0.0.81}, loaded_endpoints=[], loaded_peer_features=0

Usually it takes a minute or so for the nodes to connect.
Did you try waiting a bit and running the nodetool status command again?


I set up the cluster and waited for three hours, but the status remains the same. The logs of each node suggest that the connections are established correctly, yet only one node is visible in the status output.

The issue is that you’re doing this on separate standalone Docker hosts. To get this to work, you’ll need to publish the ports used by ScyllaDB. At a minimum, you should add -p 7000:7000 to allow the nodes to talk to each other. ScyllaDB’s ports are listed in its Administration Guide, which I can’t link to here.
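
As a rough sketch of what that could look like with plain docker run (the --broadcast-address and --broadcast-rpc-address flags are supported by the ScyllaDB Docker image, but the exact values here are assumptions; substitute your own host IPs):

# Server 1 (10.0.0.81): publish the inter-node port (7000) and the CQL port
# (9042), and advertise the host's external IP rather than the container's
# internal Docker IP.
$ sudo docker run --name Node_X -d \
    -p 7000:7000 -p 9042:9042 \
    scylladb/scylla:latest \
    --overprovisioned 1 --smp 1 \
    --broadcast-address 10.0.0.81 --broadcast-rpc-address 10.0.0.81

# Server 2 (10.0.0.147): same idea, pointing --seeds at Server 1. Server 3
# (10.0.0.18) is analogous with its own IP.
$ sudo docker run --name Node_Y -d \
    -p 7000:7000 -p 9042:9042 \
    scylladb/scylla:latest \
    --seeds=10.0.0.81 --overprovisioned 1 --smp 1 \
    --broadcast-address 10.0.0.147 --broadcast-rpc-address 10.0.0.147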

My current compose file for doing this kind of testing with separate standalone Docker hosts looks like:

services:
  scylla:
    container_name: scylla
    hostname: scylla
    extra_hosts:
      - "scylla:10.10.0.85"
    image: scylladb/scylla:5.4
    ports:
      - 7000:7000
      - 7001:7001
      - 9042:9042
      - 10000:10000
    network_mode: "host"

I don’t claim this is a best-practice setup, but it ended up working for testing. I ended up using host-mode networking rather than port forwarding. The hostname/extra_hosts combo is necessary because of how Scylla decides which IPs to listen on: it needs to be able to resolve its hostname to the external IP address of the Docker host it’s running on, not the internal Docker-assigned IP. The ports section is unnecessary in host mode; it’s there for documentation.
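
To confirm the fix took effect (a hedged sketch; the container name scylla matches the compose file above, so adjust it if yours differs), re-run nodetool status from any node:

# On any of the three servers:
$ sudo docker exec -it scylla nodetool status
# A healthy three-node cluster shows three UN (Up/Normal) rows, one per
# server IP (10.0.0.81, 10.0.0.147, 10.0.0.18), instead of the single
# 172.17.0.2 internal address seen earlier.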
