Add node even when seed is unreachable

Hi,
Is there any way to add a new node to the cluster even when one of its configured seeds isn't reachable but the rest are, e.g. because of a typo?
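
For concreteness, I mean something like the following, assuming the seeds are configured via seed_provider in scylla.yaml (the hostnames here are made up; the same idea applies to the --seeds command-line flag used below):

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # "nodde2" is a hypothetical typo'd hostname; node1 is fine
      - seeds: "node1,nodde2"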

Let’s try it out…

% cat unreachable-seeds.yml 
version: "3"

services:
  scylla-node1:
    container_name: node1
    image: scylladb/scylla:5.2.10
    restart: always
    command: --seeds=node1 --smp 1 --memory 1G --overprovisioned 1 --api-address 0.0.0.0 --endpoint-snitch GossipingPropertyFileSnitch
    networks:
      - web
    volumes: 
      - ./cassandra-rackdc.properties:/etc/scylla/cassandra-rackdc.properties
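    # The API returns "true" or "false"; sh -c then runs that word, so its exit code (0/1) becomes the health status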
    healthcheck:
      test: ["CMD-SHELL", "sh -c $(curl -s -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_service/native_transport')"]
      interval: 30s
      timeout: 10s
      retries: 5

  scylla-node2:
    container_name: node2
    image: scylladb/scylla:5.2.10
    restart: always
    command: --seeds=node1 --smp 1 --memory 1G --overprovisioned 1 --api-address 0.0.0.0 --endpoint-snitch GossipingPropertyFileSnitch
    networks:
      - web
    volumes: 
      - ./cassandra-rackdc.properties:/etc/scylla/cassandra-rackdc.properties
    healthcheck:
      test: ["CMD-SHELL", "sh -c $(curl -s -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_service/native_transport')"]
      interval: 30s
      timeout: 10s
      retries: 5
    depends_on:
      scylla-node1:
        condition: service_healthy

  scylla-node3:
    container_name: node3
    image: scylladb/scylla:5.2.10
    restart: always
    command: --seeds=172.17.0.155,172.22.0.156,node2,172.21.0.50,172.21.0.1  --smp 1 --memory 1G --overprovisioned 1 --api-address 0.0.0.0 --endpoint-snitch GossipingPropertyFileSnitch
    networks:
      - web
    volumes: 
      - ./cassandra-rackdc.properties:/etc/scylla/cassandra-rackdc.properties
    healthcheck:
      test: ["CMD-SHELL", "sh -c $(curl -s -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_service/native_transport')"]
      interval: 30s
      timeout: 10s
      retries: 5
    depends_on:
      scylla-node2:
        condition: service_healthy

networks:
  web:
    driver: bridge

In the above configuration, scylla-node3 has five seed entries, four of them invalid: 172.17.0.155 and 172.22.0.156 are on unreachable subnets, 172.21.0.50 is an unused address on the local network, 172.21.0.1 is the local network's gateway, and node2 is the only valid entry. Note that we haven't specified an invalid FQDN here, as that would fail the DNS lookup.
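
The compose file also mounts a cassandra-rackdc.properties for the GossipingPropertyFileSnitch. It isn't shown above; a minimal version consistent with the nodetool output further down would look something like this (the dc/rack names are assumptions matching the "dc" and "r1" values reported there):

% cat cassandra-rackdc.properties
dc=dc
rack=r1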

% docker compose -f unreachable-seeds.yml up -d
[+] Running 2/3
 ⠿ Container node1  Healthy    32.0s
 ⠿ Container node2  Healthy   152.7s
 ⠿ Container node3  Started   153.0s

Then:

% docker exec -it node3 nodetool status
Datacenter: dc
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  172.21.0.4  624 KB     256          ?       1c0dff39-e5f2-4960-98c4-e1838f6cf673  r1
UN  172.21.0.3  ?          256          ?       67088527-d288-4ee3-8e0c-7f1af9258669  r1
UN  172.21.0.2  260 KB     256          ?       acaf2b17-c66a-4034-a083-644a775a931a  r1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
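
If you want to see how the bad seed entries were handled, one option (the exact log wording may vary between versions) is to grep node3's log for seed/gossip messages:

% docker logs node3 2>&1 | grep -iE 'seed|gossip'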

Hope it helps!

That's great! When I originally tried it, it failed on a DNS lookup, which wasn't obvious from the logs.
Thanks!
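
Side note for future readers: if you suspect one of your configured seed names doesn't resolve, you can check it from inside a container before digging through the logs (this assumes getent is available in the image, and reuses the names from the example above):

% docker exec -it node3 getent hosts node2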

A quick note for anyone visiting this topic in the future: this is still a problem in the context of the first node of the cluster.
See this issue.