Add node even when seed is unreachable

Hi,
Is there any way to add a new node to the cluster even when one of its configured seeds isn't reachable but the rest are, e.g. because of a typo?
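
For concreteness, I mean something like the following, assuming the seeds are configured via seed_provider in scylla.yaml (the hostnames here are made up; the same idea applies to the --seeds command-line flag used below):

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # "nodde2" is a hypothetical typo'd hostname; node1 is fine
      - seeds: "node1,nodde2"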

Let’s try it out…

% cat unreachable-seeds.yml 
version: "3"

services:
  scylla-node1:
    container_name: node1
    image: scylladb/scylla:5.2.10
    restart: always
    command: --seeds=node1 --smp 1 --memory 1G --overprovisioned 1 --api-address 0.0.0.0 --endpoint-snitch GossipingPropertyFileSnitch
    networks:
      - web
    volumes: 
      - ./cassandra-rackdc.properties:/etc/scylla/cassandra-rackdc.properties
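    # The API returns "true" or "false"; sh -c then runs that word, so its exit code (0/1) becomes the health status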
    healthcheck:
      test: ["CMD-SHELL", "sh -c $(curl -s -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_service/native_transport')"]
      interval: 30s
      timeout: 10s
      retries: 5

  scylla-node2:
    container_name: node2
    image: scylladb/scylla:5.2.10
    restart: always
    command: --seeds=node1 --smp 1 --memory 1G --overprovisioned 1 --api-address 0.0.0.0 --endpoint-snitch GossipingPropertyFileSnitch
    networks:
      - web
    volumes: 
      - ./cassandra-rackdc.properties:/etc/scylla/cassandra-rackdc.properties
    healthcheck:
      test: ["CMD-SHELL", "sh -c $(curl -s -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_service/native_transport')"]
      interval: 30s
      timeout: 10s
      retries: 5
    depends_on:
      scylla-node1:
        condition: service_healthy

  scylla-node3:
    container_name: node3
    image: scylladb/scylla:5.2.10
    restart: always
    command: --seeds=172.17.0.155,172.22.0.156,node2,172.21.0.50,172.21.0.1  --smp 1 --memory 1G --overprovisioned 1 --api-address 0.0.0.0 --endpoint-snitch GossipingPropertyFileSnitch
    networks:
      - web
    volumes: 
      - ./cassandra-rackdc.properties:/etc/scylla/cassandra-rackdc.properties
    healthcheck:
      test: ["CMD-SHELL", "sh -c $(curl -s -X GET --header 'Accept: application/json' 'http://localhost:10000/storage_service/native_transport')"]
      interval: 30s
      timeout: 10s
      retries: 5
    depends_on:
      scylla-node2:
        condition: service_healthy

networks:
  web:
    driver: bridge

In the above configuration, scylla-node3 has five seed entries, four of them invalid: 172.17.0.155 and 172.22.0.156 are on unreachable subnets, 172.21.0.50 is an unused address on the local network, 172.21.0.1 is the local network's gateway, and node2 is the only valid entry. Note that we haven't specified an invalid FQDN here, as that would fail the DNS lookup.
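
The compose file also mounts a cassandra-rackdc.properties for the GossipingPropertyFileSnitch. It isn't shown above; a minimal version consistent with the nodetool output further down would look something like this (the dc/rack names are assumptions matching the "dc" and "r1" values reported there):

% cat cassandra-rackdc.properties
dc=dc
rack=r1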

% docker compose -f unreachable-seeds.yml up -d
[+] Running 2/3
 ⠿ Container node1  Healthy    32.0s
 ⠿ Container node2  Healthy   152.7s
 ⠿ Container node3  Started   153.0s

Then:

% docker exec -it node3 nodetool status
Datacenter: dc
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  172.21.0.4  624 KB     256          ?       1c0dff39-e5f2-4960-98c4-e1838f6cf673  r1
UN  172.21.0.3  ?          256          ?       67088527-d288-4ee3-8e0c-7f1af9258669  r1
UN  172.21.0.2  260 KB     256          ?       acaf2b17-c66a-4034-a083-644a775a931a  r1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
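
If you want to see how the bad seed entries were handled, one option (the exact log wording may vary between versions) is to grep node3's log for seed/gossip messages:

% docker logs node3 2>&1 | grep -iE 'seed|gossip'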

Hope it helps!

That's great! When I originally tried it, it failed on a DNS lookup, which wasn't obvious from the logs.
Thanks!
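
Side note for future readers: if you suspect one of your configured seed names doesn't resolve, you can check it from inside a container before digging through the logs (this assumes getent is available in the image, and reuses the names from the example above):

% docker exec -it node3 getent hosts node2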

A quick note for anyone visiting this topic in the future: this is still a problem in the context of the first node of the cluster.
See this issue.