Originally from the User Slack
@Pham_William: Hey everyone, setting up a test cluster in a virtual environment, and I have a few questions.
Based on the configuration guide for a single data center: https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/cluster-management/create-cluster.html
I just have 2 nodes:
+-----------+---------+---------------------+----------------------------------------------+-----------+-----------+
| scylladb  | RUNNING | 10.70.188.92 (eth0) | fd42:ab78:d54:2140:216:3eff:fe59:33a7 (eth0) | CONTAINER | 0         |
+-----------+---------+---------------------+----------------------------------------------+-----------+-----------+
| scylladb2 | RUNNING | 10.70.188.39 (eth0) | fd42:ab78:d54:2140:216:3eff:feeb:2be7 (eth0) | CONTAINER | 0         |
+-----------+---------+---------------------+----------------------------------------------+-----------+-----------+
scylladb (10.70.188.92 - seed):
root@scylladb:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep seeds
- seeds: "10.70.188.92"
root@scylladb:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep rpc_address
rpc_address: 10.70.188.92
root@scylladb:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep listen_address
listen_address: 10.70.188.92
root@scylladb:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep endpoint_snitch
endpoint_snitch: GossipingPropertyFileSnitch
root@scylladb:~# cat /etc/scylla/cassandra-rackdc.properties
prefer_local=true
dc=datacenter1
rack=rack1
scylladb2 (10.70.188.39):
root@scylladb2:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep seeds
- seeds: "10.70.188.92"
root@scylladb2:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep rpc_address
rpc_address: 10.70.188.39
root@scylladb2:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep listen_address
listen_address: 10.70.188.39
root@scylladb2:~/server_dotfiles# cat /etc/scylla/scylla.yaml|grep endpoint_snitch
endpoint_snitch: GossipingPropertyFileSnitch
root@scylladb2:~# cat /etc/scylla/cassandra-rackdc.properties
prefer_local=true
dc=datacenter1
rack=rack2
I tried to confirm that a client can connect, so I ran cqlsh and got:
Connection error: ('Unable to connect to any servers', {'127.0.0.1:9042': ConnectionRefusedError(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})
cqlsh --help suggested changing the host via $CQLSH_HOST, so I did:
scylladb2 (10.70.188.39):
root@scylladb2:~# CQLSH_HOST=10.70.188.39 cqlsh
Connected to scylladb1 at 10.70.188.39:9042
[cqlsh 6.0.18 | Scylla 6.0.0-0.20240606.a77615adf324 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>
I also tried to connect to node 1 with:
scylladb2 (10.70.188.39):
root@scylladb2:~# CQLSH_HOST=10.70.188.92 cqlsh
Connected to scylladb1 at 10.70.188.92:9042
[cqlsh 6.0.18 | Scylla 6.0.0-0.20240606.a77615adf324 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>
The documentation says I should run nodetool status to check the status of the cluster, but that failed with error running operation: std::system_error (error system:111, Connection refused). nodetool lets you specify a host with -h, so I did:
scylladb2 (10.70.188.39):
root@scylladb2:~# nodetool -h 10.70.188.39 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.70.188.39 2.20 MB 256 ? 353babac-b23e-4cf3-9bab-d09e9fa3ebd5 rack2
This is a bit odd to me: I set up two nodes, but only one shows up. I can connect to node 1 as well, and it also shows only itself.
scylladb2 (10.70.188.39):
root@scylladb2:~# nodetool -h 10.70.188.92 status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.70.188.92 2.32 MB 256 ? b054e62c-8c89-4107-b839-1560b67988d4 rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
My questions are:
- Why is it not showing 2 nodes, even though I configured both node 1 and node 2 to use node 1 as the seed?
- In the documentation, cqlsh and nodetool status automatically connect to the cluster. How does that work? In my case I have to specify the IP address of one particular node, or it won't connect.
- This is more of a subset of question 2: if I use the Rust library as a client to connect to ScyllaDB, which IP address should I use for the cluster? Should it be the seed's IP?
@Felipe_Cardeneti_Mendes: 1. It seems like node2 bootstrapped as a separate cluster and is unaware of node1. Wipe it clean and retry.
2. Set rpc_address to 0.0.0.0, and broadcast_rpc_address to the node's own IP
3. You should specify the IP you set as the RPC address as a contact point
@Pham_William: @Felipe_Cardeneti_Mendes That works! Thank you very much. I assume I should be using the RPC address of the seed? Or can it be any node?
@Felipe_Cardeneti_Mendes: Any node is fine as the contact point for the app, but more often than not you want to specify more than a single node as contact points
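For question 3, here is a minimal sketch of what that can look like with the Rust driver, assuming the scylla crate and tokio; the addresses are simply the two node IPs from earlier in this thread, and exact API details may vary between crate versions:
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Contact points: any reachable node works, and the driver discovers the
    // rest of the cluster from whichever one it reaches first. Listing more
    // than one keeps the initial connection working if a node is down.
    // These are the client-facing (RPC) addresses of the two nodes above.
    let session: Session = SessionBuilder::new()
        .known_node("10.70.188.92:9042")
        .known_node("10.70.188.39:9042")
        .build()
        .await?;

    println!("Connected to the cluster");
    let _ = session; // ready to run queries from here on
    Ok(())
}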
@Pham_William: Hi, @Felipe_Cardeneti_Mendes, there was some issue, so I started from scratch. With 3 nodes as follows:
+-----------+---------+----------------------+----------------------------------------------+-----------+-----------+
| scylladb1 | RUNNING | 10.70.188.63 (eth0)  | fd42:ab78:d54:2140:216:3eff:fef1:2d6a (eth0) | CONTAINER | 0         |
+-----------+---------+----------------------+----------------------------------------------+-----------+-----------+
| scylladb2 | RUNNING | 10.70.188.242 (eth0) | fd42:ab78:d54:2140:216:3eff:fe67:72ef (eth0) | CONTAINER | 0         |
+-----------+---------+----------------------+----------------------------------------------+-----------+-----------+
| scylladb3 | RUNNING | 10.70.188.145 (eth0) | fd42:ab78:d54:2140:216:3eff:fea9:4149 (eth0) | CONTAINER | 0         |
+-----------+---------+----------------------+----------------------------------------------+-----------+-----------+
- node 1
cluster_name: 'scylladb'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.70.188.63"
listen_address: 10.70.188.63
endpoint_snitch: GossipingPropertyFileSnitch
rpc_address: 0.0.0.0
api_address: 10.70.188.63
broadcast_rpc_address: 10.70.188.63
- node 2
cluster_name: 'scylladb'
listen_address: 10.70.188.242
api_address: 10.70.188.242
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.70.188.242
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.70.188.63"
- node 3
cluster_name: 'scylladb'
listen_address: 10.70.188.145
api_address: 10.70.188.145
rpc_address: 0.0.0.0
broadcast_rpc_address: 10.70.188.145
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.70.188.63"
They all have the same /etc/scylla/cassandra-rackdc.properties (the only difference is the rack number: 1, 2, 3):
dc=sydney
rack=rack1
prefer_local=true
Now, if I run nodetool status on any node, it returns only the status of node 1.
root@scylladb1:~# nodetool -h 10.70.188.63 status
Datacenter: sydney
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.70.188.63 1.99 MB 256 ? 410f137f-d213-4cd3-8729-976b1101464c rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
Nodes 2 and 3 also show node 1 only, but without its load information:
root@scylladb2:~# nodetool -h 10.70.188.242 status
root@scylladb3:~# nodetool -h 10.70.188.145 status
Datacenter: sydney
==================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
DN 10.70.188.63 ? 256 ? 410f137f-d213-4cd3-8729-976b1101464c rack1
Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
I have also tried to remove the data from nodes 2 and 3, but no luck. Is there a reason why they only show node 1, with status DN ("Down Normal") instead of up? This is what I ran to wipe them:
sudo rm -rf /var/lib/scylla/data
sudo find /var/lib/scylla/commitlog -type f -delete
sudo find /var/lib/scylla/hints -type f -delete
sudo find /var/lib/scylla/view_hints -type f -delete
Note that systemctl status scylla-server reports active (running) on all nodes.
@Felipe_Cardeneti_Mendes: Is this a "local" Docker deployment? How are you making changes to the config file? In general, it's best to start with a single node, then add nodes incrementally until you get the config and steps right. For example, if you add the 2nd node and it doesn't work, it probably bootstrapped early and you have to clear it up. Check its logs, see what happened, and correct whatever the logs are telling you. Once you've got 2 nodes up, repeat the same steps for the remaining nodes.