Originally from the User Slack
@Marcondes_Viana_de_Oliveira_Junior: Hey, one question. I’m connecting to a 3-node cluster, with a keyspace using a replication factor of 3 and a consistency level of LOCAL_QUORUM. I intentionally killed one node to test and got some logs: Cannot achieve consistency level for cl LOCAL_ONE. Requires 1, alive 0
The other two nodes seem alive, though. When I connect to the cluster I pass all the IPs.
@Karol_Baryła: How are nodes distributed between DCs and what is your exact replication strategy?
@Marcondes_Viana_de_Oliveira_Junior: Only one DC
NetworkTopologyStrategy
I’m using Go (golang) and I set this on the client:
fallback := gocql.RoundRobinHostPolicy()
c.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(fallback)
Can it lock onto a single host, so that it doesn’t try the other hosts?
Also, I just tried to recreate the connection and it fails.
It fails to start even though I pass 3 hosts and only one of them is down; it doesn’t connect.
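For context, a minimal gocql setup along these lines looks roughly like the sketch below; the contact points, keyspace name, and consistency level are placeholders standing in for the cluster described above, not the actual application code:

package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Placeholder contact points for the three nodes mentioned above.
	cluster := gocql.NewCluster("x.x.x.1", "x.x.x.2", "x.x.x.3")
	cluster.Keyspace = "my_keyspace" // placeholder keyspace name
	cluster.Consistency = gocql.LocalQuorum

	// Token-aware routing with a round-robin fallback, as in the snippet above.
	fallback := gocql.RoundRobinHostPolicy()
	cluster.PoolConfig.HostSelectionPolicy = gocql.TokenAwareHostPolicy(fallback)

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("failed to create session: %v", err)
	}
	defer session.Close()
}

With multiple contact points like this, the driver can establish its control connection through any live node, so a single dead node should not by itself prevent startup.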
@avi: Check if you misspelled the datacenter name on the client side
@Marcondes_Viana_de_Oliveira_Junior:
unable to discover protocol version: dial tcp x.x.x.x:9042: connect: connection refused
IP is correct.
It’s the IP of the node I killed, even though I pass 3 IPs to the client connection.
The other 2 IPs are live.
DC is correct
unable to discover protocol version: Cannot achieve consistency level for cl LOCAL_ONE. Requires 1, alive 0
We have 2 alive and one dead, and the IPs are correct.
The DC is correct too.
I added some logging to the library, and it fails to get the protocol version:
host ["x.x.x.2"]: Cannot achieve consistency level for cl LOCAL_ONE. Requires 1, alive 0
host ["x.x.x.1"]: dial tcp x.x.x.1:9042: connect: connection refused
host ["x.x.x.3"]: Cannot achieve consistency level for cl LOCAL_ONE. Requires 1, alive 0
It only logs the latest error; that’s why I was seeing different errors. Now it’s clearer, but I still don’t understand it.
@avi: Double-check the replication factor
@Marcondes_Viana_de_Oliveira_Junior: It’s 3
@avi: I don’t have an explanation then. If a node was alive enough to respond, it is alive enough to be a replica.
Are you using tablets?
@Marcondes_Viana_de_Oliveira_Junior: Not sure. I asked the person responsible for the deployment.
$ nodetool status -h x.x.x.2 --keyspace my_keyspace
Datacenter: DTC3
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load       Tokens  Owns (effective)  Host ID  Rack
DN  x.x.x.1  793.81 MB  256     100.0%            <UUID>   rack-01
UN  x.x.x.2  787.60 MB  256     100.0%            <UUID>   rack-01
UN  x.x.x.3  759.51 MB  256     100.0%            <UUID>   rack-01
I can perform queries using cqlsh.
But the Go client can’t even connect; it fails on this protocol version request.
@avi: Then the problem is somewhere in the client or networking
@Marcondes_Viana_de_Oliveira_Junior: Ok, I will keep digging
@avi: Wireshark may help
@Marcondes_Viana_de_Oliveira_Junior: One user can log in, the other can’t. When I try to list the roles from cqlsh I get:
my_user@cqlsh> list roles;
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: x.x.x.2:9042 DC1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level for cl QUORUM. Requires 1, alive 0" info={\'consistency\': \'QUORUM\', \'required_replicas\': 1, \'alive_replicas\': 0}')})
The only difference between the users is that one can only read and the other can read/modify.
@avi: What version are you running?
@Marcondes_Viana_de_Oliveira_Junior:
I guess this is the problem: the auth data is spread across the nodes
Version: 6.0.2-0.20240703.c9cd171f426e
@avi: But in 6.0 auth was moved to raft
@Marcondes_Viana_de_Oliveira_Junior: I got the version from nodetool status --version
Wrong place?
@avi: no, it’s the right place
You can fix the problem by increasing the replication factor of system_auth and running repair (this is documented), but it should be in raft in 6.0
@Marcondes_Viana_de_Oliveira_Junior: Is there a place where raft is configurable, so I can check whether it’s on or off?
@avi: It’s automatic
https://github.com/scylladb/scylladb/commit/19b816bb68292b2a5ff7d8e8ec374ceb0d5ed85e
GitHub: Merge ‘Migrate system_auth to raft group0’ from Marcin Maliszkiewicz · scylladb/scylladb@19b816b
@Marcondes_Viana_de_Oliveira_Junior: 
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
This is the current configuration
@avi: Was this cluster upgraded, or did it begin life as 6.0?
@Marcondes_Viana_de_Oliveira_Junior: I need to ask the DevOps guy about this; he’s not available now. But my bet is that it was not upgraded.
@avi: Well you can fix it with ALTER KEYSPACE + repair
@Marcondes_Viana_de_Oliveira_Junior: Sure
Appreciate you. Tomorrow I will try the fix and let you know!
nodetool repair system_auth -full
Right?
@avi: Yes
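For reference, the documented fix looks roughly like this; the DC name DTC3 and replication factor 3 are taken from the nodetool status output above and should be adjusted to the actual cluster. Since system_auth apparently still had a replication factor of 1, the credentials for a given user could live only on the downed node, which would explain the "Requires 1, alive 0" errors during authentication:

cqlsh> ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DTC3': 3};

Then run a full repair of the keyspace on each node:

$ nodetool repair -full system_auth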
@Marcondes_Viana_de_Oliveira_Junior: Thanks again
You were right. It was born in 5 and migrated to 6. Is there a tool/tutorial to migrate auth from system_auth to raft?
We found the docs, Thanks!
Worked, thank you!!