Hello guys,
I have two ScyllaDB clusters, one for production and one for test (specs are below). We use only the Alternator feature, but from time to time both clusters hit reader_concurrency_semaphore errors and sometimes “Operation timed out” errors. (Examples are below.)
The funny thing is that on both clusters the load is almost zero and the total data size is less than 100 MB. (We have literally almost no load.)
I checked the logs and cluster health but everything looks good. Do you have any idea why we are getting these errors?
One thing I noticed is in the installation configs: the file /etc/scylla.d/cpuset.conf was empty on all nodes in both clusters. (The other interesting thing is that Scylla still runs when that file is empty.)
I am not sure whether that is the cause, but for the test environment only I have now changed it to "--cpuset 0-3".
Thanks for your help
Example Logs From Test Environment:
scylla[169107]: [shard 3:stmt] reader_concurrency_semaphore - Semaphore user with 1/100 count and 16384/30828134 memory resources: timed out, dumping permit diagnostics:
permits count memory table/operation/state
1 1 16K alternator_Dummy.Dummy/data-query/waiting_for_execution
1 1 16K total
Stats:
permit_based_evictions: 0
time_based_evictions: 2694
inactive_reads: 0
total_successful_reads: 453194
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 464798
reads_enqueued_for_admission: 1742
reads_enqueued_for_memory: 0
reads_admitted_immediately: 463056
reads_queued_because_ready_list: 190
reads_queued_because_need_cpu_permits: 1552
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 470819
current_permits: 1
need_cpu_permits: 0
awaits_permits: 0
disk_reads: 0
sstables_read: 0
scylla[169107]: [shard 0: gms] rpc - client 1xx.1xx.1xx.1xx:7000: ignoring error response: Semaphore timedout
scylla[169107]: [shard 3:stmt] rpc - client 1xx.1xx.1xx.1xx:7000: ignoring error response: Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.
scylla[169107]: [shard 0:stmt] storage_proxy - Failed to apply mutation from 1xx.1xx.1xx.1xx#0: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)
On one of the other nodes:
scylla[122752]: [shard 3:stmt] reader_concurrency_semaphore - Semaphore user with 1/100 count and 17577/30828134 memory resources: timed out, dumping permit diagnostics:
permits count memory table/operation/state
1 1 17K system.paxos/data-query/active/need_cpu
1 0 0B system.paxos/data-query/waiting_for_admission
2 1 17K total
Stats:
permit_based_evictions: 0
time_based_evictions: 86
inactive_reads: 0
total_successful_reads: 64879
total_failed_reads: 2
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 65833
reads_enqueued_for_admission: 84
reads_enqueued_for_memory: 0
reads_admitted_immediately: 65750
reads_queued_because_ready_list: 41
reads_queued_because_need_cpu_permits: 43
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 66421
current_permits: 2
need_cpu_permits: 1
awaits_permits: 0
disk_reads: 0
sstables_read: 0
scylla[122752]: [shard 3:stmt] reader_concurrency_semaphore - Semaphore user with 1/100 count and 16384/30828134 memory resources: timed out, dumping permit diagnostics:
permits count memory table/operation/state
1 1 16K system.paxos/data-query/waiting_for_execution
1 1 16K total
Stats:
permit_based_evictions: 0
time_based_evictions: 70
inactive_reads: 0
total_successful_reads: 31016
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 31602
reads_enqueued_for_admission: 35
reads_enqueued_for_memory: 0
reads_admitted_immediately: 31567
reads_queued_because_ready_list: 19
reads_queued_because_need_cpu_permits: 16
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 31968
current_permits: 1
need_cpu_permits: 0
awaits_permits: 0
disk_reads: 0
sstables_read: 0
Cluster Specs:
ScyllaDB version: 6.1.2
Compaction strategy: SizeTieredCompactionStrategy
Cluster size: 3 nodes (TEST: 4 cores / 8 GB RAM) (PROD: 16 cores / 32 GB RAM)
OS: Rocky Linux 9.4
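
In case it helps anyone compare the Stats blocks in the dumps above, here is a minimal Python sketch that parses one block and reports why reads were queued. The helper names (parse_stats, summarize) are my own and are not part of any ScyllaDB tooling; the sample values are copied from the first test-environment dump, and the parsing only assumes that each counter is printed as "name: number".

```python
import re

# Counters copied from the first dump in the test-environment logs.
SAMPLE = """\
Stats:
time_based_evictions: 2694
total_successful_reads: 453194
total_failed_reads: 0
reads_admitted: 464798
reads_enqueued_for_admission: 1742
reads_admitted_immediately: 463056
reads_queued_because_ready_list: 190
reads_queued_because_need_cpu_permits: 1552
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
disk_reads: 0
sstables_read: 0
"""

# Assumption: every counter line has the shape "lower_snake_name: integer".
STAT_LINE = re.compile(r"\s*([a-z_]+):\s*(\d+)\s*$")


def parse_stats(text):
    """Return {counter_name: int} for every 'name: number' line in text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def summarize(stats):
    """Break down the queued reads by the reason recorded in the dump."""
    prefix = "reads_queued_because_"
    reasons = {
        name[len(prefix):]: value
        for name, value in stats.items()
        if name.startswith(prefix)
    }
    top_reason = max(reasons, key=reasons.get) if reasons else None
    return {
        "enqueued": stats.get("reads_enqueued_for_admission", 0),
        "reasons": reasons,
        "reasons_total": sum(reasons.values()),
        "top_reason": top_reason,
        "disk_reads": stats.get("disk_reads", 0),
    }


if __name__ == "__main__":
    summary = summarize(parse_stats(SAMPLE))
    for key, value in summary.items():
        print(f"{key}: {value}")
```

For the sample above, the per-reason counters add up to the 1742 enqueued reads, 1552 of them are counted under need_cpu_permits, and disk_reads is 0. The same breakdown holds for the two dumps from the other node (41 + 43 = 84 and 19 + 16 = 35), with memory and count resources at 0 in all three.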