Reader_concurrency_semaphore and timeout errors

Hello guys,
I have two ScyllaDB clusters, one for production and one for test (specs are below). We use only the Alternator features, but from time to time both clusters hit reader_concurrency_semaphore errors and sometimes “Operation timed out” errors (examples are added below).
The funny thing is that on both clusters the load is almost zero and the total data size is less than 100 MB. (We literally have almost no load.)
I checked the logs and cluster health, and everything looks good. Do you have any idea why we are getting these errors?

One thing I noticed is in the installation configs: the file /etc/scylla.d/cpuset.conf was empty on all nodes in both clusters. (The other interesting thing is that I am able to run Scylla while that file is empty.)
I am not sure that is the cause, but for the test environment only I have now changed it to “--cpuset 0-3”.
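
To compare this setting across nodes, something like the following could help. It is only a minimal sketch: it assumes the file holds a shell-style assignment such as CPUSET="--cpuset 0-3" and that commented-out lines do not count, which is my assumption about the file format. The function name configured_cpuset is made up for illustration.

```python
import re
from typing import Optional


def configured_cpuset(conf_text: str) -> Optional[str]:
    """Return the value given to --cpuset in the file text, or None if unset.

    Assumption: the active setting sits on a non-comment line, for example
    CPUSET="--cpuset 0-3". An empty file therefore yields None.
    """
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        match = re.search(r"--cpuset[ =]+([0-9,\-]+)", line)
        if match:
            return match.group(1)
    return None
```

On a node, the text would come from reading /etc/scylla.d/cpuset.conf (for example with pathlib.Path(...).read_text()); a result of None corresponds to the empty file I described above.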

Thanks for your help :slight_smile:

Example Logs From Test Environment:

scylla[169107]:  [shard 3:stmt] reader_concurrency_semaphore - Semaphore user with 1/100 count and 16384/30828134 memory resources: timed out, dumping permit diagnostics:
                                                     permits    count     memory        table/operation/state
                                                     1          1         16K           alternator_Dummy.Dummy/data-query/waiting_for_execution
                                                     
                                                     1          1         16K           total
                                                     
                                                     Stats:
                                                     permit_based_evictions: 0
                                                     time_based_evictions: 2694
                                                     inactive_reads: 0
                                                     total_successful_reads: 453194
                                                     total_failed_reads: 0
                                                     total_reads_shed_due_to_overload: 0
                                                     total_reads_killed_due_to_kill_limit: 0
                                                     reads_admitted: 464798
                                                     reads_enqueued_for_admission: 1742
                                                     reads_enqueued_for_memory: 0
                                                     reads_admitted_immediately: 463056
                                                     reads_queued_because_ready_list: 190
                                                     reads_queued_because_need_cpu_permits: 1552
                                                     reads_queued_because_memory_resources: 0
                                                     reads_queued_because_count_resources: 0
                                                     reads_queued_with_eviction: 0
                                                     total_permits: 470819
                                                     current_permits: 1
                                                     need_cpu_permits: 0
                                                     awaits_permits: 0
                                                     disk_reads: 0
                                                     sstables_read: 0                    
scylla[169107]:  [shard 0: gms] rpc - client 1xx.1xx.1xx.1xx:7000: ignoring error response: Semaphore timedout
scylla[169107]:  [shard 3:stmt] rpc - client 1xx.1xx.1xx.1xx:7000: ignoring error response: Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.
scylla[169107]:  [shard 0:stmt] storage_proxy - Failed to apply mutation from 1xx.1xx.1xx.1xx#0: exceptions::mutation_write_timeout_exception (Operation timed out for system.paxos - received only 0 responses from 1 CL=ONE.)

On one of the other nodes:

scylla[122752]:  [shard 3:stmt] reader_concurrency_semaphore - Semaphore user with 1/100 count and 17577/30828134 memory resources: timed out, dumping permit diagnostics:
                                                     permits        count        memory        table/operation/state
                                                     1              1            17K           system.paxos/data-query/active/need_cpu
                                                     1              0            0B            system.paxos/data-query/waiting_for_admission
                                                     
                                                     2              1            17K           total
                                                     
                                                     Stats:
                                                     permit_based_evictions: 0
                                                     time_based_evictions: 86
                                                     inactive_reads: 0
                                                     total_successful_reads: 64879
                                                     total_failed_reads: 2
                                                     total_reads_shed_due_to_overload: 0
                                                     total_reads_killed_due_to_kill_limit: 0
                                                     reads_admitted: 65833
                                                     reads_enqueued_for_admission: 84
                                                     reads_enqueued_for_memory: 0
                                                     reads_admitted_immediately: 65750
                                                     reads_queued_because_ready_list: 41
                                                     reads_queued_because_need_cpu_permits: 43
                                                     reads_queued_because_memory_resources: 0
                                                     reads_queued_because_count_resources: 0
                                                     reads_queued_with_eviction: 0
                                                     total_permits: 66421
                                                     current_permits: 2
                                                     need_cpu_permits: 1
                                                     awaits_permits: 0
                                                     disk_reads: 0
                                                     sstables_read: 0

scylla[122752]:  [shard 3:stmt] reader_concurrency_semaphore - Semaphore user with 1/100 count and 16384/30828134 memory resources: timed out, dumping permit diagnostics:
                                                     permits        count        memory        table/operation/state
                                                     1              1            16K           system.paxos/data-query/waiting_for_execution
                                                     
                                                     1              1            16K           total
                                                     
                                                     Stats:
                                                     permit_based_evictions: 0
                                                     time_based_evictions: 70
                                                     inactive_reads: 0
                                                     total_successful_reads: 31016
                                                     total_failed_reads: 0
                                                     total_reads_shed_due_to_overload: 0
                                                     total_reads_killed_due_to_kill_limit: 0
                                                     reads_admitted: 31602
                                                     reads_enqueued_for_admission: 35
                                                     reads_enqueued_for_memory: 0
                                                     reads_admitted_immediately: 31567
                                                     reads_queued_because_ready_list: 19
                                                     reads_queued_because_need_cpu_permits: 16
                                                     reads_queued_because_memory_resources: 0
                                                     reads_queued_because_count_resources: 0
                                                     reads_queued_with_eviction: 0
                                                     total_permits: 31968
                                                     current_permits: 1
                                                     need_cpu_permits: 0
                                                     awaits_permits: 0
                                                     disk_reads: 0
                                                     sstables_read: 0
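
To make the dumps above easier to compare, here is a minimal sketch that pulls the "name: value" counters out of a pasted diagnostics dump and shows what share of the enqueued reads falls under each reads_queued_because_* counter. The helper names parse_stats and queue_breakdown are made up for illustration, and the sample only repeats counters from the first dump. In that dump, need_cpu_permits accounts for 1552 of the 1742 enqueued reads, while the memory and count reasons are zero.

```python
import re


def parse_stats(dump: str) -> dict:
    """Collect the 'name: integer' counters from a permit diagnostics dump."""
    stats = {}
    for line in dump.splitlines():
        # Only lines that consist of a lowercase counter name and an integer.
        match = re.fullmatch(r"\s*([a-z_]+):\s*(\d+)\s*", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def queue_breakdown(stats: dict) -> dict:
    """Share of enqueued reads per 'reads_queued_because_*' counter."""
    enqueued = stats.get("reads_enqueued_for_admission", 0)
    prefix = "reads_queued_because_"
    return {
        name[len(prefix):]: (value / enqueued if enqueued else 0.0)
        for name, value in stats.items()
        if name.startswith(prefix)
    }


# Counters copied from the first dump in this post.
sample = """
    reads_enqueued_for_admission: 1742
    reads_queued_because_ready_list: 190
    reads_queued_because_need_cpu_permits: 1552
    reads_queued_because_memory_resources: 0
    reads_queued_because_count_resources: 0
"""

sample_stats = parse_stats(sample)
breakdown = queue_breakdown(sample_stats)
for reason, share in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"{reason}: {share:.1%}")
```

The same functions can be run on the other two dumps; in all three, the ready_list and need_cpu_permits counters add up to reads_enqueued_for_admission.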

#ScyllaDB version: 6.1.2
Compaction strategy: SizeTieredCompactionStrategy
#Cluster size: 3 nodes (TEST: 4 cores / 8 GB RAM) (PROD: 16 cores / 32 GB RAM)
#OS: Rocky Linux 9.4

I want to add one more thing: operations that use LWT take way too long to complete.
Here is the screenshot from our monitoring stack. You can see there are not a lot of operations, but the Delete and Put Item operations in particular take too long. Also, the CPU load is near zero.


On the LWT dashboard, the information is almost the same as on this page.

Thanks.