Originally from the User Slack
@Shantanu_Sachdev:
Feb 15 07:56:00 ip-172-31-129-69 scylla[1797805]: [shard 11] reader_concurrency_semaphore - (rate limiting dropped 11 similar messages) Semaphore _read_concurrency_sem with 6/100 count and 4074705/169324052 memory resources: timed out, dumping permit diagnostics:
permits count memory table/description/state
2 2 2307K traffickeyspace.alarm_data_createdate_asc_view/data-query/active/used
2 2 1635K traffickeyspace.alarm_data_createdate_asc_view/data-query/inactive
2 2 37K traffickeyspace.current_vehicle_data/data-query/inactive
1 0 0B system.local/shard-reader/waiting
10 0 0B traffickeyspace.current_vehicle_data/mutation-query/waiting
10 0 0B traffickeyspace.trips_by_vehicles/data-query/waiting
2 0 0B traffickeyspace.raw_device_data_2_2024/data-query/waiting
22 0 0B traffickeyspace.alarm_data_createdate_asc_view/data-query/waiting
5 0 0B traffickeyspace.vehicle_daily_data/data-query/waiting
35 0 0B traffickeyspace.current_vehicle_data/data-query/waiting
2 0 0B traffickeyspace.vehicle_hourly_data/data-query/waiting
1 0 0B system.peers/shard-reader/waiting
94 6 3979K total
Total: 94 permits with 6 count and 3979K memory resources
Hello All,
I am frequently getting logs like the above on the Scylla nodes in production. Can someone help me understand what exactly timed out according to this log? It seems that neither the count nor the memory threshold was reached, yet the query still timed out. What are permits, and how can I increase them so that these timeouts do not happen?
Thanks.
@Botond_Dénes: If neither the count (I/O) nor the memory limit is saturated, it will be the third (implicit) one: CPU.
Look at your monitoring; you will likely see the shard reporting this hovering at around 100% CPU usage.
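For reference, the header of the dump above already shows how far the two explicit limits are from saturation. Below is a minimal sketch that pulls those numbers out of such a line and prints the utilization; it assumes only the log format shown above and is not a Scylla tool:

```python
import re

# Example header line from the reader_concurrency_semaphore diagnostics dump above.
line = ("Semaphore _read_concurrency_sem with 6/100 count and "
        "4074705/169324052 memory resources: timed out")

# Pull out "used/limit" for the count and memory resources.
m = re.search(r"(\d+)/(\d+) count and (\d+)/(\d+) memory", line)
count_used, count_limit, mem_used, mem_limit = map(int, m.groups())

print(f"count:  {count_used}/{count_limit}  ({100 * count_used / count_limit:.0f}% used)")
print(f"memory: {mem_used}/{mem_limit}  ({100 * mem_used / mem_limit:.1f}% used)")
# With ~6% of the count and ~2.4% of the memory budget in use, neither explicit
# limit caused the timeout; the waiting permits are queued behind a saturated shard.
```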
@Shantanu_Sachdev: Yes, the load on a lot of shards frequently peaks at 100%, but the node's overall CPU never goes above 50%. How can I utilize it optimally?
@Botond_Dénes: You probably have partitions that are more popular than others. The shards hosting these popular partitions are maxed out. A scale-out or scale-up can help if the problem is multiple reasonably hot partitions: scaling out or up can move them to separate nodes/shards, spreading the load. If there is just a handful of really hot partitions, I'm afraid you will have to look into changing your data model so that you don't have such hot partitions.
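One common way to change the data model for a hot partition is to add a synthetic bucket to the partition key, so traffic for the same logical key spreads across several partitions (and therefore shards). Here is a minimal sketch using the Python driver; the table name, columns, and bucket count are hypothetical illustrations, not taken from the actual schema above:

```python
import random
from cassandra.cluster import Cluster  # works with scylla-driver / cassandra-driver

BUCKETS = 16  # hypothetical: tune to roughly how widely you want to spread one hot key

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("traffickeyspace")

# Hypothetical bucketed variant of a hot table: the partition key becomes
# ((vehicle_id, bucket)) instead of just (vehicle_id).
session.execute("""
    CREATE TABLE IF NOT EXISTS current_vehicle_data_bucketed (
        vehicle_id text,
        bucket int,
        event_time timestamp,
        payload blob,
        PRIMARY KEY ((vehicle_id, bucket), event_time)
    ) WITH CLUSTERING ORDER BY (event_time DESC)
""")

insert = session.prepare(
    "INSERT INTO current_vehicle_data_bucketed "
    "(vehicle_id, bucket, event_time, payload) VALUES (?, ?, ?, ?)"
)

def write_event(vehicle_id, event_time, payload):
    # Writes for one hot vehicle now land on BUCKETS different partitions.
    session.execute(insert, (vehicle_id, random.randrange(BUCKETS), event_time, payload))

def read_latest(vehicle_id, limit=100):
    # Reads must fan out over all buckets and merge client-side.
    rows = []
    for b in range(BUCKETS):
        rows.extend(session.execute(
            "SELECT event_time, payload FROM current_vehicle_data_bucketed "
            "WHERE vehicle_id = %s AND bucket = %s LIMIT %s",
            (vehicle_id, b, limit)))
    return sorted(rows, key=lambda r: r.event_time, reverse=True)[:limit]
```

The trade-off is that a point read becomes BUCKETS small queries instead of one, so this only pays off when a partition is genuinely hot enough to saturate a shard.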
@Shantanu_Sachdev: Okay, scaling out would be costly for us since we are using i4i.8xlarge nodes. We will look more into the hot partitions.