Originally from the User Slack
@Hyunwoo_Kim: Question about Full Scan Count Query Impact on Cluster
[Situation]
• Our cluster consists of three nodes in each of DC1 and DC2.
• One user ran the following query to count rows in a specific table
◦ SELECT count(*) FROM keyspace.table_inference
• At the same time, another user experienced a failure when running a read query.
◦ Read from keyspace2.table_2
• ScyllaDB version: 5.4.0
I understand that a full scan query can take a long time to return results,
but I’m unsure why it caused another read query to fail.
Could anyone provide insights into why this might have happened?
P.S. I’ve attached ScyllaDB server error logs from when the read query failed.
Feb 03 20:32:46 p-chn-scylla001-dc1 scylla[2866797]: [shard 58:stat] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 20/100 count and 1703936/82501959 memory resources: timed out, dumping permit diagnostics:
permits  count  memory  table/operation/state
19       19     2M      keyspace.table_inference/data-query/active/await
1        1      16K     keyspace.table_inference/data-query/active/need_cpu
61       0      0B      keyspace.table_inference/data-query/waiting_for_admission
81       20     2M      total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 45214
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 45234
reads_enqueued_for_admission: 626
reads_enqueued_for_memory: 0
reads_admitted_immediately: 44669
reads_queued_because_ready_list: 504
reads_queued_because_need_cpu_permits: 122
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 45295
current_permits: 81
need_cpu_permits: 20
awaits_permits: 19
disk_reads: 20
sstables_read: 20
Feb 03 20:32:46 p-chn-scylla001-dc1 scylla[2866797]: [shard 28:stat] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 10/100 count and 819200/82501959 memory resources: timed out, dumping permit diagnostics:
permits  count  memory  table/operation/state
9        9      784K    keyspace.table_inference/data-query/active/await
1        1      16K     keyspace.table_inference/data-query/active/need_cpu
1        0      0B      keyspace2.table_2/data-query/waiting_for_admission
95       0      0B      keyspace.table_inference/data-query/waiting_for_admission
106      10     800K    total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 44684
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 44694
reads_enqueued_for_admission: 633
reads_enqueued_for_memory: 0
reads_admitted_immediately: 44157
reads_queued_because_ready_list: 362
reads_queued_because_need_cpu_permits: 271
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 44790
current_permits: 106
need_cpu_permits: 10
awaits_permits: 9
disk_reads: 10
sstables_read: 10
[Grafana Metric]
[Follow-up Question]
When managing ScyllaDB, would decreasing range_request_timeout_in_ms or read_request_timeout_in_ms help prevent full scan queries from impacting other users?
@Botond_Dénes: Full scans impact other queries because they generate a lot of work for the cluster, increasing the load on the nodes.
Decreasing timeouts is counter-productive and just leads to thrown-away work.
@Hyunwoo_Kim: Thank you for answering, Dénes!
So… from an operational perspective, how can we prevent this kind of failure?
I think it’s not enough to just tell users that count queries are not appropriate.
Or should we add another API layer for reading and writing data to ScyllaDB?
Ideally, only the count (full scan) query would fail, and other queries would be unaffected by it.
@Botond_Dénes: You can achieve that, but you need to use workload prioritization, which is available in enterprise or the upcoming source-available release.
With workload prioritization you can isolate workloads from each other, so a count() query running in one priority group will not hurt other queries in another priority group.
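For illustration, here is a minimal sketch of what that could look like in CQL. The service level names, role names, and share values below are hypothetical, and the exact syntax may vary between releases, so check the workload prioritization docs for your version:

-- Hypothetical service levels: a high-share one for latency-sensitive
-- queries and a low-share one for analytics/full scans.
CREATE SERVICE_LEVEL IF NOT EXISTS interactive WITH SHARES = 1000;
CREATE SERVICE_LEVEL IF NOT EXISTS analytics WITH SHARES = 100;

-- Attach each service level to the role the corresponding workload
-- authenticates with (roles here are hypothetical).
ATTACH SERVICE_LEVEL interactive TO app_user;
ATTACH SERVICE_LEVEL analytics TO batch_user;

-- Verify which roles are attached to which service levels.
LIST ALL ATTACHED SERVICE_LEVELS;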
@Hyunwoo_Kim: Oh I see… Too bad we’re using version 5.4 
@Robert: in 5.4.x that feature also exists 
[cqlsh 6.0.19.dev2+g9d49b38 | Scylla 5.4.9-0.20240703.fdcbbb85adcd | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
....
cassandra@cqlsh> select * from system_auth.role_attributes
... ;
 role   | name          | value
--------+---------------+-----------
 aaaaaa | service_level | api
 bbbbbb | service_level | api
 cccccc | service_level | analytics
@Botond_Dénes: No, the system table schema was adjusted to be compatible with that of Enterprise, but the feature is not implemented in open-source ScyllaDB.
@avi: Once you migrate to 2025.1, you can isolate the parallel scan and give it fewer resources.
The table will have an additional column, shares, denoting how many resources to allocate to these queries.
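Continuing the hypothetical sketch from above (again, the service level name and share value are illustrative, not prescriptive):

-- Hypothetical: after upgrading, squeeze the scan workload further by
-- lowering its shares.
ALTER SERVICE_LEVEL analytics WITH SHARES = 50;

-- The listing should then show the shares value for each service level.
LIST ALL SERVICE_LEVELS;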