Originally from the User Slack
@Hyunwoo_Kim: Question about Full Scan Count Query Impact on Cluster
[Situation]
• Our cluster consists of three nodes in each of DC1 and DC2.
• One user ran the following query to count rows in a specific table
◦ SELECT count(*) FROM keyspace.table_inference
• At the same time, another user experienced a failure when running a read query.
◦ Read from keyspace2.table_2
• ScyllaDB version: 5.4.0
I understand that a full scan query can take a long time to return results,
but I’m unsure why it caused another read query to fail.
Could anyone provide insights into why this might have happened?
P.S. I’ve attached ScyllaDB server error logs from when the read query failed.
Feb 03 20:32:46 p-chn-scylla001-dc1 scylla[2866797]: [shard 58:stat] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 20/100 count and 1703936/82501959 memory resources: timed out, dumping permit diagnostics:
permits  count  memory  table/operation/state
19       19     2M      keyspace.table_inference/data-query/active/await
1        1      16K     keyspace.table_inference/data-query/active/need_cpu
61       0      0B      keyspace.table_inference/data-query/waiting_for_admission
81       20     2M      total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 45214
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 45234
reads_enqueued_for_admission: 626
reads_enqueued_for_memory: 0
reads_admitted_immediately: 44669
reads_queued_because_ready_list: 504
reads_queued_because_need_cpu_permits: 122
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 45295
current_permits: 81
need_cpu_permits: 20
awaits_permits: 19
disk_reads: 20
sstables_read: 20
Feb 03 20:32:46 p-chn-scylla001-dc1 scylla[2866797]: [shard 28:stat] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 10/100 count and 819200/82501959 memory resources: timed out, dumping permit diagnostics:
permits  count  memory  table/operation/state
9        9      784K    keyspace.table_inference/data-query/active/await
1        1      16K     keyspace.table_inference/data-query/active/need_cpu
1        0      0B      keyspace2.table_2/data-query/waiting_for_admission
95       0      0B      keyspace.table_inference/data-query/waiting_for_admission
106      10     800K    total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 44684
total_failed_reads: 0
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 44694
reads_enqueued_for_admission: 633
reads_enqueued_for_memory: 0
reads_admitted_immediately: 44157
reads_queued_because_ready_list: 362
reads_queued_because_need_cpu_permits: 271
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 0
reads_queued_with_eviction: 0
total_permits: 44790
current_permits: 106
need_cpu_permits: 10
awaits_permits: 9
disk_reads: 10
sstables_read: 10
[Grafana Metric]
[Follow-up Question]
When managing ScyllaDB, would decreasing range_request_timeout_in_ms or read_request_timeout_in_ms help prevent full scan queries from impacting other users?
@Botond_Dénes: Full scans impact other queries because they generate a lot of work for the cluster, increasing the load on the nodes.
Decreasing timeouts is counter-productive and just leads to thrown-away work.
@Hyunwoo_Kim: Thank you for answering, Dénes!
So… from an operational perspective, how can we prevent this kind of failure?
I think it’s not enough to just tell users that count queries are not appropriate.
Or should we add another API layer for reading and writing data to ScyllaDB?
Ideally, only the count (full scan) query would fail, and other queries would be unaffected by it.
@Botond_Dénes: You can achieve that, but you need to use workload prioritization, which is available in enterprise or the upcoming source-available release.
With workload prioritization you can isolate workloads from each other, so a count() query running in one priority group will not hurt other queries in another priority group.
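For illustration, here is a minimal sketch of what that could look like in CQL. The service level names, role names, and share values below are hypothetical, and the exact syntax may vary between releases, so check the workload prioritization docs for your version:

-- Hypothetical service levels: a high-share one for latency-sensitive
-- queries and a low-share one for analytics/full scans.
CREATE SERVICE_LEVEL IF NOT EXISTS interactive WITH SHARES = 1000;
CREATE SERVICE_LEVEL IF NOT EXISTS analytics WITH SHARES = 100;

-- Attach each service level to the role the corresponding workload
-- authenticates with (roles here are hypothetical).
ATTACH SERVICE_LEVEL interactive TO app_user;
ATTACH SERVICE_LEVEL analytics TO batch_user;

-- Verify which roles are attached to which service levels.
LIST ALL ATTACHED SERVICE_LEVELS;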
@Hyunwoo_Kim: Oh I see… Too bad we’re using version 5.4 
@Robert: in 5.4.x that feature also exists 
[cqlsh 6.0.19.dev2+g9d49b38 | Scylla 5.4.9-0.20240703.fdcbbb85adcd | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
....
cassandra@cqlsh> select * from system_auth.role_attributes
... ;
 role   | name          | value
--------+---------------+-----------
 aaaaaa | service_level | api
 bbbbbb | service_level | api
 cccccc | service_level | analytics
@Botond_Dénes: No, the system table schema was adjusted to be compatible with that of Enterprise, but the feature is not implemented in open-source ScyllaDB.
@avi: Once you migrate to 2025.1, you can isolate the parallel scan and give it fewer resources.
The table will have an additional column, shares, denoting how many resources to allocate to these queries.
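Continuing the hypothetical sketch from above (again, the service level name and share value are illustrative, not prescriptive):

-- Hypothetical: after upgrading, squeeze the scan workload further by
-- lowering its shares.
ALTER SERVICE_LEVEL analytics WITH SHARES = 50;

-- The listing should then show the shares value for each service level.
LIST ALL SERVICE_LEVELS;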