Installation details
#ScyllaDB version: 5.4
#Cluster size: 3 nodes each in DC1 and DC2
os (RHEL/CentOS/Ubuntu/AWS AMI): CentOS
Hi all — I’m using ScyllaDB to store transactional data and had a few questions around query patterns and cluster impact.
- We need to fetch a 1-day range of data, which can be hundreds of thousands of rows.
- Large partitions are possible but rare: roughly 0.001% of users have around 630k transactions.
- A single query timing out is okay — but we must avoid impacting other services due to one heavy scan.
So far I’m thinking:
- Bucketing the partition key to avoid huge/hot partitions, though the read burden may still be high (rough schema sketch below this list).
- LIMIT doesn't help much if a hot partition is hit anyway.
- Pagination could help, but it's unclear whether it actually prevents cross-service impact (paging sketch below as well).
- Maybe Materialized Views?
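For reference, this is roughly the bucketed layout I have in mind. It's only a sketch: the keyspace/table/column names (`txn`, `transactions_by_user_day`) and the bucket count of 8 are placeholders, not our real schema, and I'm using the Python driver here just to illustrate.

```python
# Sketch of a time-bucketed layout (placeholder names, not our real schema).
# The bucket column splits one user-day across several partitions so no single
# partition has to hold all ~630k rows of a heavy user.
# Assumes the "txn" keyspace already exists.
from datetime import date, datetime
from decimal import Decimal
from uuid import uuid4

from cassandra.cluster import Cluster  # same API in scylla-driver

BUCKETS = 8  # number of partitions one user-day is spread across

session = Cluster(["10.0.0.1"]).connect()

session.execute("""
    CREATE TABLE IF NOT EXISTS txn.transactions_by_user_day (
        user_id   text,
        day       date,
        bucket    smallint,
        txn_time  timestamp,
        txn_id    uuid,
        amount    decimal,
        PRIMARY KEY ((user_id, day, bucket), txn_time, txn_id)
    ) WITH CLUSTERING ORDER BY (txn_time DESC, txn_id ASC)
""")

insert = session.prepare(
    "INSERT INTO txn.transactions_by_user_day "
    "(user_id, day, bucket, txn_time, txn_id, amount) VALUES (?, ?, ?, ?, ?, ?)"
)
txn_id = uuid4()
bucket = txn_id.int % BUCKETS  # deterministic spread across the buckets
session.execute(
    insert,
    ("user-123", date.today(), bucket, datetime.utcnow(), txn_id, Decimal("9.99")),
)
```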
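And the read side I'm picturing looks roughly like this: pull one user-day back bucket by bucket and page by page rather than in a single big request. Again just a sketch against the placeholder schema above; my worry is whether small pages plus a client-side timeout are actually enough to keep one heavy scan from hurting other services.

```python
# Sketch: read one user's day page by page, one bucket at a time.
# fetch_size keeps each round trip small; the client-side timeout makes the
# heavy read give up quickly instead of hanging. Placeholder names as above.
from cassandra.query import SimpleStatement

PAGE_SIZE = 1000  # rows per page; illustrative value


def fetch_user_day(session, user_id, day, buckets=8, request_timeout=10):
    stmt = SimpleStatement(
        "SELECT txn_time, txn_id, amount "
        "FROM txn.transactions_by_user_day "
        "WHERE user_id = %s AND day = %s AND bucket = %s",
        fetch_size=PAGE_SIZE,
    )
    for bucket in range(buckets):
        rows = session.execute(stmt, (user_id, day, bucket), timeout=request_timeout)
        for row in rows:  # iterating transparently fetches the following pages
            yield row
```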
I’d love to hear your experience on:
- How do you monitor or alert for large partition reads in production?
- How do you prevent full-scan queries (e.g., unbounded COUNT) and large-partition reads?
- What strategies do you use to contain the impact of large queries — whether at the app level or through cluster tuning (e.g., read concurrency, timeouts, coordinator limits)?
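For context on the app-level side of that last question, this is the kind of containment I've been toying with: giving the heavy report reads their own driver execution profile with a separate timeout and consistency level, so they never share the defaults used by the latency-sensitive path. The names and numbers are purely illustrative, and I'm not sure whether this plus server-side tuning is the right direction.

```python
# Sketch: isolate heavy "report" reads behind a dedicated execution profile so
# they get their own timeout and consistency level, separate from the default
# profile used by latency-sensitive traffic. All values are illustrative.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

profiles = {
    EXEC_PROFILE_DEFAULT: ExecutionProfile(
        request_timeout=2,  # fast path: fail quickly
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    ),
    "heavy_read": ExecutionProfile(
        request_timeout=10,  # day-range scans get a bit more room, then give up
        consistency_level=ConsistencyLevel.LOCAL_ONE,
    ),
}

cluster = Cluster(["10.0.0.1"], execution_profiles=profiles)
session = cluster.connect()

# The day-range scan goes through the dedicated profile:
rows = session.execute(
    "SELECT txn_time, txn_id, amount FROM txn.transactions_by_user_day "
    "WHERE user_id = %s AND day = %s AND bucket = %s",
    ("user-123", "2024-05-01", 0),
    execution_profile="heavy_read",
)
```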
Has anyone tackled similar patterns or have suggestions from an app-level or cluster tuning perspective?
Thanks a lot!