Installation details
#ScyllaDB version: 5.4
#Cluster size: 3 nodes each in DC1 and DC2
os (RHEL/CentOS/Ubuntu/AWS AMI): CentOS
Hi all — I’m using ScyllaDB to store transactional data and had a few questions around query patterns and cluster impact.
- We need to fetch a 1-day range of data, which can be hundreds of thousands of rows.
- Large partitions are possible but rare: roughly 0.001% of users have around 630k transactions.
- A single query timing out is okay — but we must avoid impacting other services due to one heavy scan.
So far I’m thinking:
- Bucketing the partition key to avoid huge/hot partitions, though the read burden may still be high (rough schema sketch below this list).
- LIMIT doesn't help much if a hot partition is hit anyway.
- Pagination could help, but it's unclear whether it actually prevents cross-service impact (paging sketch below as well).
- Maybe Materialized Views?
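For reference, this is roughly the bucketed layout I have in mind. It's only a sketch: the keyspace/table/column names (`txn`, `transactions_by_user_day`) and the bucket count of 8 are placeholders, not our real schema, and I'm using the Python driver here just to illustrate.

```python
# Sketch of a time-bucketed layout (placeholder names, not our real schema).
# The bucket column splits one user-day across several partitions so no single
# partition has to hold all ~630k rows of a heavy user.
# Assumes the "txn" keyspace already exists.
from datetime import date, datetime
from decimal import Decimal
from uuid import uuid4

from cassandra.cluster import Cluster  # same API in scylla-driver

BUCKETS = 8  # number of partitions one user-day is spread across

session = Cluster(["10.0.0.1"]).connect()

session.execute("""
    CREATE TABLE IF NOT EXISTS txn.transactions_by_user_day (
        user_id   text,
        day       date,
        bucket    smallint,
        txn_time  timestamp,
        txn_id    uuid,
        amount    decimal,
        PRIMARY KEY ((user_id, day, bucket), txn_time, txn_id)
    ) WITH CLUSTERING ORDER BY (txn_time DESC, txn_id ASC)
""")

insert = session.prepare(
    "INSERT INTO txn.transactions_by_user_day "
    "(user_id, day, bucket, txn_time, txn_id, amount) VALUES (?, ?, ?, ?, ?, ?)"
)
txn_id = uuid4()
bucket = txn_id.int % BUCKETS  # deterministic spread across the buckets
session.execute(
    insert,
    ("user-123", date.today(), bucket, datetime.utcnow(), txn_id, Decimal("9.99")),
)
```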
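And the read side I'm picturing looks roughly like this: pull one user-day back bucket by bucket and page by page rather than in a single big request. Again just a sketch against the placeholder schema above; my worry is whether small pages plus a client-side timeout are actually enough to keep one heavy scan from hurting other services.

```python
# Sketch: read one user's day page by page, one bucket at a time.
# fetch_size keeps each round trip small; the client-side timeout makes the
# heavy read give up quickly instead of hanging. Placeholder names as above.
from cassandra.query import SimpleStatement

PAGE_SIZE = 1000  # rows per page; illustrative value


def fetch_user_day(session, user_id, day, buckets=8, request_timeout=10):
    stmt = SimpleStatement(
        "SELECT txn_time, txn_id, amount "
        "FROM txn.transactions_by_user_day "
        "WHERE user_id = %s AND day = %s AND bucket = %s",
        fetch_size=PAGE_SIZE,
    )
    for bucket in range(buckets):
        rows = session.execute(stmt, (user_id, day, bucket), timeout=request_timeout)
        for row in rows:  # iterating transparently fetches the following pages
            yield row
```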
I’d love to hear your experience on:
- How do you monitor or alert for large partition reads in production?
- How do you prevent full-scan queries (e.g., unbounded COUNT) and large-partition reads?
- What strategies do you use to contain the impact of large queries — whether at the app level or through cluster tuning (e.g., read concurrency, timeouts, coordinator limits)?
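For context on the app-level side of that last question, this is the kind of containment I've been toying with: giving the heavy report reads their own driver execution profile with a separate timeout and consistency level, so they never share the defaults used by the latency-sensitive path. The names and numbers are purely illustrative, and I'm not sure whether this plus server-side tuning is the right direction.

```python
# Sketch: isolate heavy "report" reads behind a dedicated execution profile so
# they get their own timeout and consistency level, separate from the default
# profile used by latency-sensitive traffic. All values are illustrative.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

profiles = {
    EXEC_PROFILE_DEFAULT: ExecutionProfile(
        request_timeout=2,  # fast path: fail quickly
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    ),
    "heavy_read": ExecutionProfile(
        request_timeout=10,  # day-range scans get a bit more room, then give up
        consistency_level=ConsistencyLevel.LOCAL_ONE,
    ),
}

cluster = Cluster(["10.0.0.1"], execution_profiles=profiles)
session = cluster.connect()

# The day-range scan goes through the dedicated profile:
rows = session.execute(
    "SELECT txn_time, txn_id, amount FROM txn.transactions_by_user_day "
    "WHERE user_id = %s AND day = %s AND bucket = %s",
    ("user-123", "2024-05-01", 0),
    execution_profile="heavy_read",
)
```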
Has anyone tackled similar patterns or have suggestions from an app-level or cluster tuning perspective?
Thanks a lot!