Best way to Fetch N rows in ScyllaDB: Count, Limit or Paging

Guy · December 11, 2022, 8:27am

Your client should always use paging - otherwise, you risk adding pressure to the query coordinator, which may introduce latency and memory fragmentation. If you use the Scylla Monitoring stack (and you should if you don’t!), refer to the CQL Optimization dashboard and - more specifically - to the Paged Queries panel.

Now, to your question. It seems to be that your example is a bit minimalist for what you are actually wanting to achieve, and - even then - should it not be, we have to consider such set-up at scale. Eg, There may be a tenant allowed which is allowed to place 3 orders within a day, but another tenant allowed to place 1 million orders within a week?

If the above assumption is correct - and with the options at hand you have given - you are better off using LIMIT with paging. The reason is that there are some particular problems with the description you’ve given at hand:

First, you want to retrieve N amount of records within a particular time frame, but your queries don’t specify such a time-frame
Second, either COUNT or LIMIT will initiate a partition scan, and it is unclear how a hash_id + user_id combination can be done to determine the number of records within a time frame.

Of course, it may be that I am wrong, but I’d like to suggest different approaches which may or may not apply to you and to your use case.

Consider a timestamp component as part of the clustering key. This will allow you to avoid full partition scans with queries such as:

SELECT something FROM orders WHERE hash_id=? AND user_id=? AND ts >= ? AND ts < ?;

If the above is not applicable, perhaps a Counter Table would suffice your needs? You could simply increment a counter after an order is placed, and - after - query the counter table as in:

SELECT count FROM counter_table WHERE hash_id=? AND user_id=? AND date=?;

I hope that helps!

*The answer was provided on Stack Overflow by Felipe Mendes

Topic		Replies	Views
Queries with millions of records and pagination in a distributed database ScyllaDB data-model , performance , drivers	1	855	October 7, 2023
How do I use `query()` safely? ScyllaDB open-source , troubleshooting , rust-driver	1	185	February 1, 2024
ScyllaDB returning a smaller page than expected to maintain low latency, handling of tombstones ScyllaDB performance , drivers , rust-driver , go-driver , paging	0	122	June 9, 2024
Latency issue and data retrieval, page size and number of threads (Java Driver) ScyllaDB data-model , java-driver , paging , latency , threads	0	46	December 23, 2024
Counter Table vs SELECT COUNT for partition row count ScyllaDB	2	313	July 8, 2024

Best way to Fetch N rows in ScyllaDB: Count, Limit or Paging

Related topics