How to optimally design a data model to select by a date that can't be part of the PK?

Dionysius · February 1, 2023, 3:13pm

My table is:

CREATE TABLE cache_entries(
	created timestamp, 
	expired timestamp, 
	request_key text, 
	response_key text,
	method text,
	uri text,
	response_code int,
	PRIMARY KEY (request_key, created))
WITH CLUSTERING ORDER BY (created DESC);

I find expired rows with the query SELECT ... WHERE expired < ? ALLOW FILTERING, then remove the files linked to the rows found, and finally delete the rows by PK. Because of the file cleanup, I cannot use the TTL feature. Is there a way to avoid ALLOW FILTERING in my case?

Botond_Denes · February 9, 2023, 12:40pm

Whenever you find yourself having to use ALLOW FILTERING, a possible alternative is to create a secondary index on the filtered columns. This is a trade-off: a secondary index speeds up the queries, but it puts extra load on the cluster, because the index has to be kept in sync with the base table and is therefore updated on each write.
If you run this filtering read often, an index might be a good choice. If the read is rare, you can just keep using filtering; there is nothing wrong with that.
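
A minimal sketch of the index option, under some assumptions. As far as I know, a secondary index serves equality lookups on the indexed column, so a range restriction such as expired < ? would still need ALLOW FILTERING even with an index on expired. One way around that is to index a coarse bucket derived from expired and look up one bucket at a time. The expired_day column and the index name below are hypothetical; they are not part of the original table, and the application would have to fill expired_day on every write.

```cql
-- Hypothetical bucket column: the calendar day on which the row expires.
-- The application must set it together with `expired` on each write.
ALTER TABLE cache_entries ADD expired_day date;

-- Secondary index on the bucket column (the index name is an assumption).
CREATE INDEX cache_entries_expired_day_idx ON cache_entries (expired_day);

-- Cleanup job: fetch one past day at a time with an equality restriction,
-- so ALLOW FILTERING is not needed. The date literal is only an example.
SELECT request_key, created, response_key
FROM cache_entries
WHERE expired_day = '2023-01-31';
```

Every row in a bucket for a past day is already expired, so the cleanup job can delete the linked files and then the rows by PK. For the current day it can compare expired on the client side. A day with many expirations produces a large index partition, so the bucket size (day, hour) is worth tuning to the write rate.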
