Originally from the User Slack
@Igor_Q: hello, everyone
i have a running scylla cluster storing some features in a table (id, field, data, updated_at) with primary key (id, field)
this is great for saving realtime updates by (id, field) and querying the whole wide row by (id)
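For reference, a minimal sketch of that table and the two access patterns, using the Python driver; the keyspace/table names, contact point, and sample values are assumptions rather than details from the thread:

```python
# Minimal sketch of the schema and access patterns described above.
# Keyspace/table names, contact point and sample values are illustrative.
from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': 1}")  # local test only
session.execute("""
    CREATE TABLE IF NOT EXISTS ks.features (
        id         text,
        field      text,
        data       blob,
        updated_at timestamp,
        PRIMARY KEY (id, field)
    )""")

# realtime upsert of a single (id, field) cell
upsert = session.prepare(
    "INSERT INTO ks.features (id, field, data, updated_at) VALUES (?, ?, ?, ?)")
session.execute(upsert, ("user-42", "clicks_1h", b"\x01", datetime.utcnow()))

# read the whole wide row for one id
for row in session.execute(
        "SELECT field, data, updated_at FROM ks.features WHERE id = %s", ("user-42",)):
    print(row.field, row.updated_at)
```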
now what i want to do is to also upload batch data (daily aggregates). upserting by (id, field) works fine here too, but the problem is that these aggregates are meant to be fully refreshed with each upload, so i need some way to expire stale records from the table
i have considered the following solutions:
• implement a full scan as described in https://www.scylladb.com/2017/03/28/parallel-efficient-full-table-scan-scylla/ with BYPASS CACHE (sketched after @avi's reply below). this could expire records from all aggregate uploads by (field, updated_at) and also collect some data metrics at the same time. but i am concerned about the performance penalty; there is probably a non-negligible limit on how often i can run full scans
• use a secondary index to filter by field, but that still essentially runs a full scan over all records with a matching field (see the first sketch after this list)
• set a short TTL (3-6 hours) on records updated from aggregates and constantly re-upload them to refresh the TTL (see the second sketch after this list). the downside is that if the constant re-uploading breaks, we lose the data from scylla
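A minimal sketch of the secondary-index option; the index name, field value, and cutoff are assumptions. Note that the indexed query still has to visit every node holding rows for that field, which is why it behaves like a scan:

```python
# Minimal sketch of the secondary-index option. Index name, field value and
# cutoff are illustrative; the query still fans out across the cluster.
from datetime import datetime, timedelta
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("CREATE INDEX IF NOT EXISTS features_field_idx ON ks.features (field)")

stale = session.prepare("SELECT id, field, updated_at FROM ks.features WHERE field = ?")
purge = session.prepare("DELETE FROM ks.features WHERE id = ? AND field = ?")

cutoff = datetime.utcnow() - timedelta(days=1)          # e.g. start of the latest batch
for row in session.execute(stale, ("daily_clicks",)):   # hypothetical aggregate field
    if row.updated_at < cutoff:                         # filter client-side, no ALLOW FILTERING
        session.execute(purge, (row.id, row.field))
```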
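And a minimal sketch of the TTL option: every batch upsert carries a short TTL, so rows that stop being re-uploaded simply expire. The 6-hour TTL and sample values are assumptions:

```python
# Minimal sketch of the TTL option: each batch upsert refreshes a short TTL,
# so rows that stop being re-uploaded expire on their own. Values are illustrative.
from datetime import datetime
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
batch_upsert = session.prepare(
    "INSERT INTO ks.features (id, field, data, updated_at) "
    "VALUES (?, ?, ?, ?) USING TTL 21600")   # 6 hours; pick longer than the upload interval plus slack

session.execute(batch_upsert, ("user-42", "daily_clicks", b"\x2a", datetime.utcnow()))
```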
is there something i am missing? what would you suggest?
@avi: Full scan is a good solution here, especially with workload prioritization pushing it to only use idle time
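A minimal sketch of that full-scan approach, splitting the token ring into ranges (per the blog post linked above) and reading with BYPASS CACHE; the range count, concurrency, field names, and cutoff are assumptions. With ScyllaDB Enterprise workload prioritization, running this under a role attached to a low-shares service level keeps it on idle capacity, as @avi suggests (the exact service-level DDL depends on the version, so it is not shown here):

```python
# Minimal sketch of a parallel token-range full scan (not the blog's exact code)
# that purges aggregate rows older than the latest batch. Range count, concurrency,
# field names and cutoff are illustrative assumptions.
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from cassandra.cluster import Cluster

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1             # murmur3 token ring bounds
N_RANGES = 256                                       # tune to roughly nodes * shards * 3
AGGREGATE_FIELDS = {"daily_clicks", "daily_views"}   # hypothetical aggregate field names
cutoff = datetime.utcnow() - timedelta(days=1)       # e.g. start of the latest batch

session = Cluster(["127.0.0.1"]).connect()
scan = session.prepare(
    "SELECT id, field, updated_at FROM ks.features "
    "WHERE token(id) >= ? AND token(id) <= ? BYPASS CACHE")
purge = session.prepare("DELETE FROM ks.features WHERE id = ? AND field = ?")

def scan_range(lo, hi):
    # stream one token range; delete aggregate rows that predate the latest batch
    for row in session.execute(scan, (lo, hi)):
        if row.field in AGGREGATE_FIELDS and row.updated_at < cutoff:
            session.execute(purge, (row.id, row.field))

step = (MAX_TOKEN - MIN_TOKEN) // N_RANGES
ranges = [(MIN_TOKEN + i * step,
           MAX_TOKEN if i == N_RANGES - 1 else MIN_TOKEN + (i + 1) * step - 1)
          for i in range(N_RANGES)]

with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(scan_range, lo, hi) for lo, hi in ranges]
    for f in futures:
        f.result()   # surface any per-range errors
```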