What table options should be used for fast writes and reads with no deletes?

rpap · December 1, 2022, 3:16pm

(New user to ScyllaDB, so excuse my naivety.)

I am going to use a ScyllaDB cluster (single DC) as a key-value store: the key will be the queue-id and the value will be the job data. Keys are 32 bytes; values vary, averaging 50 KB with a maximum of 10 MB.

There will be 800 million writes every day, with peaks of 4k writes per second (which might grow).
No batch inserts; every write is a single insert. No updates. Each record will be read exactly once.

To avoid tombstones entirely, I plan to use one table per day and drop each table after 2 days.
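A minimal sketch of the bookkeeping this one-table-per-day plan implies. The keyspace `queue_ks`, the `jobs_YYYYMMDD` naming scheme, and the rule "drop tables that are 2 or more days old" are hypothetical choices for illustration; the code only builds table names and `DROP TABLE` statements and does not connect to a cluster.

```python
from datetime import date, datetime, timedelta

KEYSPACE = "queue_ks"   # hypothetical keyspace name
PREFIX = "jobs_"        # hypothetical per-day table prefix
RETENTION_DAYS = 2      # per the plan: drop tables after 2 days


def table_for(day: date) -> str:
    """Name of the (hypothetical) table holding jobs written on `day`."""
    return f"{PREFIX}{day:%Y%m%d}"


def day_of(table: str) -> date:
    """Inverse of table_for: recover the day encoded in a table name."""
    return datetime.strptime(table[len(PREFIX):], "%Y%m%d").date()


def tables_to_drop(today: date, existing: list[str]) -> list[str]:
    """Per-day tables whose day is RETENTION_DAYS or more before `today`."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return sorted(t for t in existing if t.startswith(PREFIX) and day_of(t) <= cutoff)


def drop_statements(today: date, existing: list[str]) -> list[str]:
    """CQL a daily maintenance job could run (nothing is executed here)."""
    return [f"DROP TABLE IF EXISTS {KEYSPACE}.{t};" for t in tables_to_drop(today, existing)]


if __name__ == "__main__":
    today = date(2022, 12, 3)
    existing = [table_for(today - timedelta(days=n)) for n in range(4)]
    for stmt in drop_statements(today, existing):
        print(stmt)
```

With the sample dates, the tables for 2022-11-30 and 2022-12-01 are selected for dropping, while the tables for 2022-12-02 and 2022-12-03 are kept.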

These are my questions:
What table type should I use? I think caching enabled=true?
What concurrency should I design my writers and readers for?
If there are no updates or deletes, is there some configuration I can set to speed up my reads and writes?

Lubos_Kosco · January 4, 2023, 10:13am

You can use a default table.
However, instead of dropping and re-creating tables, I suggest using the Time Window Compaction Strategy (TWCS) with a TTL of X days (make sure you then set the window unit in TWCS properly, and please don't create more than 12-13 windows within your TTL).
Then tombstones won't be an issue: Scylla will effectively throw them away (thanks to the TTL and the windows).
One warning, though: this assumes you won't overwrite old data (outside of the current window). If you do, this efficient removal of old data breaks.
(In that case, the default ICS (Incremental Compaction Strategy) with TTL should also work, combined with either SAG (space amplification goal) or periodic major compaction.)
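A minimal sketch of the window arithmetic behind the "no more than 12-13 windows per TTL" advice, assuming the 2-day retention from the question and a cap of 12 windows. It picks the smallest whole-hour window that respects the cap and renders a `CREATE TABLE` statement with `default_time_to_live` and TWCS options. The keyspace, table, and column names and the `blob` column types are hypothetical; the code only builds the CQL string.

```python
import math

TTL_SECONDS = 2 * 24 * 3600   # keep data for 2 days (the thread's retention)
MAX_WINDOWS = 12              # "don't create more than 12-13 windows" within the TTL


def window_hours(ttl_seconds: int, max_windows: int) -> int:
    """Smallest whole-hour TWCS window such that ttl / window <= max_windows."""
    ttl_hours = math.ceil(ttl_seconds / 3600)
    return math.ceil(ttl_hours / max_windows)


def create_table_cql(keyspace: str, table: str, ttl_seconds: int, hours: int) -> str:
    """CREATE TABLE with a default TTL and TWCS compaction (hypothetical schema)."""
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} (\n"
        f"    queue_id blob PRIMARY KEY,\n"
        f"    job_data blob\n"
        f") WITH default_time_to_live = {ttl_seconds}\n"
        f"  AND compaction = {{'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': 'HOURS', "
        f"'compaction_window_size': {hours}}};"
    )


if __name__ == "__main__":
    hours = window_hours(TTL_SECONDS, MAX_WINDOWS)   # 48 h / 12 -> 4 h windows
    print(create_table_cql("queue_ks", "jobs", TTL_SECONDS, hours))
```

For a 48-hour TTL this gives 4-hour windows, so at most 12 windows hold live data at any time. A longer TTL scales the window up accordingly; for example, 7 days gives 14-hour windows.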

I don't know what your latency SLAs will be, but 50 KB-10 MB payloads are a huge range: rows around 50 KB will be quite fast, but processing a 10 MB payload might bottleneck the CPU (shard).
The limits we suggest keeping to are here: scylladb/config.cc at master · scylladb/scylladb · GitHub. If we assume single-row partitions, the relevant limit for you is the cell size, which is 1 MB. You will have 10 MB, so I'd expect latencies not in the single-digit milliseconds but, in the worst case, 10x higher.
That largely depends on how many reads per second you will do, how many CPUs there will be, and how good your partition key distribution is.
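A small sketch of the "10 MB against a 1 MB cell limit, so roughly 10x" reasoning, assuming single-row partitions where each payload is one cell. Treating 1 MB as 1024 × 1024 bytes and the helper names are assumptions for illustration. The second helper reports what fraction of a sample of payload sizes would exceed the guideline.

```python
CELL_LIMIT_BYTES = 1024 * 1024   # 1 MB cell-size guideline (assumed to mean MiB)


def over_limit_factor(payload_bytes: int, limit: int = CELL_LIMIT_BYTES) -> float:
    """How many times larger than the cell-size guideline a single-cell payload is."""
    return payload_bytes / limit


def share_over_limit(sizes: list[int], limit: int = CELL_LIMIT_BYTES) -> float:
    """Fraction of payloads (one cell each) that exceed the guideline."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s > limit) / len(sizes)


if __name__ == "__main__":
    print(over_limit_factor(10 * 1024 * 1024))   # the 10 MB maximum payload
    print(over_limit_factor(50 * 1024))          # the ~50 KB average payload
```

Feeding in a real sample of payload sizes shows how much of the traffic sits in the slow tail. This is the percentile view of the data that the reply below asks for.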

Concurrency depends on the key distribution, how many CPUs you will have, and how many reads/writes you will do (ideally with percentiles for that data, since your payload size range is big). There is some guidance in Sizing Up Your ScyllaDB Cluster - ScyllaDB, or check the sizing calculators (take them as guidance, not as a hard rule; cassandra-stress is your best friend here to see how much one CPU can handle with your RF). You can also read Maximizing Performance via Concurrency While Minimizing Timeouts in Distributed Databases - ScyllaDB.
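A back-of-the-envelope sketch for a first concurrency and volume estimate before measuring with cassandra-stress. It uses Little's law (in-flight requests = throughput × latency), which is a general queueing result rather than something stated in this thread. The 5 ms latency and RF = 3 are hypothetical inputs, and the 50 KB average is taken as 50,000 bytes. Note that 800 million writes per day averages out to about 9.3k writes/s, which is above the 4k/s peak quoted in the question, so one of those two figures is worth re-checking before sizing.

```python
import math

SECONDS_PER_DAY = 24 * 3600


def average_rate(ops_per_day: int) -> float:
    """Mean operations per second implied by a daily total."""
    return ops_per_day / SECONDS_PER_DAY


def in_flight(ops_per_second: float, latency_ms: float) -> int:
    """Little's law: concurrent requests needed to sustain a rate at a given latency."""
    return math.ceil(ops_per_second * latency_ms / 1000)


def daily_bytes(ops_per_day: int, avg_payload_bytes: int, rf: int) -> int:
    """Raw bytes written per day across all replicas (no compression or overhead)."""
    return ops_per_day * avg_payload_bytes * rf


if __name__ == "__main__":
    print(round(average_rate(800_000_000)))   # mean writes/s implied by 800M/day
    print(in_flight(4000, 5))                 # 4k writes/s at an assumed 5 ms latency
    print(daily_bytes(800_000_000, 50_000, 3) / 1e12, "TB/day")  # assumed RF = 3
```

At 4k writes/s and 5 ms per write, about 20 requests need to be in flight. The same arithmetic shows why the 10 MB tail matters: if those writes take ten times longer, the concurrency needed for the same rate grows by the same factor. The daily-volume line gives a rough feel for disk needs over the 2-day retention, before compression.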

If there are no updates or deletes, this is a perfect TTL + TWCS situation, assuming you really want to drop all your data older than 2 days. That is basically your best tuning: Compaction | ScyllaDB Docs
(but do check the other strategies, too)

hth
L
