What table options should be used for fast writes & reads and no deletes

( New user to Scylladb … excuse my naivety )

I am going to use a ScyllaDB cluster (single DC) as a key-value store: the key will be the queue-id and the value will be the job data. Key size is 32 bytes; value size varies, averaging 50 KB with a maximum of 10 MB.

There will be 800 million writes every day, at peaks of 4k writes per second (which might grow).
No batch inserts, all single writes. No updates. Reads will happen exactly once per record.

To avoid any tombstones I will use 1 table per day and drop entire tables after 2 days

These are my doubts:
What table type should I use? I think caching enabled=true?
What is the ideal concurrency I should design my writers and readers for?
If there are no updates or deletes, can I take advantage of that to speed up my reads/writes by setting some config?

You can use a default table.
However, instead of dropping and creating tables, I suggest using Time Window Compaction Strategy (TWCS) with a TTL of X days. Make sure you set the window unit and size in TWCS properly: please don't create more than 12-13 windows within your TTL.
Then tombstones won't be an issue: thanks to the TTL and the windows, Scylla will effectively throw the expired data away.
A warning, though: you must not overwrite old data (outside of the current window). If you do, this efficient removal of old data breaks.
(In that case the default Incremental Compaction Strategy (ICS) with TTL should also work, with either a space amplification goal (SAG) or periodic major compaction.)
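
As a rough illustration of the window rule above, here is a minimal Python sketch that picks a TWCS window size for a given TTL and prints a matching CREATE TABLE statement. The keyspace and table names (jobs_ks, jobs), the column names and types, and the cap of 12 windows are assumptions made for this example; only the 2-day retention comes from the question.

```python
import math


def twcs_window_hours(ttl_hours: int, max_windows: int = 12) -> int:
    """Smallest whole-hour window that keeps the window count within max_windows."""
    return math.ceil(ttl_hours / max_windows)


def create_table_cql(keyspace: str, table: str, ttl_days: int, max_windows: int = 12) -> str:
    """Build a CREATE TABLE statement using TWCS plus a table-level default TTL."""
    ttl_hours = ttl_days * 24
    window = twcs_window_hours(ttl_hours, max_windows)
    ttl_seconds = ttl_hours * 3600
    return (
        f"CREATE TABLE {keyspace}.{table} (\n"
        "    queue_id text PRIMARY KEY,\n"  # hypothetical column names and types
        "    job_data blob\n"
        ") WITH compaction = {\n"
        "    'class': 'TimeWindowCompactionStrategy',\n"
        "    'compaction_window_unit': 'HOURS',\n"
        f"    'compaction_window_size': {window}\n"
        "}" + f" AND default_time_to_live = {ttl_seconds};"
    )


# 2 days of retention with at most 12 windows gives 4-hour windows.
print(create_table_cql("jobs_ks", "jobs", ttl_days=2))
```

With a 2-day TTL this yields 4-hour windows, i.e. 12 windows of live data. Treat the generated statement as a starting point to test, not as a recommended schema.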

I don't know what your latency SLAs will be, but 50 KB to 10 MB is a huge range of payload sizes: rows around 50 KB will be quite fast, but processing a 10 MB payload might bottleneck the CPU (shard).
The limits we suggest keeping to are here: scylladb/config.cc at master · scylladb/scylladb · GitHub. If we assume single-row partitions, then for you the relevant limit is the cell size, which is 1 MB. You will have up to 10 MB, so I'd expect not single-digit ms latencies, but up to 10x more in the worst case.
The above largely depends on how many reads per second you will do, how many CPUs there will be, and how good your PK distribution is.
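
To put the numbers from the question next to each other, here is a back-of-the-envelope sketch in Python. The write count, average and maximum value sizes, retention, and the 1 MB cell limit come from this thread; the replication factor of 3 is an assumption, and compression and compaction overhead are ignored.

```python
WRITES_PER_DAY = 800_000_000
AVG_VALUE_BYTES = 50_000        # 50 KB average payload
MAX_VALUE_BYTES = 10_000_000    # 10 MB worst-case payload
CELL_LIMIT_BYTES = 1_000_000    # 1 MB suggested cell size limit
RETENTION_DAYS = 2
REPLICATION_FACTOR = 3          # assumption, not stated in the question

# Average write rate if the load were spread evenly over the day.
avg_writes_per_sec = WRITES_PER_DAY / 86_400

# Raw data written per day and kept on disk across all replicas.
daily_tb = WRITES_PER_DAY * AVG_VALUE_BYTES / 1e12
retained_tb = daily_tb * RETENTION_DAYS * REPLICATION_FACTOR

# How far the largest payload exceeds the suggested cell size.
oversize_factor = MAX_VALUE_BYTES / CELL_LIMIT_BYTES

print(f"average writes/s:      {avg_writes_per_sec:,.0f}")
print(f"data written per day:  {daily_tb:.0f} TB")
print(f"data retained (RF=3):  {retained_tb:.0f} TB")
print(f"largest payload is {oversize_factor:.0f}x the cell limit")
```

Note that 800 million writes per day averages out to roughly 9.3k writes per second, which is above the quoted 4k peak, so one of those two figures is probably worth rechecking before sizing the cluster.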

Concurrency depends on the data distribution, how many CPUs you will have, and how many reads/writes you will do (ideally with percentiles for that data, since your payload size range is big). Some guidance is in Sizing Up Your ScyllaDB Cluster - ScyllaDB, or check the sizing calculators (take them as guidance, not as a hard rule; cassandra-stress is your best friend here to see how much 1 CPU can handle with your RF). You can also read Maximizing Performance via Concurrency While Minimizing Timeouts in Distributed Databases - ScyllaDB.
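
One common starting point for client concurrency is Little's Law: requests in flight = throughput x latency. The sketch below applies it to the 4k writes/s peak from the question. The latency values (5 ms and 50 ms) and the function name are assumptions for illustration only; measure your real latencies, for example with cassandra-stress, before relying on the result.

```python
import math


def required_in_flight(throughput_per_sec: int, latency_ms: int) -> int:
    """Little's Law: average number of concurrent requests needed to sustain a throughput."""
    return math.ceil(throughput_per_sec * latency_ms / 1000)


PEAK_WRITES_PER_SEC = 4000  # from the question

# Assumed latencies: 5 ms for small rows, 50 ms for the worst case discussed above.
for latency_ms in (5, 50):
    in_flight = required_in_flight(PEAK_WRITES_PER_SEC, latency_ms)
    print(f"{latency_ms} ms latency -> about {in_flight} requests in flight")
```

This gives the total concurrency across all writers; divide it by the number of client processes to get a per-client figure, and leave headroom rather than running at the exact computed value.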

If there are no updates and deletes, then this is a perfect TTL + TWCS situation, assuming you really want to drop all your data older than 2 days. And that is basically your best tuning - Compaction | ScyllaDB Docs
(but do check other strategies, too)

hth
L
