ScyllaDB, Tokio, parallel tasks, concurrency, and optimizing the performance

Guy · May 30, 2024, 11:33am

Originally from the User Slack

@Jimmy: Hi there, I saw the presentation on storing telemetry in ScyllaDB (https://www.youtube.com/watch?v=wZ9xc5LnsB80). Storing 80k entries in about a second is very impressive. My question: are 80k tokio tasks really being spawned, each handling one insertion? Doesn't that generate overhead, despite being massively parallel? Would it make sense to batch the insertions a bit, e.g. assigning 100 (or X) entries to one task, or is that not needed at all? Thanks. How do you optimize the calls to the DB? cc: @Felipe_Cardeneti_Mendes

@Felipe_Cardeneti_Mendes: No, there aren’t 80k concurrent tasks. There’s a limit imposed by a semaphore. In any case, it’s been quite a while since I touched that code; happy to see people still find it useful.

To your question, tokio is very lightweight on CPU, so handling several tasks in parallel isn’t typically a problem. What you want to avoid, however, is to end up with high concurrency on the database - where many queries are overwhelming what a single shard can handle.

You could assign a pool of workers and submit requests to them, but the code in question was meant to be easy for users to understand. Batching in this specific case could work, as we are ingesting several rows per device. You shouldn’t be batching across different partitions though, as it is less efficient.
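As a rough illustration of batching within a partition only, the sketch below groups rows by device before they are sent. It assumes the device ID is the partition key, which the thread does not state explicitly; `Reading`, its fields, and `batches_per_partition` are hypothetical names, and submitting each group through the driver is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical telemetry row; `device_id` is ASSUMED to be the partition key.
#[derive(Debug, Clone, PartialEq)]
struct Reading {
    device_id: u32,
    ts: u64,
    value: f64,
}

/// Groups readings so that every returned batch contains rows for exactly one
/// device (one partition) and holds at most `max_rows` rows.
fn batches_per_partition(readings: Vec<Reading>, max_rows: usize) -> Vec<Vec<Reading>> {
    assert!(max_rows > 0, "max_rows must be at least 1");

    // Bucket the rows by partition key first.
    let mut by_device: BTreeMap<u32, Vec<Reading>> = BTreeMap::new();
    for reading in readings {
        by_device.entry(reading.device_id).or_default().push(reading);
    }

    // Split each bucket into chunks; a chunk never mixes devices.
    let mut batches = Vec::new();
    for rows in by_device.into_values() {
        for chunk in rows.chunks(max_rows) {
            batches.push(chunk.to_vec());
        }
    }
    batches
}
```

Each returned group could then be sent as one batch statement, so that a single request only ever touches one partition.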
