What is the difference between ScyllaDB’s read path and Cassandra’s read path? When I stress test Cassandra and ScyllaDB, ScyllaDB’s read performance is five times worse than Cassandra’s, using 16 cores and a normal HDD.
I expect better read performance on ScyllaDB compared to Cassandra when using a normal HDD.
Can someone please confirm whether it’s possible to achieve better read performance using a normal HDD?
If so, what changes are required to Scylla’s config? Please guide me!
*Based on a question originally asked on Stack Overflow by sateesh*
There can be various reasons why you are not getting the most out of your Scylla cluster.
The number of concurrent connections from your clients/loaders is not high enough, or you’re not using a sufficient number of loaders. In such cases, some shards will be doing all the work, while others will be mostly idle. You want to keep your parallelism high.
Scylla likes to have a minimum of 2 connections per shard (you can see the number of shards in /etc/scylla.d/cpuset.conf)
What’s the size of your dataset? Are you reading a large number of partitions or just a few? You might be hitting a hot-partition situation.
I strongly recommend reading the following docs that will provide you with more insights:
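To make the parallelism point concrete, here is a rough back-of-envelope sketch. The shard count and per-shard numbers below are illustrative assumptions, not values read from a real cluster; check /etc/scylla.d/cpuset.conf for your actual shard count.

```python
# Back-of-envelope estimate of the minimum client-side concurrency needed
# to keep every Scylla shard busy. All numbers are illustrative assumptions.

def min_client_concurrency(shards, connections_per_shard=2, requests_in_flight=10):
    """Return (connections, total in-flight requests): at least two
    connections per shard, each kept loaded with several requests."""
    connections = shards * connections_per_shard
    in_flight = connections * requests_in_flight
    return connections, in_flight

# Example: a 16-core node running one shard per core.
conns, in_flight = min_client_concurrency(shards=16)
print(conns, in_flight)  # prints: 32 320
```

If your loaders drive fewer in-flight requests than this, some shards sit idle and throughput numbers understate what the cluster can do.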
Both Cassandra and ScyllaDB utilize the same disk storage architecture (LSM). That means they have largely the same disk access patterns, because the algorithms are largely the same. LSM trees were built with the idea in mind that instant in-place updates are not necessary. They consist of immutable data buckets (sstables) that are large contiguous pieces of data on disk. That means less random IO and more sequential IO, for which the HDD works great (not counting the parallelism utilized by modern database implementations).
All the above means that the difference you see is not induced by how those databases use a disk. It must be related to configuration differences and what happens underneath. Maybe ScyllaDB tries to utilize more parallelism or compacts more aggressively. It depends.
To be able to say anything specific, please share your tests, envs, and configurations.
Both databases use an LSM tree, but Scylla adds a thread-per-core architecture on top, and we use O_DIRECT while C* uses the page cache. Scylla also has a sophisticated IO scheduler that makes sure not to overload the disk; to tune it, scylla_setup automatically runs a disk benchmark. Check its output in io.conf.
There are far more things to review, and more information from you is required to look into it. In general, Scylla should perform better in this case as well, but your disk is likely to be the bottleneck in both cases.
Some other responses focused on write performance, but this isn’t what you asked about - you asked about reads.
Uncached read performance on HDDs is bound to be poor in Cassandra and Scylla, because each read from disk requires several seeks on the HDD, and even the best HDD cannot do more than, say, 200 of those seeks per second. Even with a RAID of several of these disks, you will rarely be able to do more than, say, 1000 requests per second. Since a modern multi-core CPU can do orders of magnitude more work than 1000 requests per second demands, in both the Scylla and Cassandra cases you’ll likely see free CPU. So Scylla’s main benefit of using much less CPU per request will not even matter when the disk is the performance bottleneck. In such cases, I would expect Scylla’s and Cassandra’s performance (I am assuming that you’re measuring throughput when you talk about performance?) to be roughly the same.
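The arithmetic above can be written out explicitly. The figures here are the ballpark numbers from the paragraph (~200 seeks/s per HDD, a couple of seeks per uncached read), not measurements of any particular disk:

```python
# Rough model of uncached random-read throughput on HDDs, using the
# ballpark figures from the text. Assumptions: ~200 seeks/s per disk,
# 2 seeks per uncached read, and perfect scaling across a RAID.

def max_reads_per_sec(disks, seeks_per_sec_per_disk=200, seeks_per_read=2):
    """Optimistic upper bound on random-access reads/s for a RAID of HDDs."""
    total_seeks = disks * seeks_per_sec_per_disk
    return total_seeks // seeks_per_read

print(max_reads_per_sec(1))   # single HDD: prints 100
print(max_reads_per_sec(10))  # 10-disk RAID: prints 1000
```

Even the optimistic 10-disk figure is far below what a single modern multi-core CPU can serve, which is why the disk, not the database engine, dominates in this test.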
If still, you’re seeing better throughput from Cassandra than Scylla, several details may explain why, beyond the general client misconfiguration issues raised in other responses:
If you have a small amount of data that can fit in memory, Cassandra’s caching policy is better for your workload. Cassandra uses the OS’s page cache, which reads whole disk pages and may cache multiple items in one read, as well as multiple index entries. Scylla works differently: it has a row cache that caches only the specific data read. Scylla’s caching is better for large volumes of data that do not fit in memory, but much worse when the data can fit in memory, until the entire data set has been cached (after everything is cached, it becomes very efficient again).
On HDDs, the details of compaction are very important for read performance - if in one setup you have more sstables to read, it can increase the number of reads and lower the performance. This can change depending on your compaction configuration or even randomly (depending on when the compaction was run last). You can check if this explains your performance issues by doing a major compaction (“nodetool compact”) on both systems and checking the read performance afterward. You can switch the compaction strategy to LCS to ensure that random-access read performance is better, at the cost of more write work (on HDDs, this can be a worthwhile compromise).
If you are measuring scan performance (reading an entire table) instead of reading individual rows, other issues become relevant: As you may have heard, Scylla subdivides each node into shards (each shard is a single CPU). This is fantastic for CPU-bound work but could be worse for scanning tables that aren’t huge, because each sstable is now smaller, and the amount of contiguous data you can read before needing to seek again is lower.
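A small sketch illustrates the sharding effect on contiguous reads. The table size and sstable count below are invented for the example; `shards=1` is used here to model Cassandra’s unsharded on-disk layout:

```python
# Illustration of how per-CPU sharding shrinks each sstable, and with it
# the contiguous run an HDD can read before seeking again.
# TABLE_BYTES and SSTABLES are made-up example values.

TABLE_BYTES = 10 * 2**30   # a 10 GiB table (assumption)
SSTABLES = 4               # sstables holding the table (assumption)

def contiguous_run_bytes(table_bytes, sstables, shards):
    """Average contiguous data per sstable when the table is split
    evenly across `shards` shards (shards=1 models an unsharded layout)."""
    return table_bytes // shards // sstables

unsharded = contiguous_run_bytes(TABLE_BYTES, SSTABLES, shards=1)
sharded = contiguous_run_bytes(TABLE_BYTES, SSTABLES, shards=16)
print(unsharded // 2**20, "MiB vs", sharded // 2**20, "MiB")  # 2560 MiB vs 160 MiB
```

Shorter contiguous runs mean more seeks per scanned byte on an HDD, which is why sharding can hurt scans of modestly sized tables while remaining a win for CPU-bound point reads.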
I don’t know which of these differences - or something else - is causing the performance of your use case to be lower in Scylla, but please keep in mind that whatever you fix, your performance is always going to be bad with HDDs. With SSDs, we’ve measured in the past more than a million random-access read requests per second on a single node. HDDs cannot come anywhere close. If you really need optimum performance or performance per dollar, SSDs are the way to go.