Installation details
#ScyllaDB version: 6.2.3
#Cluster size: 3 m6g.large-1000 EC2 instances
#os (RHEL/CentOS/Ubuntu/AWS AMI): AWS AMI
I have been doing some load testing of a small Cassandra system (3 m6g.large-600 nodes - Cassandra 4.0.9) for a new data design I am working on. While I have been doing this, we have also been testing the waters with ScyllaDB, so I pointed my load-test tooling at it instead.
The test tool has two parts:
(1) A Python tool that inserts an average of x records for each of y customers. On Cassandra I did this by batching records within the same partition, which seemed to be the fastest method from what we have learnt from Cassandra use (roughly the pattern sketched after this list)
(2) A second Java tool which tries to read all records for each customer at a predetermined rate (e.g. 50 records per second or perhaps 500 records per second)
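For reference, the insert path currently looks roughly like the sketch below. It is a minimal sketch only; the keyspace, table and column names (ks, records, customer_id, timestamp, unique_id, data), contact points and payload sizes are placeholders rather than our real schema.

# Minimal sketch of the same-partition batching approach used against Cassandra.
# All names below are placeholders; contact points and payload sizes are made up.
import uuid
from datetime import datetime, timezone

from cassandra.cluster import Cluster
from cassandra.query import BatchStatement, BatchType

cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect("ks")

insert = session.prepare(
    "INSERT INTO records (customer_id, timestamp, unique_id, data) VALUES (?, ?, ?, ?)"
)

def insert_customer(customer_id, num_records, payload_size=1024):
    # Every row in the batch shares the same partition key (customer_id),
    # so the whole batch lands on a single replica set.
    batch = BatchStatement(batch_type=BatchType.UNLOGGED)
    for _ in range(num_records):
        batch.add(insert, (customer_id, datetime.now(timezone.utc),
                           uuid.uuid4(), b"x" * payload_size))
    session.execute(batch)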
I thought ScyllaDB would smoke it, but at least from what I have seen so far it has been slower. ScyllaDB would often even fail on the bulk inserts (a couple of billion records inserted over several hours), which were fine on Cassandra.
I may not be using ScyllaDB correctly, of course. I have done some research and I think I have found some things I should change. Anything else? Would these make a significant difference to performance?:
(1) Switch to massive concurrent writes instead of batching.
(2) Switch to the ScyllaDB-specific drivers.
(3) Switch to using the shard-aware port (19042).
Is the system potentially too small for a fair comparison?
As per your questions and assumptions, using our drivers and performing concurrent inserts would definitely improve performance.
Lastly, consider testing with larger instances (more vCPUs); that will also make a difference, as ScyllaDB's performance scales with more cores and memory bandwidth. On small nodes, the shard-per-core design gets bottlenecked easily.
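To illustrate, here is a rough sketch of what concurrent prepared inserts through the shard-aware Python driver (pip install scylla-driver) could look like. The keyspace, table and column names are placeholders guessed from your description, not your real schema:

# Rough sketch of recommendations (1) and (2): many single-row prepared inserts
# in flight at once, through the shard-aware driver instead of batches.
# All names, contact points and payload sizes below are placeholders.
import uuid
from datetime import datetime, timezone

from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

# The shard-aware driver negotiates the shard-aware port (19042) automatically
# when the cluster advertises it, so connecting on the default port is fine.
cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect("ks")

insert = session.prepare(
    "INSERT INTO records (customer_id, timestamp, unique_id, data) VALUES (?, ?, ?, ?)"
)

# Keep hundreds of individual inserts in flight; the driver routes each one
# directly to the shard that owns the partition.
args = [
    (customer_id, datetime.now(timezone.utc), uuid.uuid4(), b"x" * 1024)
    for customer_id in range(10_000)
]
results = execute_concurrent_with_args(
    session, insert, args, concurrency=512, raise_on_first_error=False
)
failures = [r for r in results if not r.success]
print(f"{len(failures)} inserts failed out of {len(args)}")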
I am referencing a couple of my colleagues who can give you additional tips:
Hi @gcollins -
A quick question from my end - did you mean 3 nodes of m6g.large?
If so - what do the “1000” and “600” numbers refer to?
As @Gabriel mentioned - ScyllaDB is a real-time, low-latency database, and as such we rely on fast and consistent disk performance; hence ScyllaDB works much better on i4i/i7i machines (instances with local NVMe SSDs).
On top of that - could you please share more about what and how you tested: schema, query patterns, etc.?
And yes - concurrent requests, with a connection per shard, are a better way to utilise ScyllaDB.
A hint - in order to see where the bottleneck is, it is highly recommended to deploy and connect the Scylla Monitoring stack, which will give you full visibility into the cluster.
feel free to ping me if you have more info / more questions
) WITH CLUSTERING ORDER BY (timestamp DESC, unique_id ASC)
AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
We are using leveled compaction as we are optimizing to minimize read latency.
The load testing involves reading all the records for random ids on a configurable number of threads.
We are also varying the number of ids, the number of records per id, and the size of the data blob. Interestingly, at least for Cassandra, fewer records with more data per record appear to do much better for read speed than more records with much less data (with the same total data overall).
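For context, the read side of the test (written in Java in our actual tool) follows roughly the pattern below, sketched in Python with placeholder names for the keyspace, table and columns:

# Sketch of the read workload: pick random customer ids and read all of their
# records at a fixed rate. Names and contact points are placeholders; the real
# tool is Java and multi-threaded.
import random
import time

from cassandra.cluster import Cluster

cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect("ks")

read_all = session.prepare("SELECT * FROM records WHERE customer_id = ?")

def read_at_rate(customer_ids, reads_per_second=50, duration_s=60):
    interval = 1.0 / reads_per_second
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        cid = random.choice(customer_ids)
        rows = list(session.execute(read_all, (cid,)))  # drain all pages for this id
        # Sleep off whatever remains of this request's time slot to hold the rate.
        time.sleep(max(0.0, interval - (time.monotonic() - start)))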
A question - @GuyCarmin: we have systems in production which are, for example, 27-36 nodes (m6g.4xlarge). Very roughly, what sort of reduction in cluster size should I be able to expect from a ScyllaDB system? We currently tend to be limited by CPU rather than I/O. Is ScyllaDB so much more CPU-efficient that it is far more likely to be limited by I/O instead of CPU, hence the recommendation for instances with local SSDs?