Is ScyllaDB Right for My Application? ScyllaDB's Sweet Spot

This is a question that I often get:

I’m currently evaluating different databases for my application. What is the sweet spot for ScyllaDB?

I’d say it’s a need for HIGH-volume throughput with HIGH cardinality and LOW tail latency (p95/p99 of 15 ms, or even single-digit milliseconds).
In many cases, our I/O scheduler will do most of the workload prioritization for you.
Application not optimized enough? Reach out to our solution architects for advice on better data modeling.
Want more optimization tips? Make sure to check out Scylla Monitoring.

Have cardinality problems? Check this out.
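To check whether your own workload falls in that latency range, you can compute p95/p99 from measured request latencies. Below is a minimal Python sketch using the nearest-rank percentile method; the sample latencies, the 15 ms default target, and the `meets_tail_target` helper are illustrative assumptions, and in practice you would read these percentiles from your monitoring dashboards rather than compute them by hand.

```python
import math


def percentile(samples_ms, p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_tail_target(samples_ms, target_ms: float = 15.0, p: float = 99.0) -> bool:
    """Hypothetical helper: is the p-th percentile latency within target_ms?"""
    return percentile(samples_ms, p) <= target_ms


if __name__ == "__main__":
    # Illustrative data only: 100 requests, mostly fast, with a few slow outliers.
    latencies = [2.0] * 95 + [8.0] * 4 + [40.0]
    print(percentile(latencies, 95))  # 2.0
    print(percentile(latencies, 99))  # 8.0
    print(meets_tail_target(latencies))  # True: p99 of 8 ms is under 15 ms
```

With only 100 samples, the single 40 ms outlier shows up only at p100 (the maximum), not at p99, which is one reason tail latency should be measured over large sample counts.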


I agree with what Tomer wrote about performance (high throughput, low latency).

I’d also add high availability and big data to the sweet spot:

  • High availability, fault tolerance, and disaster recovery: ScyllaDB is designed to be highly available. Data can be replicated across geographic locations, and there is no single point of failure, so your system stays up and running even when something goes wrong. The system is topology-aware, meaning that you can build redundancy using multiple data centers, and multiple racks within each data center. A good example is Kiwi.com, a popular online travel website running ScyllaDB, and the OVHcloud fire. A fire broke out in a room at the SBG2 data center of OVHcloud, a popular French cloud provider. The fire was contained within hours, but not before causing a lot of damage: it knocked out about 3.6 million websites spread across 464,000 domains. Kiwi.com, however, kept on running, because it had two other data centers that were able to take over.
  • A high volume of data: Some large organizations use ScyllaDB to manage petabytes of information while still getting great performance.
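Multi-data-center redundancy like the kind that kept Kiwi.com running is declared per keyspace in CQL, using the `NetworkTopologyStrategy` replication class with a replica count for each data center. The Python sketch below only builds such a `CREATE KEYSPACE` statement as a string; the keyspace name `travel` and the data center names `dc_eu`, `dc_us`, and `dc_asia` are hypothetical, and in a real cluster the data center names must match the ones your nodes report. You would run the resulting statement with `cqlsh` or a CQL driver session, which is not shown.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that keeps a full copy of the data in every listed DC."""
    if not replicas_per_dc:
        raise ValueError("at least one data center is required")
    if any(rf < 1 for rf in replicas_per_dc.values()):
        raise ValueError("each data center needs at least one replica")
    # Sort DC names so the generated statement is deterministic.
    dc_options = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items()))
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{'class': 'NetworkTopologyStrategy', {dc_options}}}"
    )


# Hypothetical layout: three data centers with three replicas each.
print(create_keyspace_cql("travel", {"dc_eu": 3, "dc_us": 3, "dc_asia": 3}))
```

Because each data center holds its own replicas of every partition in this layout, losing one entire data center still leaves complete copies of the data in the other two.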

There are other reasons that teams choose ScyllaDB:

  • Avoid vendor lock-in: No one wants to be stuck with one provider. If you’re currently using DynamoDB, ScyllaDB is a great alternative through its DynamoDB-compatible API (project Alternator). It supports the same client SDKs, data modeling, queries, and so on, but you can deploy it on-premises, on any public cloud, or on ScyllaDB Cloud. And going back to Tomer’s point about performance, you’ll also get much better performance.
  • Transparent and open source: ScyllaDB is open source, which speeds up development and innovation and improves reliability; as Linus’s Law puts it, given enough eyeballs, all bugs are shallow. It also helps you avoid vendor lock-in (see the previous point) and is cost-effective.
  • Reduced costs: ScyllaDB is cost-effective. For example, DynamoDB can be up to 7x more expensive than ScyllaDB, while ScyllaDB delivers similar or better performance. You can compare the costs of Astra, Amazon Keyspaces, and ScyllaDB Cloud with this cost calculator.
  • Ease of use: ScyllaDB has auto-tuning capabilities. If you’re coming from Apache Cassandra, you can stop worrying about garbage collection and constant JVM tuning, since ScyllaDB is written in C++ and doesn’t run on the JVM.
  • High scalability: Scale horizontally by adding more nodes, with no downtime required. This is valuable if you have many concurrent users and expect to grow.
  • Familiar interfaces: Use either CQL or the DynamoDB API. CQL is similar to SQL, which lowers the barrier to entry for users who are comfortable with relational databases.
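Horizontal scaling works because data is partitioned across nodes by hashing each partition key onto a token ring, so a newly added node takes over only part of the ring. The toy Python sketch below illustrates that idea with plain consistent hashing; it uses MD5 and a fixed number of tokens per node purely for illustration, whereas ScyllaDB uses the Murmur3 partitioner with its own token allocation, and the sketch ignores replication entirely. The `TokenRing` class and node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a 64-bit position on the ring (MD5 is used only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class TokenRing:
    """Toy consistent-hashing ring: each node owns several tokens; no replication."""

    def __init__(self, nodes, tokens_per_node: int = 8):
        self.tokens_per_node = tokens_per_node
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Give the node several tokens spread pseudo-randomly around the ring.
        for i in range(self.tokens_per_node):
            bisect.insort(self._ring, (token(f"{node}#{i}"), node))

    def owner(self, key: str) -> str:
        # A key belongs to the node holding the first token at or after the
        # key's token, wrapping around to the start of the ring.
        if not self._ring:
            raise ValueError("ring has no nodes")
        tokens = [t for t, _ in self._ring]
        idx = bisect.bisect_left(tokens, token(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10_000)]
    ring = TokenRing(["node1", "node2", "node3"])
    before = {k: ring.owner(k) for k in keys}
    ring.add_node("node4")  # scale out by one node
    moved = [k for k in keys if ring.owner(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved")
    print(all(ring.owner(k) == "node4" for k in moved))  # True: keys only move to the new node
```

Only keys whose tokens now fall in the new node’s ranges move; every other key stays where it was, which is what lets a cluster grow without reshuffling all of its data.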

I’d recommend a relational database and not ScyllaDB if:

  • You’re running an application that uses a small amount of data (and you don’t expect it to grow). If one node is enough, don’t use a distributed data system.
  • You require strong ACID (atomicity, consistency, isolation, durability) compliance. While you can get high consistency with ScyllaDB (consistency is tunable per query), it’s nuanced, and Dynamo-style databases make trade-offs against ACID.
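One way to see why consistency in a Dynamo-style database is “nuanced” is the quorum-overlap rule: with replication factor RF, if a read contacts R replicas and a write was acknowledged by W replicas, then R + W > RF guarantees that at least one replica in every read also took part in that write. The Python sketch below checks this for a few CQL consistency levels; it assumes a single data center (multi-DC levels such as LOCAL_QUORUM and EACH_QUORUM are left out), and the helper names are hypothetical. Even when the inequality holds, this is per-operation consistency, not multi-row ACID transactions.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond at a given consistency level (single DC only)."""
    levels = {"ONE": 1, "TWO": 2, "THREE": 3, "QUORUM": rf // 2 + 1, "ALL": rf}
    if level not in levels:
        raise ValueError(f"unsupported consistency level: {level}")
    n = levels[level]
    if n > rf:
        raise ValueError(f"{level} needs {n} replicas but RF is only {rf}")
    return n


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True if every read replica set overlaps every write replica set (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


for w, r in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ALL", "ONE")]:
    print(f"RF=3, write {w}, read {r}: overlap = {read_sees_latest_write(w, r, 3)}")
```

With RF=3, QUORUM means two replicas, so QUORUM writes plus QUORUM reads overlap (2 + 2 > 3), while ONE plus ONE does not (1 + 1 ≤ 3) and can return stale data.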

I’m going to add a few more elements that make for a sweet spot for ScyllaDB:

  • Transactional (OLTP) vs. analytical (OLAP): While ScyllaDB does have workload prioritization for balancing different workloads on the same cluster, and certain kinds of analytics can be run on ScyllaDB, we’re far more focused on transactional/operational workloads. ScyllaDB is a row-oriented data store, not a columnar database designed for analytics.

  • Single-digit millisecond P99 latencies: ScyllaDB is optimized for locally attached NVMe SSDs. While we have a built-in row-based in-memory cache, we’re not a pure-play in-memory database or data grid. So you will find that ScyllaDB is faster than a database backed by network-attached block storage, but not as expensive as a RAM-based system. It works in a “goldilocks” zone that balances performance and price.

  • High throughput: This is a subjective term, but generally ScyllaDB is the database to look at when you scale to tens of thousands, hundreds of thousands, or millions of operations per second (OPS). ScyllaDB uses LSM-tree-based storage with immutable SSTables, so it is optimized for fast writes. And because it has a built-in row-based cache, it is also good for fast reads.

  • Multi-datacenter replication: ScyllaDB automagically handles replication across multiple sites, so if you want to place your data in the regions where your users are, we take care of that distribution for you.
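To give some intuition for why LSM-tree storage favors fast writes, here is a toy Python model: writes only go to an in-memory memtable, which is flushed as an immutable sorted run (standing in for an SSTable) once it fills up, and reads check the memtable first and then the runs from newest to oldest. This is a simplified sketch for illustration only; it leaves out the commit log, compaction, tombstones, Bloom filters, and ScyllaDB’s row cache, and the `ToyLSM` class and its flush threshold are hypothetical.

```python
import bisect


class ToyLSM:
    def __init__(self, memtable_limit: int = 4):
        self.memtable_limit = memtable_limit  # arbitrary flush threshold
        self.memtable = {}
        self.runs = []  # immutable sorted runs of (key, value), oldest first

    def put(self, key: str, value: str) -> None:
        # A write only touches memory; existing runs are never rewritten.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Freeze the memtable as a new sorted run that is never modified again.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key: str):
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            idx = bisect.bisect_left(run, (key,))
            if idx < len(run) and run[idx][0] == key:
                return run[idx][1]
        return None


if __name__ == "__main__":
    db = ToyLSM(memtable_limit=2)
    db.put("a", "1")
    db.put("b", "1")  # memtable full: flushed as the first run
    db.put("a", "2")
    db.put("c", "1")  # second flush: a newer run shadows the old "a"
    print(db.get("a"), db.get("b"), len(db.runs))  # 2 1 2
```

The trade-off shows up on the read side: a lookup may have to check several runs, which is why real LSM engines add compaction, Bloom filters, and caching, and why the built-in row cache matters for read performance.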

Your use case doesn’t need to match all four of these; any one of them can make ScyllaDB a good fit. But the more of these criteria apply to your use case, the more strongly you should consider adding ScyllaDB to your technology shortlist.


This page provides some additional guidance: Fit - ScyllaDB