[RELEASE] ScyllaDB 2025.2.0

The ScyllaDB team is pleased to announce the release of ScyllaDB 2025.2, a production-ready ScyllaDB Short Term Support (STS) Minor Feature Release.

Upon the release of ScyllaDB 2025.2 STS, support for ScyllaDB 2024.2 has officially ended.
ScyllaDB Cloud users currently on the 2024.x release will be contacted to schedule an upgrade.
ScyllaDB Enterprise users utilizing the 2024.x release should reach out to the support team for upgrade assistance.

More information on ScyllaDB’s Long Term Support (LTS) policy is available here.

The 2025.2 release adds new features such as improved storage compression (ZSTD with shared dictionaries) and capacity-aware tablet load balancing, along with many additional improvements.

Relevant links

ScyllaDB Enterprise customers are encouraged to upgrade to ScyllaDB 2025.2, and are welcome to contact our Support Team with questions.

To get the most from ScyllaDB 2025.2, use ScyllaDB Manager 3.5 or later and ScyllaDB Monitoring Stack 4.10 or later.

New features

Capacity-Aware Tablets Load Balancing

Tablets load-balancing is now aware of each node’s capacity. Different nodes can have different ratios between storage size and shard count. This change prevents some nodes from reaching 100% utilization while others have free space. #23079

Storage ZSTD + dictionary compression

New compressor implementations use dictionaries to improve the compression ratio. The dictionaries are shared across SSTables and across all nodes in the cluster. The system automatically generates new dictionaries when it sees a gain in compression ratio.

The new compression is NUMA-aware. Dictionaries are shared across all shards in a node, with one copy per NUMA node, minimizing the performance loss from cross-NUMA-node memory accesses.

Below is a comparison of the storage used for a dataset from Tutorials and Example Datasets | ClickHouse Docs

You can either CREATE or ALTER a table to use the new sstable_compression option:

ALTER TABLE keyspace.table
    WITH compression = {'sstable_compression': 'ZstdWithDictsCompressor'};
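
For table creation, the same compression map can be specified up front. A minimal sketch, assuming an illustrative keyspace.table schema:

CREATE TABLE keyspace.table (
    id uuid PRIMARY KEY,
    payload text
) WITH compression = {'sstable_compression': 'ZstdWithDictsCompressor'};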

Source: Shared-Dictionary Compression for SSTables Docs

#24355 #23590

New Tablets Guardrail: enforce tablets mode for new keyspaces

A new guardrail allows the ScyllaDB admin to enforce tablets-only keyspaces on the cluster.

This guardrail is used in the latest ScyllaDB X Cloud.
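
For context, a keyspace opts in to tablets at creation time. A minimal sketch, assuming an illustrative keyspace name and replication settings (with the guardrail enforced, the expectation is that creating a keyspace without tablets enabled is rejected):

CREATE KEYSPACE my_ks
    WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3}
    AND tablets = {'enabled': true};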

#22273

New Topology Guardrail: prevent snitch updates and DC/rack name changes

Changing the DC or rack of a node that has already been bootstrapped is unsafe for vnodes and not supported for tablets. With the new guardrail, changing the snitch is possible only if it keeps the same DC and rack names.

#23278 #22869

Cluster Level Repair for Tablets

This version includes a new nodetool command, nodetool cluster, for cluster-wide operations.

The first cluster-level command is nodetool cluster repair for Tablets.

The command uses a new admin REST API: /storage_service/tablets/repair.

Unlike nodetool repair, which runs at the node level, cluster-level repair synchronizes all data on all nodes in the cluster (for tablet-based keyspaces only). ScyllaDB Manager automatically uses the proper API.

#22409 #23032

Raft majority loss Recovery

A new procedure for recovering from Raft group 0 majority loss is now available. The procedure is safe to use with tablets. #20657

Use this procedure only as a last resort, when there is no other way to recover a failed node and get a quorum of the nodes running. The full procedure is here.

Raft Voters

The Raft group 0 implementation now limits the number of Raft voters in order to reduce the amount of work needed to reach consensus. Nodes are promoted to voters or demoted to non-voters as needed.

ScyllaDB automatically selects a maximum of 5 voters per cluster, and replaces them dynamically as needed. No configuration is required.

#18793 #23786 #23950 #23588

Security

Audit syslog output was improved to make it machine parseable. #23099

Vector Type

CQL now supports the vector type. A vector is a fixed-size array of another data type, commonly used for AI workloads. Note that nearest-neighbor vector search is not yet supported.
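
A minimal sketch of the new type, assuming an illustrative table that stores three-dimensional embeddings and assuming the literal follows the CQL list form:

CREATE TABLE ks.items (
    id uuid PRIMARY KEY,
    embedding vector<float, 3>  -- fixed-size array of three floats
);

INSERT INTO ks.items (id, embedding) VALUES (uuid(), [0.1, 0.2, 0.3]);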

#19455

Native Backup Enhancement - Experimental

This release includes experimental support for native backup.

Until now, backups to S3 relied on the Scylla Manager Agent (using rclone), managed by the Scylla Manager server. This release adds the infrastructure for a direct connection between ScyllaDB and S3 for backup and restore.

Native backup completes the backup file upload much faster, but in some cases it hurts online request latency. We are working toward making it production ready in an upcoming release.

You can already experiment with native backup by setting up S3 connectivity.

Additional Updates

CQL

  • A materialized view created using AS SELECT * will show as such in the DESCRIBE statement, instead of expanding to the table columns. #21154
  • A new CQL function, set_intersection, is now available to calculate the intersection of two set values (a hedged usage sketch follows this list). #22763
  • The SELECT DISTINCT statement now rejects the redundant PER PARTITION LIMIT clause. #15109
  • The values provided to the LIMIT and PER PARTITION LIMIT CQL clauses are now required to be strictly positive. #23013
  • CQL PER PARTITION LIMIT queries are now rejected if aggregate functions are present. #9879
  • The native CQL transport now supports the metadata ID extension (from protocol version 5) that allows updating row metadata for prepared statements for SELECT * queries (when columns were added or removed) or SELECT udt queries (when the user defined type definition changed). Note a driver that supports the extension is required to make use of this. #20860
  • ScyllaDB now allows longer table names. #4480
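
Regarding set_intersection, a hedged usage sketch, assuming an illustrative table ks.t with an id partition key and two set<int> columns s1 and s2 (check the CQL function docs for the exact signature):

SELECT set_intersection(s1, s2) FROM ks.t WHERE id = 1;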

Alternator

  • Alternator now limits attribute-name lengths. #9169
  • Alternator, ScyllaDB’s implementation of the DynamoDB API, now automatically retries schema table changes that can fail due to contention. #13152

Correctness

  • compacting_reader: a decorated key passed by reference to the compactor was moved; the compactor might later use this moved-from key to obtain tombstone GC information, resulting in incorrect tombstone GC decisions and possibly data resurrection. This is now fixed. #23291

  • Fixed an issue where partitions were (temporarily) missing when combining range scans with SELECT DISTINCT or PER PARTITION LIMIT; this could result in records being omitted from the table if read repair occurred. #20084

  • Materialized view updates perform a read-modify-write operation on the base table. To prevent overload, we queue some of the reads. Previously, this queue had a fixed entry limit beyond which reads would be rejected, which could cause base/view inconsistencies. The queue limit is now removed, and we rely on throttling the base table writes to control its length. #23319

  • A bug which could cause data resurrection in materialized views (but not in the base tables) due to mixing up purge times for regular tombstones and shadowable tombstones was fixed. #23272

Tablets

  • Tablet merges happen when the load balancer wants to reduce the number of tablets in a table. To merge tablets, the load balancer performs a “colocation migration” to move one tablet of the pair to the same node and shard as the other. It will now prefer migrating within the same rack. #22994

  • Added an option for enforcing tablets for new keyspaces (see the new guardrail above). #22273

  • There are now per-table tablet configuration options that control how many tablets are created for a table, allowing you to plan ahead for performance (see the sketch after the option list below). #22090

    • expected_data_size_in_gb (default 0) - provides a hint for the anticipated table size, before replication. ScyllaDB will generate a tablet topology that matches that expectation.
    • min_per_shard_tablet_count (default 10) - ensures that the table workload is well balanced across the whole cluster in a topology-independent way. A higher number of tablet replicas per shard may help balance the table workload more evenly across shards and nodes.
    • min_tablet_count (default 0) - determines the minimum number of tablets to allocate for the table.

    Full docs: Data Definition | ScyllaDB Docs
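
    A hedged sketch of supplying these hints through the per-table tablets option (the table schema is illustrative; check the linked docs for the exact syntax):

    CREATE TABLE ks.events (
        id uuid PRIMARY KEY,
        payload text
    ) WITH tablets = {'expected_data_size_in_gb': 500};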

  • Tables created with tablets now have topology that is better prepared for immediate ingestion.

  • The default number of tablet replicas per shard is now 10, to reduce load imbalance between shards.

  • A global goal for tablet replica count per shard was introduced, and the tablet scheduler now respects it by controlling per-table tablet counts. #21967

  • Tablet repair can now filter by host or datacenter. #22417

  • Repair of one tablet will no longer prevent another tablet from being migrated. #22408

  • storage_service: fixed tablet splits of materialized views. #23335

  • Tablet splits are now finalized earlier. Previously, with a large load-balancing backlog, split finalization could be delayed arbitrarily long, leaving oversized tablets. #21762

  • repair: Topology operations such as tablet migration can now run concurrently with repair of other tablets. #23453

  • Fixed an issue where tablet allocation on table creation overloaded nodes with fewer shards. #23378

  • Fixed an issue where truncating or dropping a table after tablet migration could cause an assertion failure and unexpected exit. #18059

  • When rebuilding a tablet (due to the loss of a node), we will now stream data from just one replica, and use repair to fill in data from the rest. This saves bandwidth and reduces space amplification. #17174

  • There is now a new virtual table that describes load per node, and the tablet monitoring script was updated to make use of it. This is useful for heterogeneous clusters where different nodes have different storage capacity. #23584

  • Improved tablet load distribution is implemented to address situations where a new table is created on an already unevenly balanced cluster. #23631

  • The process for merging tablets (which happens when a table’s size decreases) did not update the row cache about the merged sstables, causing some data to be missed during full scans. This is now fixed. #23313

Deployment

  • Ubuntu 20.04 is deprecated. It reached end of life in May 2025 and will not be supported in follow-up releases.
  • Docker: node_exporter and systemd were removed from the container image.
  • The bundled node_exporter package was updated to version 1.9.0 to fix some vulnerabilities. #22884
  • The container image is now based on Red Hat Universal Base Image 9. This is necessary for OpenShift certification. scylla-pkg#4858

Stability

  • ScyllaDB no longer crashes when creating a table while there is a rack that has no nodes in the NORMAL state. #22625

  • Some configuration parameters can be live-updated on a running server by sending SIGHUP. We now prevent parameters that are not designed to be live-updated from being updated in this manner, as it can cause unpredictable behavior. #5382

  • Error handling while streaming mutations was improved. #20227

  • ScyllaDB will automatically parallelize some aggregation queries, such as SELECT count(*) FROM table. Such parallelized queries are now cancelled if a node is being shut down. #22337

  • The Raft implementation now limits consumption of memory for replication. #14411

  • Handling of TRUNCATE statements while previous TRUNCATE statements are still processing was improved. #22166

  • A race condition between splitting tablets of a table, and a DROP of the same table, was fixed. #21859

  • A case where Raft initialization loads incorrect values from disk was fixed. #21114

  • A race condition between the cleanup operation and snapshot operation was fixed. #23049

  • s3_client: added retries to the Security Token Service / EC2 instance metadata credentials providers. #21933 (see Native Backup above)

  • Node shutdown now cancels draining hints. This reduces problems shutting down a node if the rest of the cluster is not healthy. #21949

  • A bug which prevented column renames from being propagated to materialized views was fixed. #22194

  • The CQL binary protocol server now throttles new connection processing in order to prevent connection storms from overwhelming the server. #22844

  • Since ScyllaDB 6.0, cache and memtable cells between 13 kiB and 128 kiB have been allocated by the standard allocator rather than inside LSA segments, which could result in out-of-memory issues. This has now been addressed. #22941 #22389 #23781

  • Materialized views are now more robust during schema changes. The change removes the possibility of accessing an outdated schema that no longer exists or is incompatible with the view schema.

    #9059 #21292 #22194 #22410

  • Fixed a potential use-after-free in replica/database: memtable_list now saves a reference to memtable_table_shared_data. #23762

  • A possible use-after-free during schema changes related to the sstable_set type was fixed. #22040

  • The memtable_flush_period_in_ms option now works for system tables. #21223

Performance

  • A new configuration option, enable_session_tickets, enables TLS 1.3 session tickets, which reduce TLS handshake cost. This is important when a node restarts, and especially for Alternator, which requires many connections since it uses HTTP for transport. #22928
  • Querying via a secondary index is now careful not to fetch too many rows from the index, as this can cause allocation related stalls. #18536
  • Backup will prioritize sstables that were deleted (usually as the result of compaction), as deleted sstables occupy space in snapshots. #23241
  • ScyllaDB uses a data structure called partition_sstable_set to rapidly find relevant sstables for run-based compaction strategies (Leveled Compaction Strategy and Incremental Compaction Strategy). This data structure is now careful to avoid quadratic space complexity when pushed to extreme situations, which could cause excessive memory use in the past. This is achieved by noting if sstables are actually organized in runs, and if not, using a different data structure. #23634
  • ScyllaDB now manages temporary memory for zstd inter-node compression, in order to reduce allocation stalls. #24160 #24183

Tooling

  • The scylla sstable command, used for inspecting sstables outside ScyllaDB itself, can now run CQL queries against individual sstables. This is useful for debugging problems. #22007

Example:

$ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db

 keyspace_name                 | durable_writes | replication
-------------------------------+----------------+-------------------------------------------------------------------------------------
 system_replicated_keys        |           true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
 system_auth                   |           true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
 system_schema                 |           true | ({class : org.apache.cassandra.locator.LocalStrategy})
 system_distributed            |           true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
 system                        |           true | ({class : org.apache.cassandra.locator.LocalStrategy})
 ks                            |           true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
 system_traces                 |           true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
 system_distributed_everywhere |           true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
  • The scylla sstable command can now access sstables stored on S3 rather than local disk (see Native Backup above). #20535

  • scylla-nodetool: fixed an issue where rapidjson::GenericValue::GetInt() could trigger assertions if an integer value overflowed a 32-bit int. #23394

  • When loading sstables into the database with nodetool, there is now a --skip-cleanup option for users who wish to defer cleanup to a later time. This allows a single cleanup operation to be run for many sstable loads. #24136

  • The nodetool refresh command now supports the --scope option, allowing data to be streamed only to the local node, local rack, or local datacenter. This is useful when restoring from a backup that contains individual sstable sets for each logical scope; its main purpose is to speed up 1:1 restores by reducing the number of copies streamed. #23861

Security

  • Fixed an issue where audit logs had an empty query string for BATCH queries. #23311
  • Encryption at rest (EAR) now uses the Seastar HTTP client, rather than a hand-rolled client, to interact with key management APIs. #22925

Tracing

  • server/transport: fixed an issue where tracing omitted user-supplied timestamps for prepared statements. #23173
  • ScyllaDB warns on large allocations in excess of 1 MB, as they can cause high latency and thrash the cache. The warning threshold is now reduced to 128 kB to flush out smaller violations. #23975
  • The compaction history table now has additional columns for statistics (a query sketch follows this list). #3791
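
You can inspect the new statistics with a regular CQL query against the compaction history table. A minimal sketch (the exact column set varies, so SELECT * is used):

SELECT * FROM system.compaction_history LIMIT 10;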

Monitoring

ScyllaDB Monitoring Stack 4.10 and later support ScyllaDB 2025.2.

  • Basic metrics are now labeled so it is possible to fetch only those metrics. #12246