[RELEASE] ScyllaDB 6.1

The ScyllaDB team is pleased to announce ScyllaDB Open Source 6.1.0, a production-ready minor release.

ScyllaDB 6.1 includes many improvements in functionality, stability, UX and performance, in particular for the new Tablets features, introduced in ScyllaDB 6.0.

Only the latest two minor releases of the ScyllaDB Open Source project are supported. With this release, only ScyllaDB Open Source 6.0 and 6.1 are supported. Users running earlier releases are encouraged to upgrade to one of these two releases.

Related Links

Alternator

  • Alternator now yields when generating large JSON responses to avoid inducing latency for concurrent requests. #18806
  • Alternator runs TTL expiration as an internal background process. This expiration process is run in the maintenance scheduling group to prevent it from dominating the user workload; however, this isolation was broken when making calls to replica nodes. TTL expiration is now properly isolated in the maintenance group. #18719
  • API: A bug where the Alternator “/localnodes” request returned nodes which are still joining was fixed. #19694

Stability

  • The memory allowance for bloom filters has been increased from 10% to 20%, recognizing that some workloads need more memory for bloom filters. The previous default value of components_memory_reclaim_threshold was too strict, so it was updated from 0.1 to 0.2. #18607
  • Repair: The primary replica algorithm for tablets has been adjusted so that nodetool repair -pr balances repair work more evenly across the cluster. #17752
  • MV: Materialized View flow control is based on adjusting the write rate based on backlog. The backlog calculations are now more accurate. #18783
  • The task manager provides observability to the operator about internal operations. It is now careful to conserve memory by not keeping state for completed tasks. #18735
  • Hints were recently changed to be stored per host ID rather than per IP address. However, hints can still be stored with IP-based directories from older clusters. A bug draining those hints when a node leaves the cluster was fixed. #18761
  • The schema and topology coordinator now runs in the gossip scheduling group. Previously, it ran in the streaming scheduling group. This prevents operations like repair and streaming from competing with metadata management. #18863
  • A write back-pressure management bug that could trigger an out-of-memory condition was fixed. #17476 #1834
  • MV: Materialized views perform flow control by measuring a backlog and keeping it under control. The propagation of backlog to all shards in a node has been fixed. #19232
  • Batches (as generated by the BATCH statement) are held in the system.batchlog table. As this table can accumulate a lot of tombstones, we now take steps to ensure these tombstones are purged eagerly. This is important for repair, which replays the batch log. #19376
  • A failure to add nodes in parallel was fixed. The root cause was a node forgetting its own IP address. #19523
  • In ScyllaDB, writes have a server memory footprint even after an acknowledgement is returned to the client, in order to track writes to replicas past the consistency level requirement. In one case (CL=ANY with all the target replicas down), writes were kept even after all replicas acknowledged, increasing server memory load and possibly preventing topology changes from making progress. This scenario can be generated internally when writing to a materialized view. We now clean up the internal structures immediately. #19529
  • In rare cases, ScyllaDB might crash while inserting a mutation into memtable or cache. This is now fixed. #19552
  • A crash in the REST API call to get the Raft group 0 leader was fixed. #19714
  • A crash in certain cases of schema changes while an sstable was being written was fixed. #16065
  • Lightweight transactions (LWT) on the same partition are serialized on the coordinator, since this generates less wasted work when Paxos transactions contend. A bug in this serialization, triggered by a timeout while waiting to acquire the lock, was fixed. #19699
  • LWT: In rare cases, the coordinator could run multiple transactions on the same key in parallel. This is now fixed. #19698
  • A malformed_sstable_exception caused by reloading bloom filters from unlinked SSTables was fixed. #19722
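
The per-partition serialization of lightweight transactions described in the list above can be pictured as a keyed lock with a timeout. The following is only an illustrative sketch with hypothetical names (PartitionLocks, run_lwt); it is not ScyllaDB's code and does not reproduce the fixed bugs. It shows the property that matters: a waiter that times out must neither run its transaction nor leave the lock held.

```python
import asyncio
from collections import defaultdict


class PartitionLocks:
    """One asyncio.Lock per partition key (hypothetical helper)."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    async def run_lwt(self, key, txn, timeout):
        lock = self._locks[key]
        try:
            # Wait for the partition lock, but give up after `timeout` seconds.
            await asyncio.wait_for(lock.acquire(), timeout)
        except asyncio.TimeoutError:
            # The lock was never acquired, so there is nothing to release.
            return "timeout"
        try:
            return await txn()
        finally:
            lock.release()


async def main():
    locks = PartitionLocks()
    order = []

    async def slow():
        order.append("slow start")
        await asyncio.sleep(0.2)
        order.append("slow end")
        return "applied"

    async def fast():
        order.append("fast ran")
        return "applied"

    results = await asyncio.gather(
        locks.run_lwt("pk1", slow, timeout=1.0),
        locks.run_lwt("pk1", fast, timeout=0.05),  # same partition: times out
        locks.run_lwt("pk2", fast, timeout=0.05),  # other partition: runs at once
    )
    return results, order


results, order = asyncio.run(main())
print(results)
print(order)
```

In this toy run, the second transaction on "pk1" gives up while the first one still holds the lock, and the transaction on "pk2" is unaffected because it uses a different lock.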

Bloom Filter

  • Bloom filter: The size of over-estimated bloom filters is now reduced. #19049
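
To illustrate why an over-estimated bloom filter wastes memory, here is a minimal sketch using the textbook sizing formula, bits = -n * ln(p) / (ln 2)^2. It is not ScyllaDB's actual sizing code: the function name, the partition counts, and the 1% false-positive rate are assumptions chosen for illustration.

```python
import math


def bloom_filter_bits(num_keys: int, fp_rate: float) -> int:
    """Textbook optimal bloom filter size, in bits, for num_keys at fp_rate."""
    return math.ceil(-num_keys * math.log(fp_rate) / (math.log(2) ** 2))


# Hypothetical numbers: 10M partitions were estimated, but only 1M were written.
estimated_keys = 10_000_000
actual_keys = 1_000_000
fp_rate = 0.01  # assumed target false-positive rate

estimated_bytes = bloom_filter_bits(estimated_keys, fp_rate) // 8
actual_bytes = bloom_filter_bits(actual_keys, fp_rate) // 8

mib = 2 ** 20
print(f"sized for the estimate: {estimated_bytes / mib:.1f} MiB")
print(f"sized for actual keys:  {actual_bytes / mib:.1f} MiB")
print(f"wasted:                 {(estimated_bytes - actual_bytes) / mib:.1f} MiB")
```

Because the filter size grows linearly with the number of keys, a tenfold over-estimate costs roughly ten times the memory, which is why shrinking over-estimated filters helps workloads that are tight on bloom filter memory.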

Tools

  • CQLSH: The bundled cqlsh version was updated to 6.0.20. #18990
  • CQLSH: The bundled cqlsh package now includes a Python driver that is compiled for the target architecture, making it faster. #19385
  • The scylla sstable command can now recover the schema from the SSTable itself, if it is new enough (ma format or later). #17869 #18809
  • There is now a REST API for triggering a Raft group-0 read barrier. This is useful for making sure all nodes have caught up with the Raft leader and see the latest schema and topology, for example when taking a snapshot of the schema. #19213

Tracing

  • Tracing of speculative retries is improved. #19520

Monitoring

Scylla Monitoring Stack release 4.8 and later supports ScyllaDB 6.1.

See metrics update between 6.0 and 6.1 here, as well as the new, beta, metrics reference here.

More monitoring related updates:

  • There are now metrics keeping track of incoming hints, in addition to the existing metrics for outgoing hints. #10987

  • A regression in the Lightweight Transaction (LWT) contention metric has been fixed. The regression would show contention increasing even when none was happening. While it’s just a metric, it’s one of the more important ones for LWT users. #19625

Performance

  • Internal access to the roles table was optimized #19299
  • Off-strategy compaction is used to make SSTables conform to the compaction strategy after an operation such as repair. Off-strategy compaction for TWCS will now have less storage space overhead. #16514
  • Statements such as SELECT count(*) use an internal map-reduce service to parallelize the query. ScyllaDB no longer does so for single-partition queries as they don’t benefit from it. #19349
  • When a node is started, it will now make a best-effort attempt to notify other nodes that it is up. This speeds up rolling restart, as we don’t have to wait for nodes to notice the node is up via pings.
  • If the CPU is busy processing a query, ScyllaDB will let that query complete before starting another one, since several queries using the CPU concurrently on a shard will make all of them slower. There is now an option to allow a higher CPU concurrency (for example, 3) for workloads where this helps. #19017
  • A write to a new replica of base table will no longer update any materialized views while it is still joining, since that work will be canceled later (in some cases) or be unnecessary (in others). #19152
  • Materialized views calculate a backlog to decide whether throttling of base table writes is required. This calculation is now more accurate. #18542
  • Changes to service levels are now reflected immediately after the change, rather than via a polling loop with a cycle time of 10 seconds. #18060
  • A lock over the node tablet replica map was removed, as it was causing topology changes to be delayed. #18821

Tablets

  • The tablet load balancer now uses randomization to select candidates for tablet replica migration. This prevents tablets for a particular table from clustering in some nodes or shards, which can cause CPU imbalance. #18885
  • Repair and tablet migration are now serialized. #17658, #18561.
  • MV: A bug when updating materialized views on a newly-migrated tablet, which could cause ScyllaDB to crash after cleanup, decommission, or repair, was fixed. #19052 #19033
  • ALTER KEYSPACE will now refuse to switch a keyspace from tablets to vnodes. This was not supported before, but is now explicitly blocked.
  • Keyspaces with tablets enabled will now reject tables with counter columns, as counters aren’t yet supported with tablets. #19449

CQL

  • Schema-modifying statements (DDL) and authentication were moved to rely on Raft separately. A DDL statement that grants permissions (for example, CREATE TABLE) will now execute in a single transaction. This prevents failures from leaving only part of the operation committed. #17738
  • A DDL statement CREATE … IF NOT EXISTS will now tell the driver a schema change occurred even if it did not make any changes (because the table or keyspace already existed). This helps tools like cassandra-stress that create keyspaces from multiple processes; before the change such a tool could miss the keyspace creation. #16909
  • A bug in DESCRIBE SCHEMA when describing indexes on collection columns was fixed. #19278

Configuration

  • maintenance_reader_concurrency_semaphore_count_limit - Set the number of concurrent reads allowed for maintenance operations (e.g. repair). This helps repair in some situations where different nodes have different shard counts. #19248
  • reader_concurrency_semaphore_cpu_concurrency - Admit new reads while there are fewer than this number of requests that need CPU. #19017
  • enable_tombstone_gc_for_streaming_and_repair - If the compacting reader is enabled for streaming and repair (see enable_compacting_data_for_streaming_and_repair), allow it to garbage-collect tombstones. This can reduce the amount of data repair has to process. #19015
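
The admission rule described for the reader CPU concurrency option can be pictured with a toy model. This is a hedged sketch, not ScyllaDB's implementation: the ToyReadAdmission class and its methods are hypothetical, and the real semaphore also enforces memory and read-count limits that are omitted here.

```python
from collections import deque


class ToyReadAdmission:
    """Toy model: admit a new read only while fewer than
    cpu_concurrency admitted reads currently need CPU."""

    def __init__(self, cpu_concurrency: int = 1):
        self.cpu_concurrency = cpu_concurrency
        self.need_cpu = 0       # admitted reads that currently need CPU
        self.waiting = deque()  # reads queued for admission

    def submit(self, read_id: str) -> bool:
        """Try to admit a read; queue it if the CPU limit is reached."""
        if self.need_cpu < self.cpu_concurrency:
            self.need_cpu += 1
            return True
        self.waiting.append(read_id)
        return False

    def release_cpu(self) -> list:
        """An admitted read no longer needs CPU (it finished or is waiting
        on I/O). Returns the queued reads that get admitted as a result."""
        self.need_cpu -= 1
        admitted = []
        while self.waiting and self.need_cpu < self.cpu_concurrency:
            admitted.append(self.waiting.popleft())
            self.need_cpu += 1
        return admitted


# With a limit of 1, only the first of three CPU-bound reads is admitted.
strict = ToyReadAdmission(cpu_concurrency=1)
print([strict.submit(r) for r in ("r1", "r2", "r3")])   # [True, False, False]

# With a limit of 3, all three are admitted and share the CPU.
relaxed = ToyReadAdmission(cpu_concurrency=3)
print([relaxed.submit(r) for r in ("r1", "r2", "r3")])  # [True, True, True]
```

A limit of 1 favors per-query latency, since each admitted read gets the CPU to itself; a higher limit trades some of that for concurrency on workloads where this helps.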

Build

  • The build toolchain is now based on Fedora 40; this moves the compiler from clang 16 to clang 18. #19205
  • Due to compiler immaturity, we previously had to restrict optimization on the aarch64 platform. As the compiler bugs have been fixed, these restrictions are now removed. #19531
  • The source language was updated from C++20 to C++23. #19528
  • The compiler toolchain used to build ScyllaDB is now itself optimized using profile-guided optimization, resulting in faster build speeds. #19685

Deprecated and removed features

  • Thrift API (disabled by default for years) #18453
  • Debian 10 support (EOL)