[RELEASE] ScyllaDB 6.1

tzach · August 14, 2024, 3:02pm

The ScyllaDB team is pleased to announce ScyllaDB Open Source 6.1.0, a production-ready minor release.

ScyllaDB 6.1 includes many improvements in functionality, stability, UX and performance, in particular for the new Tablets feature introduced in ScyllaDB 6.0.

Only the latest two minor releases of the ScyllaDB Open Source project are supported. With this release, only ScyllaDB Open Source 6.0 and 6.1 are supported. Users running earlier releases are encouraged to upgrade to one of these two releases.

Alternator

Alternator now yields when generating large JSON responses to avoid inducing latency for concurrent requests. #18806
Alternator runs TTL expiration as an internal background process. This expiration process is run in the maintenance scheduling group to prevent it from dominating the user workload; however, this isolation was broken when making calls to replica nodes. TTL expiration is now properly isolated in the maintenance group. #18719
API: A bug in which the Alternator “/localnodes” request returned nodes that are still joining was fixed. #19694

Stability

The memory allowance for bloom filters has been increased from 10% to 20%, recognizing that some workloads need more memory for bloom filters. The previous default of components_memory_reclaim_threshold (0.1) was too strict and has been raised to 0.2. #18607
Repair: The primary replica algorithm for tablets has been adjusted so that nodetool repair -pr balances repair work more evenly across the cluster. #17752
MV: Materialized View flow control is based on adjusting the write rate based on backlog. The backlog calculations are now more accurate. #18783
The task manager provides observability to the operator about internal operations. It is now careful to conserve memory by not keeping state for completed tasks. #18735
Hints were recently changed to be stored per host ID rather than per IP address. However, hints can still be stored with IP-based directories from older clusters. A bug draining those hints when a node leaves the cluster was fixed. #18761
The schema and topology coordinator now runs in the gossip group. Previously it ran in the streaming scheduling group. This prevents operations like repair and streaming from competing with metadata management. #18863
A write back-pressure management bug that could trigger an out-of-memory condition was fixed. #17476 #1834
MV: Materialized views perform flow control by measuring a backlog and keeping it under control. The propagation of backlog to all shards in a node has been fixed. #19232
Batches (as generated by the BATCH statement) are held in the system.batchlog table. As this table can accumulate a lot of tombstones, we now take steps to ensure these tombstones are purged eagerly. This is important for repair, which replays the batch log. #19376
Adding nodes in parallel could fail. The root cause, a node forgetting its own IP address, was fixed. #19523
In ScyllaDB, writes keep a server memory footprint even after an acknowledgement is returned to the client, in order to track writes to replicas beyond the consistency level requirement. In one case, with CL=ANY and all the target replicas DOWN, writes were kept even after all replicas acknowledged, increasing server memory load and possibly preventing topology changes from making progress. This scenario can arise internally when writing to a materialized view. We now clean up the internal structures immediately. #19529
In rare cases, ScyllaDB might crash while inserting a mutation into memtable or cache. This is now fixed. #19552
A crash in the REST API call to get the Raft group 0 leader was fixed. #19714
A crash in certain cases of schema changes while an sstable was being written was fixed. #16065
Lightweight transactions (LWT) on the same partition are serialized on the coordinator, since this generates less wasted work when Paxos transactions contend. A bug in this serialization, triggered by a timeout while waiting to acquire the lock, was fixed. #19699
LWT: A bug in which, in rare cases, the coordinator ran multiple transactions on the same key in parallel was fixed. #19698
A malformed_sstable_exception caused by reloading bloom filters from unlinked sstables was fixed. #19722
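
To illustrate the LWT serialization items above, here is a minimal Python sketch of per-partition locking with a timeout. It is not ScyllaDB's implementation, and the release notes do not describe the actual bug; `PartitionLocks`, `hold`, and `run_lwt` are hypothetical names. The point is only the general pattern: a timed-out acquisition must fail cleanly and must never release a lock it does not hold.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager


class PartitionLocks:
    """Hypothetical per-partition locks held by a coordinator (illustration only)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    @contextmanager
    def hold(self, partition_key, timeout):
        # Look up (or create) the lock for this partition under a guard lock.
        with self._guard:
            lock = self._locks[partition_key]
        # acquire() returns False if the timeout expires before the lock is free.
        if not lock.acquire(timeout=timeout):
            raise TimeoutError(f"timed out waiting for partition {partition_key!r}")
        try:
            yield
        finally:
            # Reached only when the lock was actually acquired.
            lock.release()


def run_lwt(locks, partition_key, transaction, timeout=1.0):
    """Run `transaction` while holding the lock for its partition."""
    with locks.hold(partition_key, timeout):
        return transaction()
```

With this shape, two transactions on the same partition run one after the other, while transactions on different partitions do not block each other.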

Bloom Filter

Bloom filters whose size was over-estimated are now reduced in size. #19049
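
As a rough illustration of why an over-estimate matters, the sketch below uses the textbook sizing formula for a classic Bloom filter, m = -n ln(p) / (ln 2)^2 bits. ScyllaDB's own sizing logic may differ; the key counts and false-positive rate are hypothetical values chosen only for the example.

```python
import math


def bloom_filter_bits(num_keys, false_positive_rate):
    """Bit count for a classic Bloom filter: m = -n * ln(p) / (ln 2)^2."""
    return math.ceil(-num_keys * math.log(false_positive_rate) / math.log(2) ** 2)


estimated_keys = 10_000_000  # hypothetical up-front estimate
actual_keys = 1_000_000      # hypothetical count known once the data is written
fp_rate = 0.01               # hypothetical target false-positive rate

oversized = bloom_filter_bits(estimated_keys, fp_rate)
right_sized = bloom_filter_bits(actual_keys, fp_rate)
print(f"sized for estimate: {oversized / 8 / 2**20:.1f} MiB")
print(f"sized for actual:   {right_sized / 8 / 2**20:.1f} MiB")
```

Because the size grows linearly with the key count, a tenfold over-estimate costs roughly ten times the memory for the same false-positive rate.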

Tools

CQLSH: The bundled cqlsh version was updated to 6.0.20. #18990
CQLSH: The bundled cqlsh package now includes a Python driver that is compiled for the target architecture, making it faster. #19385
The scylla sstable command can now recover the schema from the SStable itself, if it is new enough (ma format or later). #17869 #18809
There is now a REST API for triggering a Raft group-0 read barrier. This is useful for making sure all nodes have caught up with the Raft leader and see the latest schema and topology, for example when taking a snapshot of the schema. #19213

Tracing

Tracing of speculative retries is improved. #19520

Monitoring

Scylla Monitoring Stack release 4.8 and later supports ScyllaDB 6.1.

See metrics update between 6.0 and 6.1 here, as well as the new, beta, metrics reference here.

More monitoring related updates:

There are now metrics keeping track of incoming hints, in addition to the existing metrics for outgoing hints. #10987
A regression in the Lightweight Transaction (LWT) contention metric has been fixed. The regression would show contention increasing even when none was happening. While it’s just a metric, it’s one of the more important ones for LWT users. #19625

Performance

Internal access to the roles table was optimized #19299
Off-strategy compaction is used to make SStables conform to the compaction strategy after an operation such as repair. Off-strategy compaction for TWCS will now have less storage space overhead. #16514
Statements such as SELECT count(*) use an internal map-reduce service to parallelize the query. ScyllaDB no longer does so for single-partition queries as they don’t benefit from it. #19349
When a node is started, it will now make a best-effort attempt to notify other nodes that it is up. This speeds up rolling restart, as we don’t have to wait for nodes to notice the node is up via pings.
If the CPU is busy processing a query, ScyllaDB will let that query complete before starting another one, since several queries using the CPU concurrently on a shard will make all of them slower. There is now an option to allow a higher CPU concurrency (for example, 3) for workloads where this helps. #19017
A write to a new replica of a base table will no longer update any materialized views while the replica is still joining, since that work will be canceled later (in some cases) or be unnecessary (in others). #19152
Materialized views calculate a backlog to decide on whether throttling of base table writes is required. This calculation is now more accurate #18542
Changes to service levels are now reflected immediately after the change, rather than via a polling loop with a cycle time of 10 seconds. #18060
A lock over the node tablet replica map was removed, as it was causing topology changes to be delayed. #18821
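
The SELECT count(*) change above can be pictured with a small sketch. This is not ScyllaDB's map-reduce service; `count_rows` and `select_count` are hypothetical names, and partitions are modeled as plain Python lists. It only shows why fanning out is skipped when a single partition is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def count_rows(partition):
    """Map step: count the rows of one partition."""
    return len(partition)


def select_count(partitions):
    """Count rows, fanning out only when more than one partition is involved."""
    if len(partitions) == 1:
        # Single partition: nothing to parallelize, so skip the fan-out overhead.
        return count_rows(partitions[0])
    with ThreadPoolExecutor() as pool:
        # Map per-partition counts in parallel, then reduce by summing.
        return sum(pool.map(count_rows, partitions))
```

For example, `select_count([[1, 2, 3]])` takes the direct path, while `select_count([[1], [2, 3], []])` maps over three partitions and sums the results.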

Tablets

The tablet load balancer now uses randomization to select candidates for tablet replica migration. This prevents tablets for a particular table from clustering in some nodes or shards, which can cause CPU imbalance. #18885
Repair and tablet migration are now serialized. #17658, #18561.
MV: A bug when updating materialized views on a newly migrated tablet, which could cause ScyllaDB to crash after cleanup, decommission, or repair, was fixed. #19052 #19033
ALTER KEYSPACE will now refuse to switch a keyspace from tablets to vnodes. This was never supported, but it is now explicitly blocked.
Keyspaces with tablets enabled will now reject tables with counter columns, as counters aren’t yet supported with tablets. #19449
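
The load balancer item above says only that randomization is now used to select migration candidates. The sketch below is an assumption about what that can look like in the simplest case, with hypothetical names and data: picking uniformly at random instead of always taking the first candidate, so migrations are spread across tables.

```python
import random
from collections import Counter


def pick_tablet_to_migrate(candidates, rng):
    """Pick a migration candidate uniformly at random rather than always the first."""
    return rng.choice(candidates)


# Hypothetical candidates: (table, tablet id) pairs on an overloaded shard.
candidates = [("orders", i) for i in range(4)] + [("users", i) for i in range(4)]
rng = random.Random(42)  # seeded only to make the sketch reproducible

# Count which table each of 1000 simulated picks came from.
picks = Counter(pick_tablet_to_migrate(candidates, rng)[0] for _ in range(1000))
print(picks)
```

A deterministic "first candidate" rule would pick only "orders" tablets here; the random rule draws from both tables.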

CQL

Schema-modifying statements (DDL) and authentication were moved to rely on Raft separately. A DDL statement that grants permissions (for example, CREATE TABLE) will now execute in a single transaction. This prevents failures from leaving only part of the operation committed. #17738
A DDL statement CREATE … IF NOT EXISTS will now tell the driver a schema change occurred even if it did not make any changes (because the table or keyspace already existed). This helps tools like cassandra-stress that create keyspaces from multiple processes; before the change such a tool could miss the keyspace creation. #16909
A bug in DESCRIBE SCHEMA when describing indexes on collection columns was fixed. #19278

Configuration

maintenance_reader_concurrency_semaphore_count_limit - sets the number of concurrent reads allowed for maintenance operations (e.g. repair). This helps repair in some situations where different nodes have different shard counts. #19248
reader_concurrency_semaphore_cpu_concurrency - admits new reads while there are fewer than this number of requests that need CPU. #19017
enable_tombstone_gc_for_streaming_and_repair - if the compacting reader is enabled for streaming and repair (see enable_compacting_data_for_streaming_and_repair), allows it to garbage-collect tombstones. This can reduce the amount of data repair has to process. #19015
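
The admission rule described for reader_concurrency_semaphore_cpu_concurrency can be sketched as a toy model. This is not ScyllaDB's reader concurrency semaphore; the class and method names are hypothetical, and the default of 1 is an assumption chosen to mirror the "let that query complete before starting another one" behavior described under Performance.

```python
class CpuAdmission:
    """Toy admission rule: admit a read only while fewer than `cpu_concurrency`
    admitted reads currently need CPU."""

    def __init__(self, cpu_concurrency=1):  # default of 1 is an assumption
        self.cpu_concurrency = cpu_concurrency
        self.need_cpu = 0

    def try_admit(self):
        """Return True and count the read if it may start now, else False."""
        if self.need_cpu < self.cpu_concurrency:
            self.need_cpu += 1
            return True
        return False  # a real system would queue the read instead

    def done_with_cpu(self):
        """Called when an admitted read finishes or no longer needs CPU."""
        self.need_cpu -= 1
```

With `CpuAdmission(3)`, three reads that need CPU are admitted together and a fourth waits until one of them calls `done_with_cpu()`.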

Build

The build toolchain is now based on Fedora 40; this moves the compiler from clang 16 to clang 18. #19205
Due to compiler immaturity, we previously had to restrict optimization on the aarch64 platform. As the compiler bugs have been fixed, these restrictions are now removed. #19531
The source language was updated from C++20 to C++23. #19528
The compiler toolchain used to build ScyllaDB is now itself optimized using profile-guided optimization, resulting in faster builds. #19685

Deprecated and removed features

Thrift API (disabled by default for years) #18453
Debian 10 support (EOL)