[RELEASE] ScyllaDB 6.2.0

tzach · October 29, 2024, 6:22am

The ScyllaDB team is pleased to announce ScyllaDB Open Source 6.2.0, a production-ready minor release.

ScyllaDB 6.2 introduces many Tablets improvements, new zero-token nodes, Alternator RBAC support, and many bug fixes and stability improvements.

Only the last two minor releases of the ScyllaDB Open Source project are supported. Once ScyllaDB Open Source 6.2 is officially released, only ScyllaDB Open Source 6.2 and ScyllaDB 6.1 will be supported, and ScyllaDB 6.0 will be retired.

High Availability - zero-token node

There is now support for zero-token nodes. Such nodes do not replicate any data, but can participate in query coordination, and in Raft quorum voting.

One can use this to create an Arbiter: a tiebreaker node, with no data, that helps maintain quorum in a symmetrical two-datacenter cluster. If one of the datacenters fails, the Arbiter, deployed in a third datacenter, keeps the quorum alive. Since the Arbiter has zero tokens, it does not replicate user data and does not incur the associated network and storage costs. #15360
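The arithmetic behind the Arbiter can be sketched in a few lines of Python. This is only an illustration of majority voting, assuming every node (including the zero-token one) has one vote. The function name and the 3+3 node layout are made up for the example and are not taken from ScyllaDB.

```python
def has_quorum(alive_voters: int, total_voters: int) -> bool:
    """Return True when a strict majority of voters is alive."""
    return alive_voters > total_voters // 2


# Hypothetical layout: two datacenters with 3 nodes each.
dc1, dc2 = 3, 3

# Without an arbiter, losing either DC leaves exactly half of the voters,
# which is not a majority.
without_arbiter = has_quorum(alive_voters=dc2, total_voters=dc1 + dc2)

# With one zero-token arbiter in a third DC, the surviving DC plus the
# arbiter form a majority (4 of 7).
arbiter = 1
with_arbiter = has_quorum(
    alive_voters=dc2 + arbiter,
    total_voters=dc1 + dc2 + arbiter,
)

print(without_arbiter, with_arbiter)
```

The same check shows why the Arbiter belongs in a third location: placed inside one of the two datacenters, it would be lost together with that datacenter.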

Alternator RBAC

Authorization: Alternator now supports Role-Based Access Control (RBAC) via CQL commands. #5047

Known Install Issues

offline-installer: still uses JMX and gets tested with old nodetool #19185
[centos9] install without setup of selinux fails on coredump configuration check #19325

Both issues are expected to be fixed in a follow-up 6.2.x patch release.

More updates

Tablets

Performance: The tablet load balancer now tries to ensure that not only are tablets distributed evenly among nodes and shards, but that tablets for any particular table are evenly distributed. This prevents a hot table that is unevenly distributed from causing hot nodes or hot shards. #16824
Stability: The tablet allocator will now refrain from allocating tablets on a table created concurrently with decommission #20032
Performance: When tablet metadata (system.tablets table) changes, we now reload only the changed rows. #15294
A bug when ALTERing a keyspace that doesn’t exist, with tablets, was fixed. #19576
Stability: A race condition between tablet repair and tablet split (the latter happens when a table grows) has been fixed. Fixes #19378 #19416.
Decommissioning a node in a multi-DC cluster failed to find a tablet replica for a secondary-index table. #20240
Stability: Schema ALTER (add column) in the middle of ongoing tablet migration causes internal error: “Compaction state for table … not found” #20699
Stability: Too many pending replicas for a table. The fix improves validation of the new RF in an ALTER command #20039
Creating a keyspace with tablets enabled and without NetworkTopologyStrategy succeeds and results in using Vnodes #19743. Tablets only work with NetworkTopologyStrategy.
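To see why the per-table balancing mentioned at the top of this list matters, here is a small Python sketch. It is not the load balancer's algorithm, only a way to measure imbalance. The table names, node names, and placement are hypothetical.

```python
from collections import Counter

# Hypothetical placement: one (table, node) pair per tablet replica.
placement = [
    ("hot", "n1"), ("hot", "n1"), ("hot", "n1"), ("hot", "n1"),
    ("cold", "n2"), ("cold", "n2"), ("cold", "n2"), ("cold", "n2"),
]
nodes = ["n1", "n2"]


def imbalance(counts, keys):
    """Difference between the most and the least loaded key."""
    values = [counts.get(k, 0) for k in keys]
    return max(values) - min(values)


# Total tablets per node, ignoring which table they belong to.
per_node = Counter(node for _table, node in placement)
total_imbalance = imbalance(per_node, nodes)

# Tablets of the "hot" table only, per node.
hot_per_node = Counter(node for table, node in placement if table == "hot")
hot_imbalance = imbalance(hot_per_node, nodes)

print(total_imbalance, hot_imbalance)
```

Both nodes hold the same number of tablets overall, yet every tablet of the hot table sits on n1. A balancer that looks only at the totals would consider this placement even.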

Tracing

ScyllaDB now collects cell level statistics in addition to row and tombstone statistics for result pages. The statistic is exposed in a trace message. #18996

Stability

The sstable primary index reader will now respond to service shutdown requests. This can happen if we’re rebuilding the bloom filter for a large sstable when the service is shut down #19453
A race between table drop and a counter column update was fixed. #19948
Stability: Fatal error during cache update during elasticity test write workload #19873. Root cause is a race between split compaction and tablet migration.
A regression in processing limits for the GROUP BY clause was fixed #17237 #5361 #5362
Raft uses log truncation to limit memory consumption. A mismatch between in-memory log truncation and on-disk log truncation was fixed. #16817 #20080
The topology coordinator and a replacing node stop seeing each other after entering the transition state #19025
A node will now ignore dns name resolution errors of seeds when restarting, as those seed names could be referring to nodes that were removed. #14945
Commitlog is now able to store entries larger than half a commitlog segment. This limitation caused problems with large clusters, as cluster metadata could exceed this limit. Large entries are now fragmented and split over multiple segments. #19472
A bug in computing whether to flush all memtables was fixed. #20301
A memory leak in the Paxos implementation was fixed #20602
Stability: A REST command of task-manager fails with: ({“message”: “seastar::rpc::closed_error (connection is closed)”, “code”: 500}) #20843
Stability: Bootstrap fails with init - Startup failed: std::runtime_error (Failed to obtain IP addresses of nodes that should be seen as alive within 30s) #20600
Stability: coredump during bootstrap when replacing a dead node #20629
Upgrade: Upgrade fails when the number of service levels is close to the maximum. The root cause is that ScyllaDB did not correctly identify internal scheduling groups, for example for maintenance, and tried to create service levels for them. #20070
Stability: raft topology; concurrent removenode requests for the same node can hang #20271
scylla_commitlog_memory_buffer_bytes seems to be growing since 6.0. The root cause was in the metric itself, not in the actual memory usage #20862
Stability: In rare cases, an SSTable clustering key index lookup may yield an incorrect position. This could happen when LSA memory compaction runs in the middle of promoted index entry parsing and moves the cached page, and the entry spans file pages, for an index larger than 64KB. #20766
Stability: time_window_compaction_strategy::get_reshaping_job must limit partial_sort range to multi_window size. The result might be a crash and infinite restart loop #20608
Stability: storage_proxy: make sure there is no end iterator in _live_iterators array #20874
Stability: Segfault during shutdown if join procedure fails (the failure itself is expected by test) in various tests #20701
Stability: Better error handling in Add TLS certificate authenticator, should capture boost::regex_error in auth::certificate_authenticator’s ctor #20941
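The commitlog change in this list (entries larger than half a segment are now fragmented over multiple segments, #19472) can be pictured with a toy Python sketch. It only illustrates the idea of splitting and reassembling an oversized entry and says nothing about ScyllaDB's actual on-disk format. The function names and the tiny sizes are assumptions made for the example.

```python
from typing import List


def fragment(entry: bytes, max_fragment: int) -> List[bytes]:
    """Split an entry into chunks of at most max_fragment bytes."""
    if max_fragment <= 0:
        raise ValueError("max_fragment must be positive")
    return [entry[i:i + max_fragment] for i in range(0, len(entry), max_fragment)]


def reassemble(fragments: List[bytes]) -> bytes:
    """Concatenate fragments back into the original entry."""
    return b"".join(fragments)


segment_size = 32                 # hypothetical, tiny for readability
max_fragment = segment_size // 2  # the old per-entry limit: half a segment
entry = bytes(range(50))          # larger than half a segment

parts = fragment(entry, max_fragment)
print([len(p) for p in parts])
```

Each part fits under the old per-entry limit, so an entry of any size can be stored as long as its fragments can be put back together in order.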

Admin and Tooling

As part of moving to native tooling and away from Java tools, we will deprecate SSTableloader in future versions of ScyllaDB. You can use Load and Stream to upload SSTables directly to ScyllaDB, either from Apache Cassandra or from other ScyllaDB clusters.
We are also deprecating the Java version of nodetool, which was replaced by a compatible native version.
A new REST API, system/highest_supported_sstable_version, returns the sstable format version supported across the cluster #19772
The internal ‘cluster feature’ mechanism now supports suppressing features, enabling simulation of upgrades. This should catch version upgrade problems earlier. #20034
Integrated backup and restore has been merged. New nodetool backup and restore commands (and corresponding REST API endpoints) copy a snapshot to and from an S3-compatible endpoint. This is a work in progress that aims to replace the current external (Manager Agent) backup with backup and restore managed by ScyllaDB core. #19890 #20305
A new nodetool tasks command can be used to view and manage maintenance tasks running on the node. #19201
ScyllaDB will now tune the number of allowed open file descriptors (LimitNOFILE) for very large nodes, reducing the chance of a “Too many files” error. #20443
Tools: Scrub/validate compactions will now verify checksums for uncompressed sstables. #20207
Compaction CLEANUP jobs now run under the maintenance/streaming scheduling group. #20582
Deprecate IP based node operation REST API. Use host IDs instead. #19218
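As a rough illustration of the new system/highest_supported_sstable_version endpoint listed above, the following Python sketch builds the request URL and fetches it with the standard library. Several things here are assumptions rather than facts from this announcement: that the REST API is reachable on localhost port 10000, that the endpoint answers a plain GET, and that the response body is JSON. The helper names are hypothetical.

```python
import json
import urllib.request


def endpoint_url(host: str = "localhost", port: int = 10000) -> str:
    """Build the URL for the endpoint named in the release notes.

    Host and port are assumptions; adjust them to your deployment.
    """
    return f"http://{host}:{port}/system/highest_supported_sstable_version"


def highest_supported_sstable_version(host: str = "localhost",
                                      port: int = 10000,
                                      timeout: float = 5.0):
    """Fetch the value (assumes a GET request and a JSON response body)."""
    with urllib.request.urlopen(endpoint_url(host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(endpoint_url())
```

Calling highest_supported_sstable_version() requires a running node. Check the REST API documentation for the exact method and response shape before relying on this.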

Alternator

Performance: Alternator uses JSON to communicate with the client. Previously, sending very large JSON values was adjusted to avoid stalls. The destruction of these large JSON values now also avoids stalls #19968
Monitoring: Alternator adds metrics for batch latency and size.
Performance: Alternator, ScyllaDB’s implementation of the DynamoDB API, has more efficient reverse queries now, reducing the gap from CQL. #20191
Authentication: Alternator will reject authentication from roles that do not have the LOGIN attribute. #19735
Monitoring: Reconsider BatchGetItemSize and BatchWriteItemSize metrics #20571

Performance

In order to make sstables durable, the directory where they are placed must be flushed after they are sealed. This is now done without re-opening the directory each time, saving some cycles. #19624
Commitlog segments older than 24 hours will now flush corresponding memtables regardless of memory pressure. This allows more timely garbage collection of tombstones. #15971
ScyllaDB tracks internal maintenance work, as well as work requested by the user (for example, repair), as tasks. New virtual tasks allow ScyllaDB to track multi-node operations. #16374
Hinted Handoff writes to local storage and now uses the commitlog scheduling group. #18654
The driver for S3 access is now optimized for throughput #20074. Direct S3 access is experimental in this release.
Reversed queries (WITH CLUSTERING ORDER BY) are already quite efficient in ScyllaDB, yet the internal RPC protocol between nodes was kept unaware of reversed queries in order to maintain compatibility; result sets were un-reversed before sending over the wire, then re-reversed. ScyllaDB now supports an alternate protocol where these wasteful transformations are avoided #12557
When communicating with older versions of ScyllaDB, the server uses a schema digest to see whether there is a schema mismatch or not. This is now less likely to stall when processing large schemas. #18173
Major compaction now supports a new option to only check existing sstables during tombstone garbage collection; this can increase the effectiveness of garbage collection for partitions that are updated frequently. Major compaction should check only the compacted sstables for the purpose of tombstone garbage collection #19728
The heuristics for purging tombstones during compaction were improved, leading to less tombstone accumulation. #20424 #20423
The system may sometimes drop the bloom filter of some sstables to save memory, and then reload it when memory is available. We no longer reload bloom filters for sstables that are queued for deletion. #19722
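The age-based memtable flush described in this list (#15971) comes down to a simple comparison, sketched below in Python. This is not ScyllaDB's implementation. The function name is hypothetical, the 24-hour default and the "0 disables" behavior are taken from the commitlog_max_data_lifetime_in_seconds option in the Config section, and the use of a strict "older than" comparison is an assumption.

```python
# Default named in the Config section: 24 * 60 * 60 seconds (1 day).
COMMITLOG_MAX_DATA_LIFETIME_IN_SECONDS = 24 * 60 * 60


def should_force_flush(segment_created_at: float,
                       now: float,
                       max_lifetime: int = COMMITLOG_MAX_DATA_LIFETIME_IN_SECONDS) -> bool:
    """Decide whether a segment is old enough to force a memtable flush.

    A max_lifetime of 0 disables the age-based flush.
    """
    if max_lifetime == 0:
        return False
    return (now - segment_created_at) > max_lifetime


# A segment created at t=0, checked one second after the 24-hour mark.
print(should_force_flush(segment_created_at=0, now=24 * 60 * 60 + 1))
```

The point of the age check is that it fires even when there is no memory pressure, so tombstones covered by old commitlog data can be garbage collected sooner.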

CQL

Service levels are used to group and classify sessions. Service level names beginning with $ are now reserved. #20122
A CQL filtering bug when a regular column was filtered but no regular columns were selected was fixed. #10357

Materialized view

Performance: Materialized view updates destined to a node that has left the cluster are now dropped. #19439
Performance: When a materialized view’s primary key has the same columns as the base table primary key, we now optimize deletions by deleting an entire partition when possible. #8199
A write to a base table will now be rejected by the coordinator when one or more of the replicas has a full view update backlog. This reduces inconsistencies in materialized views. #17426
The system_distributed.view_build_status was moved to the system keyspace and is now managed by Raft in a strongly consistent way. #15329
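The coordinator-side rejection described in this list (#17426) can be sketched as a simple admission check in Python. The data structure, the names, and the notion of "full" as "current size at or above a maximum" are assumptions made for illustration; the real backlog accounting is more involved.

```python
from typing import Dict


def coordinator_accepts_write(replica_backlogs: Dict[str, int],
                              max_backlog: int) -> bool:
    """Accept a base-table write only if no replica's view update backlog is full.

    replica_backlogs maps a (hypothetical) replica name to its current
    backlog size; a backlog is treated as full when it reaches max_backlog.
    """
    return all(current < max_backlog for current in replica_backlogs.values())


# Hypothetical state: replica r3 has a full backlog.
backlogs = {"r1": 10, "r2": 250, "r3": 1000}
print(coordinator_accepts_write(backlogs, max_backlog=1000))
```

Rejecting the write up front, rather than applying it and dropping view updates later, is what reduces divergence between the base table and its views.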

Packaging

The JMX submodule was removed from the source tree. With nodetool now talking directly to the REST API, it is no longer necessary. JMX is still available as a separate package.

Config

When Service Level parameters, like timeouts, are modified, connections are adjusted in real time. #12923
commitlog_use_fragmented_entries - Whether or not to allow commitlog entries to fragment across segments, allowing for larger entry sizes. Default: True.
cql_duplicate_bind_variable_names_refer_to_same_variable - a bind variable that appears twice in a CQL query refers to a single variable (if false, no name matching is performed). Default: True. #15559
Option reversed_reads_auto_bypass_cache was deprecated. It’s no longer needed as reverse reads are now mature.
commitlog_max_data_lifetime_in_seconds - Controls how long data remains in the commitlog before the system tries to evict it to sstables, regardless of usage pressure (0 disables this). Default: 24 * 60 * 60 (1 day) #15971
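The effect of cql_duplicate_bind_variable_names_refer_to_same_variable can be illustrated with a toy Python sketch that counts how many values a statement expects. It uses a deliberately naive regular expression for named markers (:name) and is not how ScyllaDB parses CQL. The function name and the sample query are hypothetical.

```python
import re
from typing import List

# Naive pattern for named bind markers such as :x (illustration only;
# it does not handle string literals, map literals, or comments).
_NAMED_MARKER = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def bind_variable_slots(query: str, same_name_same_variable: bool = True) -> List[str]:
    """Return the bind-variable slots a client would have to supply."""
    names = _NAMED_MARKER.findall(query)
    if not same_name_same_variable:
        # No name matching: every marker is its own variable.
        return names
    slots: List[str] = []
    for name in names:
        if name not in slots:
            slots.append(name)
    return slots


query = "SELECT * FROM t WHERE pk = :x AND ck = :x"
print(bind_variable_slots(query, True), bind_variable_slots(query, False))
```

With the option at its default of True, the sample statement takes a single value for :x; with it set to false, each occurrence is bound separately.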

Monitoring

Scylla Monitoring Stack 4.8.1 and later supports the ScyllaDB 6.2 release.

See the upgrade docs for metrics updates in ScyllaDB 6.2.