[RELEASE] ScyllaDB 6.2.0

The ScyllaDB team is pleased to announce ScyllaDB Open Source 6.2.0, a production-ready minor release.

ScyllaDB 6.2 introduces many Tablets improvements, new zero-token nodes, Alternator RBAC support, and many other bug fixes and stability improvements.

Only the last two minor releases of the ScyllaDB Open Source project are supported. Once ScyllaDB Open Source 6.2 is officially released, only ScyllaDB Open Source 6.2 and 6.1 will be supported, and ScyllaDB Open Source 6.0 will be retired.

High Availability - zero-token node

ScyllaDB now supports zero-token nodes. Such nodes do not replicate any data but can participate in query coordination and in Raft quorum voting.

You can use this to create an Arbiter: a tiebreaker node without data that helps maintain quorum in a symmetrical two-datacenter cluster. If one of the datacenters fails, the Arbiter, deployed in a third datacenter, lets the surviving datacenter keep quorum. Since the Arbiter has zero tokens, it does not replicate user data and does not incur network or storage costs. #15360
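The quorum arithmetic behind the Arbiter can be sketched in a few lines of Python. This is a minimal illustration, not ScyllaDB code: it assumes every node, including the zero-token Arbiter, is a Raft voter and that quorum means a strict majority of voters. The helper names `majority` and `has_quorum` are hypothetical.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return voters // 2 + 1


def has_quorum(alive: int, voters: int) -> bool:
    """True if the voters that are still alive form a strict majority."""
    return alive >= majority(voters)


nodes_per_dc = 3

# Symmetrical two-datacenter cluster without an Arbiter: one DC fails.
voters = 2 * nodes_per_dc
print(has_quorum(nodes_per_dc, voters))  # False: 3 of 6 is not a majority

# Same cluster plus one zero-token Arbiter in a third datacenter.
voters = 2 * nodes_per_dc + 1
print(has_quorum(nodes_per_dc + 1, voters))  # True: 4 of 7 is a majority
```

With two datacenters of three nodes each, losing one datacenter leaves exactly half of the voters, which is not a majority. A single Arbiter tips the surviving side over the threshold without holding any data.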

Alternator RBAC

Authorization: Alternator now supports Role-Based Access Control (RBAC), with roles and permissions managed through CQL commands. #5047

Known Install Issues

  • offline-installer: still uses JMX and is tested with the old nodetool. #19185
  • [centos9] Installing without setting up SELinux fails on the coredump configuration check. #19325

Both issues are expected to be fixed in a follow-up 6.2.x patch release.

More Updates

Tablets

  • Performance: The tablet load balancer now tries to ensure not only that tablets are distributed evenly among nodes and shards, but also that the tablets of each individual table are evenly distributed. This prevents an unevenly distributed hot table from causing hot nodes or hot shards. #16824
  • Stability: The tablet allocator now refrains from allocating tablets for a table created concurrently with a decommission. #20032
  • Performance: When tablet metadata (system.tablets table) changes, we now reload only the changed rows. #15294
  • A bug when running ALTER KEYSPACE with tablets on a keyspace that doesn’t exist was fixed. #19576
  • Stability: A race condition between tablet repair and tablet split (the latter happens when a table grows) has been fixed. Fixes #19378 #19416.
  • Decommissioning a node in a multi-DC cluster failed because no tablet replica could be found for a secondary-index table. #20240
  • Stability: A schema ALTER (add column) in the middle of an ongoing tablet migration caused an internal error: “Compaction state for table … not found” #20699
  • Stability: Too many pending replicas for a table. The fix improves validation of the new replication factor (RF) in an ALTER command. #20039
  • Creating a keyspace with tablets enabled but without NetworkTopologyStrategy succeeds and results in the keyspace using vnodes. Tablets only work with NetworkTopologyStrategy. #19743
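A toy model shows why per-table balance matters: a placement can be perfectly even per node while each table sits entirely on one node. This is not the load balancer's algorithm, only a check of the two properties it now tries to satisfy; the table names, node names, and helper functions are hypothetical, and shards are ignored.

```python
from collections import Counter


def node_load(placement):
    """Total tablets per node, ignoring which table they belong to."""
    return Counter(node for _table, node in placement)


def table_load(placement, table):
    """Tablets of a single table per node."""
    return Counter(node for t, node in placement if t == table)


def is_even(counts, nodes):
    """True if every node holds the same number of tablets, within one."""
    values = [counts.get(n, 0) for n in nodes]
    return max(values) - min(values) <= 1


nodes = ["n1", "n2"]

# Globally balanced, but each table is concentrated on a single node.
skewed = [("hot", "n1")] * 4 + [("cold", "n2")] * 4
# Balanced both globally and per table.
spread = [("hot", "n1"), ("hot", "n2")] * 2 + [("cold", "n1"), ("cold", "n2")] * 2

print(is_even(node_load(skewed), nodes), is_even(table_load(skewed, "hot"), nodes))  # True False
print(is_even(node_load(spread), nodes), is_even(table_load(spread, "hot"), nodes))  # True True
```

In the skewed placement, all traffic to the hot table lands on n1, even though the per-node tablet counts look perfectly balanced.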

Tracing

  • ScyllaDB now collects cell-level statistics, in addition to row and tombstone statistics, for result pages. The statistics are exposed in a trace message. #18996

Stability

  • The sstable primary index reader now responds to service shutdown requests. Previously, shutdown could be delayed if the bloom filter of a large sstable was being rebuilt when the service was shut down. #19453
  • A race between table drop and a counter column update was fixed. #19948
  • Stability: A fatal error during cache update in an elasticity test write workload was fixed. The root cause was a race between split compaction and tablet migration. #19873
  • A regression in processing limits for the GROUP BY clause was fixed #17237 #5361 #5362
  • Raft uses log truncation to limit memory consumption. A mismatch between in-memory log truncation and on-disk log truncation was fixed. #16817 #20080
  • The topology coordinator and a replacing node could stop seeing each other after entering a transition state. #19025
  • A node will now ignore DNS name resolution errors for seeds when restarting, as those seed names could refer to nodes that were removed. #14945
  • Commitlog is now able to store entries larger than half a commitlog segment. This limitation caused problems with large clusters, as cluster metadata could exceed this limit. Large entries are now fragmented and split over multiple segments. #19472
  • A bug in computing whether to flush all memtables was fixed. #20301
  • A memory leak in the Paxos implementation was fixed #20602
  • Stability: A task-manager REST command failed with: ({"message": "seastar::rpc::closed_error (connection is closed)", "code": 500}) #20843
  • Stability: Bootstrap failed with: init - Startup failed: std::runtime_error (Failed to obtain IP addresses of nodes that should be seen as alive within 30s) #20600
  • Stability: A coredump occurred during bootstrap when replacing a dead node. #20629
  • Upgrade: Upgrade failed when the number of service levels was close to the maximum. The root cause was that ScyllaDB did not correctly identify internal scheduling groups, for example for maintenance, and tried to create service levels for them. #20070
  • Stability: Raft topology: concurrent removenode requests for the same node could hang. #20271
  • The scylla_commitlog_memory_buffer_bytes metric appeared to grow continuously since 6.0. The root cause was a bug in the metric itself, not in actual memory usage. #20862
  • Stability: In rare cases, an SSTable clustering key index lookup could yield an incorrect position. This could happen for promoted indexes larger than 64KB, when LSA memory compaction moved the cached page in the middle of parsing a promoted index entry that spans file pages. #20766
  • Stability: time_window_compaction_strategy::get_reshaping_job did not limit the partial_sort range to the multi_window size, which could result in a crash and an infinite restart loop. #20608
  • Stability: storage_proxy now makes sure there is no end iterator in the _live_iterators array. #20874
  • Stability: A segfault occurred during shutdown if the join procedure failed (a failure that various tests trigger on purpose). #20701
  • Stability: Better error handling in the TLS certificate authenticator: auth::certificate_authenticator’s constructor now captures boost::regex_error. #20941
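The commitlog fragmentation change (#19472), listed above, can be illustrated with a simplified splitter. This sketch assumes fragments carry no per-fragment header or checksum and that an entry first fills the free space of the current segment and then continues into fresh segments. The function `fragment_entry` is hypothetical and does not reflect ScyllaDB's on-disk format.

```python
def fragment_entry(entry: bytes, free_in_current: int, segment_size: int) -> list[bytes]:
    """Split one logical entry into fragments that each fit in a segment."""
    if free_in_current <= 0 or segment_size <= 0:
        raise ValueError("sizes must be positive")
    fragments = []
    capacity = free_in_current  # first fragment uses what is left in the current segment
    pos = 0
    while pos < len(entry):
        chunk = entry[pos:pos + capacity]
        fragments.append(chunk)
        pos += len(chunk)
        capacity = segment_size  # later fragments each start in a fresh segment
    return fragments


# A 70-byte entry with 10 bytes left in the current 32-byte segment.
parts = fragment_entry(b"x" * 70, free_in_current=10, segment_size=32)
print([len(p) for p in parts])  # [10, 32, 28]
```

Without fragmentation, an entry larger than half a segment could not be written, regardless of how much free space the commitlog had overall; splitting removes that upper bound on entry size.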

Admin and Tooling

  • As part of moving away from Java tools to native tooling, SSTableloader will be deprecated in a future version of ScyllaDB. Instead, you can use Load and Stream to upload SSTables directly to ScyllaDB, either from Apache Cassandra or from other ScyllaDB clusters.
    The Java version of nodetool is also deprecated; it has been replaced by a compatible native version.

  • A new REST API endpoint, system/highest_supported_sstable_version, returns the highest sstable format version supported across the cluster. #19772

  • The internal ‘cluster feature’ mechanism now supports suppressing features, enabling simulation of upgrades. This should catch version upgrade problems earlier. #20034

  • Integrated backup and restore has been merged. New nodetool backup and restore commands (and the corresponding REST API endpoints) copy a snapshot to and from an S3-compatible endpoint. This is a work in progress, aimed at replacing the current external (Manager Agent) backup with backup and restore managed by ScyllaDB core. #19890 #20305

  • A new nodetool tasks command can be used to view and manage maintenance tasks running on the node. #19201

  • ScyllaDB now tunes the number of allowed open file descriptors (LimitNOFILE) for very large nodes, reducing the chance of “Too many open files” errors. #20443

  • Tools: Scrub/validate compactions will now verify checksums for uncompressed sstables. #20207

  • Compaction CLEANUP jobs now run under the maintenance/streaming scheduling group. #20582

  • IP-based node operation REST APIs are deprecated. Use host IDs instead. #19218

Alternator

  • Performance: Alternator uses JSON to communicate with the client. Previously, sending very large JSON values was adjusted to avoid stalls; now, destroying these large JSON values also avoids stalls. #19968
  • Monitoring: Alternator adds metrics for batch latency and size.
  • Performance: Alternator, ScyllaDB’s implementation of the DynamoDB API, has more efficient reverse queries now, reducing the gap from CQL. #20191
  • Authentication: Alternator now rejects authentication from roles that do not have the LOGIN attribute. #19735
  • Monitoring: The BatchGetItemSize and BatchWriteItemSize metrics were revised. #20571

Performance

  • In order to make sstables durable, the directory where they are placed must be flushed after they are sealed. This is now done without re-opening the directory each time, saving some cycles. #19624
  • Commitlog segments older than 24 hours will now flush corresponding memtables regardless of memory pressure. This allows more timely garbage collection of tombstones. #15971
  • ScyllaDB tracks internal maintenance work, as well as work requested by the user (for example, repair), as tasks. New virtual tasks allow ScyllaDB to track multi-node operations. #16374
  • Hinted Handoff, which writes hints to local storage, now uses the commitlog scheduling group. #18654
  • The driver for S3 access is now optimized for throughput #20074. Direct S3 access is experimental in this release.
  • Reversed queries (a SELECT whose ORDER BY is the opposite of the table’s clustering order) are already quite efficient in ScyllaDB, yet the internal RPC protocol between nodes was kept unaware of reversed queries to maintain compatibility: result sets were un-reversed before being sent over the wire, then re-reversed. ScyllaDB now supports an alternate protocol that avoids these wasteful transformations. #12557
  • When communicating with older versions of ScyllaDB, the server uses a schema digest to see whether there is a schema mismatch or not. This is now less likely to stall when processing large schemas. #18173
  • Major compaction now supports a new option to check only the existing (compacted) sstables during tombstone garbage collection. This can increase the effectiveness of garbage collection for partitions that are updated frequently. #19728
  • The heuristics for purging tombstones during compaction were improved, leading to less tombstone accumulation. #20424 #20423
  • The system may sometimes drop the bloom filter of some sstables to save memory, and then reload it when memory is available. We no longer reload bloom filters for sstables that are queued for deletion. #19722
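A simplified model of the age-based memtable flush (#15971) is sketched below. It assumes the only input is each commitlog segment's age in seconds, whereas the real commitlog tracks which memtables each segment covers. The function name `segments_forcing_flush` is hypothetical; the default mirrors commitlog_max_data_lifetime_in_seconds (24 * 60 * 60 = 86400 seconds), and 0 disables the check.

```python
# Default of commitlog_max_data_lifetime_in_seconds: 24 * 60 * 60 = 86400 (1 day).
DEFAULT_MAX_DATA_LIFETIME_S = 24 * 60 * 60


def segments_forcing_flush(segment_ages_s, max_lifetime_s=DEFAULT_MAX_DATA_LIFETIME_S):
    """Indexes of segments old enough to force a memtable flush."""
    if max_lifetime_s == 0:
        return []  # 0 disables the age-based flush
    return [i for i, age in enumerate(segment_ages_s) if age > max_lifetime_s]


ages = [3_600, 90_000, 86_400]  # 1 hour, 25 hours, exactly 24 hours
print(segments_forcing_flush(ages))  # [1]
print(segments_forcing_flush(ages, max_lifetime_s=0))  # []
```

Flushing on age rather than only on memory pressure moves old data into sstables sooner, where compaction can garbage-collect its tombstones.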

CQL

  • Service levels are used to group and classify sessions. Service level names beginning with $ are now reserved. #20122
  • A CQL filtering bug when a regular column was filtered but no regular columns were selected was fixed. #10357
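A client-side pre-check for the reserved `$` prefix could look like the sketch below. The function name and error message are hypothetical; ScyllaDB enforces the rule on the server, and its actual error text may differ.

```python
RESERVED_PREFIX = "$"


def check_service_level_name(name: str) -> str:
    """Return the name unchanged, or raise if it uses the reserved prefix."""
    if name.startswith(RESERVED_PREFIX):
        raise ValueError(f"service level name {name!r} starts with the reserved prefix '$'")
    return name


check_service_level_name("analytics")  # accepted
try:
    check_service_level_name("$internal")  # hypothetical name using the reserved prefix
except ValueError as err:
    print(err)
```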

Materialized view

  • Performance: Materialized view updates destined to a node that has left the cluster are now dropped. #19439
  • Performance: When a materialized view’s primary key has the same columns as the base table primary key, we now optimize deletions by deleting an entire partition when possible. #8199
  • A write to a base table will now be rejected by the coordinator when one or more of the replicas has a full view update backlog. This reduces inconsistencies in materialized views. #17426
  • The system_distributed.view_build_status was moved to the system keyspace and is now managed by Raft in a strongly consistent way. #15329
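The coordinator-side rule from #17426 reduces to a simple predicate, sketched below with a hypothetical `ReplicaBacklog` type. The sketch assumes a backlog counts as full once its current size reaches its maximum; ScyllaDB's real backlog accounting is more involved.

```python
from dataclasses import dataclass


@dataclass
class ReplicaBacklog:
    """Pending view-update backlog reported by one replica (hypothetical units)."""
    current: int
    maximum: int

    @property
    def full(self) -> bool:
        return self.current >= self.maximum


def coordinator_accepts_write(replica_backlogs) -> bool:
    """Reject the base-table write if any replica's view update backlog is full."""
    return not any(b.full for b in replica_backlogs)


print(coordinator_accepts_write([ReplicaBacklog(10, 100), ReplicaBacklog(50, 100)]))  # True
print(coordinator_accepts_write([ReplicaBacklog(10, 100), ReplicaBacklog(100, 100)]))  # False
```

Rejecting up front, instead of accepting the base write and dropping view updates later, is what keeps the views from drifting out of sync with the base table.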

Packaging

  • The JMX submodule was removed from the source tree. With nodetool now talking directly to the REST API, it is no longer necessary. JMX is still available as a separate package.

Config

  • When Service Level parameters, like timeouts, are modified, connections are adjusted in real time. #12923
  • commitlog_use_fragmented_entries - Whether or not to allow commitlog entries to fragment across segments, allowing for larger entry sizes. Default: True.
  • cql_duplicate_bind_variable_names_refer_to_same_variable - If true, a bind variable name that appears twice in a CQL query refers to a single variable (if false, no name matching is performed). Default: True. #15559
  • Option reversed_reads_auto_bypass_cache was deprecated. It is no longer needed, as reversed reads are now mature.
  • commitlog_max_data_lifetime_in_seconds - Controls how long data remains in the commitlog before the system tries to evict it to sstables, regardless of usage pressure (0 disables this). Default: 24 * 60 * 60 (1 day). #15971
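The effect of cql_duplicate_bind_variable_names_refer_to_same_variable can be sketched by collecting named bind markers from a query string. This is a simplified illustration: the regular expression is a hypothetical stand-in for the CQL parser that ignores string literals and comments, and it assumes that with the option disabled, every occurrence becomes a separate variable.

```python
import re

# Hypothetical stand-in for the CQL parser: matches named markers such as :pk.
BIND_MARKER = re.compile(r":([A-Za-z_][A-Za-z0-9_]*)")


def bind_variables(query: str, same_variable: bool = True) -> list:
    """Return the bind variables a client would have to supply, in order."""
    names = BIND_MARKER.findall(query)
    if not same_variable:
        return names  # every occurrence is its own variable
    return list(dict.fromkeys(names))  # duplicates collapse, first-seen order kept


q = "UPDATE ks.t SET a = :v, b = :v WHERE pk = :k"
print(bind_variables(q))  # ['v', 'k']
print(bind_variables(q, same_variable=False))  # ['v', 'v', 'k']
```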

Monitoring

Scylla Monitoring Stack 4.8.1 and later support the ScyllaDB 6.2 release.

See the upgrade documentation for metrics updates in ScyllaDB 6.2.