ScyllaDB Enterprise Release 2023.1.0 - Part 3

tzach · August 21, 2023, 7:16am

Part 1
Part 2

Tools

The sstable tools gained Lua scripting. This is an expert feature intended for offline analysis of sstables. #9679
The scylla-types tool can now compute the token and shard of a partition key, using the tokenof and shardof subcommands.
The bundled cqlsh now uses the ScyllaDB Python driver (rather than the generic Cassandra driver) and supports Scylla Cloud Serverless connection bundles.
The bundled cqlsh now considers system_distributed_everywhere a system keyspace.
The bundled scylla-types tool can now serialize a value to the sstable binary format.
The scylla-api-client tool is now documented. The tool is suitable for interactive usage as well as shell automation of the REST API. #11999.
scylla-api-cli is a lightweight command line tool interfacing the ScyllaDB REST API. The tool can be used to list the different API functions and their parameters, and to print detailed help for each function.

Then, when invoking any function, scylla-api-cli performs basic validation on the function arguments and prints the result to the standard output. Note that JSON results may be pretty-printed using commonly available command line utilities. It is recommended to use scylla-api-cli for interactive usage of the REST API over plain HTTP tools, like curl, to prevent human errors.
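As a small illustration of the pretty-printing note above, Python's standard json module can re-indent a compact JSON result. The payload below is a made-up sample and the helper name is hypothetical; it is not actual scylla-api-cli output.

```python
import json


def pretty(raw: str) -> str:
    """Re-indent a compact JSON document so it is easier to read."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)


# Hypothetical compact result; real REST API output will differ.
sample = '{"keyspace":"ks","tables":["t1","t2"]}'
print(pretty(sample))
```

From a shell, piping a result through `python3 -m json.tool` gives a similar effect.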
The sstable utilities now emit JSON output. See example output here.
There are two new sstable tools, validate-checksums and decompress, allowing for more offline inspection options of sstables.
- scylla-sstable validate-checksums: helps identify whether an sstable is intact or not, by checking the digest and the per-chunk checksums against the data on disk.
- scylla-sstable decompress: helps when one wants to manually examine the content of a compressed sstable.
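The idea behind validate-checksums, comparing stored per-chunk checksums and a whole-file digest against the data, can be sketched as a toy model. The chunk size, the use of CRC32 for both checks, and the in-memory layout are assumptions made for illustration only; they do not describe the real sstable file format or the tool's implementation.

```python
import zlib
from itertools import zip_longest

CHUNK_SIZE = 4  # hypothetical, deliberately tiny chunk size for the demo


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Compute one CRC32 per fixed-size chunk of the data."""
    return [zlib.crc32(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def validate(data: bytes, expected_chunks: list, expected_digest: int,
             chunk_size: int = CHUNK_SIZE):
    """Return (digest_ok, indexes of chunks whose checksum does not match)."""
    actual = chunk_checksums(data, chunk_size)
    # zip_longest also flags missing or extra chunks as mismatches.
    bad = [i for i, (a, e) in enumerate(zip_longest(actual, expected_chunks))
           if a != e]
    return zlib.crc32(data) == expected_digest, bad


good = b"abcdefghij"
chunks, digest = chunk_checksums(good), zlib.crc32(good)
corrupt = b"abcdXfghij"  # one byte flipped inside the second chunk

print(validate(good, chunks, digest))     # (True, [])
print(validate(corrupt, chunks, digest))  # (False, [1])
```

The per-chunk result narrows a corruption down to a region of the file, while the digest gives a single yes/no answer for the whole file.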
The SSTableLoader code base has been updated to support “me” format sstables.
The sstable parsing tools usually need the schema to interpret an sstable’s data. For the special case of system tables, the tools can now use well-known schemas.
Nodetool was updated to fix IPv6-related errors (even when IPv4 is used) with updated JVMs. #10442
Cassandra-derived tooling such as cqlsh and cassandra-stress was synchronized with Cassandra 3.11.3.
The bundled Prometheus node_exporter, used to report OS-level metrics to ScyllaDB Monitoring Stack, was upgraded to version 1.3.1.
Repairs that were in their preparation stage previously could not be aborted. This is now fixed.
ScyllaDB documentation has been moved from the scylla-docs.git repository to scylla.git. This will allow us to provide versioned documentation.
The sstable tools gained a write operation that can convert a JSON dump of an sstable back into an sstable.

Storage

“me” format sstables are now supported (and the default format).
ScyllaDB will now store the ScyllaDB version and build-id used to generate an sstable. This is helpful in tracking down bugs and altered persisted data.

Configuration

It is now possible to limit, and control in real time, the bandwidth of streaming and compaction.

These and more configuration updates below:

Audit is now disabled by default.
It is now possible to limit I/O for repair and streaming to a user-defined bandwidth limit, using the new stream_io_throughput_mb_per_sec config value. The value throttles streaming I/O to the specified total throughput (in MiBs/s) across the entire system. Streaming I/O includes the one performed by repair and both RBNO and legacy topology operations such as adding or removing a node. Setting the value to 0 disables stream throttling (default). The value can be updated in real time via the config virtual table or via configuration file hot-reload. It is recommended not to change this configuration from its default value, which dynamically determines the best bandwidth to use.
compaction_throughput_mb_per_sec: Throttles compaction to the specified total throughput across the entire system. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. The recommended value is 16 to 32 times the rate of write throughput (in MB/second). Setting the value to 0 disables compaction throttling. It is recommended not to change this configuration from its default value, which dynamically determines the best bandwidth to use.
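The 16x to 32x guideline above is simple arithmetic. The helper below is only a worked example of that rule of thumb; its name is hypothetical and it is not part of ScyllaDB.

```python
def recommended_compaction_throughput(write_mb_per_sec: float):
    """Return the (low, high) range, in MB/s, given by the 16x-32x guideline."""
    return 16 * write_mb_per_sec, 32 * write_mb_per_sec


# Example: a sustained write rate of 5 MB/s.
low, high = recommended_compaction_throughput(5)
print(f"compaction_throughput_mb_per_sec between {low:g} and {high:g}")
```

For a write rate of 5 MB/s this gives a range of 80 to 160 MB/s.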
It is now possible to disable updates to node configuration via the configuration virtual table. This is aimed at ScyllaDB Cloud, where users have access to CQL but not the node configuration. #9976
EC2MultiRegionSnitch will now honor broadcast_rpc_address if set in the configuration file. #10236
The permissions cache configuration is now live-updatable (via SIGHUP); and there is now an API to clear the authorization cache.
The compaction_static_shares and memtable_flush_static_shares configuration items, used to override the controllers, can now be updated without restarting the server.
Added column_index_auto_scale_threshold_in_kb to the configuration (defaults to 10MB). When the serialized promoted index size reaches this threshold, it is halved by merging every two adjacent blocks into one and doubling the desired_block_size.
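The halving step can be pictured with a small model. This is a sketch under stated assumptions, not ScyllaDB's implementation: the class name, the fixed per-entry size (entry_bytes), and the representation of blocks as (start, end) offsets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PromotedIndexModel:
    threshold_bytes: int     # plays the role of the auto-scale threshold
    desired_block_size: int  # data bytes covered by one index block
    entry_bytes: int         # hypothetical fixed serialized size per entry
    blocks: list = field(default_factory=list)  # (start, end) data offsets

    def serialized_size(self) -> int:
        return len(self.blocks) * self.entry_bytes

    def add_block(self, start: int, end: int) -> None:
        self.blocks.append((start, end))
        if self.serialized_size() >= self.threshold_bytes:
            self._halve()

    def _halve(self) -> None:
        """Merge every two adjacent blocks and double the block size."""
        merged = []
        for i in range(0, len(self.blocks), 2):
            pair = self.blocks[i:i + 2]
            merged.append((pair[0][0], pair[-1][1]))
        self.blocks = merged
        self.desired_block_size *= 2


idx = PromotedIndexModel(threshold_bytes=40, desired_block_size=64, entry_bytes=10)
for i in range(4):
    idx.add_block(i * 64, (i + 1) * 64)

print(idx.blocks)              # [(0, 128), (128, 256)]
print(idx.desired_block_size)  # 128
```

Each halving trades index granularity for a smaller index, so the index size stays bounded for very large partitions.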
commitlog_flush_threshold_in_mb: Threshold for commitlog disk usage. When used disk space goes above this value, ScyllaDB initiates flushes of memtables to disk for the oldest commitlog segments, removing those log segments. Adjusting this affects disk usage vs. write latency.
The Cassandra tombstone_warn_threshold configuration (default 1000), which sets the maximum number of tombstones a query can scan before a warning is issued, is now respected, producing a warning if a query scans more tombstones than that.
Messaging will now prevent 0.0.0.0 and its IPv6 equivalent from being used as a node IP address.
New config parameters:
- restrict_future_timestamp - Controls whether to detect and forbid unreasonable USING TIMESTAMP values, more than 3 days into the future. See Sanity check for USING TIMESTAMP above.
- replace_node_first_boot - The Host ID of a dead node to replace. An alternative to the old replace_address_first_boot, which uses the old node address. See replace node docs.
- WASM (experimental feature) related configs:
  - wasm_cache_memory_fraction
  - wasm_cache_timeout_in_ms
  - wasm_cache_instance_size_limit
  - wasm_udf_yield_fuel
  - wasm_udf_total_fuel
  - wasm_udf_memory_limit
- consistent_cluster_management - replaces the Raft experimental flag (see Raft above)
- x_log2_compaction_groups - new config for setting static number of compaction groups
- unspooled_dirty_soft_limit - replaces the old virtual_dirty_soft_limit.
- compaction_collection_elements_count_warning_threshold - see large collection above.
- cache_index_pages - Keep SSTable index pages in the global cache after an SSTable read
- restrict_twcs_without_default_ttl - Controls whether to prevent creating TimeWindowCompactionStrategy tables without a default TTL. Can be true, false, or warn (default)
- twcs_max_window_count - The maximum number of compaction windows allowed when making use of TimeWindowCompactionStrategy (default: 50)
- task_ttl_seconds - Time for which information about finished tasks stays in memory (default 10s)
- broadcast-tables - new experimental Raft feature for internal testing
- query_tombstone_page_limit - The number of tombstones after which a query cuts a page, even if not full or even empty (default 10000)
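To see how the TimeWindowCompactionStrategy limits in the list above relate, a rough estimate of the number of compaction windows is the default TTL divided by the window size. That relation is a common rule of thumb assumed here for illustration; how ScyllaDB itself computes the window count is not stated in these notes, and the function and constant names are hypothetical.

```python
import math

MAX_WINDOWS = 50  # default of twcs_max_window_count, per the list above
DAY = 24 * 60 * 60


def estimated_window_count(default_ttl_seconds: int, window_seconds: int) -> int:
    """Estimate how many time windows hold live data at once."""
    return math.ceil(default_ttl_seconds / window_seconds)


def within_limit(default_ttl_seconds: int, window_seconds: int,
                 max_windows: int = MAX_WINDOWS) -> bool:
    return estimated_window_count(default_ttl_seconds, window_seconds) <= max_windows


print(estimated_window_count(30 * DAY, 1 * DAY), within_limit(30 * DAY, 1 * DAY))
print(estimated_window_count(90 * DAY, 1 * DAY), within_limit(90 * DAY, 1 * DAY))
```

Under this estimate, a 30-day TTL with 1-day windows stays under the default limit, while a 90-day TTL with the same window size does not and would call for a larger window.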

Deprecated and removed features

The CQL binary protocol versions 1 and 2 are no longer supported. Version 3 and above have been supported for 9 years, so versions 1 and 2 are unlikely to be in real use. You can check for versions 1 and 2 in the system.clients virtual table. #10607
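A check for legacy clients could look like the sketch below. It filters rows that have already been fetched from system.clients; the protocol_version column name and the sample rows are assumptions for illustration, so verify them against your cluster's schema.

```python
def legacy_clients(rows):
    """Return the rows whose protocol_version is 1 or 2."""
    return [r for r in rows if r.get("protocol_version") in (1, 2)]


# Hypothetical rows, shaped like the result of a query such as:
#   SELECT address, protocol_version FROM system.clients;
rows = [
    {"address": "10.0.0.5", "protocol_version": 4},
    {"address": "10.0.0.9", "protocol_version": 2},
]

for client in legacy_clients(rows):
    print(f"{client['address']} still uses protocol v{client['protocol_version']}")
```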
New DateTieredCompactionStrategy tables are now rejected by default. Users should switch to TimeWindowCompactionStrategy. Existing DateTieredCompactionStrategy tables are still supported, and it is still possible to configure the database to allow new DateTieredCompactionStrategy tables.
Thrift API - the legacy ScyllaDB (and Apache Cassandra) API is deprecated and will be removed in a follow-up release. Thrift has been disabled by default.
Compact Storage - a file format used by Thrift and deprecated in Apache Cassandra, is deprecated and will be removed in a follow-up release.
In-Memory Tables - an enterprise-only feature

Monitoring and tracing

Scylla Monitoring Stack release 4.4 and later will support ScyllaDB Enterprise 2023.1

Metrics-related updates below:

Shard Latencies are now reported as summaries. This is part of an effort to reduce the total number of generated metrics. In addition, empty histograms and summaries will not be reported. The overall result is a 5x reduction in the number of metrics #11173.

This is what a summary looks like:
scylla_storage_proxy_coordinator_read_latency_summary_count{scheduling_group_name="statement",shard="1"} 2
scylla_storage_proxy_coordinator_read_latency_summary{quantile="0.990000",scheduling_group_name="statement",shard="1"} 640
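For readers who want to process such lines in a script, the sketch below splits a sample into name, labels, and value. It is a minimal parser written for the two lines shown above, not a full implementation of the Prometheus text format (it ignores escaping, timestamps, and comment lines), and the function name is hypothetical.

```python
import re

LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metric(line: str):
    """Split one labeled metric sample into (name, labels dict, float value)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized metric line: {line!r}")
    return m["name"], dict(LABEL_RE.findall(m["labels"])), float(m["value"])


line = ('scylla_storage_proxy_coordinator_read_latency_summary'
        '{quantile="0.990000",scheduling_group_name="statement",shard="1"} 640')
name, labels, value = parse_metric(line)
print(name, labels["quantile"], labels["shard"], value)
```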
There is now a metric that allows observation of update progress of materialized views from staging sstables.
There are now completion percentage metrics for node operations using streaming; previously the completion metrics were only available when using repair-based node operations. #11600
The sstable row_reads metric for m-format sstables is now properly incremented, instead of showing zeroes. #12406
The replica-side read metrics, which have been incorrect for some time, have been revamped. #10065
Slow query tracing only considered local times - the time from when a request first hit the replica - to determine if a request needs to be traced. This could cause some parts of slow query tracing to be missed. To fix that, slow queries on the replicas are determined using the start time on the coordinator.
The system.large_partitions and similar system tables will now hold only the base name of the sstable, not the full path. This is to avoid confusion if the large partition is reported while the sstable is in one directory, but later moved to another, for example from staging to the main directory after view building is done or into the quarantine subdirectory if they are found to be inconsistent with scrub. #10075
There are now metrics showing each node’s idea of how many live nodes and how many unreachable nodes there are. This aids understanding problems where failure detection is not symmetric. #10102
The system.clients table has been virtualized. This is a refactoring with no UX impact.
Aggregated queries that use an index are now properly traced.
The number of per-table metrics has been reduced by sending metric summaries instead of histograms and not sending unused metrics.

Additional bug fixes

The following issues have been fixed on top of what was fixed in Scylla Open Source 5.2.0, with open source reference if available. In addition, all relevant bug fixes from 2022.1.x and 2022.2.x are included in 2023.1.0.

Stability: an extremely rare case can cause iterator invalidation in lsa_partition_reader::reset_state(), followed by process exit #14696
Stability: mutation_reader_merger can overflow stack when merging many empty readers. This may happen when running a second repair right after the other. #14415
Stability: a lot of lsa-timing log messages during node replace cause c-s to get stuck and abort. The fix updates the reactor shares for the default IO class from 1 to 200 #13753
DynamoDB API (Alternator) stability: assertion in output_stream when exception occurs during response streaming #14453.
Stability: cached_file, used by index caching, will potentially cause a crash after OOM #14814
Stability: compaction: excessive reallocation during input list formatting #14071. Issue is more likely with offstrategy compaction.
Stability: deadlock caused by view update _registration_sem and streaming reader _streaming_concurrency_sem #14676
Stability: a failure when reading metrics, caused by a rare race condition when another node is down. (seastar::metrics::double_registration (registering metrics twice for metrics: storage_proxy_coordinator_background_replica_writes_failed_remote_node)) #11017
Stability: ICS compaction is not working in cleanup #14035 (introduced in 2022.2.0)
Stability: messaging: when upgrading OSS nodes to Enterprise, service-levels are matched to the default scheduling group #13841, #12552
Stability: Range-scans have a protection against using the wrong service-level to continue a suspended range-scan. This protection had a mistake, resulting in the node crashing when the protection mechanism was triggered. multishard_mutation_query: reader_context::lookup_readers() is not exception safe w.r.t. closing readers #13784
Stability: partitioned_sstable_set::insert might stall when called by table::make_reader_v2_excluding_sstables. The root cause is View building from staging creates a reader from scratch for every partition, in order to calculate the diff between new staging data and data in base sstable set, and then pushes the result into the view replicas. #14244
Setup: scylla-fstrim.timer is enabled but not started #14249
Setup: The installer now wipes filesystem signatures from the individual disks making up a RAID array, preventing problems with reuse of disks. #13737
Stability: bad_alloc (seastar - Failed to allocate 536870912 bytes) #13491. Root cause is a logic fault causing the reader to attempt to read all the data, consuming all memory. Can occur during sstableloader/nodetool refresh, repair or range scan.
Stability: stack-use-after-return in table::make_reader_v2_excluding_staging() #14812
Stability: View building crashes on large partitions with range tombstones. #14503
DynamoDB API (Alternator) stability: Yield while building large results in Alternator - rjson::print, executor::batch_get_item #13689
Setup: fix a regression in setup, which overrides the manual update of perftune.yaml #11385 #10121
Setup: updates in perftune.py, improving performance for larger servers (32 cores and above)
- introduce a generic auto_detect_irq_mask(cpu_mask) function
- auto-select the same number of IRQ cores on each NUMA
Stability: ‘sleep_aborted’ error during Scylla shutdown #13374
Stability: a rare failure in row_cache_test/test_concurrent_reads_and_eviction #12462
Stability: ALTER KEYSPACE can break tables with UDT columns #14139
Correctness: Decommission and removenode may lead to consistency issues if one of the nodes decides to abort during streaming #12989
UX: non informative iotune warnings in scylla_kernel_check #13373
Stability: a race condition in scylla boot, when migration_manager::sync_schema failed with seastar::rpc::closed_error causing repair to fail #12956, #12764
Stability: Node operations failures get masked by abort request failures #12798
Stability: Node operations may fail if prepare takes longer than heartbeat timeout #12969, #11011
Stability: Segmentation fault happened on live nodes while adding a new node to replace a terminated one #13368 (issue introduced in 5.2)
Stability: Shutting down auth service may hang #13545
Correctness: tables with the new tombstone_gc ‘immediate’ mode might delete ttl data that is not expired #13572
Stability: possible use-after-move in virtual table for secondary indexes #13396
Stability: possible use-after-move when initializing row cache with dummy entry #13400
Stability: possible use-after-move when making streaming reader #13397
Stability: possible use-after-move when reading from SSTable in reverse #13394
Stability: possible use-after-move when tracking view builder progress #13395
Stability: reactor stalls in commitlog replay path due to commit log regexp processing #11710
Stability: Replication of default auth settings may fail #2852
Stability: db/view: update view generator doesn’t close staging sstable reader on exceptions #13413
Stability: direct_failure_detector::ping_with_timeout() causes exceptions to be thrown every 100ms times the number of live nodes, which spam the logs, and might slow it down #13278
Stability: on_internal_error doesn’t log an error when not aborting #13786
Packaging: RPM package dependencies issue. When installing a specific version with yum/dnf, scylla-python3 version will not match the specified version, but the latest one. #13222
Monitoring: new metric for CQL request and response sizes #13061
Audit: do not round timestamp in the audit table
Encryption at rest: rare deadlock when creating a table using encryption with replicated key provider (default) for the first time
Stability: Adding nodes to a large cluster (90+ nodes) may cause existing nodes to crash. The root cause is quadratic behavior in get_address_ranges function #12724
Stability: a rare crash due to null pointer dereference: clear_gently of disengaged unique_ptr dereferences nullptr #13636
Performance: Compaction manager “periodic reevaluation” is one-off. This means that compaction was not kicking in later for a table, with little to no write activity, that had expired data 1 hour from now. #13430
Stability: Internal error in a COUNT request with empty IN. The query “select count(*) from {table1} where p in ()” should result in the count 0, because the empty p in () matches no row. However, what we get in Scylla now is an internal error. #12475
Tools: total disk space used metric incorrectly tells the amount of disk space ever used, which is wrong. It should tell the size of all SSTable being used plus the ones waiting to be deleted. Live disk space used shouldn’t account for the ones waiting to be deleted, and live SSTable Count shouldn’t account SSTable waiting to be deleted. #12717
Stability: Bootstrap fails during replace operation while starting “off-strategy compaction”. Huge amount of “Error applying view update” errors were received #12693. The cause is commit “repair: Reduce repair reader eviction with diff shard count” introduced in 2022.2.1
Stability: CQL compression might cause reactor stalls on buffer allocation #13437
Stability: coredumps were not being generated. The fix increases the systemd coredump generation timeout #5430
Performance: Fix stalls caused by quadratic behavior when inserting sstables into tracker on schema change #12499
Stability: abort_source::do_request_abort(std::optional<std::exception_ptr>): Assertion ‘_subscriptions’ failed during shutdown #12512
CQL: scylla: types: is_tuple(): doesn’t handle reverse types. For example, in a schema with a reversed clustering key component, this component will be incorrectly represented in the schema CQL dump: the UDT will lose the frozen attribute. When attempting to recreate this schema based on the dump, it will fail, as only frozen UDTs are allowed in primary key components. #12576
Stability: commitlog: segment recycling breaks on segment file removal #12645
Workload Prioritization improvements and bug fixes:
- Stability: removing a service level during an sstable load could lead to reading deleted memory and exit
- Stability: some requests can ‘leak’ into the default service level just after authentication
- Stability: a bug in the service level controller, introduced in 2022.1.4, might give the wrong priority to a task, resulting in timeouts.
Incremental Compaction Strategy (ICS) improvements and bug fixes:
- Make ICS reshape more efficient for off-strategy compaction, including large data sets. This fixes a regression in the replace node operation, which uses Repair Based Node Operations (RBNO)
- Crash on compaction completion when ICS ends up with a run containing staging and non-staging files. Error log: “scylla: sstables/sstables.cc:2744:
