ScyllaDB Enterprise Release 2024.1.0 - Tools, Configuration, Admin API and Monitoring

Scylla Enterprise 2024.1 Release Notes.

Tools

cqlsh is now compatible with Python 3.

cqlsh is now available as a Docker image and on PyPI, so you can use it without installing the entire ScyllaDB server, for example with Scylla Cloud.

See Scylla SSTable docs for more info.

  • A bug in the nodetool command to disable auto compaction has been fixed #13553
  • The nodetool checkAndRepairCdcStreams command, used to align CDC streams with the cluster topology, now works when the topology is under Raft control.
  • The nodetool refresh command gained the --primary-replica-only option.
  • The scylla sstable tool now supports the scrub operation, enabling offline (and off-node) scrubbing of sstables. #14203
  • The cassandra-stress tool now supports the Java driver’s rack-aware policy. This can reduce cloud inter-availability-zone networking costs, at the expense of less even load balancing unless care is taken to balance the application.
  • The setup utility supported an --online-discard switch to enable/disable online discard, but it did not actually work. This is now fixed. #14963
  • The nodetool stop RESHAPE command was supposed to stop the reshape operation, but in fact only aborted the running reshape compactions, which were promptly restarted. It now aborts the entire operation as expected. #15058
  • Java Tools UX: improved the error message shown when a Java-based SSTable tool, such as sstablemetadata, encounters the new UUID-based identifier introduced in 5.4.0. scylla-tools-java#360
  • Java Tools: updated to newer versions of scylla-driver-core, libthrift, logback, hk2-locator, Netty, guava, and other dependencies. scylla-tools-java#343, scylla-jmx#231, scylla-tools-java#352, scylla-tools-java#365, scylla-tools-java#364, scylla-tools-java#363
  • Java Tools: nodetool failed with stderr: error: ‘java.lang.Object com.google.common.base.Objects.firstNonNull(java.lang.Object, java.lang.Object)’. The root cause was an API update in a third-party package, io.airlift.airline. scylla-tools-java#374
  • Tools: scylla nodetool crashes if invoked without further args #16451
  • Tools: scylla-sstable ignores scylla.yaml unless it is explicitly provided #16132
  • Tools: scylla-sstable tool crash due to unclosed reader in tools/schema_loader.cc #16519
  • Tools: scylla-sstable cryptic errors printed when table is not found in schema-tables #16459
  • Tools: scylla-sstable: schema_loader uses dummy db::config when loading schema #16480
  • Tools: scylla-sstable: loading the schemas of materialized-views and indexes doesn’t work #16492
  • Tools: scylla-sstable: tool failed on deprecated configuration, while scylla itself doesn’t #16538

Monitoring, tracing and logging

Scylla Monitoring Stack release 4.6 and later supports ScyllaDB Enterprise 2024.1.

Metrics-related updates:

  • There is a new metric for prepared statement cache eviction rates. #10463
  • CQL transport metrics were refined, and new metrics were added so one can measure request and response bandwidth, for each opcode type.
  • The CQL transport server (port 9042) recently gained per-opcode bandwidth statistics. They are now measured per service level as well.
  • ScyllaDB can now relabel metrics according to user-provided configuration. This can be used together with Prometheus to reduce the number of metrics reported.
  • We now drop per-table metrics early during teardown of a table. Previously, if a table was dropped and re-created quickly, the metrics from the old and new tables could clash, resulting in an error.
  • The column name reported when writetime() is given a primary key column (which is illegal) is now human readable, even for humans that don’t remember the ASCII table.
  • If the startup sequence was aborted by an interrupt (Ctrl-C or systemd shutdown), an exception error message was shown. The exception is now ignored and no longer displayed. #12898
  • When compaction completes, it reports the throughput it achieved. We now base it on the input bytes read rather than output bytes, as the latter gives incorrect results for overwrite or expiring workloads. #14533
  • There is now a REST API for configuring Prometheus metrics label rewriting.
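The reason input bytes are the better basis for the compaction throughput figure comes down to simple arithmetic: when most of the rows read are overwritten or expired, compaction writes far less than it reads, so an output-based figure understates how fast the compaction actually worked. The sketch below uses a hypothetical helper and made-up byte counts; it illustrates the arithmetic only and is not how ScyllaDB computes the reported value internally.

```python
from typing import Tuple

MIB = 1024 * 1024


def compaction_throughput_mib_s(bytes_read: int, bytes_written: int,
                                seconds: float) -> Tuple[float, float]:
    """Return (input-based, output-based) throughput in MiB/s.

    Hypothetical helper for illustration only.
    """
    return bytes_read / MIB / seconds, bytes_written / MIB / seconds


# Overwrite-heavy workload (made-up numbers): 1 GiB is read, but only 100 MiB
# of live data survives and is written back out. The compaction takes 10 s.
input_based, output_based = compaction_throughput_mib_s(
    bytes_read=1024 * MIB, bytes_written=100 * MIB, seconds=10.0)
print(f"input-based:  {input_based:.1f} MiB/s")   # input-based:  102.4 MiB/s
print(f"output-based: {output_based:.1f} MiB/s")  # output-based: 10.0 MiB/s
```

With an output-based metric, this 10-second compaction would appear roughly ten times slower than the rate at which it actually processed data.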

Admin REST API

  • It’s now possible to disable and enable tombstone compaction on a per-node basis using a REST API endpoint. This is useful if the user knows that all DELETEs were performed with CL=ALL and so there is no risk of data resurrection.
  • The REST API that accepts sstable generation numbers now uses a string value, in preparation for using UUID generations.
  • The type of the “generation” field of “sstable” in the return value of RESTful API entry point at “/storage_service/sstable_info” is changed from “long” to “string”.
  • The API for performing sstable cleanup, which is used by nodetool cleanup, now also waits for staging sstables to be cleaned up.
  • The hints synchronization point API allows an external user to wait for hints to replay. Misuse of the API cookie could lead to unbounded memory usage; the cookie is now protected with a checksum. #9405
  • The --experimental flag was removed. It was replaced some time ago with --experimental-features, which provides fine-grained control over which experimental features are enabled.
  • There is a new REST API call to recalculate schema digests. It can be useful to heal some schema disagreement problems. #15380
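Because the "generation" field of "sstable" entries returned by /storage_service/sstable_info is now a string rather than a number, client code that parses it should not assume an integer. Below is a minimal sketch of a tolerant client-side parser. It assumes the JSON response has already been decoded into Python dictionaries. The helper names and sample entries are hypothetical and do not reflect the full response structure.

```python
from typing import Any, Dict, List, Union


def normalize_generation(generation: Union[int, str]) -> str:
    """Return an sstable generation as an opaque string.

    Older servers report the field as a number, newer ones as a string
    (in preparation for UUID generations), so treat it as a string.
    """
    # bool is a subclass of int in Python; reject it explicitly.
    if isinstance(generation, bool) or not isinstance(generation, (int, str)):
        raise TypeError(f"unexpected generation type: {type(generation).__name__}")
    return str(generation)


def generations(sstables: List[Dict[str, Any]]) -> List[str]:
    """Collect normalized generations from decoded "sstable" entries (hypothetical shape)."""
    return [normalize_generation(entry["generation"]) for entry in sstables]


# Hypothetical decoded entries: one as an older server reports it, one as 2024.1 does.
old_style = [{"generation": 42}]
new_style = [{"generation": "42"}]
print(generations(old_style) == generations(new_style))  # True
```

Treating the value as opaque also avoids breaking if a later release starts returning UUID-based generations, which cannot be parsed as integers.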

Configuration

The scylla.yaml configuration items are now documented in the documentation website.

Additional updates

New and updated configuration options:

  • It is now possible to disable configuration changes via the system.config virtual table using a configuration parameter. Use this option to prevent runtime configuration changes via CQL. #14355
  • task_ttl_in_seconds - Task Manager option: time for which information about finished tasks stays in memory.
  • RF Guardrail config values (see above)
    • minimum_replication_factor_fail_threshold
    • minimum_replication_factor_warn_threshold
    • maximum_replication_factor_warn_threshold
    • maximum_replication_factor_fail_threshold
  • stream_plan_ranges_percentage is renamed to stream_plan_ranges_fraction
  • cache_index_pages is now enabled by default, with an index_cache_fraction value of 0.2

index_cache_fraction is the maximum fraction of cache memory permitted for use by the index cache. It is clamped to the [0.0, 1.0] range. It must be small enough not to deprive the row cache of memory, but big enough to fit a large fraction of the index. The default value of 0.2 means that at least 80% of cache memory is reserved for the row cache, while at most 20% is usable by the index cache.
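The split implied by index_cache_fraction can be worked through directly. The sketch only illustrates the arithmetic stated in the option description (clamp, then divide). The function name is hypothetical, the cache size is an arbitrary example, and ScyllaDB's actual cache accounting is more involved.

```python
from typing import Tuple


def split_cache(cache_bytes: int, index_cache_fraction: float = 0.2) -> Tuple[int, int]:
    """Return (max index cache bytes, min row cache bytes).

    Illustrative arithmetic only: the fraction is first clamped to
    [0.0, 1.0], as described for index_cache_fraction.
    """
    fraction = min(max(index_cache_fraction, 0.0), 1.0)
    max_index_cache = int(cache_bytes * fraction)
    return max_index_cache, cache_bytes - max_index_cache


print(split_cache(1000))        # (200, 800): default 0.2
print(split_cache(1000, 1.5))   # (1000, 0): clamped to 1.0
print(split_cache(1000, -0.1))  # (0, 1000): clamped to 0.0
```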

  • x_log2_compaction_groups, the option that controlled the static number of compaction groups per table per shard, has been removed.
  • live_updatable_config_params_changeable_via_cql - If set to true, configuration parameters defined with the LiveUpdate option can be updated at runtime with CQL (more above).
  • enable_node_aggregated_table_metrics - Enable aggregated per-node, per-keyspace, and per-table metrics reporting; applicable if enable_keyspace_column_family_metrics is false. Default: True.
  • enable_compacting_data_for_streaming_and_repair - Enable the compacting reader, which compacts the data for streaming and repair (load and stream included) before sending it to, or synchronizing it with, peers. This can reduce the amount of data to be processed by removing dead data, but adds CPU overhead. Default: True.
  • table_digest_insensitive_to_expiry - When enabled, per-table schema digest calculation ignores empty partitions. Default: True.
  • schema_commitlog_segment_size_in_mb - ScyllaDB uses a separate commitlog, called the schema commitlog, for schema changes and topology operations in order to reduce the latency of these operations. The segment size of the schema commitlog has been raised from 32MB to 128MB to avoid problems with large numbers of tables, as the entire schema must fit in a single segment.
  • stream_plan_ranges_fraction (introduced as stream_plan_ranges_percentage) - Specify the fraction of ranges to stream in a single stream plan. Value is between 0 and 1. Default: 0.1 #14191
  • alternator_describe_endpoints - Overrides the behavior of Alternator’s DescribeEndpoints operation. An empty value (the default) means DescribeEndpoints will return the same endpoint used in the request. The string ‘disabled’ disables the DescribeEndpoints operation. Any other string is the fixed value that will be returned by DescribeEndpoints operations. This was required to work around an AWS SDK issue, “When DynamoDB DescribeEndpoints is used, wrong scheme may be tacked on the result” (aws/aws-sdk-cpp issue #2554 on GitHub).
  • auth_certificate_role_queries - Regular expression used by CertificateAuthenticator to extract the role name from the subject info of an accepted transport authentication certificate. See more in the Security section.
  • auth_superuser_name - Initial authentication superuser name. Ignored if authentication tables already contain a superuser.
  • auth_superuser_salted_password - Initial authentication superuser salted password. Create using mkpasswd or similar. The hashing algorithm used must be available on the node host. Ignored if authentication tables already contain a superuser password.
  • strict_is_not_null_in_views - In materialized views, restrictions are allowed only on the view’s primary key columns. In old versions Scylla mistakenly allowed IS NOT NULL restrictions on columns which were not part of the view’s primary key. These invalid restrictions were ignored. This option controls the behavior when someone tries to create a view with such invalid IS NOT NULL restrictions. Can be true, false, or warn. Default: True.
  • object_storage_config_file - part of the new experimental object store feature (above). Optionally, read object-storage endpoints config from file.
  • “tablets” - new experimental flag.
  • relabel_config_file - optionally, read relabel config from file.
  • schema_commitlog_directory - The directory where the schema commit log is stored. This is a special commitlog instance used for schema and system tables. For optimal write performance, it is recommended that the commit log be on a separate disk partition (ideally, a separate physical device) from the data file directories.
  • nodeops_watchdog_timeout_seconds - Time in seconds after which node operations abort when not hearing from the coordinator. Default: 120s.
  • nodeops_heartbeat_interval_seconds - Period of heartbeat ticks in node operations, in seconds. Default: 10.
  • Query timeouts in configuration (e.g. read_request_timeout_in_ms) can now be hot-reloaded using SIGHUP. #12232
  • ScyllaDB has an error injection facility, used by QA to test error paths. It can now be enabled via configuration. Use with caution!
  • The experimental flag used to enable consistent topology changes has been renamed from “raft” to “consistent-topology-changes”. #14145
  • The schema commitlog size was accidentally set to 10TB, it’s now set to a reasonable size.
  • The --max-io-requests init option, which has been obsolete for quite some time, was removed.
  • compaction_flush_all_tables_before_major_seconds - Set the minimum interval in seconds between flushing all tables before each major compaction (default is 86400). This option is useful for maximizing tombstone garbage collection by releasing all active commitlog segments. Set to 0 to disable automatic flushing all tables before major compaction.
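The three cases described for alternator_describe_endpoints can be summarized as a small decision function. This models the documented behavior and is not Alternator's implementation. The function name and host names are hypothetical. A disabled operation is represented as None because these notes do not describe what the client receives in that case.

```python
from typing import Optional


def describe_endpoints_address(config_value: str, request_endpoint: str) -> Optional[str]:
    """Model the documented alternator_describe_endpoints behavior.

    Returns the endpoint DescribeEndpoints would report, or None when the
    operation is disabled.
    """
    if config_value == "":
        # Default: report the same endpoint the client used in the request.
        return request_endpoint
    if config_value == "disabled":
        # The DescribeEndpoints operation is disabled.
        return None
    # Any other string is returned verbatim as a fixed endpoint.
    return config_value


print(describe_endpoints_address("", "alternator.example.com:8000"))          # alternator.example.com:8000
print(describe_endpoints_address("disabled", "alternator.example.com:8000"))  # None
print(describe_endpoints_address("dynamodb.internal:8000",
                                 "alternator.example.com:8000"))              # dynamodb.internal:8000
```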
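Here is the idea behind auth_certificate_role_queries: a regular expression with a capture group is applied to the certificate subject, and the captured text is taken as the role name. The sketch assumes the first capture group holds the role. The pattern, the subject strings, and the helper are hypothetical, and the actual configuration syntax is described in the Security section.

```python
import re
from typing import Optional

# Hypothetical example pattern: capture the CN (common name) attribute of the
# certificate subject and use it as the role name.
ROLE_QUERY = re.compile(r"CN=([^,]+)")


def role_from_subject(subject: str, pattern: "re.Pattern[str]" = ROLE_QUERY) -> Optional[str]:
    """Return the first capture group of pattern in subject, or None if no match."""
    match = pattern.search(subject)
    return match.group(1) if match else None


print(role_from_subject("CN=alice,OU=engineering,O=Example"))  # alice
print(role_from_subject("OU=ops,O=Example"))                    # None
```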

Additional bug fixes

The following issues have been fixed on top of those fixed in Scylla Open Source 5.4.0, with the open source issue reference where available. In addition, all relevant bug fixes from 2023.1.x are included in 2024.1.0.