[REALESE] Scylla 5.4 RC1 - part 2

tzach · November 8, 2023, 5:07am

More Improvements

CQL API

CQL table columns that have the list data type aren’t allowed to contain NULLs, but in certain situations list values in CQL literals or bind variables are allowed to contain NULLs (for example, in LWT IF conditions that use the IN operator). The type system was relaxed to accept NULLs where this is allowed. Previously, these cases were handled by hard-to-maintain workarounds.
The CQL USING TTL clause allows one to specify an INSERT or UPDATE’s time-to-live property, after which the cells are automatically deleted. TTL 0 was misinterpreted as the default TTL (which happens to be unlimited, usually) rather than an explicitly unlimited TTL. This is now fixed. #6447
The C-style cast syntax ((type) expression) can now be applied to bind variables ((type) ? or (type) :var) to explicitly specify the type of bind variables Example: blob_column = (blob)(int)12323
Error messages for incorrect usage of the CQL TOKEN() function have been improved. #13468
The check for altering permissions of functions in the system keyspace has been tightened.
Error messages involving the CQL token function have been improved.
Error messages involving CQL expressions will not be printed in a more user-friendly way. Previously they contained some debug information.
Change Data Capture (CDC) exports updates to the database as a table containing changes. One option is to capture not only the change, but also the state of the row before it was changed. In some cases, in a lightweight transaction (LWT) change, the preimage could return the state of the row after the change instead of before the change. This is now fixed. #12098
The NetworkTopologyStrategy replication strategy will now reject an empty value for the replication factor. #13986
Materialized views require the “IS NOT NULL” qualifier on primary key elements, but also accept (and ignore) the qualifier on regular columns. The qualifier is now rejected when applied to regular columns. A configuration variable allows you to warn about the rejected clause, emit an error and fail the request, or ignore it. #10365
The count(column) function is supposed to only count cells where the column is not NULL. A regression caused count(column) to behave like count(*) for collection, tuple, and user-defined column types. This is now fixed. #14198.
When performing the last-write-wins rule comparison, if the timestamp of the two versions being compared was equal, ScyllaDB first compared the cell value and then the expiration time (TTL). This is compatible with earlier versions of Cassandra. However, this could cause a NULL value to appear if the cell was overwritten with the same timestamp but a different TTL. The algorithm was changed to compare the cell value last, and check all the other metadata first, resulting in fewer surprising results. It is also compatible with current Cassandra versions. #14182
A GROUP BY query ought to return one row per group, except when all rows of a group are filtered out. However, ScyllaDB returned a row even for fully-filtered groups. This is now fixed, and ScyllaDB will not emit rows for filtered groups. #12477
In older versions of ScyllaDB, different clauses of CQL statements were processed using different code bases. ScyllaDB is gradually moving towards a single code base for processing expressions. It is now the SELECT clause’s turn, moving us closer to the goal of a unified expression syntax. As this is an internal refactoring, there are no user visible changes, apart from some names of fields in SELECT JSON statements changing (specifically, if those fields are function evaluations).
A recent regression when using GROUP BY together with the ttl() and writetime() pseduo-functions was fixed. #14715
There is a new SELECT MUTATION_FRAGMENTS statement that allows seeing where the data that composes a selection comes from. Normally, cache, sstable, and memtable data are merged before output, but with this variant one can see the original source of the data. This is intended for forensics and is not a stable API. #11130
The CQL grammar incorrectly accepted nonsensical empty limit clauses such as SELECT * FROM tab LIMIT;. The errors were discovered later in processing, but with unhelpful error messages. They are now rejected. #14705.
The CQL grammar incorrectly accepted nonsensical INSERT JSON statements such as INSERT INTO tab JSON;, causing a crash. This is now fixed. #14709
A mistake in function type inference, which could lead the CQL statements to claim there is ambiguity when in fact there is none, was fixed.
The format of the timestamp data type is now compatible with Cassandra. #14518
In CQL, a few functions for dealing with counter types were added. #14501
A SELECT statement that has the DISTINCT keyword and also GROUP BY on clustering keys is now rejected. DISTINCT implies only selecting the partition key and static rows, so grouping on the clustering keys is nonsensical. #12479
When ALTERing a table, the compaction strategy options are now validated. #2336
A bug in the fromJson() CQL function when operating on NULL operands has been fixed #7912
The DESCRIBE statement now includes user defined types and functions #14170
The column names for SELECT CAST(b AS int) and similar expressions have been adjusted to match Cassandra. #14508
In some cases where a bind variable was used both for the partition key and to match a non-key column, ScyllaDB would not generate correct partition key routing for the driver. This is now fixed. #15374
A map<ascii, something> value, when parsed from its JSON representation, did not parse the key correctly. This is now fixed. #7949
SSTable compression can be configured with a chunk size, with larger chunks trading less efficient I/O and higher latency for higher compression ratios. The chunk size is now capped at 128 kB, to avoid running out of memory. #9933

Amazon DynamoDB Compatible API (Alternator)

Alternator is ScyllaDB’s implementation of the DynamoDB API.

A bug was fixed that could cause error handling while streaming responses to the client to crash the server. #14453
It’s now possible to disable the DescribeEndpoints API. This makes it possible to run the dynamodb shell against ScyllaDB. #14410
Alternator now limits embedded expression length and nesting. #14473
Table name validation has been optimized.
In alternator (ScyllaDB’s implementation of the DynamoDB API), a bug in concurrent modification of table tags has been fixed. #6389
Validation of decimal numbers has been improved. #6794
Timeout configuration value can be hot-updated without restarting the node.
Alternator now returns the full table description as a response to the DeleteTable API request. #11472
Alternator now avoids latency spikes for unrelated requests while building large responses for batch_get_item. #13689
Alternator validation of the table name on ordinary read/write requests is done only if the table lookup fails. This provides a small optimization. #12538
Alternator implemented the error path of the size() function incorrectly. This is now fixed. #14592

Strongly Consistent Schema Management with Raft

Strongly Consistent Schema Management with Raft became the default for new clusters in ScyllaDB 5.2. In this release it is enabled by default when upgrading existing clusters.

If you do not want to enable Raft, you should explicitly disable it in scylla.yaml of each node before the upgrade. #13980

Below are additional related fixes and updates:

When Raft-based schema and topology management is in use, it will also manage the Change Data Capture (CDC) generation table. This increases the reliability of this operation.
Raft remote procedure call (RPC) verbs now check that the call arrived at its intended recipient and not somewhere else.
When a node synchronizes the schema from another node, if Raft is in use, it will issue a read barrier first to make sure it’s not missing any keyspaces.
Schema pulls happen when a node receives a read or write request (as a replica) with an unknown schema; it will then ask the requesting node for an updated schema. These are now disabled when the schema is managed using Raft; instead the system will rely solely on Raft for schema distribution. #12870
When a node is decommissioned or forcibly removed, Raft will now ban it from communicating with the cluster, to avoid theA GROUP BY query ought to return one row per group, except when all rows of a group are filtered out. However, ScyllaDB returned a row even for fully-filtered groups. This is now fixed, and ScyllaDB will not emit rows for filtered groups. removed nodes from affecting the cluster.
ScyllaDB uses Raft to coordinate changes to the schema and topology. It now attempts to merge adjacent changes to reduce overhead.
When using Raft for topology and schema changes, ScyllaDB will force the schema and topology to be transferred to new nodes. #14066
In gossip-managed clusters, the schema is propagated by nodes contacting each other ad-hoc. In Raft-managed clusters, the schema is centrally managed by the group 0 leader. We now disable the ad-hoc schema pull method when Raft cluster management is enabled. #12870
Raft cluster management still uses gossip to translate host IDs to IP addresses. It is now more careful not to let old IP address mappings overwrite new mappings. #14257
Raft-managed clusters run Data Definition Language (DDL) statements in a transaction. The transaction scope has been extended to also include access checking and validation, and not just the actual schema change. #13942
A subtle bug leading to incorrect merging is now fixed. #14600
ScyllaDB uses feature flags to coordinate rolling upgrades; a feature isn’t enabled until all nodes report they support that particular feature flag. Occasionally some older feature flags are considered “always on” and aren’t negotiated. A problem with non-negotiated features and storing feature flags in Raft group 0 would have prevented upgrades, but it was fixed.
The system.group0_history table now has descriptions for events. #13370
Data definition language (DDL) statements are used to modify the schema. They are covered by a Raft transaction to ensure atomicity. The scope of the transaction has been extended to cover access checking to prevent check/use races (this change was already committed in the past but reverted due to performance regressions). #13942
The Raft leadership monitor is now started during normal node start, not only bootstrap. #15166
Raft snapshot update and commit log truncation are now atomic, removing a failure case. #9603

Strongly Consistent Topology Updates with Raft - Experimental

This release includes an experimental Strongly Consistent Topology Updates. To enable it, use the new consistent-topology-changes flag.