Last week in scylladb.git master (issue #164; 2023-01-22)

This short report brings to light some interesting commits to scylladb.git master from the last week. Commits in the abc43f97c9…ebc100f74f range are covered.

There were 121 non-merge commits from 20 authors in that period. Some notable commits:

Automatically parallelized query aggregation used a node-local clock for timeouts, rather than the standard clock. This meant that automatically parallelized queries would fail, unless the nodes were started at the same time. This is now fixed.
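
To see why a node-local clock breaks cross-node timeouts, here is a minimal Python sketch. It assumes a hypothetical model in which each node's local clock counts seconds since that node booted; the class and function names are invented for illustration and are not ScyllaDB code.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node whose local clock counts seconds since its own boot."""
    boot_wall_time: float  # wall-clock second at which the node started

    def local_now(self, wall_now: float) -> float:
        return wall_now - self.boot_wall_time


def expired_with_local_deadline(coordinator: Node, replica: Node,
                                wall_now: float, timeout: float) -> bool:
    # Problematic scheme: the deadline is a point on the coordinator's local
    # clock, but the replica compares it against its own, unrelated clock.
    deadline = coordinator.local_now(wall_now) + timeout
    return replica.local_now(wall_now) >= deadline


def expired_with_shared_deadline(sent_at: float, checked_at: float,
                                 timeout: float) -> bool:
    # Scheme with a shared clock: both nodes agree on what the deadline means.
    return checked_at >= sent_at + timeout


coordinator = Node(boot_wall_time=1000.0)
old_replica = Node(boot_wall_time=0.0)      # booted long before the coordinator
twin_replica = Node(boot_wall_time=1000.0)  # booted at the same time
```

In this model a request with a 10-second timeout looks already expired on `old_replica` the moment it arrives, while `twin_replica` handles it normally, which matches the "works only if the nodes started together" symptom.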

It is now possible to replace a node by specifying its host ID, rather than its IP address. This is useful in container environments, where IP addresses are transient.

“Unset” values are an obscure prepared statement feature that allows only some columns in an UPDATE or INSERT statement to be modified. It was a source of minor bugs and inconvenience in code. The feature has been refactored so it has less impact on the code and is more robust.
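
The difference between an unset value and a NULL can be sketched in a few lines of Python. This is only a toy model of the semantics on an in-memory row; the `UNSET` sentinel and `apply_update` helper are hypothetical names and do not reflect how ScyllaDB implements the feature.

```python
# Sentinel meaning "no value was bound for this column" (hypothetical name).
UNSET = object()


def apply_update(row: dict, bound_values: dict) -> dict:
    """Apply bound values to a row, modelled here as a plain dict."""
    updated = dict(row)
    for column, value in bound_values.items():
        if value is UNSET:
            continue                   # unset: the column is left untouched
        if value is None:
            updated.pop(column, None)  # NULL: the existing cell is deleted
        else:
            updated[column] = value    # ordinary value: the cell is overwritten
    return updated


row = {"name": "alice", "email": "a@example.com", "age": 30}
result = apply_update(row, {"name": "bob", "email": UNSET, "age": None})
```

Here `result` keeps the original `email`, takes the new `name`, and no longer has an `age` cell.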

USING TIMESTAMP allows setting the mutation timestamp on a CQL statement level. It has a sanity check that prevents setting timestamps in the future, as these can be hard to delete, but sometimes one wishes to do so anyway. There is now a configuration option that allows disabling this check.
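
As a rough illustration of such a check with an off switch, here is a minimal Python sketch. The function name, the `restrict_future` flag, and the three-day tolerance are assumptions made for this example; they are not taken from ScyllaDB's configuration or source.

```python
# Assumed tolerance for this sketch: three days, in microseconds.
ASSUMED_TOLERANCE_US = 3 * 24 * 60 * 60 * 1_000_000


def validate_write_timestamp(timestamp_us: int, now_us: int,
                             restrict_future: bool = True,
                             tolerance_us: int = ASSUMED_TOLERANCE_US) -> int:
    """Return the timestamp if it is acceptable, otherwise raise ValueError.

    restrict_future stands in for the configuration option: when it is
    False, the sanity check is skipped entirely.
    """
    if restrict_future and timestamp_us > now_us + tolerance_us:
        raise ValueError("timestamp is too far in the future")
    return timestamp_us


now_us = 1_674_345_600_000_000  # an arbitrary "current time" in microseconds
far_future_us = now_us + ASSUMED_TOLERANCE_US + 1

# With the check disabled, the far-future timestamp is accepted.
accepted = validate_write_timestamp(far_future_us, now_us, restrict_future=False)
```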

During startup, ScyllaDB makes sstables conform to the compaction strategy in a process called reshaping, so that future reads will perform well. It is now more careful when reshaping Leveled Compaction Strategy tables, to avoid doing unnecessary work.

Lightweight transactions are now more robust when the schema is changed during a transaction.

Raft group 0 (responsible for managing topology and schema) now has improved availability during removenode and decommission.

Alternator, ScyllaDB’s implementation of the DynamoDB protocol, has better validation of malformed base64 encoded values.

The development version number was updated to 5.3.0-dev, marking the beginning of the 5.2 release stabilization cycle.

ScyllaDB carefully measures the memory consumed by queries, and tries to ensure it will not exceed available memory. However, a query’s memory consumption can grow after the query has already started. If this happens to all concurrently running queries, the node may run out of memory. To prevent this, two new safeguards have been added. First, when one memory threshold is passed, we pause all queries except one, with the intent of completing this one query and releasing its memory. If this doesn’t help and memory grows even further, we fail all other queries with the intent of letting the one succeed, with the rest retried later.
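
The two-threshold policy described above can be sketched as a small decision function. This is a simplified Python model under stated assumptions: the names `Action`, `decide`, `pause_threshold`, and `fail_threshold` are hypothetical, and the real mechanism in ScyllaDB is certainly more involved.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"
    PAUSE = "pause"
    FAIL = "fail"


def decide(query_id: int, privileged_id: int, memory_used: int,
           pause_threshold: int, fail_threshold: int) -> Action:
    """Decide what happens to a query that asks for more memory.

    privileged_id identifies the single query that is always allowed to
    proceed, so that it can finish and release its memory.
    """
    if query_id == privileged_id:
        return Action.RUN
    if memory_used >= fail_threshold:
        return Action.FAIL   # second safeguard: fail everyone else
    if memory_used >= pause_threshold:
        return Action.PAUSE  # first safeguard: pause everyone else
    return Action.RUN


# Three queries under rising memory pressure; query 1 is the privileged one.
decisions = {
    used: [decide(q, 1, used, pause_threshold=100, fail_threshold=200)
           for q in (1, 2, 3)]
    for used in (50, 150, 250)
}
```

Below the first threshold every query runs; between the thresholds only the privileged query runs while the others wait; above the second threshold the others are failed so they can be retried later.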

Alternator table name validation has been optimized.

The ScyllaDB source base contains several performance microbenchmarks. These are now integrated into the main Scylla binary as subcommands, so they can be run on any machine where ScyllaDB is installed, e.g. scylla perf_simple_query.

CQL table columns that have the list data type aren’t allowed to contain NULLs, but in certain situations list values in CQL literals or bind variables are allowed to contain NULLs (for example, in LWT IF conditions that use the IN operator). The type system was relaxed to accept NULLs where this is allowed. Previously, these cases were handled by hard-to-maintain workarounds.

See you in the next issue of last week in scylladb.git master!