Last week in scylladb.git master (issue #181; 2023-05-28)

This short report brings to light some interesting commits to scylladb.git master from the last week. Commits in the 3b424e391b…af65d5a1e8 range are covered.

There were 167 non-merge commits from 18 authors in that period. Some notable commits:

The master branch version has been bumped to 5.4.0-dev, as the 5.3 stabilization and release cycle has started.

When the schema changes, the rows in the row cache have to be upgraded to the new schema. This happens on-demand as rows are hit in the cache. Until now, this happened with partition granularity - all of a partition’s rows that happened to be in cache were upgraded at the same time, causing reactor stalls and high latency when large partitions were cached. This has now been fixed, and the cache is upgraded using row granularity.
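The difference between the two granularities can be illustrated with a toy model. This is only a sketch in Python, not ScyllaDB's C++ implementation; the Schema, CachedRow, upgrade_row and read_row names are hypothetical and exist only for this illustration. It assumes an upgrade means keeping known columns and filling new ones with an empty value.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    version: int
    columns: tuple


@dataclass
class CachedRow:
    values: dict
    schema_version: int


def upgrade_row(row, schema):
    """Rewrite one row to match `schema`: keep known columns, fill new ones with None."""
    row.values = {col: row.values.get(col) for col in schema.columns}
    row.schema_version = schema.version


def read_row(partition, key, schema):
    """Return a row's values, upgrading only that row if it is stale."""
    row = partition[key]
    if row.schema_version != schema.version:
        upgrade_row(row, schema)
    return row.values


old = Schema(version=1, columns=("a",))
new = Schema(version=2, columns=("a", "b"))

# A "large partition": 1000 cached rows, all written under the old schema.
partition = {k: CachedRow({"a": k}, old.version) for k in range(1000)}

# Reading one row after the schema change upgrades that row only.
read_row(partition, 7, new)
stale = sum(1 for r in partition.values() if r.schema_version != new.version)
print(stale)  # 999 rows are still waiting for their own first read
```

With partition granularity, the single read above would have had to rewrite all 1000 rows in one go; with row granularity, the cost of each read is bounded by the rows it actually touches.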

Tablet metadata (stored in the system.tablets table) is now loaded after its commitlog has been replayed.

The nodetool checkAndRepairCdcStreams command is used to align CDC streams with the cluster topology. It now works when topology is under Raft control.

Logging of node failures during repair has been improved, in order to help diagnose repair failures.

We now drop per-table metrics early during teardown of a table. Previously, if a table was dropped and re-created quickly, the metrics from the old and new tables could clash, resulting in an error.

Commitlog has gained its own scheduling group, to complement the already existing commitlog I/O priority class. This is in preparation for unification of CPU scheduling and I/O scheduling.

The S3 client can now upload files larger than 50GB. The limit was due to multipart uploads having at most 10,000 components, and ScyllaDB using the minimum component size of 5MB in order to reduce memory footprint.
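The arithmetic behind the old limit can be checked with a short Python sketch. It assumes "5MB" means 5 MiB (S3's documented minimum part size), so the product is roughly 52.4 GB, or about 48.8 GiB, which rounds to the 50GB quoted above. The min_part_size_for helper is hypothetical: it shows one generic way to fit a larger file within the part limit, and is not necessarily how ScyllaDB lifted the restriction.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

MAX_PARTS = 10_000        # maximum number of parts in one multipart upload
MIN_PART_SIZE = 5 * MIB   # assumed meaning of "5MB" in the text


def max_upload_size(part_size, max_parts=MAX_PARTS):
    """Largest object that fits when every part has `part_size` bytes."""
    return part_size * max_parts


def min_part_size_for(file_size, max_parts=MAX_PARTS, floor=MIN_PART_SIZE):
    """Smallest part size (hypothetical helper) that fits `file_size` in `max_parts` parts."""
    needed = -(-file_size // max_parts)  # ceiling division
    return max(floor, needed)


limit = max_upload_size(MIN_PART_SIZE)
print(limit)        # 52428800000 bytes
print(limit / GIB)  # about 48.8 GiB

# A 100 GiB file needs parts of a little over 10 MiB each.
part = min_part_size_for(100 * GIB)
print(part)         # 10737419 bytes
```

The trade-off is the one the text mentions: larger parts raise the size ceiling, but each in-flight part has to be buffered, so memory footprint grows with part size.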

There is now a dedicated performance benchmark for the S3 client.

Schema pulls happen when a node receives a read or write request (as a replica) with an unknown schema; it will then ask the requesting node for an updated schema. These are now disabled when the schema is managed using Raft; instead the system will rely solely on Raft for schema distribution.

The column name reported when writetime() is given a primary key column (which is illegal) is now human readable, even for humans who don’t remember the ASCII table.

The NetworkTopologyStrategy replication strategy will now reject an empty value for the replication factor.

See you in the next issue of last week in scylladb.git master!