Last week in scylladb.git master (issue #190; 2023-07-30)

This short report brings to light some interesting commits to scylladb.git master from the last week. Commits in the decbc841b7…1c3d22b717 range are covered.

There were 88 non-merge commits from 13 authors in that period. Some notable commits:

Tablets are a new, experimental way to distribute data on the cluster. Tablets now have an automatic load balancer that detects nodes and shards that have a deficit of tablets, and migrates tablets to those nodes and shards in order to restore balance.
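To make the idea of restoring balance concrete, here is a minimal sketch of a greedy rebalancer. It is not ScyllaDB's algorithm; the function name, the node names, and the "stop when the spread is at most one tablet" rule are all assumptions made for illustration.

```python
def plan_migrations(tablets_per_node):
    """Return a list of (source, destination) moves, one tablet per move.

    Hypothetical greedy policy: repeatedly move a tablet from the most
    loaded node to the least loaded node until their counts differ by
    at most one.
    """
    load = dict(tablets_per_node)  # work on a copy
    moves = []
    if not load:
        return moves
    while True:
        src = max(load, key=load.get)  # node with a surplus
        dst = min(load, key=load.get)  # node with a deficit
        if load[src] - load[dst] <= 1:
            return moves
        load[src] -= 1
        load[dst] += 1
        moves.append((src, dst))


# A new, empty node "b" joins next to a node holding four tablets.
print(plan_migrations({"a": 4, "b": 0}))  # prints [('a', 'b'), ('a', 'b')]
```

The same loop could be run per shard within a node; the sketch only balances tablet counts and ignores tablet size and migration cost.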

ScyllaDB uses a separate commitlog, called the schema commitlog, for schema changes and topology operations in order to reduce the latency of these operations. The segment size of the schema commitlog has been raised from 32MB to 128MB in order to avoid problems with large numbers of tables, as the entire schema must fit in a single segment.

ScyllaDB uses objects called reader_concurrency_semaphores to limit query concurrency and to isolate different service levels. We now check if the service level changed during a query and avoid erroring out in this case.

Fencing is the mechanism by which requests that were sent using an outdated view of the cluster topology are rejected, in order to avoid reading outdated data or resurrecting old data. It now applies to hints, a mechanism used to heal the cluster after a short node downtime.
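The fencing check can be pictured roughly as follows. This is only a sketch under assumptions: the class and method names are hypothetical, and comparing a single integer topology version against a fence version is a simplification chosen for the example, not a description of ScyllaDB's internals.

```python
class StaleTopologyError(Exception):
    """Raised when a request carries an outdated topology version."""


class Replica:
    def __init__(self, fence_version):
        # Requests older than this version are rejected (assumed model).
        self.fence_version = fence_version
        self.hints_applied = []

    def apply_hint(self, hint, sender_topology_version):
        # Reject hints that were sent using an outdated view of the cluster.
        if sender_topology_version < self.fence_version:
            raise StaleTopologyError(
                f"hint sent with version {sender_topology_version}, "
                f"fence is at {self.fence_version}"
            )
        self.hints_applied.append(hint)


replica = Replica(fence_version=5)
replica.apply_hint("write k=1", sender_topology_version=5)  # accepted
try:
    replica.apply_hint("write k=2", sender_topology_version=4)
except StaleTopologyError as err:
    print("rejected:", err)
```

A rejected sender would be expected to refresh its view of the topology and retry, which is outside the scope of this sketch.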

Recently the mechanism to update materialized views after repair was optimized. A latent use-after-free bug was discovered in the optimization, and fixed.

A deadlock during shutdown in internode communication was fixed.

When updating a materialized view after repair, we chunk the base table data and process each chunk individually. Chunking is based on memory consumption. However, empty partitions were not accounted for, so long runs of empty partitions could create large chunks and run the node out of memory. This is now fixed by accounting for empty partitions.
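The effect of accounting for empty partitions can be shown with a small sketch. The function name, the byte figures, and the fixed per-partition overhead are assumptions for illustration; the actual accounting in ScyllaDB may differ.

```python
def chunk_partitions(partition_sizes, max_chunk_bytes, per_partition_overhead):
    """Group partitions into chunks whose accounted cost fits the budget.

    Each partition is charged its data size plus a fixed overhead, so an
    empty partition (size 0) still consumes part of the budget.
    """
    chunks, current, used = [], [], 0
    for size in partition_sizes:
        cost = size + per_partition_overhead
        # Close the current chunk if this partition would exceed the budget.
        if current and used + cost > max_chunk_bytes:
            chunks.append(current)
            current, used = [], 0
        current.append(size)
        used += cost
    if current:
        chunks.append(current)
    return chunks


empty_run = [0] * 10
# With no overhead charged, the whole run lands in one unbounded chunk.
print(len(chunk_partitions(empty_run, 256, per_partition_overhead=0)))   # prints 1
# Charging 64 bytes per partition caps each chunk at four partitions.
print(len(chunk_partitions(empty_run, 256, per_partition_overhead=64)))  # prints 3
```

With zero overhead, a run of empty partitions never triggers a chunk boundary, which mirrors the out-of-memory scenario described above.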

ScyllaDB caches pages from the sstable primary index in order to reduce I/O. In certain cases it reads index pages before they are actually needed, in order to reduce latency. In rare cases this caused an internal invariant to be violated, crashing the node. This is now fixed.

The format of the timestamp data type is now compatible with Cassandra.

ScyllaDB computes the version of the schema by hashing the mutations that describe the schema in the schema tables. This can lead to an inconsistency between nodes if tombstones are expired at different times. This is now fixed by ignoring empty partitions, making the tombstone expiration time irrelevant.
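The following sketch illustrates why skipping empty partitions makes the hash independent of tombstone expiration. It models a partition whose data has all been deleted as a key with no live rows; that model, the function name, and the use of SHA-256 are assumptions for the example and do not reflect ScyllaDB's actual digest code.

```python
import hashlib


def schema_version(partitions):
    """Hash schema partitions, skipping those with no live rows.

    `partitions` maps a partition key to a list of live row strings.
    A key with an empty list stands for a partition that only holds
    a not-yet-expired tombstone (hypothetical model).
    """
    digest = hashlib.sha256()
    for key in sorted(partitions):
        rows = partitions[key]
        if not rows:
            continue  # empty partition: contributes nothing to the version
        digest.update(key.encode() + b"\x00")
        for row in sorted(rows):
            digest.update(row.encode() + b"\x00")
    return digest.hexdigest()


# Node A still holds the tombstone for a dropped table; node B has expired it.
node_a = {"ks.t1": ["col a int"], "ks.dropped": []}
node_b = {"ks.t1": ["col a int"]}
print(schema_version(node_a) == schema_version(node_b))  # prints True
```

If the empty partition's key were fed into the hash, the two nodes would disagree until the tombstone expired everywhere.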

Streaming and repair will now compact data before streaming it, reducing bandwidth usage if the sstables being streamed happen to contain data and tombstones that cover that data.
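As a rough illustration of the bandwidth saving, the sketch below compacts a list of cells before "sending" them, keeping only the newest entry per key. The tuple layout, the use of `None` as a tombstone marker, and the function name are assumptions; real compaction handles many more cases (range tombstones, TTLs, and so on).

```python
def compact_for_streaming(cells):
    """Keep only the newest (timestamp, value) per key.

    `cells` is an iterable of (key, timestamp, value) tuples, where a
    value of None marks a tombstone. Data covered by a newer tombstone
    is dropped; the tombstone itself is kept so the receiver learns of
    the deletion.
    """
    newest = {}
    for key, timestamp, value in cells:
        if key not in newest or timestamp > newest[key][0]:
            newest[key] = (timestamp, value)
    return [(key, ts, value) for key, (ts, value) in sorted(newest.items())]


cells = [("a", 1, "x"), ("a", 2, None), ("b", 1, "y")]
print(compact_for_streaming(cells))  # prints [('a', 2, None), ('b', 1, 'y')]
```

Here three cells shrink to two: the data for key "a" is never sent because the newer tombstone covers it.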

A bug in the Seastar coroutine code, which could lead to unexpected crashes, has been fixed.

The build toolchain has been updated to Fedora 38 with clang 16.0.6.

See you in the next issue of last week in scylladb.git master!