Last 2 weeks in scylla-cluster-tests.git master (issue #26; 2023-10-06)

This short report highlights some interesting commits to scylla-cluster-tests.git master from the last two weeks. Commits in the 558925b4…ecaae207 range are covered.

There were 35 non-merge commits from 12 authors in that period. Some notable commits:

The performance latency test now also uses placement groups to stabilise the test results.

A new performance test for GrowShrinkClusterNemesis in a multi-AZ environment was added.

Logging was improved in the upgrade test by adding InfoEvent messages for all (or most) of the commands we run in the upgrade procedure.

A new performance test for latency during upgrade was added. It uses a 650GB dataset, and in the report each node upgrade is represented as an operation cycle.

Individual nemesis tests were enhanced by adding MV and large-partition loads. We removed the compaction strategy from the cassandra-stress commands, since we’d like Scylla to use its default compaction strategy, and increased the row size and partition count to produce a more significant amount of data (a few GB).

When a reboot is triggered through the Azure SDK's begin_restart call, the VM is not rebooted immediately but is scheduled for a reboot in the future. This caused timeouts and broke nemesis logic, so we switched from the Azure API restart to running reboot -ff, which reboots the VM immediately.
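As a rough illustration of the idea (not the actual SCT implementation), the sketch below forces a reboot over a remote shell and then waits until the kernel reports a new boot ID, which shows the reboot really happened. The `run` callable, the `hard_reboot` name, the use of `sudo`, and the timeout values are all assumptions made for this example.

```python
import time
from typing import Callable


def hard_reboot(
    run: Callable[[str], str],
    timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Force an immediate reboot and wait for the node to come back.

    `run` is a hypothetical helper that executes a shell command on the
    node (for example over SSH) and returns its stdout.
    """
    boot_id_cmd = "cat /proc/sys/kernel/random/boot_id"
    old_boot_id = run(boot_id_cmd).strip()

    try:
        # `sudo` is an assumption; the report only mentions `reboot -ff`.
        run("sudo reboot -ff")
    except Exception:
        # The connection is expected to drop while the node goes down.
        pass

    deadline = clock() + timeout
    while clock() < deadline:
        try:
            new_boot_id = run(boot_id_cmd).strip()
            if new_boot_id and new_boot_id != old_boot_id:
                return new_boot_id  # a new boot ID means the node rebooted
        except Exception:
            # Node is still down or SSH is not ready yet; keep polling.
            pass
        sleep(poll_interval)

    raise TimeoutError(f"node did not come back within {timeout} seconds")
```

Comparing boot IDs rather than only checking reachability avoids mistaking a node that has not gone down yet for one that has already come back.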

Azure’s ExtensionOperations were causing problems by enabling the auditd service, which flooded the logs and broke long longevity tests. To tackle this issue we decided to disable the Azure agents.

We fixed an issue where the disk_size option was not respected for Azure VMs.

We started to validate whether sstables are truly encrypted.
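One simple way to approach such a check, shown here only as a sketch and not as the method SCT uses, is to write a known marker value into a table and then scan the sstable data files for that marker in plaintext. The function names, the marker, and the `*-Data.db` file pattern are assumptions for this example. Note that compression can also hide plaintext, so a clean scan is only meaningful if the marker would otherwise be visible on disk.

```python
from pathlib import Path


def file_contains_plaintext(path: Path, marker: bytes, chunk_size: int = 1 << 20) -> bool:
    """Return True if `marker` appears anywhere in the file at `path`.

    The file is read in chunks; the tail of each window is carried over
    so that a marker spanning a chunk boundary is still detected.
    """
    overlap = len(marker) - 1
    tail = b""
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-overlap:] if overlap > 0 else b""
    return False


def find_unencrypted_sstables(data_dir: Path, marker: bytes) -> list[Path]:
    """List sstable data files under `data_dir` that expose `marker` in plaintext."""
    return sorted(
        path
        for path in data_dir.rglob("*-Data.db")  # assumed data-file naming
        if file_contains_plaintext(path, marker)
    )
```

An empty result from `find_unencrypted_sstables` is the expected outcome on a node with encryption at rest enabled; any returned path points at a file worth inspecting.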

Because of false failures during some nemeses in ScyllaDB cloud tests, we reworked the SSH-based remote loggers and improved their code quality.

DB logs now have millisecond resolution.

See you in the next issue of last week in scylla-cluster-tests.git master!