[RELEASE] Scylla Manager 3.1 Release

The ScyllaDB Manager team is pleased to announce the release of Scylla Manager 3.1, a production-ready version of ScyllaDB Manager for ScyllaDB Enterprise customers and ScyllaDB Open Source users. ScyllaDB Manager is a centralized cluster administration and recurring task automation tool.

ScyllaDB Manager 3.1 brings a new, improved backup and restore procedure, as well as bug fixes.

ScyllaDB Enterprise customers are encouraged to upgrade to ScyllaDB Manager 3.1 in coordination with the ScyllaDB support team.

The new release includes upgrades of both ScyllaDB Manager Server and Agent.

Scylla Manager 3.1 supports the following Scylla Enterprise releases:

  • 2022.1
  • 2022.2
  • 2023.1

And the following Open Source releases (limited to 5 nodes, see license):

  • 5.0
  • 5.1
  • 5.2

You can install and run Scylla Manager on Kubernetes using Scylla Operator. See the Scylla Operator documentation for details.

Automated cluster restore procedure:

We introduce a new way of performing a cluster restore.

It replaces the old approach, which required running and monitoring an Ansible script, with a restore task that defines the backup location and the snapshot tag the end user wants to restore.

The new restore procedure does not limit the restore to the same cluster topology. Any backup can be restored onto a cluster with a different topology, assuming it has enough storage, by taking advantage of the ScyllaDB Load and Stream feature.

It is possible to restore either the schema or the data of the cluster.

To restore the schema, the backup must contain the system_schema keyspace data.

To restore the data, the schema must already be defined on the destination cluster.
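
As a minimal sketch of the new flow (the cluster name, backup location, and snapshot tag below are placeholders), a restore is scheduled as a regular Manager task via sctool:

# Restore the schema first (the backup must contain system_schema data).
sctool restore --cluster prod-cluster --location s3:backup-bucket --snapshot-tag sm_20230317121011UTC --restore-schema

# Then restore the table data (the schema must already exist on the destination cluster).
sctool restore --cluster prod-cluster --location s3:backup-bucket --snapshot-tag sm_20230317121011UTC --restore-tables

As with other Manager tasks, the progress of a running restore can be tracked with sctool progress.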

The automated restore procedure also works with older backups taken with Scylla Manager versions earlier than 3.1.

More information about the new restore procedure is available in the restore section of the Scylla Manager 3.1 documentation.

Health Check service:

Configuration of the health check service has been simplified.

# Health-check service configuration.
#healthcheck:
#  # max_timeout specifies the ping timeout for all ping types (CQL, REST, Alternator).
#  max_timeout: 1s

It now requires just one parameter, max_timeout, which defines the threshold for all ping types after which a TIMEOUT error is reported.
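
The resulting per-node ping results can be inspected with sctool (the cluster name below is a placeholder):

sctool status --cluster prod-cluster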

Note that health check results can be observed via metrics exposed by Scylla Manager, named scylla_manager_healthcheck_{cql | alternator | rest}_rtt_ms. These metrics show the response time of the CQL, Alternator, and REST pings.
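
These metrics can also be scraped directly from the Manager server's Prometheus endpoint; a quick sketch, assuming the default metrics port 5090 and a placeholder hostname:

curl -s http://scylla-manager-host:5090/metrics | grep scylla_manager_healthcheck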

Monitoring:

See Scylla Monitoring release 4.3.4 or later for the Manager 3.1 dashboard, including a new restore status panel.

The following metrics were added in Manager 3.1:

  • scylla_manager_restore_remaining_bytes, labeled with cluster, snapshot_tag, location, dc, node, keyspace and table. It shows how much of the backup data is yet to be restored.
  • scylla_manager_restore_batch_size, labeled with cluster and host. It shows how batches of the backup data are distributed among the nodes that participate in the restore.
  • scylla_manager_restore_state, labeled with cluster, location, snapshot_tag and host. It shows the state of a node that participates in the restore. It can be one of the following: idle, downloading, load&stream, or error.
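
Outside of the dashboard, restore progress can also be watched straight off the Manager server's metrics endpoint; a sketch with a placeholder hostname, assuming the default port 5090:

curl -s http://scylla-manager-host:5090/metrics | grep scylla_manager_restore_remaining_bytes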

Bug Fixes:

Manager 3.1 fixes a problem in the backup process:

If a node is replaced and keeps the same host ID, it starts enumerating its SSTables from zero. Manager deduplication could then assume that an SSTable had already been backed up and skip the latest, new SSTables of the new node. Manager 3.1 fixes this issue going forward by adding versioning to backed-up SSTables.

Additional bug fixes:

  • Fixed a compatibility issue with ScyllaDB introduced in ScyllaDB Open Source 5.0.
  • Repairs now run on full ranges when tables are below the small table threshold or are fully replicated.
  • Fixed a possible repair task failure when Manager metrics were not initialized.