[RELEASE] Scylla Monitoring Stack 4.3.0

The ScyllaDB team is pleased to announce the release of ScyllaDB Monitoring Stack 4.3.0.

ScyllaDB Monitoring Stack is an open-source stack for monitoring ScyllaDB Enterprise and ScyllaDB Open Source, based on Prometheus and Grafana. ScyllaDB Monitoring Stack 4.3.0 supports:

  • ScyllaDB Open Source versions 5.0, 5.1 and 5.2
  • ScyllaDB Enterprise versions 2020.x, 2021.x and 2022.x
  • ScyllaDB Manager 2.3.x, 2.4.x, 2.5.x, 2.6.x, 3.0.x

This release focuses on adapting to the metrics changes coming in ScyllaDB Open Source 5.2. We recommend upgrading the monitoring stack before upgrading ScyllaDB.

As part of the work towards a better Datadog integration, the label used for scraping was modified, and by default the per-shard metrics are no longer scraped.

The level label is deprecated and will be removed in a future version.
Follow the Datadog integration guide and download the updated config and dashboard.

Version updates for ScyllaDB Monitoring Stack 4.3.0

  • Set Prometheus version to 2.42.0
  • Set Grafana version to 9.3.8

New Information in ScyllaDB Dashboards

  • Allow selecting specific CPUs on OS dashboard #1878

  • Changes in the advanced dashboard:
    • Add class and group filters
    • Set the tooltip in descending order
    • Remove the legend-format so instance and shard values will be shown

    • Add a CPU Starvation graph to the per-scheduling group section

Bug Fixes

  • Scylla CQL: the instance "All" filter should list all nodes #1895
  • prometheus.consul.yml: take the cluster name from Consul #1891
  • scylla-detailed: name the active and queued reads correctly #1876
  • Metric values calculated by Prometheus recording rules differ from values calculated on the fly #1815

Operational Changes

  • Ready for ScyllaDB Open Source 5.2 #1881
  • Rename the datadog labels #1902
  • Remove the Alertmanager plugin #1898
  • Use p50 quantile instead of average for latencies #1889
  • Allow admin password and an anonymous viewer #1913
  • More robust alert for a node that reports being up while it is going down #1877
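
To illustrate why the change to a p50 quantile for latencies (#1889) can matter, here is a minimal Python sketch. It is not taken from the monitoring stack itself: the sample values and variable names are made up for illustration, and the stack computes its quantiles in Prometheus rather than in Python. The sketch only shows how a single slow request can pull an average far away from what a typical request experiences, while the median (p50) stays close to it.

```python
import statistics

# Hypothetical read latencies in milliseconds (illustrative values only).
# Nine requests are fast; one slow outlier skews the average.
latencies_ms = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1, 1.8, 2.0, 2.3, 250.0]

# The average is dominated by the single slow request.
mean_ms = statistics.mean(latencies_ms)

# The p50 (median) reflects what a typical request experienced.
p50_ms = statistics.median(latencies_ms)

print(f"average: {mean_ms:.2f} ms")
print(f"p50:     {p50_ms:.2f} ms")
```

With these made-up samples the average ends up more than ten times larger than the p50, even though nine of the ten requests completed in about 2 ms.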