[RELEASE] ScyllaDB Monitoring Stack 4.9.0

The ScyllaDB team is pleased to announce the release of ScyllaDB Monitoring Stack 4.9.0.

ScyllaDB Monitoring Stack is an open-source stack for monitoring ScyllaDB Enterprise and ScyllaDB Open Source, based on Prometheus and Grafana. ScyllaDB Monitoring Stack 4.9.0 supports:

  • ScyllaDB Open Source versions 6.0, 6.1 and 6.2
  • ScyllaDB Enterprise versions 2022.2, 2023.x, 2024.x, and the upcoming Enterprise 2025.1 Source-Available release
  • ScyllaDB Manager 3.4.x


Version updates for ScyllaDB Monitoring Stack 4.9.0

  • Prometheus upgraded to version 3.1.0
  • Grafana upgraded to version 11.4.0
  • Loki upgraded to version 3.3.2

New Information in ScyllaDB Dashboards

Keyspace Dashboard Change

  • Add a summary table for all tables #2373

The new table summarizes all tables in the keyspace, showing disk space usage and read and write information for each table.

Alternator Dashboard Change

  • Improve the information panel #2365

Removed the live, compaction, and cache-miss information from the table.

The P50 value is now the average across the get, put, update, and delete operations.

There is now a P99 for each of the get, put, update, and delete operations.
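The panel layout described above can be modeled with a small sketch. It assumes the displayed P50 is a plain mean of the four per-operation P50 values and that each P99 is passed through unchanged; the data shape, the `summarize` helper, and the latency numbers are all hypothetical and only illustrate the arithmetic, not the dashboard's actual query.

```python
from statistics import mean

# The four Alternator operations the panel reports on.
OPS = ("get", "put", "update", "delete")


def summarize(latencies: dict[str, dict[str, float]]) -> dict[str, float]:
    """Build hypothetical panel values: one averaged P50 plus a P99 per operation."""
    # Single P50: the mean of the per-operation P50 values.
    summary = {"p50": mean(latencies[op]["p50"] for op in OPS)}
    # One P99 entry per operation, copied as-is.
    for op in OPS:
        summary[f"{op}_p99"] = latencies[op]["p99"]
    return summary


# Illustrative latencies in milliseconds; not real ScyllaDB measurements.
sample = {
    "get": {"p50": 1.0, "p99": 4.0},
    "put": {"p50": 2.0, "p99": 6.0},
    "update": {"p50": 3.0, "p99": 9.0},
    "delete": {"p50": 2.0, "p99": 5.0},
}
print(summarize(sample))
```

An average of per-operation medians is a compact summary value; it is not the median of all requests combined, which is one reason the per-operation P99 values remain useful.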

CQL Dashboard Change

  • Add a requests-by-consistency-level panel #2361

The new panel shows the requests for all consistency levels together, which makes them easier to compare.

General Dashboard Changes

  • Show only top-k or bottom-k, not both #2442

The dashboards limit the number of series for better performance, but the original filter was hard to understand.

The new implementation lets you choose the kind of filter (top-k, bottom-k, or limit-k) and how many results to show.

  • Add limit-k to the dashboards #2454

Limit-k uses the experimental limitk function in Prometheus, which returns a given number of results without selecting them by a specific rule (such as top or bottom). This makes it easier to understand the overall distribution of the values.
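The difference between the three filters can be sketched in plain Python over a map of series names to current values. This only emulates the selection semantics; it is not Prometheus code. In particular, the stable-hash selection in `limitk` is an assumption made so the example returns the same series on every run, and Prometheus may choose series differently.

```python
import hashlib
import heapq

Series = dict[str, float]  # series name (label set) -> current value


def topk(k: int, series: Series) -> Series:
    """Keep the k series with the highest values."""
    return dict(heapq.nlargest(k, series.items(), key=lambda kv: kv[1]))


def bottomk(k: int, series: Series) -> Series:
    """Keep the k series with the lowest values."""
    return dict(heapq.nsmallest(k, series.items(), key=lambda kv: kv[1]))


def limitk(k: int, series: Series) -> Series:
    """Keep k series chosen without looking at their values.

    ASSUMPTION: series are ordered by a stable hash of their name, so the
    same k series come back on every evaluation. This illustrates the idea
    only; it is not Prometheus's actual selection algorithm.
    """
    def stable_hash(name: str) -> str:
        return hashlib.sha256(name.encode()).hexdigest()

    chosen = sorted(series, key=stable_hash)[:k]
    return {name: series[name] for name in chosen}


# Hypothetical per-shard values.
shards = {f"shard{i}": v for i, v in enumerate([90.0, 5.0, 40.0, 70.0, 10.0, 55.0])}
print(topk(2, shards))     # {'shard0': 90.0, 'shard3': 70.0}
print(bottomk(2, shards))  # {'shard1': 5.0, 'shard4': 10.0}
print(limitk(3, shards))   # three series, independent of their values
```

Because top-k and bottom-k only ever show the extremes, a limit-k sample that ignores the values gives a better sense of where typical series lie.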

Bug Fixes

  • Alternator total ops is empty #2440
  • The write timeouts panel is not filtered by the scheduling group #2432
  • RPC metrics are intermixed #2428
  • CQL Fail should be shown only if a node is in operation mode #2427
  • Tables with no writes are not shown in the keyspace dashboard #2425

Operational Changes

  • ScyllaDB plugin executables are making the repo 153MB #2246

ScyllaDB Monitoring ships with the ScyllaDB plugin. Previously, it shipped with all plugin implementations, which made the repository too large. Now, only the needed implementation is shipped.
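Picking only the needed implementation can be sketched as choosing the one executable that matches the host. This assumes the implementations are per-platform executables named with Go-style OS and architecture suffixes (`<base>_<os>_<arch>`), which is a common convention for Grafana backend plugins; the `gpx_example` base name, the file list, and both helper functions are hypothetical, and the repository's actual names and selection logic may differ.

```python
import platform
from typing import Iterable, Optional

# ASSUMPTION: Go-style architecture names in the executable file names.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def needed_executable(base: str, system: Optional[str] = None,
                      machine: Optional[str] = None) -> str:
    """Return the hypothetical executable name for this host, e.g. base_linux_amd64."""
    os_name = (system or platform.system()).lower()
    raw_arch = (machine or platform.machine()).lower()
    arch = ARCH_ALIASES.get(raw_arch)
    if arch is None:
        raise ValueError(f"unsupported architecture: {raw_arch}")
    return f"{base}_{os_name}_{arch}"


def select_executables(available: Iterable[str], base: str,
                       system: Optional[str] = None,
                       machine: Optional[str] = None) -> list[str]:
    """Keep only the executable that matches the host; drop all the others."""
    wanted = needed_executable(base, system, machine)
    return [name for name in available if name == wanted]


# Hypothetical file list covering several platforms.
files = [
    "gpx_example_linux_amd64",
    "gpx_example_linux_arm64",
    "gpx_example_darwin_arm64",
    "gpx_example_windows_amd64.exe",
]
print(select_executables(files, "gpx_example", "Linux", "x86_64"))
```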

  • Make it easy to run local Thanos query #2464

For multi-cluster monitoring, it is sometimes better to use a centralized Thanos Query that reads from a local per-cluster Prometheus with a Thanos sidecar. To split a query's calculation between the clusters and the centralized query, so that each cluster performs its part of the calculation and the centralized query combines the results, a Thanos Query must also run locally in each cluster.
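Splitting a calculation this way works when each cluster returns a small partial aggregate that the central query can combine. The sketch below models the idea for a global average in plain Python; `Partial`, `local_query`, and `central_query` are hypothetical names, and this is not how Thanos implements distributed query execution internally.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    """Partial aggregate a per-cluster query could return (hypothetical shape)."""
    total: float
    count: int


def local_query(samples: list[float]) -> Partial:
    """Runs inside one cluster, next to its Prometheus: reduce raw samples."""
    return Partial(total=sum(samples), count=len(samples))


def central_query(partials: list[Partial]) -> float:
    """Runs centrally: combine the per-cluster partials into a global average."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    if count == 0:
        raise ValueError("no samples in any cluster")
    return total / count


cluster_a = local_query([10.0, 20.0, 30.0])
cluster_b = local_query([40.0])
print(central_query([cluster_a, cluster_b]))  # 25.0
```

Shipping sums and counts, rather than per-cluster averages, keeps the combined result correct: averaging the two cluster averages (20.0 and 40.0) would give 30.0 instead of 25.0.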

  • Support placing dashboards under a closed support folder #2462

As part of the work toward making the user-facing dashboards clearer while keeping the support dashboards verbose, a new option was added that splits the dashboards into user-facing and support folders.

  • Add Quick startup option #2436

To make the startup script run faster, two changes were made. First, the wait interval used when checking whether a container has started was shortened, which reduced the start time of a clean installation by 80%.
Second, a new option starts the monitoring stack without waiting for application validation. While usually unnecessary, it reduces the startup time of even a non-clean installation to a few seconds.
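The two changes can be illustrated with a hypothetical Python model of the readiness check. It is not the actual startup script: `wait_until_ready`, `validate_startup`, and the `skip_validation` flag are illustrative names, and the real option's name and defaults are not shown here.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], interval: float = 0.5,
                     timeout: float = 60.0) -> bool:
    """Poll check() every `interval` seconds until it succeeds or time runs out.

    A shorter interval notices a container that is already up sooner,
    because less time is spent sleeping after it becomes ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def validate_startup(containers: dict[str, Callable[[], bool]],
                     skip_validation: bool = False,
                     interval: float = 0.5, timeout: float = 60.0) -> list[str]:
    """Return the names of containers that did not become ready in time."""
    if skip_validation:
        return []  # quick startup: do not wait for application validation
    return [name for name, check in containers.items()
            if not wait_until_ready(check, interval=interval, timeout=timeout)]
```

For example, `validate_startup({"grafana": grafana_is_up}, interval=0.1)` would wait only as long as needed for a hypothetical `grafana_is_up` check to pass, while `skip_validation=True` returns immediately without calling any check.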