The ScyllaDB team is pleased to announce the release of ScyllaDB Monitoring Stack 4.10.0.
ScyllaDB Monitoring Stack is an open-source stack for monitoring ScyllaDB based on Prometheus and Grafana. ScyllaDB Monitoring Stack 4.10.0 supports:
- ScyllaDB 2024.1, 2024.2, 2025.1, and the upcoming 2025.2 release
- ScyllaDB Manager 3.x
This release includes multiple updates to the overview, detailed, alternator, and advanced dashboards, including the CPU utilization and disk utilization panels.
Related Links
- Download ScyllaDB Monitoring Stack 4.10.0
- ScyllaDB Monitoring Stack Docs
- Upgrade from ScyllaDB Monitoring 3.x to 4.y
- Upgrade from ScyllaDB Monitoring 4.x to 4.y
Version updates for ScyllaDB Monitoring Stack 4.10.0
- Prometheus upgraded to version 3.3.1
- Grafana upgraded to version 11.6.1
New Information in ScyllaDB Dashboards
Overview Dashboard Change
- CPU utilization to show priority classes that are involved in query processing #2478
ScyllaDB’s advanced use of Service Level for query processing and background operations can cause confusion when observing CPU consumption.
When monitoring ScyllaDB, high CPU usage is not necessarily an indication of system overload.
To make this clearer, the overview dashboard now shows only the CPU consumed by query-related priority classes, split by class.
- Switch disk utilization to show percentile #2499
In heterogeneous clusters, which mix nodes of different instance sizes, each with a different number of cores and storage volume, disk usage is harder to interpret when viewed in bytes.
Instead, the graph now shows usage as a percentage, displaying the average percentage of disk space used.
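To illustrate why a percentage is easier to read than raw bytes when nodes differ in size, here is a minimal Python sketch. The node names, sizes, and the unweighted average are assumptions made for the example; the dashboard's actual query is not shown in these notes.

```python
from dataclasses import dataclass


@dataclass
class NodeDisk:
    name: str         # hypothetical node label
    used_bytes: int
    total_bytes: int


def used_percent(node: NodeDisk) -> float:
    """Percentage of this node's disk capacity that is in use."""
    return 100.0 * node.used_bytes / node.total_bytes


def average_used_percent(nodes: list[NodeDisk]) -> float:
    """Unweighted mean of the per-node usage percentages (an assumption)."""
    return sum(used_percent(n) for n in nodes) / len(nodes)


GIB = 1024 ** 3
# Made-up numbers: a small node and a large node in the same cluster.
nodes = [
    NodeDisk("small", used_bytes=400 * GIB, total_bytes=500 * GIB),
    NodeDisk("large", used_bytes=800 * GIB, total_bytes=4000 * GIB),
]

for n in nodes:
    print(f"{n.name}: {n.used_bytes / GIB:.0f} GiB used, {used_percent(n):.0f}% full")
print(f"average: {average_used_percent(nodes):.0f}%")
```

With these made-up numbers the large node holds twice as many bytes as the small one, yet the small node is the one close to full.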
Detailed Dashboard Change
- Misleading titles and descriptions for Tombstones panels in Grafana #2481
The descriptions for tombstones in SSTables graphs have been clarified.
The relative numbers represent the number of tombstones found in an SSTable and are updated after flush, compaction, or streaming operations.
This clarification helps users understand when to expect updates to these values.
- Compressed Bytes Sent by Algorithm - add aggregated value #2500
When viewing the compressed bytes sent graph, it’s useful to track the aggregated total number of bytes.
A new total graph within the panel makes this easier to follow.
- Add the scylla_load_balancer_load metric to the tablet section #2514
When examining tablet balance across the system, the total number of tablets per node can be misleading in heterogeneous clusters.
Instead, the tablet load balancer’s load metric should be used. This metric defines load in proportion to each node’s capacity.
In a balanced heterogeneous cluster, the load balancer load metric will be equal across nodes, even if the number of tablets is not.
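The idea of a capacity-relative load can be sketched in a few lines of Python. This is only an illustration: the exact formula behind scylla_load_balancer_load is not given here, and the function name, node names, and the use of shard count as "capacity" are all assumptions.

```python
def relative_load(tablets: int, capacity: float) -> float:
    """Load per unit of node capacity (illustrative definition, not ScyllaDB's)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return tablets / capacity


# Hypothetical mixed-size cluster; capacity is modeled here as the shard count.
cluster = {
    "node-a": {"tablets": 64, "capacity": 8},
    "node-b": {"tablets": 256, "capacity": 32},
}

loads = {
    name: relative_load(node["tablets"], node["capacity"])
    for name, node in cluster.items()
}

for name, load in loads.items():
    print(f"{name}: {cluster[name]['tablets']} tablets, relative load {load:.1f}")
```

In this example the tablet counts differ by a factor of four while the relative load is identical, which is the situation the new graph is meant to make visible.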
- RPC delay metrics #2349
The new RPC delay graph in the RPC section shows the total round-trip time of an RPC message between the verb caller and the server.
- Querier cache sub-panel #2471
The querier cache stores queries paused due to paging and resumes them later, reducing query startup costs.
If it misbehaves due to overload or bugs, performance can degrade.
The new querier cache section displays population, lookup rate, and miss rate.
Alternator Dashboard Change
- Expose HTTP metrics in the Alternator dashboard #2506
Alternator relies on the HTTP protocol. There is now an HTTP section in the Alternator dashboard, currently showing open connections and new connections.
This helps identify situations where there are too few or too many connections.
CQL Dashboard Change
- Split non-token-aware queries to show reads and writes #2468
Non-token-aware queries result in performance loss.
When viewing the non-token-aware graph, it’s helpful to distinguish whether the source is reads or writes.
The panel now displays two separate graphs: one for reads and one for writes.
- Support new large partition columns #2483
Large partitions can lead to performance degradation.
To address this, ScyllaDB collects information about large partitions.
The updated panel now includes additional columns: dead rows and range tombstones.
OS Dashboard Change
- Add node_netstat_Tcp_RetransSegs #2472
TCP retransmission segments may indicate a network problem.
A new graph now shows the rate of TCP retransmissions.
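Since node_netstat_Tcp_RetransSegs is a counter, a rate has to be derived from successive samples. The following Python sketch shows the basic arithmetic under simplifying assumptions; the function name and sample values are hypothetical, and it is not the query the dashboard uses.

```python
def per_second_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase of a counter.

    `samples` is a list of (unix_timestamp, counter_value) pairs in time order.
    A drop in value is treated as a counter reset (for example, after a reboot).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")

    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so count `cur` itself.
        increase += cur - prev if cur >= prev else cur
    return increase / elapsed


# Made-up samples taken 30 seconds apart.
samples = [(0.0, 1000.0), (30.0, 1030.0), (60.0, 1090.0)]
print(f"{per_second_rate(samples):.2f} retransmitted segments/s")
```

Prometheus's own rate() function also handles counter resets and additionally extrapolates to the edges of the query window, so the sketch is a simplification of what a real query returns.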
Advanced Dashboard Change
- “Commit log Should use aggregation function” #2516
The commit log information was previously always aggregated using averages. This caused confusion, and in some cases, it was necessary to aggregate using other methods, such as sum.
The commit log graphs now use the aggregation function selected from the drop-down menu, like the rest of the graphs.
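To see why the choice of aggregation matters, the short Python sketch below applies different functions to the same per-node values. The function names in the table and the sample values are assumptions for illustration; the exact options in the dashboard's drop-down menu are not listed here.

```python
from statistics import mean

# Hypothetical set of selectable aggregation functions.
AGGREGATIONS = {
    "avg": mean,
    "sum": sum,
    "max": max,
}


def aggregate(values: list[float], func_name: str) -> float:
    """Aggregate per-node values with the selected function."""
    try:
        func = AGGREGATIONS[func_name]
    except KeyError:
        raise ValueError(f"unknown aggregation: {func_name}") from None
    return func(values)


# Made-up per-node values for a commit log metric.
per_node = [10.0, 20.0, 60.0]

for name in AGGREGATIONS:
    print(f"{name}: {aggregate(per_node, name)}")
```

An average hides the one busy node in this data, while the sum gives the cluster-wide total and the maximum points at the outlier.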
Bug Fixes
- Fix a typo in the log message (“grafna” → “grafana”) #2519
Operational Changes
- Set the auto dashboard refresh to 5m #2501
- Manager versions are now based on major releases instead of minor releases. When specifying a ScyllaDB Manager 3.x version, use 3 instead of a specific minor version (like 3.3 or 3.4).