Also, which of the two metrics is more relevant? If I see 100% Load, is it a bad thing?
That’s a great question!
While checking dashboards, you might come across two different yet similar metrics:
Load (in the Detailed Dashboard) and CPU used (in the OS Metrics dashboard).
**LOAD:**
In other words: “How many things want CPU time right now”
**CPU USED:**
In other words, *it’s the raw OS-level CPU utilization, telling you how much of the time your processor wasn’t idle.*
If we put these two metrics together we can see this:
- Scylla Load as the green metric
- CPU used as the yellow metric
We can clearly see that CPU usage has a greater value than Load.
The reason is that CPU usage is taken from node_exporter, which monitors CPU usage directly on the host machine, so it takes into consideration not only ScyllaDB itself, but also other processes running on the host machine.
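To make the "time not idling" idea concrete, here is a minimal sketch of how a host-level CPU-used percentage can be derived from two samples of per-mode CPU time counters, similar in spirit to the `node_cpu_seconds_total` counters that node_exporter exposes. The function name and the sample values are hypothetical, and treating only the `idle` mode as idle time is an assumption; your dashboard's exact query may differ.

```python
def cpu_used_percent(before, after):
    """Return the percentage of CPU time that was not idle between two samples.

    Each sample maps a CPU mode (e.g. "user", "system", "idle") to the
    cumulative seconds spent in that mode. Names and values are illustrative.
    """
    total = sum(after.values()) - sum(before.values())
    if total <= 0:
        raise ValueError("the second sample must be taken after the first")
    idle = after.get("idle", 0.0) - before.get("idle", 0.0)
    return 100.0 * (1.0 - idle / total)


# Hypothetical counters covering every process on the host, not only ScyllaDB.
sample_before = {"user": 100.0, "system": 50.0, "idle": 850.0}
sample_after = {"user": 160.0, "system": 70.0, "idle": 870.0}
print(cpu_used_percent(sample_before, sample_after))
```

Because the counters cover the whole host, any other process that burns CPU time raises this number, even though it does no work for ScyllaDB.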
There is also one more reason why OS-level CPU usage may appear higher than expected.
Scylla is built on top of Seastar, a C++ framework designed for building extremely low-latency systems. The key concept is that Scylla uses a polling (busy-polling) model rather than the traditional interrupt-driven model.
This means that instead of sleeping and waiting for requests, Scylla continuously checks (“polls”) for completions in a tight event loop, without waiting for the OS scheduler. In other words, the elevated CPU usage reported by the OS is expected behavior and is a consequence of Scylla’s high-performance, polling-based architecture.
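The gap between the two views can be illustrated with a toy busy-polling loop. This is only a sketch under simplifying assumptions, not Seastar's actual reactor: the loop never sleeps, each iteration ("tick") either runs one queued task or spins, and the function name and the `arrivals` input are hypothetical.

```python
from collections import deque


def run_polling_loop(arrivals, ticks):
    """Simulate a busy-polling loop for `ticks` iterations.

    `arrivals` maps a tick number to how many tasks arrive at that tick.
    Returns (os_view, load_view) as percentages.
    """
    queue = deque()
    busy_ticks = 0
    for tick in range(ticks):
        queue.extend(["task"] * arrivals.get(tick, 0))
        if queue:
            queue.popleft()  # useful work was found and executed
            busy_ticks += 1
        # Otherwise the loop just polls again; it never blocks or sleeps.
    os_view = 100.0  # the OS only sees a thread that never yields the CPU
    load_view = 100.0 * busy_ticks / ticks  # share of iterations doing real work
    return os_view, load_view


print(run_polling_loop({0: 2, 5: 1}, ticks=10))
```

In this toy model the OS reports the core as fully busy, while only the iterations that actually executed a task count toward the load figure.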
Therefore Load is more relevant to us, because it represents the actual work that Scylla itself is doing: handling reads and writes, applying mutations, running queries, and running background tasks such as repairs, compaction, and streaming.
Is a constant 100% Load a bad thing? Does it mean my cluster is running at its limits?
Let’s look at an example. We have a cluster where we enabled ZSTD compression, and now we’re running the following command to rewrite SSTables (via compaction) into the new format.
nodetool upgradesstables -a
In the graph below we can see that Load went to 100%, and so did CPU, but latency remained stable at about 7 ms.
This brings up another question:
Why did the CPU go to 100%, and what would happen if there were a sudden burst of writes or reads?
The answer is:
Scylla ALWAYS tries to finish any task as quickly as possible without sacrificing client reads and writes. If reads or writes increase, Scylla throttles background tasks to keep latency on customer queries very low.
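The idea that background work only consumes whatever capacity client traffic leaves free can be sketched as follows. This is a deliberately simplified strict-priority model with hypothetical names and units, not Scylla's actual scheduler.

```python
def schedule_tick(capacity, foreground_demand, background_demand):
    """Split one tick of CPU capacity between client and background work.

    Client (foreground) requests are served first; background tasks such as
    compaction get only what is left. All quantities are abstract work units.
    """
    foreground = min(capacity, foreground_demand)
    background = min(capacity - foreground, background_demand)
    return foreground, background


# Quiet period: compaction soaks up the spare capacity, so the node is 100% busy.
print(schedule_tick(capacity=100, foreground_demand=20, background_demand=1000))

# Burst of client traffic: the node is still 100% busy, but compaction is squeezed.
print(schedule_tick(capacity=100, foreground_demand=90, background_demand=1000))
```

In both cases the total work done equals the full capacity, which is why a 100% Load reading on its own says little about whether client requests are being delayed.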
ProTip:
Always keep an eye on latency and your application’s performance, not just resource graphs.



