How do I debug latency spikes when there haven't been any changes to my query patterns or app workload?

Guy · May 17, 2026, 4:59am

For example, after a recent OS upgrade, kernel change or Scylla version upgrade.

Also, how can I find hotspots on specific CPU cores? I’d like to better understand the CPU usage and find the bottlenecks.

Diego_De_Camargo · May 18, 2026, 12:30pm

Context / Problem Statement

Flame Graphs help visualize CPU usage in ScyllaDB and spot performance bottlenecks such as elevated latency or high CPU usage. They aggregate sampled call stacks to show which functions consume the most CPU time, letting you identify specific code paths and check for imbalances across shards (CPU cores).

Step-by-Step Instructions

A Flame Graph is built by sampling call stacks (stack traces) and showing them in a hierarchical view.

Visual Components:

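X-Axis (Horizontal): The width of each box shows the share of samples (and so, approximately, CPU time) in which that function was on the stack. The left-to-right order does not represent the passage of time.
Y-Axis (Vertical): Shows the depth of the call stack. Each box was called by the box directly below it.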
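To make the sampling idea concrete, here is a minimal Python sketch of how sampled stacks can be collapsed into "folded" lines (frames joined by ";" plus a count), which is the input format the common flamegraph.pl script consumes. The function names and the sample stacks below are made up for illustration and are not output from a real ScyllaDB node.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks (root first, leaf last) into 'a;b;c' -> count."""
    folded = Counter()
    for stack in samples:
        folded[";".join(stack)] += 1
    return folded


def inclusive_share(folded):
    """Fraction of samples in which each function appears anywhere on the stack.

    This corresponds to the width of that function's box in the graph.
    """
    total = sum(folded.values())
    seen = Counter()
    for stack, count in folded.items():
        # set() so that a recursive function is counted once per sample
        for frame in set(stack.split(";")):
            seen[frame] += count
    return {frame: n / total for frame, n in seen.items()}


def leaf_share(folded):
    """Fraction of samples in which each function was the leaf (actually on-CPU)."""
    total = sum(folded.values())
    leaves = Counter()
    for stack, count in folded.items():
        leaves[stack.rsplit(";", 1)[-1]] += count
    return {frame: n / total for frame, n in leaves.items()}


# Hypothetical samples, invented for this example only.
samples = [
    ["main", "reactor", "syscall", "osq_lock"],
    ["main", "reactor", "syscall", "osq_lock"],
    ["main", "reactor", "syscall", "osq_lock"],
    ["main", "reactor", "read_row"],
]

folded = fold_stacks(samples)
for stack, count in folded.items():
    print(stack, count)  # one folded line per unique stack
```

Sorting the result of leaf_share by value gives a quick text-only view of which function sits on the widest top-level box.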

Example Use Case:

In the example below, the customer reported elevated average read latency (red line) after a new node was provisioned and came online.

This caused a general degradation that showed up as higher disk delays and more I/O starvation, which in turn translated into the elevated latency.

FlameGraph:

With Flame Graphs it becomes clear what is consuming the most time. The widest box at the top of the graph is the function that was running on the CPU in the largest share of samples.

The investigation showed that this node had very high kernel CPU time, and the dominant symbol was osq_lock.

The root cause was a change in the kernel installed on the new node that triggered a regression in Seastar, which produced the behavior described above.

Downgrading the kernel on the affected node worked as a temporary solution until a corresponding fix was added to Seastar.

Without a Flame Graph, it would have been much harder to pinpoint the problem and understand why the node was slow.

Expected Outcome / Benefit

Flame Graphs turn confusing CPU profiles into a clear visual picture, making it easier to spot hotspots on specific shards (CPU cores).

Key points:

Focus on CPU usage per shard.
Hot paths point to issues such as I/O delays, compaction overload, or network bottlenecks.
Cross-check with Prometheus metrics for the full picture.
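As a rough illustration of the per-shard comparison, the Python sketch below flags shards whose CPU sample count is well above the average. It assumes you already have a sample count per shard (for example, from profiling each shard separately). How you collect those counts is not covered here, and both the 1.5x threshold and the numbers are arbitrary, hypothetical values.

```python
from statistics import mean


def hot_shards(samples_per_shard, factor=1.5):
    """Return shard ids whose sample count exceeds factor * the average count.

    samples_per_shard: dict mapping shard id -> number of CPU samples (hypothetical input).
    factor: arbitrary threshold chosen for this sketch, not a ScyllaDB default.
    """
    if not samples_per_shard:
        return []
    avg = mean(samples_per_shard.values())
    return sorted(s for s, n in samples_per_shard.items() if n > factor * avg)


# Hypothetical counts, invented for this example only.
counts = {0: 100, 1: 110, 2: 400, 3: 90}
print(hot_shards(counts))
```

A shard flagged this way is a good candidate for its own Flame Graph, which you can then compare against one from a quieter shard.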
