Originally from the User Slack
@Ritesh: Hi,
I’m facing an unusual problem where my single-node ScyllaDB instance, with 40 cores and 80GB RAM, is now handling only 1K write QPS with latency of 135 ms. Previously, it was handling around 100K write QPS. Our Spark jobs are writing to this ScyllaDB node, but I don’t see any unusual lines in the logs. Could someone help me troubleshoot this?
@Ritesh: This is the htop output-
@avi: What storage do you use?
@Ritesh: SSD storage
@avi: Could you be less specific?
@Ritesh: @avi Disk & CPU Info-
@avi: What’s the make and model of the disk? How many disks are there? How are they organized?
@Ritesh: It’s a single 9TB SSD formatted with EXT4
Can you give me some pointers for debugging the slow write throughput?
I deleted the Docker volume twice; each time it handled 100K write QPS for about a day, then suddenly dropped to 5K. There are no errors in the logs.
Below is the screenshot-
@avi I see extremely high disk I/O, which I suspect is causing the writes to slow down. Why isn’t it using the RAM for writes?
These are some nodetool command stats-
root@scylladb-node1:/# nodetool compactionstats
pending tasks: 0
id compaction type keyspace table completed total unit progress
7f181f30-9952-11ef-a52a-ea3f6c0e32ba COMPACTION user segments 10450305 16954752 keys 61.64%
e463abc0-9952-11ef-91e6-ea296c0e32ba COMPACTION user segments 16880 536064 keys 3.15%
e63e3820-9952-11ef-914e-ea3e6c0e32ba COMPACTION user segments 493 533504 keys 0.09%
9130d7c0-9952-11ef-b05e-ea436c0e32ba COMPACTION user segments 8682107 16182144 keys 53.65%
c44e79a0-9952-11ef-9f8d-ea2e6c0e32ba COMPACTION user segments 4336080 7870848 keys 55.09%
e4fa1f60-9952-11ef-b6b8-ea306c0e32ba COMPACTION user segments 21118 803200 keys 2.63%
2bbfe8e0-9952-11ef-a7c7-ea3d6c0e32ba COMPACTION user segments 14981297 47618432 keys 31.46%
733bed90-9952-11ef-8887-ea2d6c0e32ba COMPACTION user segments 12171322 15482496 keys 78.61%
e4eed4c0-9952-11ef-a9ce-ea326c0e32ba COMPACTION user segments 65605 535936 keys 12.24%
e4bffc90-9952-11ef-8317-ea316c0e32ba COMPACTION user segments 13424 534400 keys 2.51%
e5a07180-9952-11ef-a1e8-ea346c0e32ba COMPACTION user segments 6845 1066112 keys 0.64%
e461aff0-9952-11ef-8622-ea256c0e32ba COMPACTION user segments 234639 536320 keys 43.75%
e418c010-9952-11ef-940b-ea336c0e32ba COMPACTION user segments 53084 803712 keys 6.60%
e4edea60-9952-11ef-9c7d-ea406c0e32ba COMPACTION user segments 59928 802432 keys 7.47%
e3cb1540-9952-11ef-acf7-ea376c0e32ba COMPACTION user segments 69324 535296 keys 12.95%
e5e9af80-9952-11ef-ae29-ea476c0e32ba COMPACTION user segments 12129 1065984 keys 1.14%
a2bb5a10-9952-11ef-877b-ea286c0e32ba COMPACTION user segments 5516920 35002240 keys 15.76%
8834e2b0-9952-11ef-9a27-ea2b6c0e32ba COMPACTION user segments 5432425 7618816 keys 71.30%
e463f9e0-9952-11ef-bf63-ea416c0e32ba COMPACTION user segments 24961 1066240 keys 2.34%
Active compaction remaining time : n/a
root@scylladb-node1:/# nodetool cfstats user.segments
Total number of tables: 66
Keyspace : user
Read Count: 18546287
Read Latency: 2.0165853143542964E-4 ms
Write Count: 58488507
Write Latency: 4.2926142737751884E-5 ms
Pending Flushes: 0
Table: segments
SSTable count: 227
SSTables in each level: [227/4]
Space used (live): 255941509511
Space used (total): 255941509511
Space used by snapshots (total): 0
Off heap memory used (total): 11382524916
SSTable Compression Ratio: 0.5989474298522396
Number of partitions (estimate): 2778003890
Memtable cell count: 4803792
Memtable data size: 2806399012
Memtable off heap memory used: 5588910080
Memtable switch count: 198
Local read count: 18553639
Local read latency: 0.202 ms
Local write count: 58565454
Local write latency: 0.043 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 58849
Bloom filter false ratio: 0.11260
Bloom filter space used: 5599935004
Bloom filter off heap memory used: 5599935004
Index summary off heap memory used: 193679832
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 446
Compacted partition mean bytes: 111
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
root@scylladb-node1:/# cqlsh 10.0.7.135 -e "DESCRIBE user.segments;"
CREATE TABLE user.segments (
maid uuid PRIMARY KEY,
segment_map map<int, int>
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
@avi: What’s the make and model of the disk?
Check scylla-monitoring, Advanced dashboard, commitlog and compaction I/O panels. Look at the disk latency and bandwidth charts.
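If the monitoring stack isn’t up yet, a rough spot check of the same disk signals can be done directly on the host. This is just a sketch: it assumes the sysstat package is installed, and sda is a placeholder for the actual device name:
root@scylladb-node1:/# iostat -x 1 sda
Sustained high w_await (average write latency, in ms) combined with %util close to 100% would indicate the disk itself is saturated rather than Scylla.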
@Robert: Also use XFS instead of ext4. And I’m not sure whether you’re overloading single partitions with the map operations - the data model doesn’t look very efficient.
Btw, you wrote 80 GB of memory for 40 cores, but htop shows 96 cores and 128 GB… Even so, that’s only around 2 GB per Scylla shard, which may not leave enough space for the memtables.
@Ritesh: @Robert Thanks for the suggestion on using XFS instead of EXT4!
I’m concerned about the partition-overload issue you mentioned with the map operations. Could you recommend an alternative data model that would be more efficient for handling high-cardinality segments in this setup?
> Also use XFS instead of ext4. And I’m not sure whether you’re overloading single partitions with the map operations - the data model doesn’t look very efficient.
In our use case, each UUID is associated with multiple integer segments with high cardinality, and we frequently update specific UUIDs with their associated segments. Our query pattern primarily performs lookups by UUID in the WHERE clause. Based on this, I thought the data model below would be well suited to our queries:
CREATE TABLE user.segments (
maid uuid PRIMARY KEY,
segment_map map<int, int>
)
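For context, the typical statements against this map-based model would look like the following sketch (the UUID literal is just an illustrative value):
-- upsert a single segment entry for one user
UPDATE user.segments SET segment_map[17] = 3
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
-- merge several entries into the map in one statement
UPDATE user.segments SET segment_map = segment_map + {17: 3, 42: 1}
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
-- point lookup by UUID
SELECT segment_map FROM user.segments
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
(Note that in a non-frozen map each key is stored as its own cell, so frequent per-key updates leave many small cells spread across SSTables for reads and compaction to merge.)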
> Btw, you wrote 80 GB of memory for 40 cores, but htop shows 96 cores and 128 GB… Even so, that’s only around 2 GB per Scylla shard, which may not leave enough space for the memtables.
The difference in cores and RAM is due to resources reserved for Apache Spark, which handles writing partitions to the ScyllaDB table.
> What’s the make and model of the disk?
>
> Check scylla-monitoring, Advanced dashboard, commitlog and compaction I/O panels. Look at the disk latency and bandwidth charts.
@avi Thanks for the suggestion!
I’ll check the commitlog and compaction I/O panels on the scylla-monitoring Advanced dashboard that you mentioned.
This is info about the disk-
Model: PERC H745 Front
Firmware: 51.16.0-4076
Product   Firmware  Size     Diskbay  RPM
SAMSUNG   RAID0     1920 GB  134:2    SSD
SAMSUNG   RAID0     1920 GB  134:3    SSD
SAMSUNG   RAID0     1920 GB  134:4    SSD
SAMSUNG   RAID0     1920 GB  134:5    SSD
SAMSUNG   RAID0     1920 GB  134:6    SSD
@Robert: Imo, instead of a map it should just be a simple table with a partition key, a clustering column as the map key, and a regular column as the map value:
CREATE TABLE user.segments (
maid uuid,
segment_key int,
segment_value int,
PRIMARY KEY (maid, segment_key)
);
It looks like multiple rows per UUID instead of just one as in your model, but from Scylla’s perspective it’s still a single partition per UUID containing some rows, so you can still stream the whole partition at once.
Maybe Scylla currently handles both models the same way, but iirc operations on big collections had some problems in the past. Additionally, with this data model you can sort the keys (ASC/DESC - worth thinking about when the table is created), and the keys can be filtered and range-filtered (for example: SELECT * FROM segments WHERE maid = x AND segment_key < 10;), as in the sketch below.
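As a sketch, the operations from the map-based model translate to this clustered model roughly as follows (the UUID literal is again just an illustrative value):
-- upsert one segment entry (replaces segment_map[17] = 3)
INSERT INTO user.segments (maid, segment_key, segment_value)
VALUES (123e4567-e89b-12d3-a456-426614174000, 17, 3);
-- read the whole partition (replaces reading the map)
SELECT segment_key, segment_value FROM user.segments
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
-- range filter on the clustering key - not expressible with the map model
SELECT segment_key, segment_value FROM user.segments
WHERE maid = 123e4567-e89b-12d3-a456-426614174000 AND segment_key < 10;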