Originally from the User Slack
@Ritesh: Hi,
I’m facing an unusual problem where my single-node ScyllaDB instance, with 40 cores and 80GB RAM, is now handling only 1K write QPS with latency of 135 ms. Previously, it was handling around 100K write QPS. Our Spark jobs are writing to this ScyllaDB node, but I don’t see any unusual lines in the logs. Could someone help me troubleshoot this?
@Ritesh: This is the htop output-
@avi: What storage do you use?
@Ritesh: SSD storage
@avi: Could you be less specific?
@Ritesh: @avi Disk & CPU Info-
@avi: What’s the make and model of the disk? How many disks are there? How are they organized?
@Ritesh: It’s a single 9TB SSD formatted with EXT4
Can you give me some pointers for debugging the slow write throughput?
I deleted the Docker volume twice; each time it handled 100K write QPS for about a day, then suddenly dropped to 5K. There are no errors in the logs.
Below is the screenshot-
@avi I see extremely high disk I/O, which I suspect is causing the writes to slow down. Why isn’t it using the RAM for writes?
These are some nodetool command stats-
root@scylladb-node1:/# nodetool compactionstats
pending tasks: 0
id compaction type keyspace table completed total unit progress
7f181f30-9952-11ef-a52a-ea3f6c0e32ba COMPACTION user segments 10450305 16954752 keys 61.64%
e463abc0-9952-11ef-91e6-ea296c0e32ba COMPACTION user segments 16880 536064 keys 3.15%
e63e3820-9952-11ef-914e-ea3e6c0e32ba COMPACTION user segments 493 533504 keys 0.09%
9130d7c0-9952-11ef-b05e-ea436c0e32ba COMPACTION user segments 8682107 16182144 keys 53.65%
c44e79a0-9952-11ef-9f8d-ea2e6c0e32ba COMPACTION user segments 4336080 7870848 keys 55.09%
e4fa1f60-9952-11ef-b6b8-ea306c0e32ba COMPACTION user segments 21118 803200 keys 2.63%
2bbfe8e0-9952-11ef-a7c7-ea3d6c0e32ba COMPACTION user segments 14981297 47618432 keys 31.46%
733bed90-9952-11ef-8887-ea2d6c0e32ba COMPACTION user segments 12171322 15482496 keys 78.61%
e4eed4c0-9952-11ef-a9ce-ea326c0e32ba COMPACTION user segments 65605 535936 keys 12.24%
e4bffc90-9952-11ef-8317-ea316c0e32ba COMPACTION user segments 13424 534400 keys 2.51%
e5a07180-9952-11ef-a1e8-ea346c0e32ba COMPACTION user segments 6845 1066112 keys 0.64%
e461aff0-9952-11ef-8622-ea256c0e32ba COMPACTION user segments 234639 536320 keys 43.75%
e418c010-9952-11ef-940b-ea336c0e32ba COMPACTION user segments 53084 803712 keys 6.60%
e4edea60-9952-11ef-9c7d-ea406c0e32ba COMPACTION user segments 59928 802432 keys 7.47%
e3cb1540-9952-11ef-acf7-ea376c0e32ba COMPACTION user segments 69324 535296 keys 12.95%
e5e9af80-9952-11ef-ae29-ea476c0e32ba COMPACTION user segments 12129 1065984 keys 1.14%
a2bb5a10-9952-11ef-877b-ea286c0e32ba COMPACTION user segments 5516920 35002240 keys 15.76%
8834e2b0-9952-11ef-9a27-ea2b6c0e32ba COMPACTION user segments 5432425 7618816 keys 71.30%
e463f9e0-9952-11ef-bf63-ea416c0e32ba COMPACTION user segments 24961 1066240 keys 2.34%
Active compaction remaining time : n/a
root@scylladb-node1:/# nodetool cfstats user.segments
Total number of tables: 66
Keyspace : user
Read Count: 18546287
Read Latency: 2.0165853143542964E-4 ms
Write Count: 58488507
Write Latency: 4.2926142737751884E-5 ms
Pending Flushes: 0
Table: segments
SSTable count: 227
SSTables in each level: [227/4]
Space used (live): 255941509511
Space used (total): 255941509511
Space used by snapshots (total): 0
Off heap memory used (total): 11382524916
SSTable Compression Ratio: 0.5989474298522396
Number of partitions (estimate): 2778003890
Memtable cell count: 4803792
Memtable data size: 2806399012
Memtable off heap memory used: 5588910080
Memtable switch count: 198
Local read count: 18553639
Local read latency: 0.202 ms
Local write count: 58565454
Local write latency: 0.043 ms
Pending flushes: 0
Percent repaired: 0.0
Bloom filter false positives: 58849
Bloom filter false ratio: 0.11260
Bloom filter space used: 5599935004
Bloom filter off heap memory used: 5599935004
Index summary off heap memory used: 193679832
Compression metadata off heap memory used: 0
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 446
Compacted partition mean bytes: 111
Average live cells per slice (last five minutes): 0.0
Maximum live cells per slice (last five minutes): 0
Average tombstones per slice (last five minutes): 0.0
Maximum tombstones per slice (last five minutes): 0
Dropped Mutations: 0
root@scylladb-node1:/# cqlsh 10.0.7.135 -e "DESCRIBE user.segments;"
CREATE TABLE user.segments (
maid uuid PRIMARY KEY,
segment_map map<int, int>
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'SizeTieredCompactionStrategy'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
@avi: What’s the make and model of the disk?
Check scylla-monitoring, Advanced dashboard, commitlog and compaction I/O panels. Look at the disk latency and bandwidth charts.
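If the monitoring stack isn’t up yet, a rough spot check of the same disk signals can be done directly on the host. This is just a sketch: it assumes the sysstat package is installed, and sda is a placeholder for the actual device name:
root@scylladb-node1:/# iostat -x 1 sda
Sustained high w_await (average write latency, in ms) combined with %util close to 100% would indicate the disk itself is saturated rather than Scylla.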
@Robert: Also use XFS instead of ext4. And I’m not sure whether you’re overloading single partitions with the map operations - the data model doesn’t look very efficient.
Btw, you wrote 80 GB of memory for 40 cores, but htop shows 96 cores and 128 GB… Even so, that’s only around 2 GB per Scylla shard, which may not leave enough space for the memtables.
@Ritesh: @Robert Thanks for the suggestion on using XFS instead of EXT4!
I’m concerned about the partition-overload issue you mentioned with the map operations. Could you recommend an alternative data model that would be more efficient for handling high-cardinality segments in this setup?
> Also use XFS instead of ext4. And I’m not sure whether you’re overloading single partitions with the map operations - the data model doesn’t look very efficient.
In our use case, each UUID is associated with multiple integer segments with high cardinality, and we frequently update specific UUIDs with their associated segments. Our query pattern primarily performs lookups by UUID in the WHERE clause. Based on this, I thought the data model below would be well suited to our queries:
CREATE TABLE user.segments (
maid uuid PRIMARY KEY,
segment_map map<int, int>
)
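For context, the typical statements against this map-based model would look like the following sketch (the UUID literal is just an illustrative value):
-- upsert a single segment entry for one user
UPDATE user.segments SET segment_map[17] = 3
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
-- merge several entries into the map in one statement
UPDATE user.segments SET segment_map = segment_map + {17: 3, 42: 1}
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
-- point lookup by UUID
SELECT segment_map FROM user.segments
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
(Note that in a non-frozen map each key is stored as its own cell, so frequent per-key updates leave many small cells spread across SSTables for reads and compaction to merge.)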
> Btw, you wrote 80 GB of memory for 40 cores, but htop shows 96 cores and 128 GB… Even so, that’s only around 2 GB per Scylla shard, which may not leave enough space for the memtables.
The difference in cores and RAM is due to resources reserved for Apache Spark, which handles writing partitions to the ScyllaDB table.
> What’s the make and model of the disk?
>
> Check scylla-monitoring, Advanced dashboard, commitlog and compaction I/O panels. Look at the disk latency and bandwidth charts.
@avi Thanks for the suggestion!
I’ll check the commitlog and compaction I/O panels on the scylla-monitoring Advanced dashboard that you mentioned.
This is info about the disk-
Model: PERC H745 Front
Firmware: 51.16.0-4076
Product   Firmware  Size     Diskbay  RPM
SAMSUNG   RAID0     1920 GB  134:2    SSD
SAMSUNG   RAID0     1920 GB  134:3    SSD
SAMSUNG   RAID0     1920 GB  134:4    SSD
SAMSUNG   RAID0     1920 GB  134:5    SSD
SAMSUNG   RAID0     1920 GB  134:6    SSD
@Robert: Imo, instead of a map it should just be a simple table with a partition key, a clustering column as the map key, and a regular column as the map value:
CREATE TABLE user.segments (
maid uuid,
segment_key int,
segment_value int,
PRIMARY KEY (maid, segment_key)
);
It looks like multiple rows per UUID instead of just one as in your model, but from Scylla’s perspective it’s still a single partition per UUID containing some rows, so you can still stream the whole partition at once.
Maybe Scylla currently handles both models the same way, but iirc operations on big collections had some problems in the past. Additionally, with this data model you can sort the keys (ASC/DESC - worth thinking about when the table is created), and the keys can be filtered and range-filtered (for example: SELECT * FROM segments WHERE maid = x AND segment_key < 10;), as in the sketch below.
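As a sketch, the operations from the map-based model translate to this clustered model roughly as follows (the UUID literal is again just an illustrative value):
-- upsert one segment entry (replaces segment_map[17] = 3)
INSERT INTO user.segments (maid, segment_key, segment_value)
VALUES (123e4567-e89b-12d3-a456-426614174000, 17, 3);
-- read the whole partition (replaces reading the map)
SELECT segment_key, segment_value FROM user.segments
WHERE maid = 123e4567-e89b-12d3-a456-426614174000;
-- range filter on the clustering key - not expressible with the map model
SELECT segment_key, segment_value FROM user.segments
WHERE maid = 123e4567-e89b-12d3-a456-426614174000 AND segment_key < 10;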