Currently chasing p99 read latency in our application, which queries time-series data over a period of time.
Useful information:
Scylla Open Source 5.1.14-0.20230716.753c9a4769be
16 core / 128 GB RAM
RAID0 SSD 3 TB
3 nodes
Schema definition:
CREATE TABLE trades_v1.tick_v1_desc (
exchange_code text,
symbol text,
hour timestamp,
datetime timestamp,
id text,
amount double,
collected_at timestamp,
origin text,
price double,
side tinyint,
vwp double,
PRIMARY KEY ((exchange_code, symbol, hour), datetime, id)
) WITH CLUSTERING ORDER BY (datetime DESC, id DESC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'}
AND comment = ''
AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS'}
AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 259200
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
The application (using the Go gocqlx driver) queries are pretty straightforward: each one aggregates data over a given time range (always 1 row returned, using basic aggregation functions), from the current time minus P to the current time, where P is in the range 1 min - 15 min (so it always queries the most recent data).
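For reference, a minimal sketch of what one such read looks like, using the underlying gocql API that gocqlx wraps; the avg(price) aggregate, the 5-minute window, the host names and the exchange/symbol values are just illustrative assumptions, and the sketch only reads the current hour bucket (a window crossing an hour boundary would also need the previous bucket):

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("node1", "node2", "node3") // placeholder hosts
	cluster.Keyspace = "trades_v1"
	cluster.Consistency = gocql.LocalQuorum

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	now := time.Now().UTC()
	from := now.Add(-5 * time.Minute) // P = 5 min here; anywhere in 1-15 min in practice
	hour := now.Truncate(time.Hour)   // partition bucket for the current hour

	// The aggregation always returns a single row.
	var avgPrice float64
	err = session.Query(
		`SELECT avg(price) FROM tick_v1_desc
		 WHERE exchange_code = ? AND symbol = ? AND hour = ?
		   AND datetime >= ? AND datetime <= ?`,
		"BINANCE", "BTC/USD", hour, from, now, // example values
	).Scan(&avgPrice)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("avg price over last 5 min:", avgPrice)
}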
P99 read latency on the Scylla dashboard is low (1 ms - 10 ms).
On the gocql latency report, p95 is also between 1 ms and 10 ms, BUT p99 is very high: from 700 ms to 1 s!
I've tried many things: code profiling, tracing, but it seems that somehow ONE request among several takes forever to run (we issue a batch of reads every second to aggregate the data).
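For completeness, a minimal sketch of how per-query latency can be logged on the driver side with gocql's QueryObserver hook to catch the slow read (the 100 ms threshold and log format are arbitrary placeholders):

package main

import (
	"context"
	"log"
	"time"

	"github.com/gocql/gocql"
)

// slowQueryLogger logs every query whose end-to-end latency exceeds a threshold.
type slowQueryLogger struct {
	threshold time.Duration
}

func (o slowQueryLogger) ObserveQuery(_ context.Context, q gocql.ObservedQuery) {
	latency := q.End.Sub(q.Start)
	if latency < o.threshold {
		return
	}
	host := "unknown"
	if q.Host != nil {
		host = q.Host.ConnectAddress().String()
	}
	log.Printf("slow query (%s on %s): latency=%s err=%v", q.Statement, host, latency, q.Err)
}

func main() {
	cluster := gocql.NewCluster("node1", "node2", "node3") // placeholder hosts
	cluster.Keyspace = "trades_v1"
	cluster.QueryObserver = slowQueryLogger{threshold: 100 * time.Millisecond}

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
	// ... issue the usual batch of reads every second ...
}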
I've noticed these errors in the scylla-server logs:
[shard 11] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 4/100 count and 109559/179264552 memory resources: timed out, dumping permit diagnostics:
permits count memory table/description/state
3 3 74K trades_v1.tick_v1_desc/data-query/inactive
1 1 33K trades_v1.tick_v1_desc/data-query/active/used
1 0 0B trades_v1.tick_v1_desc/data-query/waiting
1 0 0B trades_v1.tick_v1_desc/data-query/waiting
144 0 0B trades_v1.tick_v1_desc/data-query/waiting
1 0 0B trades_v1.tick_v1_desc/data-query/waiting
1 0 0B trades_v1.tick_v1_desc/data-query/waiting
1 0 0B trades_v1.tick_v1_desc/data-query/waiting
1 0 0B trades_v1.tick_v1_desc/data-query/waiting
However, they don't correlate directly in time with the p99 spikes in the application, i.e. the spikes happen every 5-10 s whereas these errors occur less frequently.
To add more context on volume, it's very low and I expected Scylla to handle it easily:
6k writes/s, 2k reads/s
It's an isolated cluster, so I can pinpoint the root cause; our prod cluster handles more req/s (~50-130k writes/s and 8-20k reads/s).
If anyone has an idea of where to look to eliminate those p99 latency spikes, that would be very helpful.
Thanks