Originally from the User Slack
@Chaitanya_Tondlekar: Hello All,
Seeing issues on Scylla where queued reads & reads failed happen for only one node in the cluster. This got auto-resolved, but we didn't understand why it happened.
Checked disk-level IOPS as well but didn't find anything unusual.
We are getting errors at the application level at the same time:
ERROR : Error while executing query on Scylla:
Error ="gocql: no response received from cassandra within timeout period"
logs of the node.
Also, there are spikes in local read errors & C++ exceptions at the same time.
@avi: It looks like you have a large+hot partition. File an issue, Slack isn't a good place to debug this.
@Chaitanya_Tondlekar: https://github.com/scylladb/scylladb/issues/22506
GitHub: Seeing spikes in queued reads & reads failed on one of the node. · Issue #22506 · scylladb/scylladb
nodetool toppartitions
WRITES Sampler:
Cardinality: ~0 (256 capacity)
Top 10 partitions:
Nothing recorded during sampling period...
READS Sampler:
Cardinality: ~256 (256 capacity)
Top 10 partitions:
Partition Count +/-
(feature_store_v2:user_sscat_features_wide_table_2) 141021894:1004 3520 3509
(feature_store_v2:user_sscat_features_wide_table_2) 409973193:1020 3519 3509
(feature_store_v2:user_sscat_features_wide_table_2) 280050270:1094 3518 3506
(feature_store_v2:user_sscat_features_wide_table_2) 357359671:1182 3518 3506
(feature_store_v2:user_sscat_features_wide_table_2) 304797540:1020 3517 3507
(feature_store_v2:user_sscat_features_wide_table_2) 374548741:1092 3515 3504
(feature_store_v2:user_sscat_features_wide_table_2) 387223310:2680 3515 3505
(feature_store_v2:user_sscat_features_wide_table_2) 413415095:1093 3515 3508
(feature_store_v2:user_sscat_features_wide_table_2) 436609686:1091 3515 3509
(feature_store_v2:user_sscat_features_wide_table_2) 36953007:1091 3515 3510
@avi this command was run when we started facing this issue.
@avi: Need to increase sampling capacity (try -k 1000) until the “+/-” column is small compared to the Count column
@Chaitanya_Tondlekar:
nodetool toppartitions -k 100
WRITES Sampler:
Cardinality: ~0 (256 capacity)
Top 100 partitions:
Nothing recorded during sampling period...
READS Sampler:
Cardinality: ~256 (256 capacity)
Top 100 partitions:
Partition Count +/-
(Keyspace_name:table-1) perfume perfume:700-800 3624 3611
(Keyspace_name:table-2) folding school bag:136346658 3620 3619
(Keyspace_name:table-2) pooja porul:128008071 3619 3618
(Keyspace_name:table-2) long sling bag for women:53088419 3618 3617
(Keyspace_name:table-2) study table buy one get:145731875 3617 3616
(Keyspace_name:table-2) double bangle:81572151 3617 3616
(Keyspace_name:table-3) 334017196:1092 3616 3609
(Keyspace_name:table-3) 71021159:2151 3616 3612
(Keyspace_name:table-3) 285725590:1267 3616 3612
(Keyspace_name:table-3) 435414383:1092 3616 3615
nodetool toppartitions -k 200
WRITES Sampler:
Cardinality: ~0 (256 capacity)
Top 200 partitions:
Nothing recorded during sampling period...
READS Sampler:
Cardinality: ~256 (256 capacity)
Top 200 partitions:
Partition Count +/-
(Keyspace_name:table-2) western top:3862641 3587 3586
(Keyspace_name:table-2) partywear heels sandals:103657919 3584 3583
(Keyspace_name:table-2) jar water:117870082 3584 3583
(Keyspace_name:table-2) women slipper stylish:139073886 3584 3583
(Keyspace_name:table-2) cute sweater:142723276 3584 3583
(Keyspace_name:table-2) hanging for lehenga:129722892 3584 3583
(Keyspace_name:table-3) 16109061:1212 3583 3578
(Keyspace_name:table-3) 92633448:1092 3583 3578
(Keyspace_name:table-3) 61116124:1094 3583 3578
(Keyspace_name:table-2) oats weight gain:111341274 3583 3582
nodetool toppartitions -k 1000
nodetool: TopK count (-k) option must be smaller then the summary capacity (-s)
See 'nodetool help' or 'nodetool help <command>'.
Even for -k 300 it was giving the same error:
TopK count (-k) option must be smaller then the summary capacity (-s)
@avi: Increase -s, not -k. My mistake.
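For example, an invocation along these lines (the exact values are illustrative, not taken from the thread) raises the sampler capacity while keeping the top-K count below it:
nodetool toppartitions -s 1000 -k 100
The error above shows the constraint: -k must stay smaller than the capacity -s, and the capacity is high enough once the "+/-" error column becomes small compared to the Count column.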
@Chaitanya_Tondlekar: will check.
@Guy: @Chaitanya_Tondlekar, did this solve the issue?
@Chaitanya_Tondlekar: It seems maintenance is getting triggered for the nodes at the GCP level.
@avi: I think you can tell them not to do it
@Chaitanya_Tondlekar: @avi @Guy Need your help on this.
We initially thought that the timings matched the maintenance that was happening, but even after stopping maintenance we are seeing queued reads and reads failed on a couple of nodes. Logs of the nodes where reads failed are printing the reader concurrency semaphore "memory resources: timed out" message.
Log reference:
Mar 03 07:29:35 NODENAME scylla[64273]: [shard 18:stmt] reader_concurrency_semaphore - (rate limiting dropped 47023 similar messages) Semaphore user with 87/100 count and 8878039/156153937 memory resources: timed out, dumping permit diagnostics:
permits count memory table/operation/state
81 81 8M orion.query_catalog_features_wide_table_7d/data-query/active/await
6 6 112K orion.query_catalog_features_wide_table_7d/data-query/active/need_cpu
1698 0 0B orion.query_catalog_features_wide_table_7d/data-query/waiting_for_admission
1785 87 8M total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 1119067135
total_failed_reads: 14986903
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 1120091691
reads_enqueued_for_admission: 496241894
reads_enqueued_for_memory: 0
reads_admitted_immediately: 637813929
reads_queued_because_ready_list: 215349766
reads_queued_because_need_cpu_permits: 241788276
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 39103852
reads_queued_with_eviction: 0
total_permits: 1134055823
current_permits: 1785
need_cpu_permits: 87
awaits_permits: 81
disk_reads: 86
sstables_read: 301
@Botond_Dénes is it possible to help here?
@avi @dor
@avi: Did you check for hot partitions?
@Chaitanya_Tondlekar: It's happening now @avi. It happens mostly on two nodes only.
nodetool toppartitions
READS Sampler:
Cardinality: ~256 (256 capacity)
Top 10 partitions:
Partition Count +/-
(orion:query_catalog_features_wide_table_7d) cordset for women partywear:117568709 722 721
(orion:query_catalog_features_wide_table_7d) golden hoops:111736619 721 720
(orion:query_catalog_features_wide_table_7d) s alphabet locket:23258415 720 719
(orion:query_catalog_features_wide_table_7d) slipper for women:47997260 720 719
(orion:query_catalog_features_wide_table_7d) vichitra silk blouse:124937828 720 719
(orion:query_catalog_features_wide_table_7d) quran hindi tarjuma:133662915 720 719
(orion:query_catalog_features_wide_table_7d) quran hindi tarjuma:160986329 720 719
(orion:query_catalog_features_wide_table_7d) quran hindi tarjuma:94563420 720 719
(orion:query_catalog_features_wide_table_7d) vichitra silk blouse:150518157 709 708
(orion:query_catalog_features_wide_table_7d) chiffon suit:145947018 707 706
WRITES Sampler:
Cardinality: ~17 (256 capacity)
Top 10 partitions:
Partition Count +/-
(orion:query_catalog_features_wide_table_7d) 8 year girl jean top:139824122 1 0
(orion:query_catalog_features_wide_table_7d) white cricket jersey:138950176 1 0
(orion:query_catalog_features_wide_table_7d) new born onesie:139550658 1 0
(orion:query_catalog_features_wide_table_7d) butt acne cream:120467954 1 0
(orion:query_catalog_features_wide_table_7d) blue potli:139158023 1 0
(orion:query_catalog_features_wide_table_7d) boys belt slipper:73637371 1 0
(orion:query_catalog_features_wide_table_7d) boys cleeper:76472606 1 0
(orion:query_catalog_features_wide_table_7d) boys cleeper:76541341 1 0
(orion:query_catalog_features_wide_table_7d) sandvich toaster:119839650 1 0
(orion:query_catalog_features_wide_table_7d) star bag:77320988 1 0
nodetool toppartitions -s 50
READS Sampler:
Cardinality: ~50 (50 capacity)
Top 10 partitions:
Partition Count +/-
(orion:query_catalog_features_wide_table_7d) remote control dinosaur:59965119 780 779
(orion:query_catalog_features_wide_table_7d) ayur cleansing milk:161480862 780 779
(orion:query_catalog_features_wide_table_7d) remote control dinosaur:51730296 780 779
(orion:query_catalog_features_wide_table_7d) remote control dinosaur:52279056 780 779
(orion:query_catalog_features_wide_table_7d) camera stand:148754594 779 778
(orion:query_catalog_features_wide_table_7d) ayur cleansing milk:149095208 779 778
(orion:query_catalog_features_wide_table_7d) remote control dinosaur:59786513 779 778
(orion:query_catalog_features_wide_table_7d) camera stand:161722912 773 772
(orion:query_catalog_features_wide_table_7d) ayur cleansing milk:141424544 763 762
(orion:query_catalog_features_wide_table_7d) herbal green powder:106803292 756 754
WRITES Sampler:
Cardinality: ~29 (50 capacity)
Top 10 partitions:
Partition Count +/-
(orion:query_catalog_features_wide_table_7d) 0.9 pen pencil:141488211 1 0
(orion:query_catalog_features_wide_table_7d) full socks for women:145342917 1 0
(orion:query_catalog_features_wide_table_7d) islamic long kaftan:127529143 1 0
(orion:query_catalog_features_wide_table_7d) yoga bar oats 1kg:68171629 1 0
(orion:query_catalog_features_wide_table_7d) kamar ka:143789837 1 0
(orion:query_catalog_features_wide_table_7d) frock kids:133173284 1 0
(orion:query_catalog_features_wide_table_7d) love ring:85647398 1 0
(orion:query_catalog_features_wide_table_7d) stylish anklet:117106672 1 0
(orion:query_catalog_features_wide_table_7d) wall transparent hook:111889512 1 0
(orion:query_catalog_features_wide_table_7d) corset top for women:146186383 1 0
@avi what can be done here?
Tried with:
nodetool toppartitions -s 50
Mar 03 14:48:07 scyl-dsci-ml-orionnew2-prd-ase1 scylla[64273]: [shard 18:stmt] reader_concurrency_semaphore - (rate limiting dropped 33126 similar messages) Semaphore user with 100/100 count and 7983993/156153937 memory resources: timed out, dumping permit diagnostics:
permits count memory table/operation/state
98 98 7748K orion.query_catalog_features_wide_table_7d/data-query/active/await
2 2 49K orion.query_catalog_features_wide_table_7d/data-query/active/need_cpu
1267 0 0B orion.query_catalog_features_wide_table_7d/data-query/waiting_for_admission
1367 100 7797K total
Stats:
permit_based_evictions: 0
time_based_evictions: 0
inactive_reads: 0
total_successful_reads: 1250681823
total_failed_reads: 28598835
total_reads_shed_due_to_overload: 0
total_reads_killed_due_to_kill_limit: 0
reads_admitted: 1252630246
reads_enqueued_for_admission: 623816669
reads_enqueued_for_memory: 0
reads_admitted_immediately: 655465356
reads_queued_because_ready_list: 229330142
reads_queued_because_need_cpu_permits: 319931533
reads_queued_because_memory_resources: 0
reads_queued_because_count_resources: 74554994
reads_queued_with_eviction: 0
total_permits: 1279282025
current_permits: 1367
need_cpu_permits: 100
awaits_permits: 98
disk_reads: 100
sstables_read: 349
@Botond_Dénes: The disk on the node is not keeping up with requests, so you have a lot of queued reads and timeouts.
@Chaitanya_Tondlekar: Yes, but it keeps happening all of a sudden, and only on 2 nodes. It spikes on queued reads and reads failed.
@Botond_Dénes: You possibly have a spike in load (check the requests count in the replica metrics on the dashboard) or possibly a spike in requests for certain partitions.
@Chaitanya_Tondlekar: Partition hits seem the same on all nodes, even on the nodes where queued reads and reads failed occur.
There's no sudden increase in QPS.
I have shared the toppartitions output from the nodes which are affected. Can we get any help from there?
@Botond_Dénes
@Botond_Dénes: partition load seems pretty uniform
do partitions vary in size?
@Chaitanya_Tondlekar: how do we get the partition sizes?
@Botond_Dénes: Is there anything in the logs about large partitions?
How about system.large_partitions?
@Chaitanya_Tondlekar: Logs are printing only the messages which we have posted here.
cassandra@cqlsh> select * from system.large_partitions;
keyspace_name | table_name | sstable_name | partition_size | partition_key | compaction_time | dead_rows | range_tombstones | rows
---------------+------------+--------------+----------------+---------------+-----------------+-----------+------------------+------
(0 rows)
cassandra@cqlsh>
Logs are only printing the semaphore messages.
@Botond_Dénes anything else we can check now? We are seeing this right now, so we can check things actively.
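As a side note, the system.large_partitions check above can also be scoped to the table that appears in the semaphore diagnostics; the keyspace and table names below come from the logs in this thread, while the query itself is only an illustrative sketch:
SELECT partition_key, partition_size, rows, compaction_time
FROM system.large_partitions
WHERE keyspace_name = 'orion' AND table_name = 'query_catalog_features_wide_table_7d';
This table is only populated when a partition crosses the large-partition threshold during sstable writes or compaction, so an empty result means the partitions are not large; they can still be hot, i.e. read very frequently.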
@Botond_Dénes: check CPU load metric with shard resolution
do you see some shards pegged at 100% CPU load?
@Chaitanya_Tondlekar:
check CPU load metric with shard resolution –> where can we get this?
We are checking the load metrics from the detailed dashboard and the load seems even.
@Botond_Dénes: Change "by Instance" to "by Shard".
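If it helps to confirm this outside the dashboard, the per-shard reactor load can also be read directly from a node's Prometheus endpoint; the port and metric name below assume a default Scylla metrics setup, and NODE_IP is a placeholder:
curl -s http://NODE_IP:9180/metrics | grep '^scylla_reactor_utilization'
Each returned line carries a shard label, so a single shard sitting near 100 stands out immediately even when the per-node average looks even.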
@Chaitanya_Tondlekar: shard 18 is at 100%
@Botond_Dénes: Is this the shard that is logging the semaphore timeouts?
I see both reports from above are from shard 18
So you do have some partitions which are hotter
they live on shard 18
@Chaitanya_Tondlekar: Got it
@Botond_Dénes how do we solve hot partition issues?
@Botond_Dénes: Two ways to solve it:
• change the data model to split the partitions
• scale up/out so hopefully the hot partitions are spread between multiple shards/nodes
There is a 3rd way: change clients so they don't hammer a subset of partitions – this is not always possible.
@Chaitanya_Tondlekar: "change clients"
I didn't understand that.
@Botond_Dénes: change clients – change application code
@Chaitanya_Tondlekar: Oh got it
@avi: The capacity for nodetool toppartitions is too low, use the -s parameter.
@Chaitanya_Tondlekar: Yes, we have used -s only.
@Botond_Dénes Is there any way we can increase the value for the reader_concurrency_semaphore in scylla.yaml?
reader_concurrency_semaphore_limit
@Botond_Dénes: No way to increase the limit, but it would be pointless even if it were possible. If the shard is overloaded, any limit will be reached eventually, because the client sends more requests than the node can retire, so the queue keeps on growing forever.
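To put rough numbers on that (illustrative figures, not measurements from this cluster): if a shard can retire about 1,000 reads per second but receives 1,200 per second, the backlog grows by roughly 200 reads per second, so any finite limit, whether 100 or 10,000 permits, is reached within seconds and the queued reads then time out instead of completing.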
@Chaitanya_Tondlekar: Can you elaborate on the third approach of changing clients? How can we do that?
@Botond_Dénes: I cannot give you concrete advice, I don't know anything about your application. I just mentioned the possibility – sometimes it may be possible to change the application so it doesn't request some partitions much more often than other partitions. This may not always be possible, e.g. if the requests are generated by users whom you can't control and some partitions are just more popular than others.
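For the first option Botond mentions (splitting hot partitions in the data model), a common pattern is to add a small bucket component to the partition key; for partitions that are hot because of read rate rather than size, which is what the samplers above suggest, the same idea is applied by duplicating each row across a few buckets and letting readers pick one at random. The sketch below is purely illustrative: the real columns of query_catalog_features_wide_table_7d are not shown in this thread, and the bucket count, column names, and read/write strategy are assumptions an application would have to adapt.
-- Illustrative only; not the actual schema of this table.
CREATE TABLE orion.query_catalog_features_wide_table_7d_split (
    query      text,
    catalog_id bigint,
    bucket     smallint,   -- assumed: 0..7
    features   blob,       -- placeholder for the real feature columns
    PRIMARY KEY ((query, catalog_id, bucket))
);
-- Writes store the same payload once per bucket (8x write amplification here);
-- reads pick a random bucket, so one hot key is spread over up to 8 shards:
-- SELECT features FROM orion.query_catalog_features_wide_table_7d_split
--   WHERE query = ? AND catalog_id = ? AND bucket = ?;
The trade-off is the extra writes and storage for the duplicated rows, which looks cheap here: the WRITES sampler recorded almost nothing while the READS sampler was saturated.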