Originally from the User Slack
 @Chaitanya_Tondlekar: Hello All,
Seeing an issue on one of the Scylla nodes where queued reads & failed reads happen for only one node in the cluster. This got auto-resolved, but we didn't understand why it happened.
Checked disk-level IOPS as well but didn't find anything unusual.
We are getting errors at the application level at the same time:
ERROR : Error while executing query on Scylla:
Error ="gocql: no response received from cassandra within timeout period"
Logs of the node:
Also, there are spikes in local read errors & C++ exceptions at the same time.
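(For context, that gocql error is the driver's own request timeout firing while the replica is queueing reads. A minimal sketch of where that timeout and retry behaviour live on the client side; hosts, keyspace and values below are illustrative assumptions, not taken from this thread.)
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

func main() {
	// Hypothetical cluster setup; hosts and keyspace are placeholders.
	cluster := gocql.NewCluster("10.0.0.1", "10.0.0.2")
	cluster.Keyspace = "feature_store_v2"
	cluster.Consistency = gocql.LocalQuorum
	// Per-request timeout; when it fires the driver returns
	// "gocql: no response received from cassandra within timeout period".
	cluster.Timeout = 2 * time.Second
	// Retry policy consulted for failed attempts before surfacing the error.
	cluster.RetryPolicy = &gocql.SimpleRetryPolicy{NumRetries: 2}

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()
}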
 @avi: It looks like you have a large+hot partition. File an issue, slack isn’t a good place to debug this.
 @Chaitanya_Tondlekar: https://github.com/scylladb/scylladb/issues/22506
GitHub: Seeing spikes in queued reads & reads failed on one of the node. · Issue #22506 · scylladb/scylladb
nodetool toppartitions 
WRITES Sampler:
  Cardinality: ~0 (256 capacity)
  Top 10 partitions:
        Nothing recorded during sampling period...
READS Sampler:
  Cardinality: ~256 (256 capacity)
  Top 10 partitions:
        Partition                                                              Count       +/-
        (feature_store_v2:user_sscat_features_wide_table_2) 141021894:1004      3520      3509
        (feature_store_v2:user_sscat_features_wide_table_2) 409973193:1020      3519      3509
        (feature_store_v2:user_sscat_features_wide_table_2) 280050270:1094      3518      3506
        (feature_store_v2:user_sscat_features_wide_table_2) 357359671:1182      3518      3506
        (feature_store_v2:user_sscat_features_wide_table_2) 304797540:1020      3517      3507
        (feature_store_v2:user_sscat_features_wide_table_2) 374548741:1092      3515      3504
        (feature_store_v2:user_sscat_features_wide_table_2) 387223310:2680      3515      3505
        (feature_store_v2:user_sscat_features_wide_table_2) 413415095:1093      3515      3508
        (feature_store_v2:user_sscat_features_wide_table_2) 436609686:1091      3515      3509
        (feature_store_v2:user_sscat_features_wide_table_2) 36953007:1091       3515      3510
@avi this command was run when we started facing this issue.
 @avi: Need to increase sampling capacity (try -k 1000) until the “+/-” column is small compared to the Count column
 @Chaitanya_Tondlekar:
nodetool toppartitions -k 100
WRITES Sampler:
  Cardinality: ~0 (256 capacity)
  Top 100 partitions:
        Nothing recorded during sampling period...
READS Sampler:
  Cardinality: ~256 (256 capacity)
  Top 100 partitions:
        Partition                                                     Count       +/-
        (Keyspace_name:table-1) perfume perfume:700-800                3624      3611
        (Keyspace_name:table-2) folding school bag:136346658           3620      3619
        (Keyspace_name:table-2) pooja porul:128008071                  3619      3618
        (Keyspace_name:table-2) long sling bag for women:53088419      3618      3617
        (Keyspace_name:table-2) study table buy one get:145731875      3617      3616
        (Keyspace_name:table-2) double bangle:81572151                 3617      3616
        (Keyspace_name:table-3) 334017196:1092                         3616      3609
        (Keyspace_name:table-3) 71021159:2151                          3616      3612
        (Keyspace_name:table-3) 285725590:1267                         3616      3612
        (Keyspace_name:table-3) 435414383:1092                         3616      3615
nodetool toppartitions -k 200
WRITES Sampler:
  Cardinality: ~0 (256 capacity)
  Top 200 partitions:
        Nothing recorded during sampling period...
READS Sampler:
  Cardinality: ~256 (256 capacity)
  Top 200 partitions:
        Partition                                                      Count      +/-
        (Keyspace_name:table-2) western top:3862641                    3587      3586
        (Keyspace_name:table-2) partywear heels sandals:103657919      3584      3583
        (Keyspace_name:table-2) jar water:117870082                    3584      3583
        (Keyspace_name:table-2) women slipper stylish:139073886        3584      3583
        (Keyspace_name:table-2) cute sweater:142723276                 3584      3583
        (Keyspace_name:table-2) hanging for lehenga:129722892          3584      3583
        (Keyspace_name:table-3) 16109061:1212                          3583      3578
        (Keyspace_name:table-3) 92633448:1092                          3583      3578
        (Keyspace_name:table-3) 61116124:1094                          3583      3578
        (Keyspace_name:table-2) oats weight gain:111341274             3583      3582
nodetool toppartitions -k 1000
nodetool: TopK count (-k) option must be smaller then the summary capacity (-s)
See 'nodetool help' or 'nodetool help <command>'.
even for -k 300 it was giving the same error:
TopK count (-k) option must be smaller then the summary capacity (-s)
 @avi: Increase -s, not -k. My mistake.
 @Chaitanya_Tondlekar: will check.
 @Guy: @Chaitanya_Tondlekar, did this solve the issue?
 @Chaitanya_Tondlekar: It seems maintenance is getting triggered for the nodes at the GCP level.
 @avi: I think you can tell them not to do it
 @Chaitanya_Tondlekar: @avi @Guy Need your help on this.
We initially thought the timings matched the maintenance that was happening, but even after stopping maintenance we are seeing queued reads and failed reads on a couple of nodes. Logs of the node where reads failed are printing "memory resources: timed out" from the reader concurrency semaphore.
Logs reference:
Mar 03 07:29:35 NODENAME scylla[64273]:  [shard 18:stmt] reader_concurrency_semaphore - (rate limiting dropped 47023 similar messages) Semaphore user with 87/100 count and 8878039/156153937 memory resources: timed out, dumping permit diagnostics:
                                                               permits        count        memory        table/operation/state
                                                               81        81        8M        orion.query_catalog_features_wide_table_7d/data-query/active/await
                                                               6        6        112K        orion.query_catalog_features_wide_table_7d/data-query/active/need_cpu
                                                               1698        0        0B        orion.query_catalog_features_wide_table_7d/data-query/waiting_for_admission
                                                               
                                                               1785        87        8M        total
                                                               
                                                               Stats:
                                                               permit_based_evictions: 0
                                                               time_based_evictions: 0
                                                               inactive_reads: 0
                                                               total_successful_reads: 1119067135
                                                               total_failed_reads: 14986903
                                                               total_reads_shed_due_to_overload: 0
                                                               total_reads_killed_due_to_kill_limit: 0
                                                               reads_admitted: 1120091691
                                                               reads_enqueued_for_admission: 496241894
                                                               reads_enqueued_for_memory: 0
                                                               reads_admitted_immediately: 637813929
                                                               reads_queued_because_ready_list: 215349766
                                                               reads_queued_because_need_cpu_permits: 241788276
                                                               reads_queued_because_memory_resources: 0
                                                               reads_queued_because_count_resources: 39103852
                                                               reads_queued_with_eviction: 0
                                                               total_permits: 1134055823
                                                               current_permits: 1785
                                                               need_cpu_permits: 87
                                                               awaits_permits: 81
                                                               disk_reads: 86
                                                               sstables_read: 301
@Botond_Dénes is it possible to help here?
@avi @dor
 @avi: Did you check for hot partitions?
 @Chaitanya_Tondlekar: it's happening now @avi. It happens mostly on two nodes only.
nodetool toppartitions
READS Sampler:
  Cardinality: ~256 (256 capacity)
  Top 10 partitions:
        Partition                                                                              Count       +/-
        (orion:query_catalog_features_wide_table_7d) cordset for women partywear:117568709       722       721
        (orion:query_catalog_features_wide_table_7d) golden hoops:111736619                      721       720
        (orion:query_catalog_features_wide_table_7d) s alphabet locket:23258415                  720       719
        (orion:query_catalog_features_wide_table_7d) slipper for women:47997260                  720       719
        (orion:query_catalog_features_wide_table_7d) vichitra silk blouse:124937828              720       719
        (orion:query_catalog_features_wide_table_7d) quran hindi tarjuma:133662915               720       719
        (orion:query_catalog_features_wide_table_7d) quran hindi tarjuma:160986329               720       719
        (orion:query_catalog_features_wide_table_7d) quran hindi tarjuma:94563420                720       719
        (orion:query_catalog_features_wide_table_7d) vichitra silk blouse:150518157              709       708
        (orion:query_catalog_features_wide_table_7d) chiffon suit:145947018                      707       706
WRITES Sampler:
  Cardinality: ~17 (256 capacity)
  Top 10 partitions:
        Partition                                                                       Count       +/-
        (orion:query_catalog_features_wide_table_7d) 8 year girl jean top:139824122         1         0
        (orion:query_catalog_features_wide_table_7d) white cricket jersey:138950176         1         0
        (orion:query_catalog_features_wide_table_7d) new born onesie:139550658              1         0
        (orion:query_catalog_features_wide_table_7d) butt acne cream:120467954              1         0
        (orion:query_catalog_features_wide_table_7d) blue potli:139158023                   1         0
        (orion:query_catalog_features_wide_table_7d) boys belt slipper:73637371             1         0
        (orion:query_catalog_features_wide_table_7d) boys cleeper:76472606                  1         0
        (orion:query_catalog_features_wide_table_7d) boys cleeper:76541341                  1         0
        (orion:query_catalog_features_wide_table_7d) sandvich toaster:119839650             1         0
        (orion:query_catalog_features_wide_table_7d) star bag:77320988                      1         0
nodetool toppartitions -s 50
READS Sampler:
  Cardinality: ~50 (50 capacity)
  Top 10 partitions:
        Partition                                                                         Count       +/-
        (orion:query_catalog_features_wide_table_7d) remote control dinosaur:59965119       780       779
        (orion:query_catalog_features_wide_table_7d) ayur cleansing milk:161480862          780       779
        (orion:query_catalog_features_wide_table_7d) remote control dinosaur:51730296       780       779
        (orion:query_catalog_features_wide_table_7d) remote control dinosaur:52279056       780       779
        (orion:query_catalog_features_wide_table_7d) camera stand:148754594                 779       778
        (orion:query_catalog_features_wide_table_7d) ayur cleansing milk:149095208          779       778
        (orion:query_catalog_features_wide_table_7d) remote control dinosaur:59786513       779       778
        (orion:query_catalog_features_wide_table_7d) camera stand:161722912                 773       772
        (orion:query_catalog_features_wide_table_7d) ayur cleansing milk:141424544          763       762
        (orion:query_catalog_features_wide_table_7d) herbal green powder:106803292          756       754
WRITES Sampler:
  Cardinality: ~29 (50 capacity)
  Top 10 partitions:
        Partition                                                                        Count       +/-
        (orion:query_catalog_features_wide_table_7d) 0.9 pen pencil:141488211                1         0
        (orion:query_catalog_features_wide_table_7d) full socks for women:145342917          1         0
        (orion:query_catalog_features_wide_table_7d) islamic long kaftan:127529143           1         0
        (orion:query_catalog_features_wide_table_7d) yoga bar oats 1kg:68171629              1         0
        (orion:query_catalog_features_wide_table_7d) kamar ka:143789837                      1         0
        (orion:query_catalog_features_wide_table_7d) frock kids:133173284                    1         0
        (orion:query_catalog_features_wide_table_7d) love ring:85647398                      1         0
        (orion:query_catalog_features_wide_table_7d) stylish anklet:117106672                1         0
        (orion:query_catalog_features_wide_table_7d) wall transparent hook:111889512         1         0
        (orion:query_catalog_features_wide_table_7d) corset top for women:146186383          1         0
@avi what can be done over here?
tried with
nodetool toppartitions -s 50
Mar 03 14:48:07 scyl-dsci-ml-orionnew2-prd-ase1 scylla[64273]:  [shard 18:stmt] reader_concurrency_semaphore - (rate limiting dropped 33126 similar messages) Semaphore user with 100/100 count and 7983993/156153937 memory resources: timed out, dumping permit diagnostics:
                                                               permits        count        memory        table/operation/state
                                                               98        98        7748K        orion.query_catalog_features_wide_table_7d/data-query/active/await
                                                               2        2        49K        orion.query_catalog_features_wide_table_7d/data-query/active/need_cpu
                                                               1267        0        0B        orion.query_catalog_features_wide_table_7d/data-query/waiting_for_admission
                                                               
                                                               1367        100        7797K        total
                                                               
                                                               Stats:
                                                               permit_based_evictions: 0
                                                               time_based_evictions: 0
                                                               inactive_reads: 0
                                                               total_successful_reads: 1250681823
                                                               total_failed_reads: 28598835
                                                               total_reads_shed_due_to_overload: 0
                                                               total_reads_killed_due_to_kill_limit: 0
                                                               reads_admitted: 1252630246
                                                               reads_enqueued_for_admission: 623816669
                                                               reads_enqueued_for_memory: 0
                                                               reads_admitted_immediately: 655465356
                                                               reads_queued_because_ready_list: 229330142
                                                               reads_queued_because_need_cpu_permits: 319931533
                                                               reads_queued_because_memory_resources: 0
                                                               reads_queued_because_count_resources: 74554994
                                                               reads_queued_with_eviction: 0
                                                               total_permits: 1279282025
                                                               current_permits: 1367
                                                               need_cpu_permits: 100
                                                               awaits_permits: 98
                                                               disk_reads: 100
                                                               sstables_read: 349
 @Botond_Dénes: The disk on the node is not keeping up with requests, so you have a lot of queued reads and timeouts.
 @Chaitanya_Tondlekar: yes, but it keeps happening all of a sudden, and only on 2 nodes. It spikes in queued reads and failed reads.
 @Botond_Dénes: You possibly have a spike in load (check the requests count in the replica metrics on the dashboard) or possibly a spike in requests for certain partitions.
 @Chaitanya_Tondlekar: partition hits seem the same on all nodes, even on the nodes where queued reads and failed reads occur.
There’s no sudden increase in QPS.
I have shared the toppartitions output from the affected nodes. Can we get any help from that?
@Botond_Dénes
 @Botond_Dénes: partition load seems pretty uniform
do partitions vary in size?
 @Chaitanya_Tondlekar: how do we get the partition sizes?
 @Botond_Dénes: is there anything in the logs about large partitions?
how about system.large_partitions?
 @Chaitanya_Tondlekar: Logs are printing only what we have posted here.
cassandra@cqlsh> select * from system.large_partitions;
 keyspace_name | table_name | sstable_name | partition_size | partition_key | compaction_time | dead_rows | range_tombstones | rows
---------------+------------+--------------+----------------+---------------+-----------------+-----------+------------------+------
(0 rows)
cassandra@cqlsh> 
logs are only printing the semaphore messages
@Botond_Dénes anything else we can check now? We are seeing this right now and can check things actively.
 @Botond_Dénes: check CPU load metric with shard resolution
do you see some shards pegged at 100% CPU load?
 @Chaitanya_Tondlekar:
"check CPU load metric with shard resolution" –> where can we get this?
we are checking load metrics from the detailed dashboard and the load seems even.
 @Botond_Dénes: Change "by Instance" to "by Shard"
 @Chaitanya_Tondlekar: shard 18 is at 100%
 @Botond_Dénes: Is this the shard that is logging the semaphore timeouts?
I see both reports from above are from shard 18
So you do have some partitions which are hotter
they live on shard 18
 @Chaitanya_Tondlekar: Got it
@Botond_Dénes how do we solve hot partition issues?
 @Botond_Dénes: Two ways to solve it:
• change data model to split the partitions
• scale up/out so hopefully the hot partitions are spread between multiple shards/nodes
there is a 3rd way: change clients so they don’t hammer a subset of partitions – this is not always possible
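(A minimal sketch of the first option. It assumes a hypothetical table keyed by the query text; the real schema of orion.query_catalog_features_wide_table_7d is not shown in this thread. Adding a small bucket component to the partition key spreads one hot logical key over several partitions, and therefore over multiple shards/nodes, at the cost of reads fanning out over the buckets.)
package main

import "github.com/gocql/gocql"

// Hypothetical illustration of splitting a hot partition with a bucket
// component. The table and columns below are made up.
//
//   CREATE TABLE orion.query_features_bucketed (
//       query_text text,
//       bucket     int,
//       catalog_id bigint,
//       features   blob,
//       PRIMARY KEY ((query_text, bucket), catalog_id)
//   );

const numBuckets = 8

// bucketFor picks the bucket for a row, spreading one logical key over
// numBuckets partitions (and therefore over multiple shards/nodes).
func bucketFor(catalogID int64) int {
	return int(catalogID % numBuckets)
}

// readAll reads one logical key back by fanning out over every bucket and
// merging the rows; this extra fan-out is the cost of splitting the partition.
func readAll(session *gocql.Session, queryText string) ([]int64, error) {
	var ids []int64
	for b := 0; b < numBuckets; b++ {
		iter := session.Query(
			`SELECT catalog_id FROM orion.query_features_bucketed
			 WHERE query_text = ? AND bucket = ?`, queryText, b).Iter()
		var id int64
		for iter.Scan(&id) {
			ids = append(ids, id)
		}
		if err := iter.Close(); err != nil {
			return nil, err
		}
	}
	return ids, nil
}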
 @Chaitanya_Tondlekar: "change clients" – I didn't understand that
 @Botond_Dénes: change clients – change application code
 @Chaitanya_Tondlekar: Oh got it
 @avi: The capacity for nodetool toppartitions is too low, use the -s parameter.
 @Chaitanya_Tondlekar: Yes we have used -s only
@Botond_Dénes Is there any way we can increase the value for reader_concurrency_semaphore in scylla.yaml?
reader_concurrency_semaphore_limit 
 @Botond_Dénes: No way to increase the limit, but it would be pointless even if it were possible. If the shard is overloaded, any limit will be reached eventually, because the client sends more requests than the node can retire, so the queue keeps growing forever.
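(One way to act on that point from the client side: bound the number of in-flight reads in the application, so excess load is shed or waits there instead of growing the replica's admission queue. A rough, hypothetical sketch using a buffered channel as a counting semaphore; nothing here is from the thread.)
package main

import (
	"context"
	"errors"

	"github.com/gocql/gocql"
)

// limitedReader caps the number of reads this client keeps in flight against
// Scylla. When the cap is hit, extra requests fail fast here instead of
// piling up behind the reader concurrency semaphore on the replica.
type limitedReader struct {
	session *gocql.Session
	slots   chan struct{} // buffered channel used as a counting semaphore
}

func newLimitedReader(s *gocql.Session, maxInFlight int) *limitedReader {
	return &limitedReader{session: s, slots: make(chan struct{}, maxInFlight)}
}

var errOverloaded = errors.New("client-side read limit reached")

func (r *limitedReader) scanOne(ctx context.Context, stmt string, dest interface{}, args ...interface{}) error {
	select {
	case r.slots <- struct{}{}: // acquire a slot
	default:
		return errOverloaded // shed the read instead of queueing it forever
	}
	defer func() { <-r.slots }() // release the slot when the read finishes

	return r.session.Query(stmt, args...).WithContext(ctx).Scan(dest)
}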
 @Chaitanya_Tondlekar: Can you elaborate the third approach of changing clients ? How we can do that ?
 @Botond_Dénes: I cannot give you concrete advice, I don’t know anything about your application. I just mentioned the possibility – sometimes it may be possible to change the application to not request some partition much more often than other partitions. This may not always be possible e.g. if the requests are generated by users who you can’t control and some partitions are just more popular than others.
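(One purely illustrative shape this can take when the same hot keys are read many times concurrently, as the toppartitions output suggests: coalesce duplicate in-flight reads so a burst of requests for one popular key becomes a single query against Scylla. The sketch below uses golang.org/x/sync/singleflight; the table's column names are assumptions, not the real schema.)
package main

import (
	"fmt"

	"github.com/gocql/gocql"
	"golang.org/x/sync/singleflight"
)

// hotKeyReader collapses concurrent reads of the same partition key into one
// query, so a burst of requests for a popular key costs the replica a single
// read instead of hundreds.
type hotKeyReader struct {
	session *gocql.Session
	group   singleflight.Group
}

func (r *hotKeyReader) features(queryText string, catalogID int64) ([]byte, error) {
	key := fmt.Sprintf("%s:%d", queryText, catalogID)
	v, err, _ := r.group.Do(key, func() (interface{}, error) {
		var blob []byte
		// Column names are hypothetical; adjust to the real table definition.
		err := r.session.Query(
			`SELECT features FROM orion.query_catalog_features_wide_table_7d
			 WHERE query_text = ? AND catalog_id = ?`, queryText, catalogID).Scan(&blob)
		return blob, err
	})
	if err != nil {
		return nil, err
	}
	return v.([]byte), nil
}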