Originally from the User Slack
@Sujay_KS: hey, I’m facing frequent issues in a 3-node ScyllaDB cluster setup (we are running it as Docker containers). This happens very frequently; is there any resolution for this issue?
INFO 2025-01-13 04:36:38,950 [shard 2] reader_concurrency_semaphore - (rate limiting dropped 49 similar messages) Semaphore _read_concurrency_sem with 55/100 count and 13255581/147597557 memory resources: timed out, dumping permit diagnostics:
permits  count  memory  table/description/state
53       53     12M     temporal.executions/data-query/active/blocked
2        2      276K    temporal.executions/data-query/active/used
1        0      0B      system_auth.roles/data-query/waiting
4        0      0B      system.paxos/data-query/waiting
84       0      0B      temporal.executions/data-query/waiting

144      55     13M     total
@Botond_Dénes: Looks like your cluster is overloaded. Have a look at monitoring, you will probably see high reactor load.
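One quick way to check reactor load without a full monitoring stack is to query a node’s Prometheus metrics endpoint directly. This is an illustrative check only: it assumes the default metrics port 9180 is reachable on the node, and metric names can vary between versions.

# Per-shard reactor utilization (0-100); sustained values near 100 mean the shard's CPU is saturated
curl -s http://10.68.12.98:9180/metrics | grep scylla_reactor_utilization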
@Sujay_KS: yeah, memory usage was pretty high, will check this from the application side
row_cache_size_in_mb
can this flag help in reducing cache overload?
@Botond_Dénes: No, this option is not implemented (it is there for Cassandra backwards-compatibility).
ScyllaDB always tries to use all of the memory, that is not the problem here. The cluster seems to be overloaded on the CPU and possibly on the I/O level.
@Sujay_KS: can we limit the memory usage? It also looks like it’s using a lot of memory without flushing (32 GB fully used)
@Botond_Dénes: Memory usage can be limited by the --memory command-line parameter. By default ScyllaDB will use all available memory, minus the reserve (set by --reserve-memory).
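For example, when the node runs as a Docker container, these flags can be appended to the command passed to the image. The sizes below are purely illustrative; pick limits that fit the host:

# Limit ScyllaDB to 16 GiB and keep 2 GiB reserved for the OS (illustrative values)
command: --developer-mode=0 --memory 16G --reserve-memory 2G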
@Sujay_KS:
version: "3"
services:
  scylla-node3:
    container_name: scylla-node3
    image: scylladb/scylla:5.2.4
    restart: always
    command: --developer-mode=0 --seeds 10.68.10.238,10.68.11.197 --broadcast-rpc-address 10.68.12.98 --broadcast-address 10.68.12.98
    volumes:
      - "/var/lib/scylla:/var/lib/scylla"
      - "./scylla/scylla.yaml:/etc/scylla/scylla.yaml"
      - "./scylla/mutant-data.txt:/mutant-data.txt"
    ports:
      - "7000:7000"
      - "7001:7001"
      - "9042:9042"
      - "10000:10000"
can you please review this and point out any abnormality which could cause this issue?
@Botond_Dénes: I don’t see anything obviously wrong here. To resolve the issue, you will probably either have to reduce load or scale out/up the cluster.
@avi: If it’s always shard 2, then it can be a hot partition. Use nodetool toppartitions to track down which one.
@Sujay_KS: is there any command or query I can use to check what is overloading Scylla and spiking CPU usage?
@Botond_Dénes: Unfortunately no (although we should definitely have such a command). The above command (nodetool toppartitions) is your best bet: it will tell you which partition is queried the most and show you whether any partition is hot.
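A sketch of what running it against this setup could look like, assuming the keyspace/table from the permit diagnostics above (temporal.executions) and the container name from the compose file; the sampling duration is in milliseconds, and the exact argument form can differ between ScyllaDB versions:

# Sample operations for 10 seconds and report the hottest partitions of temporal.executions
docker exec -it scylla-node3 nodetool toppartitions temporal executions 10000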