Reader_concurrency_semaphore: Multiprocessing, timeout, CPU overload 100%

Hi all. Can you give me some advice?

[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]

docker-compose exec scylla_node scylla --version
5.2.7-0.20230821.e0ebc95025d1

I'm using multiprocessing + asyncio to run non-blocking requests in parallel. The database queries are executed repeatedly in a loop.

Each process has its own cluster + session. Everything works fine for a while, but then the SELECT queries start failing:

Server error: code=1200 [Coordinator node timed out waiting for responses from replica nodes] message="Operation timed out for directory.candles - only 0 responses received from 1 CL=LOCAL_ONE." info={'consistency': 'LOCAL_ONE', 'required_responses': 1, 'received_responses': 0}

ScyllaDB logs:

scylla_node      | 2024-02-16T18:45:35.215232875Z INFO  2024-02-16 18:45:35,214 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 1713 similar messages) Semaphore _read_concurrency_sem with 100/100 count and 4311954/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:45:35.215255669Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:45:35.215258548Z 99	99	4192K	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:45:35.215260182Z 1	1	19K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:45:35.215261757Z 13928	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:45:35.215263195Z
scylla_node      | 2024-02-16T18:45:35.215264520Z 14028	100	4211K	total
scylla_node      | 2024-02-16T18:45:35.215265949Z
scylla_node      | 2024-02-16T18:45:35.215267281Z Total: 14028 permits with 100 count and 4211K memory resources
scylla_node      | 2024-02-16T18:45:35.215268677Z
scylla_node      | 2024-02-16T18:46:05.218802220Z INFO  2024-02-16 18:46:05,218 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 3386 similar messages) Semaphore _read_concurrency_sem with 100/100 count and 2120008/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:46:05.218820925Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:46:05.218823246Z 99	99	2048K	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:46:05.218824812Z 1	1	22K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:46:05.218826303Z 7870	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:46:05.218827761Z
scylla_node      | 2024-02-16T18:46:05.218829049Z 7970	100	2070K	total
scylla_node      | 2024-02-16T18:46:05.218837258Z
scylla_node      | 2024-02-16T18:46:05.218838734Z Total: 7970 permits with 100 count and 2070K memory resources
scylla_node      | 2024-02-16T18:46:05.218840109Z
scylla_node      | 2024-02-16T18:46:35.231726618Z INFO  2024-02-16 18:46:35,231 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 3704 similar messages) Semaphore _read_concurrency_sem with 98/100 count and 2080320/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:46:35.231762714Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:46:35.231764898Z 97	97	2007K	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:46:35.231766465Z 1	1	25K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:46:35.231767911Z 541	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:46:35.231769333Z
scylla_node      | 2024-02-16T18:46:35.231770741Z 639	98	2032K	total
scylla_node      | 2024-02-16T18:46:35.231772107Z
scylla_node      | 2024-02-16T18:46:35.231773430Z Total: 639 permits with 98 count and 2032K memory resources
scylla_node      | 2024-02-16T18:46:35.231774848Z
scylla_node      | 2024-02-16T18:48:39.657502001Z INFO  2024-02-16 18:48:39,656 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 202 similar messages) Semaphore _read_concurrency_sem with 100/100 count and 2730530/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:48:39.657517001Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:48:39.657519135Z 99	99	2643K	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:48:39.657521110Z 1	1	23K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:48:39.657522614Z 13542	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:48:39.657524048Z
scylla_node      | 2024-02-16T18:48:39.657525471Z 13642	100	2667K	total
scylla_node      | 2024-02-16T18:48:39.657526874Z
scylla_node      | 2024-02-16T18:48:39.657528218Z Total: 13642 permits with 100 count and 2667K memory resources
scylla_node      | 2024-02-16T18:48:39.657529608Z
scylla_node      | 2024-02-16T18:49:09.660815001Z INFO  2024-02-16 18:49:09,659 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 3834 similar messages) Semaphore _read_concurrency_sem with 100/100 count and 2490677/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:49:09.660830225Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:49:09.660832250Z 96	96	2M	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:49:09.660833827Z 4	4	446K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:49:09.660835255Z 14723	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:49:09.660836701Z
scylla_node      | 2024-02-16T18:49:09.660838136Z 14823	100	2432K	total
scylla_node      | 2024-02-16T18:49:09.660839841Z
scylla_node      | 2024-02-16T18:49:09.660841341Z Total: 14823 permits with 100 count and 2432K memory resources
scylla_node      | 2024-02-16T18:49:09.660846879Z
scylla_node      | 2024-02-16T18:49:39.696854632Z INFO  2024-02-16 18:49:39,695 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 3459 similar messages) Semaphore _read_concurrency_sem with 97/100 count and 2051096/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:49:39.696872216Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:49:39.696874342Z 96	96	2M	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:49:39.696875936Z 1	1	17K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:49:39.696877410Z 15861	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:49:39.696878964Z
scylla_node      | 2024-02-16T18:49:39.696880298Z 15958	97	2003K	total
scylla_node      | 2024-02-16T18:49:39.696881662Z
scylla_node      | 2024-02-16T18:49:39.696882960Z Total: 15958 permits with 97 count and 2003K memory resources
scylla_node      | 2024-02-16T18:49:39.696884320Z
scylla_node      | 2024-02-16T18:50:09.710129016Z INFO  2024-02-16 18:50:09,709 [shard 0] reader_concurrency_semaphore - (rate limiting dropped 3087 similar messages) Semaphore _read_concurrency_sem with 97/100 count and 2907535/170750115 memory resources: timed out, dumping permit diagnostics:
scylla_node      | 2024-02-16T18:50:09.710183713Z permits	count	memory	table/description/state
scylla_node      | 2024-02-16T18:50:09.710185973Z 92	92	1903K	catalog.candles/data-query/inactive
scylla_node      | 2024-02-16T18:50:09.710187518Z 5	5	936K	catalog.candles/data-query/active/used
scylla_node      | 2024-02-16T18:50:09.710188918Z 8056	0	0B	catalog.candles/data-query/waiting
scylla_node      | 2024-02-16T18:50:09.710190339Z
scylla_node      | 2024-02-16T18:50:09.710191737Z 8153	97	2839K	total
scylla_node      | 2024-02-16T18:50:09.710193132Z
scylla_node      | 2024-02-16T18:50:09.710194475Z Total: 8153 permits with 97 count and 2839K memory resources
scylla_node      | 2024-02-16T18:50:09.710195897Z

Scylla runs on a single node with a simple configuration (--memory=8G, --smp=1). I checked the load on the server and it has enough free resources, but the Scylla container is constantly running at 100+% CPU.

If I run the client in only one process, everything works without errors, and CPU load is 90+%.

Do I understand correctly that the problem is the use of multiprocessing combined with an incorrect configuration? Do I need to allocate enough CPU that the load on the database does not exceed 100%?

It seems you are overloading your node. You need to either reduce the load (use a single loader process) or give ScyllaDB more than one CPU so it can spread the load.
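
If you want to keep several processes, another way to reduce load may be to cap how many queries each process has in flight. Below is a rough sketch of that idea using `asyncio.Semaphore`. The names `run_query` and `run_bounded` and the limit of 32 are made up for illustration. `run_query` only sleeps and stands in for the real driver call, so treat this as a starting point under those assumptions, not a tested fix for this setup.

```python
import asyncio

# Assumed cap on concurrent queries per process; tune it against what the node can serve.
MAX_IN_FLIGHT = 32


async def run_query(i: int) -> int:
    """Hypothetical stand-in for one SELECT; replace with the real driver call."""
    await asyncio.sleep(0.001)
    return i


async def run_bounded(n_queries: int, limit: int = MAX_IN_FLIGHT):
    """Run n_queries tasks, never letting more than `limit` run at once."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest concurrency observed, for verification only

    async def one(i: int) -> int:
        nonlocal in_flight, peak
        async with sem:  # waits here once `limit` queries are already running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await run_query(i)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(one(i) for i in range(n_queries)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded(1000))
    print(f"completed={len(results)} peak_in_flight={peak}")
```

Note that the cap applies per process, so the node sees roughly the number of processes times the limit. In the log above, shard 0 has all 100 permits taken and thousands of reads waiting, so the combined client concurrency probably needs to be well below what it is now.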