Multi-DC cluster: observing 100% CPU when connecting ScyllaDB to the Kafka source connector to read CDC logs

amitesh88 · February 5, 2024, 8:00pm

We have a multi-DC setup with 3 nodes in DC1 and 3 in DC2. Both DCs are in the same region and the same subnet; we split them this way because we require separate clusters for reading and writing data.

Our setup creates a new table with CDC enabled in DC1 every day at midnight.

At the same time, we also create a Kafka source connector every day to consume the CDC logs from DC2.
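As a rough illustration of the daily table creation described above, the DDL could be generated as in the sketch below. The keyspace `events_ks`, the `events_YYYYMMDD` naming scheme, and the two columns are hypothetical placeholders, not taken from the actual deployment; only the `WITH cdc = {'enabled': true}` clause reflects how CDC is switched on for a ScyllaDB table.

```python
from datetime import date


def daily_cdc_table_ddl(day: date, keyspace: str = "events_ks", prefix: str = "events") -> str:
    """Build a CREATE TABLE statement for one day's table with CDC enabled.

    keyspace, prefix and the column list are hypothetical placeholders.
    """
    table = f"{prefix}_{day:%Y%m%d}"
    return (
        f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} ("
        "id uuid PRIMARY KEY, payload text"
        ") WITH cdc = {'enabled': true}"
    )


if __name__ == "__main__":
    # Statement for the table created on the day the thread was opened.
    print(daily_cdc_table_ddl(date(2024, 2, 5)))
```

The resulting string can be passed to whatever CQL client the scheduler already uses.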

Issue:
At around midnight, when the new source connector is created, we observe the ScyllaDB servers in DC2 consuming 100% CPU.

We increased the CPU from 16 cores to 32, but the behavior is still the same.
Once the Kafka connector creates its topic and starts reading data from the CDC log, ScyllaDB CPU usage cools down.

Logs:

The logs in syslog show reader_concurrency_semaphore messages for that time period.

Any expert thoughts are appreciated.
Thanks in advance.

roy · February 12, 2024, 1:10am

CPU utilization by itself isn't always an indication of a problem; however, the reader_concurrency_semaphore messages may be a stronger indication.

For how long do you see this high utilization and the reader_concurrency_semaphore errors?

Also, how exactly is your CDC configured?

Full logs would be more helpful, but it sounds like a better place for them would be a GitHub issue.

amitesh88 · February 12, 2024, 2:09am

Thanks a lot for responding.

I have already raised this on GitHub.

Here is the link, along with the logs and other necessary details:

github.com/scylladb/scylla-cdc-source-connector

MultiDC cluster, observing 100% CPU when connecting kafka debezium source conector

opened 08:23PM - 05 Feb 24 UTC

amitesh88

The reader_concurrency_semaphore log messages only appear for about 15 minutes when a new Kafka Debezium source connector spawns. CPU also goes high during that time.
The exact details are available in the link; please go through it and help in understanding the exact issue.

amitesh88 · February 12, 2024, 6:50pm

We managed to reduce the 15 minutes of high CPU by tweaking scylla.query.time.window.size in the Kafka source connector, but we still have no clue about the 100% CPU utilization.
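For reference, a minimal sketch of where this setting sits in a connector definition. Only `scylla.query.time.window.size` comes from this thread; the other property names follow the connector's README as best I recall and should be checked against the version in use. The contact point, logical name, DC name, and the window value itself are hypothetical placeholders, since the value that was actually chosen is not stated here.

```python
import json


def build_connector_config(table: str, window_ms: int) -> dict:
    """Assemble a Kafka Connect payload for the Scylla CDC source connector.

    table is 'keyspace.table'; window_ms is the query time window in
    milliseconds. Addresses and names below are illustrative placeholders.
    """
    return {
        "name": f"scylla-cdc-{table.replace('.', '-')}",
        "config": {
            "connector.class": "com.scylladb.cdc.debezium.connector.ScyllaConnector",
            "scylla.name": "dc2-cdc",                        # hypothetical logical name
            "scylla.cluster.ip.addresses": "10.0.0.1:9042",  # hypothetical contact point
            "scylla.local.dc": "dc2",                        # assumed DC name
            "scylla.table.names": table,
            # The setting tweaked in this thread; Kafka Connect expects string values.
            "scylla.query.time.window.size": str(window_ms),
        },
    }


if __name__ == "__main__":
    payload = build_connector_config("events_ks.events_20240205", window_ms=60000)
    print(json.dumps(payload, indent=2))
```

The printed JSON is the kind of body that would be posted to the Kafka Connect REST API when the daily connector is created.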

We also observed that, with our 6-node DC2 cluster and 4-node Kafka connector, there is a total of 40,000 active connections on each ScyllaDB node.
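To put that number in perspective, here is a back-of-the-envelope breakdown. It rests on two assumptions that are not confirmed in this thread: each ScyllaDB node runs one shard per core (32 shards after the CPU upgrade), and a shard-aware driver session opens roughly one connection per shard on each node. Under those assumptions, the connection count divided by workers and shards gives an estimate of how many driver sessions each connector worker holds.

```python
def connections_breakdown(total_per_node: int, connector_workers: int, shards_per_node: int) -> dict:
    """Split a per-node connection count by connector worker and by shard.

    Assumes connections are spread evenly across workers and that each
    driver session opens about one connection per shard (an assumption).
    """
    per_worker = total_per_node / connector_workers
    sessions_per_worker = per_worker / shards_per_node
    return {
        "per_worker": per_worker,
        "sessions_per_worker_estimate": sessions_per_worker,
    }


if __name__ == "__main__":
    # 40,000 connections per node and 4 connector nodes are from the thread;
    # 32 shards per node is an assumption based on the 32-core machines.
    print(connections_breakdown(40000, connector_workers=4, shards_per_node=32))
```

If the estimate is in the right ballpark, several hundred sessions per worker would suggest that sessions accumulate across the daily connectors rather than being shared or closed, which may be worth checking.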

Any clue about this?
