Getting address already in use error, number of client side connections

Guy · June 12, 2025, 10:26am

Originally from the User Slack

@Terence_Liu: I’m having a lot of trouble getting our Istio service mesh to coexist with the Scylla Operator and a ScyllaCluster. Is there a guide that helps set this up correctly?

@Terence_Liu: We’ve excluded a few ports on the operator and the ScyllaCluster, notably 7000 (node-to-node communication) and 8080 (used by scylladb-api-status-probe). The service is up, and clients can read from it. But once writes reach roughly 500 to 1000 reqs/s, scylladb-api-status-probe starts to report errors, and the Istio sidecar starts logging many messages like
Request to probe app failed: Get "http://100.122.13.158:8080/healthz": dial tcp 127.0.0.6:0->100.122.13.158:8080: bind: address already in use, original URL path = /app-health/scylla/livez
app URL path = /healthz
This makes the scylla livenessProbe fail after 12 tries, which restarts Scylla. As a result, my 3-node cluster goes into a rolling wave of restarts. When no writes are happening, some reads seem totally fine.
This is the pod-level manifest. You can see from ISTIO_KUBE_APP_PROBERS that our Istio sidecar remaps these health endpoints (notably the ones on 8080) to
/app-health/{container_name}/{startupz,readyz,livez}
on port 15020.

Better view
{'/app-health/scylla-manager-agent/readyz': {'tcpSocket': {'port': 10001},
                                             'timeoutSeconds': 1},
 '/app-health/scylla/livez': {'httpGet': {'path': '/healthz',
                                          'port': 8080,
                                          'scheme': 'HTTP'},
                              'timeoutSeconds': 10},
 '/app-health/scylla/readyz': {'httpGet': {'path': '/readyz',
                                           'port': 8080,
                                           'scheme': 'HTTP'},
                               'timeoutSeconds': 30},
 '/app-health/scylla/startupz': {'httpGet': {'path': '/healthz',
                                             'port': 8080,
                                             'scheme': 'HTTP'},
                                 'timeoutSeconds': 30},
 '/app-health/scylladb-api-status-probe/readyz': {'tcpSocket': {'port': 8080},
                                                  'timeoutSeconds': 30},
 '/app-health/scylladb-ignition/readyz': {'httpGet': {'path': '/readyz',
                                                      'port': 42081,
                                                      'scheme': 'HTTP'},
                                          'timeoutSeconds': 30}}
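
To see what the sidecar rewrote without reading the raw value by eye, ISTIO_KUBE_APP_PROBERS can be parsed as JSON. This is a minimal Python sketch, assuming the variable holds a JSON object shaped like the view above. The names describe_probers and SAMPLE are hypothetical and only for illustration. SAMPLE copies two entries from the post, and 15020 is the port mentioned there.

```python
import json

STATUS_PORT = 15020  # port the post says the rewritten probes are served on

# Two entries copied from the post, written as JSON (assumed shape of the env var).
SAMPLE = json.dumps({
    "/app-health/scylla-manager-agent/readyz": {
        "tcpSocket": {"port": 10001}, "timeoutSeconds": 1},
    "/app-health/scylla/livez": {
        "httpGet": {"path": "/healthz", "port": 8080, "scheme": "HTTP"},
        "timeoutSeconds": 10},
})


def describe_probers(raw: str) -> list[str]:
    """Return one line per rewritten probe: sidecar path -> original app probe."""
    probers = json.loads(raw)
    lines = []
    for rewritten_path, probe in sorted(probers.items()):
        if "httpGet" in probe:
            http = probe["httpGet"]
            target = "HTTP GET :{}{}".format(http["port"], http["path"])
        elif "tcpSocket" in probe:
            target = "TCP connect :{}".format(probe["tcpSocket"]["port"])
        else:
            target = "unknown probe type"
        timeout = probe.get("timeoutSeconds")
        lines.append(
            ":{}{} -> {} (timeout {}s)".format(STATUS_PORT, rewritten_path, target, timeout)
        )
    return lines


if __name__ == "__main__":
    for line in describe_probers(SAMPLE):
        print(line)
```

Inside the pod, the same function could be fed os.environ["ISTIO_KUBE_APP_PROBERS"] from the sidecar container instead of SAMPLE.
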
We figured it out: we had opened too many CQL connections from the client side, about 3000 to a 3-node Scylla cluster. After consolidating and trimming them, the cluster became stable under writes.
The excessive connections were a strain not only on the cluster but, more importantly, on our Istio service mesh.
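
One way to check how many client connections a node actually holds is to count ESTABLISHED sockets on the CQL port. This is a rough Python sketch, not something from the thread. It assumes Linux, IPv4 sockets listed in /proc/net/tcp (IPv6 sockets appear in /proc/net/tcp6), and the default CQL port 9042, and it has to be run inside the pod's network namespace. The names count_established and SAMPLE are hypothetical, and SAMPLE is made-up data in the /proc/net/tcp layout.

```python
TCP_ESTABLISHED = "01"  # state code for ESTABLISHED in /proc/net/tcp
CQL_PORT = 9042         # assumption: default CQL native transport port

# Made-up sample in the /proc/net/tcp layout (header line plus four sockets).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:2352 0600007F:C350 01 00000000:00000000
   1: 0100007F:2352 0600007F:C351 01 00000000:00000000
   2: 0100007F:2352 0600007F:C352 06 00000000:00000000
   3: 00000000:1F90 00000000:0000 0A 00000000:00000000
"""


def count_established(proc_net_tcp: str, local_port: int) -> int:
    """Count ESTABLISHED sockets whose local port equals local_port."""
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or malformed lines
        local_address, state = fields[1], fields[3]
        port = int(local_address.rsplit(":", 1)[1], 16)  # port is hex-encoded
        if port == local_port and state == TCP_ESTABLISHED:
            count += 1
    return count


if __name__ == "__main__":
    print(count_established(SAMPLE, CQL_PORT))
```

On a real node the input would be the contents of /proc/net/tcp rather than SAMPLE. For the numbers in this thread, about 3000 connections spread evenly over 3 nodes would come to roughly 1000 per node.
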
