Hi, I’ve tried to deploy ScyllaDB to AWS EKS with both the YAML manifests and the Helm charts, but I’m running into this error:

```
Internal error occurred: failed calling webhook "webhook.scylla.scylladb.com": failed to call webhook: Post "https://scylla-operator-webhook.scylla-operator.svc:443/validate?timeout=10s": context deadline exceeded
```
Deployment steps with the manifests:

```
git clone git@github.com:scylladb/scylla-operator.git
cd scylla-operator
# cert-manager was already deployed when the EKS cluster was created, so that step is skipped.
kubectl apply -f deploy/operator.yaml
kubectl wait --for condition=established crd/scyllaclusters.scylla.scylladb.com \
  && kubectl -n scylla-operator rollout status deployment.apps/scylla-operator
kubectl -n scylla-operator logs deployment.apps/scylla-operator
```
```
Found 2 pods, using pod/scylla-operator-5c8cb676d9-mbfcq
I0731 10:16:54.692443 1 operator/cmd.go:21] maxprocs: Leaving GOMAXPROCS=[4]: CPU quota undefined
I0731 10:16:54.692905 1 operator/operator.go:202] operator version "v1.14.0-alpha.0-87-g6d65216"
I0731 10:16:54.692928 1 flag/flags.go:64] FLAG: --burst="75"
I0731 10:16:54.692933 1 flag/flags.go:64] FLAG: --concurrent-syncs="50"
I0731 10:16:54.692936 1 flag/flags.go:64] FLAG: --cqls-ingress-port="0"
I0731 10:16:54.692941 1 flag/flags.go:64] FLAG: --crypto-key-buffer-delay="200ms"
I0731 10:16:54.692947 1 flag/flags.go:64] FLAG: --crypto-key-buffer-size-max="30"
I0731 10:16:54.692951 1 flag/flags.go:64] FLAG: --crypto-key-buffer-size-min="10"
I0731 10:16:54.692955 1 flag/flags.go:64] FLAG: --feature-gates=""
I0731 10:16:54.692985 1 flag/flags.go:64] FLAG: --help="false"
I0731 10:16:54.692997 1 flag/flags.go:64] FLAG: --image="docker.io/scylladb/scylla-operator:latest"
I0731 10:16:54.693003 1 flag/flags.go:64] FLAG: --kubeconfig=""
I0731 10:16:54.693007 1 flag/flags.go:64] FLAG: --leader-election-lease-duration="1m0s"
I0731 10:16:54.693013 1 flag/flags.go:64] FLAG: --leader-election-renew-deadline="35s"
I0731 10:16:54.693017 1 flag/flags.go:64] FLAG: --leader-election-retry-period="10s"
I0731 10:16:54.693020 1 flag/flags.go:64] FLAG: --loglevel="2"
I0731 10:16:54.693025 1 flag/flags.go:64] FLAG: --namespace="scylla-operator"
I0731 10:16:54.693030 1 flag/flags.go:64] FLAG: --qps="50"
I0731 10:16:54.693035 1 flag/flags.go:64] FLAG: --v="2"
I0731 10:16:54.693258 1 leaderelection/leaderelection.go:100] Starting leader election
I0731 10:16:54.693278 1 leaderelection/leaderelection.go:250] attempting to acquire leader lease scylla-operator/scylla-operator-lock...
```
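The log only shows the operator starting leader election; it says nothing about the component that serves the webhook. Before looking at firewalls, it may be worth confirming that whatever backs the webhook Service is running and Ready. A minimal sketch with plain kubectl; the function name is hypothetical and the namespace default comes from the error message:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show everything running in the operator namespace.
inspect_operator_namespace() {
  local ns=${1:-scylla-operator}
  # Deployments and Pods, with pod IPs and the nodes they landed on.
  kubectl -n "$ns" get deployments,pods -o wide
  # Recent events, e.g. scheduling failures, image pull errors, failing probes.
  kubectl -n "$ns" get events --sort-by=.lastTimestamp
}

# Usage: inspect_operator_namespace    # defaults to the scylla-operator namespace
```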
```
kubectl create -f examples/generic/cluster.yaml
Error from server (InternalError): error when creating "examples/generic/cluster.yaml": Internal error occurred: failed calling webhook "webhook.scylla.scylladb.com": failed to call webhook: Post "https://scylla-operator-webhook.scylla-operator.svc:443/validate?timeout=10s": context deadline exceeded
```
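`context deadline exceeded` means the API server gave up waiting for the webhook within the `timeout=10s` shown in the URL. Calling the same URL from a pod inside the cluster can help narrow this down: either the webhook is not answering at all, or it answers but the path from the EKS control plane to the nodes is blocked. A sketch, assuming the public `curlimages/curl` image can be pulled; the pod name `webhook-probe` and the function name are arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical helper: call the webhook URL from the error message from inside the cluster.
probe_webhook_in_cluster() {
  local url=${1:-https://scylla-operator-webhook.scylla-operator.svc:443/validate}
  # -k: skip certificate verification, only reachability matters here.
  # -m 5: give up after 5 seconds, well inside the API server's 10s timeout.
  # The pod is deleted again when curl exits (--rm needs -i to attach).
  kubectl run webhook-probe --rm -i --restart=Never \
    --image=curlimages/curl -- \
    curl -sk -m 5 -o /dev/null -w '%{http_code}\n' -X POST "$url"
}
```

Any HTTP status code, even an error such as 400, shows the webhook answers from inside the cluster, which would shift suspicion to the control-plane-to-node path. A timeout here as well points at the webhook pod itself.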
I’ve seen in a couple of threads that port 9443 should be open from the cluster control plane to the nodes. I checked the security group rules like this:
```
# List node groups
aws eks --region eu-central-1 list-nodegroups --cluster-name k8s01 --query nodegroups
[
    "k8s01-initial-2024062407114279650000000f"
]
# Get the auto scaling group name
aws eks --region eu-central-1 describe-nodegroup --cluster-name k8s01 --nodegroup-name k8s01-initial-2024062407114279650000000f --query 'nodegroup.resources.autoScalingGroups[].name'
[
    "eks-k8s01-initial-2024062407114279650000000f-b2c8248b-0857-f142-cf40-7a7c66219c46"
]
# Get one instance of the auto scaling group
aws autoscaling --region eu-central-1 describe-auto-scaling-groups --auto-scaling-group-names eks-k8s01-initial-2024062407114279650000000f-b2c8248b-0857-f142-cf40-7a7c66219c46 --query 'AutoScalingGroups[].Instances[0].InstanceId'
[
    "i-000d9209e60e7365d"
]
# Get the security groups of the instance
aws ec2 --region eu-central-1 describe-instances --instance-ids i-000d9209e60e7365d --query 'Reservations[].Instances[].SecurityGroups[].GroupId'
[
    "sg-0c79a4a5106d84011"
]
# Check the ingress rules of the security group
aws ec2 --region eu-central-1 describe-security-groups --group-ids sg-0c79a4a5106d84011 --query 'SecurityGroups[].IpPermissions[].{FromPort:FromPort,UserIdGroupPairs:UserIdGroupPairs}' --output yaml | sed '/UserId/d'
- FromPort: 30080
- FromPort: 6443
  - Description: Cluster API to node 6443/tcp webhook
    GroupId: sg-06a93cc5cd09cf20a
- FromPort: 30880
- FromPort: null
  - Description: Node to node ingress traffic
    GroupId: sg-0c79a4a5106d84011
- FromPort: 9443
  - Description: Cluster API to node 9443/tcp webhook
    GroupId: sg-06a93cc5cd09cf20a
- FromPort: 1025
  - Description: Node to node ingress on ephemeral ports
    GroupId: sg-0c79a4a5106d84011
- FromPort: 8443
  - Description: Cluster API to node 8443/tcp webhook
    GroupId: sg-06a93cc5cd09cf20a
- FromPort: 10250
  - Description: Cluster API to node kubelets
    GroupId: sg-06a93cc5cd09cf20a
- FromPort: 53
  - Description: Node to node CoreDNS
    GroupId: sg-0c79a4a5106d84011
- FromPort: 53
  - Description: Node to node CoreDNS UDP
    GroupId: sg-0c79a4a5106d84011
- FromPort: 443
  - Description: Cluster API to node groups
    GroupId: sg-06a93cc5cd09cf20a
- FromPort: 4443
  - Description: Cluster API to node 4443/tcp webhook
    GroupId: sg-06a93cc5cd09cf20a
```
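The query above prints only `FromPort`, so protocols and port ranges are hidden (for example, the `FromPort: 1025` rule presumably extends up to some `ToPort`, and `FromPort: null` is typically an all-protocol rule). A small sketch that also pulls `IpProtocol` and `ToPort` and checks whether any rule from a given source security group covers a TCP port. The function names are hypothetical; the region and group IDs in the usage comment are the ones from this post:

```shell
#!/usr/bin/env bash
# Sketch: does any ingress rule from SOURCE_SG on NODE_SG cover a TCP port?
# Function names are hypothetical; only standard AWS CLI options are used.

# rule_allows_port PORT PROTOCOL FROM TO
# Succeeds if a single rule, as printed by the AWS CLI text output, covers TCP PORT.
rule_allows_port() {
  local port=$1 proto=$2 from=$3 to=$4
  # IpProtocol "-1" means all protocols and all ports (FromPort/ToPort print as None).
  if [ "$proto" = "-1" ]; then
    return 0
  fi
  # Only TCP rules matter for an HTTPS webhook call.
  [ "$proto" = "tcp" ] || return 1
  [ "$from" -le "$port" ] && [ "$port" -le "$to" ]
}

# sg_allows_port_from REGION NODE_SG SOURCE_SG PORT
sg_allows_port_from() {
  local region=$1 node_sg=$2 source_sg=$3 port=$4 rules proto from to
  # One line per rule whose sources include SOURCE_SG: "<protocol> <from> <to>".
  rules=$(aws ec2 --region "$region" describe-security-groups --group-ids "$node_sg" \
    --query "SecurityGroups[].IpPermissions[] | [?contains(UserIdGroupPairs[].GroupId, '$source_sg')].[IpProtocol, FromPort, ToPort]" \
    --output text) || return 2
  while read -r proto from to; do
    [ -n "$proto" ] || continue
    if rule_allows_port "$port" "$proto" "$from" "$to"; then
      echo "tcp/$port allowed from $source_sg by rule: $proto $from-$to"
      return 0
    fi
  done <<<"$rules"
  echo "no rule from $source_sg covers tcp/$port"
  return 1
}

# Usage (IDs from this post):
#   sg_allows_port_from eu-central-1 sg-0c79a4a5106d84011 sg-06a93cc5cd09cf20a 9443
```

Port 9443 is the one the other threads mentioned; the port that actually matters is whichever one the webhook pod listens on, so it is worth running this for that port too.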
```
# Check the cluster security groups
aws eks describe-cluster --region eu-central-1 --name k8s01 --query 'cluster.resourcesVpcConfig.{securityGroupIds:securityGroupIds, clusterSecurityGroupId:clusterSecurityGroupId}'
{
    "securityGroupIds": [
        "sg-06a93cc5cd09cf20a"
    ],
    "clusterSecurityGroupId": "sg-06071c24109602e48"
}
```
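The rules on the node group assume that sg-06a93cc5cd09cf20a is actually attached to the network interfaces the EKS control plane uses inside the VPC. One way to confirm which groups those interfaces carry is to look them up by description. This sketch assumes EKS describes them as `Amazon EKS <cluster-name>` (here `Amazon EKS k8s01`), which is worth double-checking in the EC2 console; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the security groups on the EKS control-plane ENIs.
# Assumption: the ENIs carry the description "Amazon EKS <cluster-name>".
control_plane_eni_groups() {
  local region=${1:-eu-central-1} cluster=${2:-k8s01}
  aws ec2 --region "$region" describe-network-interfaces \
    --filters "Name=description,Values=Amazon EKS $cluster" \
    --query 'NetworkInterfaces[].{Eni:NetworkInterfaceId,Groups:Groups[].GroupId}' \
    --output yaml
}

# Usage: control_plane_eni_groups eu-central-1 k8s01
```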
Security group sg-06a93cc5cd09cf20a (listed under securityGroupIds; the clusterSecurityGroupId is sg-06071c24109602e48) is allowed to reach the nodes on port 9443, so I think the firewall rule is fine:
```
- FromPort: 9443
  - Description: Cluster API to node 9443/tcp webhook
    GroupId: sg-06a93cc5cd09cf20a
```
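An open 9443 only helps if 9443 is the port the webhook pod listens on. The call goes to Service port 443, which the Service forwards to a `targetPort` on the webhook pod, and my understanding is that on EKS the control plane has to reach that pod port on the node, which is why node groups get rules like the 9443/8443/4443 webhook ones. A minimal sketch to print the actual port; the function name is hypothetical and the defaults are the names from the error message:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the pod IPs and ports the webhook Service forwards to.
webhook_backend_ports() {
  local ns=${1:-scylla-operator} svc=${2:-scylla-operator-webhook}
  # Service port vs. targetPort (targetPort may be a port name rather than a number).
  kubectl -n "$ns" get service "$svc" \
    -o jsonpath='{range .spec.ports[*]}port={.port} targetPort={.targetPort}{"\n"}{end}'
  # Endpoints resolve named ports to numbers, so this shows the real pod IP and port.
  kubectl -n "$ns" get endpoints "$svc" \
    -o jsonpath='{range .subsets[*]}{.addresses[*].ip} -> {.ports[*].port}{"\n"}{end}'
}

# Usage: webhook_backend_ports
```

If the endpoint port printed there is something other than 9443 (or one of the other ports opened from sg-06a93cc5cd09cf20a), the 9443 rule would not cover this webhook, and a rule for that port from the control-plane security group would presumably be needed. An empty endpoints line would instead mean no Ready pod is serving the webhook.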
Versions:
- Platform: AWS EKS
- Kubernetes: v1.30.0-eks-036c24b
- cert-manager: v1.12.12
- Helm: v3.15.2
- Scylla-operator: 1.14.0
Any ideas what I should check?