Originally from the User Slack
@Łukasz_Sanokowski: Hello!
I have a question about memory management: during data ingestion we see a large number of "Writes blocked on dirty", despite there being plenty of free RAM. The stack consists of the latest Scylla Operator running on a recent GKE.
root@scylla-cluster-europe-west4-europe-west4-a-0:/# nodetool info
ID : 61faac7d-6854-40c1-913f-03d7353e6e0d
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 41.54 GB
Generation No : 1725539773
Uptime (seconds) : 82014
Heap Memory (MB) : 0.00 / 0.00
Off Heap Memory (MB) : 2494.69
Data Center : europe-west4
Rack : europe-west4-a
Exceptions : 0
Key Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, 0.000 recent hit rate, 0 save period in seconds
Row Cache : entries 641541, size 14.70 GB, capacity 51.56 GB, 289137 hits, 289137 requests, 1.000 recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, 0.000 recent hit rate, 0 save period in seconds
Percent Repaired : 0.0%
Token : (invoke with -T/--tokens to see all 1280 tokens)
The average load is around 50% (please ignore that we lack node-exporter metrics here; a screenshot with CPU / RAM is pasted below).
Config of running scylla process:
root@scylla-cluster-europe-west4-europe-west4-a-0:/# ps aux | grep "usr/bin/scylla"
root 80 40.6 35.7 17180617744 23540712 ? Rl Sep05 556:14 /usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-7 --smp 7 --overprovisioned --listen-address 0.0.0.0 --rpc-address 0.0.0.0 --seed-provider-parameters seeds=10.229.3.35 --broadcast-address 10.229.3.35 --broadcast-rpc-address 10.229.3.35 --alternator-address 0.0.0.0 --blocked-reactor-notify-ms 999999999 --prometheus-address=0.0.0.0
And the memory utilisation:
We suspect that this issue(?) is what is degrading our data ingestion performance.
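"Blocked on dirty" means a shard has hit its limit for unflushed memtable data; since Seastar manages memory per shard, this can happen even when the node as a whole has plenty of free RAM. A quick, hedged way to confirm it, assuming the Prometheus endpoint on port 9180 visible in the process flags above, is to grep the node's own metrics for the dirty-memory gauges (exact metric names vary between Scylla versions):
curl -s http://localhost:9180/metrics | grep -i dirty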
@Maciej_Zimnoch: make sure to follow the performance tuning documentation; you’re not getting the most out of your cluster.
https://operator.docs.scylladb.com/stable/performance.html
@Łukasz_Sanokowski: Thanks @Maciej_Zimnoch, you are right: our pods are not in the Guaranteed QoS class.
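For reference: Kubernetes assigns the Guaranteed QoS class only when every container in the pod (the manager agent sidecar included) ends up with requests equal to limits for both CPU and memory; specifying only requests without matching limits leaves the pod Burstable. A minimal sketch of the rack fragment this implies, with illustrative values, plus a way to verify the result:
# Illustrative cluster.yaml rack fragment (values are examples, not recommendations):
#
#   resources:
#     requests: { cpu: 4, memory: 16G }
#     limits:   { cpu: 4, memory: 16G }
#   agentResources:
#     requests: { cpu: 1, memory: 1G }
#     limits:   { cpu: 1, memory: 1G }
#
# After the rollout, check which QoS class the pod actually got:
kubectl -n scylla get pod scylla-cluster-europe-west4-europe-west4-a-0 \
  -o jsonpath='{.status.qosClass}{"\n"}'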
Quick question: after editing the resource requests and limits in cluster.yaml (namely, adding an agentResources section) and applying the changes, we can see that the Scylla Operator itself displays the correct values:
kubectl describe scyllaclusters.scylla.scylladb.com scylla-cluster
Name: scylla-cluster
Namespace: scylla
Labels: <none>
Annotations: <none>
API Version: scylla.scylladb.com/v1
Kind: ScyllaCluster
Metadata:
Creation Timestamp: 2024-09-05T12:33:38Z
Generation: 9
Resource Version: 62702536
UID: 1b50e8e6-44d1-43d5-bbd0-86e0480596ec
Spec:
Agent Repository: docker.io/scylladb/scylla-manager-agent
Agent Version: 3.3.0
Automatic Orphaned Node Cleanup: true
Cpuset: true
Datacenter:
Name: europe-west4
Racks:
Agent Resources:
Limits:
Cpu: 1
Memory: 1G
Members: 1
Name: europe-west4-a
Placement:
Node Affinity:
Required During Scheduling Ignored During Execution:
Node Selector Terms:
Match Expressions:
Key: failure-domain.beta.kubernetes.io/zone
Operator: In
Values:
europe-west4-a
Tolerations:
Effect: NoSchedule
Key: role
Operator: Equal
Value: scylla-clusters
Resources:
Limits:
Cpu: 4
Memory: 16G
Scylla Agent Config: scylla-agent-config
Scylla Config: scylla-config
Storage:
Capacity: 750G
Storage Class Name: scylladb-local-xfs
Agent Resources:
Limits:
Cpu: 1
Memory: 1G
Members: 2
namely 4 CPU / 16 GB of RAM for Scylla and 1 CPU / 1 GB of RAM for the agent, but the changes are not being propagated to the underlying StatefulSet:
kubectl describe statefulsets.apps scylla-cluster-europe-west4-europe-west4-a
Name: scylla-cluster-europe-west4-europe-west4-a
Namespace: scylla
CreationTimestamp: Thu, 05 Sep 2024 14:33:39 +0200
Selector: app=scylla,app.kubernetes.io/managed-by=scylla-operator,app.kubernetes.io/name=scylla,scylla/cluster=scylla-cluster,scylla/datacenter=europe-west4,scylla/rack=europe-west4-a
Labels: app=scylla
app.kubernetes.io/managed-by=scylla-operator
app.kubernetes.io/name=scylla
scylla/cluster=scylla-cluster
scylla/datacenter=europe-west4
scylla/rack=europe-west4-a
scylla/rack-ordinal=0
scylla/scylla-version=6.1.1
Annotations: scylla-operator.scylladb.com/managed-hash: cSlD88rA8BiVl4IhUjDlosrRuvq82zC9bOwZ8rnKt9F12kpMKSevfq1g+laiY1Sq/uhl+ZlHzO0vXZhgC8PJIA==
Replicas: 1 desired | 1 total
Update Strategy: RollingUpdate
Partition: 0
Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=scylla
app.kubernetes.io/managed-by=scylla-operator
app.kubernetes.io/name=scylla
scylla/cluster=scylla-cluster
scylla/datacenter=europe-west4
scylla/rack=europe-west4-a
scylla/rack-ordinal=0
scylla/scylla-version=6.1.1
Annotations: prometheus.io/port: 9180
prometheus.io/scrape: true
scylla-operator.scylladb.com/inputs-hash: RGj+omxpBFTLQWKoIyzMjXXZpzmcSt0z4kaYfQCLlfAmvfU9DO0rbfxqe4ixbd6LME/Okt6/gwoKjNa2cJb1JQ==
Service Account: scylla-cluster-member
Init Containers:
sidecar-injection:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
cp -a /usr/bin/scylla-operator /mnt/shared
Limits:
cpu: 10m
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/mnt/shared from shared (rw)
sysctl-buddy:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
sysctl -w fs.aio-max-nr=2097152
Limits:
cpu: 10m
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts: <none>
Containers:
scylla:
Image: docker.io/scylladb/scylla:6.1.1
Ports: 7000/TCP, 7001/TCP, 9042/TCP, 9142/TCP, 7199/TCP, 9180/TCP, 9100/TCP, 9160/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/usr/bin/bash
-euEo
pipefail
-O
inherit_errexit
-c
printf 'INFO %s ignition - Waiting for /mnt/shared/ignition.done\n' "$( date '+%Y-%m-%d %H:%M:%S,%3N' )" > /dev/stderr
until [[ -f "/mnt/shared/ignition.done" ]]; do
sleep 1;
done
printf 'INFO %s ignition - Ignited. Starting ScyllaDB...\n' "$( date '+%Y-%m-%d %H:%M:%S,%3N' )" > /dev/stderr
# TODO: This is where we should start ScyllaDB directly after the sidecar split #1942
exec /mnt/shared/scylla-operator sidecar \
--feature-gates=AllAlpha=false,AllBeta=false,AutomaticTLSCertificates=true \
--nodes-broadcast-address-type=ServiceClusterIP \
--clients-broadcast-address-type=ServiceClusterIP \
--service-name=$(SERVICE_NAME) \
--cpu-count=$(CPU_COUNT) \
--loglevel=2 \
Limits:
cpu: 6
memory: 60G
Liveness: http-get http://:8080/healthz delay=0s timeout=10s period=10s #success=1 #failure=12
Readiness: http-get http://:8080/readyz delay=0s timeout=30s period=10s #success=1 #failure=1
Startup: http-get http://:8080/healthz delay=0s timeout=30s period=10s #success=1 #failure=40
Environment:
SERVICE_NAME: (v1:metadata.name)
CPU_COUNT: 6 (limits.cpu)
Mounts:
/mnt/scylla-client-config from scylla-client-config-volume (ro)
/mnt/scylla-config from scylla-config-volume (ro)
/mnt/shared from shared (ro)
/var/lib/scylla from data (rw)
/var/run/configmaps/scylla-operator.scylladb.com/scylladb/managed-config from scylladb-managed-config (ro)
/var/run/secrets/scylla-operator.scylladb.com/scylladb/client-ca from scylladb-client-ca (ro)
/var/run/secrets/scylla-operator.scylladb.com/scylladb/serving-certs from scylladb-serving-certs (ro)
/var/run/secrets/scylla-operator.scylladb.com/scylladb/user-admin from scylladb-user-admin (ro)
scylladb-api-status-probe:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/usr/bin/scylla-operator
serve-probes
scylladb-api-status
--port=8080
--service-name=$(SERVICE_NAME)
--loglevel=2
Limits:
cpu: 10m
memory: 40Mi
Requests:
cpu: 10m
memory: 40Mi
Readiness: tcp-socket :8080 delay=0s timeout=30s period=5s #success=1 #failure=1
Environment:
SERVICE_NAME: (v1:metadata.name)
Mounts: <none>
scylladb-ignition:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/usr/bin/scylla-operator
run-ignition
--service-name=$(SERVICE_NAME)
--nodes-broadcast-address-type=ServiceClusterIP
--clients-broadcast-address-type=ServiceClusterIP
--loglevel=2
Limits:
cpu: 10m
memory: 40Mi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:42081/readyz delay=0s timeout=30s period=5s #success=1 #failure=1
Environment:
SERVICE_NAME: (v1:metadata.name)
Mounts:
/mnt/shared from shared (rw)
scylla-manager-agent:
Image: docker.io/scylladb/scylla-manager-agent:3.3.0
Port: 10001/TCP
Host Port: 0/TCP
Command:
/usr/bin/bash
-euEo
pipefail
-O
inherit_errexit
-c
printf '{"L":"INFO","T":"%s","M":"Waiting for /mnt/shared/ignition.done"}\n' "$( date -u '+%Y-%m-%dT%H:%M:%S,%3NZ' )" > /dev/stderr
until [[ -f "/mnt/shared/ignition.done" ]]; do
sleep 1;
done
printf '{"L":"INFO","T":"%s","M":"Ignited. Starting ScyllaDB Manager Agent"}\n' "$( date -u '+%Y-%m-%dT%H:%M:%S,%3NZ' )" > /dev/stderr
scylla-manager-agent \
-c "/etc/scylla-manager-agent/scylla-manager-agent.yaml" \
-c "/mnt/scylla-agent-config/scylla-manager-agent.yaml" \
-c "/mnt/scylla-agent-config/auth-token.yaml"
Limits:
cpu: 1
memory: 1G
where the scylla container limits are still:
Limits:
cpu: 6
memory: 60G
@Maciej_Zimnoch: please collect a must-gather dump and attach it here so I can have a look at the entire picture.
Gathering data with must-gather | ScyllaDB Docs
@Łukasz_Sanokowski: Sure thing:
@Maciej_Zimnoch: it’s because we apply StatefulSet changes only once they are fully rolled out. Your first pod in the -a rack is missing:
lastTransitionTime: "2024-09-06T13:14:04Z"
message: '0/6 nodes are available: 1 Insufficient memory, 5 node(s) didn''t match
Pod''s node affinity/selector. preemption: 0/6 nodes are available: 1 No preemption
victims found for incoming pod, 5 Preemption is not helpful for scheduling.'
reason: Unschedulable
status: "False"
type: PodScheduled
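The same condition can be read straight off the pending pod, without a full must-gather; the pod name below is taken from the outputs earlier in the thread:
kubectl -n scylla get pod scylla-cluster-europe-west4-europe-west4-a-0 \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'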
@Łukasz_Sanokowski: The pod has been missing since our attempt at setting:
agentResources:
requests:
cpu: 1
memory: 1G
which eventually resulted in a lack of RAM available on the node, so the second attempt was to reduce:
Resources:
Limits:
Cpu: 4
Memory: 16G
which is (likely) failing because, as you said, StatefulSet changes are applied only once they are fully rolled out.
Shall I edit the StatefulSet directly to make it healthy, so that the operator will accept and propagate the changes?
Update: it helped
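For the record, a hedged sketch of such a direct edit; it assumes the scylla container is first in the containers list, as in the describe output above, and uses the reduced limits from the spec:
kubectl -n scylla patch statefulset scylla-cluster-europe-west4-europe-west4-a \
  --type json \
  -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
       {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16G"}]'
Once the StatefulSet rolls out healthily, the operator can reconcile it back to whatever the ScyllaCluster spec says, which is exactly the desired behaviour here.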
Alright Maciej, I think that for now I know where and how to proceed. Thanks for your help!
@Maciej_Zimnoch: I’m not sure your initial memory issue will be solved, but let’s see if it helped.
@Łukasz_Sanokowski: Sure, let me tweak things for a while; I’ll come back with the results.