Originally from the User Slack
@Łukasz_Sanokowski: Hello!
I have a question about memory management: during data ingestion we see a large number of "Writes blocked on dirty", despite there being plenty of free RAM. The stack consists of the latest Scylla Operator running on a recent GKE.
root@scylla-cluster-europe-west4-europe-west4-a-0:/# nodetool info
ID : 61faac7d-6854-40c1-913f-03d7353e6e0d
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 41.54 GB
Generation No : 1725539773
Uptime (seconds) : 82014
Heap Memory (MB) : 0.00 / 0.00
Off Heap Memory (MB) : 2494.69
Data Center : europe-west4
Rack : europe-west4-a
Exceptions : 0
Key Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, 0.000 recent hit rate, 0 save period in seconds
Row Cache : entries 641541, size 14.70 GB, capacity 51.56 GB, 289137 hits, 289137 requests, 1.000 recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, 0.000 recent hit rate, 0 save period in seconds
Percent Repaired : 0.0%
Token : (invoke with -T/--tokens to see all 1280 tokens)
The average load is around 50% (please ignore that we lack node-exporter metrics here; a screenshot with CPU / RAM is pasted below).
Config of running scylla process:
root@scylla-cluster-europe-west4-europe-west4-a-0:/# ps aux | grep "usr/bin/scylla"
root 80 40.6 35.7 17180617744 23540712 ? Rl Sep05 556:14 /usr/bin/scylla --log-to-syslog 0 --log-to-stdout 1 --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 0-7 --smp 7 --overprovisioned --listen-address 0.0.0.0 --rpc-address 0.0.0.0 --seed-provider-parameters seeds=10.229.3.35 --broadcast-address 10.229.3.35 --broadcast-rpc-address 10.229.3.35 --alternator-address 0.0.0.0 --blocked-reactor-notify-ms 999999999 --prometheus-address=0.0.0.0
And the memory utilisation:
We suspect that this issue(?) is what is degrading our data ingestion performance.
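"Blocked on dirty" means a shard has hit its limit for unflushed memtable data; since Seastar manages memory per shard, this can happen even when the node as a whole has plenty of free RAM. A quick, hedged way to confirm it, assuming the Prometheus endpoint on port 9180 visible in the process flags above, is to grep the node's own metrics for the dirty-memory gauges (exact metric names vary between Scylla versions):
curl -s http://localhost:9180/metrics | grep -i dirty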
@Maciej_Zimnoch: make sure to follow the performance tuning documentation; you’re not getting the most out of your cluster.
https://operator.docs.scylladb.com/stable/performance.html
@Łukasz_Sanokowski: Thanks @Maciej_Zimnoch, you are right: our pods are not in the Guaranteed QoS class.
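For reference: Kubernetes assigns the Guaranteed QoS class only when every container in the pod (the manager agent sidecar included) ends up with requests equal to limits for both CPU and memory; specifying only requests without matching limits leaves the pod Burstable. A minimal sketch of the rack fragment this implies, with illustrative values, plus a way to verify the result:
# Illustrative cluster.yaml rack fragment (values are examples, not recommendations):
#
#   resources:
#     requests: { cpu: 4, memory: 16G }
#     limits:   { cpu: 4, memory: 16G }
#   agentResources:
#     requests: { cpu: 1, memory: 1G }
#     limits:   { cpu: 1, memory: 1G }
#
# After the rollout, check which QoS class the pod actually got:
kubectl -n scylla get pod scylla-cluster-europe-west4-europe-west4-a-0 \
  -o jsonpath='{.status.qosClass}{"\n"}'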
Quick question: after editing the resource requests and limits in cluster.yaml (namely, adding an agentResources section) and applying the changes, we can see that the Scylla Operator itself displays the correct values:
kubectl describe scyllaclusters.scylla.scylladb.com scylla-cluster
Name: scylla-cluster
Namespace: scylla
Labels: <none>
Annotations: <none>
API Version: scylla.scylladb.com/v1
Kind: ScyllaCluster
Metadata:
Creation Timestamp: 2024-09-05T12:33:38Z
Generation: 9
Resource Version: 62702536
UID: 1b50e8e6-44d1-43d5-bbd0-86e0480596ec
Spec:
Agent Repository: docker.io/scylladb/scylla-manager-agent
Agent Version: 3.3.0
Automatic Orphaned Node Cleanup: true
Cpuset: true
Datacenter:
Name: europe-west4
Racks:
Agent Resources:
Limits:
Cpu: 1
Memory: 1G
Members: 1
Name: europe-west4-a
Placement:
Node Affinity:
Required During Scheduling Ignored During Execution:
Node Selector Terms:
Match Expressions:
Key: failure-domain.beta.kubernetes.io/zone
Operator: In
Values:
europe-west4-a
Tolerations:
Effect: NoSchedule
Key: role
Operator: Equal
Value: scylla-clusters
Resources:
Limits:
Cpu: 4
Memory: 16G
Scylla Agent Config: scylla-agent-config
Scylla Config: scylla-config
Storage:
Capacity: 750G
Storage Class Name: scylladb-local-xfs
Agent Resources:
Limits:
Cpu: 1
Memory: 1G
Members: 2
namely 4 CPU / 16 GB of RAM for Scylla and 1 CPU / 1 GB of RAM for the agent, but the changes are not being propagated to the underlying StatefulSet:
kubectl describe statefulsets.apps scylla-cluster-europe-west4-europe-west4-a
Name: scylla-cluster-europe-west4-europe-west4-a
Namespace: scylla
CreationTimestamp: Thu, 05 Sep 2024 14:33:39 +0200
Selector: app=scylla,app.kubernetes.io/managed-by=scylla-operator,app.kubernetes.io/name=scylla,scylla/cluster=scylla-cluster,scylla/datacenter=europe-west4,scylla/rack=europe-west4-a
Labels: app=scylla
app.kubernetes.io/managed-by=scylla-operator
app.kubernetes.io/name=scylla
scylla/cluster=scylla-cluster
scylla/datacenter=europe-west4
scylla/rack=europe-west4-a
scylla/rack-ordinal=0
scylla/scylla-version=6.1.1
Annotations: scylla-operator.scylladb.com/managed-hash: cSlD88rA8BiVl4IhUjDlosrRuvq82zC9bOwZ8rnKt9F12kpMKSevfq1g+laiY1Sq/uhl+ZlHzO0vXZhgC8PJIA==
Replicas: 1 desired | 1 total
Update Strategy: RollingUpdate
Partition: 0
Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=scylla
app.kubernetes.io/managed-by=scylla-operator
app.kubernetes.io/name=scylla
scylla/cluster=scylla-cluster
scylla/datacenter=europe-west4
scylla/rack=europe-west4-a
scylla/rack-ordinal=0
scylla/scylla-version=6.1.1
Annotations: prometheus.io/port: 9180
prometheus.io/scrape: true
scylla-operator.scylladb.com/inputs-hash: RGj+omxpBFTLQWKoIyzMjXXZpzmcSt0z4kaYfQCLlfAmvfU9DO0rbfxqe4ixbd6LME/Okt6/gwoKjNa2cJb1JQ==
Service Account: scylla-cluster-member
Init Containers:
sidecar-injection:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
cp -a /usr/bin/scylla-operator /mnt/shared
Limits:
cpu: 10m
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts:
/mnt/shared from shared (rw)
sysctl-buddy:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
sysctl -w fs.aio-max-nr=2097152
Limits:
cpu: 10m
memory: 50Mi
Requests:
cpu: 10m
memory: 50Mi
Environment: <none>
Mounts: <none>
Containers:
scylla:
Image: docker.io/scylladb/scylla:6.1.1
Ports: 7000/TCP, 7001/TCP, 9042/TCP, 9142/TCP, 7199/TCP, 9180/TCP, 9100/TCP, 9160/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/usr/bin/bash
-euEo
pipefail
-O
inherit_errexit
-c
printf 'INFO %s ignition - Waiting for /mnt/shared/ignition.done\n' "$( date '+%Y-%m-%d %H:%M:%S,%3N' )" > /dev/stderr
until [[ -f "/mnt/shared/ignition.done" ]]; do
sleep 1;
done
printf 'INFO %s ignition - Ignited. Starting ScyllaDB...\n' "$( date '+%Y-%m-%d %H:%M:%S,%3N' )" > /dev/stderr
# TODO: This is where we should start ScyllaDB directly after the sidecar split #1942
exec /mnt/shared/scylla-operator sidecar \
--feature-gates=AllAlpha=false,AllBeta=false,AutomaticTLSCertificates=true \
--nodes-broadcast-address-type=ServiceClusterIP \
--clients-broadcast-address-type=ServiceClusterIP \
--service-name=$(SERVICE_NAME) \
--cpu-count=$(CPU_COUNT) \
--loglevel=2 \
Limits:
cpu: 6
memory: 60G
Liveness: http-get http://:8080/healthz delay=0s timeout=10s period=10s #success=1 #failure=12
Readiness: http-get http://:8080/readyz delay=0s timeout=30s period=10s #success=1 #failure=1
Startup: http-get http://:8080/healthz delay=0s timeout=30s period=10s #success=1 #failure=40
Environment:
SERVICE_NAME: (v1:metadata.name)
CPU_COUNT: 6 (limits.cpu)
Mounts:
/mnt/scylla-client-config from scylla-client-config-volume (ro)
/mnt/scylla-config from scylla-config-volume (ro)
/mnt/shared from shared (ro)
/var/lib/scylla from data (rw)
/var/run/configmaps/scylla-operator.scylladb.com/scylladb/managed-config from scylladb-managed-config (ro)
/var/run/secrets/scylla-operator.scylladb.com/scylladb/client-ca from scylladb-client-ca (ro)
/var/run/secrets/scylla-operator.scylladb.com/scylladb/serving-certs from scylladb-serving-certs (ro)
/var/run/secrets/scylla-operator.scylladb.com/scylladb/user-admin from scylladb-user-admin (ro)
scylladb-api-status-probe:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/usr/bin/scylla-operator
serve-probes
scylladb-api-status
--port=8080
--service-name=$(SERVICE_NAME)
--loglevel=2
Limits:
cpu: 10m
memory: 40Mi
Requests:
cpu: 10m
memory: 40Mi
Readiness: tcp-socket :8080 delay=0s timeout=30s period=5s #success=1 #failure=1
Environment:
SERVICE_NAME: (v1:metadata.name)
Mounts: <none>
scylladb-ignition:
Image: docker.io/scylladb/scylla-operator:latest
Port: <none>
Host Port: <none>
Command:
/usr/bin/scylla-operator
run-ignition
--service-name=$(SERVICE_NAME)
--nodes-broadcast-address-type=ServiceClusterIP
--clients-broadcast-address-type=ServiceClusterIP
--loglevel=2
Limits:
cpu: 10m
memory: 40Mi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:42081/readyz delay=0s timeout=30s period=5s #success=1 #failure=1
Environment:
SERVICE_NAME: (v1:metadata.name)
Mounts:
/mnt/shared from shared (rw)
scylla-manager-agent:
Image: docker.io/scylladb/scylla-manager-agent:3.3.0
Port: 10001/TCP
Host Port: 0/TCP
Command:
/usr/bin/bash
-euEo
pipefail
-O
inherit_errexit
-c
printf '{"L":"INFO","T":"%s","M":"Waiting for /mnt/shared/ignition.done"}\n' "$( date -u '+%Y-%m-%dT%H:%M:%S,%3NZ' )" > /dev/stderr
until [[ -f "/mnt/shared/ignition.done" ]]; do
sleep 1;
done
printf '{"L":"INFO","T":"%s","M":"Ignited. Starting ScyllaDB Manager Agent"}\n' "$( date -u '+%Y-%m-%dT%H:%M:%S,%3NZ' )" > /dev/stderr
scylla-manager-agent \
-c "/etc/scylla-manager-agent/scylla-manager-agent.yaml" \
-c "/mnt/scylla-agent-config/scylla-manager-agent.yaml" \
-c "/mnt/scylla-agent-config/auth-token.yaml"
Limits:
cpu: 1
memory: 1G
where the scylla container limits are still:
Limits:
cpu: 6
memory: 60G
@Maciej_Zimnoch: please collect a must-gather dump and attach it here so I can have a look at the entire picture.
Gathering data with must-gather | ScyllaDB Docs
@Łukasz_Sanokowski: Sure thing:
@Maciej_Zimnoch: it’s because we apply StatefulSet changes only once they are fully rolled out. Your first pod in the -a rack is missing:
lastTransitionTime: "2024-09-06T13:14:04Z"
message: '0/6 nodes are available: 1 Insufficient memory, 5 node(s) didn''t match
Pod''s node affinity/selector. preemption: 0/6 nodes are available: 1 No preemption
victims found for incoming pod, 5 Preemption is not helpful for scheduling.'
reason: Unschedulable
status: "False"
type: PodScheduled
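The same condition can be read straight off the pending pod, without a full must-gather; the pod name below is taken from the outputs earlier in the thread:
kubectl -n scylla get pod scylla-cluster-europe-west4-europe-west4-a-0 \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status}: {.message}{"\n"}{end}'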
@Łukasz_Sanokowski: The pod has been missing since our attempt at setting:
agentResources:
requests:
cpu: 1
memory: 1G
which eventually resulted in a lack of RAM available on the node, so the second attempt was to reduce:
Resources:
Limits:
Cpu: 4
Memory: 16G
which is (likely) failing because, as you said, StatefulSet changes are applied only once they are fully rolled out.
Shall I edit the StatefulSet directly to make it healthy, so that the operator will accept and propagate the changes?
Update: it helped
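For the record, a hedged sketch of such a direct edit; it assumes the scylla container is first in the containers list, as in the describe output above, and uses the reduced limits from the spec:
kubectl -n scylla patch statefulset scylla-cluster-europe-west4-europe-west4-a \
  --type json \
  -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/cpu", "value": "4"},
       {"op": "replace", "path": "/spec/template/spec/containers/0/resources/limits/memory", "value": "16G"}]'
Once the StatefulSet rolls out healthily, the operator can reconcile it back to whatever the ScyllaCluster spec says, which is exactly the desired behaviour here.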
Alright Maciej, I think that for now I know where and how to proceed. Thanks for your help!
@Maciej_Zimnoch: I’m not sure your initial memory issue will be solved, but let’s see if it helped.
@Łukasz_Sanokowski: Sure, let me tweak things for a while; I’ll come back with the results.