Originally from the User Slack
@Terence_Liu: How do I expand the cluster from 3 members to, say, 4 with the operator? I updated the members value and restarted, but the new node could not join the existing cluster because certain files were missing.
@Terence_Liu:
Error: can't set up scylla: can't setup rackdc properties file: can't read operator snitch config from "/var/run/configmaps/scylla-operator.scylladb.com/scylladb/snitch-config/cassandra-rackdc.properties": open /var/run/configmaps/scylla-operator.scylladb.com/scylladb/snitch-config/cassandra-rackdc.properties: no such file or directory
I’m assuming that on initial cluster setup, the right number of hosts is prepopulated with the files needed to form a cluster. I was expecting that adjusting that number would automatically set up the new joiners, but apparently that didn’t happen.
I tried to refer to this page https://operator.docs.scylladb.com/stable/resources/scyllaclusters/nodeoperations/replace-node.html but the tagging & restarting didn’t do anything.
@Maciej_Zimnoch: I guess you’re using latest as your Operator version, and a new version was released between when you initially installed it and when the new node was added. Is that true?
@Terence_Liu: it was indeed latest as of 11/14/2024
I upgraded the existing nodes from 6.2.1 to 6.2.3 before introducing the new node
@Maciej_Zimnoch: That’s why you should never use rolling tags: they resolve to different versions at different times. Make sure to use a stable release and reference it by digest.
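The advice about rolling tags can be checked mechanically. The following is a minimal sketch, not part of the operator or its tooling: the function name `classify_image_ref` and the set of tags treated as rolling are illustrative assumptions, and the version tag in the usage note is only an example value.

```python
# Illustrative assumption: which tags count as "rolling" is a local policy choice,
# not an official list from any registry or from the operator.
ROLLING_TAGS = {"latest", "nightly"}


def classify_image_ref(ref: str) -> str:
    """Classify a container image reference as 'digest', 'rolling', or 'tag'."""
    # A digest reference is immutable: it always resolves to the same image.
    if "@sha256:" in ref:
        return "digest"
    # Only look for a tag after the last '/', so a registry port such as
    # 'localhost:5000/app' is not mistaken for a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    # A reference without an explicit tag is resolved as 'latest' by container runtimes.
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
    return "rolling" if tag in ROLLING_TAGS else "tag"
```

For instance, `classify_image_ref("docker.io/scylladb/scylla-operator:latest")` reports a rolling tag, while the same repository with an explicit version tag or an `@sha256:` digest does not.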
> I upgraded the existing nodes from 6.2.1 to 6.2.3 before introducing the new node
I’m talking about the Operator version, not the Scylla version.
@Terence_Liu: In my case, I don’t think the scylla operator was updated, or that any new resources under scylla-operator were created, though?
So I’m essentially still looking at whatever the latest version was on Nov 14
@Maciej_Zimnoch: The Operator runs a sidecar alongside Scylla, and the image it uses is the same as the Operator image. Because the new node was added later, latest resolved to a different version than the one the Operator deployment is using. These two must be in sync.
@Terence_Liu: oooh
So… to make this work, I should explicitly pin the scylla operator image from latest to that Nov 14 version and recreate the new node? That way, the new node will get the same operator sidecar version
@Maciej_Zimnoch: yes
make sure to specify the proper version in two places, both the container image and the env var, as the env var controls the sidecar image:
https://github.com/scylladb/scylla-operator/blob/61ddd3c444712fc74a2e29286ff4564e2cce0dcd/deploy/operator/50_operator.deployment.yaml#L25-L30
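To make the "two places" concrete, here is a rough sketch of a consistency check over an operator Deployment that has already been loaded into a Python dict (for example from the rendered manifest). The helper names are hypothetical, and the env var name `SCYLLA_OPERATOR_IMAGE` is an assumption about the linked manifest; confirm it against the file before relying on it.

```python
def operator_images(deployment: dict, env_name: str = "SCYLLA_OPERATOR_IMAGE"):
    """Return (container image, sidecar image from the env var) for the first container.

    env_name is an assumed name for the env var that controls the sidecar image.
    """
    container = deployment["spec"]["template"]["spec"]["containers"][0]
    # Index the env entries by name; entries without a literal value map to None.
    env = {entry["name"]: entry.get("value") for entry in container.get("env", [])}
    return container["image"], env.get(env_name)


def images_in_sync(deployment: dict, env_name: str = "SCYLLA_OPERATOR_IMAGE") -> bool:
    """True only if the env var is set and matches the container image exactly."""
    image, sidecar_image = operator_images(deployment, env_name)
    return sidecar_image is not None and image == sidecar_image
```

A check like this only compares the two strings. If both are set to the same rolling tag they still compare equal, so it complements pinning rather than replacing it.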
@Terence_Liu: Got it. Thank you for the tip - will report back. How do people update scylla operator though?
@Maciej_Zimnoch: https://operator.docs.scylladb.com/stable/installation/overview.html#installation-modes
@Terence_Liu: oh, awesome - N+1, very helpful!
@Bradley_Stock: Hi, so I’ve updated the operator to pin the version from latest to 1.14. (To get the operator deployed, I had to use helm template and copy the manifests to git so I could make manual modifications to support istio, and those were set to latest by default.)
The operator itself updated and seems to have run fine, but my ScyllaCluster object seems to be a bit de-sync’d from the operator. I’m seeing errors such as:
I0417 16:13:53.449252 1 record/event.go:376] "Event occurred" object="<redacted>-config" fieldPath="" kind="ConfigMap" apiVersion="v1" type="Warning" reason="UpdateConfigMapFailed" message="Failed to update ConfigMap <ns>/<redacted>-config: /v1, Kind=ConfigMap \"<ns>/<redacted>-config\" isn't controlled by us"
E0417 17:20:33.534988 1 scyllacluster/controller.go:263] syncing key '<ns>/<redacted>' failed: [can't sync service accounts: can't apply service account: /v1, Kind=ServiceAccount "<ns>/<redacted>-member" isn't controlled by us, can't sync role bindings: can't apply role binding: rbac.authorization.k8s.io/v1, Kind=RoleBinding "<ns>/<redacted>-member" isn't controlled by us, can't sync agent token: can't apply secret "<ns>/<redacted>-auth-token": /v1, Kind=Secret "<ns>/<redacted>-auth-token" isn't controlled by us, can't sync certificates: [can't apply secret "<ns>/<redacted>-local-client-ca": /v1, Kind=Secret "<ns>/<redacted>-local-client-ca" isn't controlled by us, can't apply secret "<ns>/<redacted>-local-serving-ca": /v1, Kind=Secret "<ns>/<redacted>-local-serving-ca" isn't controlled by us, secret "<ns>/<redacted>-local-user-admin" doesn't exist or is not own by this object], can't sync configs: can't apply configmap "<ns>/<redacted>-managed-config": /v1, Kind=ConfigMap "<ns>/<redacted>-managed-config" isn't controlled by us, can't sync services: /v1, Kind=Service "<ns>/<redacted>-client" isn't controlled by us, can't sync pdbs: can't apply pdb: policy/v1, Kind=PodDisruptionBudget "<ns>/<redacted>" isn't controlled by us]
Looks like the operator version I tried to move to was lower than what was actually running: we found that the ones with latest were running 1.16-alpha.0. So I bumped to 1.16, and the operator is now communicating properly with the ScyllaCluster object