Originally from the User Slack
@Patryk_Kandziora: Hi. I run ScyllaDB on AWS using the marketplace images in an auto-scaling group. All works well (to some extent ;))
The problem is getting the Scylla Manager agent installed at boot time. My bootstrap script is as follows:
{
  "scylla_yaml": {
    "cluster_name": "${cluster_name}",
    "experimental": false,
    "endpoint_snitch": "Ec2Snitch"
  },
  "post_configuration_script": "${bash_64encoded}"
}
The above is the part of the script in JSON format. The post_configuration_script is base64 encoded and looks as follows:
#!/bin/bash
export config_file="/etc/scylla-manager-agent/scylla-manager-agent.yaml"
sudo wget -O /etc/apt/sources.list.d/scylla-manager.list http://downloads.scylladb.com/deb/ubuntu/scylladb-manager-3.2.list
sudo apt-get update
sudo apt-get install -y jq scylla-manager-agent vim net-tools
sed -i -E 's/^# ?(auth_token:)/\1/' "$config_file"
sed -i -E "s/^ *auth_token:.*/auth_token: ${auth_token}/" "$config_file"
sudo scyllamgr_agent_setup -y
sudo systemctl start scylla-manager-agent
ScyllaDB fails to start when the post_configuration_script ends with sudo systemctl start scylla-manager-agent
If I remove that last line, ScyllaDB starts without issue. Error:
more /var/tmp/scylla/scylla_image_setup-424-debug.log
Traceback with variables (most recent call last):
File "/opt/scylladb/scylla-machine-image/libexec/scylla_image_setup", line 23, in <module>
run('/opt/scylladb/scylla-machine-image/scylla_configure.py', shell=True, check=True)
__name__ = '__main__'
__doc__ = None
__package__ = None
__loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x7f13bff0b390>
__spec__ = None
__annotations__ = {}
__builtins__ = <module 'builtins' (built-in)>
__file__ = '/opt/scylladb/scylla-machine-image/libexec/scylla_image_setup'
__cached__ = None
os = <module 'os' (frozen)>
sys = <module 'sys' (built-in)>
Path = <class 'pathlib.Path'>
get_cloud_instance = <function get_cloud_instance at 0x7f13befe13a0>
is_gce = <function is_gce at 0x7f13befe1260>
is_azure = <function is_azure at 0x7f13befe1300>
is_redhat_variant = <function is_redhat_variant at 0x7f13befe14e0>
run = <function run at 0x7f13bf997380>
machine_image_configured = PosixPath('/etc/scylla/machine_image_configured')
cloud_instance = <lib.scylla_cloud.aws_instance object at 0x7f13befddad0>
File "/opt/scylladb/python3/lib64/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
input = None
capture_output = False
timeout = None
check = True
popenargs = ('/opt/scylladb/scylla-machine-image/scylla_configure.py',)
kwargs = {'shell': True}
process = <Popen: returncode: 1 args: '/opt/scylladb/scylla-machine-image/scylla_confi...>
stdout = None
stderr = None
retcode = 1
subprocess.CalledProcessError: Command '/opt/scylladb/scylla-machine-image/scylla_configure.py' returned non-zero exit status 1.
Used version:
apt list --installed | grep scylla
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
scylla-conf/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed,automatic]
scylla-cqlsh/stable,stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 all [installed,automatic]
scylla-jmx/stable,stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 all [installed,automatic]
scylla-kernel-conf/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed,automatic]
scylla-machine-image/stable,stable,now 5.4.5-20240328.ce3880e-1 all [installed]
scylla-manager-agent/stable,now 3.2.7~0.20240330.56fd95fb amd64 [installed]
scylla-node-exporter/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed,automatic]
scylla-python3/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed,automatic]
scylla-server-dbg/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed]
scylla-server/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed,automatic]
scylla-tools-core/stable,stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 all [installed,automatic]
scylla-tools/stable,stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 all [installed,automatic]
scylla/stable,now 5.4.5-0.20240328.fd7d57b9fa26-1 amd64 [installed,automatic]
The interesting part is that when I SSH to the node and run sudo systemctl start scylla-manager-agent manually, everything works fine.
Any idea why that is, and how to start the agent during the bootstrap of a ScyllaDB node in the cluster?
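For reference, the JSON user data described at the top of this message could be assembled roughly as below. This is only a sketch: the thread does not show how ${bash_64encoded} was produced, so the helper name make_user_data and the file names are assumptions, and the cluster name is assumed to contain no characters that need JSON escaping.

```shell
#!/bin/bash
# Sketch: emit the JSON user data with a base64-encoded post_configuration_script.
# make_user_data CLUSTER_NAME SCRIPT_PATH   (hypothetical helper name)
make_user_data() {
    mud_cluster_name=$1
    mud_script_path=$2
    # Encode the script; tr removes the line wrapping that base64 adds by default.
    mud_encoded=$(base64 < "$mud_script_path" | tr -d '\n')
    cat <<EOF
{
  "scylla_yaml": {
    "cluster_name": "${mud_cluster_name}",
    "experimental": false,
    "endpoint_snitch": "Ec2Snitch"
  },
  "post_configuration_script": "${mud_encoded}"
}
EOF
}
```

Usage would be something like make_user_data my-cluster post_config.sh > user-data.json, with the resulting file passed as the instance user data.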
@Felipe_Cardeneti_Mendes: If I remember correctly, the agent depends on the database. So when you start the agent, the database is still configuring itself, and it calls its own dependency, which is the scylla-machine-image service, but this one is already running.
It’s best to check whether the server is up and running, then start the agent. One way is to do something like https://github.com/fee-mendes/rust-driver-example/blob/main/docker-compose/check_cluster_healthy
@Patryk_Kandziora: Thx @Felipe_Cardeneti_Mendes - I used something like this:
#!/bin/bash
export config_file="/etc/scylla-manager-agent/scylla-manager-agent.yaml"
sudo wget -O /etc/apt/sources.list.d/scylla-manager.list http://downloads.scylladb.com/deb/ubuntu/scylladb-manager-3.2.list
sudo apt-get update
sudo apt-get install -y jq scylla-manager-agent vim netcat-openbsd
sed -i -E 's/^# ?(auth_token:)/\1/' "$config_file"
sed -i -E "s/^ *auth_token:.*/auth_token: ${auth_token}/" "$config_file"
sudo scyllamgr_agent_setup -y
# Check if port 9042 is available
while ! nc -z localhost 9042; do
echo "Waiting for ScyllaDB port 9042..."
sleep 5
done
echo "ScyllaDB is ready"
sudo systemctl start scylla-manager-agent
It doesn't work though.
systemctl status scylla-image-setup
● scylla-image-setup.service - Scylla Cloud Image Setup service
Loaded: loaded (/lib/systemd/system/scylla-image-setup.service; enabled; vendor preset: enabled)
Active: activating (start) since Tue 2024-05-07 16:23:07 UTC; 7min ago
Main PID: 411 (ld.so)
Tasks: 5 (limit: 37845)
Memory: 443.4M
CPU: 4.643s
CGroup: /system.slice/scylla-image-setup.service
├─ 411 /opt/scylladb/scylla-machine-image/../python3/bin/python3 /opt/scylladb/python3/bin/../libexec/python3.11.bin -s /opt/scylladb/scylla-machine-image/libexec/scylla_image_setup
├─ 598 /bin/sh -c /opt/scylladb/scylla-machine-image/scylla_configure.py
├─ 599 /opt/scylladb/scylla-machine-image/../python3/bin/python3 /opt/scylladb/python3/bin/../libexec/python3.11.bin -s /opt/scylladb/scylla-machine-image/libexec/scylla_configure.py
├─ 606 /bin/sh -c "#!/bin/bash\nexport config_file=\"/etc/scylla-manager-agent/scylla-manager-agent.yaml\"\nsudo wget -O /etc/apt/sources.list.d/scylla-manager.list http://downloads.scylladb.com/deb/ubuntu/scylladb-manager-3.2.list\nsudo apt-get update\nsudo apt-get install -y jq scylla-manager-agent vim netcat-openbsd\nsed -i -E 's/^# ?(auth_token:)/\\1/' \"\$config_file\"\nsed -i -E \"s/^ *auth_token:.*/auth_token: 9H5xxxxxxxxxxxx\" \"\$config_file\"\nsudo scyllamgr_agent_setup -y\n# Check if port 9042 is available\nwhile ! nc -z localhost 9042; do\n echo \"Waiting for ScyllaDB port 9042...\"\n sleep 5\ndone\necho \"ScyllaDB is ready\"\nsudo systemctl start scylla-manager-agent"
└─1507 sleep 5
May 07 16:30:12 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:17 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:22 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:27 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:32 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:37 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:42 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:47 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
May 07 16:30:52 ip-10-102-129-121 scylla_image_setup[606]: Waiting for ScyllaDB port 9042...
It looks like the script is executed before the ScyllaDB service actually starts. Given the 600-second timeout, I think that is the problem here.
Yeah, the service fails to start:
systemctl status scylla-image-setup
× scylla-image-setup.service - Scylla Cloud Image Setup service
Loaded: loaded (/lib/systemd/system/scylla-image-setup.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2024-05-07 16:33:12 UTC; 4min 45s ago
Process: 411 ExecStart=/opt/scylladb/scylla-machine-image/scylla_image_setup (code=exited, status=1/FAILURE)
Main PID: 411 (code=exited, status=1/FAILURE)
CPU: 4.755s
May 07 16:33:12 ip-10-102-129-121 scylla_image_setup[411]: File "/opt/scylladb/scylla-machine-image/libexec/scylla_image_setup", line 23, in <module>
May 07 16:33:12 ip-10-102-129-121 scylla_image_setup[411]: run('/opt/scylladb/scylla-machine-image/scylla_configure.py', shell=True, check=True)
May 07 16:33:12 ip-10-102-129-121 scylla_image_setup[411]: File "/opt/scylladb/python3/lib64/python3.11/subprocess.py", line 571, in run
May 07 16:33:12 ip-10-102-129-121 scylla_image_setup[411]: raise CalledProcessError(retcode, process.args,
May 07 16:33:12 ip-10-102-129-121 scylla_image_setup[411]: subprocess.CalledProcessError: Command '/opt/scylladb/scylla-machine-image/scylla_configure.py' returned non-zero exit status 1.
May 07 16:33:12 ip-10-102-129-121 scylla_image_setup[411]: Debug log created: /var/tmp/scylla/scylla_image_setup-411-debug.log
May 07 16:33:12 ip-10-102-129-121 systemd[1]: scylla-image-setup.service: Main process exited, code=exited, status=1/FAILURE
May 07 16:33:12 ip-10-102-129-121 systemd[1]: scylla-image-setup.service: Failed with result 'exit-code'.
May 07 16:33:12 ip-10-102-129-121 systemd[1]: Failed to start Scylla Cloud Image Setup service.
May 07 16:33:12 ip-10-102-129-121 systemd[1]: scylla-image-setup.service: Consumed 4.755s CPU time.
and the crash log:
cat /var/tmp/scylla/scylla_image_setup-411-debug.log
Traceback with variables (most recent call last):
File "/opt/scylladb/scylla-machine-image/libexec/scylla_image_setup", line 23, in <module>
run('/opt/scylladb/scylla-machine-image/scylla_configure.py', shell=True, check=True)
__name__ = '__main__'
__doc__ = None
__package__ = None
__loader__ = <_frozen_importlib_external.SourceFileLoader object at 0x7f81035c3390>
__spec__ = None
__annotations__ = {}
__builtins__ = <module 'builtins' (built-in)>
__file__ = '/opt/scylladb/scylla-machine-image/libexec/scylla_image_setup'
__cached__ = None
os = <module 'os' (frozen)>
sys = <module 'sys' (built-in)>
Path = <class 'pathlib.Path'>
get_cloud_instance = <function get_cloud_instance at 0x7f81026993a0>
is_gce = <function is_gce at 0x7f8102699260>
is_azure = <function is_azure at 0x7f8102699300>
is_redhat_variant = <function is_redhat_variant at 0x7f81026994e0>
run = <function run at 0x7f810304f380>
machine_image_configured = PosixPath('/etc/scylla/machine_image_configured')
cloud_instance = <lib.scylla_cloud.aws_instance object at 0x7f8102695b50>
File "/opt/scylladb/python3/lib64/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
input = None
capture_output = False
timeout = None
check = True
popenargs = ('/opt/scylladb/scylla-machine-image/scylla_configure.py',)
kwargs = {'shell': True}
process = <Popen: returncode: 1 args: '/opt/scylladb/scylla-machine-image/scylla_confi...>
stdout = None
stderr = None
retcode = 1
subprocess.CalledProcessError: Command '/opt/scylladb/scylla-machine-image/scylla_configure.py' returned non-zero exit status 1.
@Felipe_Cardeneti_Mendes: Oh ok, just now I realized these are the AMI params. lol Sorry
Yeah, it can’t be in post_configuration_script indeed. The naming is correct: it runs after the configuration, but before the service is started. So you need this script to run somewhere else, maybe as cloud-init?
@Patryk_Kandziora: Yeah, the problem is that if you pass the JSON config to the bootstrap wrapper, there is no way to pass anything else; at least I have not found a solution for that. That is why I thought the post_configuration_script argument was there to overcome such a limitation. Maybe I am missing something here?
@Felipe_Cardeneti_Mendes: I think if you really want something in the AMI that is more like a “post_start_script”, then the best place to check would be https://github.com/scylladb/scylla-machine-image
I don’t see a way out other than maybe forking the process (&). IIUC it should just let the “post config” move forward.
Or maybe just set start_scylla_on_first_boot: false and, once the configuration is done, do something like:
systemctl enable scylla-server scylla-manager-agent
shutdown -r +1
Reboot and everyone should live happily thereafter
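The forking idea (&) mentioned above could look roughly like the sketch below, placed at the end of the post_configuration_script instead of the blocking wait loop. It is untested in this thread: the helper names, the RUN_DETACHED guard, the 900-second limit and the log path are all assumptions, and nc is assumed to come from the netcat-openbsd package installed earlier in the script.

```shell
#!/bin/bash
# Sketch only: detach the wait-and-start step so that the
# post_configuration_script itself can return immediately.

# wait_for_port HOST PORT TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Returns 0 once "nc -z" can connect, 1 if the deadline passes first.
wait_for_port() {
    wfp_host=$1
    wfp_port=$2
    wfp_deadline=$(( $(date +%s) + $3 ))
    wfp_interval=${4:-5}
    while ! nc -z "$wfp_host" "$wfp_port"; do
        if [ "$(date +%s)" -ge "$wfp_deadline" ]; then
            return 1
        fi
        sleep "$wfp_interval"
    done
    return 0
}

# start_agent_when_ready is a hypothetical helper name.
start_agent_when_ready() {
    if wait_for_port localhost 9042 900; then
        systemctl start scylla-manager-agent
    else
        echo "ScyllaDB did not open port 9042 in time" >&2
        return 1
    fi
}

# RUN_DETACHED is a hypothetical guard so the file can also be sourced
# without side effects. The log path is an assumption as well.
if [ "${RUN_DETACHED:-0}" = "1" ]; then
    start_agent_when_ready >/var/log/scylla-agent-start.log 2>&1 &
fi
```

Whether a backgrounded job like this survives once scylla-image-setup finishes depends on how systemd treats the unit's remaining processes, which this thread does not confirm. The bounded wait at least avoids looping forever if port 9042 never opens.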
@Patryk_Kandziora: Thanks @Felipe_Cardeneti_Mendes - all good and it works. For anyone who faces the same issue with ScyllaDB on AWS, the working bootstrap script looks like this:
Content-Type: multipart/mixed; boundary="===============5438789820677534874=="
MIME-Version: 1.0

--===============5438789820677534874==
Content-Type: x-scylla/yaml
MIME-Version: 1.0
Content-Disposition: attachment; filename="scylla_machine_image.yaml"

scylla_yaml:
  cluster_name: "${cluster_name}"
  experimental: false
start_scylla_on_first_boot: false

--===============5438789820677534874==
Content-Type: text/cloud-config; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="cloud-config.txt"

#cloud-config
# run commands
# default: none
# runcmd contains a list of either lists or a string
# each item will be executed in order at rc.local like level with
# output to the console
# - runcmd only runs during the first boot
# - if the item is a list, the items will be properly executed as if
#   passed to execve(3) (with the first arg as the command).
# - if the item is a string, it will be simply written to the file and
#   will be interpreted by 'sh'
#
# Note, that the list has to be proper yaml, so you have to quote
# any characters yaml would eat (':' can be problematic)
runcmd:
  - [ wget, "http://downloads.scylladb.com/deb/ubuntu/scylladb-manager-3.2.list", -O, /etc/apt/sources.list.d/scylla-manager.list ]
  - sudo apt-get update
  - [ sudo, apt-get, install, -y, jq, scylla-manager-agent, vim, netcat-openbsd, net-tools ]
  - [ sed, -i, -E, 's/^# ?(auth_token:)/\1/', "/etc/scylla-manager-agent/scylla-manager-agent.yaml" ]
  - [ sed, -i, -E, "s/^ *auth_token:.*/auth_token: ${auth_token}/", "/etc/scylla-manager-agent/scylla-manager-agent.yaml" ]
  - [ sudo, scyllamgr_agent_setup, -y ]
  - [ sudo, systemctl, start, scylla-server ]
  - while ! nc -z localhost 9042; do echo "Waiting for ScyllaDB port 9042..."; sleep 5; done
  - [ echo, "ScyllaDB is ready" ]
  - [ systemctl, enable, scylla-server, scylla-manager-agent ]
  - [ sudo, systemctl, start, scylla-manager-agent ]

--===============5438789820677534874==--
Docs that helped:
• https://opensource.docs.scylladb.com/stable/getting-started/install-scylla/launch-on-aws.html
• https://github.com/scylladb/scylla-machine-image
• https://cloudinit.readthedocs.io/en/latest/reference/examples.html#run-commands-on-first-boot
Thanks again @Felipe_Cardeneti_Mendes
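Once a node has booted with the multipart user data above, a quick check along these lines can confirm that both services came up in the intended order. This is a minimal sketch, not something from the thread: check_node is a hypothetical helper name, and it assumes systemctl and nc (from netcat-openbsd) are available on the node.

```shell
#!/bin/bash
# check_node: hypothetical helper. Returns 0 only if both units are active
# and the CQL port answers on localhost.
check_node() {
    for cn_unit in scylla-server scylla-manager-agent; do
        if ! systemctl is-active --quiet "$cn_unit"; then
            echo "$cn_unit is not active" >&2
            return 1
        fi
    done
    if ! nc -z localhost 9042; then
        echo "CQL port 9042 is not reachable" >&2
        return 1
    fi
    echo "node looks healthy"
}
```

Running check_node over SSH after the first boot gives a single pass/fail answer without reading through the scylla-image-setup and cloud-init logs.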