@Mattia: Hello, I am trying to contribute to the ScyllaDB Rust driver, but I am unable to start the cluster to test the changes I have made.
I am using the Makefile and running make test or make ci, but both docker and podman hang indefinitely (I think on the healthcheck of the first container).
Can anyone help me with this? (This is the docker compose file used.)
GitHub: scylla-rust-driver/test/cluster/docker-compose.yml at 00856129603b39676c9a31c26626d0ad5ec19c68 · scylladb/scylla-rust-driver
@Karol_Baryła: Sorry for leaving you without a response on the previous thread! Tbh this cluster sometimes (but rarely) also fails to start for me. In that case it is usually enough to try again (make down && make up). I never really investigated why that happens. You can check the logs of this node with docker logs cluster-scylla1-1 and link them here (use pastebin / gist, or send a file); maybe we’ll find something there.
@Mattia: Unfortunately I have tried restarting it many times. I will try to share the logs now.
Here are the log files from the container:
stdout.txt (2.9 KB)
stderr.txt (109.1 KB)
Even though the job stops after 355 seconds because scylla1 is reported as unhealthy, if I run nodetool I get the following status:
docker exec -it cluster-scylla1-1 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns  Host ID                               Rack
UN  172.42.0.2  269.25 KB  256     ?     5efe6b87-70b0-46a8-a92f-978e7067d027  rack1
@Karol_Baryła: I have no idea why it doesn’t start. Scylla logs look fine I think. What happens if you execute the healthcheck command inside the container manually?
@Mattia: Doing
docker exec -it cluster-scylla-1 /bin/bash
cqlsh -e "select * from system.local WHERE key='local'"
returns a row correctly.
Do you think I should run make test and the other targets with root privileges?
@Karol_Baryła: Definitely without root. docker inspect --format "{{json .State.Health }}" cluster-scylla-1 | jq should show the output of the healthcheck commands executed by docker; maybe it will give us some clue.
@Mattia: Thank you, this is rather telling:
{"Status":"starting","FailingStreak":4,"Log":[{"Start":"2025-05-04T18:53:58.152812491+02:00","End":"2025-05-04T18:53:59.241185859+02:00","ExitCode":1,"Output":"Usage: cqlsh [options] [host [port]]\n\ncqlsh: error: '*' is not a valid port number.\n"},{"Start":"2025-05-04T18:54:04.877333293+02:00","End":"2025-05-04T18:54:05.386606825+02:00","ExitCode":1,"Output":"Usage: cqlsh [options] [host [port]]\n\ncqlsh: error: '*' is not a valid port number.\n"},{"Start":"2025-05-04T18:54:10.852202477+02:00","End":"2025-05-04T18:54:11.263149637+02:00","ExitCode":1,"Output":"Usage: cqlsh [options] [host [port]]\n\ncqlsh: error: '*' is not a valid port number.\n"},{"Start":"2025-05-04T18:54:16.874847316+02:00","End":"2025-05-04T18:54:17.332078002+02:00","ExitCode":1,"Output":"Usage: cqlsh [options] [host [port]]\n\ncqlsh: error: '*' is not a valid port number.\n"}]}
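For readers who want to condense output like the above, here is a minimal Python sketch that summarizes the health JSON printed by the docker inspect command from this thread. The function name summarize_health and the shape of the returned dictionary are illustrative choices, not part of Docker or the driver's tooling. The sample input is abbreviated from the output above, with timestamps omitted.

```python
import json


def summarize_health(raw: str) -> dict:
    """Condense the JSON printed by
    docker inspect --format "{{json .State.Health }}" <container>.

    Hypothetical helper for illustration only.
    """
    health = json.loads(raw)
    log = health.get("Log") or []

    # Collapse repeated probe outputs so a recurring error is listed once.
    distinct_outputs = []
    for entry in log:
        output = entry.get("Output", "").strip()
        if output and output not in distinct_outputs:
            distinct_outputs.append(output)

    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak"),
        "exit_codes": sorted({e["ExitCode"] for e in log if "ExitCode" in e}),
        "distinct_outputs": distinct_outputs,
    }


# Abbreviated sample based on the output in this thread.
SAMPLE = json.dumps({
    "Status": "starting",
    "FailingStreak": 4,
    "Log": [
        {"ExitCode": 1,
         "Output": "Usage: cqlsh [options] [host [port]]\n\n"
                   "cqlsh: error: '*' is not a valid port number.\n"},
    ] * 4,
})

summary = summarize_health(SAMPLE)
print(summary["status"], summary["failing_streak"], summary["exit_codes"])
for text in summary["distinct_outputs"]:
    print(text)
```

Collapsing the four identical entries into one makes it easier to see that every probe fails the same way, with cqlsh rejecting its own arguments rather than failing to reach the node.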
@Karol_Baryła: I don’t see how it could possibly interpret * as a port number given our compose file.
Did you modify the compose file? I see that your container name is different from mine; that may have something to do with it.
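One way a * could land in the port position is if the healthcheck's argument list is flattened into a single string and split again on whitespace somewhere between the compose file and the container runtime. This is an assumption and is not confirmed anywhere in this thread. The Python sketch below uses a toy model of option parsing (not cqlsh's real parser) in which -e consumes exactly one following argument and everything else is treated as [host [port]].

```python
# Healthcheck command as listed for scylla1 in the compose file.
HEALTHCHECK = [
    "cqlsh", "scylla1", "-e",
    "select * from system.local WHERE key='local'",
]


def positional_args(argv):
    """Return the arguments a cqlsh-like CLI would treat as [host [port]].

    Toy model (assumption): "-e" takes exactly one value, and every other
    argument after the program name is positional.
    """
    positionals = []
    skip_next = False
    for arg in argv[1:]:
        if skip_next:
            skip_next = False      # this argument is the value of -e
        elif arg == "-e":
            skip_next = True
        else:
            positionals.append(arg)
    return positionals


# Argument list preserved: the whole query is the single value of -e.
intact = positional_args(HEALTHCHECK)

# Argument list flattened and re-split on whitespace: quoting is lost.
resplit = positional_args(" ".join(HEALTHCHECK).split())

print("intact: ", intact)
print("resplit:", resplit)
```

In this model the intact list yields only the host, while the re-split list places * directly after scylla1, which is where a port would be expected. That is consistent with the error message, but it does not prove that this is what happened.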
@Mattia:
version: "3.7"
networks:
  public:
    name: scylla_rust_driver_public
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.42.0.0/16
services:
  scylla1:
    image: scylladb/scylla
    networks:
      public:
        ipv4_address: 172.42.0.2
    command: |
      --rpc-address 172.42.0.2
      --listen-address 172.42.0.2
      --seeds 172.42.0.2
      --skip-wait-for-gossip-to-settle 0
      --ring-delay-ms 0
      --smp 2
      --memory 1G
    healthcheck:
      test: [ "CMD", "cqlsh", "scylla1", "-e", "select * from system.local WHERE key='local'" ]
      interval: 5s
      timeout: 5s
      retries: 60
  scylla2:
    image: scylladb/scylla
    networks:
      public:
        ipv4_address: 172.42.0.3
    command: |
      --rpc-address 172.42.0.3
      --listen-address 172.42.0.3
      --seeds 172.42.0.2
      --skip-wait-for-gossip-to-settle 0
      --ring-delay-ms 0
      --smp 2
      --memory 1G
    healthcheck:
      test: [ "CMD", "cqlsh", "scylla2", "-e", "select * from system.local WHERE key='local'" ]
      interval: 5s
      timeout: 5s
      retries: 60
    depends_on:
      scylla1:
        condition: service_healthy
  scylla3:
    image: scylladb/scylla
    networks:
      public:
        ipv4_address: 172.42.0.4
    command: |
      --rpc-address 172.42.0.4
      --listen-address 172.42.0.4
      --seeds 172.42.0.2,172.42.0.3
      --skip-wait-for-gossip-to-settle 0
      --ring-delay-ms 0
      --smp 2
      --memory 1G
    healthcheck:
      test: [ "CMD", "cqlsh", "scylla3", "-e", "select * from system.local WHERE key='local'" ]
      interval: 5s
      timeout: 5s
      retries: 60
    depends_on:
      scylla2:
        condition: service_healthy
This is the one I am using, straight from the GitHub repo.
@Karol_Baryła: What is the output of docker --version?
@Mattia: Docker version 28.1.1, build 4eba377327
I also tried podman (podman version 5.4.2) and hit the same problem.
@Karol_Baryła: In that case I’m unfortunately out of ideas, sorry.
@Mattia: Thank you for the help anyway! I think I will try to remove the docker-compose/podman layer, though I guess that might create some problems.
It seems I was still using podman: for some reason it took precedence over docker when running the commands. With docker it works (at least the first container is healthy; now the second seems to be hanging).