Scylla seems ideal from a performance standpoint, and it seems easy enough to get it running and to configure tables.
CONFIGURATION
I believe the configuration of a 3-zone system with 3 nodes per zone is as follows. Start an Ubuntu EC2 instance with Docker installed, and then:
docker pull scylladb/scylla:latest
docker run --name my_scylla_z1_n1 -d -p 9042:9042 -p 9180:9180 scylladb/scylla:latest
docker exec -it my_scylla_z1_n1 cqlsh
CREATE KEYSPACE IF NOT EXISTS prod_keyspace
WITH replication = {'class': 'NetworkTopologyStrategy', 'zone_1': 3, 'zone_2': 3, 'zone_3': 3};
Then, I believe, on my next EC2 instance (in the same zone, z1), I run:
docker run --name my_scylla_z1_n2 -d -p 9043:9042 -p 9181:9180 --link my_scylla_z1_n1:scylla --network host scylladb/scylla:latest --seeds <IP_of_first_node>
Then for the third node in that zone:
docker run --name my_scylla_z1_n3 -d -p 9044:9042 -p 9182:9180 --link my_scylla_z1_n1:scylla --link my_scylla_z1_n2:scylla scylladb/scylla:latest --seeds my_scylla_z1_n1,my_scylla_z1_n2
Is this correct so far? If I am connecting 9 nodes (3 zones of 3 nodes each), do I just keep going like this, or is there a different procedure?
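To make "keep going like this" concrete, here is a dry-run sketch of the pattern I have in mind. It only prints the commands and starts nothing. It assumes one container per EC2 host with host networking and the first node as the only seed, which I have not confirmed is the right setup. The seed IP 10.0.0.11 and the function name node_commands are placeholders I made up, and the sketch does not address how each node gets assigned to zone_1, zone_2, or zone_3.

```shell
#!/bin/sh
# Dry run only: prints one docker run line per node; nothing is started.
IMAGE="scylladb/scylla:latest"

node_commands() {
  seed_ip="$1"   # hypothetical: private IP of the first node
  for zone in 1 2 3; do
    for n in 1 2 3; do
      name="my_scylla_z${zone}_n${n}"
      if [ "$zone" = 1 ] && [ "$n" = 1 ]; then
        # first node: no seed to point at yet
        echo "docker run --name $name -d --network host $IMAGE"
      else
        # every other node points at the first node as its seed
        echo "docker run --name $name -d --network host $IMAGE --seeds $seed_ip"
      fi
    done
  done
}

node_commands 10.0.0.11
```

Each printed line would be run on a different EC2 host, so this is 9 separate manual steps as far as I can tell.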
BACKUP
Let’s say, hypothetically, that I now have this system running. I do not want to back it up to S3. I just want to copy the database to my local disk. How can I do this?
I have read the documentation but I see no clear explanation.
I need to install and run Scylla Manager, right? Can I just run that on my local machine? If my local machine has no firewall obstructions to the 9 EC2 nodes, can I just run it locally via Docker on WSL Ubuntu and connect to them that way?
CHATGPT
ChatGPT says I can run:
docker run -d \
--name my_scylla_manager \
--network host \
--memory 4G \
scylladb/scylla-manager:latest
docker exec -it my_scylla_manager bash
Then it says to edit Scylla Manager’s configuration to point to the ScyllaDB cluster. “The Scylla Manager configuration file (scylla-manager.conf) typically resides in /etc/scylla-manager”:
# /etc/scylla-manager/scylla-manager.conf
[cluster]
nodes = ["127.0.0.1", "192.168.1.2", "192.168.1.3"] # ScyllaDB nodes
Then it says I can run things like:
sctool cluster status
sctool cluster add <node_ip> --username <username> --password <password>
sctool backup plan create \
--name <backup_plan_name> \
--cluster <cluster_name> \
--location <local_directory> \
--schedule <schedule>
sctool backup plan list
sctool backup run <backup_plan_name>
sctool backup status <backup_plan_name>
sctool restore start <backup_plan_name> --restore-to <restore_location>
But I suspect this is all hallucinated. What is the actual method to copy the database to my local drive if it is running on, say, 9 EC2 instances as described above with a replication factor of 3?
Can I do this with a single command? Can I do it with the Scylla Manager Docker container just running locally? Do I still need to take 9 snapshots and then deal with the mess of trying to reconstitute them if there is a problem later? Or can I get a single backup file of some kind?
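For reference, the manual per-node route I am hoping to avoid would look roughly like the dry-run sketch below. It only prints commands. As far as I understand, nodetool snapshot writes hard links under /var/lib/scylla/data/<keyspace>/<table>-<uuid>/snapshots/<tag> on each node, and the files then have to be copied off every node separately. The tag manual_copy, the ./backup directory, and the function name snapshot_plan are placeholders of mine, and the docker cp line copies the whole keyspace directory rather than just the snapshot, so treat it as an illustration only.

```shell
#!/bin/sh
# Dry run only: prints the per-node commands; nothing is executed.
KEYSPACE="prod_keyspace"   # keyspace name from the CREATE KEYSPACE above
TAG="manual_copy"          # hypothetical snapshot tag

snapshot_plan() {
  # arguments: container names, one per EC2 host
  for node in "$@"; do
    # take a snapshot of the keyspace on this node
    echo "docker exec $node nodetool snapshot -t $TAG $KEYSPACE"
    # copy the keyspace data directory (snapshots included) out of the container
    echo "docker cp $node:/var/lib/scylla/data/$KEYSPACE ./backup/$node/"
  done
}

snapshot_plan my_scylla_z1_n1 my_scylla_z1_n2 my_scylla_z1_n3
```

That is two commands per node, run on each EC2 host in turn, plus getting the files from each host to my machine and saving the schema separately, which is exactly the mess I would like to avoid.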
I saw another post like this but the only reply just said something like “you need to configure MinIO” which is not actionable advice if I have not used MinIO or done any of this before.
Thanks for any help.