How do I back up a ScyllaDB cluster to local storage?

Scylla seems perfect from a performance standpoint, and it looks easy enough to get it running and to configure tables.

CONFIGURATION

I believe the configuration of a system with 3 zones of 3 nodes each is as follows. Start an Ubuntu EC2 instance with Docker installed and then:

docker pull scylladb/scylla:latest
docker run --name my_scylla_z1_n1 -d -p 9042:9042 -p 9180:9180 scylladb/scylla:latest
docker exec -it my_scylla_z1_n1 cqlsh
CREATE KEYSPACE IF NOT EXISTS prod_keyspace 
WITH replication = {'class': 'NetworkTopologyStrategy', 'zone_1': 3, 'zone_2': 3, 'zone_3': 3};

Then I believe on my next EC2 instance (in the same zone, z1), I run:

docker run --name my_scylla_z1_n2 -d -p 9043:9042 -p 9181:9180 --link my_scylla_z1_n1:scylla --network host scylladb/scylla:latest --seeds <IP_of_first_node>

Then for the next:

docker run --name my_scylla_z1_n3 -d -p 9044:9042 -p 9182:9180 --link my_scylla_z1_n1:scylla --link my_scylla_z1_n2:scylla scylladb/scylla:latest --seeds my_scylla_z1_n1,my_scylla_z1_n2

Is this correct so far? If I am connecting 9 nodes (3 zones of 3 nodes each = 9 nodes) do I just keep going like this or what?

BACKUP

Let’s say hypothetically I have this system now running. I do not want to back it up to S3. I just want to copy the database to my local disk. How can I do this?

I have read the documentation but I see no clear explanation.

I need to install and run Scylla Manager, right? Can I just run that on my local machine? If there is no firewall between my local machine and the 9 EC2 nodes, can I just run it locally in Docker on WSL Ubuntu and connect to them that way?

CHATGPT

ChatGPT says I can run:


docker run -d \
  --name my_scylla_manager \
  --network host \
  --memory 4G \
  scylladb/scylla-manager:latest
  
docker exec -it my_scylla_manager bash

Then edit Scylla Manager’s configuration to point to your ScyllaDB cluster. “The Scylla Manager configuration file (scylla-manager.conf) typically resides in /etc/scylla-manager”:

# /etc/scylla-manager/scylla-manager.conf
[cluster]
nodes = ["127.0.0.1", "192.168.1.2", "192.168.1.3"]  # ScyllaDB nodes

Then it says I can run things like:

sctool cluster status

sctool cluster add <node_ip> --username <username> --password <password>

sctool backup plan create \
    --name <backup_plan_name> \
    --cluster <cluster_name> \
    --location <local_directory> \
    --schedule <schedule>

sctool backup plan list
sctool backup run <backup_plan_name>

sctool backup status <backup_plan_name>
sctool restore start <backup_plan_name> --restore-to <restore_location>

But I suspect these are all hallucinations. What is the actual method to copy the database to my local drive if it is running on, say, 9 EC2 instances as described above with a replication factor of 3?

Can I do this with a single command? Can I do it with the Scylla Manager Docker container just running locally? Do I still need to take 9 snapshots and then deal with the mess of trying to reconstitute them if there is a problem later? Or can I get a single backup file of some kind?

I saw another post like this but the only reply just said something like “you need to configure MinIO” which is not actionable advice if I have not used MinIO or done any of this before.

Thanks for any help.

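According to the Scylla Manager backup documentation, Scylla Manager only supports object storage as a backup target: Amazon S3 (or an S3-compatible API), Google Cloud Storage, or Azure Blob Storage. It cannot write a backup directly to a local directory.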
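For the manual route the question mentions (one snapshot per node, no Scylla Manager), a rough sketch might look like the script below. It only prints the commands so they can be reviewed first. The host addresses, the `ubuntu` SSH user, the `local_backup` tag, and the destination directory are all hypothetical placeholders, and it assumes `tar` is available inside the Scylla container. Note that `nodetool snapshot` is per node, so this produces one archive per node rather than a single backup file.

```shell
#!/bin/sh
# Sketch only: prints a per-node snapshot-and-copy plan; nothing is executed.
# Hosts, SSH user, tag, and paths are hypothetical placeholders.

KEYSPACE="prod_keyspace"
TAG="local_backup"
DEST="./scylla_backup"
SSH_USER="ubuntu"
DATA_DIR="/var/lib/scylla/data"   # default data directory in the Scylla image
# "host:container" pairs, one per node; extend the list to all 9 nodes.
NODES="10.0.1.11:my_scylla_z1_n1 10.0.1.12:my_scylla_z1_n2 10.0.1.13:my_scylla_z1_n3"

# Print the three commands needed for one "host:container" pair.
plan_node() {
    host=${1%%:*}
    container=${1##*:}
    # 1. Create the snapshot (hard links under <table>/snapshots/<tag>).
    echo "ssh $SSH_USER@$host docker exec $container nodetool snapshot -t $TAG $KEYSPACE"
    # 2. Stream only the snapshot directories into a local archive.
    echo "ssh $SSH_USER@$host \"docker exec $container sh -c 'cd $DATA_DIR && tar -czf - $KEYSPACE/*/snapshots/$TAG'\" > $DEST/$container.tar.gz"
    # 3. Drop the snapshot on the node once it has been copied.
    echo "ssh $SSH_USER@$host docker exec $container nodetool clearsnapshot -t $TAG $KEYSPACE"
}

echo "mkdir -p $DEST"

# The schema is not part of the SSTables, so dump it once from any node.
first=${NODES%% *}
echo "ssh $SSH_USER@${first%%:*} \"docker exec ${first##*:} cqlsh -e 'DESC SCHEMA'\" > $DEST/schema.cql"

for node in $NODES; do
    plan_node "$node"
done
```

Once the printed commands look right, they can be piped into `sh` or pasted one at a time. Restoring from these archives is still a per-node, per-table job, which is the main reason to prefer Scylla Manager if you can set up an S3-compatible target.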
If you want to back up to a local disk with Scylla Manager, MinIO is your best option, as you already mentioned. MinIO is an S3-compatible object store that you can run locally, which gives you a local backup through Scylla Manager.
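Since MinIO may be new to you, here is a minimal sketch of what that setup could look like. Every address, credential, bucket name, and cluster name in it is a made-up placeholder, and it assumes the Scylla Manager Agent is already installed on each node and the cluster is already registered with `sctool cluster add`. One thing to check first: it is the agents on the EC2 nodes that upload the data, so the MinIO endpoint has to be reachable from the nodes, not only the other way around. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: all addresses, credentials, and names are hypothetical placeholders.

MINIO_HOST="203.0.113.10"       # address of the MinIO machine as seen from the Scylla nodes
MINIO_USER="minio_user"
MINIO_PASS="change_me_please"
BUCKET="scylla-backups"
CLUSTER="prod_cluster"          # name used when the cluster was added to Scylla Manager
DRY_RUN="${DRY_RUN:-1}"         # 1 = print the commands, 0 = execute them

# Print a command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# S3 section to merge into scylla-manager-agent.yaml on every Scylla node
# (restart the agent afterwards).
agent_s3_config() {
    cat <<EOF
s3:
  access_key_id: $MINIO_USER
  secret_access_key: $MINIO_PASS
  provider: Minio
  endpoint: http://$MINIO_HOST:9000
EOF
}

# 1. Start MinIO on the machine whose disk should hold the backup.
run docker run -d --name minio -p 9000:9000 -p 9001:9001 \
    -v "$HOME/scylla-backups:/data" \
    -e "MINIO_ROOT_USER=$MINIO_USER" -e "MINIO_ROOT_PASSWORD=$MINIO_PASS" \
    quay.io/minio/minio server /data --console-address :9001

# 2. Create the bucket with the MinIO client (mc).
run mc alias set localminio "http://$MINIO_HOST:9000" "$MINIO_USER" "$MINIO_PASS"
run mc mb "localminio/$BUCKET"

# 3. Show the agent configuration to apply on each node.
agent_s3_config

# 4. Ask Scylla Manager for an ad hoc backup of the keyspace into the bucket.
run sctool backup -c "$CLUSTER" -L "s3:$BUCKET" -K prod_keyspace
```

With this layout the backup ends up as ordinary files under `$HOME/scylla-backups` on the MinIO machine, and Scylla Manager keeps track of which files belong to which node.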