Cluster stuck at initialization phase

Hello everyone,

I am trying to set up a brand new 3-node cluster for QA purposes. However, I am having a hard time getting it up and running.

I am able to start Node 1 but it seems to get stuck during the initialization procedure:

● scylla-server.service - Scylla Server
     Loaded: loaded (/lib/systemd/system/scylla-server.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/scylla-server.service.d
             └─capabilities.conf, dependencies.conf, sysconfdir.conf
     Active: active (running) since Thu 2023-08-31 19:29:43 UTC; 18min ago
    Process: 130672 ExecStartPre=/opt/scylladb/scripts/scylla_prepare (code=exited, status=0/SUCCESS)
   Main PID: 130727 (scylla)
     Status: "starting sstables loader"
      Tasks: 14 (limit: 18677)
     Memory: 377.7M
        CPU: 1min 20.600s
     CGroup: /scylla.slice/scylla-server.slice/scylla-server.service
             └─130727 /usr/bin/scylla --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --io-properties-file=/etc/scylla.d/io_properties.yaml --cpuset 1-7

Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.scylla_keyspaces bd14cba0-4834-11ee-b15d-fef71526ee40] Compacting [/var/lib/scylla/data/system_schema/scylla_keyspaces-fa0ea2bd608f3e749b1eb84b46b33adf/mc-147-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/scylla_keyspaces-fa0ea2bd608f3e749b1eb84b46b33adf/mc-140-big-Data.db:level=0:origin=>
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.scylla_keyspaces bd14cba0-4834-11ee-b15d-fef71526ee40] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/scylla_keyspaces-fa0ea2bd608f3e749b1eb84b46b33adf/mc-154-big-Data.db:level=0]. 81kB to 40kB (~50% of original) in 13ms = 3MB/s. ~256 total partitions merged to 2.
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.view_virtual_columns bd178ac0-4834-11ee-b15d-fef71526ee40] Compacting [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/mc-147-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/mc-140-big-Data.db:lev>
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.view_virtual_columns bd178ac0-4834-11ee-b15d-fef71526ee40] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/view_virtual_columns-08843b6345dc3be29798a0418295cfaa/mc-154-big-Data.db:level=0]. 81kB to 40kB (~50% of original) in 14ms = 2MB/s. ~256 total partitions merged to 2.
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.indexes bd1a49e0-4834-11ee-b15d-fef71526ee40] Compacting [/var/lib/scylla/data/system_schema/indexes-0feb57ac311f382fba6d9024d305702f/mc-140-big-Data.db:level=0:origin=compaction,/var/lib/scylla/data/system_schema/indexes-0feb57ac311f382fba6d9024d305702f/mc-147-big-Data.db:level=0:origin=memtable]
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.indexes bd1a49e0-4834-11ee-b15d-fef71526ee40] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/indexes-0feb57ac311f382fba6d9024d305702f/mc-154-big-Data.db:level=0]. 81kB to 40kB (~50% of original) in 12ms = 3MB/s. ~256 total partitions merged to 2.
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system.discovery bd1cbae0-4834-11ee-b15d-fef71526ee40] Compacting [/var/lib/scylla/data/system/discovery-1ef3d7924f263706a049fc39024f49c5/mc-119-big-Data.db:level=0:origin=compaction,/var/lib/scylla/data/system/discovery-1ef3d7924f263706a049fc39024f49c5/mc-126-big-Data.db:level=0:origin=memtable]
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system.discovery bd1cbae0-4834-11ee-b15d-fef71526ee40] Compacted 2 sstables to [/var/lib/scylla/data/system/discovery-1ef3d7924f263706a049fc39024f49c5/mc-133-big-Data.db:level=0]. 81kB to 40kB (~50% of original) in 10ms = 4MB/s. ~256 total partitions merged to 1.
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.scylla_aggregates bd1eddc0-4834-11ee-b15d-fef71526ee40] Compacting [/var/lib/scylla/data/system_schema/scylla_aggregates-08d8cb2892023371a968ad926e0fdc37/mc-154-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/scylla_aggregates-08d8cb2892023371a968ad926e0fdc37/mc-147-big-Data.db:level=0:orig>
Aug 31 19:29:44 ubuntu-16gb-ash-data-1 scylla[130727]:  [shard 0] compaction - [Compact system_schema.scylla_aggregates bd1eddc0-4834-11ee-b15d-fef71526ee40] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/scylla_aggregates-08d8cb2892023371a968ad926e0fdc37/mc-161-big-Data.db:level=0]. 81kB to 40kB (~50% of original) in 10ms = 4MB/s. ~256 total partitions merged to 2.

Running nodetool status on Node 1 returns:

nodetool: Scylla API server HTTP GET to URL '/storage_service/ownership/' failed: runtime_exception (runtime error: No nodes present in the cluster. Has this node finished starting up?)
See 'nodetool help' or 'nodetool help <command>'.

As for Nodes 2 & 3, their processes exit with error code 1:

× scylla-server.service - Scylla Server
     Loaded: loaded (/lib/systemd/system/scylla-server.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/scylla-server.service.d
             └─capabilities.conf, dependencies.conf, sysconfdir.conf
     Active: failed (Result: exit-code) since Thu 2023-08-31 19:31:11 UTC; 1min 26s ago
    Process: 93032 ExecStartPre=/opt/scylladb/scripts/scylla_prepare (code=exited, status=0/SUCCESS)
    Process: 93235 ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET $MEM_CONF (code=exited, status=1/FAILURE)
    Process: 93574 ExecStopPost=/opt/scylladb/scripts/scylla_stop (code=exited, status=0/SUCCESS)
   Main PID: 93235 (code=exited, status=1/FAILURE)
     Status: "starting sstables loader"
        CPU: 8.006s

Aug 31 19:31:10 ubuntu-16gb-ash-data-2 scylla[93235]:  [shard 0] init - Startup failed: std::runtime_error (Node 10.0.0.2 has gossip status=UNKNOWN. Try fixing it before adding new node to the cluster.)
× scylla-server.service - Scylla Server
     Loaded: loaded (/lib/systemd/system/scylla-server.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/scylla-server.service.d
             └─capabilities.conf, dependencies.conf, sysconfdir.conf
     Active: failed (Result: exit-code) since Thu 2023-08-31 19:31:14 UTC; 6s ago
    Process: 79707 ExecStartPre=/opt/scylladb/scripts/scylla_prepare (code=exited, status=0/SUCCESS)
    Process: 79908 ExecStart=/usr/bin/scylla $SCYLLA_ARGS $SEASTAR_IO $DEV_MODE $CPUSET $MEM_CONF (code=exited, status=1/FAILURE)
    Process: 80136 ExecStopPost=/opt/scylladb/scripts/scylla_stop (code=exited, status=0/SUCCESS)
   Main PID: 79908 (code=exited, status=1/FAILURE)
     Status: "starting sstables loader"
        CPU: 4.498s

Aug 31 19:31:13 ubuntu-16gb-ash-data-3 scylla[79908]:  [shard 0] init - Startup failed: std::runtime_error (Node 10.0.0.2 has gossip status=UNKNOWN. Try fixing it before adding new node to the cluster.)

Running nodetool status on Nodes 2 & 3 returns:

nodetool: Unable to connect to Scylla API server: java.net.ConnectException: Connection refused (Connection refused)
See 'nodetool help' or 'nodetool help <command>'.

Cluster specs:

  • Provider: Hetzner
  • Platform: x86 AMD Epyc
  • Specs: shared 8 vCPU, 16 GB RAM, 200 GB storage
  • Network: Hetzner private network shared by all 3 instances
  • OS: Ubuntu 22.04
  • No Docker installation

Network topology

  • Node 1: reachable at private IP 10.0.0.2
  • Node 2: reachable at private IP 10.0.0.4
  • Node 3: reachable at private IP 10.0.0.3
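
Since Nodes 2 & 3 fail with "Node 10.0.0.2 has gossip status=UNKNOWN", one quick sanity check is whether the inter-node port is reachable between the private IPs above. The following is only a sketch: it assumes port 7000 (the Scylla default for inter-node traffic, also covered by the UFW rules in this post) has not been changed in scylla.yaml, and the names port_open and check_nodes are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port accepts connections on each given host.
# PORT defaults to 7000, assumed here to be the inter-node (gossip) port.
PORT="${PORT:-7000}"

port_open() {
    # Uses bash's /dev/tcp pseudo-device; succeeds only if the TCP
    # connection is established within 2 seconds.
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

check_nodes() {
    # Print one line per host given as an argument.
    for ip in "$@"; do
        if port_open "$ip" "$PORT"; then
            echo "$ip:$PORT reachable"
        else
            echo "$ip:$PORT NOT reachable"
        fi
    done
}

# Check the hosts passed on the command line (none means no output).
check_nodes "$@"
```

Saved under a hypothetical name such as check_gossip.sh, it could be run on each node as `bash check_gossip.sh 10.0.0.2 10.0.0.4 10.0.0.3`. A port only shows as reachable while Scylla is actually listening on the target node, so a "NOT reachable" result for a node whose service has exited is expected and does not by itself point to a firewall problem.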

I have also created UFW rules for the 11 Scylla-related ports:

sudo ufw allow from 10.0.0.2 to any port 7000 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 7001 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 7199 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 9042 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 9100 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 9142 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 9160 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 9180 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 10000 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 19042 comment 'MASTER'
sudo ufw allow from 10.0.0.2 to any port 19142 comment 'MASTER'

sudo ufw allow from 10.0.0.3 to any port 7000 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 7001 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 7199 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 9042 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 9142 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 9100 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 9160 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 9180 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 10000 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 19042 comment 'REPLICA-2'
sudo ufw allow from 10.0.0.3 to any port 19142 comment 'REPLICA-2'

sudo ufw allow from 10.0.0.4 to any port 7000 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 7001 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 7199 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 9042 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 9100 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 9142 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 9160 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 9180 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 10000 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 19042 comment 'REPLICA-1'
sudo ufw allow from 10.0.0.4 to any port 19142 comment 'REPLICA-1'
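
The 33 rules above can also be generated with a loop, which makes a mistyped port or IP easier to spot. This is a sketch of that idea rather than what was actually run: the function name ufw_rules and the APPLY switch are made up for illustration, while the IPs, ports, and comments are the ones listed above.

```shell
#!/usr/bin/env bash
# Sketch: print (or, with APPLY=1, run) one "ufw allow" rule per peer and port.
PEERS="10.0.0.2:MASTER 10.0.0.3:REPLICA-2 10.0.0.4:REPLICA-1"
PORTS="7000 7001 7199 9042 9100 9142 9160 9180 10000 19042 19142"

ufw_rules() {
    for peer in $PEERS; do
        ip="${peer%%:*}"      # text before the first colon
        label="${peer#*:}"    # text after the first colon
        for port in $PORTS; do
            echo "sudo ufw allow from $ip to any port $port comment '$label'"
        done
    done
}

if [ "${APPLY:-0}" = "1" ]; then
    ufw_rules | sh    # execute the generated commands
else
    ufw_rules         # dry run: only print them for review
fi
```

By default the script only prints the commands, so the list can be compared against the existing rules before anything is changed.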

Configuration files

Node 1 - Config

Node 2 - Config

Node 3 - Config

Execution logs

Available on Pastebin for convenience.

Node 1 - Execution Log

Node 2 - Execution Log

Node 3 - Execution Log

Thank you for your time and help.

Alright, so after a fair amount of trial and error, it seems that I got it to work.

Somehow the “data” and “commitlog” directories became corrupt and were preventing Scylla from completing the initialization procedure.
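
The exact repair steps are not recorded here. One common way to recover a brand new node with no data worth keeping is to stop Scylla, empty both directories, and start it again. The sketch below assumes that approach and assumes the default package paths (/var/lib/scylla/data appears in the logs above; the commitlog path is a guess based on the default layout). Check data_file_directories and commitlog_directory in scylla.yaml before using anything like this, because it destroys all data on the node. The function name reset_steps is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the steps to reset a node that holds no real data.
# DESTRUCTIVE if the printed commands are run. Paths are assumptions.
DATA_DIR="${DATA_DIR:-/var/lib/scylla/data}"
COMMITLOG_DIR="${COMMITLOG_DIR:-/var/lib/scylla/commitlog}"

reset_steps() {
    echo "sudo systemctl stop scylla-server"
    # -mindepth 1 keeps the directory itself (and its ownership) in place
    # and removes only what is inside it.
    echo "sudo find $DATA_DIR -mindepth 1 -delete"
    echo "sudo find $COMMITLOG_DIR -mindepth 1 -delete"
    echo "sudo systemctl start scylla-server"
}

# Only print the steps for review; nothing is executed by this script.
reset_steps
```

On a cluster like this one it is probably safest to go one node at a time, starting with Node 1, since the other two nodes refuse to start while 10.0.0.2 has an unknown gossip status.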

Hope this helps anyone who happens to come by this situation in the future.

Cheers


Hello, I ran into the same error when deploying ScyllaDB 5.2 on Ubuntu 22.04, so I would like to ask: how did you solve it?