Scylla 5.2 very slow startup

I just upgraded one node of a 3-node cluster from Scylla 5.1 to 5.2, but it does not start up. Scylla seems to hang after the upgrade:

-- A start job for unit scylla-server.service has begun execution.

-- The job identifier is 145767327.
Aug 23 22:39:49 pcdev-1 scylla[2834247]: Scylla version 5.2.6-0.20230730.58acf071bf28 with build-id 17961be569f8503b27ff284a8de1e00a9d83811e starting …
Aug 23 22:39:49 pcdev-1 scylla[2834247]: command used: “/usr/bin/scylla --log-to-syslog 1 --log-to-stdout 0 --default-log-level info --network-stack posix --memory 10G --reserve-memory 52G --overprovisioned --kernel-page-cache 1 --unsafe-bypass-fsync 1 --io-properties-file=/etc/scylla.d/io_properties.yaml --developer-mode=1 --cpuset 0-3 --smp 4”
Aug 23 22:39:49 pcdev-1 scylla[2834247]: parsed command line options: [log-to-syslog, (positional) 1, log-to-stdout, (positional) 0, default-log-level, (positional) info, network-stack, (positional) posix, memory, (positional) 10G, reserve-memory, (positional) 52G, overprovisioned, kernel-page-cache, (positional) 1, unsafe-bypass-fsync, (positional) 1, io-properties-file: /etc/scylla.d/io_properties.yaml, developer-mode: 1, cpuset, (positional) 0-3, smp, (positional) 4]
Aug 23 22:39:49 pcdev-1 scylla[2834247]: seastar - Reactor backend: linux-aio
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Created fair group io-queue-66305, capacity rate 192:50000, limit 23649164, rate 16777216 (factor 1), threshold 11227761
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - IO queue uses 1.41ms latency goal for device 66305
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Created io group dev(66305), length limit 131072:65536, rate 192000:50000000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Created io queue dev(66305) capacities: 512:11227761:16830904 1024:11270710:16884590 2048:11356610:16991964 4096:11528408:17206712 8192:11872006:17636210 16384:12559200:18495202 32768:13933590:20213190 65536:16682369:23649164 131072:22179928:X
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Created fair group io-queue-0, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - IO queue uses 0.75ms latency goal for device 0
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 1] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 3] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 2] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] seastar - updated: blocked-reactor-notify-ms=1000000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 1] seastar - updated: blocked-reactor-notify-ms=1000000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 3] seastar - updated: blocked-reactor-notify-ms=1000000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 2] seastar - updated: blocked-reactor-notify-ms=1000000
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - Unknown option : max_size_of_hints_in_progress
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - installing SIGHUP handler
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - Scylla version 5.2.6-0.20230730.58acf071bf28 with build-id 17961be569f8503b27ff284a8de1e00a9d83811e starting …
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting prometheus API server
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - creating snitch
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting tokens manager
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting effective_replication_map factory
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting migration manager notifier
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting lifecycle notifier
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - creating tracing
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting API server
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - Scylla API server listening on 0.0.0.0:10000 …
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] service_level_controller - update_from_distributed_data: starting configuration polling loop
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting system keyspace
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting gossiper
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - seeds={192.168.178.101, 192.168.178.102, 192.168.178.103}, listen_address=192.168.178.103, broadcast_address=192.168.178.103
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting Raft address map
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting direct failure detector pinger service
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting direct failure detector service
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - initializing storage service
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] storage_service - Started node_ops_abort_thread
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 1] storage_service - Started node_ops_abort_thread
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 2] storage_service - Started node_ops_abort_thread
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 3] storage_service - Started node_ops_abort_thread
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - starting per-shard database core
Aug 23 22:39:49 pcdev-1 scylla[2834247]: [shard 0] init - creating and verifying directories

CPU and disk are idle. The other two nodes are still on 5.1 and still up.

Any ideas what's wrong or what I could check to identify the problem?

Strangely, I let it run overnight and now it has started up:

Aug 23 22:52:56 pcdev-1 scylla[2834247]:  [shard 0] init - starting compaction_manager
Aug 23 22:52:56 pcdev-1 scylla[2834247]:  [shard 0] init - starting database
Aug 23 22:52:56 pcdev-1 scylla[2834247]:  [shard 0] compaction_manager - Set unlimited compaction bandwidth
Aug 23 22:52:56 pcdev-1 scylla[2834247]:  [shard 0] init - loading system sstables
Aug 23 22:52:56 pcdev-1 scylla[2834247]:  [shard 0] database - Populating Keyspace system

What is going on during those 13 minutes? The cluster does not even have much data. Do I have to worry that it takes even longer for larger clusters?

Between the log lines you quoted, ScyllaDB just verifies that the directories for the tables exist and have the correct permissions.
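
To test that explanation outside of Scylla, one could time a plain walk over the data directory that checks each directory for existence and access. This is only a rough sketch and not Scylla's actual startup code. The function name `time_directory_checks` is made up for illustration, and the `/var/lib/scylla/data` path in the usage note is an assumption; use whatever `data_file_directories` points at on the node.

```python
import os
import time


def time_directory_checks(root):
    """Walk `root`, check read/write/execute access on every directory,
    and return (directories_checked, inaccessible_paths, elapsed_seconds)."""
    start = time.perf_counter()
    checked = 0
    problems = []
    # os.walk lists each directory, which forces the filesystem to read
    # its metadata; on a slow or degraded disk this is where time is lost.
    for dirpath, _dirnames, _filenames in os.walk(root):
        checked += 1
        if not os.access(dirpath, os.R_OK | os.W_OK | os.X_OK):
            problems.append(dirpath)
    elapsed = time.perf_counter() - start
    return checked, problems, elapsed
```

Called as `time_directory_checks("/var/lib/scylla/data")` under the same user that runs Scylla, it returns how many directories were visited, which ones failed the access check, and how long the walk took. A run shortly after boot, before the page cache is warm, is probably the most telling. If this finishes in seconds while Scylla needs minutes, the disk metadata path is unlikely to be the whole story.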

Do you have a slow disk and/or many keyspaces/tables?

Around 2 keyspaces with 80 tables each.

The system seemed to be idle to me. But maybe the disk is getting old and has some issues.

This should take at most a few seconds even with slow disks.

Is it reproducible? If so, please run iostat -x 1 in parallel and post the results. Please mark the start and end time of the event.

I just saw that the slow host is the one still running Ubuntu 20, while the others run Ubuntu 22.

The disks are spinning disks, so they are not fast, but it still shouldn't take so long.

Here is the iostat output with some compactions running:


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.22    0.00    1.57   33.17    0.00   60.04

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           36.00   4608.00     0.00   0.00    0.00   128.00    1.00     12.00     0.00   0.00    0.00    12.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             27.00   3456.00     0.00   0.00   13.00   128.00  619.00 148932.00    34.00   5.21    5.66   240.60    0.00      0.00     0.00   0.00    0.00     0.00    2.44  86.00
sdb             12.00   1536.00     0.00   0.00   97.33   128.00  622.00 150724.00    34.00   5.18    6.16   242.32    0.00      0.00     0.00   0.00    0.00     0.00    3.78  84.40


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.02    0.00    2.05    0.00    0.00   87.93

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          132.00  16864.00     0.00   0.00    0.00   127.76    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda            100.00  12800.00     0.00   0.00    1.20   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.06  50.00
sdb             31.00   3968.00     1.00   3.12    2.16   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.04  18.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.66    0.00    3.20    0.00    0.00   82.14

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          131.00  16768.00     0.00   0.00    0.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             96.00  12288.00     0.00   0.00    1.27   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.06  46.40
sdb             35.00   4480.00     0.00   0.00    2.51   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.06  19.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.92    0.00    2.37    0.00    0.00   83.71

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          130.00  16640.00     0.00   0.00    0.00   128.00    3.00     12.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda            102.00  13056.00     0.00   0.00    0.79   128.00    1.00     12.00     2.00  66.67    0.00    12.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02  42.00
sdb             28.00   3584.00     0.00   0.00    0.79   128.00    1.00     12.00     2.00  66.67    0.00    12.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00  13.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.30    0.00    2.00    0.25    0.00   87.45

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          133.00  16992.00     0.00   0.00    0.00   127.76   76.00    304.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda            100.00  12800.00     1.00   0.99    1.03   128.00    3.00    304.00    74.00  96.10    3.00   101.33    0.00      0.00     0.00   0.00    0.00     0.00    0.04  47.20
sdb             31.00   3968.00     0.00   0.00    1.87   128.00    3.00    304.00    74.00  96.10   12.00   101.33    0.00      0.00     0.00   0.00    0.00     0.00    0.07  20.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.48    0.00    1.92    0.00    0.00   86.59

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          133.00  17024.00     0.00   0.00    0.00   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             97.00  12416.00     0.00   0.00    1.36   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.06  49.20
sdb             36.00   4608.00     0.00   0.00    1.72   128.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.03  20.40

btw: Disks do not seem to be able to saturate CPU during major compaction.

The restart of that server is slow again. It seems to be only that server. The iostat numbers don't look that high to me:


Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126            1.81    164.36     0.00   0.00    0.00    90.59   36.57    816.29     0.00   0.00    0.00    22.32    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             19.19   1649.82     6.08  24.05    6.53    85.98   21.99    813.17    14.99  40.54    2.42    36.97    0.00      0.00     0.00   0.00    0.00     0.00    0.12   3.04
sdb             18.36   1579.02     6.08  24.87    6.39    86.02   21.95    813.17    15.04  40.65    2.45    37.04    0.00      0.00     0.00   0.00    0.00     0.00    0.11   2.66


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.18    0.00    1.06    0.94    0.00   94.82

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           30.00    124.00     0.00   0.00    0.00     4.13    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             27.00    112.00     0.00   0.00    7.00     4.15    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.14  24.00
sdb              3.00     12.00     0.00   0.00    4.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.61    0.00    1.56    0.69    0.00   94.14

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           66.00    264.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             36.00    196.00    13.00  26.53    5.19     5.44    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.13  27.20
sdb              1.00     68.00    16.00  94.12    1.00    68.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.40


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.81    0.00    0.69    0.75    0.00   97.75

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           72.00    288.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             40.00    184.00     6.00  13.04    4.47     4.60    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.12  27.20
sdb              1.00    104.00    25.00  96.15    0.00   104.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.88    0.00    1.13    1.82    0.00   96.18

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          125.00    500.00     0.00   0.00    0.00     4.00  424.00   1848.00     0.00   0.00    0.00     4.36    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             29.00    376.00    64.00  68.82    8.55    12.97  237.00   1848.00   189.00  44.37    2.35     7.80    0.00      0.00     0.00   0.00    0.00     0.00    0.28  38.00
sdb              4.00    120.00    26.00  86.67    8.00    30.00  237.00   1848.00   189.00  44.37    2.58     7.80    0.00      0.00     0.00   0.00    0.00     0.00    0.08  12.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.19    0.00    0.75    1.13    0.00   96.92

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           60.00    240.00     0.00   0.00    0.00     4.00  178.00    428.00     0.00   0.00    0.00     2.40    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             28.00    144.00     8.00  22.22    4.64     5.14   94.00    428.00    90.00  48.91    1.67     4.55    0.00      0.00     0.00   0.00    0.00     0.00    0.16  33.20
sdb              4.00    100.00    21.00  84.00    7.25    25.00   94.00    428.00    90.00  48.91    1.49     4.55    0.00      0.00     0.00   0.00    0.00     0.00    0.07  19.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.63    0.00    0.81    1.06    0.00   95.50

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          177.00    708.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             53.00    460.00    62.00  53.91    4.91     8.68    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.19  32.40
sdb              4.00    248.00    58.00  93.55    6.00    62.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   4.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.54    0.00    1.74    0.87    0.00   91.85

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          109.00    436.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             46.00    400.00    54.00  54.00    4.26     8.70    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.13  28.80
sdb              2.00     32.00     6.00  75.00    2.50    16.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   1.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.88    0.00    1.44    2.56    0.00   94.12

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          222.00    888.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda            100.00    632.00    58.00  36.71    4.43     6.32    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.28  60.00
sdb              3.00    256.00    61.00  95.31    7.00    85.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   3.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.82    0.00    0.69    1.38    0.00   97.11

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           48.00    192.00     0.00   0.00    0.00     4.00  277.00   1096.00     0.00   0.00    0.00     3.96    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             48.00    192.00     0.00   0.00    5.06     4.00   19.00   1096.00   261.00  93.21    3.05    57.68    0.00      0.00     0.00   0.00    0.00     0.00    0.20  38.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00   19.00   1096.00   261.00  93.21    3.37    57.68    0.00      0.00     0.00   0.00    0.00     0.00    0.03   7.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.88    0.00    0.56    0.75    0.00   97.80

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           27.00    108.00     0.00   0.00    0.00     4.00   54.00    216.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             28.00    112.00     0.00   0.00    5.79     4.00    4.00    216.00    51.00  92.73    2.00    54.00    0.00      0.00     0.00   0.00    0.00     0.00    0.12  23.60
sdb              0.00      0.00     0.00   0.00    0.00     0.00    4.00    216.00    51.00  92.73    2.00    54.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   1.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.82    0.00    1.13    1.70    0.00   96.36

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          238.00    952.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             83.00    748.00   104.00  55.61    4.45     9.01    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.24  50.40
sdb              5.00    200.00    45.00  90.00    4.20    40.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   4.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.13    0.00    1.25    1.69    0.00   95.92

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          174.00    696.00     0.00   0.00    0.00     4.00  143.00   1796.00     0.00   0.00    0.00    12.56    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             82.00    420.00    23.00  21.90    4.12     5.12   15.00   1796.00   130.00  89.66    3.13   119.73    0.00      0.00     0.00   0.00    0.00     0.00    0.26  50.40
sdb              4.00    280.00    66.00  94.29    9.00    70.00   15.00   1796.00   130.00  89.66    3.73   119.73    0.00      0.00     0.00   0.00    0.00     0.00    0.06   9.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.84    0.00    2.01    2.26    0.00   89.89

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          160.00    640.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             97.00    492.00    26.00  21.14    4.82     5.07    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.34  58.80
sdb              2.00    148.00    35.00  94.59    5.00    74.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.01   1.60


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.00    0.00    0.94    1.38    0.00   95.68

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          161.00    644.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             70.00    500.00    55.00  44.00    4.49     7.14    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.20  41.20
sdb              3.00    144.00    33.00  91.67    7.00    48.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   3.20


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.69    0.00    0.94    1.32    0.00   97.04

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          156.00    624.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             63.00    452.00    50.00  44.25    4.03     7.17    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.17  38.80
sdb              3.00    172.00    40.00  93.02    8.00    57.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   4.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.38    0.00    1.25    0.82    0.00   96.55

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          105.00    420.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             75.00    376.00    19.00  20.21    3.00     5.01    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.14  36.40
sdb              1.00     44.00    10.00  90.91    4.00    44.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.50    0.00    1.19    1.12    0.00   96.19

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          157.00    628.00     0.00   0.00    0.00     4.00  388.00   1552.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             65.00    476.00    54.00  45.38    3.58     7.32    5.00   1552.00   384.00  98.71    5.20   310.40    0.00      0.00     0.00   0.00    0.00     0.00    0.18  39.20
sdb              3.00    152.00    35.00  92.11   47.33    50.67    5.00   1552.00   384.00  98.71    6.20   310.40    0.00      0.00     0.00   0.00    0.00     0.00    0.16  18.40


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.45    0.00    2.63    1.25    0.00   92.66

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          183.00    732.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             61.00    512.00    67.00  52.34    4.41     8.39    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.20  39.60
sdb              3.00    220.00    52.00  94.55    3.67    73.33    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.01   2.80


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.86    0.00    3.14    1.32    0.00   87.68

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           70.00    280.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             70.00    280.00     0.00   0.00    3.76     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.17  36.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.03    0.00    4.35    0.93    0.00   88.68

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          102.00    408.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             71.00    400.00    29.00  29.00    3.17     5.63    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.14  35.60
sdb              1.00      4.00     0.00   0.00    1.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.40


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.18    0.00    2.50    6.18    0.00   87.15

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126           89.00    356.00     0.00   0.00    0.00     4.00 1294.00   5548.00     0.00   0.00    0.00     4.29    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             43.00    172.00     0.00   0.00    4.40     4.00 1237.00   5548.00    57.00   4.40    3.81     4.49    0.00      0.00     0.00   0.00    0.00     0.00    2.75  81.20
sdb             16.00    184.00    29.00  64.44    9.50    11.50 1232.00   5548.00    62.00   4.79    2.87     4.50    0.00      0.00     0.00   0.00    0.00     0.00    1.35  70.40


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.87    0.00    1.94    2.56    0.00   91.64

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
md126          198.00    792.00     0.00   0.00    0.00     4.00  365.00   1460.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00   0.00
sda             77.00    580.00    68.00  46.90    5.38     7.53    5.00   1460.00   361.00  98.63   14.00   292.00    0.00      0.00     0.00   0.00    0.00     0.00    0.38  49.60
sdb              4.00    216.00    50.00  92.59    4.00    54.00    5.00   1460.00   361.00  98.63    9.20   292.00    0.00      0.00     0.00   0.00    0.00     0.00    0.05   8.00

smartmontools report the disks are fine.

Honestly, I think this is a problem related to Ubuntu 20. The slow server is the only one running Ubuntu 20; the others in the cluster run Ubuntu 22.04.

Sorry, I didn’t get a notification and lost track of the thread.

Please share the Advanced dashboard (set to per instance) disk stats.

Also make sure you set --io-latency-goal-ms=100 for spinning disks.
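
One way to confirm the flag took effect: seastar prints the effective goal at startup ("IO queue uses ... latency goal for device ..."). Below is a minimal Python sketch that pulls those values out of the journal. It assumes the message format seen in this thread; `latency_goals` is a hypothetical helper name, not a Scylla tool.

```python
import re

# Matches the seastar startup message as it appears in this thread's logs.
GOAL = re.compile(r"IO queue uses ([\d.]+)ms latency goal for device (\d+)")

def latency_goals(lines):
    """Map device id -> latency goal in ms, as reported at startup."""
    goals = {}
    for line in lines:
        m = GOAL.search(line)
        if m:
            goals[int(m.group(2))] = float(m.group(1))
    return goals

sample = [
    "Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - IO queue uses 100.00ms latency goal for device 66305",
    "Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - IO queue uses 100.00ms latency goal for device 0",
]
for device, goal_ms in latency_goals(sample).items():
    print(f"device {device}: {goal_ms} ms")
```

Feeding it the output of journalctl for scylla-server should show whether each device picked up the 100 ms goal or is still on the default.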

Don’t worry, I am happy about any response.

What I find strange is that it's only the Ubuntu 20 machine that has this problem. The other two nodes of the cluster are on Ubuntu 22. Could it be an issue with Scylla 5.2 on Ubuntu 20.04?

...
Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 0] storage_service - Started node_ops_abort_thread
Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 1] storage_service - Started node_ops_abort_thread
Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 2] storage_service - Started node_ops_abort_thread
Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 3] storage_service - Started node_ops_abort_thread
Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 0] init - starting per-shard database core
Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 0] init - creating and verifying directories

Sep 14 21:02:35 pcdev-1 scylla[958528]:  [shard 0] init - starting compaction_manager
Sep 14 21:02:35 pcdev-1 scylla[958528]:  [shard 0] init - starting database
Sep 14 21:02:35 pcdev-1 scylla[958528]:  [shard 0] compaction_manager - Set unlimited compaction bandwidth
...
Sep 14 21:02:42 pcdev-1 scylla[958528]:  [shard 0] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:02:42 pcdev-1 scylla[958528]:  [shard 1] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:02:42 pcdev-1 scylla[958528]:  [shard 3] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:02:42 pcdev-1 scylla[958528]:  [shard 2] seastar - updated: blocked-reactor-notify-ms=25
...
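
The stall shows up as a gap between consecutive log timestamps (20:58:03 to 21:02:35 above). Here is a minimal Python sketch for finding such gaps in a longer journal dump. It assumes every line starts with a syslog-style "Mon DD HH:MM:SS" stamp and an English locale for month names. The year is also an assumption, because syslog stamps omit it. `parse_stamp` and `largest_gaps` are hypothetical helper names.

```python
from datetime import datetime

def parse_stamp(line, year=2023):
    """Parse the leading 'Sep 14 20:58:03' stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # blank lines, '...' markers, wrapped lines

def largest_gaps(lines, top=3):
    """Return (seconds, previous_line, next_line) for the biggest time gaps."""
    stamped = [(parse_stamp(l), l.strip()) for l in lines]
    stamped = [(t, l) for t, l in stamped if t is not None]
    gaps = [
        ((b[0] - a[0]).total_seconds(), a[1], b[1])
        for a, b in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:top]

sample = [
    "Sep 14 20:58:03 pcdev-1 scylla[958528]:  [shard 0] init - creating and verifying directories",
    "",
    "Sep 14 21:02:35 pcdev-1 scylla[958528]:  [shard 0] init - starting compaction_manager",
    "Sep 14 21:02:42 pcdev-1 scylla[958528]:  [shard 0] seastar - updated: blocked-reactor-notify-ms=25",
]
for seconds, before, after in largest_gaps(sample):
    # Print the gap length and the message that preceded it.
    print(f"{seconds:7.0f}s after: {before[16:]}")
```

The message printed next to the largest gap is the startup step to look at; in the excerpt above that is "creating and verifying directories".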

It seems there are no metrics for the restart-period:

It's even worse (8 minutes) with the 100ms goal:

Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Created fair group io-queue-66305, capacity rate 64:20000, limit 1677721600, rate 16777216 (factor 1), threshold 33661808
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - IO queue uses 100.00ms latency goal for device 66305
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Created io group dev(66305), length limit 4194304:4194304, rate 64000:20000000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Created io queue dev(66305) capacities: 512:33661808:33697040 1024:33769180:33839644 2048:33983928:34124856 4096:34413424:34695284 8192:35272420:35836132 16384:36990404:38117836 32768:40426380:42681240 65536:47298328:51808044 131072:61042224:70061656
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Created fair group io-queue-0, capacity rate 2147483:2147483, limit 1677721600, rate 16777216 (factor 1), threshold 2000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - IO queue uses 100.00ms latency goal for device 0
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Created io group dev(0), length limit 536870912:536870912, rate 2147483647:2147483647
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 3] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 1] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 2] seastar - Creation of perf_event based stall detector failed, falling back to posix timer: std::system_error (error system:13, perf_event_open() failed: Permission denied)
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] seastar - updated: blocked-reactor-notify-ms=1000000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 2] seastar - updated: blocked-reactor-notify-ms=1000000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 1] seastar - updated: blocked-reactor-notify-ms=1000000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 3] seastar - updated: blocked-reactor-notify-ms=1000000
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - Unknown option : max_size_of_hints_in_progress
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - installing SIGHUP handler
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - Scylla version 5.2.7-0.20230821.e0ebc95025d1 with build-id 0dff18311edc6ec24a66cfcfa280b6a150ab9fc5 starting ...
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting prometheus API server
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - creating snitch
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting tokens manager
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting effective_replication_map factory
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting migration manager notifier
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting lifecycle notifier
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - creating tracing
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting API server
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - Scylla API server listening on 0.0.0.0:10000 ...
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] service_level_controller - update_from_distributed_data: starting configuration polling loop
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting system keyspace
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting gossiper
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - seeds={192.168.178.101, 192.168.178.102, 192.168.178.103}, listen_address=192.168.178.103, broadcast_address=192.168.178.103
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting Raft address map
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting direct failure detector pinger service
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting direct failure detector service
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - initializing storage service
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] storage_service - Started node_ops_abort_thread
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 1] storage_service - Started node_ops_abort_thread
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 2] storage_service - Started node_ops_abort_thread
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 3] storage_service - Started node_ops_abort_thread
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - starting per-shard database core
Sep 14 21:38:28 pcdev-1 scylla[961506]:  [shard 0] init - creating and verifying directories


Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] init - starting compaction_manager
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] init - starting database
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] compaction_manager - Set unlimited compaction bandwidth
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] init - loading system sstables
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] database - Populating Keyspace system
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system: Reading CF cluster_status id=fb70ea0a-1bf9-3772-a5ad-26960611b035 version=67d729c6-31a8-3b49-8c7f-600590c0189c
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system: Reading CF versions id=8b5611ad-b90c-3883-855a-bcc6ddc54f33 version=6af1527e-249e-3bd1-9346-00886d88df34
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system: Reading CF snapshots id=4a9392ae-1937-39f6-a263-01568fd6a3f6 version=24ef45f6-e8d5-3506-897b-66f102ea0cd2
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system: Reading CF config id=3e9372bc-f440-3892-899e-7377c6584b44 version=1c654a28-fa53-3446-bfb4-99803d604b46
Sep 14 21:46:39 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system: Reading CF large_partitions id=8a7fe624-96b0-34b1-b90e-f71bddcdd2d3 version=04fa9920-9369-3a96-be39-6dd9fdc816b6
...
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 3] compaction_manager - Done with off-strategy compaction for system.paxos
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] compaction_manager - Starting off-strategy compaction for system.large_partitions, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] compaction_manager - Done with off-strategy compaction for system.large_partitions
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 1] compaction_manager - Starting off-strategy compaction for system.large_partitions, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 3] compaction_manager - Starting off-strategy compaction for system.large_partitions, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 1] compaction_manager - Done with off-strategy compaction for system.large_partitions
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 2] compaction_manager - Starting off-strategy compaction for system.large_partitions, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 3] compaction_manager - Done with off-strategy compaction for system.large_partitions
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 2] compaction_manager - Done with off-strategy compaction for system.large_partitions
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] compaction_manager - Starting off-strategy compaction for system.compaction_history, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] compaction_manager - Done with off-strategy compaction for system.compaction_history
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 2] compaction_manager - Starting off-strategy compaction for system.compaction_history, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 3] compaction_manager - Starting off-strategy compaction for system.compaction_history, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 2] compaction_manager - Done with off-strategy compaction for system.compaction_history
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 3] compaction_manager - Done with off-strategy compaction for system.compaction_history
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 1] compaction_manager - Starting off-strategy compaction for system.compaction_history, 0 candidates were found
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 1] compaction_manager - Done with off-strategy compaction for system.compaction_history
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 1] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 2] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 3] seastar - updated: blocked-reactor-notify-ms=25
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] init - starting storage proxy
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] init - starting forward service
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] init - starting migration manager
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] init - starting query processor
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] init - initializing batchlog manager
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] system_keyspace - Loaded local host id: 2be11feb-d626-4308-81f8-64f699bed4a4
Sep 14 21:46:40 pcdev-1 scylla[961506]:  [shard 0] format_selector - Selected me sstables format
Sep 14 21:46:48 pcdev-1 scylla[961506]:  [shard 0] features - Feature AGGREGATE_STORAGE_OPTIONS is enabled
Sep 14 21:46:48 pcdev-1 scylla[961506]:  [shard 0] features - Feature ALTERNATOR_TTL is enabled
Sep 14 21:46:49 pcdev-1 scylla[961506]:  [shard 0] features - Feature CDC is enabled
Sep 14 21:46:49 pcdev-1 scylla[961506]:  [shard 0] features - Feature CDC_GENERATIONS_V2 is enabled
Sep 14 21:46:49 pcdev-1 scylla[961506]:  [shard 0] features - Feature COLLECTION_INDEXING is enabled
Sep 14 21:46:49 pcdev-1 scylla[961506]:  [shard 0] features - Feature COMPUTED_COLUMNS is enabled
Sep 14 21:46:49 pcdev-1 scylla[961506]:  [shard 0] features - Feature CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX is enabled
Sep 14 21:46:49 pcdev-1 scylla[961506]:  [shard 0] features - Feature DIGEST_FOR_NULL_VALUES is enabled
...
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] features - Feature UDA is enabled
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] features - Feature UDA_NATIVE_PARALLELIZED_AGGREGATION is enabled
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] features - Feature VIEW_VIRTUAL_COLUMNS is enabled
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] database - Using schema commit log.
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-18014402769178147.log
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-36028801278660131.log
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-4259696162.log
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-4259696163.log
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-54043199788142115.log
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] init - loading system_schema sstables
Sep 14 21:46:50 pcdev-1 scylla[961506]:  [shard 0] database - Populating Keyspace system_schema
...
Sep 14 21:46:51 pcdev-1 scylla[961506]:  [shard 0] database - Truncating system.schema_functions without snapshot
Sep 14 21:46:51 pcdev-1 scylla[961506]:  [shard 0] database - Truncating system.schema_usertypes without snapshot
Sep 14 21:46:51 pcdev-1 scylla[961506]:  [shard 0] database - Truncating system.schema_columns without snapshot
Sep 14 21:46:51 pcdev-1 scylla[961506]:  [shard 0] database - Truncating system.schema_triggers without snapshot
Sep 14 21:46:51 pcdev-1 scylla[961506]:  [shard 0] database - Truncating system.schema_columnfamilies without snapshot
Sep 14 21:46:51 pcdev-1 scylla[961506]:  [shard 0] database - Truncating system.schema_keyspaces without snapshot
Sep 14 21:46:57 pcdev-1 scylla[961506]:  [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables
Sep 14 21:46:57 pcdev-1 scylla[961506]:  [shard 0] init - setting up system keyspace
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] init - starting schema commit log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-18014402769178147.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-18014402769178148.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-36028801278660132.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-54043199788142116.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-36028801278660131.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-4259696162.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-4259696164.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-4259696163.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] commitlog - Cannot parse the version of the file: CommitLog-2-54043199788142115.log
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] schema_tables - Schema version changed to 009ab3ee-3f2c-32fe-bbf5-498862f400d2
Sep 14 21:46:58 pcdev-1 scylla[961506]:  [shard 0] init - loading non-system sstables
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Skipping undefined keyspace: system_distributed.bak
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Populating Keyspace system_traces
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system_traces: Reading CF sessions_time_idx id=0ebf001c-c1d1-3693-9a63-c3d96ac53318 version=e7629311-eb4d-3b74-b7c2-37159070c7fc
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system_traces: Reading CF sessions id=c5e99f16-8677-3914-b17e-960613512345 version=369b9c81-e765-3f1c-811c-32698bb65afb
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system_traces: Reading CF node_slow_log_time_idx id=f9706768-aa1e-3d87-9e5c-51a3927c2870 version=d18030bc-fa64-386c-ac19-87f6bb199cf3
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system_traces: Reading CF node_slow_log id=bfcc4e62-5b63-3aa1-a1c3-6f5e47f3325c version=ce98ae38-dcd2-30e4-b019-a8deb92994bc
Sep 14 21:47:02 pcdev-1 scylla[961506]:  [shard 0] database - Keyspace system_traces: Reading CF events id=8826e8e9-e16a-3728-8753-3bc1fc713c25 version=dfd35ee9-87d1-3214-8440-7489c95e8108

← There are a couple of commitlog errors. Is it perhaps going over

Please strace -fF -ttt the process while it is happening, and attach the part where the timestamps match the area where it stalls.

I don’t think the OS version has anything to do with it, but it might.

You were faster. The entire cluster is actually quite slow now. That would confirm your theory that it's not OS related.
We will check …

We ran strace for a few seconds during the "creating and verifying directories" step on 5.2.7. The strace log was uploaded as: 536aa7c0-9ad3-4edf-a47b-78a2f29b5c2a
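
For anyone who wants a first look at such a capture before uploading it, here is a minimal Python sketch that tallies syscalls by name and reports the time span covered. It assumes the usual `strace -f -ttt` line shape (optional PID or "[pid N]" prefix, epoch timestamp, then `syscall(`). `syscall_counts` is a hypothetical helper, and the sample lines are invented for illustration; they are not taken from the uploaded log.

```python
import re
from collections import Counter

# Optional "[pid N]" or bare PID prefix, epoch timestamp, syscall name.
LINE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\d+\.\d+)\s+(\w+)\(")

def syscall_counts(lines):
    """Count syscalls by name; also return seconds between first and last."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skips "<... resumed>" continuations, signals, exits
        ts = float(m.group(1))
        if first is None:
            first = ts
        last = ts
        counts[m.group(2)] += 1
    span = (last - first) if first is not None else 0.0
    return counts, span

# Invented sample lines, only to show the expected input shape.
sample = [
    '958528 1694724000.000000 openat(AT_FDCWD, "/x", O_RDONLY) = 3',
    "958530 1694724000.500000 fstat(3, {...}) = 0",
    "958528 1694724001.250000 <... io_getevents resumed>) = 1",
    "[pid 958528] 1694724002.000000 fstat(4, {...}) = 0",
]
counts, span = syscall_counts(sample)
for name, n in counts.most_common(5):
    print(f"{name:12s} {n}")
print(f"span: {span:.1f}s")
```

A syscall that dominates the tally during the stall window is a good hint about where the startup time goes.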

In the meantime we updated the cluster to 5.2.8, which seems to have made things much worse, to the point that the system was no longer usable. We reverted to 5.2.7 for now but will try again to confirm it's really related to the version.