Compaction Storm slows down Scylla

Today we observed some weird behavior in ScyllaDB:

All of a sudden, the cluster became noticeably slow. The logs were not very interesting, but they showed a lot of compactions running on some nodes of the cluster. Our Prometheus/Grafana metrics seem to show that more and more nodes suffered from these intense compactions (running at the IOPS limit of the underlying disks).

A short excerpt from the logs (it went on like this for hours):

Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.businessobjectentries4 09d022b0-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0mi6o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_5oots2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_5v4b42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0ffk02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_5t6v42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_31yc02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_3ntdc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_4n6io2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_5mjo02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_4pje82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_3edv42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0o7ww2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0902o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_3jbc02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0556o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_360xs2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0k5b42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_15smo2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_13fr42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0di402rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_3gj0w2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_3c8pc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_49w4g2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_02sb42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_4dyq82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_07acg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_45e342rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_5c9ao2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_3u14w2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_2xo0g2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_21ils2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw8_0tktd2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable]
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.businessobjectchangepointers 0930d570-7afb-11ef-b677-bb2e6950de47] Compacted 8 sstables to [/var/lib/scylla/data/pc/businessobjectchangepointers-0e6da7e0757d11e98f29000000000000/me-3gju_0dw8_0tktc2rw6a1kw0kvuv-big-Data.db:level=0]. 3MB to 3MB (~98% of original) in 1056ms = 3MB/s. ~1280 total partitions merged to 310.
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.businessobjectentries4 09d022b0-7afb-11ef-b677-bb2e6950de47] Compacted 32 sstables to [/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_1381d2rw6a1kw0kvuv-big-Data.db:level=0]. 217kB to 41kB (~18% of original) in 110ms = 1MB/s. ~4096 total partitions merged to 1.
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.customworkerstates 09e50a40-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw8_57yz42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw9_17icw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw8_3s3ow2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw8_2aqe82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw9_19fsw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw8_2tt4g2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw9_10v5s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw7_5tu0k2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.tntevents2 09e50a41-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw7_1tsts2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw8_55eds2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw7_1rvds2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw8_539802rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw7_1pq802rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw6_27ink2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw8_514282rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw7_21y1c2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw8_5a44w2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw6_4s3zk2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw8_4tea82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw4_3f8q82rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.customworkerstates 09e50a40-7afb-11ef-b677-bb2e6950de47] Compacted 8 sstables to [/var/lib/scylla/data/pc/customworkerstates-92b8e11e884207c98f26000000000000/me-3gju_0dw9_1wl4w2rw6a1kw0kvuv-big-Data.db:level=0]. 450kB to 341kB (~75% of original) in 147ms = 3MB/s. ~1920 total partitions merged to 916.
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.tntindex2 0a0fc3c0-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_47yog2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_3ab9c2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_2ndn42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_5x9gw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_368nk2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_4z6m82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_3z6bk2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_0qksg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_2l8hc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_413rk2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_3m3n42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_0i05c2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_5gjm82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_2ivls2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_5em682rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw7_2giq82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_1l0gw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_31qm82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw7_3zlr42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_2cnu82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_17xsg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_1r0io2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_4fw682rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_1xg002rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_3863k2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw7_5c1kw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw7_125gg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_4x9682rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw7_0c7tc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_2f0ps2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw6_1nl282rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw8_2uvpc2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.tntindex2 0a0fc3c0-7afb-11ef-b677-bb2e6950de47] Compacted 32 sstables to [/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_3ktcg2rw6a1kw0kvuv-big-Data.db:level=0]. 1MB to 147kB (~8% of original) in 95ms = 18MB/s. ~65536 total partitions merged to 153.
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.tntevents2 09e50a41-7afb-11ef-b677-bb2e6950de47] Compacted 12 sstables to [/var/lib/scylla/data/pc/tntevents2-98f63fb0a41011e9b8d6000000000000/me-3gju_0dw9_1wsuo2rw6a1kw0kvuv-big-Data.db:level=0]. 7MB to 6MB (~99% of original) in 391ms = 17MB/s. ~1536 total partitions merged to 68.
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.businessobjectentries4 0a254790-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0yxps2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0uv402rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_0wsk02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_15d742rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_2hdlc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_3c8pc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_268cw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_38t8w2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_1espc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_34b7k2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_1yav42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_1nl282rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_3mqsg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_1l86o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_2cvk02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_1381d2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw7_5tu0i2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.serialvaultids2 091907b0-7afb-11ef-b677-bb2e6950de47] Compacted 8 sstables to [/var/lib/scylla/data/pc/serialvaultids2-efe8f8923b7e589fcd00000000000000/me-3gju_0dw7_5tu0j2rw6a1kw0kvuv-big-Data.db:level=0]. 15MB to 15MB (~99% of original) in 1823ms = 8MB/s. ~8832 total partitions merged to 7835.
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.usagecounters2 0a2fa7d0-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw8_2r8j42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_0x09s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_0inao2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_1rno02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw5_1nss02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw4_3fgg02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw4_2ug9s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_5abuo2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_3mj2o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_4o93k2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw3_509742rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw6_3gqqo2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw3_4jjcg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw5_23g1s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw4_5x1r42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw5_2dipc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw5_1zl5s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw8_3q68w2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw9_2k5wg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw5_3ovy82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw3_47j8w2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:09 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.businessobjectentries4 0a254790-7afb-11ef-b677-bb2e6950de47] Compacted 17 sstables to [/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_4elvk2rw6a1kw0kvuv-big-Data.db:level=0]. 5MB to 5MB (~97% of original) in 79ms = 67MB/s. ~2176 total partitions merged to 12.
Sep 25 05:00:10 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.usagecounters2 0a2fa7d0-7afb-11ef-b677-bb2e6950de47] Compacted 21 sstables to [/var/lib/scylla/data/pc/usagecounters2-947b0b609f2c11e9bc5c000000000000/me-3gju_0dw9_4syuo2rw6a1kw0kvuv-big-Data.db:level=0]. 1MB to 1MB (~88% of original) in 243ms = 7MB/s. ~3200 total partitions merged to 638.
Sep 25 05:00:10 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.businessobjectentries4 0a555840-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_4xgw02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_4syup2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_4tm002rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_4jys02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/businessobjectentries4-0e6da7e0757d11e98f28000000000000/me-3gju_0dw9_4elvk2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:10 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.outgoingintegrationstates 0a555841-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw9_1rno02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw9_4pje82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw9_4rw9s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw9_5cwg02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw9_5999s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw8_2zlgg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw8_2vya82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw8_4vbq82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw8_4l1cw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw9_1pii82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dw8_0kd0w2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:10 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.tntindex2 0a555842-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_4vjg02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_51r7k2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_574402rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_3io6o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_3pbds2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_4rok02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_3klmo2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_4ztrk2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dwa_05km82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_53gxs2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_4i1c02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw9_3ktcg2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction,/var/lib/scylla/data/pc/tntindex2-9a443dd0a31011e9b8d6000000000000/me-3gju_0dw4_4yjgx2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
Sep 25 05:00:10 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.outgoingintegrationstates 0a555841-7afb-11ef-b677-bb2e6950de47] Compacted 11 sstables to [/var/lib/scylla/data/pc/outgoingintegrationstates-6d4884207c8e11e98f26000000000000/me-3gju_0dwa_0bcy92rw6a1kw0kvuv-big-Data.db:level=0]. 175kB to 100kB (~57% of original) in 92ms = 1MB/s. ~1408 total partitions merged to 46.
Sep 25 05:00:10 o-p-L10-1 scylla[413744]:  [shard 4:comp] compaction - [Compact pc.lock4 0a666f40-7afb-11ef-b677-bb2e6950de47] Compacting [/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw8_5r1pc2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_5grc02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_0bcy82rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_5etw02rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_2x0v42rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dwa_0902o2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_3gj0w2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_4ee5s2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_3elkw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_012kw2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_2my7k2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_4btkg2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_1cfts2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw9_28dio2rw6a1kw0kvuv-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/pc/lock4-e3f1e300676c11e9b751000000000000/me-3gju_0dw8_5djlc2rw6a1kw0kvuv-big-Data.db:level=0:origin=compaction]
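To quantify what an excerpt like this shows, the "Compacted ..." summary lines can be aggregated per table: how many compactions ran, how many sstables each one consumed, and how much data it read. The following is a minimal Python sketch that assumes only the log format visible above; the regex, the function names and the decimal interpretation of kB/MB/GB are assumptions for illustration, not part of any Scylla tooling.

```python
import re
from collections import defaultdict

# Matches the "Compacted N sstables to [...]. X to Y ..." summary lines in the
# excerpt above. The format is taken from these logs; other versions may differ.
COMPACTED_RE = re.compile(
    r"\[Compact (?P<table>[\w.]+) [0-9a-f-]+\] "
    r"Compacted (?P<n>\d+) sstables to .*?\]\. "
    r"(?P<src>[\d.]+)(?P<unit>[kMG]?B) to "
)

# Assumption: sizes are decimal units; the exact base hardly matters here.
UNIT = {"B": 1, "kB": 1_000, "MB": 1_000_000, "GB": 1_000_000_000}


def summarize(lines):
    """Aggregate compaction count, input sstables and input bytes per table."""
    stats = defaultdict(lambda: {"compactions": 0, "sstables_in": 0, "bytes_in": 0.0})
    for line in lines:
        m = COMPACTED_RE.search(line)
        if m is None:
            continue  # "Compacting [...]" start lines and unrelated messages
        s = stats[m["table"]]
        s["compactions"] += 1
        s["sstables_in"] += int(m["n"])
        s["bytes_in"] += float(m["src"]) * UNIT[m["unit"]]
    return dict(stats)


def print_summary(lines):
    """Print one line per table with averages per compaction."""
    for table, s in sorted(summarize(lines).items()):
        avg_kb = s["bytes_in"] / s["compactions"] / 1_000
        avg_in = s["sstables_in"] / s["compactions"]
        print(f"{table}: {s['compactions']} compactions, "
              f"avg {avg_in:.1f} sstables in, avg input {avg_kb:.0f} kB")
```

Any iterable of log lines works as input, for example an open file holding exported `journalctl` output for the Scylla service. Many compactions with inputs of a few hundred kB spread over dozens of level-0 sstables would be consistent with the pattern in this excerpt.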

We are running ScyllaDB version 6.0.2-0.20240703.c9cd171f426e-1.

For some reason, memtable I/O spiked on a few nodes and the memtable shares went up. Our application's behavior has not changed, and the load is the same as usual.

Restarting the whole cluster (every node, one by one) seems to have solved the issue temporarily, but we are unsure about the root cause and how to find it.

Has anyone else seen this pattern, or does anyone have an idea what we are seeing here?

Attached are some screenshots of the Scylla Advanced dashboard.

Could this issue be caused by #16514?

Stability: Off-strategy compaction is used to make sstables conform to the compaction strategy after an operation such as repair. Off-strategy compaction for TWCS will now have less storage space overhead. [#16514]

Could this trigger anything that might result in the weird memtable compactions shown above?

From the logs:

Sep 26 16:22:30 o-p-L3-3 scylla[1381236]: [shard 5:stmt] lsa - Standard allocator failure, increasing head-room in section 0x604006144670 to 2048 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
--------
seastar::internal::coroutine_traits_base<void>::promise_type
Sep 26 16:22:30 o-p-L3-3 scylla[1381236]: [shard 5:stmt] lsa - Standard allocator failure, increasing head-room in section 0x604006144670 to 4096 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
--------
seastar::internal::coroutine_traits_base<void>::promise_type
Sep 26 16:22:30 o-p-L3-3 scylla[1381236]: [shard 5:stmt] lsa - Standard allocator failure, increasing head-room in section 0x604006144670 to 8192 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
--------
seastar::internal::coroutine_traits_base<void>::promise_type
Sep 26 16:22:30 o-p-L3-3 scylla[1381236]: [shard 5:stmt] lsa - Standard allocator failure, increasing head-room in section 0x604006144670 to 16384 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
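In these warnings the head-room for the same LSA section doubles on every failure (2048, 4096, 8192, 16384 bytes here, and 524288 bytes on another node in the later warnings). A small Python sketch that records the largest head-room per host, shard and section makes it easier to see whether the same sections keep growing across nodes. It assumes only the message format shown in these lines; the regex and function name are illustrative, and no claim is made here about what the growth means inside Scylla.

```python
import re

# Matches the LSA warning format shown above, e.g.
# "o-p-L3-3 scylla[1381236]: [shard 5:stmt] lsa - Standard allocator failure,
#  increasing head-room in section 0x604006144670 to 2048 [B]; trace: ..."
HEADROOM_RE = re.compile(
    r"(?P<host>\S+) scylla\[\d+\]:\s+\[shard (?P<shard>\d+):\w+\] "
    r"lsa - Standard allocator failure, "
    r"increasing head-room in section (?P<section>0x[0-9a-f]+) "
    r"to (?P<size>\d+) \[B\]"
)


def max_headroom(lines):
    """Return {(host, shard, section): largest head-room in bytes seen}."""
    result = {}
    for line in lines:
        m = HEADROOM_RE.search(line)
        if m is None:
            continue  # trace separators and unrelated messages
        key = (m["host"], int(m["shard"]), m["section"])
        result[key] = max(result.get(key, 0), int(m["size"]))
    return result
```

The host is part of the key because section addresses are only meaningful within one Scylla process.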

Interestingly, during the “Compaction Storm” we see a lot of very small memtable flushes and compactions.

An even better screenshot, showing “scylla_memtables_pending_flushes_bytes > 0”:

During the periods of massive compaction, it showed many small memtable flushes.
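The same expression can also be evaluated outside Grafana, for example to record which nodes have pending flushes while a storm is going on. Below is a minimal Python sketch against the standard Prometheus HTTP API (`/api/v1/query`). The Prometheus address is hypothetical, and grouping by the `instance` label assumes the label layout used by Scylla Monitoring.

```python
import json
import urllib.parse
import urllib.request

PROMQL = "scylla_memtables_pending_flushes_bytes > 0"


def query_url(base_url, promql=PROMQL):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def shards_with_pending_flushes(response):
    """Count, per instance, how many series (shards) have pending flush bytes."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    counts = {}
    for series in response["data"]["result"]:
        inst = series["metric"].get("instance", "<unknown>")
        counts[inst] = counts.get(inst, 0) + 1
    return counts


def fetch(base_url):
    """Run the query against a live Prometheus (hypothetical address)."""
    with urllib.request.urlopen(query_url(base_url), timeout=10) as resp:
        return json.load(resp)
```

Calling `shards_with_pending_flushes(fetch("http://prometheus:9090"))` periodically would show whether the affected nodes coincide with the ones running the compaction storm.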

The question is: what can suddenly cause many small flushes?

I think we had a problem with repair flushing at the beginning. Does the time when the problem started match the start of a repair?

I was thinking about repairs triggering the flushes too, but it does not look like it (see graphs below).

But we have found something weird that I think must be either a bug in the metrics or perhaps some kind of memory leak:

Graphs for the 24th:

Repairs:

Memtable IOPS:

Regarding the GitHub issue:

It seems scylla_commitlog_memory_buffer_bytes has been going up constantly since we upgraded to 6.0. Perhaps once it fills up, it keeps flushing?
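
One way to put a number on “constantly going up” is to fit a straight line to the metric. A minimal sketch, assuming per-shard samples of scylla_commitlog_memory_buffer_bytes have been exported from Prometheus as (timestamp, value) pairs; the Sample type and growth_rate_bytes_per_second function are hypothetical names for illustration only. It returns the ordinary least-squares slope in bytes per second.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// One (timestamp, value) sample of scylla_commitlog_memory_buffer_bytes,
// e.g. exported from Prometheus. Hypothetical type for illustration.
struct Sample {
    double t_seconds;  // seconds since an arbitrary reference point
    double bytes;      // metric value at that time
};

// Ordinary least-squares slope of the metric over time, in bytes per second.
double growth_rate_bytes_per_second(const std::vector<Sample>& samples) {
    if (samples.size() < 2) {
        throw std::invalid_argument("need at least two samples");
    }
    double mean_t = 0.0;
    double mean_b = 0.0;
    for (const auto& s : samples) {
        mean_t += s.t_seconds;
        mean_b += s.bytes;
    }
    mean_t /= static_cast<double>(samples.size());
    mean_b /= static_cast<double>(samples.size());

    double cov = 0.0;  // sum of (t - mean_t) * (b - mean_b)
    double var = 0.0;  // sum of (t - mean_t)^2
    for (const auto& s : samples) {
        cov += (s.t_seconds - mean_t) * (s.bytes - mean_b);
        var += (s.t_seconds - mean_t) * (s.t_seconds - mean_t);
    }
    if (var == 0.0) {
        throw std::invalid_argument("samples must span more than one timestamp");
    }
    return cov / var;
}
```

A slope that stays clearly positive over days, rather than hovering around zero, would match the growth described above; on its own it says nothing about whether that growth triggers flushes.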

It’s probably related.

elcallio says that the commitlog-buffer-bytes cannot be responsible for any flushes.

I wonder what else can cause lots and lots of tiny flushes:

Please share commitlog metrics.

Also: does nodetool flush help?

Also, we are seeing quite a lot of memory-related warnings:

Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 262144 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 524288 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 1048576 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 2097152 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 4194304 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 8388608 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 16777216 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 33554432 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 67108864 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
Sep 30 18:02:09 o-p-L3-6 scylla[3457277]:  [shard 3:stmt] lsa - Standard allocator failure, increasing head-room in section 0x603003fc4670 to 134217728 [B]; trace: 0x645df8e 0x645e5a0 0x645e888 0x215033f 0x20f504e 0x20b6c0c 0x20b7000 0x1caeb23 0x1cadc83 0x1cab21d 0x1c70767 0x1c6b81d 0x1b8c108 0x47dde0a 0x144ccda 0x5f56c3f 0x5f57f27 0x5f7be90 0x5f1734a /opt/scylladb/libreloc/libc.so.6+0x8c946 /opt/scylladb/libreloc/libc.so.6+0x11296f
                                                       --------
                                                       seastar::internal::coroutine_traits_base<void>::promise_type
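
In the warnings above, the head-room for a single LSA section doubles on every failure, from 262144 B up to 134217728 B (128 MiB). A minimal sketch for summarizing such warnings from a journal excerpt, assuming one warning per line in the format shown; parse_headroom, max_headroom_per_section and is_doubling are hypothetical helpers, not part of Scylla.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Extracts (section, head-room bytes) from a line containing
// "increasing head-room in section 0x... to N [B]".
// Returns false for lines that are not such a warning.
bool parse_headroom(const std::string& line, std::string& section, std::uint64_t& bytes) {
    static const std::regex re(R"(increasing head-room in section (0x[0-9a-f]+) to (\d+) \[B\])");
    std::smatch m;
    if (!std::regex_search(line, m, re)) {
        return false;
    }
    section = m[1].str();
    bytes = std::stoull(m[2].str());
    return true;
}

// Largest head-room seen per section, to spot a section whose reserve keeps growing.
std::map<std::string, std::uint64_t> max_headroom_per_section(const std::vector<std::string>& lines) {
    std::map<std::string, std::uint64_t> result;
    for (const auto& line : lines) {
        std::string section;
        std::uint64_t bytes = 0;
        if (parse_headroom(line, section, bytes) && bytes > result[section]) {
            result[section] = bytes;
        }
    }
    return result;
}

// True if every value is exactly twice the previous one.
bool is_doubling(const std::vector<std::uint64_t>& values) {
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] != 2 * values[i - 1]) {
            return false;
        }
    }
    return true;
}
```

Continuation lines such as the “--------” separators simply fail to match and are skipped.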

My previous comment is still waiting for approval.

One metric I find particularly interesting is:

Another idea: Can the schema-commitlog cause flushes of our data tables?

For some reason, scylla_schema_commitlog_active_allocations was quite high after the 6.0 upgrade. On the 25th, when we did a rolling restart, it dropped sharply and has stayed low ever since:

Is it possible that some action from the upgrade was still pending and caused a lot of schema commitlog activity?

I do not think that nodetool flush helps, as it seems it was flushing like crazy anyway. Also, I am 99% sure we ran some flushes in between and they did not help; only the rolling restart did. I have no cluster where I can reproduce it, so I cannot say for sure.

Lots of small flushes:

As for commitlog metrics, here they are around the time the compactions started going up (~11:00 UTC on the 24th):

“Merged from Memtable to Cache” is going up for the hosts having the issue:


I cannot say if that is cause or symptom.

Memtable switches go up for the hot tables:

Schema commitlog latency goes up on some of the hosts, which I think is a symptom (IO being bottlenecked):

What I find interesting is that the “reserved disk space” is similar in size:

Commitlog buffer size (this is what should be fixed in the ticket) is similar in size. My initial theory was that this puts pressure on Scylla to flush in order to free up memory, but elcallio does not think so:

Let me know if you are looking for something in particular…

You can try running with the commitlog logger at debug level. I’m looking at this log line:

    clogger.debug("Flushing ({} MB) to {}", flushing/(1024*1024), high);

If we correlate it with the memtable flush activity, we can focus on commitlog, and if not, it’s somewhere else.
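
A minimal sketch of that correlation, assuming the debug output lands in the journal with a syslog-style timestamp (e.g. “Sep 26 16:22:30”) at the start of each line, and matching only on the message text produced by the clogger.debug call quoted above. flushes_per_minute is a hypothetical helper; its per-minute counts can then be laid next to the memtable flush graphs. The regex also accepts a fractional MB value, since the type of the logged value is not visible from the quoted line.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

struct MinuteStats {
    int messages = 0;       // number of "Flushing (...)" debug messages
    double total_mb = 0.0;  // sum of the MB values they report
};

// Groups commitlog "Flushing (N MB) to ..." debug messages by journal minute.
std::map<std::string, MinuteStats> flushes_per_minute(const std::vector<std::string>& lines) {
    static const std::regex re(R"(Flushing \((\d+(?:\.\d+)?) MB\) to )");
    std::map<std::string, MinuteStats> out;
    for (const auto& line : lines) {
        std::smatch m;
        // Need at least "Mmm dd HH:MM:SS" before we can take a minute key.
        if (line.size() < 15 || !std::regex_search(line, m, re)) {
            continue;
        }
        MinuteStats& s = out[line.substr(0, 12)];  // "Sep 26 16:22" is the per-minute key
        s.messages += 1;
        s.total_mb += std::stod(m[1].str());
    }
    return out;
}
```

If minutes with many commitlog “Flushing” messages line up with the bursts of tiny memtable flushes, the commitlog is the likely trigger; if not, the flushes are coming from somewhere else.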

Maybe that’s the answer. We keep increasing the memory reserve. But it’s not quite right, since the memtable size isn’t affected by the reserves.

In any case, nothing can work like this. I suggest checking the system.large* tables for large rows/cells/collections (large partitions are fine).

Also, please decode the backtraces.
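
Decoding requires the exact Scylla build that produced the trace. A minimal sketch of the first step, splitting the “trace:” part of a warning into bare addresses and library+offset frames so they can be handed to a symbolizer, for example addr2line or Seastar’s seastar-addr2line script. split_trace and TraceFrames are hypothetical names, and treating bare addresses as belonging to the scylla binary is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct TraceFrames {
    std::vector<std::string> bare_addresses;  // "0x..." frames, presumably in the scylla binary
    std::vector<std::string> library_frames;  // "path+0xoffset" frames in shared libraries
};

// Splits everything after "trace:" into frames; text before it
// (including the section address) is ignored.
TraceFrames split_trace(const std::string& line) {
    TraceFrames out;
    const auto pos = line.find("trace:");
    if (pos == std::string::npos) {
        return out;
    }
    std::istringstream in(line.substr(pos + 6));  // 6 == strlen("trace:")
    std::string token;
    while (in >> token) {
        if (token.rfind("0x", 0) == 0) {
            out.bare_addresses.push_back(token);
        } else if (token.find("+0x") != std::string::npos) {
            out.library_frames.push_back(token);
        }
    }
    return out;
}
```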

I would say it’s definitely related to flushes:

I guess the question is: Were these flushes triggered by commitlog or memory?

As for large* …

  • We basically do not use collections, so I would rule that out. I must admit I do not know how to check for them, as there is no large_collections table.
  • There are quite a few large cells, up to 3 MB. They cannot get larger than that, as our application checks the size and throws an exception during the write.
  • No large rows reported.
  • Partitions: some reach up to 1 GB and 25M rows. I would hope that this is not a problem.
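
The application-side size check described in the second bullet might look roughly like this. This is a hypothetical sketch: the limit assumes “3MB” means 3 × 1024 × 1024 bytes, and check_cell_size is an illustrative name, not the application’s actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical cap mirroring the "up to 3MB" limit; the exact byte count is an assumption.
constexpr std::size_t kMaxCellBytes = 3 * 1024 * 1024;

// Throws before the write if a serialized cell value exceeds the cap.
void check_cell_size(const std::string& serialized_value) {
    if (serialized_value.size() > kMaxCellBytes) {
        throw std::length_error("cell of " + std::to_string(serialized_value.size()) +
                                " bytes exceeds the " + std::to_string(kMaxCellBytes) +
                                "-byte limit");
    }
}
```

Because the check runs before every write, no cell should ever appear in system.large_cells above the cap.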

Decoding the backtraces may help here.